<%BANNER%>

Quantitative Analysis of CIS-Regulatory Sequences in Genes of Arabidopsis


PAGE 1

QUANTITATIVE ANALYSIS OF CIS REGULATORY SEQUENCES IN GENES OF ARABIDOPSIS By MATTHEW GENE KETTERLING A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGR EE OF MASTER OF SCIENCE UNIVERSITY OF FLORIDA 2003

PAGE 2

Copyright 2003 by Matthew Gene Ketterling

PAGE 3

iii ACKNOWLEDGMENTS I would first like to thank my adviser Dr. Donald McCarty for the opportunity to work in his laboratory as well as helping me develop a research project that catered to my interests in molecular biology and computers. I would like to than k Dr. George Casella for his statistical advisement pertaining to my thesis work as well as a special thank you to Dr. Masaharu Suzuki for his advisement as both a committee member and a friend; I am not sure where I would be right now if it were not for h is added guidance. I appreciate the help from the rest of the McCarty lab and members of Dr. Mark Settles lab for discussions as well as support from all my friends in Fifield Hall; they of course will not be forgotten. I would also like to thank my par ents: Gene Ketterling, and Cathy and David Eide for their support. This thesis would not have been possible it was not for them. Lastly, I wouldd like to thank my brother Benjamin Ketterling and the rest of my family for their understanding and support o f the decisions I have made in my life.

PAGE 4

iv TABLE OF CONTENTS Page ACKNOWLEDGMENTS ................................ ................................ ................................ .. iii LIST OF TABLES ................................ ................................ ................................ .............. vi LIST OF FIGURES ................................ ................................ ................................ ........... vii ABSTRACT ................................ ................................ ................................ ........................ ix CHAPTER 1 LITERATURE REVIEW ................................ ................................ ................................ 1 Abscisic Acid ................................ ................................ ................................ ................ 1 Microarrays ................................ ................................ ................................ ................... 4 Gene Regulation Networks ................................ ................................ ......................... 10 Cis elements Discovery ................................ ................................ .............................. 12 2 IDENTIFICATION OF CIS ELEMENTS ................................ ................................ ..... 16 Introduction ................................ ................................ ................................ ................. 16 Materials and Methods ................................ ................................ ............................... 20 MotifFinder algorithm ................................ ................................ ......................... 20 MotifFinder implementation ................................ ................................ ............... 21 MotifMapper algorithm ................................ ................................ ....................... 21 Promoter database ................................ ................................ ............................... 22 Testing for false positive motifs ................................ ................................ .......... 22 PERL version and modules ................................ ................................ ................. 23 Results ................................ ................................ ................................ ......................... 23 Analysis of VP1/ABA regulated genes ................................ ............................... 23 Analysis of flan king sequence motifs ................................ ................................ .. 27 Motifs associated with the ABRE ................................ ................................ ....... 29 Motif LOGOs ................................ ................................ ................................ ...... 29 Analysis of cold regulated genes ................................ ................................ ......... 31 Distribution of motifs in VP1/ABA and cold regulated genes ............................ 34 AlignACE ................................ ................................ ................................ ............ 37 Discussion ................................ ................................ ................................ ................... 37 APPENDIX

PAGE 5

v A SOFTWARE DOCUMENTATION ................................ ................................ .............. 42 B SOURCE CODE: MotifFinderp1v4.pl ................................ ................................ .......... 47 C SOURCE CODE: MotifFinderp2v6.pl ................................ ................................ .......... 53 D SOURCE CO DE: MotifMapperp1v3.pl ................................ ................................ ........ 57 E REPRESSED ABA REGULATED GENES ................................ ................................ .. 69 F ACTIVATED ABA REGULATED GENES ................................ ................................ 80 G REPRESSED VP1 AND ABA DEPENDENT GENES ................................ ............ 91 H ACTIVATED VP1 AND ABA DEPENDENT GENES ................................ .......... 102 I REPRESSED ABA DEPENDENT GENES ................................ ................................ 119 J ACTIVATED ABA DEPENDENT GENES ................................ ................................ 124 K VP1 ACTIVATED/ABA ACTIVATED SUBCLASS OF VP1 OR ABA DEPENDENT GENES ................................ ................................ ............................. 139 L ACTIVATED VP1 DEPEN DENT GENES ................................ ................................ 145 LIST OF REFERENCES ................................ ................................ ................................ 152 BIOGRAPHICAL SKETCH ................................ ................................ ........................... 162

PAGE 6

vi LIST OF TABLES Table page 2 1. Summary of known and putative regula tory elements among VP1/ABA regulated genes ................................ ................................ ................................ ......................... 26 2 2. Summary of known and putative regulatory elements among cold regulated genes. 33

PAGE 7

vii LIST OF FIGURES Figure page 1 1. Structure of abscisic acid. ................................ ................................ ............................ 1 1 2. Schematic diagram of cDN A mi croarray.. ................................ ................................ .. 6 1 3. Schematic diagram of high density oligonucleotide microarray ................................ 7 2 1. Classification of genes analyzed by MotifFinder: (A) Hierarchical classification from Suzuki et al. showing subclasses of VP1/ABA regulated genes. (B) Hierarchical diagram of Cold regulated gene classes defined by Fowler and Thomashow. ................................ ................................ ................................ ............. 24 2 2. The motifs wi th highest significance have similarity to G box related ABA responsive elements (5 G[A/C]CACGTG 3). ................................ ....................... 25 2 3. Comparison of ACGT flanking sequences. ................................ ............................... 28 2 4. Summary of regulatory elements that are associated with ACGT core elements among VP1/ABA regulated genes: (A) Associations to ACGT core were calculated by comparing our Motif dictionary of 43,168 motifs with 20bp flanking ACGT core vs. 20 bases selected at random. (B) Sequences were located by eye using the color output of MotifMapper. ................................ ................................ ... 30 2 6. Distribution of the top 75 statistically significant motifs among Motif Finder analysis of 353 VP1/ABA regulated genes ................................ ................................ ........... 35 2 7. Activated and Repressed subclasses show strikingly different distributions of cis elements in relation to the ATG start site: (A) The a symmetrical distribution of the top 75 significant motifs of the activated ABA dependent class and 75 randomly generated motifs which are symmetrically distributed. (B) The symmetrical distribution of the top 75 significant motifs of the repressed ABA de pendent class has a similar distribution to 75 randomly generated motifs. ................................ .... 36 2 8. Distributions of significant and random motifs of the four classes of cold regulated genes defined by Fowler an d Thomashow: (A) Top 75 significant motifs in promoters of genes up regulated long term by cold along with 75 randomly generated motifs. (B) Top 75 significant motifs in promoters of genes up regulated transiently by cold along with 75 randomly generated motifs. (C) Top 75 significant motifs in promoters of genes down regulated by cold along with 75

PAGE 8

viii randomly generated motifs. (D) Top 75 significant motifs in promoters that are regulated by CDF1, CDF2 and CDF3 over expressing lines along with 75 randomly generated motifs. ................................ ................................ ...................... 38

PAGE 9

ix Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Master of Science QUANTITATIVE ANALYSIS OF CIS REGULATORY SEQUENCES IN GENES OF ARABIDOPSIS By Matthew Ge ne Ketterling August, 2003 Chair: Dr. Donald R. McCarty Major Department: Plant Molecular and Cellular Biology In order to rigorously analyze regulatory sequences that are shared by clusters of co regulated genes, we have developed a quantitative computa tional approach for objective analysis of promoter sequences. The algorithm, implemented in a PERL program called MotifFinder, identifies sequence motifs that are over represented in promoters of co regulated genes relative to a control set of randomly se lected promoters. We used MotifFinder to analyze promoter sequences of co regulated Arabidopsis genes classified according to expression data generated by oligo microarray experiments. We have successfully used this software to detect and analyze the dis tributions of known cis elements that mediate Viviparous 1 (VP1) binding and abscisic acid (ABA) signaling as well as cis elements shared among cold responsive genes regulated by the CRT/DRE Binding Factor (CBF). In addition to known cis elements, MotifFi nder revealed several previously unidentified motifs that may have biological significance. A second program, MotifMapper, color codes the shared motifs according to statistical significance and

PAGE 10

x displays the putative cis elements as highlighted regions wi thin the promoter sequences. The combined application of this software allows quantitative analysis and visualization of promoter elements using no prior knowledge of transcription factor binding sites.

PAGE 11

1 CHAPTER 1 LITERATURE REVIEW Abscisic Acid The hormone abscisic acid (ABA) (Figure 1 1) is found in all plant tissues and regulates both development and responses to environmental stress (Addicott and Carns, 1983). Since its identification in the 1960 s, A BA has been implicated in embryogenesis, seed dormancy and cell division as well as stress responses to temperature fluctuations, drought, salt, and UV radiation (reviewed in Koornneef et al., 1998; Leung and Giraudat, 1998; McCourt, 1999; Rock, 2000; Fink elstein et al., 2002). Pathways of ABA synthesis and catabolism have been well documented (Hirai, 1986; Zeevaart, 1999), and analysis of mutants with impaired ABA expression or responsiveness has revealed phenotypes such as withering during humidity chang es, induced germination, and germination resistance under salt stress (reviewed in Leung and Giraudat, 1998; McCourt, 1999; Rock, 20 00; Finkelstein et al., 2002). Figure 1 1. Structure of abscisic acid.

PAGE 12

2 ABA signaling cascades lead to both early and lat e phase responses. During drought, for example, ABA rapidly induces stomatal closure to inhibit water loss (Mac Robbie, 1998; Assmann and Shimazaki, 1999) and stimulates the transcription of drought resistance genes (Ingram and Bartels, 1996; Bray, 1997). During seed development in maize, ABA responses are regulated by the gene VP1. Vp1 mutants express normal levels of ABA but are deficient in ABA perception which results in early seed germination ( Robichaud and Sussex, 1986; Neill et al., 1987). This p henotype can be corrected by exogenously applying 10 fold more ABA than required for maturation in wild type (Robichaud et al., 1980). Transposon tagging of VP1 showed that it was expressed exclusively in developing seeds (McCarty et al., 1989). Vp1 is a transcriptional activator, and over expression of Vp1 can trans activate other ABA inducible genes (McCarty et al., 1991; Kao et al., 1996; Hagenbeek et al., 2000). VP1 has four shared functional domains A1, B1, B2, and B3 (Giraudat et al., 1992). The l argest domain, B3, has been shown to bind specifically to the Sph cis element in (Suzuki et al., 1997). While the B3 domain has not been shown to bind to G box (Suzuki et al., 1997; Hill et al., 1996), there is evidence that various G box motifs may act a s coupling elements which possibly enhance VP1 binding to Sph elements of VP1 regulated genes (Hattori et al, 2002; Shen et al., 1996; Vasil et al., 1995; Hobo e t al., 1999b). The B3 domain is highly conserved in transcription factors in many plant specie s; the Arabidopsis proteins ABI3, LEC2 and FUS3 all contain B3 domain which could indicate redundancy in the signaling pathway ( Giraudat et al., 1992; Luerssen et al., 1998; Stone et al., 2001 ). B2 domains have been implicated in enhancing the binding aff inity of various transcription factors for their respective target promoters in vitro as well as trans

PAGE 13

3 activating an ABA inducible early methionine labeled LEA gene (Em) promoter (Hill et al., 1996). The A1 acidic domains of VP1 and ABI3 function as trans criptional activators of ABA regulated genes (McCarty et al., 1991; Rojas et al., 1999). Genetic screens for ABA insensitivity in Arabidopsis were conducted by germinating plants on plates containing ABA concentrations high enough to inhibit normal germina tion (Finkelstein, 1994), and five ABA insensitive mutants were discovered (abi1 abi5). ABI1 and ABI2 genes encode type 2c Ser/Thr protein phosphatases (PP2Cs) which function in ion channel regulation of genes induced by ABA, drought and cold (Gilmour and Thomashow, 1991; Pei et al., 1997; Chak et al., 2000). The mutants, abi1 and abi2, both encode Gly to Asp missense mutations that result in reduced phosphatase activity in vitro (Leung et al., 1997; Sheen, 1998) and lead to a wide range of effects such as stomatal movements, germination effects and changes in adapted growth (Rock and Quatrano, 1994; Leung and Giraudat, 1998). ABI3, ABI4 and ABI5 genes of Arabidopsis are found in an ABA signaling pathway and are implicated in seed development (Finkelstei n, 1994). ABI4 is a transcription factor belonging to the APETELA2 family (Finkelstein et al., 1998) and is expressed in seeds and vegetative tissue (reviewed by Rock, 2000). ABI5 is characterized as a basic leucine zipper (bZIP) and confirmed the presen ce of bZIP transcription factors in ABA signaling. The abi5 1 mutant protein lacks the dimerization and DNA binding domains required for function (Finkelstein and Lynch, 2000). ABA is one of many signals generated during plant stress responses to trigger transcription of stress response genes. One family of ABA regulated genes that become transcriptionally activated during cold or drought conditions are the COR genes. COR

PAGE 14

4 genes can be activated in an ABA dependant or independent manner and they function in freezing tolerance and desiccation tolerance by interacting with other cellular components (Ismail et al., 1999; Thomashow, 1999). Most COR genes contain the predominant cis element CCGAC, which has been named the C repeat/dehydration responsive elemen t (DRE). The functions of the DRE are to regulate cold regulated and dehydration regulated genes in an ABA independent manner (Liu et al., 1998; Thomashow, 1999). COR gene promoters are recognized by the APETELA2 like transcription activators cold bindin g factor (CBF) and dehydration responsive element binding factor (DREB) (Baker et al., 1994; Yamaguchi Shinozaki and Shinozaki, 1994; Stockinger et al., 1997; Gilmour et al., 1998; Liu et al., 1998). Overexpression of CBF and DREB lead to increased tolera nce to cold and desiccation. Microarrays The invention of oligo nucleotide arrays and cDNA microarrays (Fodor et al., 1993; Schena et al., 1995; Lockhart et al. 1996) has allowed scientists to measure the transcription of thousands of genes in relatively short amounts of time and has greatly increased our knowledge of gene expression. Microarray analysis allows the quantification of mRNA transcripts on a genome wide scale and provides insight on differential gene regulation in response to environmental or artificial stimuli ( Eisen et al., 1998) Microarray techniques can be broken down into stationary and mobile phases (Schena et al., 1995) The stationary phase consists of known strands of fluorescently labeled DNA, which have been fixed to a solid supp ort composed of silicon or glass. The mobile phase component of a microarray experiment contains a collection of labeled unknown DNA, which has been collected from tissue undergoing an environmental or

PAGE 15

5 artificial stress. When sequences of similarity come in contact with each other, the two complementary strands hybridize, and the relative fluorescent signals can be measured. Two common variants of microarray analysis are glass slide and gene chip arrays (review Schulze and Downward ( 2001) for a more info rmation on glass slide and chip microarrays). Both types of microarray use a similar strategy Glass slide microarray (Figure 1 2) is made up of known, PCR amplified oligomers that are spotted onto specific positions of a poly lysine coated glass slid e a nd attached via UV fixation. Spotting of DNA can be very precise, and thousands of cDNA spots can be fixed to a chip in duplicate or triplicate for better representation and control. The glass slide is then incubated with unknown, chemically labeled sampl es of cDNA that can be fluoresced and interpreted. One problem with glass slide microarrays is the laborious work of amplifying and purifying each clone to be spotted. Co regulation studies usually involve many hundreds of genes that potentially respond to a given treatment, and thus, preparing the glass slide array can be difficult. Further advancements in microarray technologies included the development of DNA microarray chips (Figure 1 3) This process is similar to the glass slide technique, but the oligomers are synthesized directly onto the chip. A solid support is prepared by attaching exposed 3 OH groups and nucleotides are added a layer at a time by photo lithography (Wodicka et al., 1997). This method is convenient because the chips can be m ade in mass quantities and include enough genes to represent an entire genome. Gene chips have become increasingly more popular and less expensive during the past few years.

PAGE 16

6 Figure 1 2. Schematic diagram of cDNA microarray. cDNA libraries are amplifie d using gene specific primers and the PCR products are fixed to glass slides. RNA from two different cell populations are used to reverse transcribe cDNA with Cy3 and Cy5 fluorescent probes attached. The Cy3 and Cy5 labeled cDNAs are then mixed with hyb ridization buffer and hybridized to a glass slide were competition between the populations would take place. High resolution confocal scanning of the glass slide is then used to detect different wavelengths of each dye.

PAGE 17

7 Figure 1 3. Schematic diagram of high density oligonucleotide microarray. To prepare the array chip, 20, 25 mer seuqnences are chosen from an mRNA reference library which corresponds to a unique transcript in the 5 untranslated region. Oligomers are then synthesized directly on the chip via photolithography. Double stranded cDNA is created from RNA samples from tissues or cell populations; the cDNA is then transcribed using biotin labeled nucleotides to make cRNA. The cRNA from each sample is hybridized to two different arrays and the relative intensities can then be measured.

PAGE 18

8 Both glass slide arrays and gene chips are usually processed through high luminescence scanners which are able to illuminate fluorescent probes that are attached to the DNA on the chip. Intensities can be me asured in terms of cy3 to cy5 ratios in the case of glass slides or overall intensity as compared to a normalized background signal in the case of gene chips. The scanned images are usually digitalized for storage and data processing. A number of softwar e programs are available for interpreting digitalized intensities and converting those intensities to numerical values for data analysis. Because microarray analysis is relatively fast and easy, a plethora of data can be generated on a whole genome scale i n a very short time. However, the scope of microarray analysis is also its largest downfall. If 30,000 genes are analyzed for differential transcript levels over several time points in response to several experimental treatments, the sheer number of data points can be overwhelming. Interpretation is virtually impossible by eye, and researchers have learned to rely on computer analysis. Numerous methods have been used to break gene expression patterns into classes of co regulated genes. Although no consens us has been reached for a correct way of interpreting microarray data, there are several preferred approaches which are dependent on experimental setup and design and are often validated by empiricism. Cluster analysis is often used to separate genes into different classes. Statistical clustering algorithms such as Hierarchical Clustering, K means, Self Organizing Maps (SOMs) and others have been applied to gene regulation studies (Altman and Raychaudhuri, 2001; De Smet et al., 2002; Eisen et al., 1998; H eyer et al., 1999; Tavazoie et al., 1999). An algorithm can be either supervised, which takes biological significance into account, or unsupervised, which functions independently of prior biological knowledge (Heyer et al., 1999).

PAGE 19

9 Simple fold change has also been commonly used as a method of classifying genes (Chen et al., 1997; Suzuki et al., in press) Microarray technology has been a critical tool in dissecting gene regulatory networks. Key early experiments analyzed changes in gene expression during various stages of the cell cycle in yeast (Spellman et at, 1998; DeRisi et al., 1997; Velculescu et al., 1997; Wodicka et al., 1997; Wolfsberg et al.,1999 ) and humans (Cho et al.,2001; Shedden and Cooper, 2002 ). Bulyk et al (2001) employed a microarray bas ed approach for testing the binding affinities of zinc finger transcription factors that were expressed on the surface of phages. By fixing all possible 3 base pair motifs onto a glass slide and treating each glass slide with a different variation of zinc finger motif, they were able to determine the optimal binding affinity for that particular transcription factor. Such an approach can also be applied to other families of transcription factors such as homeodomains, helix turn helix motifs, beta sheets, l eucine zippers and steroid receptors (Bulyk et al., 2001). Although other transcription factors would require analysis of motifs longer than 3 base pairs, such larger oligomers could be constructed and spotted onto glass slides, with the only limiting fac tor being the exponential numbers of sequences needed as the size of each motif increased (Bulyk et al., 2001). There are a few drawbacks to microarray technologies. The equipment for producing oligo microarray chips (Gene Chips) is expensive is mainly av ailable commercially (Jain, 2001). Commercial chips are usually made for organisms whose genomes have been sequenced; those individuals studying a lesser characterized organism may have to resort to glass slide arrays or northern blot analysis. Another p roblem is the availability of proper annotation. Mis annotated chips can cause wrongful

PAGE 20

10 interpretations of gene expression data do to inaccurate transcription start sites and improper naming. This problem is becoming less of an issue due to reannotation and proofing of databases, especially with fully sequenced organisms. Gene Regulation Networks Basic gene regulatory mechanisms combine to form complex regulatory networks capable of controlling all aspects of metabolism in a cell, tissue or organism. Multiple transcription factors work together in maintaining these gene regulatory networks. As exemplified by the ABA responsive genes, numerous instances of cross talk occur between signaling pathways to generate differential gene expression from a limit ed set of signaling components to allow for varied responses to cold, desiccation, salt stress, UV radiation, and other stimuli One way of deciphering the complex nature of gene regulation is to examine the role of downstream transcription factors. Tran scription factors act as biological switches to regulate gene expression by binding to short, specific DNA sequences called cis elements which are usually found in the promoter region immediately upstream of the transcription start site. Genes that are co regulated in response to developmental and environmental signals often share these conserved motifs in their promoters thus allowing one transcription factor to mediate expression of an entire set of genes. Likewise, genes often contain binding sites of v arious affinities for multiple transcription factors to permit differential regulation. Transcription factors can recruit other co activating or chromatin remodeling proteins through protein protein interaction domains, which leads to further regulation of gene expression. Although promoter structure usually involves multiple degenerate cis elements, understanding the mechanisms of transcriptional regulation requires a

PAGE 21

11 knowledge of the conserved motifs used to bind transcription factors. By locating these conserved motifs among co regulated genes, we can begin to dissect the primary steps in gene regulation. Promoter structure can be complex. Transcription factors generally recognize short motifs of semi conserved nucleotides which may be continuous or sep arated by non conserved sequence. Binding site length, spacing, copy number, and degree of conservation vary depending on the specific transcription factor. A particular transcription factor may have a preferred binding site, but may allow some degenerac y; this can usually be seen as differences in fold inductions of various genes. In studies of absc isic acid response elements (AB REs), ACGT containing cis elements serve as binding sites for as many as 80 predicted basic region leucine zipper (bZIP) trans cription factors in Arabidopsis (Riechmann et al., 2000; Jakoby et al., 2002). The G box conserved nucleotides ACGT are most often used as a core recognition sequence (Hobo et al., 1999a; Pla et al., 1993); however, other studies of ABRE s have found that GCGT or AAGT can be substituted in place of ACGT (Hobo et al., 1999a; Ezcurra et al., 2000; Hattori et al., 2002). Sequence differences on flanking portions of the ACGT core of G box directly affect the affinity of binding proteins. Differential gene reg ulation can be achieved by the combined effects of transcription factors binding to G box and interacting with other regulatory sequences (Donald and Cashmore, 1990; Weisshaar et al., 1991; Rogers and Rogers, 1992). This promiscuity of cis elements hinder s discovery of shared motifs of co expressed genes. Even though a transcription factor may recognize a particular cis element, other motifs with similar sequences might also be recognized due to the inherent promiscuity

PAGE 22

12 of transcription factor binding. H attori et al., (2002) performed a mutation analysis of one of the G box binding sites for leucine zipper transcription factors and demonstrated that different b ZIPs bind to the ACGT core of G box with different affinities. In the same study, the authors also synthesized several different flanking sequences to accompany the ACGT core and tested expression by coupling each promoter element to a reporter gene. They found that ACGTGGC and ACGTGTC were the preferred binding sequences but that other ACGT core containing sequences were partially functional (Hattori et al., 2002). Various genetic mechanisms can often complicate transcription factor binding to cis elements. The presence of a cis element does not necessarily indicate that the corresponding gene will be activated or repressed along with other genes containing similar cis elements. Some transcription factors can bind in tandem, such as bZIP transcription factors, and cause differential expression. Chromatin remodeling factors may need to be prese nt in order to acetylate, methylate or modify nucleosomal DNA before promoters are available for regulation (Ng and Bird, 1999, 2000; Grunstein, 1997; Struhl et al., 1998; Richards, 1997). Genes could possibly require enhancer sequences that could be loca ted a great distance from the transcriptional start site (review Blackwood and Kadonaga, 1998), or transcription could be blocked due to silencing elements which favor the formation of heterochromatin (review Blackwood and Kadonaga, 1998). Cis elements D i scovery Several problems inherent to the discovery of regulatory elements have thus far hindered computational analysis. Regulatory motifs are short so that false candidates are

PAGE 23

13 often encountered; gaps can occur in conserved sequences; certain degrees of degeneracy are tolerated, and the positions of the elements relative to the transcription start sites are not fixed. Despite the complexity associated with gene regulation, multiple techniques have been developed for cis element discovery. Computational c is element analysis using microarray and genomic sequence information has increased the speed at which novel binding site are found; however, for each putative binding site uncovered by statistical analysis, functional information must be acquired in order to verify that the motif does have regulatory function in biological systems. Programs for cis element discovery fall into two general categories, enumerative and iterative motif finding. Enumerative motif finding is based on word counting, pattern matc hing and methods of determining background from signal (Zhang, 1999; Wolfsberg et al., 1999; Jensen and Knudsen, 2000; van Helden et al., 1998, 2000; Sinha and Tompa, 2000; Tompa, 1999). Iterative motif finding is based on a statistical model of finding b est fit motifs by adding and subtracting commonly occurring sequences (Bailey and Elkan, 1995; Hughes et al., 2000; Liu et al., 2002; Roth et al., 1998; Workman and Stormo, 2000). Previously published motif finding programs, such as those utilizing Gibbs s ampling algorithms were first used for finding conserved protein sequences (Lawrence et al., 1993) but were later modified to find conserved DNA motifs. Gibbs sampling algorithms use randomly selected starting points to generate a weighted motif matrix wh ich is then modified by sequential comparison to the original data set. Motifs are added and subtracted from the original matrix until only highly conserved, best fit motifs remain (Lawrence et al., 1993; Thijs et al., 2002). Hughes et al. (2000) Used Al ignACE,

PAGE 24

14 a Gibbs sampling algorithm, to identify transcription factor binding sites among Saccharomyces cerevisiae A total of 248 genes where clustered into various classes, and AlignACE returned numerous known and unknown transcription factor binding sit es among the promoters in 25 classes of clustered genes. Liu et al (2001) developed another variation of the Gibbs sampling algorithm called BIOPROSPECTOR that also uses Markov background models derived from either the user or a specified background seque nce file. BIOSPECTOR was able to identify S. cerevisiae RAP1, Bacillus subtilis RNA polymerase, and Escherichia coli CRP binding motifs with varying accuracies depending on the background models used in each experiment (Liu et al., 2001). There are some p roblems inherent to Gibbs sampling algorithms. A background model for the organism is often needed in order to correct for unequal distributions of nucleotide frequencies which often occur in biological organisms. Another problem with Gibbs sampling algo rithms is that many repetitive or low complexity motifs are often returned as significant results. Since organisms contain long repetitive sequences, many putative motifs generated by the algorithm tend to be biological irrelevant. Other methods have bee n used to increase the accuracy of the reference set prior to invocation of the Gibbs sampling algorithm as well as slight modifications of the algorithm itself (Hughes et al., 2000; Thijs et al., 2001, 2002; Liu et al., 2001; Kel et al., 2001; Ohler and N iemann, 2001). Enumerative methods, such as the approach taken by van Helden (1998) and Wolfsberg et al. (1999) for application to the yeast genome, have also proven to be useful in locating putative cis elements. These methods involve searching for conse rved pentamers or hexamers and determining the probability that a given motif is in fact a

PAGE 25

15 putative binding site. Van Helden (1998) searched for commonly occurring motifs using motifs of six base pairs in length and successfully located a known binding si te among nitrogen response genes in S. cerevisiae In a later work by Wolfsberg et al. (1999), a similar algorithm was used for analysis of cell cycle related genes which also revealed a number of known and unknown motifs presumably involved with cell cyc le regulation. Enumerative pattern matching among sequences is advantageous because it uses a full representation of all possible motifs rather than relying on a random start set such as those used in iterative approaches. However, enumerative algorithms are limited by the motif size used for searching. As the motif length increases, the search dictionary increases exponentially which greatly slows the speed at which the algorithm can be conducted. Enumerative methods used by van Helden (1998) and Wolfs berg et al. (1999) also did not allow for degenerative motifs. Degeneracy is inherent to many cis elements, and this could compromise many current pattern matching models.

PAGE 26

16 CHAPTER 2 IDENTIFICATION OF CI S ELEMENTS Introduction Transcription factors act as biological switches to regulate gene expression by binding to short, imperfectly conserved DNA sequence elements which are often located in the promoter region upstream of t he transcription start site. Genes that are co regulated in response to developmental and environmental signals often share conserved motifs in their promoters, thus enabling one transcription factor to coordinate expression of a larger set of genes. Co mbinatorial interactions between multiple transcription factors and sequence variation that creates binding sites differing in binding affinities contribute to the complex patterns of differential gene expression observed in eukaryotic organisms (Miklos an d Rubin, 1996; Aronone and Davidson, 1997; Dhaelseleer et al., 2000; Mahalingam et al., 2003). Microarray technology has made possible the large scale analysis of gene expression levels under various environmental conditions of numerous genes simultaneou sly such that genes can be classified into groups of similar expression patterns (Schena et al., 1995, 1996; Strachan et al., 1997; Lashkari et al., 1997; DeRisi et al., 1997). Quantitative statistical analysis of patterns in regulatory sequences of co re gulated gene sets remains a challenging problem. In general, regulatory motifs shared by co regulated genes show wide variation in length, spacing, copy number, and degree of conservation vary depending on the specific transcription factor. Substantial de generacy is often allowed in specific binding sites. In some cases, such differences are correlated with differences in fold inductions of gene

PAGE 27

17 expression (Hattori et al., 2002). In studies of abscisic acid response elements (ABREs), containing an ACGT c ore (eg. G box), ABREs serve as a binding site for multiple basic region leucine zipper (bZIP) transcription factors (Hobo et al., 1999a; Pla et al., 1993); however, other studies of ABREs have found that GCGT or AAGT can be substituted in place of ACGT (H obo et al., 1999a; Ezcurra et al., 2000; Hattori et al., 2002). Some ABREs do not contain ACGT core motif are coined Coupleing Elements (Shen et al., 1996) Coupling elements with consensus sequence CGCGTG are similar to other ABREs except for an A to G substitution in the ACGT core; examples of coupling elements are CE3 of barley HVA1 gene, motif III of rice rab16B and a synthetic element, hex 3 (Shinozaki and Yamaguchi Shinozaki, 1996). Sequence differences on flanking portions of the ACGT core of G bo x directly affect the affinity of binding proteins (Izawa et al., 1993). Differential gene regulation can be achieved by combined effects from transcription factors binding to G box and other regulatory sequences (Donald and Cashmore, 1990; Weisshaar et a l., 1991; Rogers and Rogers, 1992). This promiscuity of cis elements requires the use of degenerate motifs while parsing promoters for over represented motifs among co expressed genes. There are some indications that different ABREs can be correlated in a tissue specific manner such as ABI5 and TRAB1 (Hobo et al., 1999b; Finkelstein and Lynch, 2000) which are expressed mainly in seeds. Other binding factors such as ABA responsive element binding proteins (AREBs) are expressed in roots as well as in vascul ar tissue (Uno et al., 2000; Kang et al., 2002). A number of ABI5 like bZIP factors containing putative Ser/Thr phosphorylation sites have been discovered that bind to various ABREs (Choi et al., 2000); additively, proteins such as 14 3 3 proteins have be en

PAGE 28

18 shown to interact with various bZIP proteins ( Schultz et al ., 1998 ) and may ultimately affect protein binding affinities to different cis elements. HvABI5 of barley shows a high similarity to a subfamily of bZIP proteins with closet homology TRAB1 of r ice (Kim and Thomas, 1998) and AREB2 (Uno et al.,2000); and ABI5 (Finkelstein and Lynch, 2000) in Arabidopsis. Casaretto and Ho (2003) found that HvABI5 binds to ABRCs in a sequence specific manner by electrophoretic mobility shift assays with recombinant HvABI5. It was also found that when using either a mutated CE3 element or a mutated ACGT box, that the affinity for ACGT boxes was greater than the affinity for coupling element 3 (CE3) additionally, using two copies of each ACGT box or CE3 produced that same results (Casaretto and Ho, 2003). VP1 is a transcription factor that mediates germination, developmental arrest and desiccation tolerance in maize by blocking ABA perception in seed tissues (Robichaud and Sussex, 1986). VP1 has 4 functional domains A1, B1, B2, and the largest B3 (Giraudat et al., 1992), which has been shown to bind specifically to Sph cis elements (Suzuki et al., 1997). While the B3 domain does not bind specifically to the G box (Suzuki et al., 1997), there is evidence that various G box motifs may act as coupling elements which possibly enhance VP1 binding to Sph elements of VP1 regulated genes (Hattori et al., 2002; Shen and Ho, 1996; Vasil et al., 1995). CRT/DRE Binding Factors (CBF) are a family of AP2 transcriptional activators that can bind to CRT/DRE elements causing gene up regulation in response to cold stress which contribute to an increased freezing tolerance (Baker et al., 1994; Yamaguchi Shinozaki and Shinozaki, 1994; Stockinger et al., 1997; Gilmour et al., 1998; Liu et al., 1998). Fowler and Thomashow (2002) have recently performed a global analysis of cold

PAGE 29

19 regulated and CBF regulated genes in Arabidopsis. These data defined several distinct classes of genes via Affymetrix microarray that respond to cold and serve as g ood candidates for cis element analysis. Computational algorithms such as Gibbs sampling have been developed to detect over represented cis elements among co regulated genes. Gibbs sampling algorithms use randomly selected starting points to generate a weighted motif matrix which is then modified by sequential comparison to the original data set. Motifs are added and subtracted from the original matrix until only highly conserved, best fit motifs remain (Lawrence et al., 1993; Thijs et al., 2002; Ohler and Niemann, 2001). To eliminate noise, Gibbs sampling methods require a statistical model for the species and class of sequence analyzed. Since Gibbs sampling algorithms rely on random starting points that are later refined, the sampling of possible mot ifs of a given level of complexity is incomplete. A second limitation is the filtering of repetitive sequences and low complexity sequences in DNA. Other methods have been used to increase the accuracy of the reference set prior to invocation of the Gibb s sampling algorithm as well as slight modifications of the algorithm itself (Hughes et al., 2000; Thijs et al., 2001, 2002; Lui et al., 2001; Kel et al., 2001; Ohler and Niemann, 2001). Here we describe an alternative approach based on an enumerative meth odology. Enumerative motif finding is based on word counting, pattern matching coupled with various methods of determining background from signal (Zhang, 1999; Ohler and Niemann, 2001; Wolfsberg et al., 1999). MotifFinder operates by generating a complet e set of motifs of a specified length and degeneracy and then comparing the frequency of each motif in a set of putative co regulated gene promoters to the frequency of that same

PAGE 30

20 motif in a random set of unrelated genes. MotifFinder eliminates the relianc e on random starting points by generating a non redundant dictionary representing all possible motifs. We have implemented an enumerative algorithm in the MotifFinder program and applied it to analysis of cis elements in sets of ABA and cold regulated ge nes, responses mediated by VP1 and CBF transcription factors respectively. Materials and Methods MotifFinder algorithm MotifFinder is based on the approach described by Wolfsberg et al. (1999) for application to the yeast genome. The MotifFinder progra m first generates a motif dictionary consisting of a complete, non redundant set of motifs of a specified length with a specified number of degenerate bases. Reverse complements and trivial redundancy where motifs differ only in having one or more degener ate bases at opposite ends are excluded. For example: NNGGAAGC, NGGAAGCN, GGAAGCNN and the reverse complements: GCTTCCNN, NGCTTCCN, NNGCTTCC are considered equivalent. MotifFinder builds a control set of promoters by extracting 1000 random published seque nces of the 7402 Arabidopsis genes present on Affymetrix gene chip. The 1000 randomly selected promoters are used to determine the expected frequency of each motif in Arabidopsis promoters and therefore common DNA sequences such as repetitive DNA or motif s for general transcription factors are not mistaken as candidate cis elements by MotifFinder. The motif dictionary is then compared to the 1000 control promoters to determine the average frequency of the motif in unrelated genes. The dictionary file and

PAGE 31

21 background frequencies are saved and can be used repeatedly for multiple promoter comparisons. To extract putative cis elements from genes grouped based on microarray expression data, MotifFinder scores each motif in the dictionary against the promoters o f potentially co regulated genes. Statistical significance is determined by a chi square test performed for each motif using the expected frequency from the 1000 random promoters versus the observed frequency from the test promoters. The resulting p valu es are screened by an adjustable p value cutoff such that only motifs of specified significance are returned as a tab delimited text file that can be inserted into a spreadsheet for easy viewing and sorting purposes. MotifFinder implementation In this pap er, we use a dictionary of 43168 motifs containing 6 conserved and 2 degenerate bases, but other combinations are possible. We have thus far used MotifFinder for analysis of A. thaliana gene regulation, but genes from any organism with sufficiently descri bed genomic sequence could be evaluated. We focused on a set of 353 strongly expressed genes that were induced or repressed greater than 3 fold in an effort to identify patterns of regulatory sequences that differentiate these response classes. MotifMapp er algorithm MotifMapper is a visualization program that maps lists of motifs onto promoters. Each significant motif found by MotifFinder is aligned with each promoter analyzed. The bases in each promoter are assigned a numerical score according to the number of significant motifs that co align. Motifs that are highly significant among groups of related genes congregate which increases a particular bases score. MotifMapper

PAGE 32

22 produces a multi color representation of the promoter set which displays the lo cations of mapped motifs relative to each other and to the transcription start sites along with a composite score for each promoter which is derived by adding the number of significant motifs present. A matrix file is created containing base significance information that can easily be inserted into a spreadsheet for promoter graphing purposes (output not shown). MotifMapper also generates a separate file ranking putative cis elements by a numerical score based on the number of hits per base as well as the upstream location of each putative cis element. MotifMapper collates data derived from MotifFinder to permit easy identification of both known and unknown DNA binding motifs in a promoter. Promoter database A database containing upstream regulatory seq uences ( 600 nt of the annotated coding sequence) of 7402 annotated Arabidopsis genes represented on the Affymetrix 8k Gene Chip was constructed by parsing the XML format chromosome assemblies ( http://www.tigr.org ). The promoter sequences were extracted from the XML files using a Perl script modified from sample code developed at TIGR. Testing for false positive motifs To test for motifs that are significant due to random chance, a set of 75 random promoters were compa red to 1000 randomly selected promoters and chi square values were calculated for each motif. An average of 5 motifs had a significant p value of at least 10 5 when 5 replications are considered.

PAGE 33

23 PERL version and modules PERL v5.6.1 built for MSWin32 x86 multi thread Copyright 1987 2001, Larry Wall Binary build 632 provided by ActiveState Corp. http://www.ActiveState.com P values were calculated using Math::CDF module that can be found on the CPAN website ( http://www.cp an.org ). Results Analysis of VP1/ABA regulated genes Our analysis of ABA regulated gene expression (Suzuki et al., in press) revealed a large set of genes that are up and down regulated within 12 hours of an ABA treatment. The ABA response is strongly altered in plants carrying 35S VP1 transgene resulting in multiple classes of VP1 and ABA regulated genes. In order to identify patterns of regulatory sequences that differentiate these response classes we developed methods for rigorous statistical analys is of cis regulatory sequences in co regulated genes. The key analysis was implemented in the MotifFinder program. We focused on a set of 353 strongly expressed genes that were induced or repressed greater than 3 fold in an effort to identify patterns of regulatory sequences that differentiate these response classes. We first constructed a database containing 600 nucleotides of five prime flanking DNA sequence from each of the 7402 genes represented on the Affymetrix Gene Chip. Promoters of the 353 ABA/ VP1 regulated genes were extracted form this data set. MotifFinder was used to analyze regulatory motifs that distinguish various VP1 and ABA response classes defined by Suzuki et al. (in press). We have previously shown that

PAGE 34

24 promoters in the VP1 and AB A dependent response class have significantly higher frequencies of G box and Sph elements compared to promoters in the ABA dependent subclass (Suzuki et al, in press). We have extended this analysis to other response classes ( Figure 2 1) including an ana lysis of motifs which distinguish acti vated and repressed promoters. Figure 2 1 Classification of genes analyzed by MotifFinder. (A) Hierarchical classification from Suzuki et al. showing subclasses of VP1/ABA regulated gen es. Underlined classes of genes were analyzed for known and novel cis elements. (B) Hierarchical diagram of Cold regulated gene classes defined by Fowler and Thomashow. We extracted numerous commonly occurring motifs visually with the aid of MotifMapper using motifs with a significance cut off of 0.00001 and 0.001 in activated VP1/ABA co regulated ABA regulated VP1 AND ABA dependent VP1 OR ABA dependent VP1/ABA regulated genes VP1 dependent ABA dependent activate d repress e d activate d represse d activate d represse d activate d represse d 35 3 256 89 44 45 82 114 32 70 44 62 18 26 2 28 Down regulated CBF over expressed Cold regulated genes Up regulated long term Up regulated transiently 29 3 150 86 39 57 A B VP1 activated, ABA activated

PAGE 35

25 and repressed classes respectively. Motifs related to known ABRE elements were the most highly enriched in the activated gene classes ( Figure 2 2 ). F igure 2 2. The motifs with highest significance have similarity to G box related ABA responsive elements (5 G[A/C]CACGTG 3). The top 30 motifs are listed. Columns in order: motif, number of promoters containing that particular motif in test set, numbe r of promoters hits in control set, p value of Chi square. Because G box has the highest significance, other motifs will be present in slightly higher p value ranges. In addition to known ABREs MotifFinder analysis revealed several other significant mot ifs in found among activated and repressed classes ( Table 2 1). Activated classes all contained a significant (p value < 0.00001) ACGT core containing sequence along with other distinguishing motifs in each class. In contrast, motifs found among represse d genes tended to have lower statistical significance (p value < 0.001). For each motif the percentage of promoters were calculated to give the relative occurrence of each over represented sequence; the motif length would directly affect the occurrence of each nacacgtn 151 173 0 gacacntn 110 152 6.88E 15 nacgtgnc 109 132 0 nacgngtc 84 106 1.35E 14 cncgtgtn 96 113 0 tnacncgt 97 130 2.26E 14 nacncgtg 95 111 0 annacacg 111 156 2.30E 14 nanacgtg 134 176 0 nncgtgtc 90 118 3.51E 14 gccacgnn 72 69 0 acacgnnt 106 148 5.04E 14 nncacgtg 91 105 0 gncacgng 76 94 5.73E 14 ncgncacg 52 50 2.22E 16 tngncacg 76 94 5.73E 14 nccangtg 91 113 2.22E 16 cnacncgt 68 81 9.57E 14 ngccangt 83 100 4.44E 16 cncgtgnt 68 82 2.20E 13 gnnacgtg 105 140 1.22E 15 nacangtg 136 210 3.10E 13 gncacntg 99 130 2.22E 15 acntgnca 110 159 3.20E 13 gccnngtg 54 55 2.44E 15 cacntgnc 107 154 4.74E 13 gacangtn 119 168 4.22E 15 anncacgt 115 170 6.24E 13 acgtgnng 79 96 4.77E 15 cnnacacg 79 103 8.47E 13 Enriched oligo mers with similarity to G box G[A/C]CAC GTG

PAGE 36

26 Table 2 1. Summary of known and putative regulatory elements among VP1/ABA regulated genes. Describes the motif, the percentage of promoters containing each motif and the p value cutoff of each class analyzed. In the ABA regulated class, the p va lue of 4.4E 05 was used due to large amounts of motifs above the 1E 05 cutoff which tend to clutter the MotifMapper output 1.0E 05 50.0% TCGGC 1.0E 05 86.4% ACGT core VP1 act/ABA VP1 or ABA dependent 1.0E 03 20.5% TGCAT 1.0E 03 31.8% ACCCCAAA repressed 1.0E 05 32.0% Sph 1.0E 05 81.4% ACGT core activated VP1 and ABA dependent 1.0E 03 27.8% GCGCG 1.0E 03 55.6% CCTCAT 1.0E 03 33.3% TGCGAG repressed 1.0E 05 79.0% ACGT core activated ABA dependent 1.0E 05 78.6% ACGT core activated VP1 dependent 1.0E 03 82.2% ACTTG 1.0E 03 35.6% GCCTA 1.0E 03 33.3% CATTGA 1.0E 03 82.2% GCATT 1.0E 03 11.1% CATGATGG repressed 4.4E 05 38.6% TGGCANNA 1.0E 05 86.4% ACGT core a ctivated ABA regulated p value cutoff % of promote Motif Enriched motifs in VP1/ABA regulated genes

PAGE 37

27 sequence therefore longer sequences may have lower percentages but there is also a decrease chance i n which the motif occurs by chance. Analysis of f lanking s equence m otifs Many of the most significant motifs enriched in ABA/VP1 activated genes were related to consensus G Box related ABREs. Because sequence differences flanking the ACGT core of the ABR E affect the affinity of binding proteins, we analyzed the frequency of each variant consisting of a ACGT core plus three flanking bases among the enriched sequences. Of the 64 7 bp ACGT elements of form XXXACGT, six motifs accounted for the majority of e nriched elements were selected (each accounting for greater than 10%). Comparisons between the five response classes revealed two ABRE motifs that may be correlated with differential regulation of distinctive classes. Since there are many types of bZIP t ranscription factors that bind to various ABREs, having asymmetric distributions of different types of ABREs may facilitate binding of related but not identical transcription factors which may affect protein protein interactions as well as gene direct gene induction. The TAC variant (5 TACACGT 3) is strongly enriched in the activated ABA regulated as well as the activated VP1 and ABA dependent response classes, whereas, the AAA variant ( 5 AAAACGT 3) is solely enriched in the Vp1 or ABA activated res ponse class, specifically the Vp1 activated/ ABA activated subclass ( Figure 2 3 ). The four other variants, GCCACGT, GACACGT, ACCACGT and AACACGT, are uniformly distributed among the five activated classes and fit an established consensus (G/A)(C/A)CACGT ( Figure 2 3) for the ABRE (Hattori et al., 2002).

PAGE 38

28 Figure 2 3. Comparison of ACGT flanking sequences. All possible combination of three base pairs were generated and appended to ACGT. The number of promoters containing each variant motif was calculated and an overall percentage was calculated based on total hits of all motifs. Each bar represents only those motifs that were at least 10% of the total the percentage in one or more of the classes. (A,C) as compared to (B,D,E) contain a higher percent of TACACGT (first column) (A E) columns 2 5 have similar percentages of motifs. (E) as compared to (A D) contains AAAACGT as its most frequent occurring ACGT containing motif, whereas (A D) doesnt contain any.

PAGE 39

29 Motifs associated with the ABRE Various coupling elements have been implicated in combinatorial interactions with the consensus ABRE eg Casaretto and Ho (2003). In order to identify conserved elements in ABA/VP1 regulated promoters that may distinguish response classes through combinatorial interactions with the ABRE, we analyzed sequences enriched in the near vicinity of ACGT containing motifs. To do this, we extracted 20 nucleotides located upstream and downstream ( Figure 4 a) of the two most frequent ACGT variants for each promoter class. MotifFinder was used to identify motif that were enriched in 20 base ABRE flanking segments relative to a set of randomly selected 20 nucleotide segments. Consensus motifs derived from the enriched 8 mer motifs were visually extracte d with the aid of MotifMapper and are summarized in ( Figure 4b ). One or more putative coupling motifs were identified for each of the VP1 activated response classes. In an interesting contrast, no significant coupling motifs were found in a set of 44 ABA activated genes that are not regulated by VP1. The Sph element and a C box like motif of 5 CAATT 3, which could possible serve as a LEC1 binding site, showed a significant association with ABREs in the VP1 and ABA activated class. Among the ABA depen dent class, a 5 CTTT 3 repeat was common however, there seemed to be high amounts of clustered motifs which contained mostly pyrimidine rich motifs and didnt correspond to exact 5 CTTT 3 repeats. Motif LOGOs To refine the definition of consensus ABRE flanking motifs, we used Weblogo ( http://www.bio.cam.ac.uk/cgi bin/seqlogo/logo.cgi ), an implementation of the sequence logo algorithm developed by Schneider and Stephens (1990). The most c onserved

PAGE 40

30 Figure 2 4. Summary of regulatory elements that are associated with ACGT core elements among VP1/ABA regulated genes. (A) Associations to ACGT core were calculated by comparing our Motif dictionary of 43,168 motifs with 20bp flanking ACGT core vs. 20 bases selected at random. A Chi Square value is calculated and the p value is returned for each motif. (B) Sequences were located by eye using the color output of MotifMapper. The table describes the motif, the perc entage of promoters containing each motif and the p value cutoff of each class analyzed. In the ABA regulated class, the lowest p value observed was 0.32 which is insignificant. 1.0E 03 50.0% TCTTCTT 1.0E 03 22.7% ACGT core VP1 act/ABA act VP1 "or" ABA dependent 1.0E 03 70.0% Sph 1. 0E 03 81.4% CAATT 1.0E 03 15.7% ACGT core activated VP1 "and" ABA dependent 1.0E 03 40.3% ACTACNNC 1.0E 03 25.8% CTTT repeat 1.0E 03 15.4% ACGT core activated ABA dependent 1.0E 03 65.4% TTTAANNC 1.0E 0 3 10.2% ACGT core activated VP1 dependent 0.32 No Associations activated ABA regulated p value cutoff % of promoters Motif VP1 /ABA regulated genes and associations to enriched motifs flanking ABA responsive elements xxxACG 20b p 20b p A B

PAGE 41

31 consensus motifs found by MotifMapper (p value = 0.001) were used as the se arch criteria to extract ten bases flanking each motif. Extracted sequences were then converted to logos to find best fit consensus sequences. Figure 2 5 illustrates consensus sequences for motifs associ ated with ACGT core sequences. In the activated VP1 dependent class, a four base motif, 5 TTAA 3, was found to have other correlated bases in positions 10,15 and 17 ( Figure 2 5a). Among activated ABA dependent genes a significant motif of ACTAC was used as search criteria in which another significant bas e in position 17 was identified ( Figure 2 5b). The activated subclass of VP1 and ABA dependent regulated class revealed a best fit motif of CAATT ( Figure 2 5c) and the VP1 activated/ABA activated subclass of VP1 or ABA dependent genes identified a con sensus motif of TCTTCT ( Figure 2 5d). The CTTT repeat motif found in the activated subclass of VP1 and ABA dependent genes was used in constructing a motif logo due to the lack of a definite consensus sequence. Analysis of cold regulated genes Analysis of VP1 regulated genes demonstrated the ability of MotifFinder to identify novel as well as previous known regulatory motifs. As an independent test of the MotifFinder approach, we analyzed a set of CBF regulated genes classified from microarray expressi on data reported by Fowler and Thomashow (2002). We analyzed upstream 600 base flanking regions of 293 genes representing three response classes identified by Fowler and Thomashow (2002): up regulated long term, up regulated transiently and down regulated by Cold. As expected, MotifFinder found G box and CBF binding motifs previously identified in CBF regulated genes (Fowler and Thomashow, 2002) confirming the validity and efficacy of the software.

PAGE 42

32 Figure 2 5. Sequence logos of consensus sequences were made to detect significant bases outside identical consensus motifs. Each underlined sequence was searched against significant bases of MotifFinder output and flanking sequences were returned for logo construction. (A) Sequen ce logos of VP1 dependent activated class matching the consensus sequence TTTAANNC. (B) ABA dependent activated class matching the consensus sequence ACTACNT. (C) VP1 and ABA dependent class matching the consensus sequence CAATT. (D) VP1 "or" ABA depe ndent matching the consensus sequence TCTTCT.

PAGE 43

33 In addition to the known G box and CBF binding sequences, MotifFinder identified several other significantly enriched putative cis elements among the different subclasses of CBF regulated genes ( Table 2 2 ). Ta ble 2 2. Summary of known and putative regulatory elements among cold regulated genes. Describes the motif, the percentage of promoters containing each motif and the p value cutoff of each class described by Fowler and Thomashow. Among 57 genes that were up regulated long term by cold, the CBF binding motif 5 CCGAC 3 and G box core 5 ACGT 3 were highly significant. Another highly significant motif of 5 GATAT 3 was found in 53 promoters and 40 promoters as single and tandem motifs respectively. ACGT motifs were also enriched in 150 genes that were up regulated transiently by cold. Among 86 genes that were repressed in response to 1.0E 05 38.5% TAGCTA 1.0E 05 64.1% CCGAC over expressers 1.0E 05 59.3% GTGTG down regulated 1.0E 05 40.0% ACNNGT up regulated transiently 1.0E 05 91.2% GATAT 1.0E 05 86.0% ACGT core 1. 0E 05 40.4% CCGAC up regulated long term p value cutoff % of promoters Motif Cold induced genes

PAGE 44

34 cold, 5 GT 3 repeats were highly significant indicating a possible cold regulated repr essor binding site. Fowler and Thomashow (2002) also obtained microarray expression data for CBF1, CBF2 and CBF3 over expressing lines and categorized constitutively expressed genes as part of the CBF regulon. The CBF regulon may be further divided into s ub regulons controlled by RAP2.1 and RAP2.6. Of the 39 CBF activated genes, 14 did not contain the known CBF binding motif 5 CCGAC 3. MotifFinder analysis revealed a enriched palindromic sequence, 5 TAGCTA 3, that is present in 9 of the 14 genes indi cating a potential cis acting determinant of this sub regulon of CBF genes. Distribution of motifs in VP1/ABA and cold regulated genes Graphing the distribution of the number of significant motifs versus their upstream locations indicates that putative ci s elements of VP1 regulated genes are generally located 100 to 300 base pairs upstream of the annotated translation start site ( Figure 2 6). By comparison, 75 randomly selected motifs mapped onto the VP1 regulated gene promoters are evenly distributed. T herefore, the uneven distribution of significant motifs is most likely due to a preferential location of cis elements near the transcription start site. Comparison of the distributions of enriched motifs in activated and repressed sub classes of the VP1 a nd ABA dependent genes indicated that the highly asymmetric distribution of enriched elements is characteristic of activated promoters (Figure 2 7 a ) whereas motifs enriched in repressed genes are uniformly distributed (Figure 2 7b). Similar patterns were observed in comparisons of ABA regulated (44 activated and 45 repressed) and ABA dependent response classes (64 activated and 18 repressed, data not

PAGE 45

35 shown). Other comparisons between activated and repressed genes were not done due to inadequate numbers o f promoters in either or both classes. In cold regulated promoters, a similar though less dramatic concentration of enriched motifs in a region 100 to 300 upstream of the annotated coding sequence was detected. As in the case of ABA/VP1 regulated genes, t he clustering of enriched motifs among Cold regulated genes was evident in activated promoters (Figure 2 8 a and 3 8 b), but not in down regulated promoters ( Figure 2 8c ). The strongest asymmetry was evident in genes activated in CBF over expressing lines ( Figure 2 8d). Distribution of the top 75 motifs enriched motifs 0 5 10 15 20 25 30 35 40 45 0 200 400 600 Distance from ATG Average over 20 bases VP1/ABA regulated Random Figure 2 6. Distribution of the top 75 statistically significant motifs among MotifFinder analysis of 353 VP1/ABA regulated genes. Black line indicates the distribution of the top 75 significant motifs from all VP1/ABA regulated genes; note the non linear distribution of co regulated genes compared to 75 random motifs. Gray line shows the distribution of 75 randomly generated motifs of identical length and degeneracy to those of the VP1/ABA regulated genes. Motifs were counted by deter mining the position of the proximal side of each motif. Number of hits were summed across all promoters and an averaged over 20 bases was calculated in order to smooth the distribution lines.

PAGE 46

36 Figure 2 7. Activated and Rep ressed subclasses show strikingly different distributions of cis elements in relation to the ATG start site. (A) The asymmetrical distribution of the top 75 significant motifs of the activated ABA dependent class and 75 randomly generated motifs which a re symmetrically distributed. (B) The symmetrical distribution of the top 75 significant motifs of the repressed ABA dependent class has a similar distribution to 75 randomly generated motifs. ABA regulated, VP1 dependent, and ABA dependent classes have a very similar distribution pattern but are not shown. Top 75 VP1 and ABA dependent activated 0 2 4 6 8 10 12 14 16 0 200 400 600 Distance from ATG Average over 20 bases VP1 and ABA activated 75 random motifs Top 75 Vp1 and ABA dependent repressed 0 1 2 3 4 5 0 200 400 600 Distance from ATG Average over 20 bases VP1 and ABA repressed 75 random motifs A B

PAGE 47

37 AlignACE We compared output of our motif finding program to AlignACE, a widely used implementation of the Gibbs sampling algorithm (Roth et al., 1998; Hughes et al., 2000; McGuire et al., 2000; McGuir e and Church, 2000; Schachter and Kohane, 2002; Masuda and Church, 2003). We searched for shared motifs in the VP1 regulated gene promoters using AlignACE and found ACGT containing motifs as well as SPH element but lacked other putative cis element found by MotifFinder ( data not shown ). In addition, the AlignACE output included various motifs of low complexity sequence such as single or double nucleotide repeats, which are abundant throughout the genome but which are not highly specifically enriched in a particular co regulated gene set (Hughes et al., 2000). When looking at the AlignACE output of the four subclasses of cold regulated genes, we found a few occurrences of the putative motifs found by MotifFinder/MotifMapper (data not shown); however, we wer e not able to build a consensus sequence using AlignACE itself due to the background of low complexity sequences. Because MotifFinder filters repetitive bases or tandem repeats through a comparison to a set of randomly selected promoter sequences, the sig nificant motifs found using MotifFinder are more likely to represent actual cis elements. Discussion Several problems inherent to the discovery of regulatory elements have thus far hindered computational analysis. Regulatory motifs are short so that fal se candidates are often encountered; gaps can occur in conserved sequences; varying degrees of degeneracy are tolerated, and the positions of the elements relative to the transcription start sites are not fixed. Despite the complexity associated with gene regulation, we have successfully developed software to isolate putative transcription factor binding sites.

PAGE 48

38 Figure 2 8. Distributions of significant and random motifs of the four classes of cold regulated genes defined by Fow ler and Thomashow. (A) Top 75 significant motifs in promoters of genes up regulated long term by cold along with 75 randomly generated motifs. (B) Top 75 significant motifs in promoters of genes up regulated transiently by cold along with 75 randomly gen erated motifs. (C) Top 75 significant motifs in promoters of genes down regulated by cold along with 75 randomly generated motifs. (D) Top 75 significant motifs in promoters that are regulated by CDF1, CDF2 and CDF3 over expressing lines along with 75 ra ndomly generated motifs. Up-regulated transiently by cold 0 2 4 6 8 10 0 200 400 600 Distance from ATG Average hits over 20 bases Up-regulated transiently 75 random motifs U Up-regulated long-term by cold 0 1 2 3 4 5 0 200 400 600 Distance from ATG Average hits over 20 bases Up-regulated longterm 75 random motifs Down-regulated by cold 0 1 2 3 4 5 0 200 400 600 Distance from ATG Average hits over 20 bases Down-regulated 75 random motifs CBF over-expressing genes 0 1 2 3 4 5 0 200 400 600 Distance from ATG Average hits over 20 bases CBF ox 75 random motifs B C D A

PAGE 49

39 MotifFinder, a program written in the highly portable Perl language, compares the frequency of a given motif in a control set of random promoters to the frequency of the same motif in a set of promoters grouped by expression data o r by other association. The algorithm may be extended to a variety of motif definitions including incorporation of gaps and higher degeneracy. Although the motif dictionary used in the analysis can be constructed having any combination of motif length an d complexity, empirically we find that using 8 mer motifs with 6 conserved and 2 degenerate bases provide a reasonable balance of sensitivity and simplicity for detection conserved cis elements. Searching with shorter or more degenerate motifs yielded inc reased the background noise level hindering discrimination of blocks of enriched sequence while searching for longer motifs caused some over represented sequences to fall out. Inclusion of 2 degenerate nucleotides allows for substantial variation among co nserved elements because a given cis element will match multiple overlapping 8 mers. For this reason, strongly conserved elements are revealed as clusters of significant 8 mer motifs. Hence, putative transcription factor binding sites of all sizes can be uncovered using the software. The combined application of MotifFinder and MotifMapper allows for rapid prediction of putative transcription factor binding sites and offers various methods for visualization of promoter structure. MotifFinder accurately d etected known binding sites such as G box and SPH elements in VP1 regulated genes as well as several other putative motifs and successfully identified ABA responsive elements and CBF binding motifs in a set of CBF regulated genes. In addition to previousl y characterized binding sites, two other putative binding sites were found among the CBF genes, one of which was already

PAGE 50

40 speculated to exist based on a proposed model by Fowler and Thomashow (2002) for differential regulation by RAP2.1 and RAP2.6. The loca tion distribution of significant motifs found in the CBF and VP1 gene sets indicate that distance from the transcription start site can be a useful parameter for prediction of co regulated sets (Kel et al., 2001). Binding proteins usually recognize a cons erved sequence in gene promoters; however, some bases within a conserved sequence may tolerate a certain degree of degeneracy. By searching with a degenerate motif dictionary, we eliminate the requirement for a perfectly conserved binding motif and allow for the possible inclusion of insignificant bases within a cis element. Gene regulation by cis elements can be dependent on position, orientation and location relative to other cis elements, all of which can be easily interpreted using the color displays produced by MotifMapper. For the analysis of VP1 and CBF genes, we employed a dictionary of 8 mer motifs containing 6 conserved and 2 degenerate bases to search for putative cis elements. Several other dictionaries were tested, but the 8 mer motif diction ary yielded the optimal stringency for analysis of cis elements within VP1 and CBF regulated gene promoters. Because cis elements can vary in size depending on the binding protein involved, analysis of other promoter sets using MotifFinder could be perfor med using alternate motif dictionaries depending on the degree of stringency desired. We recommend that future applications of the MotifFinder software include analysis using multiple motif dictionaries for optimization of stringency. MotifFinder is desig ned to located motifs that are overrepresented in a set of genes, yet a critical motif present in a smaller subset will not receive a high significance score

PAGE 51

41 among a larger gene set. For example, we easily isolated G box as overrepresented among 353 VP1 r egulated genes but the more diverse and less common Sph binding elements were not included in the highest scoring motifs. To identify Sph elements we analyzed smaller sub classes of VP1 regulated genes. We therefore caution that MotifFinder analysis shou ld be conducted on both large groups of related genes and potentially co regulated subsets for optimal probability of isolating less prevalent cis elements. In performance and sensitivity, MotifFinder compared favorably with AlignACE, returning fewer low c omplexity motifs while finding known ABA and cold response elements plus several previously unidentified motifs that may be biologically relevant. Gibbs sampling methods require a statistical model for the species or classes analyzed. MotifFinder elimina tes the reliance on random starting points by generating a non redundant dictionary representing all possible motifs. We have shown that MotifFinder can successfully find shared motifs among co regulated genes. While we have applied this analysis to upstr eam flanking sequences, the approach can be applied to comparisons of arbitrary DNA sequences where a comparable control set of random sequences can be constructed. The method is highly sensitive allowing detection of significant motifs in a set containin g a few co regulated promoters. MotifFinder and MotifMapper can be used as tools to identify putative transcription factor binding sites that are over represented in a given gene set, but further biochemical characterization should be performed to assess the role of putative cis elements i n regulation of gene expression.

PAGE 52

42 APPENDIX A SOFTWARE DOCUMENTATI ON Files/Descriptions: MotifFinderp1v4 Generates A motif dictionary composed of a non redundant set of sequences of a given length including a given number of degenerate bases. Gives a score for each motif pertaining to the number of promoters that contain that motif. A given number of control promoters are selected from a promoter database at random. MotifFinderp2v6 Compares a scored motif dictionary against a set of test promoters defined by the user and returns the p valu e for the Chi square Distribution. MotifMapperp1v3 Maps a list of motifs onto a set of DNA sequences in various colors depending on the number of hits on each base. Two lists of motifs can be projected onto DNA sequences as well, where three colors are us ed to indicate uniqueness and over lapped motifs. Config.ini Defines inputs, outputs and options for all three programs. MotifFinderp1v4: Arguments and Options: Requires an input file containing a large database of promoter sequences that that are uniq ue to the organism of interest. This file will be used for selecting a given number of promoters to be used as a control set and determine background sequence information. Requires an output file name for saving a motif dictionary file containing all pos sible motifs and scores. Options for total motif length, number of degenerate bases and number of randomly selected promoters. Output File Example:

PAGE 53

43 Using: motiflength=8, fourway=2 and twoway=0, generates the format shown below. The top definition line is a summary of options. 8 2 0 1000 43168 naccagng 87 naccagnc 88 naccagna 136 nnaccagg 46 nnaccagc 70 nnaccaga 214 nnaccact 189 Note: See sample config.ini file for configuration examples MotifFinderp2v6: Arguments and Options: Requires a file con taining a motif dictionary created in MotifFinderp1 Requires a file containing a list of promoters that you wish to find over represented motifs. Requires an output file name for saving motifs and p values. Requires an output file name a sub dictionary file which is comprised of the number of observed hits in the test set. This file can be used as a motif dictionary to analyze further subsets of the test set. Option for an adjustable p value cutoff. Only motifs below the cutoff will be stored in the o utput file. Output File Example: Motifs: Hits in test set: Hits in control set: Chi Square (p): cngncacg 11 51 2.75E 08 cgtgntna 16 100 1.87E 07 ccnacgng 10 51 9.07E 07 nnggacgt 8 37 2.06E 06 nacncgtg 16 111 2.35E 06 cncgtgtn 16 113 3.51E 06

PAGE 54

44 No te: See sample config.ini file for configuration examples MotifMapperp1v3: Arguments and Options: Requires either one or two lists of motifs to map onto sequences. Requires a file containing a list of promoters that you would like to map. Requires an output file name s for saving mapped promoters a file containing a summary of possible cis elements called a Strengthfile and a matrixfile containing base strength information for each promoter. Options for capitalization of significant bases, font size, and applying a score based on summing each bases value within the promoter. Output File Example: Mapping One Class: 20412_at||#F13M23.280#At4g25140 oleosin, 18.5K;|COORDS: 11864996 11865757 ttggctctggagaaagagagtgcggctttagagagagaattgagaggtt tagagagagatgcggcggcgat gagcggaggagagacgacgaggacctgcattatcaaagca G tg ACGTG g TG aaatttggaacttttaagag gcagatagatttattatttgtatccattttcttcattgttctaga ATG t C g CG gaacaaattttaaaa CT a a ATCC taaatttttctaattttgttgccaatagtggatatgtgggccgtatagaaggaatctattgaaggc ccaaacccatactgacgagcc caaaggttcgttttgcgttttatgtttcggttcgatgccaacgccacatt ctgagctaggcaaaaaac A a ACGTGTC T ttgaatagactcctctcgtta A CAC a TG C A gcggctgcatggt gacgccatt A ACACGTGGC CTACA at TGC atgatgtctcca T TGACACGTG a C T t C t CGT c TC ctttctta atatatctaacaaacactcctacctcttccaaaatatatacacatctttttgatcaatctctca ttcaaaa tctcattctctctagtaaacaagaacaaaaaa score :552 1 2 3 4 >4 OR Mapping Two Classes: 20412_at||#F13M23.280#At4g25140 oleosin, 18.5K;|COORDS: 11864996 11865757 ttggctctggagaaagagagtgcggctttagagagagaattgagaggtttagagagagatgcggcggcgat gagcggaggagagacgac gaggacctgcattatcaaagca G tg ACGTG g TG aaatttggaacttttaagag gcagatagatttattatttgtatccattttcttcattgttctaga ATG t C g CG gaacaaattttaaaa CT a a ATCC taaatttttctaattttgttgccaatagtggatatgtgggccgtatagaaggaatctattgaaggc ccaaacccatactgacgagcccaaaggttcgttttgcgttttatgtttcggttcgatgcca acgccacatt ctgagctaggcaaaaaac A a ACGTGTC T ttgaatagactcctctcgtta A CAC a T GCA gcggctgcatggt

PAGE 55

45 gacgccatt AACAC GTGGCC TACA at TGC atgatgtctcca T T GA CA CGTG a C T t C t CGT c TC ctttctta atatatctaacaaacactcctacctcttccaaaatatatacacatctttttgatcaatctctcattcaaaa tctcattctctctagtaaacaagaacaaaaaa score :552 Motif list 1 Overlap Motif list 2 AND Strengthfile: Pulls out significant bases of an adjustable minimum length with an adjustable minimum base threshold to help identify cis elements. Gives positional information and a score based on the summation of each bases value. (value based on added hits from each motif). 19009_at||#F2H17.12#At2g36270 352 TtCACGTGGAC 90 18991_s_at||#MGF10.3#At3g27660 159 ACGTGTCG 71 18718_s_at||#T20D16.13#At2g23240 397 CGaACA 14 128 CACaTGACACaTG 63 10 1 GCCACGT 55 Note: See sample config.ini file for configuration examples CONFIG.INI default [MOTIFFINDERP1] motiflength=8 fourway=2 twoway=0 numrandprom=1000 outputfile=c: \ perl \ data \ outfile.txt promfile=c: \ perl \ data \ promoter_database.txt [MOTIFF INDERP2] dictionaryfile=c: \ perl \ data \ motif_dictionary.txt promoterfile=c: \ perl \ data \ proms.txt

PAGE 56

46 outputfile=c: \ perl \ data \ mfoutput.txt subdictoutputfile=c: \ perl \ data \ subdict.txt pvaluecutoff=0.1 [MOTIFMAPPER] motiffile=c: \ perl \ data \ motifs.txt motiffile2=c: \ p erl \ data \ motif2.txt promfile=c: \ perl \ data \ proms.txt outputfile=c: \ perl \ data \ mapped.txt strengthoutputfile=c: \ perl \ data \ strengthfile.txt matrixoutputfile=c: \ perl \ data \ matrixfile.txt basethreashhold=3 minimummotiflength=5 capital=true scoreprom=true fontsize =10 mapone=true maptwo=false [EOF] END OF FILE DONOT REMOVE

PAGE 57

47 APPENDIX B SOURCE CODE: MOTIFFI NDERP1V4.PL #! /usr/bin/perl # MotifFinderp1v4.pl # counts number of PROMOTERS with one or more hits. # adds new anagram method lots faster # better memory usage # takes arguments from config.ini file ###################### ############## use strict; use warnings; my $configfile = 'c: \ perl \ programs \ config.ini'; my @arguments = config(); ### grabs arguments from config.ini my $arguments; my $motiflength = $arguments[0]; my $fourway = $arguments[1]; my $twoway = $arguments[ 2]; #non functional my $numrandprom = $arguments[3]; my $outputfile = $arguments[4]; my $promfile = $arguments[5]; my @list; print "Generating a motif dictionary: \ n \ tmotif size = $motiflength \ n \ tfour way bases = $fourway \ n"; print \ ttwo way bases = $two way \ n \ trandom promoter size = $numrandprom \ n \ toutput file = $outputfile \ n \ n"; my $starttime = time(); ### initial start time print \ nBuilding Motif List \ n"; my @newlist = buildmotiflist(); ### calls buildmotif list function print \ nScoring Motif List \ n"; my @motifdict = scorelist(@newlist); ### calls scorelist list function print \ nPrinting Motif List \ n"; open (OUT, ">".$outputfile); my $arraylength = scalar @motifdict; print OUT "$motiflength \ t$fourway \ t$twoway \ t$numrandprom \ t$arraylength \ n"; my $md; foreach $md (@motifdict) {

PAGE 58

48 print OUT "$md \ n"; } close OUT; ### prints the time elapsed from time() function stored as $starttime variable if (time() $starttime<=60){print \ t$arraylength motifs found in ",time() $starttime," seconds";} elsif((time( ) $starttime)/60<60){print \ t$arraylength motifs found in ",(time() $starttime)/60," minutes";} else{print \ t$arraylength motifs found in ",(time() $starttime)/3600," hours";} print \ n \ nProgram Finished \ n \ n"; exit(0); ################################ # # Subroutines # ################################# sub config { open (CIN, "$configfile"); my @config = ; close CIN; my $config = join( '', @config ); my $arguments; my @arguments; if ( $config =~ m/ \ [MOTIFFINDERP1 \ ](.+) \ [MOTI FFINDERP2 \ ]/is ) { $arguments = $1; } @arguments = split(/ \ n/, $arguments); foreach (@arguments) { if ( $_ =~ m/ \ =(.+)/is ) { $_ = $1; } } shift(@arguments); return @arguments; } sub buildmotiflist { print \ tgenerating all possible mo tifs..."; my $tnum;

PAGE 59

49 my $counter; my $motif; my @motif; my $ns = substr("nnnnnnnnnn",0,$fourway); for (my $number = 0;$number < 10000;$number++) { $tnum = $number; while ( 4 > length($tnum) ) { $tnum = "0".$tnum; } if ( substr($tnum,0,1 )+substr($tnum,1,1)+substr($tnum,2,1)+substr($tnum,3,1) == $motiflength ($fourway + $twoway) ) { $motif = ""; for ($counter = 0; $counter < substr($tnum,0,1);$counter++ ) { $motif = $motif."a"; } for ($counter = 0; $counter < substr ($tnum,1,1);$counter++ ) { $motif = $motif."c"; } for ($counter = 0; $counter < substr($tnum,2,1);$counter++ ) { $motif = $motif."g"; } for ($counter = 0; $counter < substr($tnum,3,1);$counter++ ) { $motif = $motif."t"; } $motif = $ns.$motif; @motif = split(//, $motif); permute([@motif], []); } } print "Done \ n"; print \ tremoving duplicates.............."; @list = removedups(@list); print "Done \ n"; print \ tremoving reverse compliments....."; my $revc ompline; my $line; while ( scalar @list > 0 ) {

PAGE 60

50 my %search = map { $_ => 1 } @newlist; $line = shift (@list); $revcompline = reverse $line; $revcompline =~ tr/acgt/tgca/; if ( !$search{$revcompline} || $search{$line} ) { unshift (@newlis t, $line); } } @newlist = removedups(@newlist); my $newlistline; foreach $newlistline (@newlist) { while ( $motiflength length($newlistline) ) { $newlistline = "n".$newlistline; } } print "Done \ n"; return @newlist; } sub removedups { my (@array) = @_; @array = sort(@array); my $previous = "not equal to $array[0]"; @array = grep($_ ne $previous && ($previous = $_, 1), @array); return @array; } sub scorelist { print \ topening promoter file............"; my (@newlist) = @_; my @randpromlist; ########## OPENS AND FORMATS A PROMOTER ARRAY open (PIN, "$promfile"); my @raw = ; close PIN; my $raw = join( '', @raw ); $raw =~ s/ \ n/=/g; $raw =~ s/>/ \ n>/g; $raw =~ s/= \ n/ \ n/g; chop($raw); @raw = split(/>/, $raw); shift(@raw); print "Done \ n";

PAGE 61

51 ########## scores a list of random promoters print \ tscoring in progress.............."; for (my $i=0; $i<$numrandprom; $i++) { my $randpos = int(rand(scalar @raw)); unshift (@randpromlist, $raw[$randpos]) } @raw = (''); my $mo tif; my $num; my @motifdict; my $randprom; my $m; my $rm; my $rcomp; foreach $motif (@newlist) { $num = 0; $rcomp = reverse($motif); $rcomp =~ tr/acgt/tgca/; $m = makewildcards($motif); $rm = makewildcards($rcomp); foreach $randprom (@r andpromlist) { if ( $randprom =~ m/$m/g || $randprom =~ m/$rm/g) { $num++; } } unshift (@motifdict, "$motif \ t$num"); } @newlist = ('empty'); print "Done \ n"; return @motifdict; } sub makewildcards { my ($m) = @_; $m =~ s/n/[actg] /g; return "$m"; } sub permute

PAGE 62

52 { my @items = @{ $_[0] }; my @perms = @{ $_[1] }; my $newperm; unless (@items) { $newperm = join( '', @perms ); while ( substr($newperm,0,1) eq "n" ) { $newperm = substr($newperm,1,length($newperm) ) } whi le ( substr($newperm,length($newperm) 1,1) eq "n" ) { $newperm = substr($newperm,0,length($newperm) 1 ) } push (@list, $newperm); if (scalar @list == 5000000) { @list = removedups(@list); } } else { my(@newitems,@newperms,$i); fore ach $i (0 .. $#items) { @newitems = @items; @newperms = @perms; unshift(@newperms, splice(@newitems, $i, 1)); permute([@newitems], [@newperms]); } } }

PAGE 63

53 APPENDIX C SOURCE CODE: MOTIFFI NDERP2V6.PL #! /usr/bin/perl # MotifFinderp2v6.pl # counts number PROMOTERS with one or more hits. # uses Math::CDF to acquire p value # takes arguments from config.ini file ################################################## use strict; use warnings; use Math::CDF qw(:all); my $configfile = 'c: \ perl \ programs \ config.ini'; my @arguments = config(); ### grabs arguments from config.ini my $arguments; my $dictionaryfile = $arguments[0]; my $promoterfile = $arguments[1]; my $out putfile = $arguments[2]; my $subdictoutputfile = $arguments[3]; my $pvaluecutoff = $arguments[4]; motiffinder(); calls motif finder function exit(0); ################################# # Subroutines # ################################ # sub config { open (CIN, "$configfile"); my @config = ; close CIN; my $config = join( '', @config ); my $arguments; my @arguments; if ( $config =~ m/ \ [MOTIFFINDERP2 \ ](.+) \ [MOTIFMAPPER \ ]/is ) { $arguments = $1; } @arguments = split(/ \ n/, $a rguments);

PAGE 64

54 foreach (@arguments) { if ( $_ =~ m/ \ =(.+)/is ) { $_ = $1; } } shift(@arguments); return @arguments; } sub motiffinder { ########## OPENS AND FORMATS A PROMOTER ARRAY AND COUNTS NUMBER OF PROMOTIERS open (PIN, "$promoterfile "); my @raw = ; close PIN; my $raw = join( '', @raw ); $raw =~ s/ \ n/=/g; $raw =~ s/>/ \ n>/g; $raw =~ s/= \ n/ \ n/g; chop($raw); @raw = split(/>/, $raw); shift(@raw); my $counter = 0; while ($raw =~ m/>/g) { $counter++; } ########## OPENS AND FOR MATS A MOTIF LIST open (MIN, "$dictionaryfile"); my @mlist = ; close MIN; my $mlist = join( '', @mlist ); chomp($mlist); @mlist = split(/ \ n/, $mlist); ########## EXTRACTS INFORMATION FROM DEFINITION LINE OR EXITS PROGRAM if ($mlist[0] =~ m/[a cgtn]/g) { print "definition line is missing, must be in the form: \ n oligosize \ tfixedbases \ ttwoway \ t#ofcontrolproms"; exit; } my $defline = shift(@mlist); my @defline = split(/ \ t/, $defline); my $mlistlength = scalar @mlist;

PAGE 65

55 ########## OPENS OU TPUT FILES open (OUT, ">".$outputfile); print OUT "Motifs: \ tHits in test set: \ tHits in control set: \ tChi Square (p): \ n"; open (DICTOUT, ">".$subdictoutputfile); print DICTOUT "$defline[0] \ t$defline[1] \ t$defline[2] \ t$counter \ t$mlistlength \ n"; ##### ##### COUNTS THE NUMBER OF OCCURENCES OF EACH MOTIF my $line; my $m; my $motif; my $rm; my $rcomp; my @mline; my $num; my $rawprom; foreach $line (@mlist) { $num = 0; @mline = split(/ \ t/, $line); $rcomp = reverse($mline[0]); $rcomp =~ tr /acgt/tgca/; $m = makewildcards($mline[0]); $rm = makewildcards($rcomp); foreach $rawprom (@raw) { if ( $rawprom =~ m/$m/g || $rawprom =~ m/$rm/g) { $num++; } } print DICTOUT $mline[0]," \ t",$num," \ n"; ########## DEFINES EACH I NPUT FOR THE CHI SQUARE SUB ROUTINE my $promsintest = $counter; my $hitsintest = $num; my $promsincontrol = $defline[3]; my $hitsincontrol = $mline[1]; if ($hitsincontrol == 0) { $hitsincontrol = 1; # fixes a divide by zero error (conserva tive value)

PAGE 66

56 } ########## CALCULATES CHI SQUARE VALUE my $exp = ($promsintest*$hitsincontrol)/$promsincontrol; my $obs = $hitsintest; my $chi = (($obs $exp)*($obs $exp))/$exp; my $pvalue = pfromchi($chi); ########## ONLY PRINTS MOTIFS THAT HAVE A LOWER P VALUE THAN THE CUTOFF if ( $pvalue <= $pvaluecutoff) { print OUT $mline[0]," \ t$num \ t",$mline[1]," \ t$pvalue \ n"; } } close OUT; } sub makewildcards { ### Translates any n's to [acgt] for pattern matching my ($m) = @_; $m =~ s/n/[actg]/g; return "$m"; } sub pfromchi { my ($chi) = @_; my $df = 1; ########## CALCULATES PVALUE my $ncp = 0; # the optional non centrality parameter my $p = pchisq($chi, $df, $ncp); $p = 1 $p; return $p; }

PAGE 67

57 APPENDIX D SOURCE CODE: MOTIFMA PPERP1V3.PL #! /usr/bin/perl # MotifMapperp1v3.pl # maps one or two classes of motifs onto a promoter in color # each promoter is given a relative score based on significant bases # prints a Motiffile that contains actual pun itive motifs of a minimum base score and motif length # prints a matrix version of the color output for graphing in a spreadsheet. # takes arguments from config.ini file # internal config for: # one class vs two classes # file names and paths # capitalizat ion of significant bases # promoter scores # fontsize ################################################## use strict; use warnings; my $configfile = 'c: \ perl \ programs \ config.ini'; my @arguments = config(); ### grabs arguments from config.ini my $argume nts; my $motiffile = $arguments[0]; my $motiffile2 = $arguments[1]; my $promfile = $arguments[2]; my $outputfile = $arguments[3]; my $strengthoutputfile = $arguments[4]; my $matrixoutputfile = $arguments[5]; my $basethreashhold = $arguments[6]; my $minimum motiflength = $arguments[7]; my $capital = $arguments[8]; my $scoreprom = $arguments[9]; my $fontsize = $arguments[10]; my $mapone = $arguments[11]; my $maptwo = $arguments[12]; $mapone = uc($mapone); $maptwo = uc($maptwo); $fontsize = $fontsize*2;

PAGE 68

58 $capit al = uc($capital); $scoreprom = uc($scoreprom); my $line; my $rawline; my @rawlinearray; my $annotation; my $prom; my @prom; my $raw; my @prommatrix; ########## OPENS AND FORMATS A PROMOTER ARRAY open (PIN, "$promfile") || die; my @raw = ; close PIN ; $raw= join( '', @raw ); $raw =~ s/ \ n/=/g; $raw =~ s/>/ \ n>/g; $raw =~ s/= \ n/ \ n/g; chop($raw); @raw = split(/>/, $raw); shift(@raw); ########## selects the mapping of one class or two classes of motifs based on config if ($mapone eq "TRUE") { maponeclas s(); } elsif ($maptwo eq "TRUE") { maptwoclass(); } else { print 'Either mapone or maptwo must be TRUE'; exit(0); } ########## end of rtf file print OUT \ \ par \ n"; print OUT \ }"; close(OUT); close(STRENGTHOUT); close(MATRIXOUT); exit(0); ######### ######################## # Subroutines # #################################

PAGE 69

59 sub config { open (CIN, "$configfile"); my @config = ; close CIN; my $config = join( '', @config ); my $arguments; my @arguments; if ( $config =~ m/ \ [MO TIFMAPPER \ ](.+) \ [EOF \ ]/is ) { $arguments = $1; } @arguments = split(/ \ n/, $arguments); foreach (@arguments) { if ( $_ =~ m/ \ =(.+)/is ) { $_ = $1; } } shift(@arguments); return @arguments; } sub maponeclass { ########## OPENS AND FORMAT S MOTIF ARRAY 1 open (MIN, $motiffile) || die; my @list1 = ; close MIN; my $list1 = join( ',', @list1 ); $list1 = lc($list1); $list1 =~ s/ \ n//g; @list1 = split(/,/, $list1); @list1 = revcomp(@list1); ########## OPENS AND FORMATS A RTF FILE FOR C OLOR,STRENGTH AND MATRIX OUTPUT open (STRENGTHOUT, ">".$strengthoutputfile); print STRENGTHOUT \ { \ \ rtf1 \ \ ansi \ \ ansicpg1252 \ \ deff0 \ \ deflang1033 \ { \ \ fonttbl \ { \ \ f0 \ \ fmodern \ \ fprq1 \ \ fcharset 0 Courier New \ ; \ } \ } \ n \ \ viewkind4 \ \ uc1 \ \ pard \ \ f0 \ \ fs$fontsize "; o pen (OUT, ">".$outputfile); print OUT \ { \ \ rtf1 \ \ ansi \ \ ansicpg1252 \ \ deff0 \ \ deflang1033 \ { \ \ fonttbl \ { \ \ f0 \ \ fmodern \ \ fprq1 \ \ fcharset 0 Courier New \ ; \ } \ } \ n \ { \ \ colortbl \ ; \ \ red255 \ \ green0 \ \ blue0 \ ; \ \ red128 \ \ green0 \ \ blue128 \ ; \ \ red0 \ \ green0 \ \ blue255 \ ; \ \ red255 \ \ g re en0 \ \ blue255 \ ; \ \ red0 \ \ green128 \ \ blue128 \ ; \ } \ n \ \ viewkind4 \ \ uc1 \ \ pard \ \ f0 \ \ fs$fontsize ";

PAGE 70

60 open (MATRIXOUT, ">".$matrixoutputfile); ########## maps motifs foreach $rawline (@raw) { @rawlinearray = split(/=/, $rawline); $annotation = $rawlinearray[0 ]; $prom = $rawlinearray[1]; $prom =~ s/ \ s//g; $prom =~ tr/ACGT/acgt/; my $seq = $prom; $prom =~ s/a/a0,/g; $prom =~ s/t/t0,/g; $prom =~ s/c/c0,/g; $prom =~ s/g/g0,/g; @prom = split(/,/, $prom); ########## degenerate motif matched against pr omoter ########## makes reverse comps my @tobemapped = getrealmatches($seq, @list1); @tobemapped = sort(@tobemapped); my $prev = "not equal to $tobemapped[0]"; @tobemapped = grep($_ ne $prev && ($prev = $_, 1), @tobemapped); ########## Finds the position of the motif ########## increases the value of the base ########## use significant n's if true in config my @nline; my $motifbase; my $position; foreach $line (@tobemapped) { @nline = split(/,/, $line); $position = $nline[ 1]; @nline = split(//, $line); foreach $motifbase (@nline) { if ($motifbase eq "a" || $motifbase eq "c" || $motifbase eq "t" || $motifbase eq "g") { my $base = substr($prom[$position],0,1); $prom[$position] = $base.(substr($prom[ $position],1,length($prom[$position]) 1) + 1); }

PAGE 71

61 $position++; } } ########## capitalizes significant bases if set to true in config if ($capital eq "TRUE") { my $arraylength = scalar @prom; for (my $i = 0; $i < $arraylength; $i ++) { if ( substr($prom[$i],1,length($prom[$i]) 1) >= 1 ) { my $ucase = uc(substr($prom[$i],0,1)); $prom[$i] = $ucase.substr($prom[$i],1,length($prom[$i]) 1); } } } ########## prints formatted promoter array (@prom) to a text file using color ########## @prom carries bases, number of hits and position information. printrtf ($scoreprom, @prom); } printmatrix (@prommatrix); } sub maptwoclass { ########## OPENS STRENGTH FILE FOR OUTPUT NOT REALLY USED FOR TWOCLASS open (STRENGTHOUT, ">".$strengthoutputfile); print STRENGTHOUT \ { \ \ rtf1 \ \ ansi \ \ ansicpg1252 \ \ deff0 \ \ deflang1033 \ { \ \ fonttbl \ { \ \ f0 \ \ fmodern \ \ fprq1 \ \ fcharset 0 Courier New \ ; \ } \ } \ n \ \ viewkind4 \ \ uc1 \ \ pard \ \ f0 \ \ fs$fontsize "; ########## OPENS AND FORMATS MOTIF ARRAY 1 open (MIN, $motiffile); my @list1 = ; close MIN; my $list1 = join( ',', @list1 ); $list1 =~ s/ \ n//g; @list1 = split(/,/, $list1); @list1 = revcomp(@list1); ########## OPENS AND FORMATS MOTIF ARRAY 2

PAGE 72

62 open (MIN, $motiffile2); my @list2 = ; close MIN; my $list2 = join( ',', @list2 ); $list2 =~ s/ \ n//g; @list2 = split(/,/, $list2); @list2 = revcomp(@list2); ########## OPENS AND FORMATS A RTF FILE FOR OUTPUT open (OUT, ">".$outputfile); print OUT \ { \ \ rtf1 \ \ ansi \ \ ansicpg1252 \ \ de ff0 \ \ deflang1033 \ { \ \ fonttbl \ { \ \ f0 \ \ fmodern \ \ fprq1 \ \ fcharset 0 Courier New \ ; \ } \ } \ n \ { \ \ colortbl \ ; \ \ red255 \ \ green0 \ \ blue0 \ ; \ \ red128 \ \ green0 \ \ blue128 \ ; \ \ red0 \ \ green0 \ \ blue255 \ ; \ \ red255 \ \ g reen0 \ \ blue255 \ ; \ \ red0 \ \ green128 \ \ blue128 \ ; \ } \ n \ \ viewkind4 \ \ uc1 \ \ pard \ \ f0 \ \ fs$fontsize "; ########## maps motifs foreach $rawline (@raw) { @rawlinearray = split(/=/, $rawline); $annotation = $rawlinearray[0]; $prom = $rawlinearray[1]; $prom =~ s/ \ s//g; $prom =~ tr/ACGT/acgt/; my $seq = $prom; $prom =~ s/a/a0 ,/g; $prom =~ s/t/t0,/g; $prom =~ s/c/c0,/g; $prom =~ s/g/g0,/g; @prom = split(/,/, $prom); ########## degenerate motif1 matched against promoter ########## makes reverse comps my @tobemapped1 = getrealmatches($seq, @list1); @tobemapped1 = sort (@tobemapped1); my $prev1 = "not equal to $tobemapped1[0]"; @tobemapped1 = grep($_ ne $prev1 && ($prev1 = $_, 1), @tobemapped1); ########## degenerate motif2 matched against promoter ########## makes reverse comps my @tobemapped2 = getrealmatch es($seq, @list2); @tobemapped2 = sort(@tobemapped2); my $prev2 = "not equal to $tobemapped2[0]"; @tobemapped2 = grep($_ ne $prev2 && ($prev2 = $_, 1), @tobemapped2);

PAGE 73

63 ########## Finds the position of the motif ########## increases the value of th e base my $line1; my @nline1; my $motifbase1; my $position1; foreach $line1 (@tobemapped1) { @nline1 = split(/,/, $line1); $position1 = $nline1[1]; @nline1 = split(//, $line1); foreach $motifbase1 (@nline1) { if ($motifbase1 eq "a" || $motifbase1 eq "c" || $motifbase1 eq "t" || $motifbase1 eq "g") { my $base1 = substr($prom[$position1],0,1); $prom[$position1] = $base1.1; } $position1++; } } my $line2; my @nline2; my $motifbase2; my $position2 ; foreach $line2 (@tobemapped2) { @nline2 = split(/,/, $line2); $position2 = $nline2[1]; @nline2 = split(//, $line2); foreach $motifbase2 (@nline2) { if ($motifbase2 eq "a" || $motifbase2 eq "c" || $motifbase2 eq "t" || $motifbase2 e q "g" ) { if ( substr($prom[$position2],1,length($prom[$position2]) 1) == 0 ) { my $base2 = substr($prom[$position2],0,1); $prom[$position2] = $base2.3; } elsif ( substr($prom[$position2],1,length($prom[$position2]) 1) == 1 ) { my $base2 = substr($prom[$position2],0,1);

PAGE 74

64 $prom[$position2] = $base2.2; } } $position2++; } } ########## capitalizes significant bases if set to true in config if ($capital eq "TRUE") { my $arrayle ngth = scalar @prom; for (my $i = 0; $i < $arraylength; $i++) { if ( substr($prom[$i],1,length($prom[$i]) 1) >= 1 ) { my $ucase = uc(substr($prom[$i],0,1)); $prom[$i] = $ucase.substr($prom[$i],1,length($prom[$i]) 1); } } } ########## prints formatted promoter array (@prom) to a text file using color ########## @prom carries bases, number of hits and position information. printrtf ($scoreprom, @prom); } } sub getrealmatches { my ($seq, @l) = @_; my @tobemapped ; my $motif; my $term = 0; foreach $motif (@l) { my $m = makewildcards($motif); while( $seq =~ m/$m/g ) { my $length = length($`); unshift(@tobemapped, $motif.",".$length);

PAGE 75

65 } } return @tobemapped; } sub makewildcards { my ($m) = @_ ; $m =~ s/n/[actg]/g; return "$m"; } sub revcomp { my (@rc) = @_; my @fulllist; my $revcomp; foreach $line (@rc) { unshift(@fulllist, $line); $revcomp = reverse $line; $revcomp =~ tr/acgt/tgca/; unshift(@fulllist, $revcomp); } return @ful llist; } sub printrtf { my ($scoreprom, @prom) = @_; buildprommatrix(@prom); my $basescore = 0; my $score = 0; push (@prom, "z0"); my $arraylength = scalar @prom 1; my $i = 0; print OUT \ \ b ",$annotation," \ \ b0 \ \ par \ n"; while ($i < $arraylength ) { if ( substr($prom[$i],1,length($prom[$i]) 1) == substr($prom[$i+1],1,length($prom[$i+1]) 1) ) { print OUT substr($prom[$i],0,1); } elsif ( substr($prom[$i],1,length($prom[$i]) 1) != substr($prom[$i+1],1,length($prom[$i+1]) 1) )

PAGE 76

66 { if ( substr($prom[$i+1],1,length($prom[$i+1]) 1) >= 5 ) { print OUT substr($prom[$i],0,1), \ \ cf5 "; } elsif ( substr($prom[$i+1],1,length($prom[$i+1]) 1) < 5 ) { print OUT substr($prom[$i],0,1), \ \ cf", substr($prom[$i+1],1,leng th($prom[$i+1]) 1), "; } } $i++; } print OUT \ \ par \ n"; if ($scoreprom eq "TRUE") { foreach $line (@prom) { $basescore = substr($line,1,length($line) 1); if ($basescore > 0) { $score = $score + $basescore; } } print OUT "score \ :", $score; } print OUT \ \ par \ n \ \ par \ n"; #### base Strength file my @annotationarray = split(/ \ s{3,3}/, $annotation); print STRENGTHOUT \ \ b ",$annotationarray[0]," \ \ b0 \ \ par \ n"; $i=0; my $zerocount = 0; my @strengthmotif; my $stre ngthmotif; my $strengthmotifsize; my $strengthline; my $strengthmotifpos=0; my $strengthmotifscore=0; while ($i < $arraylength) { if (substr($prom[$i+1],1,length($prom[$i]) 1) >= $basethreashhold & $zerocount >=2 | $i == $arraylength 1 ) {

PAGE 77

67 if (scalar @strengthmotif >= 4) { for (my $j = 0; $j<2; $j++) { if ( substr($strengthmotif[0],1,length($strengthmotif[0]) 1) <$basethreashhold ) { shift @strengthmotif; } $strengthmotifsize = scalar @strengthmoti f; if ( substr($strengthmotif[$strengthmotifsize 1],1,length($strengthmotif[$strengthmotifsize 1]) 1) <$basethreashhold ) { pop @strengthmotif; } } } if (scalar @strengthmotif >= $minimummotiflength) { print STREN GTHOUT \ \ par \ n"; print STRENGTHOUT ", $strengthmotifpos + scalar @strengthmotif 1, \ t"; foreach $strengthline (@strengthmotif) { $strengthmotifscore = $strengthmotifscore + substr($strengthline,1,length($strengthline) 1); print STRENGTHOUT substr($strengthline,0,1); #print STRENGTHOUT substr($strengthline,1,length($strengthline) 1); } print STRENGTHOUT \ t", $strengthmotifscore; $strengthmotifscore = 0; } @strengthmotif=""; shift @strengthmotif; $zero count = 0; } if (substr($prom[$i],1,length($prom[$i]) 1) >= $basethreashhold) { push( @strengthmotif,$prom[$i] ); $strengthmotifpos = $arraylength $i; } elsif ( $i>=1 & substr($prom[$i 1],1,length ($prom[$i 1]) 1) >= $basethreashhold) {

PAGE 78

68 push( @strengthmotif,$prom[$i] ); } elsif ( substr($prom[$i+1],1,length ($prom[$i+1]) 1) >= $basethreashhold) { push( @strengthmotif,$prom[$i] ); } elsif (substr($prom[$i],1,length($prom[$i]) 1) < $basethreashhold) { $zerocount++; } $i++; } print STRENGTHOUT \ \ par \ n \ \ par \ n"; } sub buildprommatrix { my (@buildmatrix) = @_; my $buildmatrix = join( '', @buildmatrix ); $buildmatrix = lc($buildmatrix); $buildmatrix =~ tr/actg/>>>>/; push( @prommatrix,$annotation.$buildmatr ix ); } sub printmatrix { my (@prommatrix) = @_; my $prommatrix; my $j=1; while ( $prommatrix[0] =~ m/>/g ) { $j++; } for( my $i=0; $i<$j; $i++ ) { my $prommatrixline; foreach $prommatrixline (@prommatrix) { my $prommatrixlinearray; my @prommatrixlinearray = split(/>/, $prommatrixline); print MATRIXOUT $prommatrixlinearray[$i]," \ t"; } print MATRIXOUT \ n"; } }

PAGE 79

69 APPENDIX E REPRESSED ABA REGULA TED GENES 17867_at||#F6F22.30#At2g19670 putative arginine N methyltransferase;|COORDS: 8449663 8447827 aacaggaaactgaaaaaatgtaatttcagaagcaaggcatccaaaagca TAAG g C t TT t C a CA C ATATC a A gctaaatcactaaaaccctagtcctcaattccagactaacaaatcg aacgaatccgaattatggaaaaatt aagcaattcaaaacctaa A aa GCATT ATCGCAG T G A a GAG atttcaaattagatctgagactcaaacacat aaaaaaacaaactttaaacccctaaaaaccaaaatccaaaaga GC g A a GAA gaattttagaccaaaatatc agaatcttaaacgaaataaaagaagaatgaatcggtgatcgtaagcaagccataccttttggccgtgctca ccactcacca CT a C c AGC gtcagaatcttttcctccaaattt CAT ct ACA ct GC ct TATA attacgacaat aaccgaaccggatcggtataaaccaaaccggtttatctgtcaaaaacaattgaaccggcggctgaa ATAGA at C A T AC A CATA a CAT a AAG tt C G TA T C c A tcctaaaaccctaaatctttttgcttattcgaatctgtt A G CC A TC C A aaacctctct CT t TGCC gcaaaaaa score :138 12483_at|12484_g _at||#F13M22.19#At2g37690 putative phosphoribosylaminoimidazole carboxylase;|COORDS: 15754654 15758783 g GCATT t TC ttccgtctcctcctttccgtctgataaaaatccgatctttggtgcgaccaaatcaaat CGT g GA g C ctaatcaaaaaccacatctcatcatacccggttcggttttcttaaccgaatcaaat TAT ac ATA aaa cggatt aggcttctccggttttcgactgcgttttgtataaaacctctcaaaaaccttttcttctctccccg tcagaccgatcttctctctcctcgccgccgcctccaaatccggaagaagcttttttccacctctattatcg aactt GC tt TATA aaaatctaacct T t G g ACCT gactttccgacttctctctctgt CTAGCT gcttctct C a CCC a GG tttgccttcc T ct CAAGT t G aagctgcggctttat TAG t T t A G ctattgactctaagaat GTT A C T T G a A tc CAT tc ACA tgagcttcttacgtcgttttcattggtttagctattgacgt T G a CT TG G gttttg cttgagttcattctctattgtggtctaccatacttttt CA t GT T AC t T G attgca TAT tt ATA aaacggtg ctaaaagtattccgttttgt A ta GCATT c T ag score :144 16412_s_at||#F18E5.20#At4g21400 serine threo nine protein kinase like protein;receptor protein kinase (IRK1), Ipomoea trifida, gb:U20948|COORDS: 10366207 10363716 atc GT T AC t T G atttcgtttta AGGT a C a A tttgatagaag TAT at ATA ta TG tc AGTA agtttagaaaat T g GATA a G agatttatattgagtccacata TAT aa ATA tgagagttgaaagatgatccactaa t CAT ct AC A gaagtaaaagaaattgtaaaataa AT G G C T T A G AC gaaattagatggttcctagttagtcttatatttt C t AGT a GG tttgaaaagtattcaaactctttagttttctttcctggtact TGTG g G g A aagaag TGA GC A A T GA A cacttctaatttaaataacgttttcaca C CAAG TC G ttagtttc ATA tt GTG agtctttgaatc TACT a C C A T C A T G GAC C cc C acaacct CAT a AAG act CC ACT T G t G tttg CT a CT g GG ttttataattgttattg atcaacaagttttgttctttctgctgttggatattgatt TAG t TCA aagaccatactgcttttgaataatt ccaa G G T C A AGT TAG a T tgacttcttccatttttagactcgttattacattctggggttttctcatataga ttacagaagaagaaaaaag CT t C a AGC gtcca score :222 20345_at||#T15B16.5#At4g01700 putative chitinase;similar to peanut type II chitinase, GenBank accession number X82329, E.C. 3.2.1.14|COORDS: 732487 731413 acttcaaaaacaattctttgatagttttgtt T TC ATT GC taatatttctctcaatttttttttattgtttt tatttttgtatagaatagtattcaaaatgaaactataattcagtatggaacactttct gttttatttaata tttacttcataatgaattatgtttcaaattttcaaagaata TAT tt A TA t TTG a A tttggt T a G t ACCT tc caaagaagcttaat T c C c CACA ttaaaatt C a A CTTG G tcttcatattaacattattcaattgaattcaga

PAGE 80

70 ctgaaa TGGT t C A C tt TAT ctggtttgaaatatatgtcat GCATT t TC caaatcaagaatgaatcttcttg aaagaaaaaaaatgagaaaaaagaagaagg agaggaatgggt CTC c T a AC aatgaaaaagagaataaaccc ttaa A TA a T TGCA A c TA T ataaaataaaaagtataaactccactaaaaaaagtcaaaa CGT a GA c C aaaac aaaaa C A t GT A AC TAAG t C g T ccactactacatctctat G ac TCTAT ataacaa C a AAG c CA aaaccacat c T t C t CACA caaaacacaaaaaagaacaaata score :144 18228_at||#MDC8.16#At3g16530 putative lectin;similar to lectin SP:P02874 (Onobrychis viciifolia); contains Pfam profile: PF00139 legume lectins beta domain|COORDS: 5625415 5624585 ttattagcatgattacttttatctttcttttttatcaaaaaactttttaatgaga TTGCC a A acgtttttt tgca T c C a CGTA aactcctc TG a A c GTA aacttttccagagtgtatttatgtggacgtcagaaaaacgtat tttgca T c C A C GT AA C g A aaa TAG t TCA tataaactaaaagtggct A t G t CT TA CA ga TA tgtacacgacg catgaaattcgtaatgtttcgtagagaaaggggacaagaaaatataattaatcaattattattcaaaacaa tttataatcgattttttttttttcttataattaatt A t TACT c G aatcaattatattttatcaaaagaata aaattaatt TGT tc ATG acatctgatg GGTC c A t T t TCAC TG C T A GG ccctttgtggtggcgtccgatctg aatcgaat C C A CT T G GC ttaattt A t T t GACC atttttcgaagtataaatttgaatattcaaa CT c G A T G G C T TA cgttatcccttatataaa CAC at TAT caacc T c CAA a TA ttttcatcatcaatctttcaatcacaaa aaacaacaacaccaaaaaaaaaacgcacacaa score :168 18235_at||#T7N9.8#At1g27020 unknown protein;|COORDS: 9378756 9380540 ttttga C T G A t C T A GT c GG acataaccaaacacccatttactgttattatcatctggaaacccaatgagaa gataagagttttatgaagca C a C a AGTG agaaacaaaaataaggaacatatttggtt G ga TCTAT cattaa accaa AG a T A G A T A A G AGGG aaacgaaactggagtaat aatcaaaatacaatctaccatg T a GATA g G gaa A a TACT c G gtggttt TA g TTG a A aaatgtaaa TA g TTG a A agttatcaaaattttctaatatttaaacacc aa A g T t GA C C A A G A T G G caaaaaggcaagaaacaagaaaatgaaagacaactatcgttatgatacaa A t T t GACC aaaatgaagatatgataatgtttccaaaattctcga T G CA AG T A GTT c GCTCA attat T t GATA a G a atgatttc C C C A C T A G C T CA ataaaaaacaattaatccttgtacgcaagttatcatagtgaa GCATTGAC A c TG tgac CC T t G GC A T T AT A aattacgaatttaagataaagaagagaagaatcaagcaaaagttgtgtttc t CAT a AAG atctcaaagatattcctactacac score :282 17957_at||#F26B6.25#At2g23600 putative acetone cyanohydrin lyase;|COORDS: 99 92188 9990872 aaaattatgttttatggtttctccattgatttaaacttataggatacagatttata CAC ca TAT ttcatga taatctctatcaaaataatgaatttataagcaatcgagtaaa T A CA A a TA TAG gc GCC gaatggagatgta cggtacaactgcgaatttttt TAAG t C a T aagtaatttttaaaaaatgttatctctatttt T a G T T AC G A G T A a T tgttttaatattatatcaaaaat taatattattg TC C A A C T AG tgtg A t TT G A CC TG gtacatgaca cggaacaaacatctaaca C a ACT TG t CA AGT t G tatatcactgtaacatcagctttaattagaaagatact tttatttt GCATT t T aa TA ag TGTA aagtttcttatccgcagagccaaaatgtcgcttttaaaaagaaaaa ttccaaagttattt CTT t ATG caagaaaaaattcgaaattttaattatcacgtaaaatcataaaagaatc t caaaggaaagcgacaattttggaaaattcaacgataa GC tt TATA atttttctacttatgagtgga TGT G T GAAGAG tagggaaaaagtacgaaaggaaaata score :168 20114_at||#F11I11.30#At4g34790 putative protein;auxin induced protein 10A, Glycine max., PIR:JQ1099|COORDS: 15559030 15559356 tcacacat attttaatataatactaaaagatttgttatttcaaaattatatatggaaaa CCAT C T CGT t AC tcatcttt C g T CC c AG c AG tac GGC cc CTA gtttgagagcgaaggaaaatatactttaaaagactttggaa cttat TAT ac ATA ctct T t CAA c TA taacaatctccaatattgctaaaag T g GCTCA aga A CTG GACC ca C ttagtctgat CCAA G T CA C at TAT tcaccatttcgaggctctaatttaa TA T aa AT A c CTA a CT ttt TAT a t A TA t T TG CA at CAT at ACA agaaacttatgttatgt GCATT t T atttttatataaaga TGA G C T A AG c C A T g TGAG C c AGATGG atgacgg GAGGGC ccaacagcccacaactttttttttttttttttagtaatcaaaat atcgcatcgtactgttttttta GCA T T G A aagtcttttccttatcttgaagttacatactttaagttcact

PAGE 81

71 ac TAT aa ATA GA aa C ttcttc CC CTC g T tgtc T at CAAGT catttactttttaaaaaaatttacctaaaca aa T t GATA a G ttccaaacataaataagaagac score :228 17425_at||#F6I18.290#At4g30800 ribosomal protein S11 like;ribosomal protein S11, Arabidopsis thaliana,PIR2:C35542|COORDS: 13965709 13966885 ttt CAC gt TAT ccaac tt TG ag AGTA ctgtccttgtgttggccaataagtggttgaatatatggtaaaagt ataaaagctc CC A T C AT G TT G G A C C T A C TA G C T tt GCTA T AT A TA aa A TG t CC T G T C A t TG C ataacgtac cactttctgaatctgttt TA cg TG C CTA TC a A tgatttgtctttaacacttaccgtaatattc T A CA A t TA agttcaag TGCC T A G attttctaaatgtgtataaagacatta TATA tg GC aaatgatcc tttagataacca tttcatttaaatcaagcattcgattc GCATT c AT agataaaatactattcacc GGTC c AG T tctaact A G C C A TC C A G tctggttagaaagcaaaattgtttattttattttaggttcagttttcagatcggttttctaaca ctgttttggtt TGG T t C t C GTA tcggttcaaccggattggatagt CAT a AAG aatttgaagtccttaatta agcccaaaa CAAGT a G agccca TAT at A TA C T aa CA aagccgtgattagggtttttggttttcgtcagaag atttcaaggtttagggtttctgcaattcaacc score :252 20372_at||#T9A21.50#At4g18200 putative protein;W30DMY32F,W15DMY32F|COORDS: 9034215 9039562 agaagaccaagagcttcctcagtctaaagaagaagaagaacaaaaa CAAGT a G ataccattcat GTC c A a G C t T A GGC A AAG atccaagaaaaccaaaagcttttgtaatttgaaccagtttaatctaaaataaaaatgagt TAT at ATA ttg TAGGCA c T A CA A a TA aaattggtaagtgtctcaaggtctcacagt CTT c ATG tatctttg ttgacttctgaat C CA A G t C A AGT gtactt ATG g CCT tcccgtcttgttgtatcgttgttgacttccaaat ccacaaaatagat TACT aa CA taacaaaaa C a AGTA a T aattttttt t T tt C A AGT G G AC g G a C ACTT G A G C c A ataagattagaataagatacaacaatatcaa G cg TCTAT atatctgttttcaccaaactcagcctact ttaacaacgttgtgtataagaagttttcttaaaaccgccaattaaaaggtttcaataatt C a AAG t CA acg gtaaacatctttacagaaatatgatgcacttgttcccaccaaccttca TAT at ATA aaagctttaccaaaa ctctgtt TGTG g G a A gtcc tcttagagtggtc score :174 12430_at||#F13M14.20#At3g10520 class 2 non symbiotic hemoglobin;identical to class 2 non symbiotic hemoglobin GB:AAB82770 (Arabidopsis thaliana)|COORDS: 3277772 3276523 cgagcgaacagctcaatataataggaattgtcagcttctctta CAT g AAG ctgtga gttttttctttttct ttttgtgtcttactaaattatttttttattaactactttttaa GC g A t GAA ttgaatttattgtcactatt tttctttcttcttaat TATA ca GC tgattt TGTG T G a A A GTA gaaaagaacaaatgattgaagctatg C gg GATGG agattttactacgcagaa GA c AATGC aagtttttatttatctttgtttgttccttttaatcttaac t TA t T TG T A tca A t CTA a CT catgtatt atcctacgtctatctagactgatc GGAC g G a CATA at GTGTAT cc ATA t TTC t T g G CTACG cgtgtccaccttttagaga CT a TGCC tt TA ggtagtagatgttttactacaaa ataaacatatttagtcaaataaaataaaatttgagagaatcttctacataag T A G C TCA cagacccaacca aaggaccattgaatacca TAT at AT A TAGA ta C acagacatataaacacacaaatattcgtgtttttttca aactgtgagagaaaaagaaagagagaaagaga score :138 19815_at||#F7A19.31#At1g14210 putative ribonuclease;contains Ribonuclease T2 family histidine protein motif|COORDS: 4857937 4856898 tatcttttgtaacttctcttctcctttttagagtccaaaccaaattcatggggattctaaaatctaaaa T t CAA a TA agataaaccttcgtatattggtaaaattcc TAATGC atcatacgtttcataaaacttcagcccta atg G c T AATGC ataagtttaacagaatttggtcagtcacggtacaatagtttttta ATA at GTG ctaaagt attatt TAG t T t AG tatt ATAGA aa C aataatcggaatgtagtagaatcgtttaatctcctt T t G A G G GCA T T ATC ggatcccagttcaagccaatgttacca A ga GCATT g T T G c A T G GCA TT T A T ctgttacaaaa GTC c AT c C gattaagtttcacaaagagaattaatgacgcagcgtaat CAT t AAG g T TGCC T A TA ttttttta G a C A AGT G G a G a A C CA AG T t G agaataataaattaataaacaaattaagaaaatattctttatatattcatttt tggaaacaattgattgtgaggtacctttccctttt TACA ta TA gattgggaaataaattcaaggaaacaaa aaataatcaagaaacaca gagagcaaaacaca

PAGE 82

72 score :210 16150_at||#T1P17.70#At4g12480 pEARLI 1;|COORDS: 6371358 6370852 atttttttccgaggaagttcgagaattttgttagcatgtcttcct CT t CT g GG accagtttgtgataaaac acatcctctcggaaaagagtgtagagcacaacttcctctcgaatgtactaaga CCACT a G actaacgt ATA GA ag CT ct CAAGT aaaa T g G c TACG atccaaagagaatc TG a A g GTA tgt GCA a TG A GG t CAT gaa CC A T C AT G A T GG tggtgataataacagatta A ca GCATTGAC A atttgaaaataatagtaatatgaacgcacaaat catatttatttcttaaatagaaatgttttacaaaaacgattaatgtctaaattaattca AG G TT C t A CG aa tcta CAT a AAG gaaagtagaaaggtcagaatt TG T A t A TG T AG A T A A GGGCA aa TA aataa ataaacagat attttgtagaat TG CA A a TA T atgtgaataatcaaatataatagaa C AA G TT G G T C C T C t T c AC atccttc taaaaccctataag T a C c CACA ccctctcttcatatatcttcatctctcactctct C a AAG a CA ctgaata aatccttaaaacaaacttttgaaagaaaaaaa score :264 16610_s_at||#F14D16.20#At1g19050 response regulator 5, p utative;similar to response regulator 5 GI:3953599 from (Arabidopsis thaliana)|COORDS: 6579078 6577919 aggaaaatacaaaatacaatgttgaagttaaaatgaatatagaagtagattaat TGAGC a A taaccaccgg ttt TAG a TCA ttga T t GATA T G t AA GT A a AATGC aaag TAG ta G C C G A CTTGGC A A A G AA aaagaaaaaag aagagagag agaggtaaaactaaaataaaattactaaaagcagaaggaaagttc CA G G T C A C G A G GG agta gtcgctgtcggattgaaagaaacagaccaaacaaagatctttgaaatcttaaaaaagattcggtgcccaaa gattttgacctttgaagacacatttgattatttgtgttttctgacacttcccaaacacataa CAT ca AC A C T T G A C C catttttaagattttgtttcctcttcaaagatcttaccccgcaaga tccgatcttccaaatctct ctctatctacta TA C t A GTA c T tattaaaccccaccacctctcccttcttatttatgtcac CT a A c CTA a C T tcttcttctttcattttatcacattcatttcctttgtcttcaatccaaagctttcct C T t CT T GG aacca atctgctctcttttttattctgagtttgacaa score :192 18928_at||#F18O19.1#At2g43620 putative endochit inase;|COORDS: 18043568 18042497 acgcatgac TA t TTG g A aattgcaataattgagttggatttttctaattttggttgattttgatt TAT aa A TA GA aa C attttggcttcactagtcattt T t C t CACA attccatacaatttttgttaaaaatcaaag TAAG a C t T taaaagaacgttc T a AATGC ta T att A g T t GACC aaa AA a AATGC ta T attagctaacaatatcgtt TGA G C T A a ttaacaaaa ACTTG ga A ctattcaatagaaaaatctcaaacgtt TGA a CT AA t CTA aacttga ttatctcaatcaagtttttatgagaatgatttt C a T C C AA GT A AC T TG g C tctttaaaattttgattacat attcgtttttgat C T G A t CTA tga CC ga CATG gaattt CTC a T a AC gacaagagaaaaaactg T G T CA T T G AC ttttgttaagtggtacaaag TG GCATTGAC TT t G actcagaaaaa GCCA at CA ataatcgtgaaagatg tctaacactgatcaatatttcaattt GAA T a GAC C aaatttacac TAT aa ATACAT ca ACA caccttcttc atttc T t C a CACA acaccctc CAT ac ACA aaa score :282 13103_at||#T3A5.120#At3g50740 UTP glucose glucosyltransferase like protein;UTP glucose glucosyltransferase, Manihot esculenta, PIR:S41951|COORDS: 18733811 18732375 tatttatctaacatttttat CTC a T a AC t C G TA T C g A tg A TG g CCT G tttcgaatttgtttagcgatgtgt ttgtcagtttaataagtacaagaaa CT c A t CTA agaat CTTT AT G t ATA catgacatgacatactatatat gat GCATTAT GG CCT A T GA G C T A G TA a T tcacata TA T A C A T AT A C AAGT c G G aagcttacca ttattatc ttct GG TC g A C G C ATT c A acatttttatcattcacgcgacaacttatgt G GTC a A TTC ttttcta TAT aa A TA aata CAC ga TAT gga T g CAA t TA acttaga TA CA T t T A CA cttccatgtttatatttacttttcttgta aaacgattaaagctggtatatgatatcaaaaataaaacaaaaaaaatgaaagtgata TA T A GA T A A G a C t T taaaga TAC TT a CA aatgagttataaagtctttt G gt TCTAT atattcaactc ACT TGCAA c TA T gagaga gtaaaccgacacaataacaacaacaacaacaacaaaaaaaaaaaaaaaaaaggccaacgttcag TGA t CTA g GC t A t GAA gattacaaaaccacatgtggcca score :258

PAGE 83

73 16465_at|16916_s_at||#T22P11.80#At5g02490 dnaK type molecular chaperone hsc70.1 like;dnaK type mole cular chaperone hsc70.1, Arabidopsis thaliana, PIR:S46302|COORDS: 552563 550294 aagctacgga TA t TTG a A a CAT at ACA aaacagtgaaatcagtaggactgtgaattttcttatcctcaaag cctttgtagtgtaacagacaaatcaaataaatagcttcgcttatac CT a TGCC caatcagtaaaaagaaaa gcctcacctcggcccacagc CA a TG A C A gcct ttagt A t G t CTTA ctggcccattactctactacgaaatc ttacg TG t C A AGT A a T aaata A a T c GACC ccacatatgaacgt TATA ga GC agttttccaattttcaacac gcgcttttcagtttttggacaatatcactttttcccttttcaaaatttggacaatatcagttttttcctct ttttttccaggaac A TA t T TGCC T A G G atttggaaaag G aa TCTAT ca CT t A g CTA ccgtcagatcaacct caaa cataatccaacggt TAT aa ATA attacagaacattctggaaactcaagtctca GC ct TATA aaaaac aagacttcaacgctctttcttcacaaacctaaaa AC CTA G C T ctattctttctctttgctgctcttaaact tt TTC t T a GC tttttctatcttcgtgtgataa score :162 19363_at||#F14N22.12#At2g42610 unknown protein;|COORDS: 17696547 176 97080 acaaatccacaacgttttttcacatcct TAT at ATA tgtg TGTG t G t A aatattgcttcgtccaacaagaa gagtattgtttgtagttttttaatcaatagaaaatgttatcatacaacccacaa TG A G C T A a CT aacacat t T cc CAAGT ataaaatccctttagaaaactatatatgatcttgagctgaag T c CAA a TA gtctctgactct ctgaaatcgatgtaa GCA T T G A atcttctaaccatta taaaga CAT at ACA gtatccaattttgaaatcgt ggtggagaactcaaacttggtgatctttcaacaaaaactatgtaaatctttttagccaatgaatttgatta ataaatactgatacgtaat TA G TTAA G AG agagagaggaagaaagaaaacaaaattttgaaagggcccata ac ATG t CCT ctctaacgtcaacatcatctgctactat TAC TT t CA ccctctcatcaaactcactcaacctt tcatctttc tttcttattacataattcttcagatccataaaacaaattgatcccttaaccttcttcaccaa catctctcttacactcaacaagagatcatatc score :90 14779_at||#F23F1.7#At2g30010 hypothetical protein;|COORDS: 12754375 12757768 aaattggtaattt CA T T G A C CTA G C T T t GAC tgagtatcacca CA G T G AC GT G gctttttcacaatta taa aaaaaatgatcatattctaga CA TG g AG G TA aagtaaa GC a A a GAA tct GCATT t TT gtattttctctaag gaacaattacaaaatcttgataccttgtgagat CAT a AAG aaggtcgttaactcgttatctatta TAT tt A TA taccttgaattc T t G A TA C G tttgttttacctttttttc TAT tg ATA atacgtttgaagtttgttttac tgttacacattgattttatatgtgtttcccaaaaccgttt aagtttcagagtccggagcagtaactcgtaa ttcact A a TAC T g G TA ttaaatgttgttatttatgggtaaagcataaatacttaaa T t CAA t TA ttataag aaaataaattaaa AC cc GACC ccatatttttcgtccttttaaaagatggagtcctcgaaaacattacgtat tagtaagttccatgcaaaggacaaacatataataacaacaacttctcttctttactcgtttctctctctct cttttcttttct ttt CT t C a AGC gcttcttta score :150 20698_at||#T3G21.10#At2g40330 unknown protein;|COORDS: 16794247 16793720 aaaac GTC c AT t C taattat A g T AATGC ttgtg T g G t TACG acaacatgctcattcattgtaataaactgg tgaaa TATA ta GC ac CA TT GA C A aatagacagcctcagtaacaaa GC a A g GAA aaacaaaattaaaattta ata accggttatattaaatatc G a G a ACCA gtccgtagaatttcatgggaaaccgg G c CG GTCC ttgatat aaagaaagagagagacgtttcttaactgaagcagtacacaaactccaaacaaa TCC C A AG T G G acaaccga agaacccacaca A a CTA a CT ctctttcctaatccac A TA CT T G CA TT t T T AT at ATA aacactctgtcctt atatagttctgtatat TA C AT G TA aatatctctcattaatacaacc tcacgaagaaaaccatttgttttct tagagaga GC c A a GAA tattaaaagagatatagagaaaagatttgcttt AAT AA TGC C a A c G TC g AT a C ag tttcagagatcctccaccgccgcagaagcagccaacgccaccgtaagaaactatccccaccaccatcagaa ac AGGT t C a A aaagtgagcctcacgcgcggga score :216 15531_i_at|15532_r_at||#T22A6.170#At4g24340 putative protein;storage protein Populus deltoides, PIR2:S31580|COORDS: 11571982 11573646 atgtagagagattttagaaaaattgttaaatgtcaaatatttaattcgaagattttttttctaattattaa aaacaaatattaaaat CC A G TG GCA accattgtaaataagt CT c A a CTA cagaatttatttcacaaaatgg

PAGE 84

74 ctgcaaaaatg TAT at A TA gatgtttaaagttaatttaatagccaaattaatatcttctttc TAT cc ATA a aatcataccctaaggtttactgagtttattaggatgtatcaattaaagcaaaacatcccttctaattaatc attttgagtgactgattaataagg AGG a CAT gttaa TA ta TGTA gaa GCATT AA aa C A c G A TGG tct T A AG G C A T T AA G ctattaagaatttttaaaaagtttatgcatggtctca CA c TGAC gaataaaa tacgttatcat ctttgatcacgaataaaaaatattaatgaaacgagaatcatgtgtaaagaagaaagttcatactagtttcg gctcatgtgagccttgtagaggtt T G a CT T G G A t G CTA T A T A TATGTA gaca TG AG CTAGGCA TTT A T ctc aaaccaaccaaccaatatatca GA t GGCT tcc score :210 18396_at||#T31E10.2#At2g34640 unknown protein;|COORDS: 14532888 14530604 atcacatattgattt TA t T TG a A A GTA TAT ca ATA taataaactgtttagaagatgtt TTGCC a A aaaaga aaaagatgttttgtt C T t CT T GG A a G tgtttgttaaagcccaagagaga CT t C a AGC tcacaatttctcaa agatccatt A a TACT g G gcctt AG t TAG a T ata CCAAG c C catatacttt CTC t T c AC aaattcccgttct aataccggcgcgtcgtgaaatct agagattttgccgtcaccagaatatgttaataaacgcaggaggagata aatggataaacccaaaaac TG a CAAGT gaaaaccttttgaagccaaaacccccttcaaaatcaagctccat gatagaaaa CCCTC t T cctacataatctcctggtaaaacaaaaaggttagtctttttctcctcaacttcga gtgtgaagcgtttgt AA a AATGC ccaaaaaaa T c G t ACCT tgttatggacaatttgttgtatcagg ttctt cttatctgggttttatgttttgtactagtttgaaactgattctgagca TA a TTG a A tcggttgtttt TGT a a ATGATA ca GTG agaagtagaaagagtgtaaa score :132 15137_s_at||#F16B22.35#At2g44790 phytocyanin;identical to GB:U90428|COORDS: 18411773 18410723 tcaaatactctgttt TAA a G GCA TT AA aa ATA ac TG C gtttcagaaaaatattgaaattttagctgatctt ttgctacaaatttaaggaat CTTG G C A CCT gca G aa TCTAT aacatgtt CAT t A A GT AATGC aa T agtta T A CA A t TA tacat TA t TTG c A tcatacttatattatagtgatattaacaaacccatgttctcagcacacttt tacgtagaaaaacataaaaacccaaataggaagaa GCCA ct CA taaggataatgggtttatataattcaca GC a A a GAA A GCC a TC g A actattcgattaattatccattctttttttttttagtt TG a A t GTA taagaaca aagag T t G t TACG catca TG a CAA T G t CTT AG aaaacaaaagaaatgaataaaaaagtaaaacgaaaaata aaaagtgaggatgaagttgttgaatgagttggcgaggcggcgactttttcatacattccatttacttaatt cctaaagtcc T t C t CACA tctctttgttatataatgacaccataaccattt ctt CTC t T c AC aatctttac aagaatatctctcttctacagtaaacaaaaaa score :156 12777_i_at|12778_r_at||#F15I1.8#At1g54000 myrosinase associated protein, putative;similar to myrosinase associated protein GI:1769967 from (Brassica napus)|COORDS: 19429049 19427232 agtg ttac CAT tt A CA C cc TAT atttccgtgaaatttgtatgtttaattctacttct TGCC tc T AAATGC a cgatcttttcagcacttcgaagaggttaaagtgacttttacccgatgagaagggaat TG tg T GGC A ga TA g aatt GCTAGG gaatctctttcttttctgaattatgatcccaaattgtattc TATA tt GC cggattgggtta aagct TA tg TGTA ttcagactcttgtg TA a T TG T A agttgttgttga accaaaaaaaaaaaagcacgatct ttct TGG T A C t CGTA gtaacaatagatattttgttgagaattaaaaataaagtagaagagaaa C c A G T A A T GC gtcagacgtgattaagcgaactatataatataattgatcgataatcgtttgctagtgcgttaaa TA gg T GTA tagtaaaatgaacctcgcgcta CAC gt TAT TA t T TG T A tagtaaaa TGA a CTA ttatt T T C AT T G C C A A attgt A g T A CT T G CA A TA T A a ATA taacgaatc G g T C TA C GAG C t AAG g CA caaaagcaatcttcttctc ttttcacagttctgtttctccctctctctaaa score :216 15162_at||#F7O18.21#At3g04720 hevein like protein precursor (PR 4);identical to hevein like protein precursor GB:P43082 (Arabidopsis thaliana), similar to w ound induced protein (WIN2) precursor GB:P09762 (Solanum tuberosum); Pfam HMM hit: chitin_binding proteins|COORDS: 1286539 1285699 tcat TA tc TGTA t C gc GATGG ggaagatacgcaatgcttagttataaaccgtccgatcaga TG a CTT t G ac CA g TGAC cttccgttcaaagaatatcgacggttcttagacacatttttcacc ggagtctcaccctca C AGG t CA T cttaat T t C t CGTA aaagagcatattacattgatattccaaattataccacttcataatttattttt

PAGE 85

75 attt T c CAA a TA aaattaatttattaaaacttcagcactccaattctgtagagaaattttaagaggagaaa cccaattttcttttattttgtaacgaatctcttttaaaaaaatctaacaacagaattatatattttcatta ttctatcgaatccaacgaaatttc aata TACA ga TA T ct ATA T A CA A a TA tttaa CC ta CA TG TG t G t A tc atatggtttaag C AC GT T AC c T G TA AC gtatgtaattaaatcaattgcgtagaatcacaagaaaaacatta ttatataattaactattgatcctataattttattattttca TAT at ATA tagata TATAG A TA G A TG C A T T AGACC accaagaaaaca A a G a C T TA TC g A tca score :180 19424_at||#F3I6.22#At1 g24280 glucose 6 phosphate 1 dehydrogenase;strongly similar to GB:S71245, location of EST gb Z35060 and gb|T04591|COORDS: 8609492 8612380 ttgtaatggaaaaatatttagcaaatagattataaacttacatga G a CAAG T AT aa ATA attattataaac ttattaagtttaagatcaaggcttttgtgcaatgtatcaatg aatgttagatgtgatatgatgaaagcaat gttttaaacacatacata GTCA t TG atcggaatgtgtgttatt A ga AATGC a T GCC T A a G ccgatagggtt atctatgtttg G t CTTGG acattatagccaaatttcgaatctaattcttccaatatatatttttttttttt tgcttag GGC c AC TAC TA G TA t T g C t TATC a A ttttaagagctcat GA a AATGC aacaatatag TA g TTG c A aatccttgtttca agagaaatcaaaggg CC ACT T G t GA attgaataataata A TA t T T GCA A a TA acctt tcactaaaccataccaacaaaaccacacagat T t G G C A AA G A C A taacctttgggagacgtgaaaaggctc aaaatttgacaattgtccttacaaat T c GCTCA ttagtgcaattgtgagatttgtttgcatccaaatccaa ttcataactcacactcgtctcaaattcgaaaa score :168 16614_s_at|1 9456_i_at||#F3I6.19#At1g24260 putative floral homeotic protein, AGL9;strongly similar to GB:O22456, MADS box protein, Location of EST gb H37053|COORDS: 8595859 8593787 ggttttttgtaggatt T t CAA t TA ttaatctctataattcga TGA a CTA agtaaaaaagcatcaaact TT C TTG GC A gaatca catttttct CT a A a CTA aatatggactgaaattgaaaaattaaa CCA CTA G CT AG a ATA aa GTG ttggtgagagtggaactctaatttctctcctttactaat TAT gt ATA aacac AA a AATGC accaaa tttttaggtttgaaaatatctaagca T g GATA g G gtaattaacattttttctttcaattttgcaa TA t TTG a A taaatcct A t GAGGG tctt TGGT a C a C aa TA a TTG g A gggtata TAG t T g AG t c TG ag AGTA tattaga aagagaata TT t C A A G T A a T gaagctgacatgtt TA ta TGTA ctttgagagaagtgttgtgagatttgtac aaa TG T A t A TG TA cactttaaaaa GC aa TATAAG a TAG a T aaaaaaaatataaagaaaaaaagaaagaaag aaagaaagaaagagagaggctca TAT at ATA tagaattgctt GC a A g GAA agagagagagagagattgaga tatcttttgggagaggagaaagaaaaa gaaaa score :174 12523_at||#F10D13.18#At1g69530 expansin (At EXP1);identical to expansin (At EXP1) (Arabidopsis thaliana) GI:1041702|COORDS: 25353446 25354463 aatatattatcccaaggtaaagttgg TGT tt ATG tgtgt T t GA a GGC gcctgaaaaaa ATAGA aa C cacac gactcgtct TG t A t G T AG C T CA catg T G T C T T T G C C tctttgctttttca TGCC at TA aatattaaatctt cagcggtttt TGGT c C c C aacaataaaaatatattttcttactatattcta TAG t TCA ttaatattacatt gaataaacctaatgttttttaatatgaaaaaccagaaaaataaa A tt GCATT cggagtttt GGTC tt GT tt tgaaaagaaaagaaaaaagtgga AG C TA G T T GC A T T A AC A ttaaa CA g TG g CA aaaacgtaaa tcattcac tcactaagtttttctataaat TGA a CTA ctc CCCTC c T ctgctttttccaattctaaaccaaacaacagat tctcataatcatctcttcttttttc CTC t T t AC gaaaagaagaaagatcaaac CT T C C A A GT A a T catttt ctttctctctctcacacacacacattcactagttttagcttcacaaaatg TG A t CTA a CT tcatttaccta tatgcaggtttacacaaaaagaaaaaagaacg sc ore :186 13538_at||#F21C20.130#At4g20780 calcium binding protein like;calcium binding protein, Solanum tuberosum, gb:L02830|COORDS: 10098382 10097807 tcgattaaaaattcggtaactactacgac GT g AC T TG t C tcttctccgttgaaa A GC C A T C G A C A gttatg tatgagtgtatcatttat C a ACTTG aa A gcaatttcagtttttaaaataa T c CAA a TA ctaataaaccaga aattaacccatcatc A aa GCATT AAT aatcttcattaccatttgaaaccgaaacttgtatcacaattaaat aacgatcataaaattagacgtttttttttctctatctccatttt CC c GGG g G aaaaaagatcggatttttg

PAGE 86

76 ttctaa T G T CA t TG t CA aaatattattcgacagatcacaaaaaaggtgaccgtttttttttttttttttt t T t GGCAA aaaaaaaggtgacctttttattt TG a CA t TG at AGTA tgaattatggggtctgat GCT a G a AG t tctta CATG gc GGGCCA ga C AAA GTCAATGC cggcaacaacggactcc TACA ca TA aattttttaattcac caattgtttctcttctataatt A G CCC TC a T c AC tcaatttctcaatctcagattataattccacaaacga acaagaagaagaagaagaacacaagataacaa score :22 2 16522_at||#MGI19.6#At5g63850 amino acid transporter AAP4 (pir |S51169);|COORDS: 24845320 24847200 catgggattaatggatgctaaaagaataaaa CACT c G a G aagaagag T A C AC G TA aaatcactcacacaca aagaatgagagatgggagaagaatactacaaagtgtttgttttttttaaatcggctcatatataaatttgg gcgatact ctac T a C c CAC A t TAT tcttttaaacaatgatataaa TG T G t A TG G C CT A t G tagccgaactt tttacatatc TG a CCTGTAT at ATA taataattatttatcaaagaaattagagaaagattcttccactt T t G A t G G C T TA taaaagatgatgacccaaaacactc A a AATGC agcagcaactgctctctcaccacccatctc ctcatttctctttt GCATT tc T ttctttttttctgc A tt GCATT c TT ttga ggggtttaattttctgcata gctttgtctaatctcttagagctcaataagaga AGGT a C t A taactgatctct CCCTCTT t CAAGT ttttt ttttgtttggtttatagaaaactataaccgccattatccgttagttttaaccgttttttgaaactagtgaa tc CTC a T a AC tttggtttctcactaaaaacag score :144 15859_at||#T17D12.13#At2g28570 unknown protein ;|COORDS: 12190606 12190842 acacacacacacg TAT tc ATA agcttcttagaacac C t C c AGTG G tgtttctt A aa GCATT c TT gaatatt catgacaaaagtacaaggtattataaaa TAT ta ATA aaatata A TA tg TGC C cc TA gaaac CA TT GACCC A c C A T G CATT c T gaataaacaccgaaccttaccacatttc TGAGC a A tttttccacagaagaccaattccta atcatgtttaata attaattaatttggattttgg A g CTA t CT ttactct TT C T T T GC C T A TA aattaaact ccatctctcttcttatccttctacttttctttcctcttctcagcaaaaaaacatttttagtaacatttgtt ctaataagaaatttgtctcttgctttaacaagaagcaagactagcct TA tc TGTA cctttctttctgaatt aaagaaccctttaagattcatatgattttcttaattctttttta TA ta TGTA gatt tgtttcttaaa CAC t a TAT ttatgttttacaataattgttttatatgttttcaacttctttgtagatttgcttct T a C a CACA cta atgttttatctttgcagagaa ACTTG a CA ttt score :168 17963_at||#T1P17.60#At4g12470 pEARLI 1 like protein;Arabidopsis thaliana pEARLI 1 mRNA, PID:g871780|COORDS: 6366337 6365 852 taatttat GCATT AA agaatcttaattaaatt CAT aa ACA t TACA ta TA tattaatcatatatta TAT ac A TA tcacaaattttcgagtaagtttctaaatactttgggtgtgtctccatggccgggctcagcttgattata actacactaggaatttattat A t T A CT c G ACC tgacgt A TAG AA G C C G T C A A GT aaaaga GC t A c GAA cca agataaatctgaatatcttgtgcatccacgaact C A T G A T GG tgttgataata A ca GCATTGAC A ACTTG a CA ttaatacttat AT g AA TGC A C G TA T at ATA aagtattttcttaattttaaaatggtttacaaaaaacaa aa T c CAA a TA atttca AG G TT C t ACG T A C C T A G CT AC a T g CA tataggaaaaatggtcagaatt TAT at AT ATAT at ATATAT at A T ACA A a TA at T A CA A a TA T ct ATA aataaattttaaactaaat C AA G TT G G T C C t C ttcccatcctt ctaaaatcc TAT aa ATA ccaacatcttctctt C a TATC t A tttattcaataacccttaca acaccgaatataactttgaaaaaaaaaacaaa score :360 13587_at||#F21M11.2#At1g04040 unknown protein;Similar to acid phosphatase; Location of ESTs 110C2T7 gb T42036, and 110C2XP, gb|AI100245|COORDS: 10 43820 1042565 tcatgatgtatccatttcaaatcctaaaaattaatttccaacttcatttgt TA a TTG c A atgtgtatacta tttg TGCC ga TA gtcaaatacatga TA CA T c T A CA attc TAC a T a CA ttatgtaattataca C t T t G GCA A a TA T A tc GC cactaatcgaggaaaacaa T t GCTCA taccatcatactttttttatttttt T t GATA a G ttc atacaatcata CT T CT T GGCAAG T T G G CA gt TA attatctttaatttatccaaatcttttagtttcacccg catattaagttaaactttcataattttgcctcccataaaatgttggaa AG C TAG a T C A TA tc GTG ca TG a A t GTA tagtaaacaac TAT at ATA tttttaaaaatactataatgtataattacaaccaaattattattttat ttatgaaaaaaataatgaaacaccaaaaactttgattcagatttgtgtattaattatttttctttattgt t

PAGE 87

77 agggctgccagtttgactcttctacttataaaagc CTC t TTA C a CACA aaca CAT ac A C A CT T G t GA tcaa aatctcaagatccaaaactcttctttcaaaca score :204 16818_at||#F18E5.30#At4g21410 serine threonine kinase like protein;|COORDS: 10369523 10366961 atttctttgat G CAT T A ACA aa GCATT c A gatttgagg tctcattcagcgagtaagat C CA AG a C A T a AAG atcttccacaatccgcacctcataagtataaa T t CAA a TA cagc A g GAGGG atgtgaaaaaaaaaaggagt ct CT a GG A T G t CCT taagtagtttatttcccattaatggaagattagttttagttt C t AGT t GG ttaaagt ttcagactttgagttcttattggtcctggagagaaaagaagtgagaaatataaacttattttaaatgtttt C tt GTAAC tt ataacttcttat T t C a CACA c CGTT G AC C TG a TTC g T t GC accaactcctaaatttttata ttgaaatttttatatatcat CATG ca GG acccaacaagcttacaat G g C t GT CCACT T G t G tctgtctgtt actgaaatttataattgtttttgatcag C a ACTTG tgttctttcggctg TG A GCT A G t T taaagaacacac ggcttttttttaa GGTC a A a T taggtcgacatttaccatataaaga CTC a T t A C a CACT g G t G ggtttctc a TAG a T t AG agaaaaaaag CT t C a AGC tttca score :210 18298_at||#T25K17.10#At4g26200 1 aminocyclopropane 1 carboxylate synthase like protein;ACC synthase Malus domestica, U73816|COORDS: 12239804 12241443 taacaatattgtattagtgaaggattattaattac gagactattatctaa A ac GCATT A g T gacccttttg gatttaaatcg TAG ca GCC gactaaaccaaaatatttagaaagttttccaaaaacaaacaacattaaaaca catttaaactcgtacacaata ATAGA aa C T a A a CTA tttttagcagactaaagtctaaacaacattacgtt taaaaattgataatcagtgttttatagaactggcgatctcctaaagatgaacca CC g AG a AG tttcaagat aaaaaca aaaataaaaata GCCA aa CA ct C TT A T C T A ttcatccatttg CAG G T C A A TG A C A C cg TAT ttt tccacaatcgaatcttccttattctttacacaaacttcttctc TG t AT G TAAGTA aactaatattaaaatt ga ACTTG t CA aagacttt C ct CTT GG TC ac GT ct GCTA T AT A TA ctcctt CA T t CA CA CA aatgtttaact ccaacacacttctctttctctcaacctctctccctctctctctccgattg tgtttttagagct GT c AC GT G tcacgtgtc AGGT t C g A agctctaagaaaaaa score :174 16298_at||#T8O5.60#At4g21850 putative protein;CGI 131 protein, Homo sapiens, AF151889|COORDS: 10556637 10555097 catatataatttgaaaaacatgtttttgtttagtgtctaagt TAT a T ATA ct GC tatactcaaattcattc at tcaaaacgta TGA a CTA aagttttaatgt CAC at TAT attttctactagaatataatgtcaaaggaaag gaaaa TAAG g C t T tagtcaacaaaa GGC tt C TAC A T GT AA C G A G acaaatatgaaataagagaagagaaaa agcg TGCC cc TATAT tt ATA aaaatattacacaacctgacaaccatcaaag T c C a CA CA GG c CA ca G c CTT GG taaggttttttttttgtcacctttctataat ATA ca TGC gtca attagatcta G a TC t ACG tgtgagtt g TA a T TG T A ac CAT at ACA atcatttttttattattatggtttcttaggtgttaggaagattcatattgtc acatttcagacgaaatagaataaaatatgat T GT C t ATG C gtatttaaacccccctttgtagttcttctac tcactcacaaaattaattcttcttctcttttcttttcttaaaagagctcaaaac AC ac GACC atctttccg ctgtc TTC t T a GC tt C T T A T C T A ccgtaatca score :192 17485_s_at||#dl4170c#At4g16260 beta 1,3 glucanase class I precursor;|COORDS: 8165944 8164797 agtt CAT a AAG tcaaaaaatcagtaaccaattcatgagtggaattatacttgatt CA a TG A C A c TG tttag gag C TT A T C T A aattgcacaagtttt GT T A C AC GTA aaacacaaatatgccaaattaga catgtttctcgt cattctaaaa CAT ta ACA aatattta C AT A GA C g C agtcaaataaatcaaatttcgtatgataaaaaaaaa acaaaaaaaatagcaaggttgatt T GTCAATGC tggcaaaaataagca ATG t CCT aatt T ct C AA G T T G G a aagtgacatttctctacccattttttgaaaagatattctcttttgcatataagatttgataaattgaacgc A TA t T TG CA ctcaagctaacgttatgtactt g A t T A CT T G A C C cgacta A t G a CTTA t TG a CTT t G attaa taaacaaagatattagccactccattctacattatatacaatctatac G CATC TAT at ATA tctactaaaa aaat TG t AA G TA C a T a C ATATC a A tgttaatttgtatataagga GC t A a GAA caaacccaat T A GGC A AC T A gcaattgctaaaacacgtaagatctcaaata score :288

PAGE 88

78 12312_at|12313_g_at||#F22K18. 20#At4g24780 putative pectate lyase;pectate lyase, Musa acuminata, PATX:E209876|COORDS: 11736719 11735129 gttc TA ca TGCC caatcaagaaaaagtcaaacttttc T C A A TGC cgactttactgttctaaaatgtatgaa aatacacattaata T A CA A t TA T aa A TA C a T G C A A a TA T atattgttataaactatataatg GT T AC TT G A T A t G gatccttata CT t A a CTA a CT taacaccaa C t CCC t G G a TCTAT aatgtatatattgggctaaa C A G G T C A TGT ggaagt TA t TTG a A aagta CACC a TAT C g A cctaattagattggaggctttttatttagttttt tgtcaacaattagatttgagtaacggatgtaaccatattatttgttccta CC T CC T A GGCA TT ATT atagt ataaggtttttttttaactaaagatgatttaaaaaaaaaaaag T g G g ACCT aaaccttaaaaattccaata acaaatagtatagaggtcctttttctctttatataaacccccaaaatttcctcttctctgtattctccact caatacttatttcctctgtcttccttcgctt CCCTC t T ctctcaccacaatccgagagagagacagtactg taatccaagtgagagagagagagaaatgaaaa score :216 20239_g_at|20238_at||#MMM17.26#At3g13790 beta fructofuranosidase 1;identical to GB:S37212 from (Arabidopsis thaliana)|COORDS: 4535836 4533089 gagcaagtaaaacatattct T A CA A a TA ac T ta CA A GTT AC A T G TA taacaacactcttatctttatcatg taaacaaaaaaaaacgctat TA a T TG T A aatatttcatctgtactttgtaaatttgacatgacaaaaaatt tagatacttct aaaaatgtacaactttttagccgttagaaagaaaaggtgtacaacttttcaatacgaatt cttatattaacttttgttccttcagttt TA T AC A TATA GT C A A TG A C A t TG aaaaaccagcacaaactcct tcagttgatcacca ATAGA aa C aacaaaatacagtataatatttctaatttttagtgaaaatttacga ATA tt GTG gaaacaacttttatagatgtctgagtctgactta TG tc AGTA ttatttt ttcttcacttata GC ag TATA acattcattatttcac ATT AATGC caaaaaaaaattaaaaactttattatcaaatataatattcaaa tctcgatattgacaggttt G CCCTC g T c CTT TGCC T A TA aatttgaca T G AT CT A C GT t CA ttcaaacaaa aa C CA AG c C A caaagaaattaaa T a C a CACA a score :228 12879_at||#F12G12.27#At1g33960 AIG1;identical to AIG1 (exhibits RPS2 and avrRpt2 dependent induction early after infection with Pseudomonas) GB:U40856 (Arabidopsis thaliana) (Plant Cell 8 (2), 241 249 (1996))|COORDS: 12346670 12348299 ttgtttg TAC TT t CA tattctgattatccaaattttctgtttcttaatttaattcctctttattagt ttca ttcatttttctttttgtttcttagtatttctt C t AGTGG tcaccaatctaaaacttca T AT A AT GC ag T ct aaaaaaggatcct TA ta TGTA agactaattatgtatttatttattattatatagatttta TAT tt ATA tt T A t TTG a A gaaaaaaggactttcaattgttctatttaa ACTTG aa A a CTA G CT T g ATG atgaaccacttc TG CC T A G C atct GT t A a GAG ctaccagaaaaatcaagttcg tcattttcacaccttttttttttttttttaaa tttctcatcgatcaatttgtatggtttacgat TA cg T GCC C TC t A agctttctctgtatttcatgtgattt taattgacaacttacaagctttaagtttccagttttt CTT c ATG tgattatattttccttttctcaaacat CGTA a C c A T A C AC G TA tagctttttcaacgtttttatgt TGA t CTA ttagctaataattaacattt TACG t G a A aaatgaag ggtttaaggaaatttcagcaa score :168 16342_at||#T1F9.13#At1g61380 hypothetical protein;similar to putative serine threonine kinase GI:4585880 from (Arabidopsis thaliana)|COORDS: 21860819 21857695 aagtttgcaacttttgg CT a C a A GC A ca TAT ttttctgagtcgtc GTCA c TG tctctg tctcttcaaagtc ccacga TA g TTG a A gtttactttgaatc T g GCTCA tctgaattat GCA T T G A atgtgttcgatgaaa TGCC TA aacgagaatttaaattggttttatcattcttttgttttttgctggtcttctaaggttttggatcttcac tttg C a ACTTG ca A aacgtttgtcggtaatttccagttttgtgttgtccttctcg AA g AATGC aaaagctc aa CT T C T TGG C TT a G tagaaacgagaaagt tgacttat G a G t ACCACAC gt TAT tttctgtggacagtaag gttgggagattctttgtcaaggtcgtctattattttattatttttgtttctctctttgctggtcaataggg tattgttt A g T A CTT G T A A tattcaaagaagtagt G t G a ACCA agatctagatatttaattatctcaggat acgttaacataaaagtttttttttctcttggtc T C A A TGC t CAC tc TAT ccc CAT g AAG tt T t GCTCA tac tc aagttatcttgcagagaaatagaaaaaaga score :174

PAGE 89

79 19720_at||#T22J18.14#At1g22690 putative gibberellin regulated protein;contains similarity to gibberellin regulated protein 2 precursor (GAST1) homolog gb U11765 from A. thaliana|COORDS: 8027326 8027959 aagaaa cctatgaaaacaccaat A ca AATGC ga T attgttttca G t TC g ACG tttcatgtttgttagaaaa tttctaatgacgtttgtataaaatagacaattaaac GCCA aa CA ctacatctgtgttttcgaacaatattg cgtctgcgtttccttc A t CTA t CT ctct CA g TG t CA caatgt C T G A a CTA agagacagctgtaaactat CA TT AAG a C a T aaactaccaaagtatcaagctaatgtaaaaat TACT ct CA tt T c C A C GT AA C a A attg AG t T AG c T taagatattagtgaaactaggtttgaattttcttcttcttcttccatgcatcctccgaaaaaa G g G a ACCA atcaaaactgtttg C a TATC a A actccaacactttaca G ca AATGC aa T ctataatctgtgatttat ccaataaaaacctgtgattta TG tt TGGC tcca GC g A t GAA a GTC t ATGC atgtgatctctat CC A a C A TG agtaattgttcagaaaataaa aagtagctgaaat GTA T CT AT A taaagaat CAT cc A C AAGTA c T att T t C a CACA ctacttcaaaatcactactcaagaaat score :204 19450_at||#F17M19.3#At1g71880 sucrose transport protein SUC1;identical to GB:S38197 from (Arabidopsis thaliana)|COORDS: 26265752 26267518 ccaaacaaaataattgtt ctccgataatcagatgtgacttatgtgatatag TACATA T A a ATA tacacact gtttaaatttgtctacccaatcggaatc ATAGA aa C tttattgtttacccaaacaaaacggtactgaatat cggaactttttttattaaaaaaaaactgtgagagagaaattgaatcaa C g T CC A A G T C A C T T GATG caaga aaaaagcgaaaccaattaaa T t C c CGTA aaaacagaacacaaaagaacaggagagttaatg ttctaactga cacgtgtccctacctt GCCA ta CA ctcacacaattaaaatttctaactctgtctcttatccgaaaaataat catctccaagtgtaataagaaaatcaaaataaaactctcatttcttcttc TTC c T c GC c TAT aa ATA caac tccattctctcatctcctacatcacaaaacaaaaacctcacttaaaaaaaaaaaacagaagacagaaaaaa acaaaaaaaaaaagaaaaagaaaaataaacaaa ttttcttttcttttttttcctctaaa G tt TCTAT ttt G TC t AT t C gtgtttttttttttacttcctgata score :96 15985_at||#MHJ24.8#At5g64100 peroxidase ATP3a (emb CAA67340.1);|COORDS: 24945888 24944650 cattttagtttttacaaaaaaaaaatcacattttagtgtttcttgaatagatttgatcactttt CTT a ATG tatcgttc atttttt T c CAA t TA atcaactgcaattttggccgatttagctacatgtcgaaaaacactaat cgatttttcttccagagttttaagatgctttacgtttataccatt CAC ct TAT attagttttcattttgtt ttcgaccaaaactattgatccaaattattat A G C TA G G T A C GA tttgaacaaactttgattctcttaccaa t CAT ct ACA acttaaaacca AG a TAG c TT c GCTCA aaaagaaaaaaataat gacaacatgtataa TACA ta TA tgttc A a T G GAC C G a C TC A TCT A CA ctgttattcaaactaattattttataaatgattactaaatcagc ttattaaattcccataatttctgcgtcgtg TGCC g A a G ttg C T C G T t AC aattgtta T t C c CACA actttt t T TGCC T A T A t ATA caaaccctttaacatcaaactcaaaacacacaacaaacacaacttctacaagactca aata G tt TCTAT ttaattactaa aaagaaaaa score :156 14039_at||#F3P11.19#At2g19590 1 aminocyclopropane 1 carboxylate oxidase;|COORDS: 8425903 8424788 aacaccacg C a TATC c A aatttggtgacaccagaataacacatt T T A C A AG T A cgc CA a TGAC gcttagat ggg TAG t T t AG tgaagttttaaagaaaaaaaaaatcagtgaaaagaaaaacataatacaca attaga C A T G A T GG CT A GG tccatcgaatatgagtagggataagaaaaaaaaaaaaaaaaaaaaaaaactttccttgtgaa gaagaaagaaacttcacataaatgat GTTAC aa G gaaacacaatgggagtgacccaaaac CAT t AAG aaat tatgaaaaattcaaaa ACTTG tg A tcacatggtt G t AT g GAC tgaacttggacttttcttcaatcctattg ttcattttacatttccaaagcagattcttatac aattttttattcattaaacgtaatgatgcttcaaattc aatgagttaacagaatagtgttgaaagaaactattgtttttg C c T C C A AGT A GGC A A C T A T GC C TC c T a AC acacaagagaaacagagatttaattttcactta TGCCTAT A a ATA ctctttctag A tt GCATT c A acaaat ctgtg TGTG a G a A aaagtaaataaaaaagaga score :216

PAGE 90

80 APPENDIX F ACTIVATED ABA REGULA TED GENES 15611_s_at|17406_i_at||#K24M7.4#At5g52310 low temperature induced protein 78 (sp Q06738);|COORDS: 20534755 20537152 tatttcatctacttcttttatcttc TA c C a GTA gaggaataaacaatatttagctccttt GT aa ATAC aaa ttaattttcgttcttgacatcat tcaattttaattttacgtataaaataaaagatcatacctattagaacg attaaggagaaatacaattcgaatgagaaggatgtg CCGT t TG ttataataaacag C c ACACGAC G T A a A C GT aaaatg AC CAC A TG at GGG ccaatagacatg G a CCGAC tactaataatagtaagttacattttaggatg gaataaatatcata CCGAC a T cagtttgaaagaaaagggaaaaaaagaaaaaataaataaaagata tacta CCG AC ATG a GT tccaaaaagcaaaaaaaaagatcaa GC CG AC A C a G ACACG c GT AG agagcaaaatgactt tgacgtcacaccacgaaaacagacgcttc A T ACGTGT C C ctttatctctctcagtctctctataaacttag tga G a CCC t CC tctgttttactcacaaatatgcaaactagaaaacaatcatcaggaataaagggtttgatt acttctattggaaagaaaaaaatctttggaaa score :228 15695_s_at||#T27K22.8#At2g18050 histone H1;|COORDS: 7794644 7795219 taa ACA at TGT tttatgtatatattttttttttgatagggttaaggatttttttctatttttgtttttaaa tgtaataaaatttga A A C AC a T G T aaatatcgtatta GT aa ATAC cgaccaaaaaaaatattgtattagta aatttgacacatatcgcaatttttgtgagctaac aattttaaaaatcaaataagatgacgaacaaagctct ggtttaaactttctcccatcaattttttcattaaaccaaatttaaccaattatttggcctaataactgc G t c TACGT tattaagaataagaacttattttgtgtttcagtagaaaac ACAC tc GT tcacaaaatgcctagta agagtaaaggacgatcaccgcca CCA a GTG tgttt C t CGG a TAA a CAC a TG G aa T c C a GCCA ttacttaa A C GACAC GTGT A CG C tcatgatttattaat G C A C ACGT A at C gatcctctgacaaaaaccataacgaataca gaa A ac A CACG a A T acacttccctgcgctataaataagctagcacgaaaaaatttaacagatagagacaag acaagcaaagcaacactttcactaatcctcta score :348 18969_g_at|18968_at||#MUA2.12#At5g57550 endoxyloglucan transferase (gb A AD45127.1);|COORDS: 22600210 22598881 cgaaatcaaaccggaattttgcatgtaatttgattggtgt CGTTAC tttaaatctttaatccacaaaacaa aatttactcgatttta GTAT ta AC cgaaccaattatagtttattgaaatttaattttaattctatcaaatt gcatatgtattcttgagttattttttataaaaatactgaaaccaactaaaataatagagtt TGGC g G a A ct acc gtaccaaatttgattgtatttggagtatcatttttgcaaacctaattagcctgaagactgagatatcc ttgtccactcttatgaagaaccaatttaacaaggtgaaaaccagaatctctaaaccaaa C a TGGCA tc A ac tgaaccggatcagg CAG a C t TA aaccaaaacaaagaaca A G C AC ACGT a G C A tgaggcaaaatta A g CAC a TG cttgctttacttcaaaacaaaaaccagctgttcacagctaaaac tac ACA a G a GT ca CA a ACGG cgaac tatactacaaaaagactaagacttgcctcccttatataaaaccccccaacacataaggtcccaatgaatga tttcaattctctatcttattgacacataaaca score :102 19177_at||#MQJ16.4#At5g22500 male sterility 2 like protein (emb CAA68191.1);|COORDS: 7369180 7372555 gtttactga attctatagctcttaccttgcacgactatgtcccaaggagaggaagtaccttaactataatt ctgaacataattttgtctatcttggtgagtattatatgacctaaaccctttaataagaaaaagtataatac tggc GTAACG taataaattaacacaatcataagttgttgacaagcaaaaaaacatacataatttgtttaat gagatatattagttatagttcttatgtcaaagtacaattatgcctaccaaaa ttaattaatgatttcaaca ggaagtctgag ATG at G G G C CG ACGTGT A G T t ACGT t TC ttgaattgtgagagatggtatttattatactg aagaaaacattatttactaaataaattttcatttcacatcttctgtaatcaatgcgggtagatgaagaagt

PAGE 91

81 t GT ta ATAC gatggccaaccatatggatctcttttttggcgtttctatatatagtaacctcgactccaaa G GCA tt AC GTG a C tcaataaaatca agtcttttgtttccttttatccaaaaaaaaaaaaaagtcttgtgttt ctcttaggttggttgagaatcatttcatttca score :102 17929_s_at||#MHM17.19#At5g57050 protein phosphatase 2C ABI2 (PP2C) (sp O04719);|COORDS: 22381546 22383129 ctaattactttgttgttcactaaaacatctcacattgtgctattttttttaat aaataaagacctctctct ttaaaactgattatcccctaaactatttgtttgtgagaaagaaagagaaaaaaaagtttttttggttggaa attaattgagggagagagagagttcagaatctctaaa TGGC t G a A acgaattgcccagaatccaggaaact ctgattttagtccttt TT t CC t CG g G a A a A CGT t T G C tttttcttttttgtgtgtgtttggttctttaatt gacaaatctctctctatctgttt CA t GTG c T agctaaggttgtcttcactttcctctaatgagtcatgagt ttcttgtcttcatcttcatctaataaagttaacaacactctcatcaatttttattatgttcaaatctgtct gacttttcgttcttttctctta C a CCCA a G cgttttaacaatcta ACA tc TGT ctctttgaatacttagat atccaacttcaatttctctcctttctcttcccaactttgattcctgatttgggtttttgttaaagttc aag aaagttcttttttctttttttttcctccttta score :60 12994_s_at|13004_at||#T19K4.110#At4g35980 putative protein;physical impedance induced protein, Zea mays, gb:AF001635|COORDS: 15998351 15995305 agaaaaaattttcatcatcataattgattataccttttaatcattttttttgccttttatg ttttctaata gaaaaaaaaaccaaagttttcctccatggtttaggtaagatcc TAC t G t TA aatagaatttcatt T GGCA a a A C aaa GT ca ATAC agattaccattatcatagtcaattacacatcttaattacttgaataattcttgttta gtgtgacactaaaaaag G T GG C A a A A C aataa ACA aa TGT acaatatacactcaacaaaaagaaattttgg atataaacaaatttaatatttggataaa G C G c G G G A A aatctcctaaaatagttaaaaaaaaggttaaacc ggcccggttcatagtcaaaaagtatgctcctcttgaaacacgcaa C G C C CACGT C T t C ctatctccgagac tccgacagcaaggttacttaacagttaatcaaactagattcgctttcttccatcactaatcttctgacttt ttctat T t CTGCC ccaaatctttttcaaattcatataaataacca A A C AC t T G T tcttcttcattccccct tcgtc ttcctcctcctaaatcaccggcgataa score :144 14062_at||#F17A22.43#At2g47780 unknown protein;|COORDS: 19518609 19519397 gcgttggtctgaaatcaaactgtgagattcttaaaaggcgatcgttttacttttatgtcgccgaaacggaa caaatt CA a ACGG taaagccaactatactttgatcagttttgggttgggccgttgacactgttttg tgggc ttctcagtca GTGG t A a A cctaaaattaatataggaactcgaggattagggcccaatccatttt A A CACGT G a T G atatcatgtaaggtttggagggaatatacaaaaagtggatgaatgatagcgaacgagaggtccttgt gcttttttgatttaccctcga G A C t C G T G CCGAC ggagtggc CGTG GC t T t CGT gcaacagtctcgaccca tacct C GACACGTGT T G aaaacggtccattaagagtaa g ACA tc TGT caatagg ACA gc TGT ccta G a GGT C A C tc GTC tctcacaatccaatccctttcgcttaaattaaaactaaaatggag C a T GG CA G c A aca T GCCA CGTGG C tcaccagtttcaaatctgagccgtccggtgtgatataagtctttttaaagagagag A ga GCGGT t cgtttgtc TGC a ACG gcaccaagtctttgaca score :498 20042_at||#F28A23.230#At4g34010 p utative protein;|COORDS: 15262345 15262509 tgcagttctggagaaagtgattgagagaaggcaaaaaaggatgataaagaatagggaatcagctgcaaga T CCC gc GC tcgcaagcaagtgagtgtttgtttaaattttggagattaaagaaaccttaaaactgtgaccatg ttatttactttttcactttcttgcttga CAG g C t TA tacgatggaactggaagcagaaattgcgcaact ca aagaattgaatgaagagttgcagaagaa AC A a GT GT gtctcgcttcttccctatcacaattaagaatctcg agattttcatattttcttgaggtt GTAT tc AC tgaccaaatgtttcatgcaggttgaaatcatggaaaagc agaaaaatcaggtactgtcttgatttgaatatcctctatggttgttggctaggcttttaactctcactcat aatgaattacacttttggacagtattctaagcttttgagta gaata GTGT aa GC tataccatgaagtgaga catatcatcacatttttga T t T C CCAC tctgcataaagtatttaagatttgtgaatatgttgcaatgccaa tttggatatttcatgagactaatctgacgagc score :54

PAGE 92

82 18701_s_at||#F1N13.100#At5g15960 cold and ABA inducible protein kin1;|COORDS: 5132890 5133371 gaaaaaaacatg aaaaatacgggaggttcg GC aa ACAC aacatttaacttgcc A a ACGT a T C atctaac T t T C CCAC cttatacaaggaaccattttttcaataataaagtttttttttttttgtcttcgcaaataagagca cgaaatgtttgccaaacgcatatgcaacaaa C C CA CGT T A C ataattctgtt T a C A G C C ATA gagcaagct atattgttaaagacctaaaaaaaactttactataacatatagaggcttcgagata tttcgaaagactcaac ttatatataaataaactcaaaaag A aa AC ACG g A g GC gagaggatcatactctcacacagaaagagtcaca ttattatatcctctaaaaaaccaaactaaa AC GACACGTG aagtcttgatcagccgataaatagcta CCGA C a T aa GGCA aa AC tgatcgtaccatcaaatgtaat C CACGTG G T tttagat T A C t CGTGGCA cc A C actcc ctttagcctataaatataaaccattaa gcccacatctcttctcatcatcactaaccaaaacacacttcaaa aacgattttacaagaaataaatatctgaaaaa score :330 18594_at||#F22L4.3#At1g01470 hypothetical protein;contains similarity to 1 phosphatidylinositol 4 phosphate 5 kinase(AtPIP5K1) GI:3702691 from (Arabidopsis thaliana )|COORDS: 172826 172295 tgggcttatggcctgtggcccatttaagtttgaccttaatatccatcagaaacccataccaat A CG g GTC a T aagtgcattgtccatatagtgt ACCGC aa TG C a T GG GAG tactactacattcca CCG AC G T G cat AT G G g GGG a C atg GTCGG t C cattaa TCCC tt GC gcatcttattccttacgcgaatacttttccctcttttagttt tgtataatactttccat ttttttggtattaaatttatcgtgtgactatctaaaaaggtatttaattatcta gtttagctataactgcaaacaaaatatctattttcatgaatgatatatttttgaaataaaatattgttgtc at GTAACG aaaaaatcaaaaatcaagcag A a GACACGT A T aac CCACG ct T tcactctc A C t C G T G G a A AA cc C tc AC ACG T A ca C aaaccattataaaatttaatccatttacatc G a CCGAC t T cataa gaacatcgtaa tcccatccgtttatatatatattccaagttaaacttcatatcatacacacaaacctaaaacaccgaaacaa aaacaaagagatttaaacaagaagagttatta score :216 12319_at||#T10P11.10#At4g02630 putative serine threonine protein kinase;|COORDS: 1150683 1152161 attgcgatgagggcagagttgcgagca gggatcgtagctctcc CAG c C a TA tttgattcaactacacacga taaacaattccagaacaaaaatcaatccagcgaagaagacaattatcgaaattgggtaaaaaccctgatca attcctgatcttagattcgaagataaagcaaaagctcatagattttactcattacctctttttctctgttc ttctctccttcttcgatttgatttgatagatagatttgtgtaaatcacaaccaaagaccacgcgcaaaaa a aaaacttgttc A ga A CACG at T C aattgacaatatta C c CCTG a C tatacctttaataacgatactaccct cctttaataattgctcat TA a C a GTA aaa G TGG C A a AA gtgtaaatatatactaaaaataatgatgagtgc aagtcccatcaaagccagcttgcctcttcga G gg TACGT ttgtcttttcgacatctcttttcttgtctcta cttcagaaaaacctcctgtgatgattattatatgtgaagcag aacaaaagcaaaaagagagagtcagagga agaagaagaagaagaagagtcagtgagctgca score :54 14832_at||#F9D16.70#At4g23600 tyrosine transaminase like protein;tyrosine transaminase (EC 2.6.1.5) rat, EMBL:X02741|COORDS: 11275155 11277383 aataaaattta GTAT ta AC tattacgtcacatt aaaatatacttgttttaaaattaaatataaatttatgc A t G A CACG G A T A accttgtaaagataataaca ACG a GTA aatcatgaaatcgta CGTTAC tatttatattt taaaatg GGCA aa AC ataacaaaaaac C a TGGG t G gttttcctcctctc GTCGG t C gaagacagagacaaa aaatatttaaatacgtaagaatctttgcttcttcttgatgctgatgagtgatgagtaatcgacagagttat tttgt ttcttttgtataataaaggagattgcgactctacaagagagggctaatatagcatgcattaagatc aatcatacagtttacttaaacaagatttaagatcaccgaaggctcgagattcgagatacattattttcttt ttgtcttctatgttgatgttaa T a T C CCACGTG agatatcaaatgtttagaacaatgtgaaaaccat TGGC A aa A aaatgaaaagtggttgggaattttcactatatattcgactt CGT a GCA atgagagttagtcacccaa aaacaaaatctaatagagaaaaataagtagat score :150

PAGE 93

83 18935_at||#T26J12.4#At1g23200 putative pectinesterase;similar to GB:AAB57669, location of EST gb Z35063 and gb|Z35062|COORDS: 8227234 8229398 gaattatattaactgaaatagttaaatttttgctaattgtt atactatttcaaatcacattttctgt ACGT GG attgttcttttccttatgatatgttatctttcatttggagtctcagactttct GA C A t G T G G CA tt A C g ggaaaggtgtgtgtctctcaaacttctt AC A C A T GT GG caggccattagccaaatctcatttgagtatcta tactaatttgattattagcttgagttccttt CC t ACGT agttgttttaatttacataaagctctaaattat gttgtgttcaatg tttatgcaataagttcgaaccaatggccattgctaacttggattctataacatataat ttgttcttaatttttatttttttgttcttaattttcgttttcatatgtttatatcaaacataaatcagacg cttt G A CG TG GC t G C A T ACGTG T A TAC attttcatttatgagattgaaagagagtatccat ACA tt TGT ta cttatacatgcatttatatataactctctacttccctaagaaaacaa T a T C CCAC a tctctactcatccac attagtcaaagatctctattggtacttctcaa score :180 15476_at||#F2G1.17#At2g21560 unknown protein;|COORDS: 9179844 9179020 gttatttacaagaca ACA a G a GT tatggtaatgatttttgcttgttaaaatataagctttatattactaaa ctattatgattttgctttgatgatacaaatcatatatcttacatataaac GT aa ATAC aatcttaaaaata caacaagacaatggagtctatggagtcaaatgaaaatataaggacacaaaactatgatttttattagattt tcagttcaagtagattcataatttaagggtgaaagatgaattaatttttctgtttttgtcaaataagaaca taa TA a C t GTA atccaattgaaaaaagaaatggtt C G T GG a A AA agaaagcaaagcactcagtgactcttt ataaatactctcattgcaaagt tacattctttcaagctaacaaaaaacacacaaacaaagagagagagaaa aacagacacacactcttctctgtttttttttgctccacaaacttaaaatcaaattctcctttttctctaca acatctttgtttgatcttcaaagaagatttaagcttgaagttacaaagcttctctccttcagatcctactg ttttaagttaatctttttcccttcaacgccaa score :36 15776_at||#T14C9.150# At5g25610 dehydration induced protein RD22;|COORDS: 8768168 8765982 aaataactcgaaaatatctgaactaagttagtagttttaaaatataatcccggtttggaccg GGCA G T A T g t AC ttcaatacttgtgggttttgacgattttggatcggattg G G C G GGC C A gccagattgatctattacaa attt C A C CT G T C aacgctaactccgaacttaatcaaagatt ttgagctaaggaaaactaatcagtgatcac ccaaagaaaac AT t CGTG aataattgtttgctttc C a T GG CA G c A aaacaaataggacccaaataggaatg tcaaaaaaaaga A a G AC A CG aaacgaagtagtataacgtaacacacaaaaataaactagagatattaaa A A C AC a T G T C ca CAC a TGG atacaagagcatttaaggagcagaa G g CACGT agtggttagaaggtatgtgata taattaatcggcc caaatagattggtaagtag TA g CCGT ctatatcatccatactcatcataacttcaacc tcagctcctttctactaaaacccttttactataaattct A C G T ACACGT ac C acttcttctcctcaaattc atcaaacccatttctattccaactcccaaaaa score :180 15629_s_at||#F11A6.8#At1g17740 unknown protein;|COORDS: 6101158 6104980 aagtc ttgttctttcatgtttcttttcatat AC AC t T GT taatgaataaagttgaaaaa AT t CGTG aggaa aagtccgaatctatcttctctgcaaaaataaaattaagtaaaagaaacataaagttatggtttaaagatgc tttcggaaactccatcatttta GTGG G A g A atcacataaaagattggtcgtcaattatttttcttttatta ataaggatgcacatgatgatgaatat TA a C C GT A atatactaatatag atgacaatgtatttcagtgctaa attatagtaacaattaattgaatgcttaaatgaatcttttattatatactactttatatgtgttttttttg ttttaattgagacaatgatgatgcacctagacaagaacgtcctactctctgtg CA t GTG c T tctcatttta tttactcttttaccaaacaaactgttcaacttaatacataatgttggtagcccacaatgtgaaagattaaa ttaaaactcatccattacct cctgtaagctctatattaaccactttagctcttct GC ac ACAC tcacaaca accaatcatatttattattaaaaaaaagagtc score :60 16141_s_at||#F19C14.3#At1g58360 amino acid permease I;identical to amino acid permease I GI:22641 from (Arabidopsis thaliana)|COORDS: 20949311 20953001

PAGE 94

84 ttag aacactataaattagttttacaagttcttagaaatg TA t C t GTA aatttcaaaaaggaaaaatatag catttaattttgaagatttttttctacattatatatatgataaaaatattgtattttgtactttgtagtta caaaaagtcattatatcaacaaatctaaatataaaatatttttctatatattactccaaattaactgtcag aataaaaaagaagaataattattacagaatctgaacattaaaatc G t CCC t C C AT at G t GGTC t C tgtcta gtccaaaagcaatttacaca TCCC aa GC cgaaactatattaaataaacatttttttttctttaactaaaac atttataacatttaacaataaaagttaaaaat C G a ACACGT A T aacgtattttttt AC G T A T A C GT c T tgt tggcatatatgcttaaaaacttcattacatacatatacaagtatgtctatatatatgatattat GC aa ACA C aaatctgttgactataat tagacttcttcatttactctctctctgacttaaaacatttattttatcttct tcttgttctctctttctctttctctcatcact score :114 15625_at||#F22G10.9#At1g53580 glyoxalase II, putative;similar to GI:1644427 from (Arabidopsis thaliana)|COORDS: 19265928 19264226 agttttttaacaattatctcatg G A T t C GTG G t A T A CC G T TA C ttaataacaattataaactgtaaaatat aaatatttaataaaaataaaatttgcaagttttaatatatattacttttaaaaaataaatcgtcccgcgat at A C CG C GGT TA aaatctagtttgatatttatgaagttgtactcaaactcaaaggtcaaaaccaaatgcta taatttcatctttgtaatggacctacaaatagaatcataacgaagatcctgactgttgaccta A ac C GTGG agtccac CAG a C a TA tgacttgtt GTC GGC A ac AC tttggtttttagttaagattaagcccaatagcccaa tataagcccatta GTGT ag GC cctattaagtccatctacactcgaaaacttcatcatcatcttcaaggttt aagattggcttcatgaatctccaatcactgttctcagatgattgtttaatcttatctgataaaaatcattt gcttccataattgttact GA at CGTG cagaactgatcaa t CA C G g GTC CC A C C acattgtttgtttcaaga agctatta A C GT C G G C tgattccaaatggtga score :162 19987_at||#F6F22.16#At2g19810 putative CCCH type zinc finger protein;|COORDS: 8498968 8500047 aatatttttcacaataatttttacaaaattgacatcaatgaaataactaaataattttactactattttaa atagcaaaa taa ACGG t TA tgagtgattttatctaaaaatgttagattatgtgttcccttaatttctaatt aattaaaaatataaatttttagttttgtcgtttatcagcgtatctaaaagttttcttccaa T c T t CCAC tg ccttcggaactcaaagaaaattaaaatcaaatttttagtaatataacagccgctcact CACG g GG ctgtca cctaaaagtctatcacaaatcat C CACGT C aattaaatatccttgtggaaag gtctcaagccgctgtcctt aacctctgcatca CA C A C A CGT C A C tgtagcttctcactttcgattt TA ACC G CG G T tgataaaagcagaa atatttaaccatggttaggttttaacccaaaggtggtttaagta T a T C CCAC tactcagcatcagtttata aagaagtaaagctttgccggagattta GC aa ACAC aaacacaaaaaacaaaaccaactcagacgaatgtga ttttttcttcttgagtgaattgtt gtaaaaaa score :108 14025_s_at|18909_s_at|18908_i_at||#F3L12.2#At2g04160 hypothetical protein;|COORDS: 1406689 1400446 caggtcataaattatcagttt T t CT GCC A aatatagcatcaatagaaactccattat TGG ga CGC aat G t G AC C A C t TG agcatcttatcatttttatttcaaaaccaccttattacactgagcttccctt ttttcagcaac aaattaatgtcaaatcaattattaaaaatacaatataaattctgggagaataacaagtatttattagtcat tacacataaaaaaccgtattttggtgtttcgtagaaagtgatatcatactagtctataatgtgtttcacaa aaaaaaaaaaaaaacttatatcaaaatcaaaatttcatgttcttgaaaagatttcaatatatttttttttt ttgtcaatcgatattaatatagactcaaacga ccctaccccaaaagatattgtaatttcaatattttatta acataaaagat CG C A C ACG G tacaaccaaaatagttttcgaaaccaataaaaaaggaaataaaaaccgatt ctttttttttgcaatcccattataaaggagcaacgcacaagccacaagtaaaaagtgaaagtccaacaaag atacaaagaagaagagagaagaagaagaagaa score :60 14720_s_at||#F14D7.2#At1g35720 Ca2+ dependent membrane binding protein annexin;idenctical to GB:AAD34236 from (Arabidopsis thaliana)|COORDS: 13226651 13228286 aaaaaaaaaaactgttaaaagcatttttagataatggtcattgtgttactcctcacatatgaacattcaaa taaagttttggatactgtc TA t C t GTA tatcgccagaattagtaagagattt cttatgcatagtaggagat taaaaaaaaaaatgcatagtaggagatttgtaaagaaagaaataagtttttttttgtataatgagttctaa tgga CA t C A CGT t T G C A a GTG a T aataaaaaaatcttac AC t C t TGT actct G T A ACG T G T taaaataata

PAGE 95

85 caaaatggattataataacgaaatctaaacacttataatcttgcggatgattttgtgacaaccaatgaaac cgttaatgcgagattacgatagtt ttttatgaaatcatgatttgtgcataaagttaagcaaatgctaaata attcataaagtgtaaga A ACACGTGGCA G A A gaattaataattcgtttgggatatttttg GT ATAT AC gga ca G GC CACG TC G T GT C c T aaacctcttagcctttccctttataagtcaatctt GT GT CG GC ttcgact CCC aa CAT acacaaaacactaaaagtagaagaaaa score :294 15672_s_at||#F14M13.13 #At2g22470 unknown protein;|COORDS: 9487342 9486947 taacagttgaaaattttgatagacatactatatatgaatatgaacttaaataatgacccatttttcgtata atgttaattatt TAC t CGT aaacgcgtta T T t CC A CG aaacatta GGCA aa AC tcaagttaatt TACG CC T G G C A tt GTAA CG C G G TTA accaaaaagcaaattacgcagagtcaaatcatatctaaa aaccaatataaac A t A ACACGTGTC a ATAC ttaactgatctcagaattaacatcgttaagag A a A ACACGTGGCA GA gatctgtg TA t CCGT t TG gtgctccttcatgtagatgattcttcaagaaaacttcaaaaact C aa A CACG T C aagttta agaaagaaaaaagacaacaattattttaa ACCGC ca T tgaaaagctaagccatgttgtatttttgtatg TG G tt CGC atgattagtgtcacaccaataat taattattaactatttcccaac CA t C g CGT atatatagagct ctcttctctcattgttctacaccatcaacaaaaataaagaagagtttataacattaaacagagagagtttc aagattcagacaaagaagagataatctaaaaa score :426 19060_at||#F17O7.17#At1g70300 potassium transporter, putative;similar to potassium transporter GI:2654088 from (Arabidopsis thaliana)|COORDS: 25692154 25689404 ctttttctttctttctcaaaccattctctgtttctcaactcttcactttccagtggtaagatttgaatcat gggtttctcaaattcttaacttttcacaaaag CCC ac CAT ggaaatcgaatcaggaagttatcaaaacgcc aaggtaagtcatcatcagatttgatcacaaagttcttacctttacga aaactctcttctc TT t CC t CG taa tcatcgatatctgattttgttgattgcagaaggaatcatggagaaca GTAT ta AC attagcgtatcaaagc ttaggtgttgtatacggagacttgagtatttcacctctatatgtatacaaaagcacattcgctgaagacat tcatcactctgaatccaacgaagagatctttggtgttttgtctttcatcttttggacaatc AC t C t TGT cc ctcttcttaa A t ACGT t T t C atagttcttagagcagatgataatggtgaaggaggcacttttgctctttac tcgcttttatgt CG ACACG CA C G a G TC AA C t CGT T AC cgagttgtcaattggctgatgagcagcttattga gta TA a G a CTG attcaattggttcgtcttcga score :108 13176_at||#T16L1.30#At4g33540 putative protein;|COORDS: 15095371 15098167 atcgtctcc gccaaactcctcatcaatctcagacccatcgcctccactgctttct TC T t CCACGTG aaaca tcaatcaccgttggaaaacactgaagatctcgagattgtgattcagattcgtatctctgatccaaggaaac aggattggaattggtgtttttgagagattgagagatggaagagagagattgatctacatacactggagagg a C c TGG CA CGAAT gagaaagaa G C t T ACACGTGTC C aatcatgattggattc gagactcacggtttaagga aaaacaaaccagaccaaattaggcttaaccgctaaaaaaccgggttctcgttttgaaagattgagagagac gatctacaaaggaggac A g GAC c CG g CACG a AT gagaagaa G C t T ACACGTGTC C aatcaggattgaacga tttaatcaagct TA a C C GT ATG taaaccggattttagctgggtccacaagtagtcaaatatagatttttta atagtcaaataattttcatagggg cgaagttcaagatgagttactacactcatcaaagctcacaaaaagag aagagaagagacgaggatcaatcaccattctc score :486 19673_g_at|19672_at||#F1I21.18#At1g43160 AP2 domain containing protein, putative;similar to AP2 domain containing protein GI:2281637 from (Arabidopsis thaliana)| COORDS: 15592159 15592833 aaa G a GACC a C cgacgaatcattttgggttcacaaaattgtacttcgatttc TA a G c CTG aatgtgaacg C ACGT t T ttgaatattt C A ACACGTGT T tcatatttcattacatgcattataacataaatattacatctttg agtctttaactagttgaccaacaaaaaaaaaaactttaactaagtctagctagttttgttactacatatat aaaaacaaaaccgaa ataaatatttaaaatttataatatatttgtgtggctaaatcaatca A CGTGTC a T g aaggtctaattcaagttggtaaggaaatcttttgtttatgtcca TT T C CCACGTGTC actatttgtatg AC GG c TA gagaaagacatgttgaattaactagtgactccggattatataagcaagcatctactaaaaagatag gaacaacacaatttgattacactgagcaacacaaaactg GCG aa CC A ACGT G a C tcta acgaagaaaccgg

PAGE 96

86 caatggccagtatcactacaatgccgaagaaataacaagaatcataaacgagccagaatattatcccccgg gttacaacttg T c T a CC AC CGC aa T ttcaaac score :384 20390_s_at||#T23O15.3#At2g04350 putative acyl CoA synthetase;|COORDS: 1515081 1518173 ggttatgggt GTAACG ttttaatatttaaaatcgca aaattgga ACCGC tt T tttaaactgagtgaaatat agcgttttgaaaaa A g ACGT t T C tggtggcttctgaattgaaaaatcagcgtttttacttttcacaatatt caattggagaatctttagaacttcaacgaagtagcgtctccggaatttcag G t A a A CGTGGCA A g A aaaag c CA a ACGG agaagcat CCACGTGTC C C t C C AT tagcca T C T C CCACGTGTC C attaccgtaccgacgaaac attcctta accaaaggcttcccgaaaaaggacagagtccacaagcttcggattgttcacaacttcaaagca tcgaaaaggtttaaactttacgtttttctctggaacccgattcgaagaaatcatcaagaagtttcctttca ctgatttctcaatctattcgtaggtaacccaattccaaa A a ACGT t T C ttttttctcgatgccttagcttc tttggattctgttgtgatctaccgattttgaatttgcttgatcgatcatcg gatcactgtgattagtgtta ttgttcctttcctttgtgtagctttcgcgctt score :414 17962_at||#F7D19.21#At2g42790 putative citrate synthase;|COORDS: 17754533 17751674 CCGTt TG tattttccatcttttgattaatgataactataatttatgaagaacatccataaactcatgagtt ttattttgaacgcaacatccatatttagcttt acattcatttttttaataagattt CAG t C a TA gttcgaa aaatga ATG tt GGG ttggttccgatggtaaagcgagcc GTAT ca AC aaaaaaggcccaaataaaaa GTGG t A t A ctgttgggcctaaagatgaatatacagaagcccatattaagacaattcggcccattcagcattg CGT a GCA atcgaattgatatttagcaaat CG a GG a AA gcaattgactaactagataa TAC t G a TA agtggg TCCC AC at aattctgacagcttagagatggagaagatcatcaccgttagtccaagttgggtcaataaggcttacg ccgttggatatcaaga TA t CCGT tagatc G C a T ACACGTGG ATAC tgttcaata C t CCTG a C c GC ag GGGA t A a GACACGTGT T a T attaacgaagagtaagaatggtttctaataatttcttcgctccctttcttcttctt ctctccttctttgaattttgattttggttgaa score :348 19 884_at||#F2K11.25#At1g63370 flavin binding monooxygenase, putative;contains Pfam profile: PF00743 Flavin binding monooxygenase like|COORDS: 22716779 22714626 atctttccaaaggtt TCCC aa GC ttcttta ACCGC tt T tcgaata GTGT ct GC ttctgat G A a ACGT c T ag ct CA a G TG CTACG gcttggact ccgatagcaccaaagctgttgatttcagaacagagagagttgagacgat C a ACACGACG a G c TG ctgctacaatcttacaaccagctttgcatagatcaagacagatctctctccctata ccagagga A g C A C C T G T C acaagaaccactttatctttaagttcacaccatggctccaattgcttcaatac ctgcttctgtccacaaaaaaaacaaacaatcatcaagaaagtggaagggcaaaatgagagaaaag aaatct cacagtttgatgattgctcatattgattgatgtttcactgtctcggtggagaagaaatacaatg ACAC aa G T tatataataaaagcatttctctctatcaacca A a A CGTGGCA A tc TA c G t CTG taataaca G A T t CGTG T aa G tttctaacacaccaaattgattattaaaaacattgtaatcataaattgta AC A a GT GT gtgtgattgt gattcatagattctaatcaaagaacacacaaa scor e :186 13036_at||#F23E12.140#At4g35300 putative sugar transporter protein;sugar transporter, Arabidopsis thaliana, db_xref PID:g1495273|COORDS: 15763562 15760923 tcactaatcccataggcttctctctctgttcactgtccttccgcacaaaaaaaaaaagtattcatttttat attct CA CGG t TA cttc gaaatcta GC t T t CGT tttctttttttttgatttaaaccaaaagggtaaacaat gctttatgattgcttagtgacgccattgaagcagcggctcactgatagtggcttc T t CTGCC tccaattcc aagttgctaaccctagttttcatctctctctctgtatatcatattctctcgtactttcttactttctcttc ta C t TGG t AG agataaaaagtagtcaaattttattcaaagttttgattttgacatctgag gtttcacctct gttttttttttctagaactcctttgggtatgtgtcttttacgtcactaaagggtttgttatagtcttaggt catgctttttggtgttatctctctctgttttatcttttagctca C CA a GTG a T tgtgtctgattttcgaat ttt CA g GTGG attttgtttcactcactcatcgtcgttgatcagtgactctgtttttgcgaaattctcttca gatttcttgataaaagatagaagacggaatta score :48

PAGE 97

87 20471_at||#F2G19.32#At1g46768 AP2 domain containing protein RAP2.1;similar to AP2 domain containing protein RAP2.1 GI:2281627 from (Arabidopsis thaliana)|COORDS: 16537949 16537515 ttttaa CCC at CAT taacatctctacttgaatcaaatt TA a CCG a G aagtactggacca aaaccaa AC CAC t TG attaataggagagcatctaactcaagataacattttgtattagtttatgtggtcctgtgataccaaca ttcaagtctctgtcatacc A aa ACACG G GT A tttgtcttaataagcaacttaggaataatatatgaaacaa tcatcatctatgaaaataaaatatttatcaagagtttgaattattgaatcaaatatttacaaaggttgatc attttctaaaagttcttgatttgtatataaa ctgagttttcgtgattgtttttataatttttgtaagccat ctgatctcgatgaatcttcagtattcaatttca GTCGG t C tcctaaccatccatgacaaattcagtgataa tatagtaaatagataacaagt GA g ACGT acgaggttacttggctatttctacaaaaccaatattt A t GTCG G CC atagttctatattaaatattgtcaccatacttaataaacttccatt GCG cc CCA aaagtttgtctttt atc aatggaaagagaacaagaagagtctacga score :120 12443_at||#T4L20.60#At4g34480 putative protein (fragment);|COORDS: 15448478 15446450 acttatcatggatagaaaaagttgcaatcctgtttttggaatatgcagttatttatttaacctttttcttt ggttttgttttgtatatgat G t GACC a C agtcaacataataattgtgagttgt acattgtggaatc ACG a A a GC taaaattattggtggttgtgggtgaaactttgaagaatgtttaacttggactactgagaaatagcatg gaaacgaacaaatggt GTAT tt AC acactgaaatatgtgttttgcgatactaatcatcctatggaatacct catgcttcaaaagcaaactcaaaaaacaaatgctctcttttcgattctcctaaagatttcctctccagaaa tagcacttttcagtttccaaattag atgacgagaaaaatcaaaagtataaaattacaaaataaagaccgaa acgaagacttaagctcaa GAC tt GTG aacagaagt C A ACACGTGT A G TGTGG a A a A acaagcaca A G ACGT GTC t T tggagtttgtata TA a C CGT TAC agtaaaaaacctcttctttaagtaacaacagcaacaacaacac aattcctaaatctctctctctctcaccatcaa score :270 15088_at||#F19G10.19#At 1g23040 unknown protein;Location of EST gb AA395277 gb T44807|COORDS: 8165023 8165457 atacacattttttcaagataaaccatttagtacaacaaatatagtatgaaaattaca TAC a G t TA tattag tggaccactataatatacctctaaaacatgtataaattaatgtatagtttaagccttttgttgtggtggac aacg GTGGGA ttttaatcgctaa ttaatgattctacacaaaaaaaaataaaaaattaatggcgataaaaac caaaatagttc ATG ag GGG atccacctaacatgaaaaaatacgaccaattagtg G a CCGAC g ACG a GTA ca gtgctccttaactaaccttcttcctaatttatcatttacgttca G AC ta GT G TATAC ttttatttaagtat aatttcgtttaaaaaaatgactagaagcttacagaaaaattgttatatacacaatacactatttaa taagt ctatga CTC CC A t G C acttgacttac GT A ACG a G aacgcattcatcatcacctctcaattcatttt C ACGT C T C C tcttaattttcaactctttaagctcaccatatatacaaaactctgtttttgattttctctttgatca aaatcaaaaaacgccgatcacaagttcacaac score :114 14990_at||#F6P23.2#At2g16990 putative tetracycline transporte r protein;|COORDS: 7331746 7336335 ttgaagtgctttatttgtaactgatccttattttcatgcatgtgagtgatatcaagttgtactttgttgtt ttgaaatgcctagaaatgacaa A ag CGT GT G GG A atggacataatcgtttttacaaaatt GTGGGA atgga catgatccttatttatggtttgtacgcg ACGG a TA aatttttatttaatt ACGTA at C gtctggatttata tatctt taatgcagattgaaata T GGC A G t A C caaattttttgcatagtcttaacttcaaatcgatcgtag ttattacaaaatgtcacataattcgtaatgtgtaattaaaaggtactccctacacaagagatggtgactaa tgatcga GTCGG CC agaaaataccaaaaaagtcctcaaaagtctgatgtgat CTC CC A t G C ttcaaccgaa tttttggaataaagaatagattggaaaaata T c T g CCAC attgtagtat tttcaagtaagctttctttgtt tgttcaataatcaaaagcaaaaacaaaaacaaaactctcaacactttttgtgtttattttctgttattttg aatcg A a GTCGG C aagcataggctaaaaggtg score :114 15058_s_at|13242_at||#F20D10.100#At4g37980 cinnamyl alcohol dehydrogenase ELI3 1;|COORDS: 16817150 16818782

PAGE 98

88 tagac ttgccaaaggggatgttagatatcgatttgtcattgacatttctaatacattggctgctactcgat cttaattaaagtcgatgttctatatgtattcaaaataatctggatttcaa TCCCAC aaaacttaaggatat atatatatatatatatatagtctattttatataaatggagtatagtcaaataaatatgcattatcaacgat atatagtcttctattacata G A T ACGTGG GA G ttcac CC a AC GT a G A T AC GT tcggttgaaacaagtcaat ttcatcaatgcctcttccaaaaaaaaaacaaattgcattattgatg A a CAC a TG catcattatcaaatagg ttggttaaaatgaccaagatgactaaagccaatcacactactaccagatcgagtaaccattagggaccatt aatt CACGTG G a C gtagtgaatatggtccttgtgaattaat G ag TACGT aattgtcctcattcatatatgg atcggttccacaaacatttc ctgtataaaattctacatctttcctctcattattatctctacacttctcat ctctcaatcccattcgcttattatttatccat score :150 16524_at||#F15I1.19#At1g54100 aldehyde dehydrogenase homolog, putative;similar to aldehyde dehydrogenase homolog GI:913941 from (Brassica napus)|COORDS: 1947 1537 19468119 attgttggtaatcgtttagtggacgagattgaatcaaaggtt CA a GTG GT aatcgtttt C t CCTG a C gcaa aatcgaaagaaaaaagatcggtag CGT c GCA tcctaatcgggtgacccggaaaccaatagttgattcgttt tag TGGC g G t A aaacccggtttgatgaacaaatattaatgggcctggcccatacgaggatgat CG T GG C A A tgtcgatgataacaacaactcctctat tcgggtttatgttgacccggaaa ACG a A a GC at AG GACACGTG A C A C A T GT G a T gtgagtgaagccaaaaataataatattgggaaaggatgaacacagcagctca GC t T t CGT c ttctccgtcaatccaataaaaaaatcagcaaccgttgtttgtttttaagctttttttacaaaag A C G T ACA CGT C T C tctctctcactccctctttaagatcagaagctcatttcttcgatacgatcaaccattaggtgat t tttttctctgatcttcgagttctgataattgctcttttttctctggctttgttatcgataatttctctgga ttttctttctggggtgaatttttgcgcagaga score :306 12767_at||#F21P24.18#At2g23120 unknown protein;|COORDS: 9790649 9790900 ata T c T t CCAC aattaattataaatgccgctcctctgattttctcaccaattaaagaagaat cactaaatt aacatgttctagtctcaaaaaaaaaaaacatgtaatagtctcaaggatgttctaaatgaaaaggctcaata aaaccaaataatgaatcatcacaaaagctatgtcaacgaagacaaaatgcaagtgtggatatttttcagaa actgcgtaaaacaccgtgaaaatcatcttcttcttttttttttttttggtcaaactctctacagagtttat CA a GTG a T ttt TA CCGG TA cctgtttttccttgt ttggttcaactcaattattcgccttaaacccacttta agaaaagcccactgaaaagcctactttaaaattcaaccaaaaggcccataagtactaca G CCGAC a T cagc agacaacaa GC CACG T C gatctcgaggaaccgcgtaatta ACGTGT C G G a A tc ACACG a G ccac CGTGT ac T agatagctccataaaa GC ag ACAC atctcatgaagcagaacctaaaacctttaata GCA a ACG aaaaata aaaaag aagaaactaacaaaacaaaaaaaaaa score :138 17842_i_at|17843_s_at||#T20D16.15#At2g23220 putative cytochrome P450;|COORDS: 9833097 9835299 cgtagagatttgtaatctatttcattcattgtctaacatttactagaatattaacagattttctattagta ttatttatacatatctcacacaaataataaaaataaaaaagagatca catttaaattgataacatatgcaa ttaaggaatgcactttgttgtttttcaatcaatt T AC GG G TA aaactaactctgatactaaaaatctctgc ttcaataaaataataaaatatagtacatgttttttgttacgaaacaagtttcggattctgaatagtgatga tacaataaatttgtagttaactacaatgtatttatttaatcattatt TT a CC c CG cacattttgacctctc atttttttt GT gt TGCC ta ttttcctaacttagctgaattcttgaactcaaagatcttgaatagctctatg ctacttgtgctggccaatctgaatttattaataatgtttgatatat A t T ACACGTGT T taatt GC at G G G A CGTG aatatatcttta GT A ACG g G T G G G A t A aactccatgacaataccaaaatttgaactctcgc G C t TA C GT tataaactaagctataacttctaacccccc score :228 19442_at||#MVI11. 13#At3g19100 CDPK related kinase;identical to GB:2AAD38059 from (Arabidopsis thaliana)|COORDS: 6605677 6608976 tttataacctccctagttaattagtaataaaatatttaatttagtagtatcacaagatggagctttaaaag ttatctttaat TCCCAC ttttgaaaatgataacgtccataaatacatttca A a ACGT t T ttattctccc aa taagatatcaaaaagacaatatcaacaacaaaaaagacaagaaaaaaagacaaaaaataatgatagataaa atacaaaagcaaaa G a A a ACGT a T agtattgcttagagaaagaacaaacatattgtgaattaaaaagaaaa

PAGE 99

89 aaaaagagggacaaaagtagaa A G A CGTGGCA AAA agcaaactaaatttcgaccaatccacatctaaaatc taaccttcttcatcagatggaaggaagctacttcaccaccc gcaga T t C TG CCA t G TG G cttctctctctt ctacagctcacttcttctctctctgtctctctctaactacattgaaaccaaaccagagtgcttcaatgtca ttgctgtgattctctttaagtc CAG a C a TA actcccatatcttcttcttctgtctttttttatttctttga acggagaagaagaagaagaaggagaaggatat score :150 20420_at||#T16H5.170#At4g19810 putat ive chitinase;chitinase (EC 3.2.1.14) lysozyme (EC 3.2.1.17) PZ precursor, pathogenesis related, common tobacco, PIR2:S51591|COORDS: 9730250 9728648 acccatacttgaagatgtgattcttcaaaaactaaatagtatcaaaattgtttgaagaaaaatatattatc aaaattggtaaaaaacatattaaaaaaa T GG C A G t A C tctgtagacagagactctccacaaccattattaa c C C A A CACGT CA C atgcatggaa C t T GG CA G c A accatacactttcactaatgagtcat T AC C G CTAT tag tagccattggtttcttgttagacaagaaatcatataatatacgattaaaattgaatataaacaaaatcttt cgtaagcaattggatattctata G G A T ACGTG a G C tgaatattcaactttgatataaatgaagttgaatat gtg tctcttggtccttgctttcgcattcatattac ACAC t T GT GT gttgacaaaataaaatgcaaaatttt gccccaactaagactttttctcgtttagcaaaaaacaaaaactaagacttttctctttcgtaagatattat tcatagatttctcattttcaaagttgaattacgtaagaaagtaaatatactttttcaataaatagacaaca aaccatctctgtttttcacagcaactcgaaaa score :192 1 7014_s_at||#T17M13.16#At2g02990 ribonuclease, RNS1;identical to GB:U05206|COORDS: 872712 873665 tctatctttttttgtcatctgaaattattatcgctcaaacgaagtaattctgaggaaagttgtttacaaac tagttatttcattattgtctacttatataatagaattaaaaaaaattattgcttaatgcaatttagtttta gataaaatcatta aacttaatagattatataagttagatatcaataattgggcttgcttaaaaacataaat ataaaatattattgggc C G T T ACG TG c ATAC aaaacgaaccttctaacaa AC A a GT GT gaa CGTTAC gact tcaaaattaaaaaaaaacacaacaactatgt C c ACA CG T A at C tcatatgattcagattccaaggagaaca aaattaaaaacaaatctcgtaaacatacatacacttcacataaaacaaaaggtaca GT ATAT AC cataaat ctccgagattcttttgatgtatctgtccatttcattattacacaaactaggaaactgatatctctctattc acattcctctgattctatttctctttatatatattcacccattaaccatctcaatcttataaccctcaaaa tcacaatcttctcttacaaaaaactttgaaag score :84 15214_s_at|16505_at||#T3F17.4#At2g46270 G box bindin g bZIP transcription factor;identical to PIR:S20885|COORDS: 18949398 18951440 ctctttttcgctctggtttttttagagagagagaaagatgaaaatgcgtttaattgctgtttaggtttcga attcgcgatttaaatttctgggtttctctctgtttaagcttcttcttcttcatcttct G C t T A CGT t TC tt cttcaaggagctttcggattcttgtaggtattgc gtaacttaaatggttcaaatagagattgtttggtcga atagttttgtgatttggtttctctatcattcattgatcttctaagttttgtgaaaa T tt TGCCA tttttga t CGT t TGC ttttgcatttacttgaattgaatgaaaaattataagttttgaatttgaacaaaaacagctaaa gcgtagagaagaacagagtttctttttttgggtg CA t GTG a T tgtttcaggggtagaattgtccagtagtg tctact ttgttgattccttttgtcttcctgaaagttttctgaacataggatgtgagagagaaatactattt tacccttctctgtctctatatttctcacacaacaacacccatccccatgattttgcagaaagagtcattgt tctcttga GTGG G A a A ccttgaaaccattcct score :54 15110_s_at||#T23E18.12#At1g76180 dehydrin, putative;similar to dehydrin G I:975646 from (Arabidopsis thaliana)|COORDS: 27800307 27799663 tcttactctatttaaagtttgaatcagttttctttaacttgataaaccgatcattgaactacaaagtccac ataacaagtattaaaacaaacaaaaaaacttatgaaagtctaatatttaaattaaaaac A gt ACACG cc T t gaacaaaccatttagtttgttttcactacttttgaaaaaaaactatcta gttgccattcttatccattatt ttt T a T t CCAC taaattgatatca G at C ACG T A a G C aaactattaacaattaattaatgaattcttcttaa tccaatataactttatctaattaacagaaaaatgataatta ACGT G G G gacgaaggagaagatgccataaa aag CG g GTC a T tttgtaatttcatataattagatatatgtttttaaattgcaaaaaaaaagtgtacaaaga

PAGE 100

90 gcgtgaaatccgcggtggctt ggacccct C c CG a GTC c T caacactatataacacactttcttctctaatc tccatcacctcttattttcatctgcttttgaatttcaaaccttcacataaaaaaaatacttttga A T c CGT G T ttcattcatcgaatctttttccgataacta score :84 13950_at||#dl4810c#At4g17550 putative protein;|COORDS: 8744238 8742562 cagcaaactctcagt tactcggagaaa ATG ga GGG aagaca GTGG a A a A gagccttatctgcttttcgctt cctattttg G C ACACGT A T G C catcg CACGTG T gaattttctttatggg CCGT t TG ttattgtgggccttt agtatatcaatattaatataaatagatagtactatgatttataaaaatatcaataaaaga GTCGGC taatg aca CAG t C t TA ttatgattacataaacctataatttg G g CCC a C C AC C AC GT CGG t CG TGT g G G C cgtcgc ttcat GCG ta CCA aa G t GGTC a C C G TGGCA tg A C A ac TGT t G T t C TGCC ttttggctctaccgtaacaaat AG GACACGTGTC G c GTC t T cacagtttatgt CGTGT tc T atctctctctgtctctctctctc T c C c GCCA a tatctctctccctttggtctcttctctctaatcgctctctctcaatcaaatta AT t CGTG attctgattta tcaaacactaaaacaaatggatttgggtat ttcattttgatccgtaaaggttgattcttttttttttttgt tatcattgaaaattttgttttgttgtgagaga score :552 18560_at||#T3P18.7#At1g62510 similar to 14KD proline rich protein DC2.15 precursor (sp P14009); similar to ESTs emb|Z17709 and emb|Z47685;similar to hybrid proline ric h protein GB:CAA59472 GI:4454097 from (Catharanthus roseus)|COORDS: 22348500 22348051 aaaaaaaaagattcataattttattaaaaattatagctgaacttactggatgttttttcttaagaaaactg gacaatatatatt C CA a GTG t T atatgttttatcttatagaaaa G A gt C GTGG G A AA ctaaaggtgcaaaa cttcttatttacctttttaacaaaaa aaaaaacttcttatttaccataatatacaaaatggacaatatata ttccaaattttgcttttgtcaa T t T g CCAC cacagagttatct A g CAC t TG aatttaatcatataaattgt tggaagaatcttctcgaacccagtctagaatcagcactaacatgctctgtttacacaggacattcatatat caaataacctctttt TA t C a GTA atcaaaaataattataaaccaaaaattcagaatttcaccgatgttt tt cttaaataatcacatatatgaggtcataaacattctcggcatgtccttcaaactactataaaagcaacttc gaa CCC tc CAT ctttctccaacatcccaattcacacataccaaagaaaacagtactgtttgtttttgaaac ctctcctctataaccaaaagtgagattaacaa score :72

PAGE 101

91 APPENDIX G REPRESSED VP1 AND ABA DEPENDENT GENES 15618_s_at||#F23N19.17#At1g62800 aspartate aminotransferase;nearly identical to aspartate aminotransferase, cytoplasmic isozyme 2 SP:P46646 (Arabidopsis thaliana (Mouse ear cress))|COORDS: 22468837 22 465355 ttctcttaacgaagcagagaaggcgttggaaacgaagaagggtgaaactttttacagtcgtaagattattc ttgacgtggctgagatggctccagaccctttccgactcaagtaca ACCT tg TA aagaagctt TGTTA cc A a gactttcaacaagaaagaccggaaagaca GCA ga TAA aatccctgttctgcggtaagaagactatcatctc ttctgacgactgatagaagctcc T TT GG t GT ttttt agttgtgatatctattccagttacgaagaccttca atgcaagtat GCA ac TGT gctttgttacttcgttttttcttttcttcttgttaagtttttgtcttaattaa attaaaa T ATGC t T A A tttttagttatttacctaaataattaaaatatgcgaattgaactgatttggtaaa taaataattaaaatcacaatcaatacatagtatgattcatgtggacatg TGGT t GG aca GGT t GGT gaacc taat T ATG C t T t A atttctcaacactttgttcaataattggttttttttttttttaatgaat T TT GG G GT c AA attcatttcaactgattcaatcaattctca score :132 17119_s_at||#F5K7.19#At2g06050 12 oxophytodienoate 10,11 reductase;|COORDS: 2360966 2358235 attctg ACCT gt TA atggaagacaactatgactttatacattgaa ATG tt G TC a AC C TG t TA TA tattaa T GT TA AC A tagctaatgacctatc AA t GT a CG taaagcgaatgtcaatca TAT t C g TA tccgttaatttcat ttaggaaatcaggctcattgccaatccactaagaaacatctctgatagatccaaataataaaaataatata atatgaagcaa AATAG aa C gaaatatctcac GCGG a CC aaactttgaactttattataaaacaaat CA g CA CG tgaaagtaa T GGT c GG T tcgg tccaagatgccaagaactcctccggtccatccatcaacctaacctttt tt G g AT a TAA aatttagcaaattccatttccacgaccacacaacaacaacacatccactcgaattttctat ttccgaataacaaaaccggtaaactcatc AC aa AC T T A a GT tactttgtagtttgtgttttgtccttttat gtaatgaagaagatccaactaactacta CTC a ATA tttttgtattgaccgttgatttctcagatct gactt ttacttctccttcttccagatcggcggagaca score :102 18217_g_at|18216_at||#T22C5.18#At1g27730 salt tolerance zinc finger protein;identical to salt tolerance zinc finger protein GB:CAA64820 GI:1565227 from (Arabidopsis thaliana)|COORDS: 9649106 9648423 ag atgctttccaaatcaatattggttagttgttatagactttctttctttttttgtcaactggttagt TGT TA ta A gttataactatctatctatcttaaagtaaaccga AACGT g T accatacgagtaccatccatgacaa cgaccagaaaccacagaaactctagaatattttgtaaggcg GCAT ca GT actagtaaactaagtaatgaag gaaaagtct TATA g C c G acctcttcttcattttaccaacaccact tgcacacacacacacgtg TA ct AGGT caaaccaaacgtgccctaattattttcctctc T t T AC C CC c A A aaaaacatctagatactttcatcacaac ttt G G TCC GC AT attaaaaagagttttagaaaaatatagtaattatctaatcgccaccttcactcttcaca c T tt CC T AC CA cttgtcacgcaacttccttctccacactcc ACCT ac TA ctatatatttac AC ac ACTT tc attacttcttcttca GC A at TAA acaatttcttcactgcaattcacaagcaaccttcaaactaaaactcga gagacaagaaatcctcagaatctttaacttaa score :102 20676_g_at|20675_at||#F5H14.28#At2g20750 beta expansin;|COORDS: 8889733 8890978 tttgatcggtaacttctatcaatccgcac TA g GC T T A CAA ctctaatcattctcattttttattttt TTA c c TGC tttttctttacatttgatcttttactggtttgtgtcaccaatcgaatccataa AG a CA a C CG a TATA acaatgttaaacaa GAC gc CAT ggtcatatagta G t AT t TAA gaaagaaaaaaaaaaagagaaagaaagaa

PAGE 102

92 caagtgtaagagtaacggaagaatcttgtc CC a ACCA acaacattggctttaaaaataaaaacgatggaat cttataattgcataaacacacacacacacacaccctt T a C CCTA C gggatcgtgctaaatcaaggccccta actgtcagtaggtcccat C C CC C C CT cataaaatggcacactacgtaaataagaa G aa TGGCT ctaggaca ttatcgtcattttcttatctgatgcaaaagacttgacttttcctttttgcagacttaaccattcctcctcc ttccctctctattta AC ca ACTT aaccaacaattctttctcacttctccactttttttctcggttagtgtc agtttttttgttggct agtctcttcggtcaaa score :90 20442_i_at|20443_s_at||#F3O9.21#At1g16410 putative cytochrome P450;similar to gb AF069494 cytochrome P450 from Sinapis alba and is a member of the PF|00067 Cytochrome P450 family. EST gb|F14190 comes from this gene|COORDS: 56 08858 5611111 ccccttgtatttgggatctctcatgttccattaaccttatccagattccaatattgaaatatcgaatatat atagtcgtgtatattagtgat TA aa CATA tatcacaacaccactttacagtcaca TA T a C A TA tatatttg aattaagtgttc AAGT tt GT aatatgagttttgtggatcta GCA aa TAA aattgaaaggaatta TA T G g A T A aatttattttgttaattcaatatatt ttt TA T G g A TA attaattttaaaaattttactctcttttttttt acggctaaa A g ACGT T TA c AT t C aaaatacaaaac AA AC t GT TT ttgtgacaaaactaaatcccttgaaat agcagtatagacgtagtagacattaaaaaatc TAT a GAG cttttactatcgctactcatcaaactagatat aaaatatatataaacgtgattaccgtag T AC C t ACC caaaagcccaaa CCCTACGT ga GCAT cac ATGC a T g A tatcgattctataaataatccacacaacaaacactttccagatatcgctagtgttcat T tc TA AC A a A T GC ac AC ctcactac A c GCATA catacatcatg score :144 15483_s_at||#T3A4.14#At2g46650 putative cytochrome b5;|COORDS: 19100342 19100929 aaaagtttaaagtaaacaaaagaaaa T t GAG g CT tgacttttctttctttttt tgacttgttttctttttt cctttttcttataagagaaaactaaaacgtgaaagagcttc ACC a ACC tgaggtcgctgaccac TTG g A g G C attctca GCT ac CAA tccattaacttacaa A C T C C A T A ttcttcacgttttacta T GGT a G G T A cctttt tttaaccttgttcg T aa TAACA attttatacgaaaagaatagtattaaacttcaacaaaaacaac TA T G t A TA ttaactggt C a G a TATA cattgc tttgttcccttacatcttcaaagagctaagattaaacta TTT AC C C C A AA taagactgtaatttcatcaaaatctagtaataataatattcaaca GC TT a CAA ta TA aa CATA acgc gtgtggt CTC c ATA agct ACGCAC tgtcttttcatc GGT tc GTA gaaactagaaactcttcccctcacgat cttgctttctttattttgcttcaaccctagtttatttga AG t CTC t A TA taaagttctctgtttcaag tta ctttct CA g CACG agcaaccaaaagtaatcat score :174 14856_at||#T31E10.17#At2g34490 putative cytochrome P450;|COORDS: 14485916 14484417 caaacagattatatatataacaaaaaaaatctctctactttttttttaataaaaaaacaga G t AT a TAA aa actact GAC ta CAT ttgttcacaattactagttaatataccataatatt gaaaa TA T a GAG a CT agtagct aatgactgtttgatgtcattg T cg TAA CA t CACG ttag T t GGC G T A C a TT catgcttctttggattccaat cttcctaaatctttttcttatatagtagccttgtttgttttttttacttcgctacacttgtgtatgtatta cttcttg T a A a GCAT A ttttcattattaataatatttacatgtttagaaaaaaaaaacatc AAGT tg G T T A C A CC A A CC A tatatattttc A C g TA t GT atcagacaaattactaattacagtgga AAC t G T a TTAACA atc atagcttttatatatatagcctttatgtaatttttggttttca TGTTA aa A ta AGCCA gg C cctaagagag tgtcatacccttatatattgtttactgcatattcacaaaactcaaatacagagcagaaacctttaagagaa gatctaataagaaagaagaggaaaaaaaagga score :120 14393_at||#T4B21.4# At4g04840 putative protein;similar to transcriptional regulator|COORDS: 2448777 2450306 tttgc TTA a AT a C acattttataag T ta TAACA aataataaattctataaaca AATAG aa C cgattaattt aaata TA at AGGT aaaacacatgttaaatcgtatttttatattataaatttttttt TA T t C A TA aaacttt agacaaaaattgcaaaatcca taaatatcactaagaaaatggcaaaatcc G ta ATTGA ttccaaaaatatg AC aa ACTT ttgaatcttttaaccaaaaaaagaaaacagaggttt T ta TAACA agttcagtttacaattttc AACGT g T ggtttctctattagaaccaaaagaataatatatatccaaacgtgcac AACGT g T caatttttt T GTTA ca A aatgtgaagaacaaaaaaaagtgtgaaaagctggttatgagttatggatctgcgcct ataaata

PAGE 103

93 tacaaacatttctccatcc A C T C C A T A GCCA aa C ca TAC t C t CC aaagtccaa GCT ca CAA tgaacacttc gtaagtctctctccctctctcaaaatgttctttttgtctgaaatgtaaacacaattttttgttgctctcgt tgctcagcccaaaaatggaaatggaaatgaaa score :102 15238_at||#T4I9.6#At4g03060 putative oxidoreductase;similar to P. vulgaris gibberellin 20 oxidase, GenBank accession number U70530|COORDS: 1352854 1350980 GCAac TAA aggttttgggccacgaattacttacaataaaaaacttttgtaaactcaagttccaaatgttta gtaatataaattcacattttatctcaaaaaaaaagagaaaaagagaaaactttttatcaaaagagaaaagt ta AC t TA a GT tataac tcgttgagttgttcttcatcttgtacaacaata ATGC cc AC acatcgtatggatt gatattttctatgatgaaaataaacgattc AC ca ACTT tccttttttt G t AT a TAA aatgtgaacttttaa tagtatat AC ca ACTT tccttcctctgtctattatta TT A a GCA T g T GT ttgcttaaattaagcaaagcga caaaaaaaaaacttaatacaatcacttgtgaataattt CTC t ATA aaatggggaccctt cac TAT t C t TA c tcacacagaagagaaaaatctctaga GCT ag CAA agtaaaaacaattaa TATA a C a G aaagtccaaaggta attttct TATGC g T ttcgaatgtttttttttcttattaattattgctactatc T gc TA ACA tc TGCAA ACG T TT ttcga TT AC a CC AAA aagggaaagaagaa score :138 12727_f_at|17610_at||#MBK20.16#At5g07700 transcription factor (gb AAD53097.1);|COORDS: 2450324 2451577 atattatcatataacattaattaataattattgtgttgtagataaatg T c A T GC A T A A tgcagtaataaat gt GT gt GCAT atattatatat A c ACGTT aa TTA g A T G C taaaatgagtgacatatcttttaattctt T ga T A AC ACCA tttccataaatcattgtaaaactt ACC T ta TA ACA aaaaattaataaatgtta T a G GGGT ca AT tg ACCCC c A taactctacactag CCCC a C c T ctgcgga TA a GC t TA acatgtctattaatattcattagtt tacgtggtttaaaagtttattgtcacgagtgcatgacacttac CGTG a TG ttgactatatgaagaggtaga tcgtacgtgtacaaat GAC tt CAT agatctttgatcttttttttcttcttcttcttttttggtaatattct ttagttttattt G at CTATT gtcgttgtaatgatc tttgattacaaaggaaaaaaaaaactaagacccttg acgaaaataat AAC c G T A T t C g TA atctctga T at CCTAC attat G ta CTATT tctgatttttgtttctt A T A C G C A C T T T tgttctagatataactaagaaa score :174 16005_at||#dl4705w#At4g17340 membrane channel like protein;|COORDS: 8663817 8664749 tcaccactactttgg atcacttttctaagatcttaagttttctgtttgttgcaattttattacatgtatat gccaatatctct TAC a CA CC a AC t CC AA A caaagcagcggtggtacatgaatta AC t CC t AA tccctatta atatt TT a GG c GT ttagcacctcccaaactaaaaaatacaaattgtaaccaaaaaaaatatatatagatat aaaaga ATACG ac C atctatcataat TGG t GT g AGAC aa CAT tagt TTA a A T G C agga T at TAACA tgaca acagaacactcttatcttcacatattatgaaataccttatccatcattttcatagcattataccaatt GG t TG t CT ttaatcatcttcgatgtcactaaccctcaaaattcaaaagaaaatgaatagtctaaataattggcg gtttgggcg GC a T c CAA ttgtgacaggccacgtagagtgggaccaacaacttcattacctc G tt TGGCT ct ttgctatatataaag T c A a GCAT ta GT atg cccaaaaccacacaaatagaaataacttgtaatcaaccaaa gtcgaaaaacagagttattttgtcggatcaaa score :102 15965_at||#MDC8.9#At3g16460 putative lectin;contains Pfam profile: PF01419 jacalin like lectin domain; similar to jasmonate inducible protein GB:Y11483 (Brassica napus ), myrosinase binding protein GB:BAA84545 (Arabidopsis thaliana)|COORDS: 5593028 5595521 tgctgactatatgagcaatta TT A CATA C TT ttatttatttgtacaacaattat TA c A C AT A CTT gtgtgg accaacatgattaattttatattggccatat G G TG C GT A gtaaa TGTTA ta A taacttgaaattaaataat aactaagctcga CTC g ATA tata gat CC a ACCA gtagcctctcttattc AC a CC t AA tcttcatcttcatc ttcgcattcat AG t CTC t A cgatcaggtaatccccctctctctatctatctttcatatatgtgtg TATG tg TA aactatctatattctgaa AATAG at C aatcaattgatcttttcctatctcaattgttttcacaaccatc agtttgacttttgatcgttta AG g CTC g A gagaattatcattcactgtagtaaagatagtttatac caaca aaccca T TT GG t GT tgacca GC TT t CA A C a TA a GT atgagttagagctagaaccggattagtattaatgtt

PAGE 104

94 acttgtacctgttcatag TAC ta ACC aaaaatgatccaaaaaaatgaaaataacaaataaaccatt TATG g t TA tcacagatagataaaagaagtcaacaacg score :132 16229_at||#F16J13.100#At4g12030 putative transport protein;Na (+) dependent transporter (Sbf family) Aquifex aeolicus, PIR2:E70482|COORDS: 6175459 6177088 aatttcacttaatt AACG T A T c C t TA caaaaagtta TA ct CATA tggaaaaatgaaaaggaatttgaattg aatcaaactgaaacttcatcgaactaaattattatgtttcattgtaaaccgtaaagaggaaaaga T GT TA A C A aagaaaacaaaacaa aaacaaaaagaagaagtaaaagataa AGCCA ta C ctaaaaattccttttttttt taagataagaatttttatatagatgccaat T aa AC C CC A A A C t GT TT ctt AA AC t GT TT ctccctttctct aaattatatcaaatagtttcttcatttgtttgctcaaaatgataattgtccaatcgaaacc CG a AC t TT tg agaattttttatttttattttttcctttca AATAG tg C atttgagacgata GGT t GGT ga cttggtgtgtc attt ACA c GTG C GT AT cgtggaaatgtatcaacaatt TAC a C g CC tcatgaaacgtcccttggtta GGT t G GT A tagtactttgagctaaacaaaaatataaatcgaacaaaaaacgctgcgtttttgtttccacatctctt tctctttctctctta AG c CTC a A gaaagaaaa score :150 19315_at||#T3F17.42#At2g46310 putative AP2 domain trans cription factor;|COORDS: 18960153 18961037 aaaattcataattatcatagtgggtcaaaaggagtgagtgacagtactgtgaccagtctcacgtgagagaa GG c CCGC ccatttcttctcattagcgcgcatgaccttcacgaacacatataggcgcgtgtcatggtacaac aatagtcaaagttacttgtaaaccaagc TAT a GAG tgaaccaaaaccgaagctcttgaataccgaaata at ccacgtgtcaaaaatcggagaagcggctagtcgtttgaagcgcccactaatctctctc TTA at TGC cttac ttaaattttcattaacagaaaacataattatgatacggctaaatctcggaaattccataacgataatttcg aaaatcaagaaggaaagtaagaaaaagccaaaaacacaaaaggaaataataatatttttgaaaaatgggaa aaaggaataa A t G a GGGG G g G gcgtgactatatttaacaaa gggat TGTTA ct A agaaacggcgttttctt cttccttcatagatagcctttttgactcttctttctctcttctactttttttcaggctctct CTC t ATA tc tctatcttcttctccggttaactaaaagagaa score :42 15406_at||#F7D19.24#At2g42760 unknown protein;|COORDS: 17745727 17744924 agaga GGT t GGT cacttt TA g G t ATA ggatgga aatgtcttaccacatgttaaatatgaaaagaattgcta aataattataacgagttgcttctaaatattagtatttgtttatttatgatggctccactct T tt CC TAC g C t CC ctttgagtccacac G C A T c TAA agatatt CG a AC c TT ggaagagccaaaatgaacc TT GG a GT AA tat gtatttttatctacgttgtaaatatataaatgaataa TA a G a ATA aattttgaatacatcatatacttta C a G t G T A C GCAT catataaaaatttcttatttatgaacaatttgaattg TATA a C g G aaagaacaagtaatt aatcaacaattctaatggtgtattatgacttttatgagtctgacatttgaatttgatt GCA ct TGT tactt gtgtgttaatgtgtgttgtgcagcaaaagggtccacacgtaattagtccaaagataagggtattaaagtaa tttatatatgtagctaat TA ca AGGTTTA a AT a C ctaacccaattcct tgttcatcatc T tc TAACA a AC a CC AA A aaaacagagaaa AC aa ACTT taaacaa score :126 19901_at||#T21L14.18#At2g32880 unknown protein;highly similar to T21L14.19; similar to GP 2191153|AF007269 and GP|2252859|AF013294|COORDS: 13899047 13897709 tttgaaaccctttttttgtcaacaat attatattaatcttcgaatgcgattacaaccgaaagaaa C c G g GT AC tcgatgccgaaaagtcccggaatgatacataaatgagatagattaaat A a AC g GTT acaaccgtagata aga TTA a AT a C aaaacatgaatttgaacttaacataggaaac G aa CTATT ttaacttgttgtagttgttaa cccgaaaagagtttgga TAT t C t TA catttaaaactttttggttaccaaaaaa G t AT t TAA tacaaagt aa tttttatgttgaatgatgtgcaaatgaatatatcctcaaggttatcaatttggaccatacggattcctgtc aagaccaaacatgt G T A GGG GT ga A taaattagccgaaaaaccggctctacattaaattaaacttgttttt tctttttttctttctaaga GCA ac TAA gatata TATA a C c G gttcaatttataccctaattgaaacccctt tgtcaaggaacctgtctatatatacag AC A T ATGC t TA C ta ctttcttc T c AC t CCA atacaactaaaaaa atctactacggtgtattatacgacgaccatca score :120

PAGE 105

95 19952_at||#F12F1.11#At1g12020 unknown protein;|COORDS: 4063305 4062625 acaaa T g A a GCAT tttt AAGTT t G T AACA aaaaaacttattgcaaccaatgtgtatatccaaaaaaaacaa tcactcatccaaagccgaaggatttatgtgaaat aaaacaca AAC a GT c T tacaaacacacaaataattaa ttttatttagtatttttggtctgttttctttaaatgta T tt CCTAC attgaattttgtagtggaattcaat gtattaagtttagtt TA T t C A TA tataatcaa TGTTA ta A aatttttgataagagttaatcggaacattgt ttagtcaataaagatagcttaatttcatatata AG t CA g CC gtctaagctgattaattggtcatacaaaag aaatt T tt TAACA tttt AC tg ACTT ttttttaaccaatttgatgaaaaccaaagtggtcgaatgatttttg cttaaacctagttttgccattaaacaaaggtttaggcggtggatttttaagacgagagacaataactaaaa aacaataccaaaacaaacaaa ACCT aa TA aaactttcg G t AT t TAAAG t CTC g A TA acatcttcttccaca agtctctctctctctctcttccatagaaaata score :90 19718 _at||#YUP8H12.8#At1g05300 putative zinc transporter;Similar to Arabidopsis Fe(II) transport protein (gb U27590)|COORDS: 1547710 1545259 tcttcgcgaatttgattacacagatcacatgggccatttgatcatcatgaccaaaatgattgcgaataatc gattattatttgtttatgttcgaatgaggatagacaaaaaattc tgctcagtttttgacaacctcgagtaa gttattttagaatcattttcttgaatcacatctattacaaattatttatgaaatatacaactaacgttcat tcgtttttatttaaaatttaaatgttttaatttatttcgagggacaaaaacaa ATACG gg C ctagtgattt ttcatgtattttcttctttgaat TA T GG A G T CA aacttgtttgagttctacagagaaaaaaaagcaaatta aaataaatattgaatg tcatattaaa G t AT a TAA tataacaaaaaattaaagaaataagtcttctcttaga aca GC A CC T GTTA aa A tataacta TAT A A C t G T g T aac AC ac ACTT tgagataaatacatcgatc AC t CC c AA t C t CCCC CC g T attttgtcctttattcttccattcttcgcgtcgtcatcatttatcattatcatcgaga agagattcaaattcaaaccctaatcgataaga score :90 12052_at||#F28P 10.140#At3g54880 putative protein;hypothetical protein, Picea mariana, AF051204|COORDS: 20214819 20214111 tcctctacttttcttatgtagcaaactttctctttctatcatgaattttggtttctccttagaatcttttt ggatc TTG a A t GC ttctgatctgttttatttggatctttaaaaatatattcttcaccggaaatgccattgt ttt aat CTC a ATA catttcttaaggagatagaactagttctttgattgaaaaatctctgttttgttgcctt agagatgaatgaagattttaat G a AT g TAA accca ATGC g T t A aaaaaggaaagatgtttttttttcttct tttgctatgag TT T GG t GT A tatggtttgaatctttctttcat AACGT a T tgttttctgagaaatctggta tgttttgttttcttgaatcttctagggttttccacatagctgattg atgaaaatgtcaatgaggtaaatta a ACC a ACC ctttc T tg T AAC AGT t T tcatgatttttgaaatggggttttaattggtttcttgttgtttctt TATA g C a G cttgtgtggaaacctttgatgtttttttgtatgtggatgtgaattggtatactcgggtt TAT t C g TA aaggaatgatcattcgattgaataagaa score :78 14947_at||#F6G17.100#At4g37450 putative pr otein;probable arabinogalactan protein precursor, Lycopersicon esculentum, PIR2:S55925|COORDS: 16571421 16570663 attatattgaacttgagtttaatcatttttcattattaaggttatttatttatagacgca T g A T GC A T c A a gtcataataatttcacc CGTG a TG atcacataataatactaatcca A G A C TCC A T gtgtcg T AC C CC A AA a atttatctcaattatactatgt ACCT ta TA tt G tg CTATT tttttaactaataatttcccgacaatttagt agttggaaatttctcgaccgtgtgatataatctataatataaaatatcataattattaaacaatgataaaa t TTG tg AGC gacattggta TCAAT tt C aaatccaattattatgttgacacaagactagtgatttatagtat aaccaatttaatgtctaacgtaaaaacaattactattaattg attttc TATA t C t G atcatatgttcaaca aa AC A c A TGC taatc T A C C C A CC A ttggcgaataaatttattgca G a A T A TA AC a G gagaaaaatgccaag aaaaaaaaatccttataaaa ACGCAC aagaaaaaagtaaaa G ag CTATT ctgtaaaataaaaatgaaaaaa gtattgaatcctacattaatagatacaggata score :150 13547_at||#F14B2.22#At2g43100 3 isopr opylmalate dehydratase, small subunit;|COORDS: 17869228 17869998

PAGE 106

96 aagacttgagtgtccgatagtagaacaatacttcactcttattttgtctctaagactcctttttttttttc cttttttt G tt CTATT tgtttt GCA gg TGT caatggaagtgcatgagagtcacttcaaggatcttgatcca ataa TA a G a ATA tgtaatgtttccaaaagcttatgaacctaatctat gttt TA T G a A TA ta TATG ac TA tc ggctttagtcttcagtatcagtttcttgtttacatttggatgtatcagctgctaaaaaa A T GC A TA gtttc ccatcccatttgaaacgacgttttctctcatgtcttccactaatataatcttctggtcatgagaacagaac aggagctctgtttttctcacacaatgatcga AC ac ACTT ttgtcatatag TAT a GAG aaa T at TAACA ctt cttt TT A G G CA T CCGT atc T a GAG t CT ttgtttcactg T g A CC T AC C TA a G TATA tcaaa A CG C AC g TT tg gaattctgaatggcactttcgtaaata GG a G g GTA caaacatttctccttttaaattgttttcatcgtgaa ctctctcactcactcaacaaagtaaagaaaca score :138 12150_at||#F6E13.24#At2g44080 unknown protein;|COORDS: 18186340 18186675 ttcttctctg gttttcttaccaaaagaaactttcttcgtcttcctct G t AT t TAA gctttaacaccctgtt tttggtttccaacgttcaatcttcatcttcttctcgctgaaggtgt G tt TGGCT ct AAC g GT t T aaaggt A C tt ACTT tcgattcttttgattcattctctgctcatgtttttgattgaatcacaacgatttactgatcaaa acttggacaaattttctcttgcagctttcttgaaacaccaattgaatcttttc tctctaccggcaaaaaaa aaagattagtccttttaggtctggaaacgccaagatcactcgttctaaaccttagatttt GT ct GCAT ttc gggataatcatttcatcgtcagggttcttcaaccaaactacatttacagaagaagaagaagaaga A A A GT t C G T tactttt TATGC g T ttggataaacaaactcaagtttcttcttcatacatcgatctgattttccagatc aaacttcgaaaagagaaaaagcctt ctttaaatgattcgtgagttctccagtctacaaaac GAC at CAT aa acattcaagaacattattctctcaacaacaac score :54 17207_at||#AP22.86#At4g36670 sugar transporter like protein;|COORDS: 16253973 16252170 cgaggataaaaacaaaaagc ACCT at TA ga AATAG ct C ttatccaataaggcaatgataacggactcgtgt cgg tgtc TA cc CATA tcttattaatcgaaaaaaaaatcttagttagtccaaaact C A C C CC C AA agagtca actttgt TA c A C AT A C G T TC ttatccaaaatccaaagttaacttattaacg AC ta ATGC ca AC tcaggttt acacatgacatcaga GC TT c CAA gtttatagtctggtcaagctgtaactgttaacttttaattacagtcaa GGT ct GTA attagaatttgatcacaaaacatgtttcattcta TA aa CATA tattaaaacgatgt GG a TG a C T cttgtccattggattcaaataactaaatcatc T ct TAACA gtagcttgtggctttcacgactactgtttt gtcaactattttttttctatataatcaccgaatcttttctttggtactgagagagcaagagagattagtaa gacctaactctta A C G a AC T T T tagataaaagct TATA a C t G agagagacatagatcgacaaa AG t CTC t A TA ccttcttgtaa CGTG g TG gttaattaatca score :150 16114_s_at|17588_s_at||#F9F13.20#At4g20370 TWIN SISTER OF FT (TSF);terminal flower1, Arabidopsis thaliana, PATX:D1021318|COORDS: 9967463 9965509 g TAC a C A CC a ACC A gatatgtgaacaat TA TG G c GT CA atttatctggagcatatatacttttatatgatg gtttaggt tgaaggagttattctaaa ATGC a T t A acatcatacataa T c GG t GTA atgtagtgtccttaac tatttaaatgagtgacatgtgacgcaaa ATGC g T g A cgtacgcaaatatgcagcata TTA T A T G C t T gtga cattatttta AATAG tt C atgaagttacaatgataagtcagaataaa TTA c AT a C tagtg GCT ag CAA gaa acaagtggatgatgtataatttggaa TATG at TA tttctgttttatac T GG T g GG T tttggaa TA a G a ATA ctataaatttagaaaaaaggaaagaaaaaaaaaacatgaattataagtga T ATA C GT A C G T T gaacgagtt gtagattctcgaaactatacacgaaaagaccctggtaatgatttcattggagagaataaagctaaatagct aattattgcaattccca A a ACGTT atataaagaata GCA ag TAA gggttcataaggtaagaagcaaaacat ttgattgagatacttgagatcca agataaata score :156 18844_at||#T9I4.20#At2g29120 putative ligand gated ion channel protein;|COORDS: 12464357 12460021 ttaataataaaattttcactacaaaaaaaataatatagtaattttccaaaaaggtttttggttcgtgttac attttctgagatgatccatcaacttacaattggcattttctttctctctctctctta GTA GG at A ttcact agcgtaaaatcactcgattatg ACCT tt TA TG tg TA tga AAC g GT c T TG ct AGC tggaagaaaaatattaa tcttccgtcaaacaagtaagaagactgaagttggagatttttcaaagta A AC G T ATGT agcctaaaaaaaa

PAGE 107

97 ggagtgacaatggtagagaataatgacaaaaagcgagctcatattatcgtccatacaatttttttctgtca ctttcaaaccttaacttcgtttgttttctt TT A c AT t C ttgttttcatattttattttat CTC t ATA TA T G t A TA tatatatatgtatgtgtgtgtgtgtatatatgc AATAG aa C aataaaaaagaagaggaagttaaaga atgacataaagttcctcaatctggatgaaagtgatgaaccctagaaaaactaataatacctttatgtatta ctttgtcctgttcgtttgtggttttgtgttga score :72 12307_at||#T20K24.21#At2g19190 putative receptor like protein kinase;|COORDS: 8278442 8274616 ttgactatattaaataaaaaaattcaccgtaacacattgatattcaactgattcctaaaaaaatatacaaa cta TT g GG a GT tgtgagattttt TATA t C a G tgttggtctctttacatttgtga T g TGG t GT t A tagcata tatagtaataaactcaaaaggaaattagatgtgttttgaccattta ttaaaatgaaccttttcttgtcaaa catttgaaaaatactagtttttttttttggcaacgttgtaaataatagttaaaaatagattttaagtctcg ttttt T TA T GCATA tagtttcattcgctttatt AG a CTC a A atatacttttaattaaattttgcagagaat taaaggtaatcatttgccaaggaaaaaccatgcaaatatgcaataagtagaaataatgttaatgagagtaa gcgttgacatatattacg tcctggtc CG a AC a TT cttaaagttgcgtaacactaataaccttagaaga T GG T t GG TTG a CT atcaacatcttattgaccaaatgtttttttttttttaattata A a AC AGT T GC tcattgct ctagcccagagaaagcagctcaattaagtaaa score :126 12333_at||#AP22.54#At4g36430 peroxidase like protein;|COORDS: 16170406 16169137 ta GTAGG aa A aataattactattagtgatcatgattgtcgaccg TA ag AGGT ggtttagt TAC t C t CC atc tttctttgaagaagtcagaaagtcagaaattatatcaaattaaacatcaatattgaacaca TATA t C t G ta tggttt TATG tt TA gaaaattccaatatttata TAT t C c TA gggaaaaagaagcttattcttcaaattat T GTTA tg A gtcgttaaaa TA T G g A TA aaaatataaagtctaaatat taaaaactcagtttgctttgctttta cctctcca AG t CTC c A aagtcaaattaattttagttaattaaaccaaaaaaggtttattagtcaaactta G C a T g CAA tg C t G g GTAC caaacccaa GCAT ta GT ctcttttaatcttctttttctccaataagtttttaca atttttaatt GT tt GCAT ttcccttgattatttatcttcatcccaatttagctaataccaactccgtttct tattcttccaagtcttt tcctataa A T ACG TTC ttctt CCCC t C t T atttcatatcactcaccacaaagtc ttctcatttcctcatcttcttgttgaaagtaa score :102 17278_at||#F17F8.37#At1g30900 putative vacuolar sorting receptor;similar to (gi 3033390); similar to EST dbj|C72582|COORDS: 10997393 11000661 ttaacccaaa tgctacttcaatagactttgcattttaaaaagtcaatgaa G TA a GCAT A gtcttgatctt T GTTA tc A ttataaattcaaaagtatttacatatgatgtgtccggataaaaattgaaacttgtggacttaaa ctta AG C CA c C C ataccttttt AACGT t T tgccacttgattaatcttcaagattagagctttgaataaaaa tgataggatctcacgattgcc TTA c AT t C taaga A a ACGTT aatacttttttt tttttttctcgtaaatca ttttaaaattgagagaagagaaaatgaaagaaagag AAGT tt GT tgaatcagtagttgcagagacatggtt agttccgctgactctgtcaagtcatctctttctctcccgtttactttacgctcacatctctttttctccac aaggacaaaaaacagagtcattcattttttctctcaaactctgtttaagaaaaatcgaaaaaaatcattac tttccgatttgctttctcactttct catgcaacaaacgtaaaacccacaaaagaaattcgaagcacaagtt ttgaaactctgtttttttattgatcagaaaca score :78 14612_at|19267_s_at||#T14P8.14#At4g02330 awaiting functional assignment;similar to pectinesterase|COORDS: 1031479 1033928 aaacgacgtgaaatcaagctaaatggcaaattattagaa ttagtccgaataataaacaaattcgtctaaaa ttgtagaaaattttg G g AT c TAA gaatga GAC ta CAT catttttttttatcaacaactacatcattttaaa ccaaacttagaattgttaggtctcaaac T tc TAACA ccgatttgctgcaaaaacaattatggatcggatca aga TAT a GAG aggatttcacatttgacttctattgcttactcattccaaa A g AC g GTT tggtctagttttt gaaatttaacc aatcttgaaaatacggaatttgtaatgtactttacaaaaattgtatatgctaattataat tttgtgatctcaaatattcatgtttatatgataagataagagataacatga ATGC a T t A ttatataatacc atagaagatagaatgtatac AG t CTC a A aataacaaaattta A TAC G g A C C cacatcccaaacctctccca

PAGE 108

98 CTC t ATA tatttggt CTC t ATA aaacccactaaacctaaacttttccaaaatca taccaaaaaaacatt T a g TAACA tcaaaatttacaaaaaacaaaaaaac score :72 19135_at||#T6A23.5#At2g38750 putative annexin;|COORDS: 16146975 16145126 tgaaaacgtaatttctccaaataactttttcactttcttaattttctttagaaagtataatattctttact gtttcttttacataattctttactattttattttatactaaagaaa ctttcatttgtc AG a CTC t A cttct tgttgtaat T tt CCTAC ttttcgaaaactgtaaaagaaaataaactcttcttcttatctgtgttcaagaac ctccgttttggattatagatatgaatgatgac AC t TA a GT acaacttttatagtggtcaaacttaataaat tccacttaagacaaagaaaaacagtaagtgttaatttgtcttgaacttactg AA G TATG T AA attccgttt gcgta A a AC a GT T GT TA A CA t A T GC A a T TA A t TGC gtgggcgttactcct TGTTA ca A caatgccccta GC A T t T GT cttaaatcttgtattgaatacctaaaacacccttctcttctcccataaaccacaacattaccctt aatcatcatataa AACGT c T tgtccctcttccatagatttccacagaaaccaaaccaagagccggaaatca aaaacagtaataaaagatcaactgcaagaaaa score :96 18015_i_at|18016_ r_at|18017_f_at||#MAH20.17#At5g08610 RNA helicase like protein;|COORDS: 2790409 2794058 gatcccaacggtaccatttttctttgtattttttgtcttatttataaaagaaacatttaaaaaggttttat ttcatagctcttcaaatccttgttcatcaccatttcttccactaagaaaaattgaaaactttttgtctccg ggctttggtatgggccactta ggactgcactaag TA T t GAG c CT ataaggcctttgtgtaactc TGTTA at A ttatccagctttaaaattgaacactcgacaaagggtaattcagtaat TA ta AGGT ttttcaaaacacttg tcgttttgtttacaatttacattttaagcaatgggagcaaac TACA t AC CGTT tcttatctgttgcacgaa acggcatcgtttactaccaattatctttgcttcatttcctatccctttaaaacccttatcgtgt tttttgt ttcatgtaaaacccctctcgacattttgcgactcttcttctccttatcctctaaatctctcaccggtcacc gacgacgggaacttggtagttacaggaacaagaaatgtcctcgaagttccctc TCG G t GTA C gcttcatca ctcattctcttccttgtactcgtctcgcttcc score :48 19876_at||#F8N16.5#At2g28760 putative nucleotide sugar dehyd ratase;|COORDS: 12287184 12285011 cgaaatttaagatctataaaatccttaaaacccataatttattattccatcttccttcctcagtcttaatt cttaaatgtcgagaatagtctaaagtcaaactctcttagtctttgttttctcttatctttctttctttatt ttttgtcatctctctttcttgatataa A a ACGTT ctcattttctttattgaaactt TA T t C A TA agcgacg tttggtg AC ca ACTT aaaattgggtttggtgcatttcgtagtctcacttttcgttcatttgaagcttcatt ttttc TG GT A GG G atttttttttttttt T TT G G t G T GGCT tagttaacaaattttgtgggactatcttcat gtttcatatgatttatttaaat TTA c A T G C g TA taaatattgattccagtgaatctcttaatgtcatcaag atcagaagacaatttttatttttcatcaagtttcttagatcatcaggttg attgatactctaatcagtgtg attgt G t AT g TAA at GCA T g T GT attattctgtttttggttttctaaatctcatgatgaatctgtgattag agtttattaaatttgattcaaaattgtagaaa score :90 12577_at||#T8O18.8#At2g28630 putative fatty acid elongase;|COORDS: 12225691 12224261 tta TGTTA cg A aatgagagtttaaa AACGT a T aatataaaggagagagagactctcttttcttca TA T GCA TA TAA actttggttca GC TT t CAA atgtaaaagagttgtagaggacaaaaaatcaaga G tg CTATT aaata tgaaagaaatacataatggtaaatggtggactttcttttttttgaaaaaggatttggtaaatggtaaatgt taaatggtggacttatttgtaaggaagtttctcttatgttttaatgatagaacatgtgaagtcttcttcca ca tcaacatgt CCCC a C t T atctcctaagtagaaaaaatcaccttaaacttgttatatcttaccgtcaacg aaaatttattaaaattccctttcttcaaaaatactattaaaactcaatggttatgtagaacccactgggaa attcagagctggcgtttcctcatccattctcatatattctcttatataatcccctttacaatctccacttc atttccatgcatctt AC tc ACTT catcatctcttct T aa AC C CC A A A acaaacaaacaaacaaacaaaaaa taccaaaacaaagaagaaaaaaaaaataacta score :138

PAGE 109

99 19840_s_at||#T5I8.17#At1g30720 putative reticuline oxidase like protein;similar to GB:P30986 from (Eschscholzia californica) ( berberine bridge forming enzyme ), ESTs gb F19886, gb|Z 30784 and gb|Z30785 come from this gene|COORDS: 10898316 10899899 tccaaactagcaa CA t CACG ctcacgc GTAGG ct A aaaatttattaatctccaaaagtctttcttatgaac actgcaaacacaacaacttgaaaagtcatataggtttagatgatgacgcgtattggctatcgcttaccg G a g TGGCT cataaatacaataaacaatacgtaaaagtcaaagtcaaat atatttagtcaactataaccattaa tcgggcaaaaccttta GCT gt CAA aacaacgtgaaaacgatatttgtatatatcatcaagaatcagtagat aagagaatgatttaatcccctgactattacaat TTT GG t GT AA ta AACA G TC T C t A ttggtttttattctt tgttttaatttctcatgacc TAT a GAG agaattaggtagtttcgaaaattggctaatcaacttttgaaaac tactgtctactttgc TTA a AT t C tctacacttagtttcggataagataattgtcggactaatagttaatcc cttgacaatctttgatatta TA aa AGGT ttagttaatctctt CTC t ATA taaa TA T t C A TA cacca GC TT t CAA aaatatataatccaa AC a CC AA A aacaaa score :120 12299_at||#F21P8.180#At4g23290 serine threonine kinase like protein;serine threoni ne kinase, Brassica oleraceae|COORDS: 11145293 11142408 agtattaggctaattcgt CGTG t TG tatttttca AC ta ATGC g T c A gtgtcgtttttattttca AC ta ATG C g T c A gtgtcattagtttcggaaataaacgatatttgttttgtttttgtattgggtccta TAT c GAG cgag attggcatgtgagttg G G t TG G CT agggaaaatggtaaaatctccacattttgttt tgaaaatgtgaaaaa tacattt T t C CCTAC agcggggatatgaatca G ta CTATT tataatgcaagatatgtaaatttaaaa GAC a t CAT aagaaaaacaaaactacagaatttcgcttaa GC TT t CAA agatatcgattt G a AT a TAA tggtgcat t T gc GGGGT tc A gtttccacggaaccggtaaatttaaaagccaaataccttaaaagagttgtgaaacaggg ttacaaa TA T a C A TA gtggattcgtatg aaaaagaagggaggaatgagtcctacacgtagtcacatcacat atgaaagagattgcttttgaaaatcgaatcctcttctaatattaaatctttcaagactggaggttttaaac aagaaaatcaaactcgatgcaaaagaacaaaa score :114 12325_at||#F22I13.190#At4g38420 putative pectinesterase;pectinesterase Lycopersicon esculent um, PID:e312172|COORDS: 16947320 16949653 ctattactaatttactagtattattttatacttccagcata TA T a C A TA catacattgcttcttcatattc tctac GC g T t CAA tcaatatttctcgtgta G G c TG G CT atttgtgccaaaatacaactcaaattgt TT c AC C CC AA A aaaaaaaaagcagatacaataaaataaaatttatataattatatgttaattctcatttgtgtaa a atattgaagcagcttaaaataaaataaacctcacagtaaaaatgaaataaaatacctt GCA at TAA atgag a ATGC t TA t T tt TAACA aaataaaaaataaataaaaatca ATGC a T c A ccggattgggattttattttctt tcaaaaaaataaaataacaatccccaacccaaaggacaaaataatgg G t AT t TAA t TA gt A G G TGGGG tgg ccttataatatattacaggttgaaataaaaacacggaattag ctgattggaaaagcttccccctccccatt ggact TTA a AT a C g AC a CCA g A gtcatttgttccgttactacaaaacagagttgaagctttctcttccaac tttatcttcttcttcttcttgtcaaagcgagt score :120 13428_at||#T16B12.7#At2g31120 unknown protein;|COORDS: 13210613 13209675 ttttt AATAG ta C taat CGT G C T GT CAA T aa C c aattactgaattttgtagtg ACGCAC tgattttctaca ttagctagattactc TGTTA gg A ggaatagtttttactttaatttgagtacaatttttaaatttatttttt catgaaataaattggttattataaaaatttatcctttgagaacattcaatatgttaagcaattttgcaaga accccattttaagttaataatatatcaaaaatggttgtgcatgcagaaaatttatgaaaaaataataattc a TA T G CAT t TAA atatttagtacatagttttataaatatagttaaaaaaaagttttggctgaaccgatatt tcta AC ta ACTT gc T ta TAACA agttttattcaattagtaaatttgattccaaatcacctttatcaactct taaagtatatattaagtttaatgattgtccacgaaataat TA T G t A TA ttgtatacaaaaatatataacaa gaaaggtaaacaagagaactcactgtagaaacaaagacttataaaaag c T t GAG g CT tgaagaataagttg atcattctttcttggcgacc TTA a AT a C aaaa score :96

PAGE 110

100 14497_at||#T5J8.17#At4g02850 putative protein;similar to T5J8.18|COORDS: 1266921 1265535 attcttcttcgtcttctttggtttttgtttctcgcaatgtactagtataacttccttcctttctactttgc t C t G a GTAC ttcttttagaaa TGT TA tt A tcc TTG t AA GC ttcccaaagtgttacctgtccttttttggag attt TA a G a A T AC C t ACC A aatcttcccatgtactatcaatcagttattattgtttgagtgttcctttttg actaaccaatttttgtttgatttttaacgactagtaagctatgcaacacaattcatcac ATGC t TA gca TA a G a ATA atggaa TA aa CATA ccatatttgttttgtcaat AA GT GC GT ttttgctttatgagtgcttg cttc cacttgattagatctgtcctttgct G a AT a TAA aattttgcaaagctgcagatttcataatgatttgagta caattcaac T GT TA G C A T a TAA acaaaacaaaaaagattaaaagccgagatttgtttcaataagattt T g C CCTAC aaagattgtctgtttctcttaaaagatgaagtggcattgtaaattaaa A G CC T CC A A aatggtata aaagtttagctttggtctaacgaaaacatcaa score :132 17909_at||#T3P18.4#At1g62480 unknown protein;|COORDS: 22340218 22340967 acggttaaatcaacaactagactaaagac G A C T CC A TA tcattaacattctcattcattgca A t AC t GTT t ctgccatcgattttctcctcaccatcatcactccttcaaatattcattttaactatagtttgttttcatgc tataatgattctatatttctatcaaacaata gttacagaatttctttgcttaagaaaataatatatattta aatctataatattcacgactaat A g ACG TT A at TGC aag TGG t GT a A gatctgtatatgatatatattttc tttcacgatcacaaactgtacatgatgtttctttctcaaacatctatacaacaaaagcttcattattattg ccagagaataat T ATGC t T A A tatgtgattttgttttaggattttttttttttaaatatatcttgaaaaat gat tctaaagctttttttttttttttgataaaggtgtatttgtttctcctataaattaagccatcttctca accaaactcttcactcaccaaagcatcacataacactcacac AC ac ACTT tctcttctcttattttctcag ttcttttaactcttttctctacatatattcaa score :84 15357_at||#T16L1.50#At4g33560 putative protein;|COORDS: 15099954 151 00241 aagtatctttcttgcttttcttattatagtttcgtaattg T T A a GCAT A TAA taatttggatttgaat C t G t TATA gg T t AC a CCA ttccggcggcttaaaaaggaacgaaaccaacagaatttggatttgctataaaaatt aataa AACGT a T tagtatctgaataatgatgatgatgga TGTTA tc A ggattttaggtcttgtagtataag caaaat GGGGT tg ATGGT c GG aagatataattaatga aaatatgtttgtgtttttttattttga AC a CC t A A agtaaacatctgaactagcaaaacga GCA at TAA acatct GAC ca CAT aaataatccattttttatctct atttta ACCT cg TA gttagataattaatttatgaccaaaaagaaagtaaataactaaaatgagaaaatgaa aaccacga GCT cc CAA atttcctgcggcttctttagcctcttgtttcgtttatgtaccactttctcg AC t C CA c A cttta taaaccaaatcctccacgaa AC t CC AA A cacattgtccaaagatcaatac A c AC t GTT ttaa gaagaatttgaaaagaaaaagaaaaaaagaag score :138 17317_at||#F13K3.9#At2g36690 putative giberellin beta hydroxylase;contains similarities to GA beta 20 hydroxylase from tobacco (GB:3327245) and t o ethylene forming enzyme from Picea glauca (GB:L42466)|COORDS: 15328474 15330531 ttcaggaggtg G ct CGTAT tattatattttaataaatctattctattttttctcataaactaggtagactt ttattatttttattattattttctctatcattttcaattatatttaatatta TTG cg AGC taga C CC T AC C A tctcgtgtcttatcgttcaaatactatag aataataa T at TAACA atatgtc AAC g GT t T tttggtaagg acgagttaagaagaag AAGT c A G TCTC c A agaatttgactcagaacaataccaaaactgctttaattcgtc aaaatatcaattct G ga CTATT catttc AACGT c T cctccaaa A AC G T A T G T A T A TAA atacc TACA CC c A CC AAC a CCA c A acttacca AG t CTC c A aacatagttcctcgcatttacata TA T a C A TA taataatatctc ta gaattttcaagagcatcaagatccatcaagatccac C a G g TATA tat ATACG tg C ttcgtacatacata catctcctgagtttccgttctttgataagtt CTC t ATA tttgctaattgattat T g A T GC AT A T GT acgtg acaatgtctcatt AAC a GT g T aacggagaaga score :186 15982_s_at||#T2N18.11#At2g37130 putative peroxidase ATP2a;|COORDS: 1 5548548 15546769

PAGE 111

101 taagagagttg GGT at GT A g ACGTT gacgtgtaatattcatgataatca TTG c AA GC aattttatttattt tctatactaa G aa CTATT tcaaattg TA T a C A TA ttactttttggtggaacatgaaaatgtttatagattg ctgattaatatttgaaggaaaaaataacagaaaaatggatatatacttgaatcggtttgataatatattgg aacaaaagtgaaaagctaaattaa ttaatgctactgtgctagtatattactatttttatcgaattaaattg gaca T c GAG t CT cactgaattcacaatttcacatgttcaacaac TAT t GAG ttcagatagctagatagcta gtggttttaaatttgagtattaatatgagtttagaacgattaagaa ACCCC cg A ttattgaaaaataataa ttaattaatagtgacagcaaccac TATG ac TA agaaataaag ATGC t T AA ttttatcatacacgatt agag aggttctcaaa CTC t ATA taa CCCTAC attggaacaca TAT a G AG CCA ca C accaataattaattagacaa aagataagagagaatagagagagagagagcaa score :102 19713_at||#T9A21.190#At4g18340 beta 1,3 glucanase like protein;strong simlarity to endo beta 1,3 beta D glucosidase, Nicotiana tabacu m, PIR2:S46495|COORDS: 9096516 9094909 ACct ACTT aattgaaacgaatattttaatattttccggtgctttatgaaacttcttcaatattctatacac gtcc AC t TA T G T g TA tcccaatctcttcttaatt T aa TAACA ttgctcactttttt T AC C a ACC caacagt T a AC a CCA cttgtatagtacatgtaattaaacgaaacacacaaac AACGT a T ttccttccat ACA tg T GC a T g CAA attgtcttgtca T a GA G TCTGCAT taatctagctatatgataaataactacatacaaatattttaa a ACCT aa TA attttctttaataagaaatccttcgttgtggtttatagaaagccgtcagtggataaaaa A g G CATA aatgagaaaagcaatatagggaggcaaagctaaacaatttaaaaacactttttttaaaaaacataaa cttcaacactaaaccttttcacaacaaaaaacacggaaaaaaaac aaacaagcgaagaaaagtattcttca cct T G G T T G GCT tattgtg TATG tc TA taaatattccattaaacacttctcttaaaacccaaaaccttcac aatctctttccctccctctctcttcctcaaca score :108 17894_at||#MSF3.7#At2g18690 unknown protein;|COORDS: 8046199 8047167 ga CCCTAC tgtttatgactcattcatgatgacttgttcgc ttgaggaaacaaattttttttttagatagat gtttcattatctt A c AC g GTT tcttgacatttt CA a CACG aatagaggaagtgaatgttgatca CG t AC a T T tgatgaagaagtcaacttgcaatgttaaacatgttcctcttattttctttatataacatcttcatttatg acgcggtcaggaagtcatcatttttagacttttcatcg ATGC a TT A TGC t T cgggttctttgttcttggat ttgtctaagtct tggtttcttcctttgatttt G g AT a TAA tatgtagtaataataata TAT t GAG ggtt C c G t GTAC ttggactcaatcaaatggatgacaagaatctccgtaagcgaagc T AC C CC g A A atacaagccaaa gacttctttttccatagaaaaatagctggtcataatctaaaggtatagtagtttttgaatgttt GGT ta GT AG t AT a TAA gccctagacagataggtttgatcttataaaccctcgccaccattac caaaaaccaataagcc aagagcttttctcatttttcttcttgaaaccc score :90

PAGE 112

102 APPENDIX H ACTIVATED VP1 AND ABA DEPENDENT GENES 20682_g_at|20681_at||#F10M23.80#At4g26740 embryo specific protein 1 (ATS1);|COORDS: 12439772 12438285 acaaatccatcggccggtgccggtcactgaattttatatctaatcta T GACAC t TG G ggttgatgttag TG C GT GT G t G tgttctcaacatttgtggg tttgtttattaacttttcaggctcagacttcgtttacaaagaaa atttgtgtgaattattcttattatcataaaattttccttgcaactt CGTG t AC attcatacatacataggc aatggagttcctcttcagt C tt CACGT A a A gagcgagtgtgg GACACG cact CA t GT a GC gggtggtgtta gtactcgaggttgggcctatataaaagcccatagaggcccgaattactgaatttagcagacaagaataga a agagtgatgaaacatggaag AA a ACGTGTC T ctagagtca T G t CA a GT G T aagacagaggaagagagaaga ga TGT g CGT caaagacaaggaaagagagatgtcaatcgctgctttcgtc G g C g C GTG C A T G T C C G C CA CG c ACA tcaatcaaatcgattcttattattattacctcattatactctttactctaagacaa ACAC a TA C A tt T GC actcagtctagagacaaagtgagagagaga score :25 8 20570_at||#T1P17.20#At4g12430 putative trehalose 6 phosphate phosphatase (AtTPPA);trehalose 6 phosphate phosphatase Arabidopsis thaliana, PID:g2944178|COORDS: 6331827 6329961 aagtaacatattgaaggttacgacaaatttaaacaggagaatttgtcaacatataaaacttactttcgctt agaaaaatgaaatttcattt GTGT CGAC t CGT taaagaacagaaaacattatttatttttccactactttt gtgtttaaaaagcatgaaagttttggaaatcttatagaagttaatttatttattaatttccaaataaagca tttaattttttttaaatttacattataagtgactttgttgatttgttgt G T G C A g GT G aatttctaattga atctgaaagtttacgatgaacaatttgatatgatgataaggta cttagtcaagaaacaaaatattcctaaa agtaaaatataatcttatatttttttcttaactgaattgtgttttttttttttactggtaatcagttttgt ttgatggagctttcaccaacaagcagtgacaagaaaacaatgaagagatggtttttcatagataa ACG a GT t G GAT ag AG tgttattgaagataaagaaagcaaaaaacagaagttcttgcattgaagcattggttttagag tttattgaagcattg tttagagttttgattaa score :54 20535_at||#T17D12.5#At2g28490 putative seed storage protein (vicilin like);|COORDS: 12129525 12127354 tagaagttaagagcttccgggattttgttttttattttttcagtgttgtaacaaatttaaattctgt C G C A C t TG TC gtaacaacgatattttttctttgaatataatttaacattaaat taaaaagaaaaactaaataaat tattttgaagttaatatattatgttattatctttggtttgcaataaatagatcgagtcaaggtcgttatat gaccattgtttagttacgacgctacttcatacttggaatctaaggagaaaaaatgtaacatagttctcagt acttaatcacatagttctcagttcttaatcactttattgttaaaacttttcatcgaataattaatgatttg atctccaatctcaattaatta tatatttctaaagccaaaagagataatgaaaggagaggtggtagaaagaa aacgttaatgtatcaaactctaataaaagaaactgcgtgta TA G A CACG aaggctccgatctttt GC ATG t C t C GCACGTGTC G tcctctttcttcctacttaacacatatatgcatgcacccttcttagaaaagtagcaaa acattgtgaatcatcggagagagtgggaaaca score :252 20412_at||#F13M23.2 80#At4g25140 oleosin, 18.5K;|COORDS: 11864996 11865757 ttggctctggagaaagagagtgcggctttagagagagaattgagaggtttagagagagatgcggcggcgat gagcggaggagagacgacgaggacctgcattatcaaagca G tg ACGTG g TG aaatttggaacttttaagag gcagatagatttattatttgtatccattttcttcattgttctaga ATG t C g CG g aacaaattttaaaa CT a a ATCC taaatttttctaattttgttgccaatagtggatatgtgggccgtatagaaggaatctattgaaggc ccaaacccatactgacgagcccaaaggttcgttttgcgttttatgtttcggttcgatgccaacgccacatt

PAGE 113

103 ctgagctaggcaaaaaac A a ACGTGTC T ttgaatagactcctctcgtta A CAC a TG C A gcggctgcatggt gacgccatt A ACACGTGGC CTACA at TGC atgatgtctcca T TGACACGTG a C T t C t CGT c TC ctttctta atatatctaacaaacactcctacctcttccaaaatatatacacatctttttgatcaatctctcattcaaaa tctcattctctctagtaaacaagaacaaaaaa score :552 20375_at||#F21D18.15#At1g48130 peroxiredoxin;identical to SP:O04005 from (Arabidopsis thalia na)|COORDS: 17053309 17054199 agagaagcgttaagcgggagcgagaaag GA a ACG a G agaaagagagagcttccagatccgacagaagtttt cggcttcttctttttcgtttaagaacttctgatcctcctaggtctgtccgaagaactaatctttttgaggt aacgacgccgtttttctcaaaacatgggcccattaaccatagtctcggcccaaacgaaacttaatacgaca atgtttgggtg taaacgcaaagattttgtcgattatcacaagtaaaaaaataaatacaa ACAC t TG agtct ctctagacatcgtgcgtcgccttagctttaagttttttctcgaaacaaaagagttattttatttgaacttt gaagattatacga A GACACGTGGC G T G a AC ccaattcataaca A C G C C AC G ctatactcttttgcatgcac ctcaatttgaacatcatcaagtctctctctctctttttctgactttgatccacg aacctaaccagcttgcg atctctatttaatcggtcctcgacgcaacttcaacttcta CT ac ATCC attcacatcaaatcaatacagaa agttttttctatatataaatataaaaggtaaa score :348 20321_s_at||#T18K17.14#At1g73190 tonoplast intrinsic protein, alpha (alpha TIP);identical to tonoplast intrinsic protein alpha (alpha TIP) GB:P26587 (Arabidopsis thaliana) (Plant Physiol. 99, 561 570 (1992))|COORDS: 26733573 26734589 aaacatttttgttgtcagttacccgccaaaatcatttgaagcctaagaagaaagaagccgaaattttacca ggttaaaagtgaaaat CA tt T G T C AC a T G ttatgcttgaactaagaaataattattgacttgcagaat tat caaacgatcaaatcataaagaacatattacaatttcattaacttccgattaatctgccgtgaaaccgtgca atctcacagttttcccaactctagaaggttcatatgcttgtttat C t AC t TG GC A CAC a TG C A tgcttagt caacaca ACAC a T a C acatacata A A CACGTACACG a G GA CA t GT attatatatcccgaacctaataaggt tcgtccaaaaataactcaccaagagaagataagaaagcag c ACG a ACA ccaactcttaaggaaaacatcta agttatggttaagtaattgcatgcaatttaaa GC T ACGTGTC CA gcttaagacactcaagtctca CA tc TG TC cctttttacttcgacttcgcttcttttggtttcttttaaactctctctatctctctctttcttcttcac actttgttgttaat T t CA a GTG T ttgatcata score :282 20108_at||#F23H11.5#At1g59730 thior edoxin, putative;similar to thioredoxin GI:992964 from (Arabidopsis thaliana)|COORDS: 21164806 21164173 ctcattagaaactctcc T GCCA t GTG a C A tttttatttttatagtaattaatagcattaatattatatagg tattagatccttgaagtggatggtgcatatattatttaaacaaaacaaattttaactacttttaatttata gcgttgac aataactaattaatctaaactaataaatcaagtacctaggctaccattttttaattagaaaat cgatc CGCC g CG gattgttatccacccacaaaaataaaaaattcgttt GCA aa TGT gataggcaagctagt tccttgcaaccagctatatttttatgtagtggagaatatttaacaactaactggaattttaaaaacacaaa gcgaaactaaattggtataagctatatattcttgaaaaaataaaaaaatca acggctagaactaaagaagg atagcaaattgaatcatattcctttgcatttacttatttatttataaccaacaaattaagttctccgatta aaagctatataatgctcctccttgtctctcttctttgcatatcagtatat CAC ta GCA aataagtgaacac catttcttcatacat ACAC a T t C cgaagataa score :84 19688_at||#M7J2.50#At4g25580 putative protein;s imilarity to low temperature induced protein 65, Arabidopsis thaliana, PIR2:S30153~contains EST gb:W43419, W4351200|COORDS: 12020817 12023154 c A g ACGTG T aatttcgtggggcagttctcgataaacgactaatgtt T C GCCACGTGG A tattgtagtagct gcatctactagttccgcttccgc GC tt CGTG TC TG tttctt ctaccaccaaccaaaccattaaaccgcgat agagcatactggatcttttttaaggcccattatgtacatttctgttggaccaacccattgcgcaatggaaa ctattcatatgatatcatttcttgtcgttttcggatttcagtgactgtgttatgctgtttttccaatgatc gaaacggcatgttgttggaatgggctagtgggctaacttgtgaaagaaggtaacatcgacggttcttctc G

PAGE 114

104 GAT aa AG attagg ctgagag C G C CA CG c ACA caatcaagagcct GA T ACGTGGCATGACACGTG G T atttc gtaagataaagtacagaggacacagccatccagaggtagttcggctatgcgtcttcctcctctttcccata ctctctgactctgtataaaactttgaagaagctcctcatattttctcagaaacaatttggatgcatagaaa ggaaaagaacaaaatcaaatctagcaaataaa score :606 19217_at||# T25N20.16#At1g05510 hypothetical protein;similar to unknown protein GI:4105683 from (Oryza sativa)|COORDS: 1629528 1630691 atcgtgattcctaagcttaatcggtagaatcaatttatcgtgattcctaagcttaatcggaagattcacaa cttcgaat GT a CA t GT tgtctagtagcgtcaaac G g A t GTGT tttgtcaaacaaag A CG a GTC tg CA t GT a GC ccagaagctgtggactcggaatatgagttttcctctttttttctgaaaaccctagacaa TA C t CGTG T A CG T ttgcctcataagct CA ga TGTC ataaggggtttgatctttgcgtgcatcacaatccttcaataatttt ttaccttttaccctaaaatctcgt C t A CACGT A TC accatatttgtcttcaccaagtttgtttcactatca taatcagcgatctct T G A C A C a T attgac tagtccttgttgcttattatcctcaagtgaagccaaat AA G C C A C G c ACA ctcgg T GA C A C c T t C aatgtt T GCCA t GTGTC A tagtccatgcaaatgcccacttcattttat taagacgagagctatacaaacaccgtctcttcttatt CAC t TGC tgctagttataagaatcagacagaagt tgcagagaaaagagaattaattgattggtgta score :306 19153_at||#F16J14.21#At3g22 490 LEA protein, putative;similar to LEA protein in group GB:BAA11016 from (Arabidopsis thaliana) (Physiol. Plantarum (1996) 98, 661 666)|COORDS: 7970726 7969773 cgatctttcttcttcttgtcca ATG t C t CG gatatatagcttcttcacaaactcttcgatcccttccttta tgcgtt GCACAT GT GG ga gt ACGT c TC tgacggttctctcgaccaacgcctagcctcctctcatgcaggag catttcctcgattggttgaaaggtttaatggcgttgatgaccagagagtattcgct C AC t T G C A C ctcgag ttcagttatattatatatgt T GC CA t GT t G gagtaggtatgtattagctctcat C t C a TGGC t T cagaaat gttcagaacagctttgctcattgttgttatgttgtctcaatatt GA a ACGTGGC C A atgtt gtacgaaaca tggaaaaaagtatttaaacctaaatgacaaaactctatctcagtcggtctgaatcatcatcttgctaaggt tgctataaacttggaaac T GACAC c TG C C tct TA G t CACG T ca T tgacaaaa T G A CA a GTGTC A aaatctc ataatgttgctctcacactcatgcacctctcttatctttataggagttgt TG t CAC t T ttgtatgtaacac aaaacggaacataacataacaaaacagagaga score :336 19045_at||#F14M4.34#At2g46950 putative cytochrome P450;|COORDS: 19239908 19237998 ttcgttcagaaaaagacagtccaaaatgttaataagttataacaatataaattgtcctctatgacttcaaa atattttattacagaaatggcaaaatgccaaacgaaaaatctggatcgctgtttacattatttgtggtt GC C tt GTG agac G A t CA CGTG a A C tttagcttttacacgattt TCT A C GT G T GTC A tatatcccaagaaacaa catctacaacttctgtttctaactt G ta A CGTG T aa G caagctacaagaagataaagattacttccataag caatgacg GCA tc TGT tataatcgatgctgttaaaagactgtcccc ACAC c T c C attagcgataggcgatt ttagaatccgaagtttcctttat TGT c CGT tacgttgtcttactttctttcccatgta aaataaattccca ttactcctatgttattttgttcggttagaattttattcattatatcttcaaaacagcttgtgcttatatca gaaacaga AG t CACG acgaccctcacactacaca C GACAC a T aaaattgaaagccccactgcgaatccacc acaaagcaaaaaaaatttgtttgttattcgaa score :168 19009_at||#F2H17.12#At2g36270 putative bZIP transcrip tion factor;contains a bZIP transcription factor basic domain signature (PDOC00036)|COORDS: 15155115 15153524 gattaatttattttactaatcatcaggttaagctctagattatcttttgtttaaaaccgtattaataaatg attaaagaggaaaaaaatcttgtgttgataagttcgctttaagaagaaa ACG a GT g G gttaagatattttc ct cttttggtaacgaaaactttattggaagagtaaattattagtttatttattgctaattttagtatcgtt tgtcgc TG t CACG atgtggaccgttcttctttatt T t CACGTG G A C cgacttaattaccttctttcattct tgttatttcaatattttttacgtcatttaaactccactttaccaaaataatttatttcattctttgagagc atcaattatcaaataatt CA at TGTC ttttttcgtgatagttgaa taagggacaactattaagtgagtttc gtgaatagctgaacagg GACA a GT aactgaagtttggtcaaaataatacgatgaatttaatatataagaga

PAGE 115

105 caaaattgttattgtcgtttaggtggtatccatatttcaatatagtttgtgtgaatacagatataaaaatg gttattgttgtgtatatgat GCA gt TGT taaa score :120 18991_s_at||#MGF10.3#At3g27660 oleosin i soform;identical to oleosin isoform GB:S71286 from (Arabidopsis thaliana)|COORDS: 10245078 10246048 ctacattcatacctaagctagcaaagcaaactactaaaagggtcgtcaac GACA a GT tatttgctagttgg tgcatacta CAC ac GGC tacggcaacattaagtaacacattaagaggtgttttcttaatgtagtatggtaa ttatatttattt cgaaacttggattagatataaaggtacagttagtgaaaatatttggttagcgggttgaa ttaacc GGAT at A GG a GTCA tatatacaactgtgaaaaaaggataaatacaaaaagggaagatgtttttc C GACAC acaagctaattaagtgcatcgagaggaga GCA at TGT aaaatgaatgttttgtttgtttttgtacg gtggagagaagaacgaaaagatgatcaggtaaaaaatgaaacttggaaatcatgc aaagccacacctctcc cttca ACAC a T t C t T ACGTGTC G tcttctcttcactccatatctcctttttattaccaacaaatatatttc aatcccatttata TGT a CGT tctcgtagacttatctctatataccccctttaatttgtttgctcttagcct ttactttatagttttatatcatatcaatcgac score :120 18718_s_at||#T20D16.13#At2g23240 metallothionein l ike protein;identical to an EST: GB:X92116:ATECPRHOM; contains a vertebrate metallothionein signature (PS00203)|COORDS: 9844872 9844542 aaacacaattttaccgttttgtcggaaaaatgcaaaatgcaaaaacacaattctttaaaatcatgattttt aatttatcaataataaaatgggttatacatgggcttgcccatatgga catgatttttattggtcatggaca tttttcggaccattgtccattatggaccgtc C a AC t TGG C ccatagaaaaaactgtccat A C G a ACA C G cc caatggtccctggacggaccatttgacagctatagacatagtccaaaagtaaagtaacaaaactagggtct tgagaagggcttataaaagaggagattttagagcataccaaaaacaattcagctttatctaattctttttt ttttcttttttggaaacca tttctttgaccaaaataaaattgtaaaattttcagtattctatttctttgtc ccaac CG a AC AT G ct TC G t CAC G AC t CGG G AT ag AG gagaagt T G t C AC a TGACAC a T G ttacctacatag AG GCCACGT ATT caattctcttgcatcgctcagtattcttatctatatgaaatgtaaaaaggatgagctag tgagataaaaaaaaggaagagaaaacaaaaaa score :192 18617_at||#T6D20. 11#At2g42000 metallothionein like protein;|COORDS: 17478370 17477952 acgaaacccgaactctcttaggct GCC c CG a G agtcgtctttatgtatatttgagtttccttcatgactct aattaccgatcatgtctgttcactctctgccggagaccgagtcgactcactgttggtaagttggtaaccct aaccctagatg G T t CA C G T CA T GC tttagaccatgtaaac tcctaacaaagataaagtcagatgagaacaa tcttttggtttcagacc C a AC t CGT aattttccttatatgagttggtcggatccggatctaaccacacttc gttacatacttacacccccaagttaggattaatggatg CAC t TG a C ttgggggaattttttctactttcaa tgaaa G CA a GTGTC A ACGTG T T G G aatgttgttatgttaaaagtaataaccattga G CA t GTGTCA a GT G t tttgatt CT gt A TCC ctggaacaagcacgctcttgttgggcttgataatatttttcaactagaaaatttat tgataaaccaaacaac GAC t C t TG ggatgatgttgttggtt ACGTAG C ACGT t T gaattcaggatgct TT G t CACG actgaggtaagaggagaaggagccaca score :288 17959_at||#F26B6.24#At2g23590 putative acetone cyanohydrin lyase;|COORDS: 9984 532 9983154 tgtcgtattggtaggcaaagttagaaaaaaccttctcttattatataagaagatatatgtagttgatgaag agtagaaaaaagtaagagaaaggaatatgatagattgataagagtaaaaaatacaaaagtgtttaggaaga attataatagactctcgtaagattaa TG t CAC a T aatacatcgatatatataaagaaaaagactaaaccta atttactcaatgtaaacatttatgtgtta tgtgtggatgatgaagagtaggaaaaaaataagaaaagggaa tatgataaattgataatagtatgaaacacaaagtgtttgaaaaatattataatagactctcgtataattaa tatcacataatacaccgatatatataagaaaaagactgaacctaatttcctcaataaagataaacttaaat gtaaataatactat T t T ACGTG TT actaccgtaaatctagtcattgtgctctgtcttttttgttataaagg a atgagagaagaaacggagagagattgtagagtcggacaat G A CCACGTGGC TT aattgtttg GA T ACGTG T T t G ataatctaaatcagaagagtaagaaaaa score :294

PAGE 116

106 17893_at||#F21P24.17#At2g23110 similar to late embryogenesis abundant proteins;|COORDS: 9789099 9789377 taataaatgactagatcatgatcgttctcttg aatgagttagtgaataagacgtttggtgagaagtgttga catacaaaatgaagataattt TGT t CGT gttttttcatgatgcattgtcattcatctaccattcatcgaat ttattttatcacatatatatgaatcagaaaaagaaaataaattttttcaggtagatggaatgtccaatata aatttattgaatattagtattatatattgtaatagtgaaattttgtataatatttctgtgatcatataact ttat ataattattatatttttagaaataaatgataaaattataaagataggctttcttgcacaaaacgtca aaaacaatcgtataaaacttaaaaacttttaatgtttttttgtttttttttttttaaattgcagatgcttc taggaccctagtttagataacgtaaatcttaacataaa A G t CACG T ATA tcgagaagcttactaattgcca ccggtata C C GCACGTGTCA G aaatatacgagc TAT ACGTG ag C tat ataagca GC A a G TG T aaacctct C AC G g G C A agcatttagcaaaacgcataacaag score :306 17721_at||#F7D8.4#At2g21720 hypothetical protein;predicted by genscan|COORDS: 9222243 9225349 tggttgagaacagagttaacttcaagctcagatacatgttgg C t TGT a C G GAT ga AG agtttgagaggaga gtgagagataaagtgga gagaagagagagattcgaagc T g GCC t CG actctgaggattctcaaccgtcgga tgagaacatctccgatcaggagattgctttctcagacgaagcggaagaagaagaggatttgacagaatgaa ga GGAT tt AG cataatgaatagttctgtacagtttaacacaattataaacgaac G a A t GTGT gaatgagtc tatatttataaatactaaaatgagaatacttgttctttcatgtcctaagaattcgaaaaa gcacttttaca gtttgtgaaaccaaacgctcaatcgcctcctccgccatcttttctcttccggtgctcttgattctctcaac tccgccgtatatatttcaaataaagagagagtcggtttagggttaaccagaagtgagtgtataattgatag attagagagtattgggacttttttaatctatagcgataacttcgggtccttaagttgaaaataatttttct tcagctattgactgaaatgacatcgttttagt score :30 17214_at||#T26J12.15#At1g23070 hypothetical protein;|COORDS: 8175756 8174009 AAg ACGTG a C A A tttgttacacattgggccgtcatcatgtttgggccttctccccaatccattaaggctcc atgatagat AA CACGT A a A actaatctttaaagaatacatacatcatagacaact TA GACACGT A C G cata cataaacccaaataactaa T GACAC aacaacgacaacttattatggtatgattttggttctatttcttttt taagaccccaaatcttctgataaatttctctaatttgcctttcttggtcttctaatattcttgcattagag atagtcgacatgaccaattccgctttctctatacactgagtctacttttctttcggcccacatatttattt caagttatttcattttaattcctctaccaattttgtctattttacagtcacctagtatcaata tcttagag gcggcgaga A ACACGTGTCA G agaac GCA ac TGT g G CA tc TGT C agctt T A GCCACGTGGA C attacaacc ttaaaccaccattgagatagaaagcaaaaaaagaaaaaaacagaaacaaaatgataatgttgtctgatcta tatcaaaaaatatgtctaaaattttggaagaa score :624 17038_s_at||#F13K3.4#At2g36640 late embryogenesis abundant protein (AtECP63);|COORDS: 15307096 15305563 ccctcccaagaaccgtcaaaaactcaaaattatattgataaccgacttgatcaataaacgg CT ct ATCC aa tgattaacttgcgtccttatttttctaaaatgcttgtttccaagtttttgatgttgaggcaataacttcca ataaactgtgtggcaataactaagtaatctttcatatctctaaattgcttgtttccaagttatgtg cctca gtaagtttgaaatggcgaatgtgatcttcacctaacattttcctcgaccaacacaaattatgatatcaaaa gatttgatacgaccgaccatcgatgcttgcttgcttatgtcttagctcttaa G t C A T GC CA t G aacactca agaatcatgatccatgcatgaaaa G T C CACGTGTCA TA c C AC GT c CG tagaagataaagctaagtaaat T G T ACGTGGC A gc TGT tgcgtccctctatct AC CACGT A G C gtctcaacagaaacagagcattactataagaa gtaacaaaagcagctcaactgatagcacaagttaaagaattctaaaatcgtaaagttctaaagcggttttt catcaattttcaagcaatcgagaaaaaagcaa score :402 16317_at||#T13K14.180#At4g21020 putative protein;desiccation related protein, Craterostigma plantagineum PIR2:C45509|COORDS: 10192761 10193890

PAGE 117

107 ttctccgatcttagaacgaagactgcaactagggtttatcaatcgacaaagatcgaagaacacactttatt tcttactctatcgacgagaggatatgggtgtggtttagggtttcgacgtcgggaactgatgaattcttctt tgcccagaccttcggattaaagatgaatataaataaaactgaacgtaaaagaaa AC t TGTC ctacaagaag a aagaaatattctagtgggccgtggaagtcggtt GGAT tg AG gttgactaaaa A CCACGTGGC TT gaaaca tacaatttcattcacggttaactttgtctttttccgtagtggttattacataacctacattaaaccgatgc gtttcgtatcaattcaccgaagaaatcaaaactgttagagcaactccaatgt CG A ACACGTGTAC G T tca C C c ACACGTGTCA G atacttaggttgccagcttatcttgctgttg gatctatatgct GC a ACGT attaatat aaaccgacctcgaaaataaaagagagaagagaaaaggat CT ca ATCC aattgaatcatagcaaattaagtt tcgcattcgaagtttttcttgatttgataaaa score :618 16085_s_at||#AP22.93#At4g37060 patatin like protein;|COORDS: 16428214 16426053 aaacttcaaatttcgctgatgaaactta A t CACGT ATT acgggttttactatttaa GACAC ta G attaagt gattcctaccaattaggttttctaatgtaaatagataactgatctccaagcaaaccaaataactcattatt agttatgatt C GT G G C AT tagttgtagtttccaattgatagatgtctttagaacaagtctgaatagt GGAT tc AG aagg GA a ACG a G T CG cagaaaatccaaaaaaaaaacaaagtcagtttctggctgtttatttttgatt a attgcaaa GT c CACG aaagaaaatatttgtttcttaattcaaatttatctattgcaactatttctcgtaa attaatttgtgacagcccaagctcgatgaccctaattaatgatagggtaatattacttgctttaaattatc atattccacataaaacatatatcagcttgtatgaagaaaagtctgaccaaatagtgttaaatctatgcaaa tgaaattgt GCA tt TGT atatatatattgacct CT ac ATCC aaa acttctaccaaatttcgattcaaaatc ttgtagttcaaaagaagattgatcaactataa score :96 16040_at||#F5O4.11#At2g02120 protease inhibitor II;contains a gamma thionin family signature (PDOC00725)|COORDS: 537319 537842 tattggagactaattattcatagtagttaccattta AA CACGT A a T gttttca tttagatacgataaaaaa agttaacaaaact C ta GTGTC cttgactacacttttcttactt T tg ACGTG g TC actattattgtggactc cccaatatctgactaactcaaattttacctttcttcatacgttttttttatctgtttgttaattaattcca gaatgagttgtcaaaccagtaggatttctct GCATG t CG T G T CAC t TG CA ttttaaggacagattaacggt tatgactagggtcggtaaagcaaat aaatcgacaaaattagaaaacaaagaaaatgtggaaaaagcagtca aaagaagaaaaatgtt GACA aa TG ttacgaatttggtttaaatcaaaagtgaattatccgagaaagatgat atttacccccaaaatatctaaacaaatacttaaaaatgcaattctctaatagactatcaaatatcccgata cctctttatatagtgccat CT tc ATCC ttagtaatgtacacacacacacataacacttatttccaact ctg tctctctcaattttctttctctttctccaaac score :114 16025_s_at|13198_i_at|13199_r_at||#F20O9.210#At4g28520 12S cruciferin seed storage protein;|COORDS: 13052090 13054111 ttggctactagtgctaaacatgcaaccgaa CA gt TGTC ga GA CA a GT c GC agcatatacaatggat C aa AC AC G C C ta GTGTC G C C g CG T c TC gct CA t GTGTC A C c T tgtttcctcgtttttttttaatttttcataagtt cttttgttttatcttcaatacaaatttttggctgtatcttgcaaactcttcgatcatatcgccaa TAT ACG TG a A C actggtgatctaatttgttgtgttaattgttaaatttagattctattctccggtttaaaagtgaat tatatgtatcatggttaaaacattgtaagtaagatgataataaaatgataaatttag ttgatgg A ta ACGT G aa G caaaaaatgagatagatacatttgattttgtcgtatttt G ACA ta TG C ggagagtga GC t ACG c G ca tgaagatcaaga GACAC t TG C tcgagctcacaga G tg ACGTG T aaaaagcttagactgaagtccccatgca aacctaatc C T ACGTGGC T caa A c CACG a G ct C AC t TG a C A atatataaactcctcctaagtcccgttct C T tc ATCC atctctcacaacaaacaaaaag aaa score :336 15984_s_at|13196_i_at|13197_r_at||#T24A18.120#At4g27170 NWMU4 2S albumin 4 precursor;|COORDS: 12578131 12578631 caaaaccaaattaaaccagacaatacttcaggagaagtcataagaaatatatatttttgttcaaagggttt tgaagtaaaaaaaaaaaaaaaattatgaagaaacaatattgaatattgat t GA CA ta TG GC gggagtgatt tggaaaagtttttcttgttcatccaaaataaagattgtagtatcaccttgaagtaattataatgatgacta tgtatatcacatttagactatgtatactaatttatgctaataaaagattaccacaaattagagttatttgg gaaattttgttcttttttggctttaatttcatgtattatatgatagacaacattaatataatttcatgtt G

PAGE 118

108 T t CACG AC t TG T C A ctcctcca atttgagtttcgaacacctaatcgcgacaacaaccttatagcttctctc tctcgcacaacacacaat CAC a T G CA tgcaatgtcttaaaaaaaaaaaattgtgcatgcatt C t A ACACGT G a T C tccatgcaaaactcctttcttgtccataaatacccataaccccttcactacactcttcactcaaaca aaaaacatacacaagtaattaactaacaacaa score :138 15983_at||#T24A18.90 #At4g27140 NWMU1 2S albumin 1 precursor;|COORDS: 12571857 12572351 aataaccaacatatgtattttaaga GACAC t TG GA ataataattctaaatatcctaa CT A C t CGTGTC C G T atgtt TT G t CACG gt GA a ACG TG agaggactagtttttgtcacccgtccataacattcttagacatacatt actttgggagtgaaaaacattaagcttat CT tt ATCC ata tattgtcttaccatcaatagacaatatccaa tggaccggtgacctgcgtgtataagtaatttttcaagatgctaaaacttttatgtatttcagaattaacct ccaaaaacatttat TGACAC actactactctttccgtattgactctcaactagtcatttcaaaataat T G A C AT G T C A gaacatgagtt A CAC a TG G ttgcatattgcaagtagacgcggaa AC t TG T C A C t T cctttacat ttgagtttccaa cacctaatcacgacaacaatcatatagctctcgcatacaaacaa ACA ta TGC atgtatt C t T ACACGTG a A C tccatgcaagtctcttttctcacctataaataccaacc ACAC c T t C accacattcttc actcgaaccaaaacataca CAC a T a GC aaaaa score :342 15747_at||#F14N22.17#At2g42560 putative seed maturation protein;|COORDS: 17665 364 17663356 tttgtttcttgactgaaacaactcttaaataagaccaa CGC AC t TG a C tctgtgaaatgagaaa AC a TG T C A agaaaagattatacaacgga GGAT ga AG aatttagtatgctaatgaaattttccgagtaataagctaaat gggggcttagagtaagatatattgggcctaatagagatgagtgggcgggcctgaagaagcccatatctgat gaagtggtcggctggttcgtagtagaga taaaatcatgagactatgagaggcaaaagtgcaccggaaaga C T C CACGTG C AC A T GT a CG gaaa GA T ACGTG ga G gaaagcaagactattggctcgctaatattgaaactgat ctcttcccacaagctgagtcat CGTG c CT ct C C t A CACGT t TC aatagtaaact C A A C A CGT ca C acacct ttctccaatcttcatgaccgcccaaattttgtcttttcggttctatgattataaagcaggacttaagcagc tcgataagaatcaacaaagataaagaaaacagaaaaaagcattatcaaactatcatcttgaaagtaaaact gtgtttttggaagtgatcgttagcattaacaa score :276 15440_at||#T29A15.20#At4g27530 putative protein;|COORDS: 12717134 12717616 atttaattctttttgggatttgaagttatatgaatttatccccgggaaaaacgcattaga gaacagttggt tttgtggaaagaatcggattcaacgaatcatatgcttccattatatatgagaataccgaaaagtctgaaat cttgcatcaggtcttaattgcaaattcattttccatttgatatgagaataccaaaaagcttcaaccatatc gacattagtacatattggtctgatggtctctcctacgtattttcgacgaaagaaacttgcgaactcggaag agcg G T a CA g GT G G tagtccgtagtctatagt agataca A CAC t TG C A acattttgaacatatgggctggg cttataaggttcgggccctttattttctttcttagagttgatgatgataagaagactagagtcactttttt ttagcacaagaatctctgaatcagcctttgcttctgttgctctgg G GACACGTG agaagcacagtgat T A G CCACGTGT T C ttctcc CGACAC taa CT cg ATCC tcaaagtacttccatcggtgtgtatatagatgagagcg agag ctagataatgaattgaaaagtgtagaaa score :426 15377_at||#F4P9.29#At2g33520 unknown protein;|COORDS: 14145416 14146054 agtttcacaaaaaccaagtacagctcaaactattaaagccaaaatccggatttacttggagtttgattttg attctgtcaattcaagttctagttccatagtaatcttaaaccaaattgaccaatatttaagttagac ctat caaaccaaaacttttagttttgt CGACAC aaaagaaaacagagttactacaacaagaaactatgatgggac caaactactgtaaacggaaagtcggaa ACAC t T a C ggagtacggttattatttaaaacgaatataaagcaa ggggacttggaaattttaaaatcaagcctaaccggaaggtaccggcctaaattggattaaaccggaagaac tgccaatttatc TC G t CACG gtgtttggaag C C A A CACG T c GA caaaacttat T GCCA a GTG c C ttcccgt cgataaaactaacctcttcgttgtagaccacaacttcagtttgttttcataaaaataaaacgagagatcat gtcttgtatcttgtgaagtaattctcaaaagatcactttttaaaaatcaaacctctttcaactttccattg cc GA a ACG a G gttgcaaatctaaagattcaca score :126

PAGE 119

109 15362_at||#F11C18.30#At4g31830 put ative protein;|COORDS: 14365598 14365296 ataaaggtagaggacttggaaactagccgtttaggaattacgaaaaattaaaaggaaaaaatggtaaatgt taatgtatccgttggagagatgataaag GC a ACGT aacggtta GGAT tg AG cgacaaa AC t TG a AC tctga tagagaaagtgatttcacggtcttctaggaagactggtcaagctaagctgtttctgttttttgtttttgta ctttactttttgtt TGC ta GTG ggaactgggtttattgggccttgaagttgataaaagatgaataaaagac atatcgcctaaagcccatatgagaagcagaagacaaaaacctccaactttgggcataaattttgattatag ttaaaagtccagacccaatt T G g CAC c TGGC t T agttacgattctaag GCA T GACAC c TG C C taatatgtt tattacagaaaataaagagaatca GC t A g GTG TC ccttattga acacattaacaaactccaa CGAC AC T AC GTGTC T t CG TG A C TC t T a CT at ATCC aaaaacctatagctaaagctgaattttccatgattagtatagtcc caaccaaaaaaatactgaagaaggcataagca score :270 14574_at||#F10M6.70#At4g32290 putative protein;predicted protein, Arabidopsis thaliana, PATCHX:G2252634|COORD S: 14555485 14554331 tgtcattagaaagcttaccatccaacggtttccatcaat C CAC t TG t A agactaagaacc C tc CACGT A a A tcgaatattttta TGAC t C c T tcctttaatgttagctttgaacacactaatcaaacgaataaattctttat acatacacaatcacattatgtgtggcctatgatttagtcagatttatgacaaactgataaagtacaagaaa tgataaccaaatagaattca attcatatatactcttttcggataaacagtacagtgatcaaaattattaga attagtggccaaataaaactaagaacgttatgtatattaaatgacaactcgagagaagacgaatacttttc tatataaagatgtaaaattccaaatatgaaatgccattgaattatatatttctataattagtgtactcgaa ttaaatatataatttcttaacgtaataataagatacaa CA a GT t GC accaaatcccactctct cactctag aaac CA aa TGTC gttgataataataga ACAC t T t C tttctctctctaaaaagcaaaagtagtctagtcact gatttcttctaatcctccggtggagtgagaga score :54 13622_i_at|13623_r_at|13624_s_at|15819_s_at||#F21C20.170#At4g20820 reticuline oxidase like protein;reticuline oxidase (EC 1.5.3. 9) precursor, Eschscholzia californica, PIR2:A41533|COORDS: 10114658 10116256 ctcatacaactgttaaccaacaaattattaactcctatatatgaatcgtcctagcacaactttaatattga gatacaaaatgctt T t T ACGTG a T tacaaatcatcagtgaaaactcatgctaaaattgttaaagacgaatt atttgcaggattactaatcaaactagcaataatg taaatttaggtttagatttttgtaacattaaaataag cccacataaccgccatacctaatcaatactattatactaacaacaatatataaccctctaccaattgctta acatcaatccaaaatagtcccacttaatgattcagctcctacgacgactgtgtag G ACA aa TG C cattcaa ttatcacatccatctaatattattt GGAT c A A G ACGTG T gattctatctatccacaaacttttcatatatc taccaa aaatctataaattgttttcactttcaatttcttatatcttactccacaaactaatttaaaattag taaatattttgttcacttctgtctagtagtagataatacttcactcttcaatcttcctacataaagtcaaa cttcatttgttgcagaaaacaaaacaaaaaat score :54 13519_at||#F16J13.200#At4g12130 putative protein;hypothetical protein SPA C21E11.07 fission yeast, PIR2:S62592|COORDS: 6228118 6230114 taccatatgaaatggcgt C G T A C A T G T tatcataataaagaatgtacctct CAC a T G CA taccaaaaaaat aaaccatgtattaaacaaaacaaacatattagtgtttattacaaaaaaaaattatactttcgtttcctaga ag A CG a ACA a G gtcaggatcattctcttaaacattgatacgaaaaaaga aaatttcctaacattctaaagc taatgaatgtaatcaacaactttattttctta TGT c CGT aaaccaaatttctttggaaaatttactaaaat gctctgttagttataaatttagattattggtgactggtcgtcgtaataaacgggaagaaacgaccgaatct taaaaaacatgttgtattgagaaaaaatctctcttttgtggtttggttcagtaatttcatcga A CA C t T GC taagcaaccgcataaacttcc gtacttttat T ACACGTGTC G caaaccaaagatagttccca T TGACACGT ATT c G A t CACGTGGC TT tgtcctgagcttttctttctctaac GCA aa TGT ttaaccacag T g G C C A CG aag aagaacgatactaacagagccttcacagagag score :588

PAGE 120

110 13514_s_at||#T9A21.60#At4g18210 putative protein;W15DMY32F, W25DMY32F|COORDS: 9040887 9041992 taagtttctatttgtctatgcgtatggg GT t CA t GT cctccacatagttgcagaagaaagtctgaatgaat attgcctacacataatttcctcctagacttaactcatctcttggtcaacaatagaggcatctttgcatgta gaaagactacagatactaaattcatggaaagagaagagatatggttggcatttactgaatgtggggccaag aacaaagaaaacttagacggtgaccgattcctaa tcccagaatatataacttagcatatctcacccatatc aaaggagcaat CAC t TGG tagcattatcaaccaagagaattgataataa GA t ACGT tagttcctccttgtc catctccaaggtggtgattccacatccaccatgacggc GGAT ca AG aactacaaatcattggtgagtaagc ataaaccagtttgtggtgaacgtcctattctttcatcttcaaaaaaaaaaaagatggttatgatattttat aattca atatatgaaacgatcgacaacaaataaactaatttttgcgttattttcactcacttaccagttcg gcagggaaaagaaccaaatccaacagttcaag score :24 13306_at||#T3K9.16#At2g41070 putative bZIP transcription factor;contains a bZIP transcription factor basic domain signature (PDOC00036)|COORDS: 17079920 17080750 tttctaaggcaaaataagctctctttctactatttctctttctctttctactatttctctcctgtggagaa actcaggagatagagagagagagagagaagagaagagagcatgtatgtttggttttataatctctctactc ataccaaagatttgtctcagaccca C C AC t TG GA C agagagaacccaagctcctttctctctttttctcga tctactccttcttaatctccttt tttgaaacttgaagccactttcaacatcatccttaaacttttgttccc ttattcacaatctcc TG c CAC c T ctcatttctctagttgagttgttatctgcgtttttaagcactcgaata ctgcatgcaaattccctgattgtttgttagtaccttagagattctcgattttttagttgtttagattgaac caggattactaaattgttattgttttctgtgtaaaggctacatatatgggttctattagaggaaac attga agagcctatatctcagtcatta ACG a GGC agaactctctctatagcttaaagctccatgaggttcaaaccc acttaggaagttctggaaaaccactaggaagc score :42 13278_f_at|13277_i_at||#F14F18.200#At5g12030 heat shock protein 17.6A;|COORDS: 3884680 3884210 atcgatctcggagataatttagggaggaagttagat ttgagagaaatgaagcgagaaagaggattagggtt taaaagattcttcgaattgaagaagagtgacgatgaaattgctgacttcatttcactttagtccgatgatc atataggtaggtagatgatattgatacttcacctcacaa A a GCCAT G AC t C a T G aatcctctgcgtttgcg tttcaaccgtattgacgcggttcaatcaattttatacactaaattcgaatgaaatcgtaccaaaccggata aatttaac atcctta CT aa ATCC aaaacagaatacatatgggcctgaaaacgaattggtcttgggcctgac cgaatcaatcgcttttctctacattcgtctagaaatatccatctacacttctccaacgatcaag ACGT t TC tcgaacgttagagaagcgcttttaagttctataacaccgaaacaatcgctctctctacattgctctagaac tatccaactcaacttctatttaatactttcttacatcaaaatcaaaatcaa tcaaaccaaagaaa A a GCCA a G aagcaagaaagttaacacaacagctaagaa score :36 13275_f_at||#F12M12.200#At3g46230 heat shock protein 17;|COORDS: 16861764 16861294 ttagattaattgttattaataacttgttcatcaaaccactaaaaat C c CGTGTC A T cttcgacttcttggt taaaattcaataaagagtgtaacttttcattgc tataacttaataatttgtttgtgagaagagaactctag tcttacagggaccaacaccaacaatcaaaatttagataatgaagaatagttgctgatgcatgattaagatt gaatttatcaacaaaagataagtgttcattata C A ACACGTG a T taattgcatggtgtattaaggcccatt aacgaagtccatggtaaaatgaaacg G C A TG G C g T tcactaccccacctaatgaact GCATG t C gtctcaa ccatc aacatagaagcttcttgaa GCC A C c TG agaaatctggtag CGACAC tcttgaa A GACACGT ta T aa agaaacggaaagaagaaacctgaaatttcaagaaacttgcagagctttctatctcttatcctcttctctac catcatttctccctataaatacgcca ACG c ACA taagtgtttgcattcgaagagagttctagcaaaacaaa acaaaacagagcaaacagagtaagcgaaacga score :252 132 02_at|20221_at|20222_g_at||#F21M11.18#At1g03890 putative cruciferin 12S seed storage protein;highly similar to Brassica napus cruciferin storage protein, gi 762919, arabidopsis 12S seed storage

PAGE 121

111 protein, gi|808937 and others. Location of ESTs YAY049 3' e nd, gb|Z26364 and YAY049 5' end, gb|Z26363|COORDS: 989251 990909 gctatataagttcattaatagatgctataggtttttcttacaaggcacacatttgattgttattttctttc atatacactgaat G T a CA t GT G TA CAC t TGGC AT AC a TG G C A agattatgtgttacaatatagactgtgcc att G c CAT GC a A t G T G AC t C c T G tggccatttctatcaca A t G T G T C A atcttggagtatccgttgtttat cctctaatttactgattaatttatgaacatgtataattatttatatcatatgatctcgtaagatatcttag cattttccaccatatgttattagtaaatcatctagatggattgatgtaaataggaaagttaaattaacaca ccaaaaaagtaactgattaaaagcatacaacttaatattcagattatggtaactaaatcagtctcatgcaa actccaaaaaattat A C G a GTC A caactcttgatttttttccggttaaacaaaatacatattttcatttgt atgcaaccagaataaaacactaactatctcctttaaataccattttccct ACG a GTC tacgacgctctcta aacttcttatacaaaacaaaacacacccaaat score :156 13201_at||#K5F14.10#At5g54740 2S storage protein like;|COORDS: 21532963 21532466 aa cttttcttggattatatgtaatatgtcatatctatgacctaagaaactattgagaattaatgataatat agcttatatacttggg A C G T A C A T G T a GC gcattaggataagatcacaatgatttatttatctaaatcaat ctattagt TGT g CGT ttacttccgcgatcaagctaatccgaagtagatgtcaattaccaaaaagaacttca aatgaaaatcgaaccaagaaaataaacgaattgacatatacctat accattgtaatgtaatataaattaag gtttgttgttggaattaacaaacaaaataaataaataaaaatacaagtcaagagcatgcatgctgttgtag atgcgga GA C t CGT c TC catcatt C AC t TGACAC a TG C A C acattactgaacttcgaacacctaatccaaa caaaactccattcgtgttatgttccaaatgctcacacacaacctcatgcttat T ACACGTG a A t G C CA T GC A aa TGT C ctatctcctc cataaatatattcattcccactttcacttca CT tc ATCC atccatcactcgcca aaaaccttgaaacctttaaaaaatcgaaacta score :258 13200_s_at|18295_s_at||#F21M11.19#At1g03880 putative cruciferin 12S seed storage protein;identical to 12S seed storage protein, gi 808937|COORDS: 985787 987917 cttctcattttccgttcgtcattactattttaaattactagtagaacagcaaacaaaacgagaatacacat tacttagagggtcactgataatatgatgtcttccaaaaaatgtttaatgccaaaagtagaaaaacgtaa T a CA t GTG G a GTGTCA ccccgatcaagagagctatttcccttaaaccaattg GTGTCA aattagttaattatc taaac CA t G g GTC gacaattataattcgatttcat tgtttattgaataaatcataaccaagactaaactgt agcgaatatgaaacaaattggtctacactatcattttgttacaaaatgttgaataatttattttaaaaaat aaataataaaatcttgaccaaaactaaaataatatcagacctaattatcgaatgtaagtatgttactgatt attggattattggattatgcgtctatttgttttcgattattggatttgtctatttttattggtgttatggt gttattt ttatttttaaaagaactctttacaacaagtctcctataaatacataaactccataacccacaaa gtaagaaagtaaatcaataaagaggaagaaaa score :30 12574_at|19019_i_at||#T2O9.120#At3g60140 beta glucosidase like protein;several beta glucosidases different species|COORDS: 22093783 22097740 aaaga TGACAC acatttgaaatcgtagccgtactacgcgaaat AC a TG c AC tcttcgttatgttaacactt taacagtga ACGT a GC cataatgttgaccacattcaacagtcaacacaaacattactttacacacaaatat atgattatatatacatat GT a CA t GT aagtgaatgtgagcaataatgacgggaatattcagagaagacgat ggtgaatgttagcagtg A t CAC G g G C A cattcaaaactgactg tggacaaaaaaag C t C c TGGC c T taaat atgattgtgccaaaaatagtacaaaactaagaacccaaaatggaattcgagacctatataataatatatat gtatatagtctttccttggaaagaaatcttatgttattaagaaaaatactataagttatctctctatctag atatgatatatatgtccaaacattt C CA C GT A G A tgacgtatattaccgaggataatcctctatataagga agagaagctcgagta ataaatctcatcactttgaaatctcaaaagacaccaagcaaaaaaacaaagacaga caaaaaaaaaagagagagagagagagagaaaa score :78 12185_at||#F14C21.37#At1g54860 unknown protein;|COORDS: 19731117 19730205

PAGE 122

112 aacact CGACAC a ACAC a T t C atggccatatatacaacatgctacattgaagtagcaatatatctcaacat aggatc T G A CA t GTG G C A ggttgagga AC a TG a CA tggtcggaggagga T ACACGTGT T G GTA a ACGTGGC C A gatttgtggtgatccgtatg AC a TG g AC aaacacaatgtctccttcacacactgacaaatttcttcgtt aagaagatttattaattaaaatttaattttactatataatcggataactataccttatatgaaaataaaaa aaaactataccttatatgtatatatacatatatattttaactcgaattt gttgttgtgcat GC A TGACACG TG a A G aggatgtgcattaat A CACGT c GT cttttggcaaaacgaaacctcctcgg T GCCA a GTG T t TG t CA C t T tagcaaattcttatcaagagaagctaattctagacctttaagatactataaacaaatagagtacagga ccaacaaaattacattctctttgcctttgcctatcgcttaattctgcttttattaccagagaaagcaagaa tatataagaatcagaaattgg gtgaagaagaa score :546 12085_at||#T1G11.19#At1g04560 unknown protein;similar to GB:AAC37469|COORDS: 1245071 1245889 cgttattaagataaaagagaaaaaaagaagttctagatactcggtggttaat C A C t TG T C ctctttcatta tccccaaccttcctcagataactttttaccaaatgaaaagaa AC t TG G C A aaattatgaaaat tttaagct aatatttcatatttaagcaatatgaataaatgatattctaccaat AC a TGGC tattttcattgat T C CA g G T G G attatcataagtaattaattgctgaccccaatatatggtagctttggaacatttattatgataaaaca taaggagataatataaagaaagcggtttggagattcgagctcaatgacaacactgg CT at ATCC aacactt taaaacaacgtca A GACACGTGTC C ca C CAC c TG t A agtagctaaacacaaaggtcctctt CG CA TG G CAC GTG C T C cttgcca ACG a ACA cataagctcggtgtatataactatgtct CAC a T a GC ccttccatatagatc atcaccgagcaagttag T t T ACGTG T T G ttgcaaagagaagaaaggtttcgagtagcaaaagagtaaaact agatctttaggagaaggtaaaagaagcaattg score :522 17869_at||#F8A5.20#At1g60680 a uxin induced protein, putative;similar to auxin induced protein(atb2) GI:6562980 from (Arabidopsis thaliana)|COORDS: 21560623 21559238 accattctctgtatagtcttagtgacctatcgccatcagcatgagttaagcgtcccagctgacttaggttt agagtttaggcacctaatggctaatgccaatgatgagtcctcccatca tatttgaacattttaccattagt tagagccaatgatctggaacctgcctacaagcttccaatgagacaaactat A g G TG A CA C aaattgtagat agaaacatgaagaccaaacactacctaataagacaaaccttgcctactcaagaagatcgacattactcatt gactcaagaccatgaaagaaaacaaatatcg A t T ACGTG aa G actactgaacacaaattga T G A C A C a T ct ttgaagaaatatcactta A A CACGT A C G tagttttttgtttttggttaaatctcatgtaagaaagatattt tgtaaaagcacttttgtaaaagacataggcaatttgagagtttttttaatcacaccactcatcaacccaac ttcaacaactactccactcagttctaaactctacaaaagagagaaagagagaaagcgagttgtgacataaa atccatggctgaagcttgcagagttaggagga score :90 17549_at||#T16N11.6 #At1g15550 gibberellin 3 beta hydroxylase, putative;similar to gibberellin 3 beta hydroxylase GI:3982753 from (Arabidopsis thaliana)|COORDS: 5346072 5344563 aagaaaaatgttggatatacttaattccaatgactgaggaaataaccaaatcaacataaacaaaacaaata attagactacaattttttttgtc tctttctttcttttgaatacttattgtccctttatatacgcattaatc actttcttaaatttatagattatattgtaatggatagatacggtttaacttctacatatgaacacaatata ccatattactacctcaacttcaatttgaattatacatagtttgacaacccat CA t GTGT atgtatcttatc ttgcataatcaaataagaaattatatttattttcttacaaaagaaggaaatggcgggacctctata atttt ctagaggtttggtt TGT t CGT attgtattaatccaataaacaagtagaataaccaataatattggaccaaa ACA ag TGC ttta ACA ta TGC ttttgcctcctcttggtctctactccaccactataaataaaactctctaac tcttccaatctcccatcaccaaacaccacacttctcataagaaaaaaaacacaaacatctatcaaatttac aaagttttaaaactaattaaaaaagagcaaga score :24 16896_s_at||#F13H10.19#At2g41260 late embryogenesis abundant M17 protein;identical to GB:AF076979|COORDS: 17155083 17156034 attgaaaaactgttcaatataaaaataaaatataatcaaaaaatatgaaaaagaatataattagtttctca agtgagaaactttaaaaaacgcataaaaatacttgtatatcatctattga tcaaatattattttattatta aactttaattatataactaatatttattaaaatactaatattttttttttttgagaaaccatgaaggtatt

PAGE 123

113 ttactgatagagatgtttttagctaaagctttaacttctttttttgaaaaaacaaaagctcgaattaggcg aaaaaaaactagaaaagataaaaaagatttcaaagaattaaagattctatataatctttctcaatcaattt atgagacttctcgatcatatct attagcatggattatagtgatcttttgatatgattttctttacatttta tatatgtgtatatatatatatatccccttaaataaatatgtcattaactcattatcttccctcactcgaca tagttgactggtttcttgtc GCCA a GT tttgctgcccaactataaatatgtgctgtgctcatgcactcttc aatataagtgttttgtgtaactaaaacaaaat score :6 18827_at||#MSJ1.5#At5g 64210 alternative oxidase 2 (sp O22049);|COORDS: 24979190 24977791 ga T t CACGTG C G C tctgtctgctgaacgccacttcaagattcggacaaacttaaaaaatgtgcattcgttt ttttaaaaatagttatcttaaaagtaaagcaacagttggatatg A GCCACGTGT A ttcagcagtat TC GC C g CG T a TC agctgataacgtcattttttatttccttcttgttt cccctactt ACAC t T t C tttcgagaaaaa acagagttgaatcattttctttcagtcagagaacaagcctcttatctgagaagtgaatctataagtatacg agaga GTGTCG taag A t G a GTCA actcattacgaaagcagctttacgagttctgcttgtctgcggcagagg aaactgtaa C A T GT t CG T gagttctgtttcctccacctctgttatgaaaagtccgtatgagataacggcac cgatgcgaattcat gactggtgcggcggttttggcgatttcaagatcggctctaa GCA t GT G C a A g GTG tg catatctgtagatgatcctcatacaaccttataacaattgtacatgataattaggatttgtatatttctaa aatttgaataggaaattttaacttgaggtgga score :324 13284_at||#T2E22.11#At3g12580 heat shock protein 70;identical to heat shock prot ein 70 GB:CAA05547 GI:3962377 (Arabidopsis thaliana)|COORDS: 3993698 3991496 agttttgagtcaaattgggtaaattttttggttattttggtcataaaaataactagattatctcttatatc ttatgagttaatttggtaaataaaccatttatttgggtcaaactatttttttccccatatatatatccaat caataataaattcataatatatttcattaacgcga ttgaaatactagtaattaattgaggactaaagaaaa agtaatttcctttttatctttaaaatgtgcaaaaaaaacaaaaatgttaattgggtgatgaaataacttgt tttcaaaacgggagttactatttgacaatttaaaaaagacccatctcgaaggagctagaagcgataacaaa ataaaaaggaaacaatagtaattagatggcgcaaaaataagatccaacggctgagatctt TA C t CGTG a AC gttctcg aaagctctttgccgacccactcttcattcatatataaacaaacacctctctgccttctcttcct cacacaatcataaacacaacaacactcacaaattctcttaaagctcacagacgaattctttctatttttaa tctttccggcgaacaattctgatctctaataa score :18 19466_s_at||#F14O23.6#At1g71695 peroxidase ATP4a;identical to GB:CAA67309 GI:1429213 from (Arabidopsis thaliana)|COORDS: 26175777 26177975 agaagcactaaaacaatactctttttaaccggttaagtatttgaaatagtgcacacaaaaataactaaaaa gtttcgtgttactctaattagtagtagtactatatatataaccattgtcaatatagtctaactgatataat aggtggaccac TG t CA CG a AC t TG aatgccagcgggaaaaaaaaata tactcgatcg CACGT ATT tatttg ctttaatgtttgatccgaaaaagaatctctttggtaagaaaaaaagaagcctttttctattgacaatagta attggtacgttgaacacagggtcacaggaacgac CG a ACG a G accagacgcaaagctt C C AC t TG G A tatt ttttttcctttttgaaaagcaaagaaatatactctcctttaatcacaataaatatatagaaggaaaagact tcagtattcctttgccttt taacataagaaaaatatatatacttaaaagatagaaataagacacacaagga ctgaaggaccaaaattgcctataaaagcaaagtccatctattccacaagaacaagaaaaggcagagaagag agaggctctgataaaaaaaaaaacaaaaaaaa score :60 14888_at||#F26P21.40#At4g32920 putative protein;|COORDS: 14860495 14852667 aagaaagag ttgtgatgacagttcttacaaaaatagagattcatttggtaaaaattggatagaagatattt tattagcatttccccttttgtttttc G T t CA a GT G aaggtttcaacaaaattaaagagaagatctcttaat cacaacaattaaagtaaagtttaaagactaaaaacaaagcctaaataaatatcagtttggtggtattaaac t AG CACGTGTCA A aaactaattggtagacaaaaaaaaaaagtct CACGTGT A acgcgaaagcctttctaaa aaaatttcttcagcatttaccaaactaaagcgcgcctctcgcttccttccttcaacctcttctttgctcat ctctctctctctacaagaccaaacttatctttttttttctcttctctctgtctttttacacacagagctct

PAGE 124

114 gttttcaaagctctgttctgtttcttcgtgtaatctctgtttcttcaatctgttttttttcttctaatttt gaattctcgtacagacgcaacggt tatattgttgtttagcctctttgttcttcgaatccaatgtcttcgtt gttttgtgtaaccatggattctcatctctaga score :282 12651_at||#F4C21.12#At4g03200 predicted protein of unknown function;|COORDS: 1407056 1411566 aaaatatattttatctttggtaccaaatatcaattcatattgttaccaatcaaatgttacaactga atctc ttcaaacaatcaacgtttt T gg ACGTG aaaaacatttttatgattaaagcttcgaatggcaactcattatc aaaacggtacagtttatacagttttgacaattatcatttacaaaaagaatttcaaaaacgttacaattttt ttttattttcaatttttttatacagtttttttttacagttttgcattattctataacagtttaccgcgtca gttaacaacttatcgctctagctaacagctaaagtttt gttaccaatcagactcttaaatcttaagaaatg gagtaatccatctcagtgcat A a G T G T C A caaaatgatcga GAC t CGT tggttacaaactctttatcacat ttgaagaagcttaacttaagttgttttgattttcgagaaaccgatagagtaacaaagaagttaacatccca attaattatgttcctaagctttattttctta GTGTCG gattggttagattttctcatagacgctattccgc taaaaaacta tgactctacacaattacacaag score :42 15849_at||#F22D22.16#At2g32090 unknown protein;|COORDS: 13593544 13594150 ggagccg A T GCCACGT A c T tttactaatctcttttttgtcgaccatttttgtttctgagttgttcttcttg attagctttgtccgcttggaaccaa AC a TGGC gagtcttggccatatagccagagaatcatcagacatcac a CGCC t CG cccagttttacaaagaggtaa C GACAC c T aatcgacgttttcaatttcttacacttttctttt tcttatctctcttcgtattctctgttctgggtttgtgttgttactaccaatttcgacccaaaaaaacaaac gcactgagtcgattacccatctggattagaatctatggtttgtaatttggctttttatttagagatttgtg ttgctattgtggctttgtctctttctct TG t CAC t T at C t CGT t TC tagtttctattgagagtgtccactt ttggatgcctttaatccttttgattaagttagtgttatcttcatcttcgtggtttgttctgaaaccaatgg attatatatgtataat T G t CA g GT G T ttgggttcgaggagatcgaaagtcctgattttggagacc T a CA g G TG G tgtggctaaactta CCA g GTG cttttgca score :144 15395_at||#T3P18.17#At1g62610 similar t o glucose 1 dehydrogenase (AB000617); similar to EST gb T88100;similar to oxidoreductase like protein GB:CAB75763 GI:7019662 from (Arabidopsis thaliana)|COORDS: 22393873 22392950 caatcccaaactccttagcgaaatctttcaaatacgccaaaacttcgcc ATG a C t CG gaaacctcctcgga tctc tagacacaccggaacgaacca CG a ACG g G aaatct CT gt ATCC ggtacact C t CGTG TAC cgttgat tcgtagagatctata GAC g C t TG aatgaacaacggatcgggtcgggtcaacacttaacggatccgattcga cttcatcggtgtagatccaggttcctccgacttgtttttgtttctcgaaaacgaccaccgagtgaccttct cgtcgaagct C t CGTG c C G ctactagacctgcaggtccagcaccgat aac C GCCACGTGG T G tgatctggt tggtgataa TG CA g GTG C C A tt TG tgtgttctttgattagaatctatgaatcacaatcacacacagttgtt acaatttatgattacaatgtttttaataatcaatttggtgtgttagaaa C tt ACACG aatctgttattaca gacgtaga T T GCCACGT t TT ggttgatagagagaaatgcttttattatataa C tt GTGTC A ttgtatttct tctccaccgagacagtgaa acatcaatcaata score :396 12606_at||#T24H24.16#At4g04020 putative fibrillin;|COORDS: 1931165 1932550 aatttttaatgggtcttctttgttaataaaggcccattaaatatactatccaggcccatttagtccaacta cttattacactgttgattaatcgccgtttagatattttgcattggagagaatcttcaccgttcagaaaact gaagatcg aagatcaccaaaaccggccaaaaagtata G TA CACGT c C G aagatt CT ag ATCC atcat GCCA tt TG catt G C A TG G C c T aactctcgtagaaaacccaaacaaattctcccccacacaaataatgggacaaaa accaaattaaaccaaattgtttccctattgaagaaaataaaatcttccacaaattttatgaatttttcata aagtcggcaacttc A CACG a GTC CA g GT c G attcttaaaacaatgatcttt aacca T C CACGTGTCA A aat cccaccttagcttacctcatttgttctattccaaatacctattctatcccttggatcttgacttatatgtt cataa G a A a GTGT accagataagggtatatacgtcatttacagtccagctcaacggttataaaatcctaat tcgtgaggtttctctgaagaacaaacacaaaa

PAGE 125

115 score :336 13631_at||#F4L23.17#At2g45290 putative transke tolase precursor;|COORDS: 18621277 18624129 tcggcccaattgtttcataaagcccaaatgggcaaatattcggtacaaaagaaggtcataagaaggcccac taattgcaaacataacaagaaaaaatgagcaaaataa A GACACGTG CT at GC a ACGTG G gatgttgagaga aaaaaaggtttcgtcagctacgccgtcaccgactgggtcaataccggagttggtgaattggtgactaa aat gtatttcctaaatctaaagcaattttgcccccaaaaaaaaaaaaaatctaaagtaattaggttcttcacca acccttgtcataac C a TGGC t T ttgcttaacacaccaaccatgccgcattgtcaatctattttttacttaa attaaattattattttaataattagcgtaaaaagcaatgattagttgaaaattaaacaaaagaattagatt cgaaaaattgttaaagccatagagcatcacagccgtcaag attgtgtccttt CT ga ATCC aacggtcagat cgagatctcattggtttattaatagcaacagtctcctaaaataggtttgcagagaagctttctcttctatt ttctctctctcagttaaccagagagagcatca score :240 17452_g_at||#T19F6.110#At4g24120 putative protein;various predicted proteins|COORDS: 11489103 11491521 a acttcgaagtcaaaactagctataatggttatctttgaaaagtaactcaaatataaggagagtagaaatt C t A C A CGT A a T gctgtc CG a G a CAT ttaattatggacgatgtgattgcagaaacaattttgttttcaaacc ctctcttgatcaaaaaggacatttaaatcattgcatccacttcaaccac ACAC a T t C A tt TGGC tgtaata acatttcaaagattcttacaaaacaagtgaaatgtcattaaatg tccttttagaacaagatgctaacgaca at C t AC t TG G CC A C t TG a C ttgtataccaaatttctcccaactttttgttcacagaatttgcgtttacaac tttggcgtttgtgttttgacggattcgtctaagtgggacaa T G C g CGTGTC G GA C AT G C gcacttaagagg gtagagacagtgagttctttagctttccttatattctctttcaaaccaaaaaagagactactattttttgt acacttagactaagga aagcaaagtttcatattcctaaaaattctggttttataagtatcaaagacaaatt cagtctccatggaaatagagcaaagaaggatc score :174 16003_s_at||#F20O9.60#At4g28390 ADP,ATP carrier like protein;ADP,ATP carrier protein rice, PIR2:S33630|COORDS: 13007275 13005980 cccagaaa GA a A C G g G TC A aag gagtccagtagcattttcgtaattgtaccttctcttctgctcagtttcg tcttttagtttaggaccagaagtcgagaatcatgtagaaacctactcttctcgcaattttccacagctctc tctctctctctcgttaacgcttcggcgaaaaatccacggtaagctctcgactccagaagttctagctttat cattaatgcttctctcctagctctctctgtttttttgttctgc T t C t CGTG aattctgttttctc gacg C c A C t CGT t GC ttgatcttttagaaacagaggatttaattgagctgctgcaactaggttttaatttgtgctaa ttaggtcgcattatagttgacagattgaaatggtgtctttaacgaagctaactaggatcgtatgaaatgtg ttgatgatgtcttctctgtttttgttccagttttggttatgatcattcttttttaagcatgcaaatataga aattttcgagaaagcaagattttgtggtgtttaattt gtatttttgttttgttttgttttgttttgttttt cagagtcagcttttactgatttgtacaaaaaa score :36 20190_at||#T24I21.7#At2g16660 nodulin like protein;|COORDS: 7170144 7167482 tacgac CG t ACG a G tatccaaaaccctaattttttttttctccaaaccccaattttagtacccaaacatcc aattcgtatgattgctaaaaaaaaaa aaaac A CAC a TG C A ttcgtatcagctagtaaattacacgatctct ctgtatctaaatttttccggatactaaaaaattgcagaattgatcggaattaagtaatactagatctaacc gtatataccagaagatgcatcaattc CCA a GT t G gccacatacaattatttcaaacgaccggaacaaaaaa aagaaaaagaccaaaccatt A g CACGT c GT attttaatattcttttgtcataata C at ACACG gtttat ac tttatacat ACAC c T a C tttaacc C G T A C AT G T C ggcgtcgtacctttcctggtctcattccacatcttat aattaaaatatacaacttgctaattgttactaattgttcacaaaaatgtcaaaacctaattaataaaacaa aacacgcaaagcataaaaaccctttaccttttctcttctataaaaccgtccat A CG g ACA a G ctccacgac ccacac ACAC t T t C tcatcttctttctacaaa score :1 20 20519_at||#T25N20.21#At1g05560 UDP glucose:indole 3 acetate beta D glucosyltransferase, putative;similar to UDP glucose:indole 3 acetate

PAGE 126

116 beta D glucosyltransferase GI:2149127 from (Arabidopsis thaliana)|COORDS: 1647234 1645675 tattgtttccttcccaagta atctatagttagaaagttcctgtaactttcatagatatatgttaattattt tccatttccataagttcgcgtccattataatagaaagtgcat GA g ACGT a TC tttttcatatggaagacga aaacattggttctccattctaaaagtcaaaaagaaaatgtatat C AC t TG a A C ttttccacataggtcatg ataaagacaggactaatcaatcattcattgttgataatggtctcaattacctgttgctagtta tttctaca tttgtatagtgcaatactgtttttttttcctcttccaaataaaaactacagtactaaaagaaagttggtaa tcataaaagttataaaactcaaataaaagcaattactctgtttctttttcactcttattttatatgactga aatgaaattgtctatatgttgtttttgtttgtatgttcattgatatgtaaacatcaaatctgcaaaaaaca accgcaagtttggatttatacgatgaagcgtttat ctagtgagcgtcatcgcagacgtcacaagttatgtg gtacaattttgcgtatgtttgatgaatacgca score :24 13217_s_at||#F18B3.50#At3g50770 calmodulin like protein;flagellar calmodulin Naegleria gruberi, PID:g458232|COORDS: 18751014 18751631 atccgaaatgaatgaac CA aa TGTC catagttctg tttttttttcctcctaaagtgaatatattaagagac actaaattctagaaaatatgtttaaaatataataatcagtaattgtccaaaaaaatgtgaatacttaaatc aaaatcaggatacgattgtcacaagaaacaaactttcctagataatgtatattttatattattatcatatt atgtatgcacttaagacat CT cc ATCC atgagaaacctacaaagtttttcaaacaaaaaaaatattaatat tatatta taatttgattatttttattaaaaaagtatttttgttaaaaaattaaaccaatagtaagatgaga attgtcatgatgggttgtacaaagtatctcagagtatctcagagtttctcacttgagaaactttctacact ctctctccttcatttttattatttttatttttttaattgtgagaaattcttatgagatacccacaatagag atggtcttataaatttat C a AC a TGG tgaatctctcatgttatatataga ggtgatttaaaggctaaatag aataacacactacagcatataaactcaatgat score :18 18714_at||#MWD22.16#At5g51210 oleosin like;|COORDS: 20113997 20114532 atgaaagccaaaagttttaccattcacacttttaatgcttttctttttcacaacaatcttaactattctct gtttactattcacacctttaagttttaccgttcacacctttgagg cgaagagctactaattcacaacaatc ttaactattcacacctttgagttgaagagctactaatactattctctgtga G CCA a GT t G ttttgggccga ttggcttaatggatgtggattccgaacttccttttaattctttgcttctatagttctatagagtgggggaa tgggagtgaccataacatcacgaccgatcgataataacgtc C A t CACGTG a C T t ATG t C c CGGTC CA t GT t G tgtactatagcaaaaa aaaaaaagacccaga C G T ACGTGTC G caaaaaagacccatctctactcatta A C AC a TG C A acggctgcatgaa GACAC t TT TGACACGTG A C A T GGC tcaagcgagccccaccagcgctcctct ccttcataaatcatcattacttactccgaaaacacatcactaccttttctctgctaatttctactatttgt gttggagggatatattataataaaaagaacca score :480 14978_at||#F18O 19.38#At2g43820 putative glucosyltransferase;|COORDS: 18100822 18102258 ttctccatttcggctctctcttattttttttccatctcttttacttctccaaataataacaataaaagctt cgattttgtgtgtgtttgtatttacatcttgacatcgatattcttttcatcaattttttaccaaaaatgta ataaaaacaaaaaaaaaccaacgctgaacacagacat ggttt CT cc ATCC gtttatattcatcgtttgtat gtttacttaacaacttatttcaaaatagtacatatcatggttgtgtttttaaaaaaagtatacagaacaga aaag CAC a TGG tagacaaaataatgaagccaaaattaatacaaagaagaagttcaacttgtatttattaac acattttctttccttgtcaaa G a CATGC aaattggttttgttttcttattcccattttttttttataataa aaagaagaa gagtaaaacaaaaaaactatcatttcttcttatcgcaaaactcttatctaagcaagaaaccg acaaaacctatatctacatatattctcatcaacatctcttgagacatattcattttggttaaagcaaaaga ttttaagagagaaagggggagaagtgagagag score :18 15252_g_at|15251_at||#T4P13.21#At3g01100 unknown protein;similar to HYP1 GB: CAA55187 from (Arabidopsis thaliana)|COORDS: 37228 35019 ggcagcataagagtgtttagcttt GCA tc TGT ggttgggatcttcatacttcttccagtgaattatatggg aacagagtttgaagagtttttcgacctcccaaagaagtctatggataacttcagtatctctaatgttaacg atggatcaaataagtaaagaactcgaagctatgcttaaacagtatcccgatttct A C t TGT C CG T aaccga

PAGE 127

117 tctcttatgctt G t A t GTGT tttgcttcaggctgtggatccacttttgcgcaatatacatcttcacggcgg ttgtttgttctctactttactatgtaagacgccttttgttttgtccatccattctactttatatcgcctat atgggaacataaaataatgtttggactatctttgcaggagcacaagtacattttaacaaagcggattgctc atctttattcatccaagcctcagccac aagaatttacagttctggtcagtggcgtccctcttgtatccggg aacagtattagtgagacagtagaaaactttttcagggaatatcattcttc C t CGT a TC tttctcatatagt tgttcatcggaca GACA a GT tgaaagttctca score :42 16425_at||#MLN1.4#At5g44120 legumin like protein;|COORDS: 17052073 17050287 agctgtttcctg T C GC CACGT c TT ttcactcgatcgtctttttatagtttaccattatctaataatgcgta taaactttttatcatttg CA CGT A TC aataattgatcatcaaagactttcgtaatcgtaacataaaacatt ttctcaattcgt A t GTG a CA gttttatatatatatatatatatatattacgataataaaataaaataaaca atatgacctattacaaatacaaaaacagagaaatgaaaccgctgtatataataaaataa agatttgtccta ttacaaatacaatgtgcctatctcaaaagctgatgtgtaagaa AC a TGC AC t TG a A t A a G C CA T G C aaatt gaa A t G T G T C A actccatttattttttacagagtgaagccaaaattcattttc GGAT ga AG tcataaatag caatttaagtgaagtgtaaattgtacatagtcgactctatatacctggttcttatctcattcaatttatcc tcaacaactttaatagaaaaatatcaaataa attccctataaatagcttcacataa TG C A a GTG agaaacc acaaaaagtaagaaatataagaaataacaaaa score :180 15621_f_at||#T26C19.10#At2g22240 putative myo inositol 1 phosphate synthase;|COORDS: 9402485 9400448 tgcaaaaatggcgaaaaatattttaccatctcctgtggtaaaaaaaaaacatttaccatctccag tacaaa gtaagtcaataattattgttgaaggacctacgcactactactaggccctagttcaaaaattggtctct CA C G c G c C AT cac GA a ACGTG A C A C cacatct T T GCCACGTGTC CA tacagcttccaaacaacagatcg C g C t T GGC ac CC A C G C GT t TC tcagtggccaaacgatgtcgtttaactgcagtgtcccacgacaacaaccgcttct tcttcctcatccct C gg GTGTC ttcgagtgtgttcac agatttacaaagaa AC t TGT C tt CACGT ta A tc A C GCCACGTGTC TG aacccaatttc C a ACGTGTC T ctcacattctcgaattcgactcttttagtggttctcg tcctctctgggcttttagactgggctcgaaccatctccattccaccgataacatcgaaacc C GCCACGTGT C C tccctcctctgtctctcgtctatatatccacgct C t TGT t CG T cgattcgagatcacacacacaaccac caaacacaa agcttttcaaaacccagagaaaa score :1092 18508_at||#F14H20.8#At2g02010 putative glutamate decarboxylase;|COORDS: 475494 473374 tacaacaaaaagtcaatgtcctttagtggtgataccaatataagttcgaatttggtttctagcttattggt ataaattccattgacagcaaaaaggaatcacgaaaacaatataatctagtactatat tcacattttacgcg taacgcgtttgtaatattcacataattat G t A a GTGT gaaattgatcaaatcgcgggagatgctaaataaa caaaatcaaaatcactaataaaat GT t CACG ataacttagaaagatccaataaaaagatgacggtcggaat ttgtatacgatagtaataaaagtattcgtgtttctaaaagttagtaattcattggtctaaatattataatg aggaaaagtcattggtaaggaaaaggttt gaaaaaggagcacataaataaggaccagcgatgcttccttt C ACG at GC aaaacacgaaatgaaga G ACAC a T t C tctctactatatataccatttgctcctcttcctctaat tcc ACAC a T t C ttcttcaactacaaaacaaaaacactact ACAC a T t C tttcctcttgttcatttcaaacc caaaaatcaaagttcgaagttgaaagaagaca score :42 16859_at||#F13M23.140#At4g25 000 alpha amylase like protein;alpha amylase, Vigna mungo, PIR2:S10514|COORDS: 11818323 11816607 tattagtccggctaattaaaaaggaaatgaaaattcaaagtaagttagataaacatgatcattcacaggtc agatgttttaaaaaaaaatcattatggtgtacat CAC a TG t A gacaatacttcagaattcatctggactac cagaattgag ttacctagtacttctcaattctattttaccctaacgtctaataaataacaagtactctagc ctcttcgttttatgattcctctaggaaaagttaatgttacggcccaatcactttttttaacagcccaaaca acatatattagctccaaatatcattttttcccctagaatattctcaacctattgtccactc AA a ACGTG A C A A a TG gaggtctaaagggagaccatact TGAC t C a T tttagagctaggatcag acagagtagattttttgc cataactccttgtaaatgtattcacatttcattcccaagaaaaatagactgatgaagaaatatatcagata

PAGE 128

118 tgacaaggc C GTG T C G tttaggttacgtaactctacaaggtttagggtctcaatataaacacacaaagcag atagaagaagcaaaccattcacaatcagacaa score :78 16450_s_at||#F24M12.20#At3g50980 dehydrin like pr otein;dehydrin Xero2 Arabiodopsis thaliana, EMBL:U19536|COORDS: 18819061 18819748 tttaagtatttgtgtggtgtaataatatgatgatgtggctttctttcattgagtcgctaaataatatatgt aagtgatggatctttgaattaatataaaatatctacttctggtttcgtttatttatctttctatcgcttca actattccatgttattaaaatacttgt gagttg T GA C A C a T a C a A c CACGT ATA att CAC a TG t A attatt tggtaattcaattctttacaagtttatatgttcttttgcatacaagttatcagggaaacgacc GCA gt TGT ttggcaatgtatattactcaggttttcaattcaggcccatacggcccaaataagttgtttttacggatatc gagttaaaagaggt GTGTCG attata A CGG ACA T GTGTC A cactaaccacaaaaccaacgtcttacgtaa T C GCCACGTGT G T C A C t T caacgcaatttgataataatcataatactatatgtgattatgatccctatccaa agttgcggttctataaaaataccattcaagagcgtatacat CT tt ATCC atcagtgtcttaattttgtcta tagctaaaaaaaggttcaattgtaaaaggaat score :390 13134_s_at||#T8I13.5#At2g47180 putative galactinol synthase;|COOR DS: 19318905 19317582 tagaaaagtaacaagaattgaaattttcgacttttaaaagtgtttctctttagtggttatggaaaatgaga tctttgtagatagagattactccttctgataattcttcaatacagtgttatctacactgtttctttcccgt atggtttcagtatcttgttaaaatacattagcaaatgactttaataaattagcaaaaataagtatgactta gaaaaaaaaaggtgtgtat taaactaaaatgcaagaaagaggactag G g A a GTGT acaatgcac C t CGT a G C agggaataacatgttctgatttcccacaaaacaccgtagctagctagctgcttctagcgttttctccgta taaccgtttggtttaaatttctccggtacggtcccactttccagaactttcacggtttcaccaaacgccaa tcaccaccgtctgttctccta C t T ACACGTGTAC G T a TC agatttcacgatcgtaacccatc caagcgcct atattaaccctgtttatcccgctaccgtaagtcctcatcagcaaacacaaacacacgactagataaact CT aa ATCC tcacagca G A t CACGTG C T aatcaca score :234 17642_at||#T4B21.10#At4g04750 putative sugar transporter;|COORDS: 2417114 2421628 atttctcatgttgaaagttactt AC t TG a AC ttggacattgaac atatatagatgaatatccaaaacttaa cacaccaatgattttagaat T t CA t GTGTC tctcttattgcattatttaagaaaatt TG c CAC t T atgata tgaactaacaccagtactaattccccaaacatatttattaaccaattattttcacgcatttatttgaagat tcgaattatgatttttgtcttttcttttgagtaagtttttttttttttttttgaacaaagagtaagtaata cttaaaaacaattcca tgtccataaaattttactttatcaaaaaagaatactattaccataaaagaggtaa cacaatctgtgttttaagttccactacttcaggagaccaacaatcttccaaactcaaatacaaaaaggttt gagactttttccaagatttgtctcttctttctttattctacctctaaaacaaaaatcctaatcgagttctc tataatattcaataaagattataattcttcttttatcttttgataaaacttgtgct T t C A t GTG attgtgt ttcactttgatttataatctcagattccaaaa score :60

PAGE 129

119 APPENDIX I REPRESSED ABA DEPENDENT GENES 19216_at||#T19C21.20#At2g38310 unknown protein;|COORDS: 15998795 15999418 ta TATC t AG ccat ACTA t GA aataacaattgaagaacttctaagttgaaaga ATTC t C a C tttgtatgttt tgatgaatttgatcttacattccaatcgttccaaatatcgaaactcttaaagcgtattcaatcaa acgaat ctcgtc CT a GATA tacctt GGTC a T T TCAAG aa G aagagaaaaaagttatggtcaagaaactaaaa G TC T A AA C ccatattataattctagtgttgatatacgagaattatatattggt CACT t GC ccaattaaaaatatcg tgtatagaaaacagtca AG t C a ACA actata G C a A GGG GC aaaaccgtaatttca CA a CA a GCA a CTTGC t cggtttttt CG t TA t CACCA c TC a C atgaactctgca ttaaaaactctatctctctcaaatcgaaa G g C a C AGC c C a A cttttcgcaagtcgctgtaaagtttgatttgcttctttttatatacacacatacttctcctcca tacactttcctcttcaatcctcagttttttttctaag C CCT AAT A ccatctcaaagaagagatcaagattt gaaa TCAAG aa G acaccattactcagatcaac score :138 16991_at||#L73G19.10#At4g25630 f ibrillarin like protein;probable fibrillarin (Sb21) mRNA, Picea mariana, AF051216|COORDS: 12038736 12040702 c AG a C t ACA gaagaaaattatttatgagaacttgtaatgttagagtgg ACCT c G T ATAA a C taatta TGT g G g CT tttaccataaactatttatgaaaattattatggcccacaccactataactaa AG c C CA CA T a T ttag ca gcccagtttcattgtaagagacatgttc GC t CTG g A actagaattttctggtttttgggtatttgtttt cttatgtgtagagaaatga TG g TA a CG attaaatgttgtgtattacaatttacaatg GT a AG a CG attaat atatttacacacaattttgttgttgctgtaacacgttagtgtgtgtgatgatagaatttcataaagcttta A CTACGA G GGGC aaaatgttaattctaaatagttgacagcagaaa aagatatgtatacataa TAT a AGG at taaaacgtaaataataataaat AA g GCG a G ttaaattaaaaccctgttaaaa CCCT a G C TTGA a A cacatg tataaaa ACA CT TGC GA G C G C AG CT tcatcgccatcg CC A T TC t C t C tctcatcaaaagctttt C tc CTTG A ttt TC GC A T T C tttaga G T CT t A AC gcaaag score :198 14383_at||#F14J9.12#At1g09460 unknown pr otein;similar to beta 1,3 glucanase GB:AAD22663, location of EST 192N12T7, gb R90355|COORDS: 3053905 3054546 aaacagctaacaaaagcttta TGT t G t CT gcttttttgttttttcaaaatctttccgctttgttgatttct ttaaatctttgtgggttttgctgagctaagcaaaagatcatccattgtattaattaatctcatgatcttaa tct g CAGC a C c A gtgtctcttctttccttttctcctttttggataagagaaataaaaataccaatttgcag cttcttt CTTGA a A actttggagagctttatgcatatgcttatgtctgaaacggatctgacaaaaatgggt ttttctctttcttgttttttttctggctagcataaggtaaaagcttatatctttgcacacaaaaatataaa gaacagatgag ATTAG t C agaggccaaatccaaaagacttcatagc aaagtaaacatcttttgcctacttt cttctttttttctcacttagt C A CT t GC A tgtagagtacagtcagcatcttctctctttttctccccca AT TCTCT C AAG agaatctctaagcttttgttcttcttacaaatcatgtacttataataaacccaagaaagaag aagaagaaaaacaaaa CCA a TC c C aagctgtg score :54 19930_at||#MAH20.18#At5g08620 RNA helicase (emb CAA09212.1);|COORDS: 2794539 2797547 ctgg T g G A GC T G gccaatga GTT t A G c C g CAGC atggga CT TG AC A G c C ca CC a GCT a T ccccaaaaatgt tcttg GCAAG a T gggtctcaaaaacgttcctggtcttagaaccaagtagattttcttttt TC c TAGT gatc tgctttggttttaaaactttaatgttttgtaa GT gt AGAC g TGT a G t CT cttgtc A tt CTTGC tacact ct taagacttcctaactaaaaaatcctataaa AGT t GC C T TC ttataaa GGTC t TT tgtttgtaatggcttgt

PAGE 130

120 taacatgagtatttcgattttcactaaaaaaacaaaaaaagtattaagttg G a TTA t AC gtgtttgaagat ttaacaaaacttttttctgatataaa ACA a C a CG tttcaaagcaaagggagcaactacat GCCGT t T tgta tatttgttgacattttgg G C C A C T CGCA TCT gtacccttaa gccctaaattgtgtttcctacaaa ACCT tt TA aa CCCTA aa A taaccgacg G c G g GAAT ttggtagt T g CAG a GC cttacagttaa AG G C A t A C GC ttcac cgttctcttccgtcttttctccactcgttggc score :204 17361_s_at||#F28M11.40#At4g10120 sucrose phosphate synthase like protein;sucrose phosphate synthase, Zea mays, PIR2:JQ1329|COORDS: 5279508 5284260 ttggtcccgaacaaagacgttgaagttcagtgagagattgaga AGG c A g AC tcgagaaa A c ATG A GG C tat agaaacaa G t CTAAT gggaacaaatgaagaaaaaggccttgtgccactcattattctctatctccacaaaa gaaaagaagagacaaa CTTGA a A aaggaaatgggtgattcttattcttttagagaatct G at AATGC aaa a aaacttactaaaaaagaagattctctttgagattaggacagaatatttgtctttccagagtttaacggtta taacatgacgttgtaatatgccatctgagagatgataatacaaataacaagaaccacaccaaagttaagca aatatttgaaattaaa T A TC A AG aaataaaagagtaaagagaaaaaggagggaccttttgtatgagattat aaaatgcaaagctaagttgatatacaaaact AGGC ct CT ctt ccaccctatagctgaaggagattcggtat gtaaaatgccgaaaaatcgaatttgttccttttatctattgctgtgtgtatagatgtttatgtaaattttg tcaaaggagtt T t CAG t GC aacaatcagagag score :66 18583_at||#F4F15.250#At3g52140 putative protein;150 kD protein cluA Dictyostelium discoideum,PID:g2281117|C OORDS: 19210259 19218322 tcttcgatcaaactcacaaaaattataacatatcacc GGTTC C TC c TAG ctacaaatattaaacccataat aataatattgtggtcagtgatt ATTC t C a C ctactgtgggtatctaa AA a CTTGC ttaaaattctgatgat ttgcaataaattt CGCA t CT tt C CC GC TT T aaaaaagcccaaacaaagcccattagaaacccaaaaagga G g GA t TGG tttgttatt ctctgtatctctc TC c TAGT att C A GC T C t A acacatagagaggagtcagaatat C a TC A GCCGCCT ctttca C t TCA a CC ctccttaaa CC T CAT c T ctttcctactccgc CGC t CT c T ccgtcg tctctcccgttactccatttctctcgacccctccaggtaattccttttcagatcctctactagacccttta cttattttagtgaatt GGTC t TT ctccatttccatctatcccttacgtatctgtgactt tttccccccttt tggatt GT a T t CC TCAT c T tcctctgtttccgtcaaatgaatgatctttctgaaattcaatcacttttgta tcgtttctgaatctttcct C TA at AGG T gaaa score :144 15842_at||#F15K20.6#At2g27840 unknown protein;|COORDS: 11811137 11812249 ag C c GCTC c A ttatttttgtttgtttgaaacaaactctaaaaccctaaat tcttct T c TCAAG ca GCC a CA T c T tcctcttt CACT a GC tatggagttttggggtatgtctctctcaaaatcatatctttattaagttctac tgaatacctcaatgactcctattgactgttttcgccattaaagttaatcattttg GT a T t CCT aagttcgt tctgtgattaaaaatct GGTC t TT atacaa AA g CTTGC ctttcttctgtattga ATTC t C t C tattac TCA at AC G CATT tt C ttgtgttcaa aactatacaattgtttgggaatat T a GC a T TC t TAGT gtttaggtttca atgcatctatctttgtgaggttatgtttcaaagtatctatctttaagagtttatgtttcaaagcatctatc tttgtgagtttaggtgtcaaagtatctatatctttgtgagtttaggtgtcaaagcatctatctttgtgaga ttttgtatgaatttgtactcaaatgtgtgatgaataatgttcaattttagg TATC g AG attaa GC c AGGG a agccatttaaggtgatacaaaaagatggattc score :102 16541_s_at||#T26J12.13#At1g23090 putative sulphate transporter protein;strongly similar to GB:BAA75015, location of EST gb W43788 and gb|N96564|COORDS: 8188952 8185236 ttgttatattattttctacaaaaataaaatca ttcaactaaaatgttctatt T G TC AG t C G cttcacaaaa ctatgcatctaccataccagtgatcataattataaaactctgtttttgtttttt GTCT tt AC aaccactta aatacgtcatagctatatatgcattattacaagt T c TCAA G GT G G CGC G c GAAT gttttaattaaaatatt atgtttttcttatgtatttcatctttaatcgaacatagaaaatgtgaaagatttggagcatagacacgaat cagt tttttttttttacacgaatcaatat T ca TAGGG taaataatagacaaacgccaacaaaaaaaaaaaa gagaaat TCA ca ACG gattat C g GCTC a A attacaaaacaaaaaaaaaatacag AG a C a ACA gtatacgtg taactttccgaatttttttttttttcatatcaaaaataaaagataggccaagataagaccattataaaat G

PAGE 131

121 TAAA GAC C gacccgaaataatactcatgtttaaattatcag C AA AG a GC caagagaggaaacatcacacat tggtgacaaaaattctt AGG a A a AC caattaa score :114 16053_i_at|16054_s_at||#F22D16.8#At1g02920 glutathione S transferase, putative;similar to glutathione S transferase GI:860955 from (Hyoscyamus muticus)|COORDS: 659708 658889 att G t GA a TGG atattgaaacttatgagttcgat C C A C T CGC A T caatacttacatgg GT a TAA t C catta aatctaaagacgga C TA T a AGG T t G t G AA T GC a A gcaag GCG AAG C T agagaaggtt CT a GATA gatttat tacgaagaacgattcaaaaagaatag C tt CTT G ACTAAT aataacttcacatt G CC T CAT CT c CTTGA ttc aatgatccaatttgaacatccaacgatccatatgttttcaacaacgtcggt caactttgactagtatacat taaaagactattct TGT t G t CTTATC c AG ccatggttgatattttctccgattatattgttttataagtta tatgaacagtatatgaaaatcttgaaccaatagaaacgacgaatcatacgttact C ag CTTG A CTA t GA aa taacaaagagtattcaa GC t T G G T G G CG C CGT t T gttttttgtttattcactaagttactctaatttagtt gtataaatacactctcccatttg tgtatttctttt C a TCA a CC acaaagatctctctacttcaataaatct cc A c CTTGC tttaagaacaaa G T CT t A AC aaa score :228 12467_at|15184_s_at||#T17F15.30#At3g48100 response reactor 2 (ATRR2);|COORDS: 17637771 17636143 ttttccaacaatgttcattaattaattaacaatctcaaagattttgtagattgaaatacaaa tcttctctc tgtggtacattt CTTGA a A aatgggaaaatcaagaaagtatcgaaaatgtacaaaaataaaaagaaatgaa tcaaagtagccatgat CTTGA c A acaataatcgagagagat CGT ca TGA tacgatttc C C TCA TCC aaaat tgattttatttcccttcccaaatcaaacatatcatatgatttca CCA c TC a C cattacttgactattctca acaaaaaaaatattaaaaaactttatgactttga ttttatttttatttgaa GTT t AG c C aaaatttgaaaa tatgacttttgagaagaaaacagaataaacaaata ATT A GC C A CGCGCT A T C A gacagacaaaatcccaca gatatgcaaagatctctcagaatcctct CC c CAT a T catatttttctcttttccctctccttctttcttcc tttataaatccatttattctcctctcatctctcagcaaaatcaaat CC T CA T A G T tg ATTC t C t C tatctc tctcac gagtcacgatcctact C tt CT T GA T A score :168 17832_at||#F7H1.8#At2g16060 class 1 non symbiotic hemoglobin (AHB1);identical to GP:2581783:U94998|COORDS: 6932074 6931334 catttgactctgttgta GCGAA a C G C G a T G T aacttacggatacggttgagtttgaaagaaattaaacacc ctcctctgtaatgttcat t G c CTG a CA ttcagtgtgtaaagtttcaaccttttctt A CCT ct TA G atat GA A a GC CACAT ggtgctacataactctcacaagttttgt TC a TAGT tgtgcaaataaatt GGA TGA G G gagag attcatttgagta GT a TAA t C tctgttctgccaatttgatcaatagattcagtgtatcccaacttatacca cttggtgtggttggccaatacataaataagtaaattaatgagataaagagatctaaaagac ttttctttag tgtttt G a CTAAT aattggtcaagcctaccattacaaactatgttccattaccag T a CAG a GC acat GGTT t C a C tttttaaccaagcaactttatctt AA a CTTGC t T A TC A AG aactgtctctt CG g TA a CA caaaag GG TC t TT tatcaaaacctatatataggatacttttcataattggagtaagatctacaaaacagagagttgtat actttaaatcatttagaggttgtgaaatatta score :126 12630_at|19926_at||#F4I1.49#At2g44370 unknown protein;highly similar to GP 2435515|AF024504|COORDS: 18271396 18270644 caagtcaatctctgta C CCT AAT A tagaagttaattatagatgacgttatatacattttccacataataaa atcgtaatttcgagtttgtttctttagattacttgaagacaatattat GC ATT aa C taga G a C c CAGC gta attg A CC T C A T A T ccttcattatgtcaactctattgttcatatatatgataattcaaaagtttataatact atttaattacaattagctattgaatcagtttataatttataggttacaataaaact A g ACG GC GT G a TG ga tgggatcatgaagaataattcca C ca CTTGA ctttgaattccaaatgagaatcaatcttgtatatatctca ccatcat C A A AGCGCG T a CTG A catttttgtcccaatatagttagtatataagctacacaacttggtaaaa tagtcaaaccaaaacacatcacta C t TCA c CC taaagaagactcattatcaaaagt CG t CT t AC tattttt actataaatatatactctcaaatcccttagcttaataagaatcacgaaaacaataacaaacacacaaaaag aagaagaaagaagaagaagat ATTAG c C atca score :162

PAGE 132

122 12115_at||#F7K2.50#A t4g22470 extensin like protein;hybrid proline rich protein, Zea mays, PIR2:JQ1663|COORDS: 10805954 10804815 ccaagagat T A GA GC T G G aaaaa AT AT G A G G C t CT TT G ttcattggtttaaataacggatcatattatt AG A A C GCG CG aaaattggtactcgtaattgcttt G G TT ACA C ggtaaaactcgaaatgagttaaatgctttt t tttttttttggtcggacgaaatgagtgttaaaatttaa AA a GACC tttattcttagatttcagatttgcgt agttcttctttgtttcaacttaagat T t GC c TTC tcgtgatcacaaaatgcaaaagatcttcgaattctga a ACA a C a CG T ga TGA cgtgaacccgaatatagataatg ATTC t C t C c A TGCG t G C ttggactgattagtaa actacaaaatatgattgacattttcatttcataaaaatatca taaa TC a T AGTC A A C A CG G CG A AA C C aaa ccctattgaaaccaaaactccaagcttcaatctaaac GCAAG a TT tacgcaaacatttttg TCA ac ACG tt ctcct ACTA t GA agctac C aa CT T GA T A actctaa C t TCA t CC tcaaatccatcgttcataaatagtccaa atttc G ca AATGC aaaaccaaacaaaaaaata score :228 14640_at||#T16B24.16#At2g39200 simila r to Mlo proteins from H. vulgare;|COORDS: 16308341 16304799 tcaataaatttatgatatattaaaaagtatctttttcagttcaacatcgcagacattcataataactgcca aaacccatttctttgatcgtaccgtaaaat G t G A A T GC G A G gcatgaataaatttc GGTC t TT taaaaata aaaataaaataaaaaatttcagtcttca TCA GA AC G ACC gaacactacatt tagtttttaaaattgaaaaa atcgttttaaggtatgcatgacccataagaagtggtagctaagttc A c ACG GC ATT a C C A g TC t C CTCAT g ccacttca T c TCAAG tc G ttatctttga ATTAG t C aaagtcttc GT t T c CCT tattttttaataaataatt gcggctttcacttgtttgactttctttttattctctttacaaagaaataacaacaactcatgtat AGGC at CT tgtttttaattgaattatgta tatatagagacataacaggaacaaagtccatatcactttctttctcct tttccccttatcgat AA a CTT G C G A AA C C CTA aa A gctaagaaacttaagagtttaaagtttaacaaaaac aacttaaaaagagtt T t GAGC a G tgaaattaa score :156 17413_s_at||#MHJ24.10#At5g64120 peroxidase (emb CAA67551.1);|COORDS: 24954772 24953377 Gt G a GAAT ttataaacctttggatatttccattattaaaatccact TG g CAG a C tttt GC a CT TT G taaat taaagaagaatttgatgttcgttttaatcctctaatattgagatctacctaccaaaaaatggatgtatata caaattt CCT a ATA acaaaata GT a TAA t C C T CAT t T ttaattatatatttgctatacatca ATTAG g C at gaacctaaattcaaaactcctatacgtaaacgatatttccatc attcgttacataattcaatcaatttcac accaagtaacta TCA aa ACG aaaaaaaaaaaaaaaagta TCA aa ACG aacaaatccaaaa G aa AATGC ttt taaagtcaaccagtcgaccagc TCA ca A CG ca CTGA cca A CA C C A CGCGC T T T tgaatctcccacaaacgg tgtcgtttttcatctatcccccacaaacacc A a AGC c GG ttt TGT t G a CT cacacgtcattcttt G a CTAA T caatctcctctata aatagtactctataacgaactaactctactcactataacttaaacacacacttcat cttctctaaaa CCCTA aa A tttaaacacaaga score :174 15107_s_at||#F12B17.220#At5g10430 AtAGP4;|COORDS: 3278233 3277826 cttt GT t T a C CT T GA T A ttttcaactatacgtgtatcattacaaatatttg ATTC t C t C tccaattatatt caccat CCA a TCA C AG C T C G A TA agtcagaagtttattcacctaatcataatttacagctctta TG t TA a C G ttaattcatgattatggattgtatactaaggaaagtcaatataataataacat GT T a AG A C t TCA t CCGC CT A AT A agtttgatattttttctttccatatgtaaaactaattt ACCT ag TA atatgtatgttcattaatc atattcaatatgtaattaagcata G TC TAA TC atattcatttatgttataaaaaat acttaaaatatatta a T GC T A G G G C A A CC c CAT g T tacaattat GCG tt TGC atgttagtctgcaaaatttatctttaatatatta aaagagatttataggaatcttgttttaatttaatcgaaataattcttcacatttttcctcctttttataca tctcttccatttttcctcaccttcacaactctctcataatctctctccctcactctaacaccaaaagaaga agaa GCATT tt C caaagagaaagagaga gaaa score :162 14964_at||#F5I14.4#At1g65500 unknown protein;|COORDS: 23569307 23569795 tttagtataattctactaatacaaaattaagcactaaa G TC T A AA C c G C A C TC GC A atattttatattcac agtaactataagaa GCG AAG C T aaaaagaaaaacagaaaaaa GC t TG a TG accaattcaacgatcattgat agtgg CA A AG T GCG A GCTG A G t C tttgttat T t GC t TTC aacataagtcaaagtcatctgcggttcatcta aaatcaaa CG a T AT C AAG tgacactgaaaataaag T g TCAAG tc G ttgttactttcttctttctaagtcaa

PAGE 133

123 cctacactaatagtcaactccacttaatatattcttcttc G a CTAAT atgaaatggcaaagaagtcgtatt tgaatttttccaaaaaaccaacg G ac AATGC acttatctcttaaaattcaaatttgtaaatca ataaaaag gaatct TGT a G g CT tgtaggtaatttagaaac C GC AT TC a C a C gtgaaactttgaaataaaacta A a AGC t GG tttacaagaaaa C T A a T A G G G AA AC C aaaa A T A T G A G G T t AG G C G t CT tctatatatatatcac AA a CT TGC aatgaaaaaaaaa CGCA a CT aaggagaaa score :288

PAGE 134

124 APPENDIX J ACTIVATED ABA DEPENDENT GENES 17548_s_at|19614_at||#F14J9.16#At1g09500 putative cinnamyl alcohol dehydrogenase;similar to cinnamyl alcohol dehydrogenase, gi 1143445|COORDS: 3066815 3068488 acttaaccagtagttgtccaataatttagttttccaaaatgaaaaattatt gttgtcatctattttaggtg ttttagttcaatgtggattcctcgtcctaacaaatacttgacgaatatatctagactataaaattggttat gagttctacttttttttgtttgtgaaattatcaaaatttgttatatttatttatttattctcattaatttg agtactaatttttaaattatttatactaaaaacaattactaagatacaaaaatggataagagcatggtgta tagatatttaatgggatagaata tttcccataattgtatgtgtgtgagaggttttgtttt CGT aa GGA aag aaacaaaaaccatttgaccaaagaaaagcaaaagaaggcaaggaatcaaacaacaaatgttgcaaggcaga aata AT G GA CG T TA tgttaatgtag TGTCG t CACACGTG A C T taaaag A g ACG a GT c TG CGTG T C a A acta aaaatgtatgcaactataaaaatgggatttgattatctttt TAGTACCG a AGCC taccaaccacat gcaca ctaattctactcgccaaataaagtgaaaagag score :228 12332_s_at|13211_s_at||#T2E22.18#At3g12500 basic chitinase;identical to basic chitinase GB:AAA32769 GI:166666 (Arabidopsis thaliana) (Plant Physiol. 93, 907 914 (1990))|COORDS: 3963993 3962510 tcaacatt gatcatagttaaatatt TA G A CACGT C T a T ttttatttt C t C C A CGT tatacattttcttgac tatt CA t G c AA C t C t TGTG atcttgtcttagatatagtcaagtattgtgtatacatcatagggaatatccg aatatgtgtgtgtttgattaattagtg CACAC ta G ctcagatcatcgattagtccgca AC t CG a GT A a ACG TGT T T tcagaccacccaaaattttacgacctttgtgtcccactatct AC GT C G a T A gtaactttagaacaa aaaactcaaaaagagtgtaatttgttatgcttttggtcttctaaaatttgtattcttctaccaacatttag cacagtttcatacatagaaagttctcattacttggaattaacatgataatcccccaaaaaaacaaatcaat atctaacctagcaacaatgacggaaggaaaataaatataatcaaaaacaaacctaccaaatactgtttgtt cc CC at AGCC acactccatagac ataacaaacaataaactatggaaaaagatagaatttgatgtgataaaa cattcattaattggat C t ATATA tatctttca score :162 16038_s_at||#K1F13.5#At5g66400 dehydrin RAB18 like protein (sp P30185);|COORDS: 25812979 25812337 gtttggtaaaagttgagtaaattttgttagggcttagttttagtccatgggctaattagt aagtgatttac GGCCC A C AC A T G a G cccaaatgtttcagaccca GCCA a GT ttcttcaaattcacccaatcaacgacg A T G T A CGTG T G T A tgaaaatca T t AA CACG a C g C atcgctttcgaggaggagcat T A CGTG T C ctgttagctacg ataatgt TAGT A c C G C c AC aaagaaaaggatagatattttgctttccagcaccctgtcatgggattgata T g A ACACG T AC TT gg TA t CGAC atgaaagctca aaaataaattcaatccgattcctttagtgatatcagaag ttcattttaaatacg A ACACGT A T G gcgaaacaccac GC CGA C A T tttctgctg C t G C CACGCG TCAC TT t ccaaatattgattcattaaactaatagttgatccatatccgaaaccggactataaaactatcttcaatgcg ttaacgaatcttcatcgatcaaactcatcaaagtctaatatcacaaagaaagagtttttttaactagctta gctc aaagtgtttgcttaagacaagaagaaca score :354 19186_s_at||#F24M12.10#At3g50970 dehydrin Xero2;|COORDS: 18817852 18818433 atcttttaacaccaaaaatctgaaacaacagcatgaaaaaagcattaggattcagagataaatgagagcta aagcttcaaaatcatacctttttcaattccttcctgatgcgaagaagaaaaataaaaaaacaca gcttgag aaatcgaaatcaaagcttttttccttt C tt TACTA acaatttcaaaaaatcaaaagatgaacaacaaaggt ggaagaaacataccaaacgacatcgttttttagtttagtaaaagatatgcattattgggcctttcatatct

PAGE 135

125 aaa GGCC c AC aggc C c ATATA agttaaaatt A C GT C G G TC G c TA ac C ac TACTA c AC C G AC G T cttactt G C C A t GT GTG t GT g ACT cttaatcaattacgaattga atatattacttttacg AT GTCGG C C A ACACGT A T t gagtaaaatatctatgtgatgatgaattcctatccaaaatgaaatttgcatctctataaaagtatcattca agagatcattaatcttcatcgatcaaagtcttgaatatttcat C t ATATA tacctaagaaagaaaagagtt aaagaagtatattttcgattcaaagaagaaaa score :216 18872_at||#MKP6.25#At3g17520 un known protein;identical to LEA like protein GB:CAA10352 from (Arabidopsis thaliana)|COORDS: 6000355 5999459 catcgtttgcaagtagtcttccattttgctagggacttcgccggaaccctagagagtatttccgattcca A at CGT GT GGA agatcggaaatcatcattttctttctcattgtttttgatatgatagctggaggaagaatgg aaga tagcgtagaagagtgtatagaaattagggttttacttacaataagc C ct TACTA ttcattgaaaagc tcactaaacttgtttatgaaaagcccactggttattgtatacaagcccattagcttcacagatgtgtttca gttgaagcctctctttgtttttgcgagtcggttttccgcaaaaagcaatcgcttgcctcgttgtttgtgt A ACACGTGT C a A gaaccac T t A A CACG a AT ccaaaatcgagaagccaa aagaagctggta C t CG C CACGT A C T T A G C CACG C G TC ctaaacctatctctttttcaactaatacataacagagaagcaatcacagcaccattcc tcggagaacacatcacagtaaacagaggtttttttcttcttctgaaacttgatataagttatataa C c ATA TA atattttgtgttcgattagtgtaacaaaaa score :336 15280_at||#F6F22.7#At2g19900 malate oxidor eductase (malic enzyme);|COORDS: 8543952 8540655 ctctacc CC ca AGCC ttggcactctaca ACGTC gt T tacgattatgccattcaggaaggtgttgcgaaatg tacatttgcctggaatgttgcaggaccggtcctgtgcaaattttaccttaagaaaacgaaggataaatcag tagtggcttca ACGT C t G T gcttaaaaagcttttgggttgaacagacttacttgtcctgtttg ttgtcttc atgtatcataagatgcgttagacaagtatctgacct A g GT a ACC gaacttataggcgaagctatgtggtgt acttcctttgtttaagttataacttaagatttgtcatctagtgtacaagtaat T c CGTG T GT A tgtttgct tatggaataaataaagaaaactaatgcttatatttaatttcaattaat T G T A C G T G G C actttcagagtcc atttggtgtacaaagctgtcttcttcagtgtgaca cctctctgcatcctcaagcttccatttgtcttttcc agacctttcttctttccattaagtttcttcctttgtgatcctagaaaaatctacttcttttcgctttttgt ttttgggattgttttcattgatcctttcaaga score :84 13645_at||#YUP8H12.4#At1g05340 unknown protein;EST gb ATTS0295 comes from this gene|COORDS: 1559647 1559080 tctttgtagtcctgattttgaatatttcgttttagctaaatcagctccaatttttcccagttatcgtttgt cgaattcatattactgttaatttagttaacttccgaagactattagtacatgattcttcaacaaagcaaat aaagaaaagaatcttga TGTCG tt C tctcaacaattcaacaaaacttggtctcgacgactcgac CGTGTG a gatgtttgagaagtcagaaca C CACACGTGTCG AC A taaagcccgcac CACG c AC gctttt CGT ct GGA ta tatccaaattactttacagaaatcatttgtattgtttgttaataaaaactttgaaccactttgattactat attatttttttggaatttattct GGT g AC a T taatgttctat C ct C CACG c G g T tctcataaa CCG AC t T G GC ttctttatttttcctttatctttct C t ATATA tttaca CA C AC AC G c AT ttgtgcagagatattcaaag attagaa acattcttgatagatacaaaaaacatttttcagacacaaaatcataaaatctttgcctttagta gatcaaagttctttacattaatcgttagaaga score :336 12449_s_at||#T32A16.170#At4g24000 putative protein;cellulose synthase catalytic subunit, Arabidopsis thaliana, gb:AF027173|COORDS: 11426640 11 429969 aaatttaattacttaagtgtagtatatgttagtaattttctcaactcgcgagcaaacttcagttaacttga ttttgataaaataaataacacatcaattatctgaattctttttaaaagaaacaaatatttcatagaattag tttttttttcattttagaattctaattgaatattacagaataaaaacttttacttactaccacacaataat agactttaaagttagttattgtaagtagaatgtatt attttgaagttaa A at CGTGT atttatcgtgatgt aaaaatttac CGT t CTT tcttataaactagtcaggctattctggtgttgaactacttcac T t CACACGTGT C T tttggagaagtagtttcattaataaacacttatctttaattttggcttcaacttttcaaaggcacataa tgatcacactgtactgtatcatcttatctcataaatgagaaactcgaccataagaactgtctatagtgtgt

PAGE 136

126 atattcag aaaagttatcaatagtt C aa TACTA agaaaccggc C t ATATA tatatatttgtggtcactaat ttctccaaaaaccagttgccttccgtaaaaag score :222 13100_at||#F17K2.13#At2g45570 putative cytochrome P450;|COORDS: 18730462 18728475 ttgactggaagcttcaaaacgg TGTCG t CC ccggaaacattgacatgagcgagactttcgg tcttacctta cacaaggccaaatctctttgtgccgtacccgt CA a G a AAC ctacaatatcgtcttcttattaataatcgta t C a ATATA aa G T t CGTG T acggatcaatattaataattgaagagaatgaaattataaaagatactgctttt tatgtttcaagtaaaaatgttaatattagaaagtcaagaaagaggaaacacaatgcaatttgataaggtgt tttatcgtctccgggattgaattgcgtaggtca aaaaaagacatttgccatttga TA t CGAC atataatca ctcattcacggatacgtat C t C t TGTG a ATG gt C GT CGG t C aattacaactttactgctcggctttaactg ac A a GTG g CG cctcctggtttcattcagaccctaccggtaggatgtaatttggaccataagtaagaaatat ttgtgaaatattta C t ACACG a AT atctttacgtatttaagttggcttttgaagctttgataattcactca tttga aagaaaatat CAC t TG a G aaaaaatta score :90 14573_at||#T14P8.17#At4g02360 putative protein;|COORDS: 1040179 1040643 tcccagcacaaacccaaa ACGTAA G TACG atcc CGT c CTT tgtgttaattatatgattgctttgtatttca ttgttgatgattcgttggtctctcaaaaagaaaggcaaaaaaaaaaacactaaaactgatttcacccca at attgtttttttttttttttcttcttccattaatagcattttcgttaatgagttcaatagtttgattttgaa ttgcaaagtt CA t GT c AC tgcagcctttcactatgccgtagctttttaccgcattaagtcatcaatcggat cct TAGTA cg G ttttagtt ACG at CAT gaaatagtaaaatcagtccagtggaatgcaa AC t TGGC tgagtt tctttatttcaatcaaggttggtttgg T t A c CACG T G a T cg tcgtttttcccta T t AACACGTGTG ctctc tctgtcaatcaaatcgtcaccgtacatctaatttgcatggtctcttcatcagacagtccagaacacacaaa cacctaacaactgttttcgtcttctgtacttaatgagacacttcaaagcttatgctaaat T A g A CACG a AT cataagagcaaaacttaaagaagtaaagaacc score :306 20220_at||#T2H3.8#At4g02280 putative sucrose synthetase;similar to several plant sugar synthetases similar to P. sativum second sugar synthetase, GenBank accession number AJ001071 similar to beet sucrose synthetase (EC 2.4.1.13), GenBank accession number S71494|COORDS: 994166 997719 aagtaa tt TATAT a G attagaggcccaataaggtt C a C GT a GG CC caaacatattaaa AGT a AC t C agaag agatccatcaaaattttgattcacacatctgtttaggaattaaaaatattatttgatcggttattcattcc ttttcatgaaatcatgcaaaaaatcaaaaatcatttttttctctagaaact ACGTG G CG a G aaagcagagc accagttgtcttcttgctctgattatctcgttgaaaccgctttcaaagc agagcaaaaga G a CGACACC gg AGCC tccactgctttacttttcctttaaactgtgactgctttcatttatataataaaatacatacactctc aga GT c AC a TG tactctcctctaaca T A A A CACGT C AC T T G tagcgaaaacagtatcaagaaaaagagaag atc A A ACACGT C T tcttttctctctctctctt TGTCG c C taaaattccagaatcactctgctttttaccct tttaatcaatgatttttcctt ttagtagcaatcgttggtgattcgaaaaaccaaacttttctcggactagg attctagggttttagtgatcatctgaatattc score :240 12269_s_at|14653_s_at||#T4L20.230#At4g34650 predicted protein;|COORDS: 15506868 15509038 tccagaacaaacagagtttaagcatggtttagtctaaaaccatggattctattttagttactacctt cgtt atct A a ACG T GCAT ttgttcatctatttttattccttgtgtttaaagttctttctttgttta GT t GCG t T t cttctttcgttctcttttggcatgttttaagattaattttaccaacaacaaccagataaaatccaaatgca ataagaaagaaagaaaaatgacaatcgaagactgaaatctcttctttcttttt A g G TCGG C aaaaaagtct cttcaaagtatataaaacaaaaacatgaatcaagattag atttatctttttaacgttaagtgtgt A g CACG C G g C aagcaatttaaatgcaaaaatagaacatagttggacca GT a AC a TG tatcac CGTGGC aagcttaag agcaaggctaatagcgattctgtggaagtgcaaggtatgcaaaggccgtggggtctattgatttattaata gggccatatttgttgaattatagc AC a T GGC C t AC cattttgttccccaacaaatcctccattgttt C A a A CG a G T T ctgtt ttttctcctctgtttctaaaa

PAGE 137

127 score :102 13128_at||#F11C18.60#At4g31860 protein phosphatase 2C like protein;protein phosphatase 2C, Schizosaccharomyces pombe, PIR2:S54297|COORDS: 14373082 14371178 gtttgatatatcaatttactccataagactttgaaacatataatacagttgaaaattgc aatacagctaaa acaaatcataaagagtcaggtaatacattattgtattagtgttaagtaaaaatagtgtgttgtgctggcaa gtaagag A A CACG a A C aaaatcttaagaactttcgttgcttttctaaggcaaaacaaaacagactttttct cttcttcttctttttggggctacgacaatctgaatttgatcccaaatcaaaactttcttcttcttcgtctt cctctaatcaaaactgtctcctaccaaaggt ctttcctttctccagccagatatgtgaggaaaccaagctt ctcgtacaacttcatcacctacttccttttgacgttttcaaatcttggtatgtttaaaactcttatcgatt cagccctctctgttttggtttctgtttgtagattgatttcttaattagctctaaaagattaagat CACA g G g G tccattgtcaaaattctcttttgattttgtttttgtaatcaagaatagtttctttttttgattcatgaa gaa atgtctaattttttcgttgcagagagaga score :24 13158_at||#F4P9.36#At2g33590 putative cinnamoyl CoA reductase;highly similar to F4P9.37|COORDS: 14173162 14174905 tggtgaacactgccaaaggctaaaccggtcgccgagacggttaatctgatgtttgggattgacacatcatg cggctgttgttt C t C CACGT A A C at ttggagacccgctcaaa TATAT t G T g GGCC ggagaacattgtttta C c ATATA gggcccatagactttgaatatgtgtaggtaaataaaatacaatctgtggaacaaaaatggcaaa aatatttgaagtcagcaggattggttaacaatttgagcacagaaatacattt GGT a AC a T ctgagcatatc attcatatcatatcgctgtcgaatttgaaggaaaaaaaaaagactaaggaagtgga TATAT t G gtgaa tgc tgatgatgagaaaacttattattatatacgaaaaatttactaaagacagcaatattccaaaataatgatag ggaagattctgagatggtgtgtggtcagtgtggaggcactacagatggtcaaacagtagcataccc AC c TG GC tatttggacaaggacaaggaagaagaaaaatcgagaaccatcttctca GCCGA tt T tagtaaccatctt cttgtttgctccgaacaatcaagagatcgaaa score : 84 13549_at||#F5O8.35#At1g23800 putative aldehyde dehydrogenase;|COORDS: 8414508 8412236 ctataactccgaagattatggtatggtactactaagtactaagtaaaataattctaaaataa TATAT a G aa atgaatgattaaaatgatattgtagtattattag T GT CG ACACG a GT T ttgatttga C t C t TGTG ta G a CG ACA agtggggaacgaatgt gatatttttatccctgaaatggttctttctt AT G t T CG TTA aatgcgaaatc att A C t CGTGTG cttccataaacagagttctcacaccaatacaaaagagaccatcaccgtgagaaagagaa gaagaacaaaaaaagttagccatggcatcaagaagagtttcttcgctgctctctcgctctttcatgtcc T C C t C ACG T tctatcttctctcttagaggtatctaaatgcgatccgatccatttgtagagccat tgcttctct gtcatagttatcttgataatgatcgagatcgttctgttttcattgtctctgttaactttcatacggatcaa tgatctccgagtttttcggattttgt G GT T A C t T G tcgatggatttgatgatcttggtctcatttagatcg aatctatg CGT t CTT gtttgtttctttaggca score :210 13225_s_at|15997_s_at||#F5M15.22#At1g20440 hypothetical p rotein;|COORDS: 7085663 7084721 tgacacaacatcaaattcaaattccaaacaactacataacatatat GCCA a GT ccattgaaactctttaaa ttatacatctgttatccttttaaatctgataaaatgcaatacaataatagattaaaaggatgtactt CA a C CACGT A AC C atagaccattgattaatccaaaacaaaaccatagaccata G t TCG a TA cagcttctgttgat aaaagtaca tgactatgcgattatctaatattagtttaactcttcat C t ATATA ttaaaaaca TATAT c G t ttaaattaagaaaaaagtggtgatactgatatcatgatattcatc TAC CGAC t T caagaaataagaggggt cattatacaattcaagaaataaatatccatctttttggatcattttaaaaataattaaatctttacagcgt tac C t C C g C GT t GG CC tggacccttctcttgacactataaaaaccccactct ctcttatctcatcacaaac attactcattcacaaaaccatcttaaagcaactacacaagtcttgaaattttctcatattttctattta C t ATATA aacttttaatcaaatcaagattaacta score :108 15437_at||#T16L1.40#At4g33550 putative protein;predicted protein, Arabidopsis thaliana|COORDS: 15099027 15099453

PAGE 138

128 aaa tcacaaaaaatattcgacttttcacaagcatcggatgcatagttgtttatttgataagggtatctccc atagctagttaaaaatgttatttaatatattcgttttggtaccatatgctacaataatcatattatgcaga gatcatcgaataacatcacataaggtaagaaagttcatgtaaaaagaaaatgcactaataatcattcacaa acttagcagtagat TA g CGA t C gaaaattg A t GT c ACC aaatttaa caatctttaaaatgacatcttttaa t T A a C G AC G T t CA T tcggagacaaaatgaatttcatcaaaaat G a GT c ACT aaactatttttcaaccactc cactaaaaaagttaaacctcattcaaaaa C t GC CA C G a A AC attcccaccaacaaaagataacaaataata tagactccctcgccttgttgcttcccactacaaaagtaacaagaaaaattgtta C aa TACTA agcatagta ttaacgttatg C t ATATA tacaaatct C t C GT C GG a C caaatattcgcattcatagatttcattacaaat T A a CGA a C tattcacagaaagaaataaaacgag score :102 16632_at||#T19L5.130#At5g13170 senescence associated protein (SAG29);|COORDS: 4106069 4104229 ga GG g CG a TA aagcctaaagaaaatgtacaaatcgtaatgtaatctagtgatgatttcc t CT ATAT AG gtt gatgttgggtagacttttggtcaccatgatatttatctatcgtctataaagtacaaaactgtggtactaaa tgtgatttatgaaagctaattaaaaaga AG g CGTG aagaaaatgaaatcgtaatagacg AC g CG c GT acga gatgagagtagtggaggaaagaagtaagattgagtgaatgataaaatgca A a CGC t AC ctactaatatctc cacttgt CA t G c AAC cagactgagttcgttt tccttttcgagtcttatttttttgttttttattctac T CA ACACGT t A C ACG C t TC taataaactctaaacattaaaatcaaaatattttgactacaatggttattttgga g C t ATATA taaac CA C CTGA G CC tcctcagtttcctccatgaaataaaaagaagcatcttctagagagaga catatagagaaagagagagaaagctttagggtttcaaaaaaaaaaaaaaagagaccctttagaaatctcct aac aggaaagttttctcaattgctatagaaaa score :144 15052_at||#F4P9.15#At2g33380 putative calcium binding EF hand protein;|COORDS: 14094914 14093524 tcaaattagggtatatgatcaattgatcatcactacatgtctacataattaatatgtattcaaccggtcgg tttgttgatactcatagttaagtatatatgtgctaattaga attaggatgaatcagttcttgcaaacaact acggtttcatataatatgggagtgttatgtacaaaatgaaagaggatggatcattctgagatgttatgggc tcccagtcaatcatgttttgctcgcatatgctatcttttgagtctcttcctaaactcatagaa T A A G C ACG T t G G T tttttccaccgtc C t CC t CGT gaacaaaagtacaattacattttagcaaattgaaaa TA A c CACGT G G a TG gaccatat tatatgtgatcatattgct TGTCG t C ttcgttttcttttaaatgtttacaccactact tc C t GACACGTGT C cctattcacatcatccttgt TATAT c G ttttacttataaaggat CACG a AC accaaa acatcaatg TG t ACGT C tt T tgcataagaagaaacagagagcattatcaattattaacaattacacaagac agcgagattgtaaaagagtaagagagagagaa score :282 13426_at||# T3K9.4#At2g41190 unknown protein;|COORDS: 17118687 17116116 ttta CA a G a AAC atgatttttccataatattgaaactttatatatttcttaatattctaggcatttataaa ttattctatgttttgaatatttattttaaatttttggagaatactagaacatataactttccaagtcctag tgatccc GGCT ac GG caacaca TC C C t ACG a G ggat C a A TA TA TC G A a C catctc A a CGC a AC AC t TG TCG t C gatcaaaagtctaaaaacattgcttctcaactttgatctctttcctctagtctctatcatc A C CGCCAC GT C A C T attttcccgccttataactaggggcaatcaactctcgcgaacagagagagacggagaacaagagc aatcaaatcacacaaatcctcgaaaaagtaaaattcttgattaaaaaaaa ACG ga CAT tttgtgctcagac actcagctttccttattagtc ttttgccacctctctgcaaagttctcgaacccactctgtaaccaaaatca atcataaagatttgatctttttttttcctttttcggtgagatttaatcctctaaggttcttgatactctct agatctattcttacttgggtttgttcgaaata score :168 18215_at||#dl3061c#At4g14040 selenium binding protein like;|COORDS: 7067310 7065173 t ttgtttattcaaatttcacttaccaactcacaagtcatgactatagtaacaa C A ACACGTG CAT ggactt atgctaataatgaatgatatgaataaaataatgaaaatgtttgtgttgaacaatttttcacgattagtaat ttcaaattttttttctaattatatattttagcatgctatagataattgcggaagtaagttttgaattcgtg aaatttgaggaatgttataaaatataaaaccaaaaccaaaaatg taccaacaaagtctctttatatctata tcttaattcaaaacacat C c ATATA cactttggttacaacttgcaagaaatctgtttcaatttcttaatgt

PAGE 139

129 acaactaaaccctctttagcaaaagtagaattattaaacgaatatt GTT t C t TG tcacatactt TATAT a G agacatataaatagaaacatatatgtagataacaacgaca CA a GT c A C A at CGT G C T T A C t C ACG T a T tac gtatagccaaaagttt cctactcatatatat C A C A a G T G T G cttgcgtttgagatatactc A G A G ACGTGT a G cttcactaattccacatactcagagagaaa score :270 20149_at||#F7G19.23#At1g08890 putative sugar transport protein, ERD6;similar to GB:BAA25989|COORDS: 2848379 2852021 tgagtgggctttagggttttgctcttctctatatg tataggctactaagattgattgtgatttgttagact ttagat C ACACG g AT tg CC a A CGTG G C a G atcatctaccctttcaattattatgtgatctatggttcgtac atctccttttactttttgaaaataatgtacaaacgaaaagtgattttagtgaaatagtcctccattattaa gctttctataattttagaccctgaatcttgatctgttacaagttcatatttttaatttacatcaatacatt tattgtg gattgatcatcccaatgtattgatggttatttaaaagatggccagggttgatcgattcggtttc atatgattttagttctatg A t GT a ACC tttattgat CA a G c AAC aagaataaatcaataatttatacgata GC CGA C A T aaagagatttctattatctta C t A TATA T a G tcaacaaatatctataacagttgtt CAC AT G g G acacgaccaat G a GT c ACT gccaaatacaagcttcaagagtgaagacta ttaaagaacagagaaatacag ggaaaagcaattaggagaaggaacaaaagaag score :114 13785_at||#F14N22.20#At2g42530 cold regulated protein cor15b precursor;|COORDS: 17658416 17657734 gtgagtttttgtttttgtttatttaacattggagtattaggttcttagaaatatat C t ATATA ctattagt agtttaactacagttt gtacttaattgaaaaaatgttaaaagttgttttaacctagctaattgctaaaaat gactaaatagacatacacaaagacttgtacattttcagct TA a CGAC taatacatttttcctttatatata tatctctatcgagtctagttattaatgttgaaagttgcaaataaaacagaaatgctaacatgtaaatatcg tagccaaaaatgctaa CA t GTGT ataacggttataaccacaacttgat GG CC G A C c T ct tttttcttttgg taaccatagaaatggtt A CACG T A A C tagtacgaaccaacgaaaactcttcttattcgatagttaaagata atagcaatgcgcaaaaatatctagcact CACACGTGT a G ttttggattctcattggtcgagagatctataa aa C ga TACTA ttggaggttagatttttctcatctcactttctccatcttaaaactctttcttgtatttatt ttcctcccaaaaaacatctttaagagtcctc a score :270 18955_at||#F20D22.1#At1g04220 putative beta ketoacyl CoA synthase;Strong similarity to beta keto Coa synthase gb U37088 from Simmondsia chinensis|COORDS: 1122484 1119854 tctggagactggagtataaaat A ACACGT A T G gtatggtatttatttctcatagctagttccttaac agtt ggaagaatttatacatgacccggccctttaaaacctat C cc TACTA caagacattt GG GT t AC TT tccatc gacttttgaatgtgcatcaggcatgaaaatcctttaattatgcatattttataatcaatgcttaaattact ataaacacaaaaattgtagtgatatattagaaagataatttaaattgttacattgaaaagataataaaaaa ttataaatagacatctgatataaaaatggatgaagtata gcatattaaaaaaacatatgtttttggtcaaa acagaatcaatgcatagttagctc A c CGC t AC aacaataaccatagggactacgtaccatccataactaca ttttcttaaattgcatcctctttctaaaattttgcctataaattcacaataacacttcaactttttaggcc ataagttatctctttctctacaataagcaataaatctcacctcccttttttttttttttgtctcgctactt ttgattatcat ttaaaaccaaaaaacctacca score :78 15018_at||#T8K14.6#At1g79520 unknown protein;|COORDS: 29127162 29124110 ataatagagtgtgatattgttcatc A ga G A C GT C C A CG taagggtaaactaatcag TATAT g G acaaaaag gtacttgtttatctacaccaggaagaaaaacaataccagttttccggtccacaacttttgcatcaccatgc at CC g CA G C CACGTGTG T tcttatgaatattaaaatctacttatttcgttatacgacaaacgaaagatact gctttgttgcgtagacaaacaaacaaaatctcttgataataccagatattcttaattaattagtacgtttt aaactcaacatacccataagaaaatccagttagctcatg GTT t C a TG ttttccaatagatatatgtgtata taattaaacgctttgcatcgtttgtgatttgcatgttggaaataata atgttttttttttgctaatcaagg tttgattaaggaaaaggagtttggactttggacctttctactacaaatacatgtggcccttttt GGCC a AC accaaaagttccaaaaaagaggccaaaaaataagcaacaacttttgattct C t CGT a AC ttcccccacaaa atcagcagagcgagagcgagagcgatcggtta

PAGE 140

130 score :222 14115_f_at||#F17L22.90#At4g21630 serine pr otease like protein;Cucumisin, Cucumis melo, A55800|COORDS: 10459999 10456747 tcaactatc CGTG g T c A ccattttataatctataaagtataaagtgtgtaaaaaaaacaaattcaaaa C g A TATA cacattaaaaaaaaatccggaattggtttgctgtcctgtgatcctatatttcggtgtagagtcttct atatttcaaaagttcagaatataatcatt C ta TACTA aattgagtaattcagtcaatcatgatctaccaac ttcttaattacagttacctaacctactcatttagttagaaattattgatatcctcttatagtcttatactc atttgaattataattaggtaatatatataattaggtacactattcgtatatctataataagaaa G a CGACA attgtaagagttaaaactgagccaaaaagttatggtgggaatatc AGTAA C G C t A C ACG a G a G ataaaacc ggtc tgattcggaattaccataataagttgaataaaccaataattgaatccgaaccaaattcgaatctaac cccaaattttattgcttaagacgaattatttactatttatatgtatataaaaaagcttctataccacac AG TC AC A C a T G cacacacttctcacttcagacaa score :72 14733_s_at||#T5I7.10#At2g39800 delta 1 pyrroline 5 carboxylase syntheta se (P5C1);identical to SP:P54887:P5C1_ARATH|COORDS: 16551483 16547060 taaccattcaaacccctaattatttcatcagataacattatacactaataatcattgcactcaaatatgtc acacaatcatataataaaataataacaatgattaaaatgaaaaaattgttgtggcgccgcataaaatagaa atcgtgagagacgacgtcatctaaaaattgccttgctgtcca cttttcactttgtcctctcttctcatctc cgttca C tt CCACG gcgttt CC tc AGCCGCCGA tt T tatttatttcccaaaatacccatcacctat A g CGC c AC aatcctctacatcacaccctaatctcattaccatacaccacccaacg AA CACG C G c C AC t T catttgt tagtatctaaaataccaaacctacccttagt T C CACACGTG G CG T T t C c TG gtttgataacagagcctgag tctctggtgtcgct ggtgtttataaaccccttcatatcttccttggtgatctccacctttccctcacctga tatttattttcttaccttaaatacgacggtgcttcactga G t CCGAC tcagtt AAC t CGT tcctctctctg tgtgtggttttggtagacgacgacgacgataa score :282 13965_s_at||#F17I5.190#At4g34000 bZIP transcription factor like protein;Dc3 pr omoter binding factor 2, Helianthus annuus,PID:g2228773|COORDS: 15260499 15261862 ttttatttctcaaagaggaattattgattttccatttccaaagaaaaaaaataaattcgaaggtcaggaaa attaacaaaaaacttcctttttttttttgttagttt G t GT g ACT gagctgcttcattttttttctttcttt ttttttttggtttgatgaatcgatttttgt tgtctattactgattggttttcttgttcagattcactgatt cgaagagaatcatgatttttttttcccgctgaataataagcatatgattgggtgttttggagatttgttta ctgattaaaaggagattcctttccattttcaccatttgctctgtttgacttcattgtgcttatatttcatt tagatcttttgtttgggtttagctttggaactgataaaaatctgattttgtct CAC G G CT tt GG atttggt tc ttaaattttggtactttaaaactggataaagatcagtgcttttttagattcttcgtttgttgatgaatt tatggatgtatgtataattaaaccataatctctctgcttgtttgttttcttataggtaaatatccagaagc ttgatcctcctagttgtacgaaagcttgagta score :18 16115_at||#F21F14.60#At3g61890 homeobox leucine zipper protein ATHB 1 2;|COORDS: 22792269 22791376 aagaaaagcaaaataatcgagtttttttttttgtttcattataaactgcttcatttttcttaggaacggcc aaactgttaaaaagtaaaatatgtatggtgattaattg A t GT a A CC aa AGCC agttcg C c CGT t TG attgt ca A C c C ACGT tatcattcacttgatcacttc CA t G a AAC atataaaagctgataatacttatattataaag gaaaaaaagtat gaaaatattatcagtagttagatgattagttcacatctaaatgaaatacgacttaaact gaaagagacatgagcca AT t CGTGT C G AGC cacaaattt T G G ACGT atattttt A g C c CGTG g T tccacaa tatttgcaggtctttttattcaatgagtttattttgtcttggttgaataatgaaatttccaaatataaaaa ataatagaaatccgaggccctacacaagcacacat AGT a AC t C ccacattatata taagcggccaatatca gcaactcagagattccagaaagaaagaaaaaaaagaaacaaataattccaaaaccttctctcttaatcaaa at CA a G a AAC ttacaagatctggtgaaaacca score :114

PAGE 141

131 15103_s_at||#F5M15.21#At1g20450 hypothetical protein;|COORDS: 7089106 7088234 gtttatatttcaacatatagtatgcaaacttaaatcgtgag aatgacacaaccactaattcaaaccactac attatatattctaatccattcaaattcattaattatacattatatatcatat C t ATATA atgttataatat acaatcaaacaataaattagaaggaagtac CA t G c AAC caaact TCC at ACG t C t C c CGTG aactaaaagt a C at TACTA tgcaaatatctaatataagttaactctatat C a ATATA ttaaaaacataaatccttaaatta agaaaaaatatgg ggtgatttcattgatattcatccacc G a CCG AC C G AC G T aaaagaaaatgggttcact ctgcaacttaagaaatagtaa A CGT CC T T tgccataatttaaaaaactaatgaaatctttaacag CGTGTG ac C t C C g C GT t GG CC tggaccctctacttcacacactataaagtcccactcatcacaacaatattcatcac acagctcttgaaaaccatcttaaagcaactcaaagctcaaatcgaaatttctagtt tctctttatcattca cgctaagtgttcaatcgaatcaagattaagta score :102 19368_at||#T7N9.26#At1g27200 unknown protein;|COORDS: 9451660 9449933 c CA C G C AA C GC a AC acaa C t G T C ACG g G a T catcggtcctcctttttcttcaatcccaaaacattaaaaat gaaaaaacaaaagacgagaatatggtccacctcatcgtcgcatcttttga cacaagtcgcaaaactagtgg cctcttctcgtcttaattattaaaaccaaaacaaaacataaaag C c ATATA aaaatcactttaatctcctt aattaatcacacttttaattaagccaaaagtaagacaatgattagtgctactaatcttcggtggcttacaa gattacgagaa C g AC t CGT aaagaaa AA AC G C GT c T tccttgtttctttccttctttatcacaacgattat ttcctttttcttaacaaatcaa aatcaaacaatatttcacaaattaaattcctttatatatttcctttctt taatctctcttaaagaaaaggaacaagaagaaggtacttt T GT C G GC C ggcgcaaaaaccagatctcttct t CACG a GT aaaaaccttctctagatcagtttattaagcaaccacccttttgctctgtttctaacaaaccag aagaagaaaaaagtttcaatttttcttgaaga score :108 19386_at||#F14M13.9# At2g22510 unknown protein;|COORDS: 9518248 9517874 atttgttggttgaccaaacaaaatattagttgtaaaaaagtaaatatttccaattaaactgatattatttg cttttcacctattctcatcagatattgtcgcttgtattttttaaaattaaaccaaaaacaaaccgaattgt tttcgctatacat AGT c AC t C atcgaagttcaaacaaaggccaaaccaaaaggcaaaa ccatcggattata aatttatttattgatatactaattttctaaatttgtttatcaaattatttttctcacttttttgtagttct tttatttgtgaaaatttattttattttataatatttaaataaaatataatacatatacattaaactaggtt agctgaaatctc CA t GTGT catggttgtttctagttagaacttag A a GT g ACC aatcttacttgtggttgt ggaa CA A G T A AC t C taaataagttgcaaca acaaaacttcttcaaacctacttctcatcaaacctcaaaac cctaactcctctataaatacacacaaactccttcataatctctccatcatcactcacaatcaaacctcttt ccagtgacaactcaaacccctaatccccaaaa score :36 15145_s_at||#T6A9.26#At1g02200 receptor like protein glossy1 (gl1), putative;similar to receptor like protein glossy1 (gl1) GI:1209703 from (Arabidopsis thaliana)|COORDS: 418818 419898 cgtatataaatt GT g GCG g T gggagatg G a GT t ACT aaaaacgaa ACGT a CA agtattattcatagctctc gtataaggggttagtccttagatctagatattttcacttttctttcattt A t GTCGG agcaacagacacta gctggcgcttca ACG T G CAT gatc ttgattggctagtaaattccaagcatcaataccta ACAC a TG cccaa cttggttcattagtattctttcattggtaaaatacccttacctttcaataatatccagaaataaatatatg aagccatccatcaaccggtgcatttcctcaaggcatggatatgatatcagaacatcgatgaaggtgggagg gggtaattagctgagtgtcataaatgaggatccatgtggagatcatcgaatggtagtagtacatgtt tggt cttagctggccccaccacaaggaattggactggtgggaagataggggt GGT t AC G T C at T ccacatatcta ccaattaaggagtttaatataaaccttg C t ATATA atgtaccttggct CACA a G a G ttgaagagacacagt G a CGACA caaacatattacattcgacggtata score :78 16073_f_at||#F25P12.92#At1g56650 anthocyanin2, putative;similar to anthocyanin2 (An2) GI:7673088 from (Petunia integrifolia)|COORDS: 20507776 20506401

PAGE 142

132 agtgaagatgcacattctaaaaactggtaaaatggtaagaaaaaaatatataaaaaaatagccttattaaa atttatatctcctatttctctatccaaa C t A CACG g A T G aagcttattgttattcatccaccctttttctc aattctgtcctatttcttgtg CA t G a AAC ttctccatcttgtaatcggataaatcatacccaaattttttc tttctgaaaacatatatacccgaacattaattac TA t CG TC C TT tctcctaattttgttaagaaacatgtt tgtttgtttt TAGTA ct G aaaaaggatggagatacttgctagatcctatgaaccttttctctctaggacaa atcagtaaccaaacaataacttagcaaat TAAGC A CG ACA gctaatacataaaatgtggatatcaaa c ATG CACGT C AC TT cctttttt C C G T CACGTGT T T ttataaattttctcacatactcacactctctataagacct ccaatcatttgtgaaa C ca TA C T A TATA taccctcttccttgaccaatttacttataccttttacaatttg tttatatattttacgtatctatctttgttcca score :288 19638_at||#F24K9.8#At3g11410 protein phosphatase 2C (PP2C);ide ntical to protein phosphatase 2C (PP2C) GB:P49598 (Arabidopsis thaliana)|COORDS: 3585657 3584189 tattttgacctctgacttatttaaatcttaattaacagcataatactgtattaagcgtatttaaatgaaac aaaataaaagaaaaaaagaacaaaacgaaagagtggac CA CA T G CGTG T C a A gaaaggcc GGT CG t TA ccg ttaaggtg TGTCG aa C tgtgattgg G C CA C G T taacggcgtatccaaaagaaagaaag G g CACGTGT a TA g atctaggaaaaaagaaagaatggacggtttagattgtatctaggtaccaggaaatggaacgtcacaccaaa c GG T ACGTG TC G G atcctgcccgttgatgctgacggtcagcaacttccccttattcatgcccccctgcccg ttaat TA CGTGT aacccttccatgcgaaaatcaaacccttttttttttttg CGT t CTT cttcaacttttct ttttaaatcaaaccttttctttttaaaatcacattgcatt TCC ta ACG ctcaacaaaatctct C tc TACTA atatctctctctctctctctctattgttgaagaagactcataatcggagattgtttgtttttggtttgctc tgtaaattggagaagttttgttagagatcaaa score :264 14924_at||#T1B3.8#At2g28400 hypothetical protein;pred icted by genscan and genefinder|COORDS: 12097543 12097055 tatataggaattgacaacaaaaagaaatcacttattagtaatatatcaaaaattgtactaaatttattttt tatgaaatcggatccggtatgcttgcttagattatactaatactttataactaataccattgtcaaaggag gaaaacggtgtctgagaaaaaaaaaatctggtttgttaataaattccatccatg taagagttattattgtt ttccaattcatcttcctacgaaatttcgaagaaatataataaattgcgttctcttcttta ACAC a TG attc aaatattcacagcgtagttaataattgaataaaccacaagttaaaataaaataaccttttaatcggttgc C GTGT a GA gcccttgacagtcatcgttt CGT tt GGA taaaatcgcaaacttctacgaaatttccaagggcat ttccgtaatttcgtagaaaacggtga ccaaaagaaa T t AA CACG a G G a G gttgaagtctatgtctatccat tgaccttcac C a ATATA taaagaagcaaaac CC aa AGCC taaaagaatcttaaaactttgcttaaaaccaa aaacaaagtttcatttcttcttcttcttgaaa score :72 18936_at||#F28P22.4#At1g72770 protein phosphatase 2C (AtP2C HA);identical to protein phosphat ase 2C (AtP2C HA) GB:AJ003119 (Arabidopsis thaliana) (Plant Mol. Biol. 38 (5), 879 883 (1998))|COORDS: 26602416 26604269 tttaaattcatctttctatttctgggtaaagctaaagcttcttttttaatttagatccaaggaaacccatg ttctactatatgatt AT G GA CG T TA atggttcgaattttttagggctttcttgtgattgatt tcgctgatt ttccctcctttttctggtttttgtgtgaattttctttgac ATG ac CGT tt GGA taatgattcgtgagtttt tgattagtgctttgtgatctctcaatagggttgaaatcaagataaggatttgagaaaagatcgacgaagtt GT t GCG a T tttgggatcaagagagtgatataatcgagtgaggagaattcgtcgtatagattcgtcagatct ggttatctccggtacttaattctccttgattcgt cgttttaactatgctaagttagtgtatgaataaagcg gcatttgggtcgctgacaaaaacttatttggagcaaggattcttcaacat CA t G a AAC cattgcttagagt attcgaagaagaagatgaaaacaattgatttctcctaagtccatctttgaaatttaaagctttgttgtggt gtggtgtggaaatctctgattttggagagctc score :54 13967_at||#T13J8.150#At4g28040 Medicago nodulin N21 like protein;MtN21 gene, Medicago truncatula, Y15293|COORDS: 12905375 12906695

PAGE 143

133 agacattattgaataa A t GTCGG attatcta C t ATATA tgtgtg C g C AT GTG ata TAGTA ct G taatagtg ggctaca GGCC a A C t TCGGC tttccaagaacatggttgcatcgagcatatctttgctagattttaaatata actacaacttc ttgcttactttgtgatgcggaaactcaaaaccgtaatcactcatgttatattttttgttg ttgctttcaacgctaatatctgaaaccagttttcgctggtctaagatccatttgcaagaaatgtcaacaaa ataaaaagctcttatcatt C C C ACG a G TT ttagttaaggtaatctattattccaaga C tt C CACG a AC cac C t A TATA T a G aagcttacatattctatagacttaataaatgtatagttcaatat tatgtc AC ta ACGT acg cggctaacaggtttgttaatccttcctatcacatgattgacatacacacaca CA C A C AC G C A CG A CAC a T t atttgtattcagtattcacatata C g ATATA taattaca A G A G ACGTGT A TA T t G agagattcatttttta gtgattgtagatcttaagagagagagagagag score :234 16413_s_at||#T22F8.110#At4g39210 glucose 1 phosph ate adenylyltransferase (APL3);|COORDS: 17224812 17227661 aaccagagaactacgaattattcggtgagttctctcattgatacgatcctctt C t ATATA tc T c CGTGT T c ttctagcttgaaagcattttaaaagtttttacttttggaggtttcacattctctccctgaacgattttgct ttattttgttcagcttcggttctcgtcgaatcttctgtttctctgtcgattcat tgatctgaaaatctaaa aatggattcattctgatttaacttttatcttgttcaaactcatttatgatcccaaatttgttcgtttcatt ctttataaattagaatccttttttgttcttgtggcaattatcttcctccttttctctgttttgaaagattt GT TA C c TG cagaagttaattccattgatttcttgaagatgcagtatctcttgctacttaaaagagttagat tttttgtctgatacgttaaaattatg agaggctcgtt CGT t CTT taatctgtgtggtcaaagtttcctact ttccgagaatttctttattgttgtgttgttcactttgcatttataatagtttttac TA a CGA g C tttacga ttctgttcttgtctcattccagctaaaaaaag score :42 18590_at||#F10D13.14#At1g69490 unknown protein;similar to N term half of NAC domain protein N AM (Arabidopsis thaliana) GI:4325282|COORDS: 25333645 25334634 attgatcatgtttatcagtaatcatgaaagacaaaga G t GT g ACT attgtaaaccaaattttagaataaaa taaataatttatcata C t ATATA cagtattttgttaagtatatgtcatccaatagtaacattatcatttaa actgaaaaatgtttcagctactttaaggaattatagctttattaaaagt atatactttta G G T CACGTGTT T A gaggtgaagaacaataataattactcaataagttcacc AGT c AC a C tccaacatcttattcaaattcct tttaaaagctttttaac CGTG GC t G tttgatgaccatttgacaaaatttagtatattagaaaaaaacaata ggatagggataatataggacattagactattagatggacaaaatgaagtattatttaattttccaatgtac caaccaataagaaag AA GT GA C GC acagtaaacgacaaaaagctcaagcataaaaacccaaaccttctctg ctttctaaacatttcaagaaccttgagaacatcaaaaactaacacagaaagaaaaaaaacagttcctgttc tattagattgttttctaaattgtctgaaaatc score :186 17860_at||#F27G19.10#At4g27410 putative protein;Arabidopsis thaliana nap gene,PID:e1234 813|COORDS: 12673507 12672422 aaggacaactggc T t AACACGTGTCG ATA aactttagcttt TCC tg ACG gttcatgcacccgaacggcac C a ATATA gtcgaatatagaaataataaagcttcattgctttctcagtggaaaaaaacttactataattagaa tatttctagtaaatatttagacatttta CA t GT c AC gatggaataagaaaaaataaattactgcaatcact cagcggacttt tctcataatcgtagtacacaaatttcttactcatgagaaaattttc C t CGTG TT ttttgc acatt TATAT c G ctggaaaatattgtacaacacatcgata A g G T C GG CC agttc A t AG A CGTG G C a G tatc taaatggtt A a ACG g GT ccacaaagaaaatcgtaattagattttagaacagtaagac AAG g ACG ac CGT a C TT atccttgaagaacctttggctttt CC gt AGC C t ATATA aaccaactacttac TATAT t G tccaaaagat cg ACG aa CAT aaaaacaaaaacatataatttgggtttttagagttcgaaacttgaaatctttttttttttg gttgctgaggaatcgaagtagaagagtataaa score :360 20096_at||#F19I3.4#At2g34810 putative berberine bridge enzyme;|COORDS: 14633835 14635457 aat C A CG C CA C ATG c AAC aattaactacc aagggtcccaaaaaaagaaagaaacacatactactctattta atttgataataaacaaataaagtccgtttagtagattatgttttacatataaaaacaattaaagtccgttt tgttaattatgttttacctataaaaaaaaaattcaccaaaaaacattaatcaaatcatatcctagttgatc attggtgaggtgactgacaggtcgtggaattctggcatcctatc G t CGACA aatcttaaactaa AGT a AC t

PAGE 144

134 C ttactgttactttttaaatatgtca A a GT a ACC aatatcttctacggaacttatatccaatttgcatact tttactcatactatacaaaatacaagcggatataaacaacaaaataagaaaactacactaatatctataga aacttcaaattgaattgaatattatcgtgatatatatatatatata TATAT c G ccgaatattccctggaaa attaaatacaaaga C ta TACTA ctctattggctataaatacaca cacaaacccaacaaagctatttcatta ttcaaaaaccaatcacaaataacaaaacgaaa score :66 18122_at||#T9D9.17#At2g30360 putative protein kinase;contains a protein kinase domain profile (PDOC00100)|COORDS: 12887114 12885807 tgtagatcgattgtacaaaatgaaaaaactatatgatttaaggaacta gatatacatatgcatcgactcaa gaacagttgctttatgtgaagactatcaattacattatcggctaaatttggctt CA t GTGT tgaatagatg gattgtgtgctcaggtagttgtgtcagttaattataacagtaaaccttcactgcttcacatatataaaatt tgggttaaaaaaactttaacattttatttttttaattgacaacaaacaaattctgtaac GGT t AC a T ataa tagtgcttcagttaaactcc ttttttaggcaactctaattcctttatacccttaatccattattattc TA t CGA a C ttatctttttaaaagcttc C g CGT t AC gccatcacttctccaccatttaaatacctaaaccacttt ctttcaaatttcttattccataaaaagtaaattcacaacaaaaccaaaaaaaaaaaaagagaagaagaatt agct T A CGTGT T c A tcacatctgcattacgcgcaaacgaaacagattcaaattctccggcgaa gttttctt attcatcaatcaatctctggattcatcaatca score :60 17179_at||#F13F21.11#At1g49450 En Spm like transposon protein, putative;similar to En Spm like transposon protein GI:2739374 from (Arabidopsis thaliana)|COORDS: 17578384 17579799 attgtcatgtatataagcta tagcatatatacggatggaatatcgattaaaccattttctttttcaccatt ggaccataatcagataatatacatagattgggaaccccaaatattttcacattttgacatatt TAG TA g C G AC attaaactacttcgtgaaaatacataggatattatgtcaaattgtcaacgaaactttttatatttaata agaaaacgaaattaaataataattaacagaaaatact C A C A a G TG AC tttcaatgattttgtc tctaacgt tttcaatgtttattttttggtctctgactcttcaaggaaactggtaaaaccggtgtcccggtagtcatcac tcacatctcgt C CAC CACGTGT C ACT ctt AC ACG T t CA T aagttttcccacctttct ACGTCC gattctgt atttatataccctaaacccatcgtcttctatacatgctcttatttcacaatcccaccaaaacaaatctcaa gaacatttcaagattcagaaaaaagaaactcgcaa acttttttcttttgtttctgtca ACGT t CA aataat caagcgattgttttttttaacctagaaaaatt score :168 15485_at||#F12K2.11#At2g27310 unknown protein;|COORDS: 11633510 11632497 agaattggagatattcaatccaagtccagatacatgaatactagctaaaaatgtaacaatagaaaactaca aaaacagtattaaatcgcagttaaat atgacattttcatcaaggacataactaaaaatatagaaaatttct t C t ATATA caatataatggatatctaaatattatatatttattttaaaactttatataaaacactgatgta acattt C t CGT c GG atctttgttctctgatattttatccattcacaagaaaaaaaatactatattttattt ttctt TA t CGAC aaaaagaacctttttaatcttatgataaaaatgaaagacttcataa A GA GA CGTG a T a A aaaaaagaagaaaat G ga G C GT G TG TG t A C g CGTGTG T A tctactatgtatagaaactttagaaaattagg aagaatcattcaacttgaagtagaatagtctcaaagagccatggttgactgatcaaccatcgccttcttca tccttttctttattttctacattagtcatttaattcattcattatatatacttctctctaatttcttaatc atctcttcttatagtttttaccctaaagaaaa score :1 50 13189_s_at|16981_s_at|13187_i_at|13188_r_at||#F27F5.21#At1g45145 thioredoxin, putative;similar to thioredoxin GI:992966 from (Arabidopsis thaliana)|COORDS: 16404426 16403434 ttctatcattagtgaaagaa C at TACTA gaatttgatttattagaggaataaagaagattgatcaaaaaca gc ttgcattattattcaaaacataaatctagcatattatagataatatttttcttattgtcttagataaag tttccattcaactt TG C ACG T C T aaaagctcggaatagtccacaaattatcacatagaaaaaaacttatat t CGT G c CG ACA aaa AC aa ACGT tacttttttatatctgc AAG g ACG agaa AAG a ACG gataaactca TATA T c G tgcacacaaaa C A TG CGTG T C ttctacatctagaact GTCG t TAC t CGT t AC ca T c A t CACG ccgttg acctagactcatgcctattctttaatcgtttaatataaaccgttggatcccttcacact C C t AC GC G T caa gatcgaactcattttaggaaaataattaaaaaaaaagaccgagaagaggtagaggtataggtggtcaagtc

PAGE 145

135 ttgcctataaaagctgatcccaacaagaataattctttcattcaccaaaaaaaactcatcaatcaaacaaa actcttaaaagcttaag aacaaataataaaaa score :114 19982_at||#YUP8H12R.13#At1g79270 hypothetical protein;predicted by genemark.hmm|COORDS: 29029265 29031458 ctaatttaataattcttgagactttttaaaaccctctgcgaaacagatgattcgagttttggtttcttcat ctgtgttttgattttttcgtcttttattcttagtcaaacacataaa tttcttttactactcttcgtagatg acttctcatgttccaaaacatcctcgtaaatgtactctttctctctttctttccta C A CACACGTGTT T A a tctgtgttgttctt A CACG T C T t T ttaacttttgcgtttgaccttttatctgaatcctgtgtttgtgtgtc tgtaaatatatatatgtgtgtctatgtgtatagtgctct GTT t C t TG tgattagactcatatgtcttattt tcaatattaactataaat atata A G T C AC GC CT agttattagtttttatcggtacgactaaatattcagac atcagtt CACGT t TC aaacctgtctcttcacatagttaagaaaataaaggcatcaaaaagattac TAGTA a t G tttgtgtctactttatacatacatatgtcttgtttctttgtgtttttcagaatctgatctcttcggga A CGT tt GT ttaatttgtttcagcctttgaagat score :306 14811_at||#T8P21 .19#At2g37900 putative peptide amino acid transporter;|COORDS: 15814952 15812940 gaattcatgtttttcattttggtctgtcagaaaaaggataagatttaggttagcgggataaaatcttttag tagaagaatgtaatgagggaatttttctttgttgtttttgcctgaaaaggtgaaaagaatacaaaaaccat cctgtaaagtttgttgaaggacaagttc aggataagattttgatttggagattcactttgggtttggattt aatgaaaattagaagtttaaagtaagttttatagaccaagttgttggacaagcaatcgaattgggagctat gttatgtcttgattatgctt ACG a GTT ctttt A t GT t ACC tatattttcacctatgtgaatacggccgttg tgttgactcatcatcctcaacatgaatgtggaaaaaattaaggaaaattagaaaatgaaaactgcttgagg ctgaaaataaggcaatgtgaaaaacactgtacaagatatgactcaacacagtgagcttacgc C t A TATA T a G aggac CA a GTGT tcttcttctctaacacacacacactgacacacacactccttaagatcatct C t C t CGT G T a G tgtattttgtcaataaaagttgaacaga score :48 19108_at||#F10M10.180#At4g34410 putative protein;ethylene responsiv e element binding protein homolog, Stylosanthes hamata, U91857|COORDS: 15416482 15417288 gatctctctattcatgactgtgaaggccattgatgatcaaatagcatgtgaaatcataattt TG C ACGTG a T T tat CC ta AGCC aatcagt A a GT t ACC tagctacttattccatt A a ACGTGT T T A aatttatatgtaaac tgtgttaaaaaggaacaattgat aagcgcaatcataataatactaattaaattaacaaaacagtaaaaaga agttcttgaaaatcgtcatgtatgatgaaatgattaatactaaataatgtttaaagaaatgtcatgtcatt taaaccccccaaaaaaaaaac A aa GA CG TCAC a T tcaaatgtcaaaaaatattcgaggaaata A C g CACGT GT T T ccaaaaacacagatcagaat AC G T C C CGTGTG a A aagagtgtgtgccagt TAG g CACGCCAC GT aga tacttgacaaa A ACACG g A g C c ATATA aatagagcttctttctcactttcagtttcccaaacacaaacaaa actcatattttcaatctccaggtgctttacaccaacagagtcgcaagaaaacaaaaaccaaactcggattt agtttgacagaagaaggaatcgagagtcgggt score :384 16989_at||#M4I22.70#At4g27260 GH3 like protein;GH3 protein, G lycine max., PIR2:S17433|COORDS: 12618198 12620386 taacaattcaacaagtgagatttttttcatttgtaaatgcaaacaaagtaacaaactggtggttcattaca aaatcttcacaattatatatatgcttaagtcagctcaattatatgctaagtctaaaatcttcatagcaatt ata GT c AC a TG gaattctctattaaagtataagaagaaagaaatggtatgaatgaagaatg atatagttag agaagtcattattattagagtagaccatctttatccatccggtacaggttttgaccaatacggaaagagag aaaaagagtcgcaatccttttttttttgttacgaaattaaaattatataattagtgcaattta AAG c ACG a atagtaagcgttcacttaattcctcgattaactgattagcatattacgattatacccttcatcttccttat tgtgtctctactctctactttcaccttcctttc cttccattctcct C t ATA TA TA T C G ACC ctttctcttt ctttcttcctcacacactaaaagcttgcaaaaaccataagcttatctacttactcatctctctcacaaatc attttctcagacttctctctttctcttaaacc score :42

PAGE 146

136 15586_s_at||#F20B18.190#At4g26080 protein phosphatase ABI1;|COORDS: 12186325 12184728 tatgaattaaat agtttaaaatttc AAG a ACG aaacatctccccttattacaaaaaataaataagccaaaa aaagtatcactttaataatgtaataaaataaagtgatatgtatataagtatacatacataaatagtagatt cgaagcaattgttgcattagcctacccatttcctccttctttctctcttctatctgtgaacaaggcacatt agaactcttcttttcaacttttttaggtg TATAT a G atgaatctagaaatagttt tatagttggaaattaa ttgaagagagagagatattactacaccaatcttttcaagagg TCC ta ACG aattacccacaatc CA g G a AA C ccttattgaaattcaattcatttctttctttctgtgtttgtgattttcccgggaaatatttttgggtata tgtctctctgtttttgctttcctttttcataggagt CA t GTGT ttcttcttgtcttcctagcttcttctaa taaagtcctt C t C t TGTG aaaatctct cgaattttcatttttgttccattggagctatcttatagatcaca accagagaaaaagatcaaatctttaccgttaa score :36 20227_s_at||#F5F19.9#At1g52030 myrosinase binding protein, putative;similar to myrosinase binding protein GI:1711295 from (Brassica napus)|COORDS: 18620976 1861878 4 tgagtattaatcagcgaaaaaacgtattttgaatattttatgttttgtgtttcttctctaactagtgacta cttctagatttcaaagaacttataaacaaattattatattattttcttttttctttccaattttgtatttc ttccttcttaacaatagctttaagctaggcttctttttctctttataacgtcaaaactgattagtttgtta ggaaattgcagaggcaaattaaagtaatgatagtgtaaacc atgaatttgtcaaaatccagaaattttttc taatgatttcctcttgttaagtatgtaaaaataataattatgtcaaagactcaaaagaagacaaagatctg aaaaagtatttcctgaat CA t G g AAC ctgagaaataatattcaacaaaaaaagtaattaattatctagggt tataacgttttttgcttcacagttgtgataagtaaaaaaaaaacatagtttgtgttgcttcctcttattct gtttttcatatga ggagaccattcggttctttgtttcttagtgtgtttctatgaaaacgtttccatctttc aggtattttctgaagttaaaagatagaaataa score :6 18206_at||#MJB21.18#At5g42800 dihydroflavonol 4 reductase;|COORDS: 16459692 16458124 ttaatatgttttataattttacaattttgtaataagaaactcctaattcataaatctaaaatata aacata tttttcatttaagcttttccaagatttataattattttaggtgtctgatttttagatttcaattaaaatta aaatattacttaagtaaaatgtatttctgtatatattctatcaaaatgttaatttgtttagacaaattttg atttatttcgtaaaagtgggtggggaacaaaaacaaaaacaaactgaactg AA GT c AC C C ACACGT C T C ac caaacaaatcgaagtcaacgtatttcacccaccggta caacaacaaaatacacacctaaggaaataataaa atcaacttaccagattgttacgtaccacacatctctttagtccttcgtca AC ca ACGTT C C C CACGTG C T T ctccggttgg T A C t CACGTG A C CG gcagcttct CGT t CTT attatctgttttcttcaataacgattcataa tctctagtgtcttatttataatgtcttcacatcacaaagatttg TA c CGA a C atacatagttgaatctttc ccaaagcac aatctatcatataaccacaaaaa score :276 15453_at||#dl3050c#At4g14020 hypothetical protein;|COORDS: 7060187 7059852 atgatatataagaaaaatacaattaaggagcctgcgatcaaatgcataatccccgaaactaaccggacaga tgacaagatatccttggaaactaaactcaattgaacaaaagcaaacaaacaataacactccaaatgaaa ag caatggtaaaacaaagcagctaaatggggatctaccgtcaacaacgaataaatctaaactata C c ATATA a cccaacatcaataattcaaccagtatataaattagccacctaaagaataataatttggattttttcggaaa actccttatggttctatgcttcaagagactaattacgaactatttttttttttttttgttaaagagtttgc atcacaatcactcttatccaaaataactaagaagaaacata tt ACGTCCGG c C G G T AAC c T ttttgaaaag gagaaaaaaaaaaagtatttgtttt C g ATATA atgaattatctttctagttaaaagtaaaaaagattcgct taaacattacattta CA a GT t AC aaacacaatcgttcactataaataaaggtgtagc C cc TACTA ataaat atataactcaaagcataaagaaacatcaatca score :42 12004_at||#M4E13.120#At4g35060 putati ve protein;|COORDS: 15650910 15650365 aacatttatatttgttgcataaacaattaaacatgtcaagggtggtccaccaatac GT a AC a TG aacagtc cgtcttaagagattccaacttccaatcccaaaatattcatcaattctgtaattgtttagtcagatttgaaa

PAGE 147

137 taattgcgaatagagatatcttaaaaacttcattcaatagcaaacatgaattgtataacaaatgtctattt cat ctatctgtcaatgtttaattattgc CA CAC tg G TG atgcatatataacaaaagcgac CT ATAT AG att aaagtccttttataacaggttcatatacacccagtaaccaccaatacac A t A C ACG T A C A T atatatatta ttgccgtttgagagtcaaactcat A CACG a CT C actgaaggttgatacaaaatatcagcatataataacta aa TATAT g G tcgaaaactacttatagagtttggtaggtcgtatttt caacattttaaactcaatttttatg ttgacaataacaaaaaaagaaaatcatatcatatatctcaacatatagagtctaacttttttcgccaaacc aaaagaaagaagaaac CA a G a AAC aagaaaaa score :96 13874_at||#F23E13.180#At4g36290 putative protein;hypothetical protein, Arabidopsis, PID:E353139|COORDS: 16137676 16134158 gatatttctttccagtagaggaattagaatctggcaacactagggcctaaacttctttagtcgttggtttc atttagaactttatttttttttgttttctttttagattacgtattttgttttctgcttggatttttgctgt tcttttatttatttacttttagactttaataaagtttgtgtttgatttttccaactctcaaaatatctcgg atcagcttaaacaatttttgtgaagtactatc atactatcattaacacttttggattcagtaattaattta t G t TCG a TA cataatctccagacattatatatctaatgggttatagccttataggcttttagcatagttg G GCT tg GG cttttgacccaactaggatttagtgatctgcaccgtaagagcaaagcctttttttttttttttt tttcgccataagagcaaag C t ATATA cagtatcggagattaaaattttcttactattagttaaactgactc ttaa agcaaaagctctagtgatctgcaca CA AC GC GT TG ttttagtttcaaatagggaaacatcacgctgt tttgttcgaattcgttt TGTCG c CGTCG g TA a score :54 15975_s_at||#F23N19.3#At1g62660 beta fructosidase;nearly identical to beta fructosidase GB:CAA67560 GI:1429209 (Arabidopsis thaliana)|COORDS: 22411370 22414936 ttataatttactcttaaaatatgaaatagaaaaatggagtggaatattattatttagaaagatatatatat atatatatatatatatatatttgtttgtatgcagaatgttagac C a ATATA attagtggacatttctataa aaaaacaaaaaattgtggaatggatgggataaaaaaaaaagagagatcaaagataacattttaagatgttt ttctttctttaattttcaacaat tttaggcaatttag TAGTA gg G ggaaatgtatttctatgaaaatccaa atatatgctaatgtatgcaacaaggaaagatatagtaaccaaaactccacttcatattacaaacaataatt atttttataaaagtattgact TGTCG t C tg C A a A CGTG a T TA cgtgaaccttactggtaataatacaaaaa tcacagctctaatttgccaacccaatccagatctctctagttctcaacttctcatataattattca ttaat atatttattttgagattttggttgattttttgatatataaatattggacttgccattttctattacaacca tcaaatctcaataaagcaaaacgaacaagaaa score :54 15798_at||#T20F6.15#At2g02710 putative receptor like protein kinase;|COORDS: 759606 757227 attagttactaaatgaactattaagttcacttatcttatc caatttgtgtctaccctacataaaccttgta cttatccctaaatcactttagataaattgttgaaaatttaatttataaattttgtatttaccaaattagga aaacaaatatctgaaaatatttatttttaatatcttaacaactcgaagaactgaga A a CGC g AC aaaacca atcgtcctcttccgatagccacaaaacaaaaatcagacaagaagaaag AAG a ACG tttcttctaacagata gagattacaatc aaattgactcttaatttctcaattccgtatctctcatttcatcttcttcttcttctcct ttacttaaggatctctggtttctctttctctcctctgtcttatcttctccaa AGT t AC g C tttttttcaat gaaaacattttcttgattcctgatttcaaatttcacaatctggat T t CGTGT T tcgaattcactttgaaga aaaagtttctcaatttcgcctgaaaacc CAC AT G g G attggaactaaacactaga ctgacaagaagaagaa gagtaatcatcatcgagtgtttattacgatta score :42 17026_s_at||#F7H19.60#At4g22880 putative leucoanthocyanidin dioxygenase (LDOX);|COORDS: 10970557 10969403 gataaataatttagttttccaaaagaattattgatccacatacaattgtctatttgaaattgaaaaccagt caaattgttttttt agtaattgatttccaaactacaaaaagaaaatgtggttagtagaagaac TAGTA ga G g T g AACACGTG G a G A CACG C t T a A A G C ACG C G A C gaag A AC A CG T tgatagcgattatgggtttaattcta ttgggccttttctgggagtctaga CC ca AGC C c ATATA gtagtaatctttttgaccaatcagtcaacccaa ccatcctctcccgttgaccgtgaagtg A G T C AC GC AC ttacctcacaacaatagcac taaccaccggtagc tctacaatgtctctta G t TCG g TA acaaactcttctaactaaaagtatagtaaaaactttg C t ATATA aga

PAGE 148

138 aagagtctttgcacatttcatttacttgcaaccaattacaaaaaagagtgtaagaagaaaaacaaaacaaa tccattttttttattactctgttttttcccctgtttttaagtttatttacttcttactctgttttctgctc tgttttagctttaaacagaagactaaaga aga score :198 13603_f_at||#F17L22.110#At4g21650 subtilisin proteinase like;subtilisin like proteinase ag12, Alnus glutinosa, PIR2:S52769|COORDS: 10468870 10465813 aggattggttttagtttggtaagttttttttgaaccaaccaaaattgaatacaatgtacttcagtgataaa accaaatattaac ccgaaagaaaaactgaaacaaagtgagattgttttgagtttgacactactctcaatta ttaaaccgaaccaaaccaaatccga GCCGA aa T tcgaatttaacccaaaagttcatcccaaatggtggtgg agt CA t G AA AC GTG aa T tgaaagatgaataatagatgaattattta C t ATATA tacaaccatagaggcttc tatgccaccacacacttctcacttaagaaaatgaataactctttacaaagttctaa acttgtgcttctatt ggctattgctttggtcctatttc T t A A CACG g A gctggattttcttacagctgctggagccttagacagtg atagcaaagtattcaactaaaaatacctttctctatttgcaaaaatctctaagagttaacggattttaaat tttgttggtactcttgtttcgcggattttataggtt TATAT a G tgtatcttggcgaaagag AACACG atga tcctgaacttgtcacagcttctcatcac caga score :72 14550_at||#F9D16.160#At4g23690 putative disease resistance response protein;disease resistance response protein 206 d Pisum sativum (pea), gb:M18250|COORDS: 11304213 11303650 cttagaagcaaagctaga ACGTG at T tggtgatttgtttta C A a GT t AC C taattttag ccatatggaccg cacatgaatgatgggcggtctcaaagcaatatttccaaaatggtcttttaataattggtctttttttccat ttctgcgtaacactttcgtttgacgttttgtttaaatttatttttcagataattagtat ACAC t TG ctgtt acattagctagctagccttcgactttatgcttttaggttggactcaccataccgtttaataatagtttcaa atcattatatatatatatatatatatctttt tcctgtcaaaaaatattaaatctatcaagaaagtgaaatt aggttgtagcttagttaccaccaaccccgcttccgc AT G GT CG T TA aaccttatgatctttctaatttata tcttaaatat A aa TCGGC aaaacatcaacaacccattagatatgtccctcaaactttgaaaacgattatcc aaataaattatttattaaaacaagtgaataatactt C t ATATA aatgggtttaagcctccttcattaaaca cat ctcacaactcactaatctctagctaacca score :60 18953_at||#F24J8.4#At1g21400 branched chain alpha keto acid dehydrogenase, putative;similar to branched chain alpha keto acid dehydrogenase GB:AAC69851 GI:3822223 from (Arabidopsis thaliana)|COORDS: 7493490 749 6238 gaatctttaaataccaattccaccatcatcaaatcgtttatggtgattgtt G t GT a ACT ttcaactacact ataaatggaaaaatgtttt C t CGTG g T atcaataaaaaagcttttttttttctttgttttt C A CGG AC GT c atggaaaattcaaagaaagatataaat GA CACG CA agta GGT g AC t T tacaattttgctatcgttggataa agatcaaaattcaataaaaagcacaagtacagtttggt ttggttgttgaaaaaatctttaaatgatttgct aatgtaggtgttacaacttctgtcaatatttcattttgatttttttaatttttcaatagtatttgtccact ataattagtgtaaaaactaaaaatcccaaaaagagaggaagataaataatcattttaagaaaagaaaaaga agttggagatggatacggatgacatgtcaa G t C CGAC A aggaaatacatgcact TGT CG TC C TT ctaatcc tttgccctta aacatcgttaaaaacgacaacttctctcttcttcttctctactcttctccattttcttctc ttcttctcttctttgtccattttttgacagat score :84

PAGE 149

139 APPENDIX K VP1 ACTIVATED/ABA ACTIVATED SUBCLASS OF VP1 OR ABA DEPENDENT GENES 15863_at||#F7D8.14#At2g21820 unknown protein;|COORDS: 9251766 9251530 ggaagaagcaagattggaatcttgatctaagagtaccattagataacacagtaatcttgatcaaagaacct tcatgattcaaaatcccaaaagcataacta attgtgaggtcgacagagagagagagagtaataaattagtg gctcaataattgtt TC t C a A CG T G GT G T ggttggaaaatttaatattttttatttcgtttactgaccatgc ttaattggattctattgagcccatttttttttttttggtatcattatggtaa T t GA a GTA ataatgatact taaaattgttattgaatatgaaacttaaaaattgatgaagagtcttctcagtgtgaagt AC c T t TCA ctag ac agttctttattcataagcatgttcgata AC a T a TCA agaagcataaatccaacagatgataagggtcaa agacgacttcttttggctaactta C G A C A CGT GA C c A acaagataggtctagacttctgaagaa C T G C CAC GT C T T G gca A G GTG GCA ggttttaagaccatggcttgagggcataaataagttt TG t G g ACA aaaggggaa aatgagttaagaaggatcggtatcatcaacga score :168 19152_at||#MPH15.12#At5g06760 late embryogenesis abundant protein LEA like;|COORDS: 2090312 2089753 agattatgcattgttagctccattatcattgtgactttttgctctctctttttgttttatcaatttgtt T t ATGC g A ctcgctttgaaaactttagcccattctgtattgagctctgaagat TCG a CG a G ttctgtaagtta C CG at CAC A gttaaaagactt TGA t A t GT taaaacccttatattacagcta CAT a C t AT ttttgtcttaac tcttaagatatcatgcacaataatatacttgttttgtcttaacctatcgattacaaaac CG g G T TA a C CGC C G A C a TG a G g CGA actctaaagcc T AA C A CG C GT C A A CA t C TA T C t TC t C a ACG actcaaaggctttc C A A C ACGT GT a GGACC a A taactgaaacacaaagccta CCACCT cttcttcctct tctttca T GACACGTC TC A c T GACGTGTC G T C a A g AACGT aattaaatattaaa CTA t C G T G a C G A a CG c GA g G C CT A T G GC tatatcta tgggacat G CT C G G T G a GC aaaacaataaacaaaagtaatcagaagatatctttgtaacatctttgaa T t T CGC t A aaggaaaagagagagatttggtaaaaa score :378 17407_s_at||#K24M7.3#At5g52300 low temperature induced 65 kD protein (sp Q04980);|COORDS: 20531031 20533230 atgatgatgaagaagagaacgaattttgaaattggcggttttgaatttttaagaaattaaaaaatat C C C C C G t C GA tttcaagagggagatggagataccaaagcaac T c T CG C C A C t T gtcgtcttttaattttaattga gt ACG T T a T G CCG ttttaaatgttcaaaaca GC a CAC a G tt G A TA G c TG aa ttgattttttc T tt TGCCG t tttgttatatttaaacaacacacagtgcatttgccaaataactacatgatgggccaataaacgt GGACC g A ctaaaactaaataatagaaga TAC a TC GA taggcttctctaaagatcggataaaagataatgtcgca T a G c CA CG T a G a GA gcaactggc T GA G ACGTG G C A GG A CG A a AC g G ACG ca TC G t ACGTGT C A gaatcctacaga agtaaagagacagaagccagaga g AGGTGG ttcg G c C A TA T G t C atcgttctctctataaactttatggaa ctttgttctgattttctcagagacacgaaaagaaagaaaacaacactagaacaaagagggtttgattgatt cacttgaaaaagagaaaacacagctttggaaa score :240 14097_at||#F17A22.41#At2g47770 unknown protein;|COORDS: 19517234 19517824 gaattagcgcgaa tgcgaatggaactgcaggttttttgaatagatcggat C GA tt CGT C tcctt CCCC a G C C G A CG G cta CG a GA a GC tctcaaactcgccggtgatgagg C G CCCGC C A t G aaaacagagcaaa TCG CAT C AG cgtctagcca A C G C C G CG T A a C agacaac TAC t TC c A tattactactcttctaattagcccaaattaaa tgagcctattgggcttcttgtcttag TCGG t G t A gagcccaattgttgttttattt tttaa T a ATGC a A aa gtattaagcgataaataaataagcatcgcaa TCG t C c CA aaactgtgtgtatgcatcagacatgagcatat

PAGE 150

140 agagta A g CA CGT GT C C A CA ctttttcacaaagttatctaaaaacaaaaaacaattaattagcat TC GA CG T G TAC a T a TC A C T C G C C AC G T GT acaagagccttggcctttttgcttcttcttcttgtctattaatatcat ctcctgattatactctcttttgaccaag ctgcttcttctc CA t C TA T C ca AC a T c TCA aatctcagtaatc ttgttccttcgattctgttttggacgtttgta score :240 20641_at||#F6D8.9#At1g52690 late embryogenesis abundant protein, putative;similar to late embryogenesis abundant protein GI:17828 from (Brassica napus)|COORDS: 18892 535 18893282 aacctttttttccggtgacaattatttatgactttttattgttgtcaaaaaatatattatcagtaatatat caataacgaatacaataaaaa C t CAT CCGAT cgattttcaagaatttatagctatattaaaat TAC t TC GA atccatgtaagaattgtgtattggttctttttagaaaaaagtaaatatctatgcagtaatggcgg T t GCAT a A tatatgccttgagtagatgaatatcc aatatcaag A t AACGT gag T C AC C A CGT GTC t A acatcttccg tagctccgtttttac CA T GACGTGTCA C a T agatataggtcatcatgaaaacgagaaacctaactttaaca ctcgcacataactccaagtttcgaaact T CG t C A C AT caacctaa T C GG G G C A cgtacctacacacc TGT C GC GA A AC tgcaa CA c C TA T C ttgttct C t CGCC GA C C aagacttgctataaataactctga C T A A CG A GT C G gagacaactcacagttccaaacacacaaaaaacacaagatctaaaaaaaaaagcttttatcatttagaaa aatttggtttcgaatttcttcgaagagtgaaa score :306 14420_at||#F22D22.27#At2g31980 putative cysteine proteinase inhibitor B (cystatin B);|COORDS: 13558312 13557788 acatccataattgtataaacttt tgtgagaagaaacctacgaaatttctgtctttttaacatattaatatc aactaatagatggtgatctaaagtttaaaaat CA gg TATC aaataatttcaatttcattttgaattataag tatggactagttgacaatcataaataaaaaccttaaattaccaaaaaaataatccaattgcttagattagg atatataattag TGTG tt CG aatccgatttacaataaaaacaaaattaca G a T C G G a G C A attcac atat A t ACGT t A aaa AG G T GG GGG ctcacttc A CG TG G a G A cttaattggatcagaaaagtaacttaaaagatgcg gaaccccga A g G T G C C G A TG G c ATG g GGACC c A ttcacgtacacccaaacacaaacaccaacattgcctaa actctttctctatataaacacttctctttctttttctttctctcacacaaaaatacaacaacttagatcag tctcaaagggggaaaaaaaacttaaaagaaacattaag aggcaacacaaatcacacaaaagatcaaattga agcctaagaagaaggcaaaaagtgagaagcaa score :108 17484_at||#F20D23.28#At1g17020 SRG1 like protein;Strong homology to SRG1 protein, a new member of the Fe(II) ascorbate oxidase superfamily, Similar to SRG1 protein (Arabidopsi s thaliana) (gi|629561). Location of est F1A5T7 (gb|N96370)|COORDS: 5820252 5821735 c CG a CAC a T atatatatatatatatatatatatatatatg TC GA g GTA aatttgaacgatatataaggggt tggtaaaaaaaccaaaaaccaaaaagtgatttttggattttggtttatagttttcagattgtgtattaaaa aaaatgaaaaactaaaaaaggttttata aaaa G t CA a GTC actttaagaaaattaaaaaccaataaaacat aaattttagattttctttaaaatttgacaaaactatcaaaatctaggtttcctattttttagaaaacacac tttttcacaatcttggttcggtataaagtagatttttagaaaaactaaaaccaaaaacaacatcaaaatct acaaccattcaaagtat CA tg TATC aatttacctactttgg G t C t GGGG tcatact CAT a CCA T ATG t C tt ttatcatctgtcctttatttatattttcttccatcagtcagtttaaatcaatcattttcttcaactattta agtcaatccaatcaatcatttaggaaa AA t ACGT at ACGT TT T taata CC A CC TAC aaaagaac CG T G T C c A aactttt GT t T CG T C caggaaagaa GCCGAC score :108 18317_at||#F23N19.7#At1g62710 beta VPE;nearly identical to beta VP E GB:BAA09615 GI:1805364 (Arabidopsis thaliana)|COORDS: 22438278 22435491 gaagaccggttcaattccggtttactattttctaattaaactgggttaatagtctaatttcaaaattattt tagagctctgggttaattaat T ag TGCCG ttgtaacggtgaaataatatcggttg T g GGTCC attaagtgt AC AC GT G TT CGG actataatttgcatgttctctgttg T G T CG C GA A G C agaccaattcgttgattttatga gatcagcagagtaatattccaaaatattcaatcgtcgacatcacaaccattttaattcgtattttttatat ttatttaatgtaatataacttagttttagaggaa AC a T a TCA aaattaaataat TA ACACGT t A aaat TAC a TCG gtataatatgtataggagtttataaaattaaatttatcataaaagcatatgctaactttaattaagt

PAGE 151

141 gctttctttt taataatc A G G T GGCA T t A aatataattagacaattgatttgtttgattatattttggtga caaataaattattaacaccattgtctaaccatcctttgcttttatagttctctttattctctcaaaaagcc tctctttctctctcttttgtttctcgctgcca score :156 18482_at||#F7H19.240#At4g23050 putative serine threonine kinase;MAP3K delta 1 protein kinase, Arabidopsis thaliana,PID:g2253010|COORDS: 11044610 11048206 ctaagaaaccttatggagagtattattctacaatgat T t GA a GTA tactactaataattgattacatttaa attcaactataatttattgttattgttcttcaatcaacttgaactttgagtttat TG g G c ACA ttactttt agaggaatgtgaa TG t G a ACA ttgaa GAG ACG T G T attttattttcttcttgacaaatacc GTTG G AG G TA taaaaacaccatcaaactcaggcccatttggcaa T g G CG G CA ac A agaatgggtattatattgaggttaat ttcgtcaatcaccacttttaaat CGTG a C a A attcgtctattatctctcaatcttcttcaaccttcaat TA C a TCG gcaattctac A a GTG g CA g CAT c C t AT tagtaacaaac AC AC G T G G C G T aagctaatgcgtcgtag tgttta taatagctggggtttgttaatta T g GC G T C A CG A GT g A tttaagaagcagctgaaaaataataaa caatgagggaatgcca ACGT TT T aaattatatttttaatttgtgaattttcatcagagtcggatct T g ATG C g A ctgctcaaacagagtcaccggcgattacg score :156 13803_at||#dl4370c#At4g16690 cyanohydrin lyase like protein;|COORDS: 835 7933 8356914 gaactaaagaa GAC a TG a C actaattacaagt CAT t C g AT t A t GTG t CA t AC T CG T a A aatttatcatatt aatctagtagaatagttaaccacataataaccacattaaacttcatatgataaaaatatatttcaataatc tccgttaaatatatctatgtataaatcacatattttaatagcaagcaagttctccaattagaatcttcata taagttaataagtaccaaaataattatt agggatcccatagttaatagctttagaatatattaggcatttc aagaaaataatactcagataattaagattccagttt TGC t C a GA TA ct TG gaatcttaattatctgaaatt taaatatttctttttaccttaacatctaattttctttttgtgtgctttacaatatcttttttacaaattta aaatgaaaaaatggaaaattgagctggtagcaaaattaaatacaaacattagacccttttgaaaaaggaaa aaaaggatcgtacgtt ACGTT g T tatctccaaactgt GGACC a A aactg C AT A ACGTGT TA aaaggtgaga tagtgtaaaagcattgatcaagat T c ATGC a A score :126 14448_at||#F4L23.26#At2g45210 putative auxin regulated protein;|COORDS: 18590424 18590912 cacgctaaaataccataatagtttgacactactgttttttttcgt attgtgaatttgtgattgataaccca t C A AA AC G T t TCA gtgataat GATA aa TG taggctaaaggttaatgg GACGTGTC A tatttaatgtagatt cttcgggtttggactat C T GCC A a G caagagaaatatggcagttttaatctaatccaacggtcatagacct tagtctt CA t CTA a C cgttcatttatttagactccatataaactcctttaaagacctcttgtctttcactc attcttctcttgatctc ttcttctt T a ACGT t T atactttata TAC a TC GA aagcatttcttctttgt C CG tt CAC A tcattttcgaatttgtgttttccctagaaattgtctttttttttttctttcttatagagaaaatc aagaaaatgtaaaaaagattgatatatttatttattctcgttcaaatgaatcactctttat TTG A C T CG T T A G tccat G TCT CG T C T C A ctctcttaatctctatttcctctcagatctctcaaaaagtgt agttttgtgac aaaaactgttttttaaagttagctataagaag score :198 16888_at||#F13M22.27#At2g37770 putative alcohol dehydrogenase;|COORDS: 15783432 15785028 cgattctgcattt C G C TCGGCGCA atatatatgcgttcaccca A AA ACGTGT TA cttgga CG tt CAC A CTC G ag A taatctaaatgaactcatatac ACTCG ag A t gttatataaactcatctacaattttgttgttgttgt aaaaccacataatttttcttatcacaaagatttttgaaaaaccaatcacaatgtataatattctactattg ctacattatttatgaaatatataaaacttttttcataatatgaaatgaattataaatcatgtacgaattgt cttggttgtttgagagttctgtcttctg T GA T A C G T G T g A at TG t G g ACA caaaa A a GTG t CA gtcac ACT CG ag A tt agcaaagaggaaatctaaatgaacgaatttaactca GATA C G TC c A ccgtcca T t GGTCC atgt aatgatcaatatct GACGTG A C AA tatactaactaagaaagtactagtattacagaaagaagaaatcaatt T g G c CACG aat C t CCCG c C tctgtcgctctctcacttcttctcaccctttaatagc T a G T G C CG A GTGC a G TG t GC gttgaacaaccaccac C a CATC t G ata score :306

PAGE 152

142 12521 _at||#ATEM1.11#At3g51860 Ca2+ H+ exchanging protein like;Arabidopsis thaliana high affinity calcium antiporter CAX1 encoded by GenBank Accession Number U57411|COORDS: 19116483 19119544 caaaatgatatattatatttattatgtattgtactagctagagtacgaaatcaactaagatcacaa acgga tagt CGG a TA c C aatcaaaagtgttcaattctgactcttgaaaaactttatttatcttcattttgaattga ata CA at TATC taactgtaactttttttc T t AC T CG T c A atctatagaatcaagttaccctttctagagaa t CA a CTA a C tagtagtctagtaccttcctcaatacaatagtcgtacgaacgcc CA tt TATC taggtaccct caaatttt GT g T CG T C t CA cttattagttatctattac tagttgattactcaatacaaatttccctaaata gagaggatatgagagttgcgtatttaaacaaacctaccaaaatatgattttcgtcaaaccaaagcaaaaac ctaccaaaagaaattatttcttaaatacagaaaaaaacaaaaaggaaacttc CG g GA a GCTG c G t ACA tag at GGACC c A acataaaccaaaacaa TA ACACG T ag T CG C CAC C T atatataaaaaacc T G A CG C a A C acaa acagatgtag tagaat C A AA ACGT c TT aaaac score :174 15178_s_at||#dl3105c#At4g14130 xyloglucan endotransglycosylase related protein XTR 7;|COORDS: 7102678 7101643 actcag TGT t C t CA gctcacacactctttttttgttctctttcttttggacagctttcattttctcttttc ttttttctattttgtttcaaaattccatccatatt aaaataggcctgatcatgagaataaaggaaatacta atgatgagtttctcaa T a ATGC a A taagatgcaattattatgagctatttactattgaaaatgagcaaata aatgtcaaaacacaatctggttaagttagagcaactccattgtataggattcatgtagtttctaagaaaac aaaatgtattaatattttacttt TAC a TC c A aaaaaccaacttatatgagtaatagaaacgatcctaatat taggaat tttagagattttct C t CATC t G tttctta AC t T t TCA atatttttattttttaaa AT t G t ATG a gtttctactaagaaactactgctggagttggtcttagcttcccaatgcttct CCACCT atatatatgcata tctccttcttaaaactcatctcacaccaaaacacaaagctctcatcttcttttagtttccaaactca C CCC C a C A actttcatttctatcaaccaaacccaaa score :54 18596_ at||#T3P18.13#At1g62570 flavin containing monooxygenase, putative;similar to flavin containing monooxygenase GB:AAA21178 GI:349534 from (Oryctolagus cuniculus)|COORDS: 22380626 22383202 agtttctgttaggaatctggttctattctcctctgcaacctccagtctctcatgaatctggttcg gattct ctttttccttgtttctatataattta GATA ca TG gttttataattctat C a TATG t C tattttggatatag tattttaaaaatatatatatttttcataaatggttatggtctattctatgttaatgataatcattagtctt tttgtcaactatgtttttttttccaacaaatttagtatgtaaacttttttttactaccgttttattaaatc gacggttgatcagatcaactccggtataacacaacat aagtttcgttatcaaaacaaaaacaaaaacagat ttttttttgt CA a CTA c C agtgaagattagtc T t ACGTGT CA A gaaaccggataaaaatat ATA ACGT a T T tgggcaatcagctaagatattaa C TAA C G C G G a G T tcattattaaaatggagtaatgatgttttcagtttt ctatataa A t C A C G TCG A G A C cgtagagtcttacacaacaatccttcttacatttctaccaacaaaacaca aaacacaaa catagcattcaaaactttgaaaa score :144 17744_s_at||#F13M22.26#At2g37760 putative alcohol dehydrogenase;|COORDS: 15780539 15782062 tggaaaattgatttctctaagcgtagactttaacccaaaagggtggtctcgtggcttttgatttgtcaatt gtgtgttttcttttaattgtataaagtttctatgaatatatttagtttattcg agaaca AT t G t A TG A a A t GT ttagttaatgttttcatggtatataggttgtaaaacaggttaattacgttacaatttcctagttttggt attaatatgcaaaaatgacataatatcaaacttttaagtaaattgatatttttcactaatgttaataatga aac CA a CTA a C aaaaaaaaaaaagaacaaaccaatttctatttgtttaagactttgacagtacatcacttt tagaaataaatgccataattatgtt a TG T GC A CA C t G caaaagaaaatgaatacgcacaattgactataaa gaaaacgcctttagtg G c CAT ACGAT ttttagcattttatgcggaagaaacaaagtgctat T t C t C CG AG T T A CGT a T gtttcttttcttctgcatctagttttttttttgcatattcatcaccttctctcagttttcttct ttcaatagaattatccagaaaacttttaagag score :66 12765_at||#T9I22.10#At2g 22660 unknown protein;predicted by genefinder|COORDS: 9576284 9579387

PAGE 153

143 cgataat CC GA TGC G T atact A CA CGT t A aaaagtacattgaattttggtttaggccttttctccaattag tttattttctgagtcagacaaggcgttacattattgttcgcctttttagttttattttatccat T t TCGCA AACGT t A aaacttaaagacacaacaaaatatataaaacg gagatatgat TG a GGGG catatttgtcatttt ct TGTG at CG ttgatttcgaagtcgctataagagtataagtagcagcct TGA a A a GT gttgttcatctc TC GA a GTA aaacttttctctca GTG at CGG ataaaatt C T CG GC G AA gaagtgaagtttccggcgatcagagg tgagttttccgttagatccttatcctagttgctacttgtgtgacttcgttgattaattttgatctgaatat caaaaatcggg ttttaagattctccagagtcctgatttttctcatgcacatgatcttccaaaattcatagg gc T t ATGC a A ttgattagataaggatcattttaggtcaaaatctggta ATC ACGT a TT G atttacttagtt gtttgcattttttgatcgttt TGA a A g GT gat score :132 20432_at||#F3C3.5#At1g32170 endoxyloglucan transferase, putative;similar to e ndoxyloglucan transferase GB:AAD45125 GI:5533313 from (Arabidopsis thaliana)|COORDS: 11575551 11577893 gatttggattagattttgttagataaagaaatagtctagctattctaatatacagttgctttaactcaata tttctttaccttagttaatttatatatttttggaa GATA t TT G GTCC agttgttaatcaaaaatttattta atgtct T t G CAT t A tgttaaaggtttgtttcctaaaaattgcaaat CAT a C t AT aaat A T t G C AT GA aatt caaatcatgaactgctaaaagtaatcataatgaaatttttagtcgagttctctgatcattctttttttttt aaatccacattggtttgatttttcttgatcgatcatctttaacatcactaatatacaataaacagctgcaa aac C GT GGC AG actaattaaaaaaaagaaagggtaaagctggaaaatattga caaagaggacaaatcagaa aagactattgaataatttactctcccttatataa AC a CC a CG agcgtggaagccaaagagcaattttct C C CCC a C A ccaaaagataaacctctctctctctatctat C t TATG t C aaaagacagatttattaacaaacaca aatcataaatctttttcttcttccgagaaaga score :78 15222_at||#T9I4.17#At2g29090 putative cytochrome P450;|COORDS: 12447622 12443580 TG CGCCGA a C AC tt TG t G t A CA aa TATC tatatatgtaaagactcaaaagaaaagtaaaatcaagttttga aaggagaacatgtaaataacattcaaaaaaaaaaaaaaaacaagtagaa TGA t A a GT tctgaattatactt ttaata TAC t TC t A gaatctaggatattctatttgtagtatatatgcaattttcaagtttgtgtttatt GG ACC a A a GA a CCG T CGGC aaatttttgtatt AC t T a TCA aaaatatattttattttaattactataaaaata aatacaaaaaactaagatagatatacgatctttattttttctttttgcatatattagctattc GTTG GC G G CA aa A ttctaattaatatgtatataattaattacaataaataataaatgtttgaacaaaaaagaaaaaaaa aagaaagaaagaaaggccaggaatgaagtatctttccatctcaactatagc tatataaa CCCC t CA a TAC t TC a A gcaaaagtcactaacaagaacaaacaaacacacacaactcataactataactatacattcatacata taacaataattcacttaaatc ACTCG ga A taa score :156 15161_s_at||#F4I10.80#At4g33150 lysine ketoglutarate reductase saccharopine;|COORDS: 14955558 14949968 ct TG t CAC a T ca tacattataacaagaaatattatattatattaatttaatc T t TCGC TA A C A CG cc CACA atatattaatcat A t ACGT a A tttagcttataaaaaggacggaaagagattattac TGCGC ct A aaaaact cactaattccaaagaaaaaaaaaaacttgtattttttcttgacaaacca GC t CAC a G gc A T t G C AT GA tca aa C t CATC a G gt ACGT TT T G attccttcttccataattttcccatcttgaggaat gcaaattt GG a G a GCG ctttagctaaatcactgccttcattttttcactttggatttaataatttgcgttcctctcttcctctctgc tctgttctgttctgttctgttctgatttgagttttcaattaatcgctcgagcaaaagctatttctca AC T C G T t A aatttc TGT t C c CA gtttgttcgattttcaacagtttcacattaaagtttgggtttttgatgtttgg ttgatgaa ACTCG aa A ta TGA a A t GT t tgtgaatctattccagggtgtttaaaataagggtttgttgttca tctgcagagattatatgtttttacatgaaaga score :126 14367_at||#T13D8.8#At1g60190 hypothetical protein;predicted by genemark.hmm|COORDS: 21409817 21411877 a A T c G C AT GA tttatctaagttggtctttattaactcttaacaaaaaaataatataagaa aacagagtcag aatttaaaaaccacttaattagtccttcaagaa CA at TATC aaaaccttaataatgtttt CAT c C a AT aac atcctcgaagtctcctctaaatcattggatccaacgaaattcatgtttatctaaac TAAC t CG aataaaga aacgattataataattgcacactatgaaaaatatca GA ag CGTC atagaaatt GT CGG C TA CCTC c ATGC a

PAGE 154

144 C g GA accttcacgaaacag T t GGTCC ctcaca cacttca T CG C C AC G cta T A c C ACGTGTC A A ttttacat acaccaaaacatatctactaatcatacctct T c A CG T G T a A caaagtcccattca A C G T G G C A A ttacaga ccccaaaattatgaactaatcaaacctct T C ACGT G TC G C a A acttgt A g AACGT tgaaa C C C C CCAC T c A CACG aagtgtatatatcctcttcacaacacaaacataaacat TAC t TC a A acaaagacttgaaagaactat cttt gttttcactcatatcttatctttattaa score :276 17362_at||#dl3675w#At4g15260 glucosyltransferase;|COORDS: 7678206 7679631 taattgttttctatcatcataagcttataagtggaaaaacatcaaccaacaaaaaaaaacaataacagaga aatttgtcttcaaaccaatcaaaagataagattttgggtgaaaggggaattaatactcaga C a GC C AAGGG ttaatcttcaacttctgggagaagaatcccagatctgaaactctagtccacaatactcttgtatcatcacc atattcaaccccttcca CAT c C c AT aaaactaagacatcacaaatctctttcacaaaaaccttaaatatca ccaatttcaacatatatacacaaacgaaaatagtcacagacctaaccctaaaat CA c CTAA C a T a TCA ttc agacatgaaaagaaagagagttcctatgat T t GCAT t A G A CG A a AC cttataaatacacagacacagcaga caattagggtt TAC t T C G G TCGGCG G c TA c C atatctattttaggttc TCG G C c C A attatactaaatggg ccagctttcgtgacccaacaaacaacttacatttattttactttattaacacattcactttacaaacccaa tcaacgaaaagtcca AA A ACGT a A aaacagaa score :126

PAGE 155

145 APPENDIX L ACTIVATED VP1 DEPENDENT GENES 17282_s_at|17310_at||#ATEM1.6#At3g51810 embryonic abundant protein AtEm1;|COORDS: 19091843 19092486 taattattacatggtagatatgactttgtc GACA a GT aaaccaactaatcctcgaagctaccttctctt C C C a G t TA T tatgtgtgatcgatttataaatctcttcttct aataacacctatatttttcttatgatgtgaat aaatataaaacttttaactttaaaacatatttatccgaaatattgcacttagatttcaaatagataaataa tagtactatctaactgatattgaaaagacct AACACG gaaaacagttttataaaaaatcccaaatgtgggt aattatcttgatttcttgggggaaacagaaaatggattaagattaatcg GA gt CG T G T C A a GC agctcgtt aataactgtag caagttgactgagtaagcat C a ACGTGTC A T ctc CGT a A a GC ccattatttctagtc T C G CC g C GT cttctctt C CACGT A GC AC ttcactttttctctccttttgtttccttt GGAAC a C aa ACGT t TC t atttataggaataattac GTC gt CCG tatctgtgt CGGAAC atagatccaaattaaaagcaacttacttaa ttacatatc GT t CGTG TT tttttcttcaaaaa score :264 16575_s_a t||#MPO12.130#At5g40420 oleosin;|COORDS: 15468568 15467450 catctttgtgaactaatatatagtaatggggattggattggcacaaaactttatttacggaatcattttct taatttgatactaaaagttcgaaatcgaaattttgaaatccaaatgtggtgcaaaactacatcgtcctatc ttcgatttcccatcttcttttataatttaacatggataattcaaaagcta ttgtctagattacatattttg gattcgttagtcacataaccaaaaatgaaactcttttgatgaattatgtattttcaaatttgtattggaga tccttttagacttcggactcaagtatacatgcataggatagttttcttt TGTC tt GC tttaaa T t CGTG T C A GCG gagaaagcgaaaacatcttttaaaatctcatgaaatgaaataaaataaaataaaaactaaaagaaga aaaaagttg A a AC t TGG aaatc atgcaaagccacacctctactactctaacatctcaccgtctt A T G T G T C C tcttctcatctctccaacttcttctttaaataaaccttctccaaacttttctagttttattacaaagaaa ataggtaaaaacaatttctcattagcttacaa score :54 20004_s_at||#T4C15.3#At2g35300 similar to late embryogenesis abundant proteins;identical to GB:X91917|COORDS: 14811902 14811609 atatacatttatattcataactcggcaagtcatttgcttcattatactagctaattattgacaccaactta ggcaatctatgaaacaatccctaattgtagatttttgggcaatttagcgaaggagggtcgaaggaatctct ct CACGT tc C atgcatcttttt ATG t G a CC agttgttttggttaaaaagtaacgagttatggaactac G t A c A CCG caaatccaaaaatgtgtttagccaaataacatcaacatttttttttcaaacgtataatatttcagc ttttttttttgttaaaacagctaccttaaatattatgtaacactaccgacaaaaaaaaatattatgtaata cttgagaatctgacaagataaagaaagaaaagataatgcaaaatcaagaaactattgt CGT GT TCGTG tag atttgcttttgcat G a GCGCAA c C C A CG T A G ata T t CGT a GC ttg T GACACGT A TC tacgtatt ACACGTG G TGA taacccagtagtcgttaaattagatcggagagcatataaacc A t TA a GCC aaataagtttatagttt gataagaaaagtaagaagaagaagaaagaaga score :246 14439_at||#T9G5.2#At1g32560 late embryogenesis abundant protein, putative;similar to GI:4102692 from (Glycine m ax)|COORDS: 11774644 11775148 gaacagtgactttgtgcgaaggaacaaccaacgatcctctatctgatggagtggaaacttctacaggaatc C GGAAC t C A CACGT c GC A gt CGT a GC t C ctcgacggaaattcgtaatatatctccgattaatcggaaaatt cttcttctgaagcgaagtacaaaatgtacagggcaaaatcagagccatagttttggaggattttgatt T t C G c GGC TT ttac tagtcgagaggttgaagaagataagagagtgtttctgagaaaagagaccaatcaaccatt ccactgctaaatcataaaccggaaatacaaaccaaaaacttttc CGGT t T a C aaaattcggccaattaaac

PAGE 156

146 caccaaaatccggtagaggcg G T g CGTG G at A tcag A t TA g G CCAGCG aagcatct C T ACGTGGC A A ctag atagaaagctgggaattgaaaca A GACACGTGGC G A cattaat A ACACGTGG aa tta ATG t G a CC catttg gaccgacgaagccatgcatacatttccttatttaaatcgaataagaatcattcaagagtaattagttagag tttagagaaacagaaaactaaagaaaaaaaat score :552 14092_at||#T1G11.8#At1g04660 unknown protein;|COORDS: 1301307 1300669 gcaaatcagatcttttgtttaatatcctaagacgcaaactctaataag aagctttattttctcttattatt tataaatgagagtattaaatttgt CC g G t TAT gagcatttgtaggtttctttgtggtttgtctttaacaaa acccaaatttctcaaaccctgattgttcgcctatagactcgatcttcgttttttttttgtcttaatcactc aataatatattgtctaaagtacattgtatatcattagttatgaattatgatgatata GC tc GACA tataaa ttaataatagaattcaaatg ct CGCTC GC T G a C TG tagta C CC t G a TA T cactagccaatacaacttacga aaaaatgaaaaatgaaaaaaattttaaactcgaccaatcaatcgtccatggtatagttgtatcat CA G a C a GC CT ttactaaggaaaaatgttataaattgg G T t CGTGGC c A aatttatgactttcaactgtcaccaacct tacattttgctatataaaccctaagaaactcattataaacttcaacaacataaaagacgatca atataatc tagaaaaaaaaaaagaagagagaagagagaaa score :102 19003_at||#F17H15.8#At2g25890 putative oleosin protein;|COORDS: 10985903 10986427 ttggaatttgtgaaaccaagaacaagttattttcttacaatttgctgcgtctctgtagaaatccccatgga agagatcgttgatgagccaagatagcgagagaaagagaagaaaga aagagagaagagagagtaattttcac aaagggaacagatccaaattgagaaatttaaaaaggggaaaaaacgtaaaccaccagcaaagcagcaacgg tcgaatccagatgc T GCCACGTGT T gatacatctgcatcgaggtttttattttcaaataatt T G T C t C CC G agatgatacagtcgctacagg C CC a G a TA T t A t TA a GCC ca CGGT a T t C caacttcataatcactaggata atatacagtcactacaa g CCC a G a TA atagcccaataagcttcaagatcgtttgacgatgagattttaatt CGGAAC taag TAA CCACGTGTC A gtacctcgtcaaagtagctccatgcagaac T G AC AC G C CC G C GTG A CA ctcttctccattgcct AA C A C GCC T C GT c G ttc GAC AC A T G T C t CG tttaagtttagcctctccttcatac tagtctattttcattcttcataataatcacaa score :486 13449_at||#AP22 .80#At4g36700 globulin like protein;|COORDS: 16264827 16262933 tggttaggcactcatacatacatactaacatacactagcccaaccactatggacaattg GGCCT gt C aaat tttcaaccttgatgtaca G T ACGTGTC cgtgatcatt AT ct GCCC agatgaaattgaaatgtgaaagattc tcgagttttcatttccaaaacacatgttaataatgctaatacatag tattggaaaataagcatatccacta ataccgatatgcatgcatatctcaatatcgcatcgatgaaaactatgaactgccatcacttg GA ac CG TG T C A T ttaactcgac TGTC gc GC gatataaactatgcatctttgttgttcataccttcacaacgtcatcatca acagttatgatcacttatcagccagatttttcaaactcgagctttgtattttagcggcaatttgtttttgg taacaaa GGAAC c C atgc AC t T G T C A G CG T a A a GCA A A CACGTGTC caaacc A t T AA GCCACGTGG T c A tt gtccgactctacaacctctctctgtttcttctctcattcctttatatgatactacgatttaatacgctttg cattgaataacaaaaaaaaaaacagagcacaa score :396 18559_g_at|18558_at||#F3K23.25#At2g21490 putative dehydrin;|COORDS: 9155587 915475 6 tatttatatgttttggtttatttggtgtgggatcgtattatacgatttttatatacttctacaaaccattt ctaaacaaatgtttcagttcaaaaccaatctaaacagatttgtattatgatccaacgttaacttccatggt tcgtttgtttagtgctttatgaaattagttaccaacacaaacttgcacttaccaatcagaatacgtcaaga atttaaaact C G G A A C A C CG accttctttagcggttctaag ccctgatcaaatccac A T G T G T C C atggaa catagaaagcaccgtacttaaaagataacgttatctaccgaatctatcaaaaaaggcaaaacgttattact atgtcg GAT t CG TG A C G T acacatttcaaagtacac AC a T GAC A C a TGG AAC t C taattctatcc G T g CGT GGC A A acatttcacatta A a T A C GCCACGT A T C A tatccatgcagatta AC a TGTC taatatatatgcatg cctacttgaatca atcccaaa CCA a GT t T tcttcaaagctgtatttgaaagggtatatatctcacacacaa aacagatcagaagctaaaaggtaataatataa score :282

PAGE 157

147 15495_at||#F13K3.15#At2g36750 putative glucosyl transferase;|COORDS: 15360550 15359075 attttattttaaatgaaattttcattaagagttttgatttttcggaagaaaaaaaaaagag gaaacaaaaa atacgtttttaagaaaacaa CCA a GT a T ttcacatatcctaatttaatatcctatcaaaatcagtcgtcat aaagcaaaaaaaaaaaaaaaaaaaaaaaagactagaatagctagaaatctagagtacacactttgatttag tttgaagacaaatgttgaggcttgtttgtccatacgacataaaaagaaggagagactgagatattccgaat ttgatttttctttctttattcgtttatccaata ctaatttttacggctataaacttttcactatcttttcg acgtacctctagctaaaagatattttctcttaaaatttt GGGC tt AT atgaatacaaaagaacatatgagt gcc C G C CACG atatgtactgtgaaccacttagtgtgttttgtaactattacgaat A a TA a GCC aatttcct tttgttctttcttacatcacatcttaatatatgtaggcacagagaaacacacacattat CGGAAC caaagc aaaaa aaataaaccttagaaga G a GTTCC tca score :42 12076_at||#AP22.17#At4g36600 putative protein;|COORDS: 16228160 16229460 cttcttcacctctctcactttcggtt GG G C t C AT gctgggtttcttgttg GGC t TA g T tttgtttttttat ggtttaaatcaattttcttggttgaaccaaaaaaaaaacagttttctttaaaaatgagatgacaaaaa aaa aacattggattaactaagaaaggtaaaaaaataaacactttctgatttacctttttactggattatccggt ttgtaa CCC t G c TA agcgcat GGC t TC GG T t T t C taaa CC g G t TAT tttaatcttt CCA a GT a T tgaaact tcctctac G CG a C CG GCCACGTG a C C t CG g GT C GG G a G A C A ctaaccctacaaaatcgaaccacaagtg GA CACGT c GC A agcacttccagcataac CGCC t C t T taa T T G CCACGT A aacatcgaa TA a C CACG T A atcat cgaa TAA CCACGTGTC T tacaataaccgcttatctagtgaaagaaaggaatgtgatctctccctctctcgc tcatctataactagaaacttcctccatgaagattggaaccaagaaacaaaacaaaaaatcggaattttaaa gaataggaataatggagagaagaacgttgatg score :552 19918_at||#MSJ11.8#At3g15670 LEA76 homologue type2;identical to LEA76 homologue type2 GB:X91912 (Arabidopsis thaliana); similar to late embryogenesis abundant protein 76 (LEA76) GB:P13934 (Brassica napus)|COORDS: 5310902 5310142 cttggctaattgtcacactatttatgtgaaggtgagtagttaaagctaataacctcgaag ttgcttggttt atatgagatcaagtttggctcttgtaagtgaagataataatttaaagagtattttaatcgtgataatctaa atgttagttgatagttaat A C G T CA CG G A AC atgtaccgaccgacggccgatttccatgatacaataaaaa aatgaaaattagcaatctcataaca TGC g ACG acgatacttatggtg T T GCCACGT A GC A agcatcttcct cttaaccatg A g GTGTC gcaacctcagaggac AA CACG g AC aagaccgagaaaccgcatacta A CGC TT GC a A CG T AA a GC acaacacccctaaactcctatttttttaattttcttttaacctcaacaca C CACG c AGC t A t A CACGT t TC A atgtgcta AT ACGTGTC T tcttctccgccgaccaagacacactacaa ATG t C c CG atggt ttg GGG a G a CA aatcacagtttctactacaacaacaaatacttttacgaaaaaagcttttaaaacttatct agtt tcgtttgatacacatttgaagaaagaaa score :342 20225_at||#F28P10.80#At3g54940 cysteine proteinase precursor like protein;cysteine proteinase precursor, Phaseolus vulgaris, Z99955|COORDS: 20231435 20233159 aatttaatatattaccattgagcttgccaaaaaattaaaatcttaccaaaat tcttaataataaaatttca caaatttattcattaaaaaaaaataaaaaccttagctaatagactaccattaaggatggtctaacattctt cttgttaaatcccttaactaaatatatatagg CC g A a GCC tctttgttgggccaagaattttgaccatctg accaatatggtccaacaaagagaaacatctatggtgctgtagtgtctgatttgtaacattacaaaacaatg gaacaaggaataaggttttttttt ttttttaatccagtacttttatgatctatgagtttaaaactaaacca aactctagtttttttctagactcaaagactaaaagtctcattgataactttc A A CA CG C G T C A TGC agctt cat TGTC ta GC tctacatgcaaacatc CCACGTGT T T ccactctgcatgtatctccaacactttcgttacc tcaaacttatgtaattcatctttataaattacacacaaa G CC AGCG attctctagggactca G ta AG GCC a cctcaaaacaagacccatcgataacagcgatc score :150

PAGE 158

148 16637_s_at||#dl3385w#At4g14690 light induced protein like;|COORDS: 7382855 7385327 ctctcatttctcaaattttattg GC t ACGTG T TT ttgtgtttagcgttcaacccaaatatcgatatattct tctttttttttcacatttttaacgatttcgagcaaaataattcgattt atttgtataattttaatatggta gttttacaaataa TGA a ACG aatgaccaactgatttgttaggtgttacaataaatatggaaaaaatctcat aagtttgaaaacatttattctgaggaaactttttcttccccaaaagaaaaaaaaagtttaaagtggaaaaa agaaaaaaaaggaataaaaagtttgaattcagtttttttcttttctttgatagatttctttctacttattt attactctactacacaccac acaaaaacaaaaataaaataagtaatcatagtatcccataaatcagtaaag ataaataaaatccagaaaatact G GGC CTATC attttccttcaccaactctataaatgaagagataatcct acagttacacctcaaaccaactccatctcacttctcaagtcttataatttattcatttctctcttcttcat cgatcttcggcttttagaaaacctaatcagaa score :84 18261_at||#dl4120w# At4g16160 pore protein homolog;|COORDS: 8122047 8123240 ttctcctactctgaatatggaaa CG g CACG a GG tcaattgcttcgacggagagaagcaaaacgcttctcgc cggagtccacagaaccggagatttctcggcgtagaaggagctgcgttggatgaacaaagcgtacgattcgc taaaggaagcgatgcttgtgagaaggaaagagaaagaaccgattgcattgtgt acgagagagtgaatgaga gatctgtgtagagaaaaagggagtgacggaga T G A CA a GT ctgtactagtctcactcgacgagcggaggcg gagatatcaaagcggag A T GACACGTGTC A cgtttctcgtctttaaaccggagatttcaaaatt C CC g G c T A T aatatccaactctgagcccattgggccttttatttct C G G T CT G C CT GA CG ttatcttgtga G A GCCAC GTGG g GA aggagaacggcgaacagt tcaaa CGG ca GAC tc C GACACGTG T ggaaaagcag A A CACGTGTC A aaagaacgctgttctgtttcacaatcttctccttacttgttgttgaagagagaagtattaacagagaaaga gagaagcaacaagtgaagaaagaaagaaaaaa score :618 18606_at||#T5L19.150#At4g10020 putative oxidoreductase;11beta hydroxysteroid dehydrogenase (EC 1.1.1.146) 1 mouse, PIR1:I56604|COORDS: 5232839 5234655 atctgtctgatcaaagtttcttgttaagtaaagattaacagtagagagaaccaacattaactaattaagta taaactaggtcttacttatggccgcccatagacccactaaaagtgaggttcaaatggagcttgtacgtatg cataaatcaatatccgactaagaccaaggaaacatatattaaaaacgtta caggaaaaattgttaaatact tttggtgggttacacttataagaccaagaaaccaataaccgaatctttatttcaaatggagcatgtg TA CG T a G C AC G a AC ttattgaaccagtcaaccagatgcaaaacagccaacccggattaaac CGGAAC atga GC t T c ACG gttttatcttatctaaaactcaaaggaac AA g CACGTGGC T C tgtacct C A G t C T GC atgcagccat gttgttcaccatccatcactat acgtcaaa AT cc GCCC cacaaatattaatattttcatattctcaacaaa gaaaaaaactccattttaatttaactgcattcctcctatctttcttcttcgtttctatagcctcactaaac atctcttgttattttcgaaagtgaagagaaaa score :192 16968_at||#F28A23.110#At4g34131 glucosyltransferase like protein;immediate early salicy late induced glucosyltransferase, Nicotiana tabacum, PIR2:T03747|COORDS: 15314348 15307758 gagaaaaaaacaatagaagataactcaaagaagtatgtagagagaaagagtgagaaaagggaagaagaatg acgaatgcaatgacgtgagattcaactcctaccaaaattcctatcaactgtttttgttgtaatta G AC A T G TC A ttgtcattgctgacaaaa taaggaagacaaaacaaatccgacttggag AT aa GCCC gtttagatttat tcaactcctatcataaggatatcaaatttttcccatatcatcaactttagaaagtttctcaaatatc TGC a ACG attgatctctttacttctttaaaaaacattaggtt ATG g G t CC gttttgtgattgtaaaag C GACACG TG a TT gattagcaacatttaaatccatatcttgttgaacaaaacggtgcagtttcggactttcg ttaccat atattgtagttgttttgttatactgtatatataaa GC a A GCG T tatctcccaaacttcactctcttacctt GCC t C t TG cttcaacctcctaagcacattcttatctctttc TGA a AC G a GTTCC caaattagctctacaaa aaccaggatcttgatttctcatcagttaaaca score :162 13528_at||#F10O3.5#At1g03120 unknown protein;Similar to rab 28 protein gb X59138 from Zea mays. EST gb|AA042774 comes from this gene|COORDS: 752274 753143

PAGE 159

149 gtaatgagtgctgcggcagaagctgtgagagcggcaa TGTC ca GC cttttgatgaagaagcaatactctga agagaagagagagttcagaaaaagaaccagaaatgataagaaaacc ACACG at G attgcattttgccaaat tgtttttttttta GGC t TA t T atcttctgtttttctgtgactctctcgttgtatcttgatcattatgttac cgttataatttctctgttactatttcaaaatttttatttattcaagcatgtactagtacaatttgatgact tttgtgtttaattaaaaaaaaaaaatctcaaaataaaatttaagta GG G C t C AT ataattttttaaggtcc acatcaatttaactaaccaatcaaaagaaaaattagcaaaatcaaaaataccttttttgt taattcaaatc caaaaaggtcgatccatgtcatcaa CA t G c GGC aaa GACA a GT gtggagttaaac A t CCACGTGTC A aaca gacatcgaaaataaacagagaagctcctactttcaagatccaaaatcccattaataacaaaccaaagaagc t CC g A A GCC t CG a A taaaaacaaaacaaagca score :204 17054_s_at||#F27I1.2#At2g40100 putative chlorophyll a b binding protein;|COORDS: 16694427 16695733 cctta CCA a GT a T agggagaaatagacaccaattcggtttgatttactcaaaccaaacaccaatttgaaag ctgagaggcattttagtt CC a G t TAT atgataaagtactaaacacaaaccaaaaccaaagtattggcctaa aatataccaagccaaagttgtagaccaaactaaatgctggacttactactagacaatatcgga AACAC G at ttaagaggcattttgtagtcgattaatgggcttctataaatgggcttta GGG g G a CA gaacttcaactggg accaaaatattttctcttttgtgaaatcgacaaaagaaatccttaaaaacgaaagaaaacaaaagaaagaa aataccttaaatgca GACAC c T accataaggacgaaagagaga GT GC a ACG T cattagaaagta G cc AGGC C acaaaagaaaaaaggtaaagagagagaccaatcacagga gataacgctaagataaggcttcttacattta ccatcacatataacataacgttactattattctaaaatccaaaaaaactatatgttataacattggaatca tcagtctctaagcagtatcacaccaactcaat score :54 15557_i_at|15558_r_at|17986_s_at||#F9F13.60#At4g20410 putative protein;gamma SNAP protein, bovine, PIR2:S32 369|COORDS: 9980952 9978597 ttgaaattgacattgtagctgattttggagtgtttaacaagagatggttcttctttttgttttgaagttgt ttgtttgttcgatgatgtttggagtcaaatggtgttgtaatttatagagtggagattgtagagtttggaga atgtgtaggagattcaacgttggatttctacttcttacctaaaatcatagtttcagaatcatgcttattgt catgagcttgact tgtgaagatgaaccagaacatgcttattagtataacgtttgtatttttatatagaaag ttgttaactttcatctaaattatgagaattagaatctcacaaagaaca GA at CGTG G aa A tggtgtaattt gaaacataacaaaagtatggtttggtttgtttgaataagaatgtttcaaggagttaaaataaaatatttga ttttattat ATA t C t GG aagataaagtccaataatttcaataataaaattaactaa aaatagaagatttgc aaattcctttgatcagatagtgaaaacgattctctctatattgccgatcgatct CCA a GT t T gctaaatcg ccgggaactttactgaaatttggagctagtca score :24 19288_at||#F15K20.21#At2g27690 putative cytochrome P450;|COORDS: 11757915 11759402 atcatcacctcaatttaatgttgtttttaaaaatcatc acctcattttaatgttgattttttcttttaaaa tttatacatttacttaatcaactatcacaatattatagtcatgtatctgaattacccaacaaaataat GC a a GACA tctataatcataattgacaaagtcttatttcgtatgtacgaaaatcaaataaagaatctttttgtc gatataattaaggtacttcatgtcgacaacataaatatataaacaaagttgaaattgttggctgaaaatga ggaatctaaa gatcaaaataatgcaaaactgttttttaactttatcaaaaaaccttcaaatttttttgtca catgaattgttttcaaaacatttcttgttaatttc TGCGC t C ttcactttctggtcttgaaaatatgtttt aactaaatattagagaac AC a TGTC tctgtccagaataaatgagattaagaaaaaaaaaaaaaaaaagaat ctctatttaattgactcaaaagttttctcataacacaaactcatctctttcct ctcaccttcctcatacat catctctctttctctttgtctcctcatcaaca score :18 19762_at||#F17L22.140#At4g21680 peptide transporter like protein;peptide transporter (ptr1) Hordeum vulgare,AF023472|COORDS: 10484075 10482039 ctataccgtaatttatggttttattcgcatataacagtggca tttccgtttttttcccacaaagtcagatt ctcttcaaattaaaatttttacataactacttttcgtattaaataaatattttagaacgaaaaagatcatt tagatttatctcttttaatttttattcataaaaactagttaaaaaattaaaatagcatatgattattattg

PAGE 160

150 agataacaaaattctgtcgagaaaaatccaccgaacatatatttta GCAG c CC G T t TCA aaacctttacaa ttaaattgattaaa ttaaacaccatagaaagaaa GA g ACGT caaagaaaggatctcttctgagttctctga ctcaatttctgagctctctataaatataagctctcttcttcacacttttaatgcaactctcaatatcacta ttaaatatctttctcttttcacttcttcactttcctcgattcctgagttttcgatccttttgttctctaac cac CGGT t T c C tctttaaccggtctctggataatcatcataatcattaggatcatca agataaggaatctt tttgagtatatctctacaccaaatcgaatcta score :24 17533_s_at||#F14M19.90#At4g25810 xyloglucan endo 1,4 beta D glucanase (XTR 6);|COORDS: 12093191 12094212 cgagaaccaaacaagtgatgtttttggtcaaacaggacgaaacatgtatacataaaaagaggtaattatgt tcgataaacataggtat aaataaactataagaatttgttacaacattctttagatttgatattaactgctg tttacatttgaagaactcagttagttatttaatattttattttatagggtctagaaattacttgcgtgtgg ggactgatcaccgattctaaagaaggctcatcctttggataatagtatgaa AC t TG T C A ataaagataagt catcacaagtagggagatcttagct G t GTTCCAT ac GCCC atctagaaaaagcgacgatg gtcaagattaa ataactgtatttgaaaaaccaaaaccgcgtcaccaact CC a A a GCC attaccattagccatcactttccat cttccagctgttcgaat C a GG a CGC C c C t T tttcttcaccaaacccatcggccgataacgaaccttcctct C t G a C T GC CT ctgctcttactataaatacaaccaatacgacctcatccaaaacccaacaaacctaagctca aagcccaccaaaaagaaaacaat CAC t T a GC a score :60 15120_at||#MKP11.30#At5g17330 glutamate decarboxylase 1 (GAD 1) (sp Q42521);|COORDS: 5634038 5637736 aaattctgagaagtcttttttattcacatcacatagtcaatattcaattcaatgcatataaatttcatgta aagcttcttttttcctcggatcccattcattattttacttaccattgaatacttt C ac CGTGT agtc tttt tttagaaatcgttgaatatcaaaaacaagaagaagaaaaagaaac CGTGTT ttcaatgtgctgtcatacga ttcatgaccctttatgaatctaatacagagtttgaaacataaact A a AC t TGG aagtttgccaaaaaaaaa aaaaaaaaaa A a AC t TGG aaataacacttctcaaaaagatattgggtcaaagtctaaaactgaaaacgaaa ctagaggcattattctctaattcaaacaaagtttag GGC t T g GG aagaaatattcacaaaaaggaagtttc cgaaaattcttttatattattccccaagcaaaccattcaatatcattgattattgatataaaacccatata a A g GTGTC tctcatcatcctccattcccatacatcactttcctcgatctcactttctctctcttatcatca tct CC t G g TAT tctctctctctcatctccgtg score :42 13916_at||#F6F22.17#At2g19800 unkno wn protein;predicted by genscan|COORDS: 8481446 8479650 ttgattggatcaatataaataccatctccattctcgtctccttccaacaacatctttcacacaacaattca cacaatttctcgtttttttttgtttatcatcaaaagttttaatctaaatt A CGT a TC A aattccgagcaag atgactattcttgttgaacattttgttcctggtaaggtttcattgcttcttttaag atatatagatataga taattaatcaaatattggagattagattgatattagtgcatgagaatatgaacaatatcattacaagaaaa aaagctatgaaaataattttgtggtgacgattatagattcaagagtggatgaaaagaaagtgatagaggag agggataatgaattggtgttggatggaggttttgtggttccaaaatcaaaggaaactgatgcattcgatgc t CC t G a TAT gaatttcttgggccattcc ttcaggtttgtttaatggattgatccatatatttatggacata aatgttgttttggatcaaaactaattacagtctctcttttcttttagggattatgagaatggtgaaagcga gagacaacaaggtgttgaggaattttacagga score :18 19845_g_at|19844_at||#F20M13.100#At4g38540 monooxygenase 2 (MO2);|COORDS: 16987667 16989307 g ga ATA g C a GG agtaccgaga GACAC G G t C t CAT acagaacggaaacagctgtcgatcagtggtttcctat atg G g GC a A C GAC G a GT TCC tgtttca GC t G t CTG ttccagaactggcgcttctgtggttcaaagtccaag actacgacaa TG a C a CCC agaatgatttc GC ag GACAG AC AT GT C ttcctctgc CGGAAC tgaagtccgga gttagagc CG t C c G GC t T c ACG atcgaacag G ca AGGCC tacaa g A A CACG a GGC TT ctcgtcagcttcgc cttggatcctccttat ACGT t T C G ttgaatctcgttttctttccga A a TA a GCC acacagtgtgtatctct gtttg CACGT t T ataaatttataatgtacaattatt TGCGC c C tagctagaacgggataaagtgtgacaac atataaatactattaaagattcgatatttgatgactgta G C C G C CT G C CT aga ACG g GG t G ccttgtttg A

PAGE 161

151 C t TG c CA acttcaacc acatttctttttgtttacttgaaagttaaaaataaatactagagaggaagaaaaa cagagctttctttctaagatcatcaacgcacc score :234 15839_at||#F13H10.22#At2g41230 unknown protein;|COORDS: 17143901 17143716 gtcaacataatcataaatacctaaaccactaaatacaattcaactaaataatttggtaaataatctggatg agaaag aaaaaaatgttacaaagatatattttttggattttccttaaaaatttttttttttttttttctag ctaagaaaaattaaaattttgttgggttgtcaaaaatatacaagctatac A t A CACGT a T acacaaatata gagatagacatttcaactttcccaattgccagtatatataacaaaagccagagacaaagggagaaggatct aagacatttatcaaaaacaaaaagagttgataaactacagtaatggaaa atgagggttcatgatcaacggc tgagatttgatgtcaca CC c A a GCC gatgggtttgaacggaagttctttgatcacggcaaga T c CGT c GC A C ttcttctctttctctctctgcttcttctgattctgccaccgttcctgc CGCC g C t T ccaccgcctccggc gacactcctcctccttcctctactcctcatgattctcctcattttcttggctttttctccttcta AT G a G C CC AGCC t CG ccgttgaacctc tcgacccctga score :66 17840_at||#T1O24.39#At2g43570 endochitinase isolog;|COORDS: 18025978 18024932 tcaataaaatttcttgccaacaattatgacttttcctgttgtttacaatttttcacaagtctattgtcgat tttaaatttgtgttatcacacacaacatgagactagtaaaaccaattgacattgactacaggcttctaccg attaaact tatcttgtataacaaactcaaaccatattttaatttttgttttccaattggcgactactttca gtaaaagtcacttagtttcttgctttttagaaccacaaaaaccaaatatacttttttcacctatctcattt tatctctgataatttttctttacactaaatgtttgcatcacaggaacaaaaaaatatttggtttaaccaat ttcgtagat TGA a ACG aaaatgttggtgtttgtttgaaaatttcc C a A CG T G T TACGTG ta ACGT t T C G aa tggtctaagaactaataa GC a T G CGTGGC A T t GG cgttgacgttgactcagaaaactacaagtatatttta tttacaacaaaacgtat GC t TC ACG ga TC catcatcatctataa A A CACG aa G C taatgttcatcatctct ctct G C A G t C T G C atcacagacacaaaaacaa score :180 18668_at||#F1L3.11#At1g17420 lipoxygenase;iden tical to GB:CAB56692 from (Arabidopsis thaliana)|COORDS: 5977505 5981377 actatcattctaaagtcatacccttttaacactttatttatcctctcttttttttttttggacaactaact ttattttatccaatcagcttttaaaatcttaattaacatggtttggattggacaagctaagtaataatact at A a T A C GCCACGT A T atatacg ACGT a TC cttcttctc tt CACGT a T aaaataaaaatagtgtgttattg ccataac A t A CGTG T TT tctaaaatttaaaagaaagtcatttatcatgaaaaggtatcataatcagaataa gagtaatcagtttgaaaactagctagttgactaaaacagaaagaaaaatatctttacttgaaacatgtgtt gagaattcacattatataaaaggaaaaaaaataaaacatgacttttgcgtagacaatccatagtccaatca aca A a ACGTG T ggacggcaa G g GTTCC ttctagctctctctatttatatctctcactcgccgatttttact agtaattaaccagaacgttcatctaccaacaaatcctcaactccttcttcttcactaatagtataccacaa aacctcaaaattgtaatttattagattatcgt score :156

PAGE 162

152 LIST OF REFERENCES Addicott, F.T., and Carns, H.R. (1983). History and introduction. In: Addicott FT, ed. Abscisic acid. New York, USA: Praeger Scientific, 1 21. Altman, R.B., and Raychaudhuri, S. (2001). Whole genome expression analysis: Challenges beyon d clustering. Curr. Opin. Struct. Biol. 11, 340 347. Arnone, M.I., and Davidson, EE.H. (1997). The dardwiring of development: prganization and function of genomic regulatory systems. Development 124, 1851 1864. Assmann, S., and Shimazaki, K. (1999). The mu ltisensory guard cell: Stomatal responses to blue light and abscisic acid. Plant Physiology 119: 809 815. Bailey, T.L., and Elkan, C. (1995). Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21, 51 80 Baker, S.S., Wilhelm, K.S., and Thomashow, M.F. (1994). The 5 region of Arabidopsis thaliana cor15a has cis acting elements that confer cold drought and ABA regulated gene expression. Plant Mol. Biol. 24, 701 713. Blackwood, E.M., and J.T. Kadonaga. (1998). Going the distance: A current view of enhancer action. Science 281:61 63. Bray, E.A. (1997). Plant responses to water deficit. Trends Plant Sci. 2, 48 54. Bulyk, M.L., Huang, X., Choo, Y., and Church, G.M. (2001). Exploring the DNA binding specific ities of zinc fingers with DNA microarrays. Proc. Natl Acad. Sci. 98, 7158 7163 Carson, C.B, Hattori, T., Rosenkrans, L., Vasil, V., Vasil, I.K., Peterson, P.A., and McCarty, D.R. (1997). The quiescent/colorless alleles of viviparous1 show that the conserv ed B3 domain of VP1 is not essential for ABA regulated gene expression in the seed. Plant J. 12(6), 1231 1240. Casaretto, J., and Ho, T.D. (2003). The transcription factors HvABI5 and HvVP1 are required for the abscisic acid induction of gene expression in barley aleurone cells. Plant Cell 15, 271 284

PAGE 163

153 Chak, R.K.F., Thomas, T.L., Quatrano, R.S., and Rock, C.D. (2000). The genes ABI1 and ABI2 are involved in abscisic acid and drought inducible expression of the Daucus carota L. Dc3 promoter in guard cells of transgenic Arabidopsis thaliana L. Heynh. Planta 210, 875 883. Chen, Y., Dougherty, E., and Bittner, M.L. (1997). Ratio based decisions and the quantitative analysis of cDNA micro array images. Journal of Biomedical Optics 2, 364 374. Cho, R.J., Huang, M., Campbell, M.J., Dong, H., Steinmetz, L., Sapinoso, L., Hampton, G., Elledge, S.J., Davis, R.W., and Lockhart, D.J. (2001). Transcriptional regulation and function during the human cell cycle. Nature Genet. 27, 48 54. Choi, H., Hong, J., Ha, J., Kang, J., and Kim, S.Y. (2000). ABFs, a family of ABA responsive element binding factors. J. Biol. Chem. 275, 1723 1730. DeRisi, J.L., Iyer, V.R., and Brown, P.O. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278 680 686. De Smet, F., Marchal, K., Mathys, J., Thijs, G., De Moor, B., and Moreau, Y. (2002). Adaptive quality based clustering of gene expression profiles. Bioinformatics 18, 735 746 D'haelseleer, P., Liang, S., and Somogyi, R. (2000). Genetic network inference: from co expression clustering to reverse engineering. Bioinformatics 16(8), 707 726. Donald, R.G.K., and Cashmore, A.R. (1990) Mutation of either G box or I box sequences profoundly affects expression from the Arabidopsis rbcS 1A promoter. EMBO J. 9, 1717 1726 Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. (1998). Cluster analysis and display of genome wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863 4868. Ezcurra, I., Wycliffe, P., Nehlin, L., Ellerstrom, M., and Rask, L. (2000). Transactivation of the Brassica napus napin promoter by ABI3 requires interaction of the conserved B2 and B3 domains of ABI3 with different cis elements: B2 mediates activation through an ABRE, whereas B3 interacts with an RY/G box. Plant J. 24, 57 66. Finkelstein, R.R. (1994). Mutations at two new Arabidopsis ABA response loci are similar to the abi3 mutations. Plant Journal 5, 765 771. Finkelstein, R.R., Gampala, S.S.L., and Rock, C.D. (2002). Abscisic acid signaling in seeds and seedlings. Pla nt Cell 14, S15 S45.

PAGE 164

154 Finkelstein, R.R., and Lynch, T. (2000). The Arabidopsis abscisic acid response gene ABI5 encodes a basic leucine zipper transcription factor. Plant Cell 12, 599 609. Finkelstein, R.R., Wang, M.L., Lynch, T.J., Rao, S. and Goodman, H.M (1998) The Arabidopsis abscisic acid response locus ABI4 encodes an APETALA 2 domain protein. Plant Cell 10, 1043 1054. Fowler, S., and Thomashow, M.F. (2002). Arabidopsis transcriptome profiling indicates that multiple regulatory pathways are activated during cold acclimation in addition to the CBF cold response pathway. Plant Cell 14, 1675 1690. Gilmour, S.J., and Thomashow, M.F. (1991). Cold acclimation and cold regulated gene expression in ABA mutants of Arabidopsis thaliana. Plant Molecular Biology 1 7, 1233 1240. Gilmour, S.J., Zarka, D.G., Stockinger, E.J., Salazar, M.P., Houghton, J.M., and Thomashow, M.F. (1998). Low temperature regulation of the Arabidopsis CBF family of AP2 transcriptional activators as an early step in cold induced COR gene expr ession. Plant J. 16, 433 442. Giraudat, J., Hauge, B.M., Valon, C., Smalle, J., Parcy, F., and Goodman, H.M. (1992). Isolation of the ArabidopsisAB13 gene by positional cloning. Plant Cell 4, 1251 1261. Grabov, A., Leung J., Giraudat J., and Blatt M.R. (19 97). Alteration of anion channel kinetics in wild type and abi1 1 transgenic Nicotiana benthamiana guard cells by abscisic acid. Plant Journal 12: 203 213. Grunstein, M. (1997). Histone acetylation in chromatin structure and transcription. Nature 389, 349 352 Hagenbeek, D., Quatrano, R.S., and Rock, C.D. (2000). Trivalent ions activate ABA inducible promoters through an ABI1 dependent pathway in rice (Oryza sativa L.) protoplasts. Plant Physiology 123, 1553 1560. Hake, S., and Meyerowitz, E.M. (1998). Growt h and development: Growing up green. Current Opinion in Plant Biology 1, 9 11. Hattori, T., Totsuka, M., Hobo, T., Kagaya, Y., and Yamamoto Toyoda, A. (2002). Experimentally determined sequence requirement of ACGT containing abscisic acid response element. Plant Cell Physiology 43(1), 136 140. Heyer, L.J., Kruglyak, S., and Yooseph, S. (1999). Exploring expression data: Identifaction and analysis of coexpressed genes. Genome Res. 9, 1106 1115. Hill, A., Nantel, A., Rock, C.D., and Quatrano, R.S. (1996). A con served domain of the viviparous 1 gene product enhances the DNA binding activity of the bZlP protein EmBP 1 and other transcription factors. J. Biol. Chem. 271, 3366 3374.

PAGE 165

155 Hirai, N. (1986). Abscisic acid. In: Takahashi, N., ed. Chemistry of plant hormo nes. Boca Raton, FL, USA: CRC Press, 201 248. Hobo, T., Asada, M., Kowyama, Y., and Hattori, T. (1999a). ACGT containing abscisic acid response element (ABRE) and coupling element 3 (CE3) are functionally equivalent. Plant J. 19, 679 689. Hobo, T., Kowyama Y., and Hattori, T. (1999b). A bZIP factor, TRAB1, interacts with VP1 and mediates abscisic acid induced transcription. Proc. Natl. Acad. Sci. USA 96, 15348 15353. Hughes, J.D., Estep, P.W., Tavazoie, S., and Church, G.M. (2000). Computational identifica tion of cis regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J. Mol. Biol. 296, 1205 1214. Ingram, J., and Bartels, D. (1996). The molecular basis of dehydration tolerance in plants. Annual Review of Pl ant Physiology and Plant Molecular Biology 47, 377 403. Ismail, A.M., Hall, A.E., and Close, T.J. (1999). Allelic variation of a dehydrin gene cosegregates with chilling tolerance during seedling emergence. Proceedings of the National Academy of Sciences 9 6, 13566 13570. Izawa, T., Foster, R., and Chua, N. (1993). Plant bZIP protein DNA binding specificity. J. Mol. Biol. 230, 1131 1144. Jain, K.K. (2001). Biochips for gene spotting. Science 294, 19. Jakoby, M., Weisshaar, B., Droge Laser, W., Vicente Carbaj osa, J., Tiedemann, J., Kroj, T., and Parcy, F. (2002). bZIP transcription factors in Arabidopsis. Trends Plant Sci. 7, 106 111. Jensen, L.J., and Knudsen, S. (2000). Automatic discovery of regulatory patterns in promoter regions based on whole cell expres sion data and functional annotation. Bioinformatics 16, 326 333. Kang, J., Choi, H., Im, M., and Kim, S.Y. (2002). Arabidopsis basic leucine zipper proteins mediate stress responsive abscisic acid signaling. Plant Cell 14, 343 357. Kao, C.Y., Cocciolone, S .M., Vasil, I.K., and McCarty, D.R. (1996). Localization and interaction of the cis acting elements for abscisic acid, VIVIPAROUS1, and light activation of the C1 gene of maize. Plant Cell 8, 1171 79. Kel, A.E., Kel Margoulis, O.V., Farnham, P.J., Bartley, S.M., and Wingender, E. (2001). Computer assisted identification of cell cycle related genes: New targets for E2F transcription factors. J. Mol. Biol. 309, 99 120.

PAGE 166

156 Kim, S.Y., and Thomas, T.L. (1998). A family of novel basic leucine zipper proteins binds t o seed specification elements in the carrot Dc3 gene promoter. J. Plant Physiol. 152, 607 613. Koornneef, M., Leon Kloosterziel, K.M., Schwartz, S.H., and Zeevaart, J.A.D. (1998). The genetic and molecular dissection of abscisic acid biosynthesis and sign al transduction in Arabidopsis. Plant Physiol. Biochem. 36, 83 89. Lashkari, D.A., DeRisi, J.L., McCusker, J.H., Namath, A.F., Gentile, C., Hwang, S.Y., Brown, P.O. and Davis, R.W. (1997). Yeast microarrays for genome wide parallel genetic and gene express ion analysis. Proc. Natl Acad. Sci. USA 94, 13057 13062. Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F., and Wootton, J.C. (1993). Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science 262, 20 8 214. Leung, J., and Giraudat, J. (1998). Abscisic acid signal transduction. Annu. Rev. Plant Physiol. Plant Mol. Biol. 49, 199 222. Liu, Q., Kasuga, M., Sakuma, Y., Abe, H., Miura, S., Yamaguchi Shinozaki, K., and Shinozaki, K. (1998). Two transcription factors, DREB1 and DREB2, with an EREBP/AP2 DNA binding domain separate two cellular signal transduction pathways in drought and low temperature responsive gene expression, respectively, in Arabidopsis. Plant Cell 10, 1391 1406. Liu, X.S,, Brutlag, D.L., and Liu, J.S. (2001). Bioprospector: Discovering conserved DNA motifs in upstream regulatory regions of co expressed genes. Proc. Pacific Symposium on Biocomputing 6, 127.138. Liu, X.S., Brutlag, D.L., and Liu, J.S. (2002). An algorithm for finding protei n DNA binding sites with applications to chromatin immunoprecipitation microarray experiments. Nat Biotechnol Aug;20(8):835 9 Luerssen, H., Kirik, V., Herrmann, P., and Misera, S. (1998). FUSCA3 encodes a protein with a conserved VP1/ABI3 like B3 domain wh ich is of functional importance for the regulation of seed maturation in Arabidopsis thaliana. Plant J. 15, 755 764. MacRobbie, E.A.C. (1998). Signal transduction and ion channels in guard cells. Philosophical Transactions of the Royal Society of London B 353, 1475 1488. Mahalingam, R., Gomez Buitrago, A., Eckardt, N., Shah, N., Guevara Garcia, A., Day, P., Raina, R., and Fedoroff, V. (2003). Characterizing the stress/defense transcriptome of Arabidopsis. Genome Biol. 4 (3): R20 Masuda N., and Church G.M. (2003). Regulatory network of acid resistance genes in Escherichia coli. Mol. Microbiol. 48(3), 699 712

PAGE 167

157 McCarty, D.R., Carson, C.B., Lazar, M., and Simonds, C. (1989). Transposable element induced mutations of the viviparous 7 gene in maize. Dev. Genet. 1 0, 473 481. McCarty, D.R., Hattori, T., Carson, C.B., Vasil, V., Lazar, M., and Vasil, I.K. (1991). The Vivipamus 7 developmental gene of maize encodes a novel transcriptional activator. Cell 66, 895 905. McCourt, P. (1999). Genetic analysis of hormone sig naling. Annual Review of Plant Physiology and Plant Molecular Biology 50, 219 243. McGuire A.M., and Church G.M. (2000). Predicting regulons and their cis regulatory motifs by comparative genomics. Nucleic Acids Res. 28(22), 4523 4530. McGuire A.M., Hughes J.D., and Church G.M. (2000). Conservation of DNA regulatory motifs and discovery of new motifs in microbial genomes. Genome Res. 10(6):744 57. Miklos, G.L., and Rubin, G.M. (1996). The role of the genome project in determining gene function: insights fro m model organisms. Cell 96, 4 521 529. Neill, S.J., Horgan, R. and Rees, A.F. (1987). Seed development and vivipary in Zea mays L. Planta 171, 358 364. Ng, H. H., and Bird, A. (1999). DNA methylation and chromatin modification. Curr. Opin. Genet. Dev. 9, 158 63. Ng, H. H., and Bird, A. (2000). Histone deacetylases: Silencers for hire. Trends Biochem. Sci. 25, 121 6. Ohler, U., and Niemann, H. (2001). Identification and analysis of eukaryotic promoters: Recent computational approaches. Trends Genet. 17(2), 56 60. Pei, Z.M., Kuchitsu, K., Ward, J.M., Schwarz, M., and Schroeder, J.I. (1997). Differential abscisic acid regulation of guard cell slow anion channels in Arabidopsis wild type and abi1 and abi2 mutants. Plant Cell 9, 409 423. Pla, M., Vilardell, J., Guiltinan, M.J., Marcotte, W.R., Niogret, M.F., Quatrano, R.S., and Pages, M. (1993) The cis regulatory element CCACGTGG is involved in ABA and water stress responses of the maize gene rab28. Plant Mol. Biol. 21, 259 266. Richards, E.J. (1997). DNA methy lation and plant development. Trends Genet. 13, 319 23. Riechmann, J., Heard, J., Martin, G., Reuber, L., and others. (2000). Arabidopsis transcription factors: Genome wide comparative analysis among eukaryotes. Science 290, 2105 2110.

PAGE 168

158 Robichaud, C.S., an d Sussex, I.M. (1986). The response of viviparous 1 and wild type embryos of Zea mays to culture in the presence of abscisic acid. J. Plant Physiology. 126, 235 242. Robichaud, C.S., Wong, J., and Sussex, I.M. (1980). Control of in vitro growth of viviparo us embryo mutants of maize by abscisic acid. Developmental Genetics 1, 325 330. Rock, C.D. (2000). Pathways to abscisic acid regulated gene expression. New Phytol. 148, 357 396. Rock, C.D., and Quatrano, R.S. (1994). Plant signals : insensitivity is in the genes. Current Biology 4, 1013 1015. Rogers, J.C., and Rogers, S.W. (1992). Definition and functional implications of gibberellin and abscisic acid cis acting hormone response complexes. Plant Cell 4, 1443 1451. Rojas, A., Almoguera, C. and Jordano, J. (1 999). Transcriptional activation of a heat shock gene promoter in sunflower embryos: synergism between ABI3 and heat shock factors. Plant Journal 20, 601 610. Roth, F.R., Hughes, J.D., Estep, P.E., and Church, G.M. (1998). Finding DNA regulatory motifs wit hin unaligned non coding sequences clustered by whole genome mRNA quantitation. Nature Biotechnology 16(10), 939 45. Schachter, A.D., and Kohane, I.S. (2002)An unsupervised self optimizing gene clustering algorithm. Proc AMIA Symp ., 682 686 Schena, M. (1 996). Genome analysis with gene expression microarrays. BioEssays, 18, 427 431. Schena, M., Shalon, D., Davis, R.W., and Brown, P.O. (1995). Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270, 467 470. Sch ena, M., Shalon, D., Heller, R., Chai, A., Brown, P.O., and Davis, R.W. (1996). Parallel human genome analysis: microarray based expression monitoring of 1000 genes. Proc. Natl Acad. Sci. USA, 93, 10614 10619. Schneider, T.D., and Stephens, R.M. (1990). Se quence logos: A new way to display consensus sequences. Nucl. Acids Res. 18, 6097 6100. Schultz, T., Medina, J., Hill, A., and Quatrano, R. (1998). 14 3 3 proteins are part of an abscisic acid VIVIPAROUS1 (VP1) response complex in the Em promoter and inter act with VP1 and EmBP1. Plant Cell 10, 837 847.

PAGE 169

159 Schulze, A., and Downward J. (2001). Navigating gene expression using microarrays A technology review. Nature Cell Biology 3, E190 E195. Shedden, K., and Cooper, S. (2002). Analysis of cell cycle specific g ene expression in human cells as determined by microarrays and double thymidine block synchronization. PNAS 99, 4379 4384. Sheen, J. (1998). Mutational analysis of PP2C involved in ABA signaling in higher plants. Proceedings of the National Academy of Scie nces, USA 95, 975 980. Shen, Q., Zhang, P., and Ho, T. H.D. (1996). Modular nature of abscisic acid (ABA) response complexes: Composite promoter units that are necessary and sufficient for ABA induction of gene expression in barley. Plant Cell 8, 1107 1119 Shinozaki, K., and Yamaguchi Shinozaki, K. (1996). Molecular responses to drought and cold stress. Current Opinion Biotechnology 7, 161 167. Sinha, S., and Tompa, M. (2000). A statistical method for finding transcription factor binding sites. 8th Intl. c onf. Intelligent Systems for Molecular Biology 8, 37 45. Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D., and Futcher, B. (1998). Comprehensive identification of cell cycle regulated genes of the ye ast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273 3297. Stockinger, E.J., Gilmour, S.J., and Thomashow, M.F. (1997). Arabidopsis thaliana CBF1 encodes an AP2 domain containing transcriptional activator that binds to the C re peat/DRE, a cis acting DNA regulatory element that stimulates transcription in response to low temperature and water deficit. Proc. Natl. Acad. Sci. USA 94, 1035 1040. Stone, S.L., Kwong, L.W., Yee, K.M., Pelletier, J., Lepiniec, L., Fischer, R.L., Goldber g, R.B., and Harada, J.J. (2001). LEAFY COTYLEDON2 encodes a B3 domain transcription factor that induces embryo development. Proc. Natl. Acad. Sci. USA 98, 11806 11811. Strachan, T., Abitol, M., Davidson, D., and Beckman, J.S. (1997). A new dimension for t he human genome project: towards comprehensive expression maps. Nature Genet. 16, 126 132. Struhl, K. (1998). Histone acetylation and transcriptional regulatory mechanisms. Genes Dev. 12, 599 606. Suzuki, M., Kao, C.Y., and McCarty, D.R. (1997). The Conse rved B3 Domain of VIVIPAROUS1 Has a Cooperative DNA Binding Activity. Plant Cell 9, 799 807.

PAGE 170

160 Suzuki, M., Ketterling, M.G., Li, Q B., and McCarty, D.R. (In press). VP1 alters global gene expression patterns through regulation of ABA signaling. Plant Physiol ogy. Tavazoie, S., Hughes, J.D., Campbell, M.J., Cho, R.J. and Church, G.M. (1999). Systematic determination of genetic network architecture. Nature Genet. 22, 281 285. Thijs, G., Lescot, M., Marchal, K., Rombauts, S., De Moor, B., Rouz, P., and Moreau, Y (2001). A higher order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics 17(12), 1113 1122. Thijs, G., Marchal, K., Lescot, M., Rombauts, S., De Moor, B., Rouz, P., and Moreau, Y. (2002). A gibbs s ampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. J. Comp. Biol. 9(2), 447 464. Thomashow, M.F. (1999). Plant cold acclimation genes and regulatory mechanisms. Annual Review of Plant Physiology and Plant Molecula r Biology 50, 571 599. Tompa, M. (1999). An exact method for Finding short motifs in sequences, with application to the ribosome binding site problem. 7th Intl. Conf. Intelligent Systems for Molecular Biology 7, 262 271. Uno, Y., Furuhata, T., Abe, H., Yos hida, R., Shinozaki, K., and Yamaguchi Shinozaki, K. (2000). Arabidopsis basic leucine zipper transcription factors involved in an abscisic acid dependent signal transduction pathway under drought and high salinity conditions. Proc. Natl. Acad. Sci. USA 97 11632 11637. van Helden, J., Andre, B. and Collado Vides, J. (1998). Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281, 827 842. van Helden, J., Rios, A.F., and Collado Vides, J. (2000). Discovering regulatory elements in non coding sequences by analysis of spaced dyads. Nucleic Acids Res. 28(8), 1808 1818. Vasil, V., Marcotte, W.R., Jr., Rosenkrans, L., Cocciolone, S.M., Vasil, I.K., Quatrano, R.S., and McCarty, D.R. (1995). Overlap of Viviparous 1 (VP1) and abscisic acid response elements in the Em promoter: G box elements are sufficient but not necessary for VP1 transactivation. Plant Cell 7, 1511 1518. Velculescu, V.E., Zhang, L., Zhou, W., Vogelstein, J., Bas rai, M.A., Bassett, D.E., Hieter, P., Vogelstein, B., and Kinzler, K.W. (1997). Characterization of the yeast transcriptome. Cell 88, 243 251.

PAGE 171

161 Weisshaar, B., Armstrong, G.A., Block, A., Da Costa e Silva, O., and Halbrock, K. (1991). Light inducible and con stitutively ex pressed DNA binding proteins recognizing a plant promoter element with functional relevance in light responsiveness. EMBO J. 10,1777 1786. Wodicka, L., Dong, H., Mittmann, M., Ho, M.H., and Lockhart, D.J. (1997). Genome wide expression monit oring in Saccharomyces cerevisiae. Nature Biotechnol. 15, 1359 1366. Wolfsberg, T.G., Gabrielian, A.E., Campbell, M.J., Cho, R.J., Spouge, J.L., and Landsman, D. (1999). Candidate regulatory sequence elements for cell cycle dependent transcription in Sacch aromyces cerevisiae. Genome Res. 9(8), 775 92. Workman, C.T., and Stormo, G.D. (2000). ANN SPEC: A method for discovering transcription binding sites with improved specificity. Proc. Paci. Symposium on Biocomputing 5, 464 475. Yamaguchi Shinozaki, K., and Shinozaki, K. (1994). A novel cis acting element in an Arabidopsis gene is involved in responsiveness to drought, low temperature, or high salt stress. Plant Cell 6, 251 264. Zeevaart, J.A.D. (1999). Abscisic acid metabolism and its regulation. In: Hooykaa s, P.J.J., Hall, M.A., Libbenga, K.R., eds. Biochemistry and molecular biology of plant hormones. Amsterdam, The Netherlands: Elsevier Science, 189 207. Zhang, M.Q. (1999). Promoter analysis of co regulated genes in the yeast genome. Comput. Chem. 23, 233 250.

PAGE 172

162 BIOGRAPHICAL SKETCH Matthew Gene Ketterling was born to Gene and Cathy Ketterling on June 5, 1978, in Fargo, ND. He was raised in Brainerd, MN, and graduated from Brainerd High School in June, 1996. Shortly after graduating high school, he moved to Key West, FL, to work and attend college. He graduated from Key West Community College in May 1999 and transferred to the University of Floridas microbiology program with a plan to attend medical school. After completing an undergraduate research project in a molecular biology lab, he found his interests were better suited for research. He was accepted into Plant Molecular and Cellular Biology (PMCB) in fall of 2001 where he was able to combine his interests in computers and molecular biology to fulfill his Master of Science degree in August 2003.


Permanent Link: http://ufdc.ufl.edu/UFE0001266/00001

Material Information

Title: Quantitative Analysis of CIS-Regulatory Sequences in Genes of Arabidopsis
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0001266:00001

Permanent Link: http://ufdc.ufl.edu/UFE0001266/00001

Material Information

Title: Quantitative Analysis of CIS-Regulatory Sequences in Genes of Arabidopsis
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0001266:00001


This item has the following downloads:


Full Text












QUANTITATIVE ANALYSIS OF CIS-REGULATORY SEQUENCES IN GENES OF
ARABIDOPSIS

















By

MATTHEW GENE KETTERLING


A THESIS PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE

UNIVERSITY OF FLORIDA


2003

































Copyright 2003

by

Matthew Gene Ketterling















ACKNOWLEDGMENTS

I would first like to thank my adviser Dr. Donald McCarty for the opportunity to

work in his laboratory as well as helping me develop a research project that catered to my

interests in molecular biology and computers. I would like to thank Dr. George Casella

for his statistical advisement pertaining to my thesis work as well as a special thank you

to Dr. Masaharu Suzuki for his advisement as both a committee member and a friend; I

am not sure where I would be right now if it were not for his added guidance. I

appreciate the help from the rest of the McCarty lab and members of Dr. Mark Settles'

lab for discussions as well as support from all my friends in Fifield Hall; they of course

will not be forgotten. I would also like to thank my parents: Gene Ketterling, and Cathy

and David Eide for their support. This thesis would not have been possible it was not for

them. Lastly, I would like to thank my brother Benjamin Ketterling and the rest of my

family for their understanding and support of the decisions I have made in my life.
















TABLE OF CONTENTS
Page

A C K N O W L E D G M E N T S ....................................................................... .....................iii

LIST OF TABLES ................................... .. ... .... ................... vi

LIST OF FIGU RE S ................. .............. .. .... ...... .......... ......... .............. ... vii

A B S T R A C T ............................. .................................................................. ............... ix

CHAPTER
1 L ITE R A TU R E R E V IE W ................................................................ ...........................1

A b scisic A cid ................................................................................................................ 1
M icroarrays ..............................................................4
G ene Regulation N etw orks ............................................................ ............... 10
C is-elem ents D discovery ........................................................... .. .. .......... .... 12

2 IDENTIFICATION OF CIS-ELEMENTS ........................................ ............... 16

Introduction.................................................................... .... ....... ........ 16
M materials and M methods .............................................. .................. ............... 20
M otifF in der alg orithm .............................................................. .....................2 0
M otifFinder im plem entation ...................................................... .... ............... 21
M otifM apper algorithm ............................................... ............................ 21
Prom other database .............................. ........ ... ............ .. ............. 22
Testing for false-positive m otifs................................... .................................... 22
PERL version and modules ......................................... ........................23
R e su lts ............................... ...... ....... ........................................... 2 3
Analysis of VP1/ABA regulated genes .................................... ............... 23
Analysis of flanking sequence m otifs ............................. ............................... 27
M otifs associated w ith the ABRE ............................................ ............... 29
M otif L O G O s ...........................................................29
A analysis of cold-regulated genes......................................... .... ...................... 31
Distribution of motifs in VP1/ABA and cold regulated genes............................34
A lig n A C E ...................................................................................................3 7
D isc u s sio n ............................................................................................................. 3 7

APPENDIX









A SOFTW ARE DOCUM ENTATION .................................... ......................... ........... 42

B SOURCE CODE: M otifFinderplv4.pl ........................................ ...................... 47

C SOURCE CODE: M otifFinderp2v6.pl ................................ .. ......................... 53

D SOURCE CODE: M otifM apperplv3.pl.............................................. .................. 57

E REPRESSED ABA REGULATED GENES .............. ............................................69

F ACTIVATED ABA REGULATED GENES ...................................... ...............80

GREPRESSED VP1 "AND" ABA DEPENDENT GENES ........................................91

H ACTIVATED VP1 "AND" ABA DEPENDENT GENES............... .............102

I REPRESSED ABA DEPENDENT GENES ................................................ 119

J ACTIVATED ABA DEPENDENT GENES ..................................... .................124

K VP1 ACTIVATED/ABA ACTIVATED SUBCLASS OF VP1 "OR" ABA
D E PEN D EN T G EN E S ..................................................................... ..................139

L ACTIVATED VP1 DEPENDENT GENES.................. ....................145

L IST O F R E FE R E N C E S ...................................................................... ..................... 152

B IO G R A PH IC A L SK E T C H ........................................ ............................................162
















LIST OF TABLES

Table pa

2-1. Summary of known and putative regulatory elements among VP1/ABA regulated
genes .......................... .................. ................. .. 26

2-2. Summary of known and putative regulatory elements among cold-regulated genes.33















LIST OF FIGURES


Figure page

1-1. Structure of abscisic acid. ..................... ................................... 1

1-2. Schematic diagram of cDN A microarray.. ....................................... ...............6

1-3. Schematic diagram of high-density oligonucleotide microarray..............................7

2-1. Classification of genes analyzed by MotifFinder: (A) Hierarchical classification
from Suzuki et al. showing subclasses of VP1/ABA regulated genes. (B)
Hierarchical diagram of Cold-regulated gene classes defined by Fowler and
T hom show ....................................................... ................. 24

2-2. The motifs with highest significance have similarity to G-box related ABA
responsive elements (5'-G[A/C]CACGTG-3'). ................................................ 25

2-3. Comparison of ACGT flanking sequences. ....................................................... 28

2-4. Summary of regulatory elements that are associated with ACGT-core elements
among VP1/ABA regulated genes: (A) Associations to ACGT-core were
calculated by comparing our Motif dictionary of 43,168 motifs with 20bp flanking
ACGT-core vs. 20 bases selected at random. (B) Sequences were located by eye
using the color output of M otifM apper. .......... .................................................30

2-6. Distribution of the top 75 statistically significant motifs among MotifFinder analysis
of 353 VP1/ABA regulated genes ................................................ ............... 35

2-7. Activated and Repressed subclasses show strikingly different distributions of cis-
elements in relation to the ATG start site: (A) The asymmetrical distribution of the
top 75 significant motifs of the activated ABA dependent class and 75 randomly
generated motifs which are symmetrically distributed. (B) The symmetrical
distribution of the top 75 significant motifs of the repressed ABA dependent class
has a similar distribution to 75 randomly generated motifs. ................................36

2-8. Distributions of significant and random motifs of the four classes of cold-regulated
genes defined by Fowler and Thomashow: (A) Top 75 significant motifs in
promoters of genes up-regulated long-term by cold along with 75 randomly
generated motifs. (B) Top 75 significant motifs in promoters of genes up-regulated
transiently by cold along with 75 randomly generated motifs. (C) Top 75
significant motifs in promoters of genes down-regulated by cold along with 75









randomly generated motifs. (D) Top 75 significant motifs in promoters that are
regulated by CDF1, CDF2 and CDF3 over-expressing lines along with 75
random ly generated m otifs. ............................................. ............................. 38















Abstract of Thesis Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Master of Science

QUANTITATIVE ANALYSIS OF CIS-REGULATORY SEQUENCES IN GENES OF
ARABIDOPSIS

By

Matthew Gene Ketterling

August, 2003

Chair: Dr. Donald R. McCarty
Major Department: Plant Molecular and Cellular Biology

In order to rigorously analyze regulatory sequences that are shared by clusters of

co-regulated genes, we have developed a quantitative computational approach for

objective analysis of promoter sequences. The algorithm, implemented in a PERL

program called MotifFinder, identifies sequence motifs that are over-represented in

promoters of co-regulated genes relative to a control set of randomly selected promoters.

We used MotifFinder to analyze promoter sequences of co-regulated Arabidopsis genes

classified according to expression data generated by oligo-microarray experiments. We

have successfully used this software to detect and analyze the distributions of known cis-

elements that mediate Viviparous 1 (VP1) binding and abscisic acid (ABA) signaling as

well as cis-elements shared among cold responsive genes regulated by the CRT/DRE

Binding Factor (CBF). In addition to known cis-elements, MotifFinder revealed several

previously unidentified motifs that may have biological significance. A second program,

MotifMapper, color codes the shared motifs according to statistical significance and









displays the putative cis-elements as highlighted regions within the promoter sequences.

The combined application of this software allows quantitative analysis and visualization

of promoter elements using no prior knowledge of transcription factor binding sites.















CHAPTER 1
LITERATURE REVIEW

Abscisic Acid

The hormone abscisic acid (ABA) (Figure 1-1) is found in all plant tissues and

regulates both development and responses to environmental stress (Addicott and Cams,

1983). Since its identification in the 1960s, ABA has been implicated in embryogenesis,

seed dormancy and cell division as well as stress responses to temperature fluctuations,

drought, salt, and UV radiation (reviewed in Koornneef et al., 1998; Leung and Giraudat,

1998; McCourt, 1999; Rock, 2000; Finkelstein et al., 2002). Pathways of ABA synthesis

and catabolism have been well-documented (Hirai, 1986; Zeevaart, 1999), and analysis of

mutants with impaired ABA expression or responsiveness has revealed phenotypes such

as withering during humidity changes, induced germination, and germination resistance

under salt stress (reviewed in Leung and Giraudat, 1998; McCourt, 1999; Rock, 2000;

Finkelstein et al., 2002).




O


HH3

HgC H1
CH3 OH
HO -

CH3


Figure 1-1. Structure of abscisic acid.









ABA signaling cascades lead to both early and late phase responses. During drought, for

example, ABA rapidly induces stomatal closure to inhibit water loss (Mac Robbie, 1998;

Assmann and Shimazaki, 1999) and stimulates the transcription of drought resistance

genes (Ingram and Bartels, 1996; Bray, 1997).

During seed development in maize, ABA responses are regulated by the gene

VP1. Vpl mutants express normal levels of ABA but are deficient in ABA perception

which results in early seed germination (Robichaud and Sussex, 1986; Neill et al., 1987).

This phenotype can be corrected by exogenously applying 10-fold more ABA than

required for maturation in wild-type (Robichaud et al., 1980). Transposon tagging of

VP1 showed that it was expressed exclusively in developing seeds (McCarty et al., 1989).

Vpl is a transcriptional activator, and over-expression of Vpl can trans-activate other

ABA-inducible genes (McCarty et al., 1991; Kao et al., 1996; Hagenbeek et al., 2000).

VP1 has four shared functional domains Al, B1, B2, and B3 (Giraudat et al., 1992). The

largest domain, B3, has been shown to bind specifically to the Sph cis-element in (Suzuki

et al., 1997). While the B3 domain has not been shown to bind to G-box (Suzuki et al.,

1997; Hill et al., 1996), there is evidence that various G-box motifs may act as coupling

elements which possibly enhance VP1 binding to Sph-elements of VP1 regulated genes

(Hattori et al, 2002; Shen et al., 1996; Vasil et al., 1995; Hobo et al., 1999b). The B3

domain is highly conserved in transcription factors in many plant specie s; the

Arabidopsis proteins ABI3, LEC2 and FUS3 all contain B3 domain which could indicate

redundancy in the signaling pathway (Giraudat et al., 1992; Luerssen et al., 1998; Stone

et al., 2001). B2 domains have been implicated in enhancing the binding affinity of

various transcription factors for their respective target promoters in vitro as well as trans-









activating an ABA-inducible early methionine-labeled LEA gene (Em) promoter (Hill et

al., 1996). The Al acidic domains of VPl and ABI3 function as transcriptional activators

of ABA regulated genes (McCarty et al., 1991; Rojas et al., 1999).

Genetic screens for ABA-insensitivity in Arabidopsis were conducted by

germinating plants on plates containing ABA concentrations high enough to inhibit

normal germination (Finkelstein, 1994), and five ABA-insensitive mutants were

discovered (abil-abi5). ABI1 and ABI2 genes encode type 2c Ser/Thr protein

phosphatases (PP2Cs) which function in ion channel regulation of genes induced by

ABA, drought and cold (Gilmour and Thomashow, 1991; Pei et al., 1997; Chak et al.,

2000). The mutants, abil and abi2, both encode Gly to Asp missense mutations that

result in reduced phosphatase activity in vitro (Leung et al., 1997; Sheen, 1998) and lead

to a wide range of effects such as stomatal movements, germination effects and changes

in adapted growth (Rock and Quatrano, 1994; Leung and Giraudat, 1998). ABI3, ABI4

and ABI5 genes of Arabidopsis are found in an ABA signaling pathway and are

implicated in seed development (Finkelstein, 1994). ABI4 is a transcription factor

belonging to the APETELA2 family (Finkelstein et al., 1998) and is expressed in seeds

and vegetative tissue (reviewed by Rock, 2000). ABI5 is characterized as a basic leucine

zipper (bZIP) and confirmed the presence of bZIP transcription factors in ABA signaling.

The abi5-1 mutant protein lacks the dimerization and DNA binding domains required for

function (Finkelstein and Lynch, 2000).

ABA is one of many signals generated during plant stress responses to trigger

transcription of stress response genes. One family of ABA-regulated genes that become

transcriptionally activated during cold or drought conditions are the COR genes. COR









genes can be activated in an ABA dependant or independent manner and they function in

freezing tolerance and desiccation tolerance by interacting with other cellular components

(Ismail et al., 1999; Thomashow, 1999). Most COR genes contain the predominant cis-

element CCGAC, which has been named the C-repeat/dehydration-responsive element

(DRE). The functions of the DRE are to regulate cold-regulated and dehydration-

regulated genes in an ABA independent manner (Liu et al., 1998; Thomashow, 1999).

COR gene promoters are recognized by the APETELA2-like transcription activators cold

binding factor (CBF) and dehydration-responsive-element binding factor (DREB) (Baker

et al., 1994; Yamaguchi-Shinozaki and Shinozaki, 1994; Stockinger et al., 1997; Gilmour

et al., 1998; Liu et al., 1998). Overexpression of CBF and DREB lead to increased

tolerance to cold and desiccation.


Microarrays

The invention of oligo-nucleotide arrays and cDNA microarrays (Fodor et al.,

1993; Schena et al., 1995; Lockhart et al. 1996) has allowed scientists to measure the

transcription of thousands of genes in relatively short amounts of time and has greatly

increased our knowledge of gene expression. Microarray analysis allows the

quantification of mRNA transcripts on a genome-wide scale and provides insight on

differential gene regulation in response to environmental or artificial stimuli (Eisen et al.,

1998). Microarray techniques can be broken down into stationary and mobile phases

(Schena et al., 1995). The stationary phase consists of known strands of fluorescently

labeled DNA, which have been fixed to a solid support composed of silicon or glass. The

mobile phase component of a microarray experiment contains a collection of labeled

unknown DNA, which has been collected from tissue undergoing an environmental or









artificial stress. When sequences of similarity come in contact with each other, the two

complementary strands hybridize, and the relative fluorescent signals can be measured.

Two common variants of microarray analysis are glass slide and gene chip arrays

(review Schulze and Downward (2001) for a more information on glass slide and chip

microarrays). Both types of microarray use a similar strategy. Glass slide microarray

(Figure 1-2) is made up of known, PCR amplified oligomers that are spotted onto specific

positions of a poly-lysine coated glass slide and attached via UV fixation.

Spotting of DNA can be very precise, and thousands of cDNA spots can be fixed to a

chip in duplicate or triplicate for better representation and control. The glass slide is then

incubated with unknown, chemically labeled samples of cDNA that can be fluoresced

and interpreted. One problem with glass-slide microarrays is the laborious work of

amplifying and purifying each clone to be spotted. Co-regulation studies usually involve

many hundreds of genes that potentially respond to a given treatment, and thus, preparing

the glass slide array can be difficult. Further advancements in microarray technologies

included the development of DNA microarray chips (Figure 1-3). This process is similar

to the glass slide technique, but the oligomers are synthesized directly onto the chip. A

solid support is prepared by attaching exposed 3' OH groups and nucleotides are added a

layer at a time by photo-lithography (Wodicka et al., 1997). This method is convenient

because the chips can be made in mass quantities and include enough genes to represent

an entire genome. Gene chips have become increasingly more popular and less expensive

during the past few years.














cDNA collection


Insert amplification by PCR
a. Vector-specific primers
o. Gene-specific primers

Printing
Coupling
Denaturing


RMME*


Ratio .' ,.3 --


T
Hybridization
mixing

Cy3 Cy5
TTTTTTTT V T TTTTTTTT
Cy3 or ,' ----,, rr* r. *- nTTTTTTTT
labelled cDNA ----TTTTTTTTT TTTTTTTT
-- TTTTTTTT --M TTTTTTTT

SFirst-strand cDNA T
| synthesis |
AAAAAAA ----- AMAMAAA**
Total RNA *MAAAAM* *A-WAM"*AA


Figure 1-2. Schematic diagram of cDNA microarray. cDNA libraries are amplified
using gene specific primers and the PCR products are fixed to glass slides.
RNA from two different cell populations are used to reverse transcribe cDNA
with Cy3 and Cy5 fluorescent probes attached. The Cy3 and Cy5 labeled
cDNA's are then mixed with hybridization buffer and hybridized to a glass
slide were competition between the populations would take place. High-
resolution confocal scanning of the glass slide is then used to detect different
wavelengths of each dye.











-- ------------- --- UU
annon0 jIIAI
./ *,


mRNA reference
sequence


LPerfect match Probe set
Mismatch


In siu synthesis
by photolithography


S Staining
S hybridization
-- -


In vitrotranscription


S cDNA synthesis

.W..,,AAA -A ,AA


-- Ratio array 1/array 2








Biotin-labelled
cRNA

Double-stranded
cE'NA


PolyA+ RNA


Cells/tissue


Figure 1-3. Schematic diagram of high-density oligonucleotide microarray. To prepare
the array chip, 20, 25-mer seuqnences are chosen from an mRNA reference
library which corresponds to a unique transcript in the 5'-untranslated region.
Oligomers are then synthesized directly on the chip via photolithography.
Double-stranded cDNA is created from RNA samples from tissues or cell
populations; the cDNA is then transcribed using biotin-labeled nucleotides to
make cRNA. The cRNA from each sample is hybridized to two different
arrays and the relative intensities can then be measured.









Both glass slide arrays and gene chips are usually processed through high

luminescence scanners which are able to illuminate fluorescent probes that are attached to

the DNA on the chip. Intensities can be measured in terms of cy3 to cy5 ratios in the

case of glass slides or overall intensity as compared to a normalized background signal in

the case of gene chips. The scanned images are usually digitalized for storage and data

processing. A number of software programs are available for interpreting digitalized

intensities and converting those intensities to numerical values for data analysis.

Because microarray analysis is relatively fast and easy, a plethora of data can be

generated on a whole genome scale in a very short time. However, the scope of

microarray analysis is also its largest downfall. If 30,000 genes are analyzed for

differential transcript levels over several time points in response to several experimental

treatments, the sheer number of data points can be overwhelming. Interpretation is

virtually impossible by eye, and researchers have learned to rely on computer analysis.

Numerous methods have been used to break gene expression patterns into classes

of co-regulated genes. Although no consensus has been reached for a correct way of

interpreting microarray data, there are several preferred approaches which are dependent

on experimental setup and design and are often validated by empiricism. Cluster analysis

is often used to separate genes into different classes. Statistical clustering algorithms

such as Hierarchical Clustering, K-means, Self Organizing Maps (SOMs) and others have

been applied to gene regulation studies (Altman and Raychaudhuri, 2001; De Smet et al.,

2002; Eisen et al., 1998; Heyer et al., 1999; Tavazoie et al., 1999). An algorithm can be

either supervised, which takes biological significance into account, or unsupervised,

which functions independently of prior biological knowledge (Heyer et al., 1999).









Simple fold change has also been commonly used as a method of classifying genes (Chen

et al., 1997; Suzuki et al., in press).

Microarray technology has been a critical tool in dissecting gene regulatory

networks. Key early experiments analyzed changes in gene expression during various

stages of the cell cycle in yeast (Spellman et at, 1998; DeRisi et al., 1997; Velculescu et

al., 1997; Wodicka et al., 1997; Wolfsberg et al.,1999) and humans (Cho et al.,2001;

Shedden and Cooper, 2002). Bulyk et al (2001) employed a microarray-based approach

for testing the binding affinities of zinc-finger transcription factors that were expressed

on the surface of phages. By fixing all possible 3 base pair motifs onto a glass slide and

treating each glass slide with a different variation of zinc-finger motif, they were able to

determine the optimal binding affinity for that particular transcription factor. Such an

approach can also be applied to other families of transcription factors such as

homeodomains, helix-turn-helix motifs, beta-sheets, leucine zippers and steroid receptors

(Bulyk et al., 2001). Although other transcription factors would require analysis of

motifs longer than 3 base pairs, such larger oligomers could be constructed and spotted

onto glass slides, with the only limiting factor being the exponential numbers of

sequences needed as the size of each motif increased (Bulyk et al., 2001).

There are a few drawbacks to microarray technologies. The equipment for

producing oligo-microarray chips (Gene Chips) is expensive is mainly available

commercially (Jain, 2001). Commercial chips are usually made for organisms whose

genomes have been sequenced; those individuals studying a lesser characterized

organism may have to resort to glass slide arrays or northern blot analysis. Another

problem is the availability of proper annotation. Mis-annotated chips can cause wrongful









interpretations of gene expression data do to inaccurate transcription start sites and

improper naming. This problem is becoming less of an issue due to reannotation and

proofing of databases, especially with fully sequenced organisms.


Gene Regulation Networks

Basic gene regulatory mechanisms combine to form complex regulatory networks

capable of controlling all aspects of metabolism in a cell, tissue or organism. Multiple

transcription factors work together in maintaining these gene regulatory networks. As

exemplified by the ABA responsive genes, numerous instances of cross-talk occur

between signaling pathways to generate differential gene expression from a limited set of

signaling components to allow for varied responses to cold, desiccation, salt stress, UV

radiation, and other stimuli. One way of deciphering the complex nature of gene

regulation is to examine the role of downstream transcription factors.

Transcription-factors act as biological switches to regulate gene expression by

binding to short, specific DNA sequences called cis-elements which are usually found in

the promoter region immediately upstream of the transcription start site. Genes that are

co-regulated in response to developmental and environmental signals often share these

conserved motifs in their promoters thus allowing one transcription factor to mediate

expression of an entire set of genes. Likewise, genes often contain binding sites of

various affinities for multiple transcription factors to permit differential regulation.

Transcription factors can recruit other co-activating or chromatin remodeling proteins

through protein-protein interaction domains, which leads to further regulation of gene

expression. Although promoter structure usually involves multiple degenerate cis-

elements, understanding the mechanisms of transcriptional regulation requires a









knowledge of the conserved motifs used to bind transcription factors. By locating these

conserved motifs among co-regulated genes, we can begin to dissect the primary steps in

gene regulation.

Promoter structure can be complex. Transcription factors generally recognize short

motifs of semi-conserved nucleotides which may be continuous or separated by non-

conserved sequence. Binding site length, spacing, copy number, and degree of

conservation vary depending on the specific transcription factor. A particular

transcription factor may have a preferred binding site, but may allow some degeneracy;

this can usually be seen as differences in fold inductions of various genes. In studies of

abscisic acid response elements (ABREs), ACGT containing cis-elements serve as

binding sites for as many as 80 predicted basic region leucine zipper (bZIP) transcription

factors in Arabidopsis (Riechmann et al., 2000; Jakoby et al., 2002). The G-box

conserved nucleotides ACGT are most often used as a core recognition sequence (Hobo

et al., 1999a; Pla et al., 1993); however, other studies of ABREs have found that GCGT

or AAGT can be substituted in place of ACGT (Hobo et al., 1999a; Ezcurra et al., 2000;

Hattori et al., 2002). Sequence differences on flanking portions of the ACGT core of G-

box directly affect the affinity of binding proteins. Differential gene regulation can be

achieved by the combined effects of transcription factors binding to G-box and

interacting with other regulatory sequences (Donald and Cashmore, 1990; Weisshaar et

al., 1991; Rogers and Rogers, 1992). This promiscuity of cis-elements hinders discovery

of shared motifs of co-expressed genes.

Even though a transcription factor may recognize a particular cis-element, other

motifs with similar sequences might also be recognized due to the inherent promiscuity









of transcription factor binding. Hattori et al., (2002) performed a mutation analysis of

one of the G-box binding sites for leucine zipper transcription factors and demonstrated

that different b-ZIPs bind to the ACGT core of G-box with different affinities. In the

same study, the authors also synthesized several different flanking sequences to

accompany the ACGT core and tested expression by coupling each promoter element to a

reporter gene. They found that ACGTGGC and ACGTGTC were the preferred binding

sequences but that other ACGT-core containing sequences were partially functional

(Hattori et al., 2002).

Various genetic mechanisms can often complicate transcription factor binding to

cis-elements. The presence of a cis-element does not necessarily indicate that the

corresponding gene will be activated or repressed along with other genes containing

similar cis-elements. Some transcription factors can bind in tandem, such as bZIP

transcription factors, and cause differential expression. Chromatin remodeling factors

may need to be present in order to acetylate, methylate or modify nucleosomal DNA

before promoters are available for regulation (Ng and Bird, 1999, 2000; Grunstein, 1997;

Struhl et al., 1998; Richards, 1997). Genes could possibly require enhancer sequences

that could be located a great distance from the transcriptional start site (review

Blackwood and Kadonaga, 1998), or transcription could be blocked due to silencing

elements which favor the formation of heterochromatin (review Blackwood and

Kadonaga, 1998).


Cis-elements Discovery

Several problems inherent to the discovery of regulatory elements have thus far

hindered computational analysis. Regulatory motifs are short so that false candidates are









often encountered; gaps can occur in conserved sequences; certain degrees of degeneracy

are tolerated, and the positions of the elements relative to the transcription start sites are

not fixed. Despite the complexity associated with gene regulation, multiple techniques

have been developed for cis-element discovery.

Computational cis-element analysis using microarray and genomic sequence

information has increased the speed at which novel binding site are found; however, for

each putative binding site uncovered by statistical analysis, functional information must

be acquired in order to verify that the motif does have regulatory function in biological

systems. Programs for cis-element discovery fall into two general categories,

enumerative and iterative motif finding. Enumerative motif finding is based on word

counting, pattern matching and methods of determining background from signal (Zhang,

1999; Wolfsberg et al., 1999; Jensen and Knudsen, 2000; van Helden et al., 1998, 2000;

Sinha and Tompa, 2000; Tompa, 1999). Iterative motif finding is based on a statistical

model of finding best-fit motifs by adding and subtracting commonly occurring

sequences (Bailey and Elkan, 1995; Hughes et al., 2000; Liu et al., 2002; Roth et al.,

1998; Workman and Stormo, 2000).

Previously published motif-finding programs, such as those utilizing Gibbs-

sampling algorithms were first used for finding conserved protein sequences (Lawrence

et al., 1993) but were later modified to find conserved DNA motifs. Gibbs-sampling

algorithms use randomly selected starting points to generate a weighted motif matrix

which is then modified by sequential comparison to the original data set. Motifs are

added and subtracted from the original matrix until only highly-conserved, best-fit motifs

remain (Lawrence et al., 1993; Thijs et al., 2002). Hughes et al. (2000) Used AlignACE,









a Gibbs-sampling algorithm, to identify transcription factor binding sites among

Saccharomyces cerevisiae. A total of 248 genes where clustered into various classes, and

AlignACE returned numerous known and unknown transcription factor binding sites

among the promoters in 25 classes of clustered genes. Liu et al (2001) developed another

variation of the Gibbs-sampling algorithm called BIOPROSPECTOR that also uses

Markov background models derived from either the user or a specified background

sequence file. BIOSPECTOR was able to identify S. cerevisiae RAP1, Bacillus subtilis

RNA polymerase, and Escherichia coli CRP binding motifs with varying accuracies

depending on the background models used in each experiment (Liu et al., 2001).

There are some problems inherent to Gibbs-sampling algorithms. A background

model for the organism is often needed in order to correct for unequal distributions of

nucleotide frequencies which often occur in biological organisms. Another problem with

Gibbs-sampling algorithms is that many repetitive or low-complexity motifs are often

returned as significant results. Since organisms contain long repetitive sequences, many

putative motifs generated by the algorithm tend to be biological irrelevant. Other

methods have been used to increase the accuracy of the reference set prior to invocation

of the Gibbs-sampling algorithm as well as slight modifications of the algorithm itself

(Hughes et al., 2000; Thijs et al., 2001, 2002; Liu et al., 2001; Kel et al., 2001; Ohler and

Niemann, 2001).

Enumerative methods, such as the approach taken by van Helden (1998) and

Wolfsberg et al. (1999) for application to the yeast genome, have also proven to be useful

in locating putative cis-elements. These methods involve searching for conserved

pentamers or hexamers and determining the probability that a given motif is in fact a









putative binding site. Van Helden (1998) searched for commonly occurring motifs using

motifs of six base pairs in length and successfully located a known binding site among

nitrogen response genes in S. cerevisiae. In a later work by Wolfsberg et al. (1999), a

similar algorithm was used for analysis of cell cycle related genes which also revealed a

number of known and unknown motifs presumably involved with cell-cycle regulation.

Enumerative pattern matching among sequences is advantageous because it uses a

full representation of all possible motifs rather than relying on a random start set such as

those used in iterative approaches. However, enumerative algorithms are limited by the

motif size used for searching. As the motif length increases, the search dictionary

increases exponentially which greatly slows the speed at which the algorithm can be

conducted. Enumerative methods used by van Helden (1998) and Wolfsberg et al. (1999)

also did not allow for degenerative motifs. Degeneracy is inherent to many cis-elements,

and this could compromise many current pattern matching models.














CHAPTER 2
IDENTIFICATION OF CIS-ELEMENTS

Introduction

Transcription-factors act as biological switches to regulate gene expression by

binding to short, imperfectly conserved DNA sequence elements which are often located

in the promoter region upstream of the transcription start site. Genes that are co-

regulated in response to developmental and environmental signals often share conserved

motifs in their promoter's, thus enabling one transcription factor to coordinate expression

of a larger set of genes. Combinatorial interactions between multiple transcription factors

and sequence variation that creates binding sites differing in binding affinities contribute

to the complex patterns of differential gene expression observed in eukaryotic organisms

(Miklos and Rubin, 1996; Aronone and Davidson, 1997; D'haelseleer et al., 2000;

Mahalingam et al., 2003). Microarray technology has made possible the large-scale

analysis of gene expression levels under various environmental conditions of numerous

genes simultaneously such that genes can be classified into groups of similar expression

patterns (Schena et al., 1995, 1996; Strachan et al., 1997; Lashkari et al., 1997; DeRisi et

al., 1997). Quantitative statistical analysis of patterns in regulatory sequences of co-

regulated gene sets remains a challenging problem.

In general, regulatory motifs shared by co-regulated genes show wide variation in

length, spacing, copy number, and degree of conservation vary depending on the specific

transcription factor. Substantial degeneracy is often allowed in specific binding sites. In

some cases, such differences are correlated with differences in fold inductions of gene









expression (Hattori et al., 2002). In studies of abscisic acid response elements (ABREs),

containing an ACGT-core (eg. G-box), ABREs serve as a binding site for multiple basic

region leucine zipper (bZIP) transcription factors (Hobo et al., 1999a; Pla et al., 1993);

however, other studies of ABREs have found that GCGT or AAGT can be substituted in

place of ACGT (Hobo et al., 1999a; Ezcurra et al., 2000; Hattori et al., 2002). Some

ABREs do not contain ACGT-core motif are coined Coupleing Elements (Shen et al.,

1996). Coupling elements with consensus sequence CGCGTG are similar to other

ABREs except for an A to G substitution in the ACGT core; examples of coupling

elements are CE3 of barley HVA1 gene, motif III of rice rabl6B and a synthetic element,

hex-3 (Shinozaki and Yamaguchi-Shinozaki, 1996). Sequence differences on flanking

portions of the ACGT core of G-box directly affect the affinity of binding proteins (Izawa

et al., 1993). Differential gene regulation can be achieved by combined effects from

transcription factors binding to G-box and other regulatory sequences (Donald and

Cashmore, 1990; Weisshaar et al., 1991; Rogers and Rogers, 1992). This promiscuity of

cis-elements requires the use of degenerate motifs while parsing promoters for over-

represented motifs among co-expressed genes.

There are some indications that different ABREs can be correlated in a tissue

specific manner such as ABI5 and TRAB1 (Hobo et al., 1999b; Finkelstein and Lynch,

2000) which are expressed mainly in seeds. Other binding factors such as ABA-

responsive element binding proteins (AREBs) are expressed in roots as well as in

vascular tissue (Uno et al., 2000; Kang et al., 2002). A number of ABI5-like bZIP factors

containing putative Ser/Thr phosphorylation sites have been discovered that bind to

various ABREs (Choi et al., 2000); additively, proteins such as 14-3-3 proteins have been









shown to interact with various bZIP proteins (Schultz et al., 1998) and may ultimately

affect protein binding affinities to different cis-elements. HvABI5 of barley shows a high

similarity to a subfamily of bZIP proteins with closet homology TRAB 1 of rice (Kim and

Thomas, 1998) and AREB2 (Uno et al.,2000); and ABI5 (Finkelstein and Lynch, 2000)

in Arabidopsis. Casaretto and Ho (2003) found that HvABI5 binds to ABRCs in a

sequence specific manner by electrophoretic mobility shift assays with recombinant

HvABI5. It was also found that when using either a mutated CE3 element or a mutated

ACGT-box, that the affinity for ACGT-boxes was greater than the affinity for coupling

element 3 (CE3) additionally, using two copies of each ACGT-box or CE3 produced that

same results (Casaretto and Ho, 2003).

VP1 is a transcription factor that mediates germination, developmental arrest and

desiccation tolerance in maize by blocking ABA perception in seed tissues (Robichaud

and Sussex, 1986). VP1 has 4 functional domains Al, B1, B2, and the largest B3

(Giraudat et al., 1992), which has been shown to bind specifically to Sph cis-elements

(Suzuki et al., 1997). While the B3 domain does not bind specifically to the G-box

(Suzuki et al., 1997), there is evidence that various G-box motifs may act as coupling

elements which possibly enhance VP1 binding to Sph-elements of VP1 regulated genes

(Hattori et al., 2002; Shen and Ho, 1996; Vasil et al., 1995).

CRT/DRE Binding Factors (CBF) are a family of AP2 transcriptional activators

that can bind to CRT/DRE elements causing gene up-regulation in response to cold stress

which contribute to an increased freezing tolerance (Baker et al., 1994; Yamaguchi-

Shinozaki and Shinozaki, 1994; Stockinger et al., 1997; Gilmour et al., 1998; Liu et al.,

1998). Fowler and Thomashow (2002) have recently performed a global analysis of cold









regulated and CBF regulated genes in Arabidopsis. These data defined several distinct

classes of genes via Affymetrix microarray that respond to cold and serve as good

candidates for cis-element analysis.

Computational algorithms such as Gibbs-sampling have been developed to detect

over-represented cis-elements among co-regulated genes. Gibbs sampling algorithms use

randomly selected starting points to generate a weighted motif matrix which is then

modified by sequential comparison to the original data set. Motifs are added and

subtracted from the original matrix until only highly-conserved, best-fit motifs remain

(Lawrence et al., 1993; Thijs et al., 2002; Ohler and Niemann, 2001). To eliminate noise,

Gibbs-sampling methods require a statistical model for the species and class of sequence

analyzed. Since Gibbs-sampling algorithms rely on random starting points that are later

refined, the sampling of possible motifs of a given level of complexity is incomplete. A

second limitation is the filtering of repetitive sequences and low-complexity sequences in

DNA. Other methods have been used to increase the accuracy of the reference set prior

to invocation of the Gibbs-sampling algorithm as well as slight modifications of the

algorithm itself (Hughes et al., 2000; Thijs et al., 2001, 2002; Lui et al., 2001; Kel et al.,

2001; Ohler and Niemann, 2001).

Here we describe an alternative approach based on an enumerative methodology.

Enumerative motif finding is based on word counting, pattern matching coupled with

various methods of determining background from signal (Zhang, 1999; Ohler and

Niemann, 2001; Wolfsberg et al., 1999). MotifFinder operates by generating a complete

set of motifs of a specified length and degeneracy and then comparing the frequency of

each motif in a set of putative co-regulated gene promoters to the frequency of that same









motif in a random set of unrelated genes. MotifFinder eliminates the reliance on random

starting points by generating a non-redundant dictionary representing all possible motifs.

We have implemented an enumerative algorithm in the MotifFinder program and

applied it to analysis of cis-elements in sets of ABA and cold regulated genes, responses

mediated by VP1 and CBF transcription factors respectively.


Materials and Methods

MotifFinder algorithm

MotifFinder is based on the approach described by Wolfsberg et al. (1999) for

application to the yeast genome. The MotifFinder program first generates a motif

dictionary consisting of a complete, non-redundant set of motifs of a specified length

with a specified number of degenerate bases. Reverse complements and trivial

redundancy where motifs differ only in having one or more degenerate bases at opposite

ends are excluded. For example: NNGGAAGC, NGGAAGCN, GGAAGCNN and the

reverse complements: GCTTCCNN, NGCTTCCN, NNGCTTCC are considered

equivalent.

MotifFinder builds a control set of promoters by extracting 1000 random published

sequences of the 7402 Arabidopsis genes present on Affymetrix gene chip. The 1000

randomly selected promoters are used to determine the expected frequency of each motif

in Arabidopsis promoters and therefore common DNA sequences such as repetitive DNA

or motifs for general transcription factors are not mistaken as candidate cis-elements by

MotifFinder. The motif dictionary is then compared to the 1000 control promoters to

determine the average frequency of the motif in unrelated genes. The dictionary file and









background frequencies are saved and can be used repeatedly for multiple promoter

comparisons.

To extract putative cis-elements from genes grouped based on microarray

expression data, MotifFinder scores each motif in the dictionary against the promoters of

potentially co-regulated genes. Statistical significance is determined by a chi-square test

performed for each motif using the expected frequency from the 1000 random promoters

versus the observed frequency from the test promoters. The resulting p-values are

screened by an adjustable p-value cutoff such that only motifs of specified significance

are returned as a tab-delimited text file that can be inserted into a spreadsheet for easy

viewing and sorting purposes.


MotifFinder implementation

In this paper, we use a dictionary of 43168 motifs containing 6 conserved and 2

degenerate bases, but other combinations are possible. We have thus far used

MotifFinder for analysis of A. thaliana gene regulation, but genes from any organism

with sufficiently described genomic sequence could be evaluated. We focused on a set of

353 strongly expressed genes that were induced or repressed greater than 3-fold in an

effort to identify patterns of regulatory sequences that differentiate these response classes.


MotifMapper algorithm

MotifMapper is a visualization program that maps lists of motifs onto promoters.

Each significant motif found by MotifFinder is aligned with each promoter analyzed.

The bases in each promoter are assigned a numerical score according to the number of

significant motifs that co-align. Motifs that are highly significant among groups of

related genes congregate which increases a particular base's score. MotifMapper









produces a multi-color representation of the promoter set which displays the locations of

mapped motifs relative to each other and to the transcription start sites along with a

composite score for each promoter which is derived by adding the number of significant

motifs present. A matrix file is created containing base significance information that can

easily be inserted into a spreadsheet for promoter graphing purposes (output not shown).

MotifMapper also generates a separate file ranking putative cis-elements by a numerical

score based on the number of hits per base as well as the upstream location of each

putative cis-element. MotifMapper collates data derived from MotifFinder to permit easy

identification of both known and unknown DNA binding motifs in a promoter.


Promoter database

A database containing upstream regulatory sequences (-600 nt of the annotated

coding sequence) of 7402 annotated Arabidopsis genes represented on the Affymetrix 8k

Gene Chip was constructed by parsing the XML format chromosome assemblies

(http://www.tigr.org). The promoter sequences were extracted from the XML files using

a Perl script modified from sample code developed at TIGR.



Testing for false-positive motifs

To test for motifs that are significant due to random chance, a set of 75 random

promoters were compared to 1000 randomly selected promoters and chi-square values

were calculated for each motif. An average of 5 motifs had a significant p-value of at

least 10-5 when 5 replications are considered.









PERL version and modules

PERL v5.6.1 built for MSWin32-x86-multi-thread

Copyright 1987-2001, Larry Wall

Binary build 632 provided by ActiveState Corp. http://www.ActiveState.com



P-values were calculated using Math::CDF module that can be found on the CPAN

website (http://www.cpan.org).



Results

Analysis of VP1/ABA regulated genes

Our analysis of ABA regulated gene expression (Suzuki et al., in press) revealed a

large set of genes that are up and down regulated within 12 hours of an ABA treatment.

The ABA response is strongly altered in plants carrying 35S-VP1 transgene resulting in

multiple classes of VP and ABA regulated genes. In order to identify patterns of

regulatory sequences that differentiate these response classes we developed methods for

rigorous statistical analysis of cis-regulatory sequences in co-regulated genes. The key

analysis was implemented in the MotifFinder program. We focused on a set of 353

strongly expressed genes that were induced or repressed greater than 3-fold in an effort to

identify patterns of regulatory sequences that differentiate these response classes. We

first constructed a database containing 600 nucleotides of five prime flanking DNA

sequence from each of the 7402 genes represented on the Affymetrix Gene Chip.

Promoters of the 353 ABA/VP1 regulated genes were extracted form this data set.

MotifFinder was used to analyze regulatory motifs that distinguish various VP1 and

ABA response classes defined by Suzuki et al. (in press). We have previously shown that











promoters in the VP1 "and" ABA dependent response class have significantly higher

frequencies of G-box and Sph elements compared to promoters in the ABA dependent

subclass (Suzuki et al, in press). We have extended this analysis to other response classes

(Figure 2-1) including an analysis of motifs which distinguish activated and repressed

promoters.




A VP1/ABA regulated
genes
35


ABA VP1/ABA
89 co-regulated
256

r_ I-I I I I _--
represse activate VP1 "OR" VP1 "AND" ABA ABA VP1 dependent
45 44 ABA dependent 28
32 114 82
I II I I I i I
VP1 activated,
ABA activated repressed activate repressed activate represse activate
44 70 18 62 2 26

B
B Cold regulated
genes
29



ule- IuPe Down-regulated CBF over-
regulated regulated
transiently 86 39
57 150


Figure 2-1. Classification of genes analyzed by MotifFinder. (A) Hierarchical
classification from Suzuki et al. showing subclasses of VP1/ABA regulated
genes. Underlined classes of genes were analyzed for known and novel cis-
elements. (B) Hierarchical diagram of Cold-regulated gene classes defined by
Fowler and Thomashow.




We extracted numerous commonly occurring motifs visually with the aid of

MotifMapper using motifs with a significance cut-off of 0.00001 and 0.001 in activated










and repressed classes respectively. Motifs related to known ABRE elements were the

most highly enriched in the activated gene classes (Figure 2-2).





Enriched oligo-mers with similarity to G-box
G[A/C]CACGTG


nacacgtn 151 173 0 gacacntn 110 152 6.88E-15
nacgtgnc 109 132 0 nacgngtc 84 106 1.35E-14
cncgtgtn 96 113 0 tnacncgt 97 130 2.26E-14
nacncgtg 95 111 0 annacacg 111 156 2.30E-14
nanacgtg 134 176 0 nncgtgtc 90 118 3.51E-14
gccacgnn 72 69 0 acacgnnt 106 148 5.04E-14
nncacgtg 91 105 0 gncacgng 76 94 5.73E-14
ncgncacg 52 50 2.22E-16 tngncacg 76 94 5.73E-14
nccangtg 91 113 2.22E-16 cnacncgt 68 81 9.57E-14
ngccangt 83 100 4.44E-16 cncgtgnt 68 82 2.20E-13
gnnacgtg 105 140 1.22E-15 nacangtg 136 210 3.10E-13
gncacntg 99 130 2.22E-15 acntgnca 110 159 3.20E-13
gccnngtg 54 55 2.44E-15 cacntgnc 107 154 4.74E-13
gacangtn 119 168 4.22E-15 anncacgt 115 170 6.24E-13
acgtgnng 79 96 4.77E-15 cnnacacg 79 103 8.47E-13

Figure 2-2. The motifs with highest significance have similarity to G-box related ABA
responsive elements (5'-G[A/C]CACGTG-3'). The top 30 motifs are listed.
Columns in order: motif, number of promoters containing that particular motif
in test set, number of promoters hits in control set, p-value of Chi-square.
Because G-box has the highest significance, other motifs will be present in
slightly higher p-value ranges.




In addition to known ABREs MotifFinder analysis revealed several other significant

motifs in found among activated and repressed classes (Table 2-1). Activated classes all

contained a significant (p-value < 0.00001) ACGT-core containing sequence along with

other distinguishing motifs in each class. In contrast, motifs found among repressed

genes tended to have lower statistical significance (p-value < 0.001). For each motif the

percentage of promoters were calculated to give the relative occurrence of each over-

represented sequence; the motif length would directly affect the occurrence of each









Table 2-1. Summary of known and putative regulatory elements among VP1/ABA
regulated genes. Describes the motif, the percentage of promoters containing
each motif and the p-value cutoff of each class analyzed. In the ABA
regulated class, the p-value of 4.4E-05 was used due to large amounts of
motifs above the 1E-05 cutoff which tend to clutter the MotifMapper output.


Enriched motifs in VP1/ABA regulated aenes


Motif


% of
Promote


p-value cutoff


ABA regulated


activated


repressed


VP1 dependent


activated


ABA dependent
activated
repressed



VP1 and ABA dependent
activated


repressed


VP1 or ABA dependent
VP1 act/ABA


ACGT core
TGGCANNA

CATGATGG
GCATT
CATTGA
GCCTA
ACTTG


ACGT core


ACGT core
TGCGAG
CCTCAT
GCGCG


ACGT core
Sph
ACCCCAAA
TGCAT


ACGT core
TCGGC


86.4%
38.6%

11.1%

82.2%
33.3%
35.6%
82.2%


78.6%


79.0%
33.3%
55.6%
27.8%


81.4%
32.0%
31.8%
20.5%


86.4%
50.0%


1.0E-05
4.4E-05

1.0E-03
1.0E-03
1.0E-03
1.0E-03
1.0E-03


1.0E-05


1.0E-05
1.0E-03
1.0E-03
1.0E-03


1.0E-05
1.0E-05
1.0E-03
1.0E-03


1.0E-05
1.0E-05









sequence therefore longer sequences may have lower percentages but there is also a

decrease chance in which the motif occurs by chance.

Analysis of flanking sequence motifs

Many of the most significant motifs enriched in ABA/VP1 activated genes were

related to consensus G-Box related ABREs. Because sequence differences flanking the

ACGT core of the ABRE affect the affinity of binding proteins, we analyzed the

frequency of each variant consisting of a ACGT-core plus three flanking bases among the

enriched sequences. Of the 64 7 bp ACGT elements of form XXXACGT, six motifs

accounted for the majority of enriched elements were selected (each accounting for

greater than 10%). Comparisons between the five response classes revealed two ABRE

motifs that may be correlated with differential regulation of distinctive classes. Since

there are many types of bZIP transcription factors that bind to various ABREs, having

asymmetric distributions of different types of ABREs may facilitate binding of related

but not identical transcription factors which may affect protein-protein interactions as

well as gene direct gene induction. The TAC variant (5'-TACACGT-3') is strongly

enriched in the activated ABA regulated as well as the activated VP1 and ABA

dependent response classes, whereas, the AAA variant ( 5'-AAAACGT-3') is solely

enriched in the Vpl "or" ABA activated response class, specifically the Vpl activated/

ABA activated subclass (Figure 2-3). The four other variants, GCCACGT, GACACGT,

ACCACGT and AACACGT, are uniformly distributed among the five activated classes

and fit an established consensus (G/A)(C/A)CACGT (Figure 2-3) for the ABRE (Hattori

et al., 2002).

















A ------


A ..







n*_


a M




m n
(D .. n-- .2
CD SMK -- ----















____.__.____.____
> o r "--
U
C m,__________________




M -D

> ri




C














n_ ur
CU



> ci: S* .









Motifs associated with the ABRE

Various coupling elements have been implicated in combinatorial interactions

with the consensus ABRE eg Casaretto and Ho (2003). In order to identify conserved

elements in ABA/VP1 regulated promoters that may distinguish response classes through

combinatorial interactions with the ABRE, we analyzed sequences enriched in the near

vicinity of ACGT containing motifs. To do this, we extracted 20 nucleotides located

upstream and downstream (Figure 4a) of the two most frequent ACGT variants for each

promoter class. MotifFinder was used to identify motif that were enriched in 20 base

ABRE flanking segments relative to a set of randomly selected 20 nucleotide segments.

Consensus motifs derived from the enriched 8-mer motifs were visually extracted

with the aid of MotifMapper and are summarized in (Figure 4b).

One or more putative coupling motifs were identified for each of the VP 1 activated

response classes. In an interesting contrast, no significant coupling motifs were found in

a set of 44 ABA activated genes that are not regulated by VP1. The Sph element and a C

box-like motif of 5'-CAATT-3', which could possible serve as a LEC1 binding site,

showed a significant association with ABREs in the VP1 "and" ABA activated class.

Among the ABA dependent class, a 5'-CTTT-3' repeat was common however, there

seemed to be high amounts of clustered motifs which contained mostly pyrimidine rich

motifs and didn't correspond to exact 5'-CTTT-3' repeats.

Motif LOGOs

To refine the definition of consensus ABRE flanking motifs, we used Weblogo

(http://www.bio.cam.ac.uk/cgi-bin/seqlogo/logo.cgi), an implementation of the sequence

logo algorithm developed by Schneider and Stephens (1990). The most conserved










S20b 20b
A xxxACG 2b


VP1/ABA regulated genes and associations to enriched motifs flanking ABA responsive
B elements
Motif % of p-value cutoff
promoters
ABA regulated
activated No 0.32
Associations
VP1 dependent
activated ACGT core 10.2% 1.0E-03

TTTAANNC 65.4% 1.0E-03

ABA dependent
activated ACGT core 15.4% 1.0E-03
CTTT repeat 25.8% 1.0E-03

ACTACNNC 40.3% 1.0E-03

VP1 "and" ABA dependent
activated ACGT core 15.7% 1.0E-03
CAATT 81.4% 1.0E-03
Sph 70.0% 1.0E-03
VP1 "or" ABA dependent
VP1 act/ABA act ACGT core 22.7% 1.0E-03
TCTTCTT 50.0% 1.0E-03


Figure 2-4. Summary of regulatory elements that are associated with ACGT-core
elements among VP1/ABA regulated genes. (A) Associations to ACGT-core
were calculated by comparing our Motif dictionary of 43,168 motifs with
20bp flanking ACGT-core vs. 20 bases selected at random. A Chi Square
value is calculated and the p-value is returned for each motif (B) Sequences
were located by eye using the color output of MotifMapper. The table
describes the motif, the percentage of promoters containing each motif and the
p-value cutoff of each class analyzed. In the ABA regulated class, the lowest
p-value observed was 0.32 which is insignificant.









consensus motifs found by MotifMapper (p-value = 0.001) were used as the search

criteria to extract ten bases flanking each motif. Extracted sequences were then

converted to logos to find best-fit consensus sequences. Figure 2-5 illustrates consensus

sequences for motifs associated with ACGT-core sequences.

In the activated VP1 dependent class, a four-base motif, 5'-TTAA-3, was found to

have other correlated bases in positions 10,15 and 17 (Figure 2-5a). Among activated

ABA dependent genes a significant motif of ACTAC was used as search criteria in which

another significant base in position 17 was identified (Figure 2-5b). The activated

subclass of VP1 "and" ABA dependent regulated class revealed a best-fit motif of

CAATT (Figure 2-5c) and the VP1 activated/ABA activated subclass of VP1 "or" ABA

dependent genes identified a consensus motif of TCTTCT (Figure 2-5d). The CTTT

repeat motif found in the activated subclass of VP1 "and" ABA dependent genes was

used in constructing a motif logo due to the lack of a definite consensus sequence.


Analysis of cold-regulated genes

Analysis of VP 1 regulated genes demonstrated the ability of MotifFinder to

identify novel as well as previous known regulatory motifs. As an independent test of the

MotifFinder approach, we analyzed a set of CBF regulated genes classified from

microarray expression data reported by Fowler and Thomashow (2002). We analyzed

upstream 600 base flanking regions of 293 genes representing three response classes

identified by Fowler and Thomashow (2002): up-regulated long-term, up-regulated

transiently and down-regulated by Cold. As expected, MotifFinder found G-box and

CBF binding motifs previously identified in CBF regulated genes (Fowler and

Thomashow, 2002) confirming the validity and efficacy of the software.























J -- -
~~;~ .* ... ..











.... i










--- --'


Figure 2-5. Sequence logos of consensus sequences were made to detect significant
bases outside identical consensus motifs. Each underlined sequence was
searched against significant bases of MotifFinder output and flanking
sequences were returned for logo construction. (A) Sequence logos of VP1
dependent activated class matching the consensus sequence TTTAANNC.
(B) ABA dependent activated class matching the consensus sequence
ACTACNT. (C) VP1 "and" ABA dependent class matching the consensus
sequence CAATT. (D) VP1 "or" ABA dependent matching the consensus
sequence TCTTCT.











In addition to the known G-box and CBF binding sequences, MotifFinder identified

several other significantly enriched putative cis-elements among the different subclasses

of CBF regulated genes (Table 2-2).

Table 2-2. Summary of known and putative regulatory elements among cold-regulated
genes. Describes the motif, the percentage of promoters containing each
motif and the p-value cutoff of each class described by Fowler and
Thomashow.


Cold-induced genes

Motif % of p-value cutoff
promoters

up-regulated long term CCGAC 40.4% 1.0E-05

ACGT core 86.0% 1.0E-05

GATAT 91.2% 1.0E-05

up-regulated transiently ACNNGT 40.0% 1.0E-05

down regulated GTGTG 59.3% 1.0E-05

over-expressers CCGAC 64.1% 1.0E-05

TAGCTA 38.5% 1.0E-05




Among 57 genes that were up-regulated long-term by cold, the CBF binding motif

5'-CCGAC-3' and G-box core 5'-ACGT-3' were highly significant. Another highly

significant motif of 5'-GATAT-3' was found in 53 promoters and 40 promoters as single

and tandem motifs respectively. ACGT motifs were also enriched in 150 genes that were

up-regulated transiently by cold. Among 86 genes that were repressed in response to









cold, 5'-GT-3' repeats were highly significant indicating a possible cold regulated

repressor binding site.

Fowler and Thomashow (2002) also obtained microarray expression data for CBF1,

CBF2 and CBF3 over-expressing lines and categorized constitutively expressed genes as

part of the CBF regulon. The CBF regulon may be further divided into sub-regulons

controlled by RAP2.1 and RAP2.6. Of the 39 CBF activated genes, 14 did not contain

the known CBF binding motif 5'-CCGAC-3'. MotifFinder analysis revealed a enriched

palindromic sequence, 5'-TAGCTA-3', that is present in 9 of the 14 genes indicating a

potential cis-acting determinant of this sub-regulon of CBF genes.


Distribution of motifs in VP1/ABA and cold regulated genes

Graphing the distribution of the number of significant motifs versus their upstream

locations indicates that putative cis-elements of VP1 regulated genes are generally

located 100 to 300 base pairs upstream of the annotated translation start site (Figure 2-6).

By comparison, 75 randomly selected motifs mapped onto the VP1 regulated gene

promoters are evenly distributed. Therefore, the uneven distribution of significant motifs

is most likely due to a preferential location of cis-elements near the transcription start

site.

Comparison of the distributions of enriched motifs in activated and repressed sub-

classes of the VP1 "and" ABA dependent genes indicated that the highly asymmetric

distribution of enriched elements is characteristic of activated promoters (Figure 2-7a),

whereas motifs enriched in repressed genes are uniformly distributed (Figure 2-7b).

Similar patterns were observed in comparisons of ABA regulated (44 activated and 45

repressed) and ABA dependent response classes (64 activated and 18 repressed, data not









shown). Other comparisons between activated and repressed genes were not done due to

inadequate numbers of promoters in either or both classes.

In cold regulated promoters, a similar though less dramatic concentration of

enriched motifs in a region 100 to 300 upstream of the annotated coding sequence was

detected. As in the case of ABA/VP1 regulated genes, the clustering of enriched motifs

among Cold-regulated genes was evident in activated promoters (Figure 2-8a and 3-8b),

but not in down regulated promoters (Figure 2-8c). The strongest asymmetry was evident

in genes activated in CBF over-expressing lines (Figure 2-8d).


Distribution of the top 75 motifs enriched motifs

45
40 40
( 35
o 30
S25 -- VP1/ABA regulated
o 20 -- Random


CD 10 -
4 5
0

0 200 400 600
Distance from ATG


Figure 2-6. Distribution of the top 75 statistically significant motifs among MotifFinder
analysis of 353 VP1/ABA regulated genes. Black line indicates the
distribution of the top 75 significant motifs from all VP1/ABA regulated
genes; note the non-linear distribution of co-regulated genes compared to 75
random motifs. Gray line shows the distribution of 75 randomly generated
motifs of identical length and degeneracy to those of the VP1/ABA regulated
genes. Motifs were counted by determining the position of the proximal side
of each motif. Number of hits were summed across all promoters and an
averaged over 20 bases was calculated in order to smooth the distribution
lines.















A Top 75 VP1 and ABA dependent activated


8 16
i 14-
-0 12-
S12 VP1 and ABA
Activated
8-
o 6 -- 75 random motifs
4-
2
<( 0 ,
0 200 400 600
Distance from ATG

B Top 75 Vpl and ABA dependent repressed


5

4-
0)
.0
o --VP1 and ABA
S repressed






0 200 400 600
Distance from ATG



Figure 2-7. Activated and Repressed subclasses show strikingly different distributions of
cis-elements in relation to the ATG start site. (A) The asymmetrical
distribution of the top 75 significant motifs of the activated ABA dependent
class and 75 randomly generated motifs which are symmetrically distributed.
(B) The symmetrical distribution of the top 75 significant motifs of the
repressed ABA dependent class has a similar distribution to 75 randomly
generated motifs. ABA regulated, VP1 dependent, and ABA dependent
classes have a very similar distribution pattern but are not shown.









AlignACE

We compared output of our motif finding program to AlignACE, a widely used

implementation of the Gibbs sampling algorithm (Roth et al., 1998; Hughes et al., 2000;

McGuire et al., 2000; McGuire and Church, 2000; Schachter and Kohane, 2002; Masuda

and Church, 2003). We searched for shared motifs in the VP1 regulated gene promoters

using AlignACE and found ACGT containing motifs as well as SPH element but lacked

other putative cis-element found by MotifFinder (data not shown). In addition, the

AlignACE output included various motifs of low complexity sequence such as single or

double nucleotide repeats, which are abundant throughout the genome but which are not

highly specifically enriched in a particular co-regulated gene set (Hughes et al., 2000).

When looking at the AlignACE output of the four subclasses of cold-regulated genes, we

found a few occurrences of the putative motifs found by MotifFinder/MotifMapper (data

not shown); however, we were not able to build a consensus sequence using AlignACE

itself due to the background of low complexity sequences. Because MotifFinder filters

repetitive bases or tandem repeats through a comparison to a set of randomly selected

promoter sequences, the significant motifs found using MotifFinder are more likely to

represent actual cis-elements.

Discussion

Several problems inherent to the discovery of regulatory elements have thus far

hindered computational analysis. Regulatory motifs are short so that false candidates are

often encountered; gaps can occur in conserved sequences; varying degrees of

degeneracy are tolerated, and the positions of the elements relative to the transcription

start sites are not fixed. Despite the complexity associated with gene regulation, we have

successfully developed software to isolate putative transcription factor binding sites.













A Up-regulated long-term by cold
A
5

S4- ,________
--Up-regulated long
3 -term
2 -75 random motifs
1 -
0
0 200 400 600
Distance from ATG

B Up-regulated transiently by cold
10

S--Up-regulated
S 6 transiently
4 4 -75 random motifs
2
S 0-
0 200 400 600
Distance from ATG


C Down-regulated by cold

5
4-


2 L -75 random motifs
i 1
0
0 200 400 600
Distance from ATG


D CBF over-expressing genes
5
4-

3 @-3CBF ox
S2 -75 random motifs


0
0 200 400 600
Distance from ATG

Figure 2-8. Distributions of significant and random motifs of the four classes of cold-
regulated genes defined by Fowler and Thomashow. (A) Top 75 significant
motifs in promoters of genes up-regulated long-term by cold along with 75
randomly generated motifs. (B) Top 75 significant motifs in promoters of
genes up-regulated transiently by cold along with 75 randomly generated
motifs. (C) Top 75 significant motifs in promoters of genes down-regulated
by cold along with 75 randomly generated motifs. (D) Top 75 significant
motifs in promoters that are regulated by CDF1, CDF2 and CDF3 over-
expressing lines along with 75 randomly generated motifs.









MotifFinder, a program written in the highly portable Perl language, compares the

frequency of a given motif in a control set of random promoters to the frequency of the

same motif in a set of promoters grouped by expression data or by other association. The

algorithm may be extended to a variety of motif definitions including incorporation of

gaps and higher degeneracy. Although the motif dictionary used in the analysis can be

constructed having any combination of motif length and complexity, empirically we find

that using 8-mer motifs with 6 conserved and 2 degenerate bases provide a reasonable

balance of sensitivity and simplicity for detection conserved cis-elements. Searching

with shorter or more degenerate motifs yielded increased the background noise level

hindering discrimination of blocks of enriched sequence while searching for longer

motifs caused some over-represented sequences to fall out. Inclusion of 2 degenerate

nucleotides allows for substantial variation among conserved elements because a given

cis-element will match multiple overlapping 8-mers. For this reason, strongly conserved

elements are revealed as clusters of significant 8-mer motifs. Hence, putative

transcription factor binding sites of all sizes can be uncovered using the software.

The combined application of MotifFinder and MotifMapper allows for rapid

prediction of putative transcription factor binding sites and offers various methods for

visualization of promoter structure. MotifFinder accurately detected known binding sites

such as G-box and SPH elements in VP1 regulated genes as well as several other putative

motifs and successfully identified ABA responsive elements and CBF binding motifs in a

set of CBF regulated genes. In addition to previously characterized binding sites, two

other putative binding sites were found among the CBF genes, one of which was already









speculated to exist based on a proposed model by Fowler and Thomashow (2002) for

differential regulation by RAP2.1 and RAP2.6.

The location distribution of significant motifs found in the CBF and VP1 gene sets

indicate that distance from the transcription start site can be a useful parameter for

prediction of co-regulated sets (Kel et al., 2001). Binding proteins usually recognize a

conserved sequence in gene promoters; however, some bases within a conserved

sequence may tolerate a certain degree of degeneracy. By searching with a degenerate

motif dictionary, we eliminate the requirement for a perfectly conserved binding motif

and allow for the possible inclusion of insignificant bases within a cis-element. Gene

regulation by cis-elements can be dependent on position, orientation and location relative

to other cis-elements, all of which can be easily interpreted using the color displays

produced by MotifMapper.

For the analysis of VP1 and CBF genes, we employed a dictionary of 8-mer motifs

containing 6 conserved and 2 degenerate bases to search for putative cis-elements.

Several other dictionaries were tested, but the 8-mer motif dictionary yielded the optimal

stringency for analysis of cis-elements within VP1 and CBF regulated gene promoters.

Because cis-elements can vary in size depending on the binding protein involved,

analysis of other promoter sets using MotifFinder could be performed using alternate

motif dictionaries depending on the degree of stringency desired. We recommend that

future applications of the MotifFinder software include analysis using multiple motif

dictionaries for optimization of stringency.

MotifFinder is designed to located motifs that are overrepresented in a set of genes,

yet a critical motif present in a smaller subset will not receive a high significance score









among a larger gene set. For example, we easily isolated G-box as overrepresented

among 353 VP1 regulated genes but the more diverse and less common Sph binding

elements were not included in the highest scoring motifs. To identify Sph-elements we

analyzed smaller sub-classes of VP1 regulated genes. We therefore caution that

MotifFinder analysis should be conducted on both large groups of related genes and

potentially co-regulated subsets for optimal probability of isolating less prevalent cis-

elements.

In performance and sensitivity, MotifFinder compared favorably with AlignACE,

returning fewer low complexity motifs while finding known ABA and cold response

elements plus several previously unidentified motifs that may be biologically relevant.

Gibbs-sampling methods require a statistical model for the species or classes analyzed.

MotifFinder eliminates the reliance on random starting points by generating a non-

redundant dictionary representing all possible motifs.

We have shown that MotifFinder can successfully find shared motifs among co-

regulated genes. While we have applied this analysis to upstream flanking sequences, the

approach can be applied to comparisons of arbitrary DNA sequences where a comparable

control set of random sequences can be constructed. The method is highly sensitive

allowing detection of significant motifs in a set containing a few co-regulated promoters.

MotifFinder and MotifMapper can be used as tools to identify putative transcription

factor binding sites that are over represented in a given gene set, but further biochemical

characterization should be performed to assess the role of putative cis-elements in

regulation of gene expression.














APPENDIX A
SOFTWARE DOCUMENTATION

Files/Descriptions:

MotifFinderp lv4
Generates A motif dictionary composed of a non-redundant set of sequences of a
given length including a given number of degenerate bases.
Gives a score for each motif pertaining to the number of promoters that contain
that motif. A given number of control promoters are selected from a promoter
database at random.
MotifFinderp2v6
Compares a scored motif dictionary against a set of test promoters defined by the
user and returns the p-value for the Chi-square Distribution.
MotifMapperplv3
Maps a list of motifs onto a set of DNA sequences in various colors depending on
the number of hits on each base. Two lists of motifs can be projected onto DNA
sequences as well, where three colors are used to indicate uniqueness and over-
lapped motifs.
Config.ini
Defines inputs, outputs and options for all three programs.



MotifFinderp 1 v4:

Arguments and Options:
Requires an input file containing a large database of promoter sequences that that
are unique to the organism of interest. This file will be used for selecting a given
number of promoters to be used as a control set and determine background
sequence information.

Requires an output file name for saving a motif dictionary file containing all
possible motifs and scores.

Options for total motif length, number of degenerate bases and number of
randomly selected promoters.


Output File Example:










* Using: motiflength=8, fourway=2 and twoway=0, generates the format shown
below. The top definition line is a summary of options.


8 2
naccagng
naccagnc
naccagna
nnaccagg
nnaccagc
nnaccaga
nnaccact


1000 43168


Note: See sample config.ini file for configuration examples


MotifFinderp2v6:

Arguments and Options:
Requires a file containing a motif dictionary created in MotifFinderpl

Requires a file containing a list of promoters that you wish to find over-
represented motifs.

Requires an output file name for saving motifs and p-values.

Requires an output file name a sub-dictionary file which is comprised of the
number of observed hits in the test set. This file can be used as a motif dictionary
to analyze further subsets of the test set.

Option for an adjustable p-value cutoff. Only motifs below the cutoff will be
stored in the output file.


Output File Example:


Motifs: Hits in test Hits in control
set: set:
cngncacg 11 '
cgtgntna 16 1(
ccnacgng 10
nnggacgt 8
nacncgtg 16 11
cncgtgtn 16 11


Chi-Square (p):

2.75E-08
1.87E-07
9.07E-07
2.06E-06
2.35E-06
3.51E-06










Note: See sample config.ini file for configuration examples





MotifMapperplv3:

Arguments and Options:
Requires either one or two lists of motifs to map onto sequences.

Requires a file containing a list of promoters that you would like to map.

Requires an output file names for saving mapped promoters, a file containing a
summary of possible cis-elements called a "Strengthfile" and a "matrixfile"
containing base strength information for each promoter.

Options for capitalization of significant bases, font size, and applying a score
based on summing each bases value within the promoter.




Output File Example:

Mapping One Class:

20412_atl #F13M23.280#At4g25140 oleosin, 18.5K;ICOORDS: 11864996 -
11865757
ttggctctggagaaagagagtgcggctttagagagagaattgagaggtttagagagagatgcggcggcgat
gagcggaggagagacgacgaggacctgcattatcaaagcaGtgACGTGgTGaaatttggaacttttaagag
gcagatagatttattatttgtatccattttcttcattgttctagaATGtCgCGgaacaaattttaaaaCTa
aATCCtaaatttttctatttttgttgccaatagtggatatgtgggccgtatagaaggaatctattgaaggc
ccaaacccatactgacgagcccaaaggttcgttttgcgttttatgtttcggttcgatgccaacgccacatt
ctgagctaggcaaaaaacAaACGTGTCTttgaatagactcctctcgttaACACaTGCAgcggctgcatggt
gacgccattAACACGTGGCCTACAatTGCatgatgtctccaTTGACACGTGaCTtCtCGTcTCctttctta
atatatctaacaaacactcctacctcttccaaaatatatacacatctttttgatcaatctctcattcaaaa
tctcattctctctagtaaacaagaacaaaaaa
score :552
12 3 4 >4

OR
Mapping Two Classes:

20412_atil#F13M23.280#At4g25140 oleosin, 18.5K;ICOORDS: 11864996 -
11865757
ttggctctggagaaagagagtgcggctttagagagagaattgagaggtttagagagagatgcggcggcgat
gagcggaggagagacgacgaggacctgcattatcaaagcaGtgACGTGgTGaaatttggaacttttaagag
gcagatagatttattatttgtatccattttcttcattgttctagaATGtCgCGgaacaaattttaaaaCTa
aATCCtaaatttttctatttttgttgccaatagtggatatgtgggccgtatagaaggaatctattgaaggc
ccaaacccatactgacgagcccaaaggttcgttttgcgttttatgtttcggttcgatgccaacgccacatt
ctgagctaggcaaaaaacAaACGTGTCTttgaatagactcctctcgttaACACaTGCAgcggctgcatggt










gacgccattAACACGTGGCCTACAatTGCatgatgtctccaTTGACACGTGaCTtCtCGTcTCctttctta
atatatctaacaaacactcctacctcttccaaaatatatacacatctttttgatcaatctctcattcaaaa
tctcattctctctagtaaacaagaacaaaaaa
score :552
Motif list 1 Overlap Motif list 2

AND
Strengthfile:
Pulls out significant bases of an adjustable minimum length with an adjustable
minimum base threshold to help identify cis-elements. Gives positional
information and a score based on the summation of each bases value. (value
based on added hits from each motif).


19009_atl I#F2H17.12#At2g36270

-352 TtCACGTGGAC 90

18991_s_at I#MGF10.3#At3g27660

-159 ACGTGTCG 71

18718 s at I#T20D16.13#At2g23240

-397 CGaACA 14
-128 CACaTGACACaTG 63
-101 GCCACGT 55


Note: See sample config.ini file for configuration examples






CONFIG.INI default

[MOTIFFINDERP1]
motiflength=8
fourway=2
twoway=0
numrandprom=1000
outputfile=c:\perl\data\outfile.txt
promfile=c:\perl\data\promoterdatabase.txt


[MOTIFFINDERP2]
dictionaryfile=c:\perl\data\motifdictionary.txt
promoterfile=c:\perl\data\proms.txt









outputfile=c:\perl\data\mfoutput.txt
subdictoutputfile=c:\perl\data\subdict.txt
pvaluecutoff=0.1


[MOTIFMAPPER]
motiffile=c :\perl\data\motifs.txt
motiffile2=c:\perl\data\motif2.txt
promfile=c:\perl\data\proms.txt
outputfile=c:\perl\data\mapped.txt
strengthoutputfile=c :\perl\data\strengthfile.txt
matrixoutputfile=c:\perl\data\matrixfile.txt
basethreashhold=3
minimummotiflength=5
capital=true
scoreprom=true
fontsize=10
mapone=true
maptwo=false


[EOF] END OF FILE DONOT REMOVE














APPENDIX B
SOURCE CODE: MOTIFFINDERP1V4.PL

#! /usr/bin/perl
# MotifFinderplv4.pl
# counts number of PROMOTERS with one or more hits.
# adds new anagram method lots faster
# better memory usage
# takes arguments from config.ini file
####################################

use strict;
use warnings;

my $configfile = 'c:\perl\programs\config.ini';
my @arguments = config(; ### grabs arguments from config.ini
my $arguments;
my $motiflength = $arguments[0];
my $fourway = $arguments[l];
my $twoway = $arguments[2]; #non-functional
my $numrandprom = $arguments[3];
my $outputfile = $arguments[4];
my profilee = $arguments[5];

my @list;
print "Generating a motif dictionary:\n\tmotif size = $motiflength\n\tfour-way bases =
$fourway\n";
print '\ttwo-way bases = $twoway\n\trandom promoter size = $numrandprom\n\toutput
file = $outputfile\n\n";
my $starttime = time(; ### initial start time
print '\nBuilding Motif List\n";
my @newlist = buildmotiflist(; ### calls buildmotif list function
print '\nScoring Motif List\n";
my @motifdict = scorelist(@newlist); ### calls scorelist list function
print '\nPrinting Motif List\n";

open (OUT, ">".$outputfile);
my $arraylength = scalar @motifdict;
print OUT "$motiflength\t$fourway\t$twoway\t$numrandprom\t$arraylength\n";
my $md;
foreach $md (@motifdict)
{








print OUT "$md\n";
}
close OUT;


### prints the time elapsed from time() function stored as $starttime variable
if (time()-$starttime<=60){print '\t$arraylength motifs found in ",time()-$starttime,"
seconds"; }
elsif((time()-$starttime)/60<60) {print '\t$arraylength motifs found in ",(time()-
$starttime)/60," minutes";}
else{print '\t$arraylength motifs found in ",(time()-$starttime)/3600," hours";}

print '\n\nProgram Finished\n\n";
exit(0);



# Subroutines #


sub config
{
open (CIN, "$configfile"); my @config = ; close CIN;
my $config =join( ", @config );
my $arguments;
my @arguments;
if ( $config =- mA[MOTIFFINDERPl\](.+)\[MOTIFFINDERP2\]/is)
{
$arguments = $1;
}
@arguments = split(An/, $arguments);
foreach (@arguments)
{
if( $_ =mA=(.+)/is)
{
$ =$1;
}
}
shift(@arguments);
return @arguments;
}

sub buildmotiflist
{
print generatingig all possible motifs...";
my $tnum;








my $counter;
my $motif;
my @motif;
my $ns = substr("nnnnnnnnnn",0,$fourway);
for (my $number = 0;$number < 10000;$number++)
{
$tnum = $number;
while ( 4 > length($tnum))
{
$tnum = "O".$tnum;
}
if(
sub str($tnum,0,1 )+sub str($tnum, 1, 1)+sub str($tnum,2, 1)+sub str($tnum,3, 1) ==
$motiflength ($fourway + $twoway))
{
$motif = ""
for ($counter = 0; $counter < substr($tnum,0, 1);$counter++ )
{
$motif= $motif."a";
}
for ($counter = 0; $counter < substr($tnum,1,1);$counter++ )
{
$motif = $motif."c";
}
for ($counter = 0; $counter < substr($tnum,2, 1);$counter++ )
{
$motif= $motif."g";
}
for ($counter = 0; $counter < substr($tnum,3,1);$counter++ )
{
$motif= $motif."t";
}
$motif= $ns.$motif;
@motif = split(//, $motif);
permute([@motif], []);
}
}
print "Done\n";
print removingig duplicates..............";
@list = removedups(@list);
print "Done\n";
print removingig reverse compliments.....";
my $revcompline;
my $line;
while ( scalar @list > 0 )








my %search = map { $_ => 1 } @newlist;
$line = shift (@list);
$revcompline = reverse $line;
$revcompline =- tr/acgt/tgca/;

if( $search{$revcompline} || $search{$line} )
{
unshift (@newlist, $line);
}
}
@newlist = removedups(@newlist);
my $newlistline;
foreach $newlistline (@newlist)
{
while ( $motiflength-length($newlistline)
{
$newlistline = "n".$newlistline;
}
}
print "Done\n";
return @newlist;
}

sub removedups
{
my (@array) @_;
@array = sort(@array);
my $previous = "not equal to $array[0]";
@array = grep($_ ne $previous && ($previous = $_, 1), @array);
return @array;
}

sub scorelist
{
print openingig promoter file............";
my (@newlist)= @_;
my @randpromlist;

########## OPENS AND FORMATS A PROMOTER ARRAY

open (PIN, profileile"; my @raw = ; close PIN;
my $raw =join( ", @raw); $raw =-s/\n/=/g; $raw =- s/>/\n>/g; $raw
s/=\nAn/g;
chop($raw); @raw = split(/>/, $raw); shift(@raw);
print "Done\n";









########## scores a list of random promoters

print scoringig in progress.............";
for (my $i=0; $i<$numrandprom; $i++)
{
my $randpos = int(rand(scalar @raw));
unshift (@randpromlist, $raw[$randpos])
}
@raw = (");
my $motif;
my $num;
my @motifdict;
my $randprom;
my $m;
my $rm;
my $rcomp;
foreach $motif (@newlist)
{
$num = 0;
$rcomp = reverse($motif);
$rcomp =- tr/acgt/tgca/;
$m = makewildcards($motif);
$rm = makewildcards($rcomp);
foreach $randprom (@randpromlist)
{
if ( $randprom =- m/$m/g || $randprom =~ m/$rm/g)
{
$num++;
}
}
unshift (@motifdict, "$motif\t$num");

}
@newlist = ('empty');
print "Done\n";
return @motifdict;
}

sub makewildcards
{
my ($m)= @_;
$m =~ s/n/[actg]/g;
return "$m";
}


sub permute









my @items= @{ $_[0] };
my @perms = @{ $_[1] };
my $newperm;
unless (@items)
{
$newperm =join( ", @perms );
while ( substr($newperm,0,1) eq "n" )
{
$newperm = substr($newperm, 1,length($newperm))
}
while ( substr($newperm,length($newperm)-1,1) eq "n" )
{
$newperm = substr($newperm,0,length($newperm)-1 )
}
push (@list, $newperm);
if (scalar @list == 5000000)
{
@list = removedups(@list);
}
}
else
{
my(@newitems,@newperms,$i);
foreach $i (0 .. $#items)
{
@newitems = @items;
@newperms = @perms;
unshift(@newperms, splice(@newitems, $i, 1));
permute([@newitems], [@newperms]);
}
}














APPENDIX C
SOURCE CODE: MOTIFFINDERP2V6.PL

#! /usr/bin/perl
# MotifFinderp2v6.pl
# counts number PROMOTERS with one or more hits.
# uses Math::CDF to acquire p-value
# takes arguments from config.ini file
##################################################

use strict;
use warnings;
use Math::CDF qw(:all);

my $configfile = 'c:\perl\programs\config.ini';
my @arguments = config(; ### grabs arguments from config.ini
my $arguments;
my $dictionaryfile = $arguments[0];
my $promoterfile = $arguments[l];
my $outputfile = $arguments[2];
my $subdictoutputfile = $arguments[3];
my $pvaluecutoff = $arguments[4];
motiffinder(); calls motif finder function

exit(0);


#################################
# Subroutines #
#################################


sub config
{
open (CIN, "$configfile"); my @config = ; close CIN;
my $config =join( ", @config );
my $arguments;
my @arguments;
if ( $config =- mA[MOTIFFINDERP2\](.+)\[MOTIFMAPPER\]/is )
{
$arguments = $1;
}
@arguments = split(An/, $arguments);








foreach (@arguments)
{
if( $_ =mA=(.+)/is)
{
$ =$1;
}
}
shift(@arguments);
return @arguments;
}

sub motiffinder
{

########## OPENS AND FORMATS A PROMOTER ARRAY AND
COUNTS NUMBER OF PROMOTERS

open (PIN, "$promoterfile"); my @raw = ; close PIN;
my $raw =join( ", @raw); $raw =-s/\n/=/g; $raw =- s/>/\n>/g; $raw =
s/=\nAn/g;
chop($raw); @raw = split(/>/, $raw); shift(@raw);
my $counter = 0;
while ($raw =~ m/>/g)
{
$counter++;
}

########## OPENS AND FORMATS A MOTIF LIST

open (MIN, "$dictionaryfile"); my @mlist = ; close MIN;
my $mlist =join( ", @mlist );
chomp($mlist); @mlist = split(An/, $mlist);

########## EXTRACTS INFORMATION FROM DEFINITION LINE OR
EXITS PROGRAM

if ($mlist[0] =~ m/[acgtn]/g)
{
print "definition line is missing, must be in the form:\n
oligosize\tfixedbases\ttwoway\t#ofcontrolproms";
exit;
}
my $defline = shift(@mlist);
my @defline = split(At/, $defline);
my $mlistlength = scalar @mlist;









########## OPENS OUTPUT FILES

open (OUT, ">".$outputfile);
print OUT "Motifs:\tHits in test set:\tHits in control set:\tChi-Square (p):\n";

open (DICTOUT, ">".$subdictoutputfile);
print DICTOUT "$defline[0]\t$defline[l]\t$defline[2]\t$counter\t$mlistlength\n";

########## COUNTS THE NUMBER OF OCCURENCES OF EACH MOTIF

my $line;
my $m;
my $motif;
my $rm;
my $rcomp;
my linen;
my $num;
my $rawprom;
foreach $line (@mlist)
{
$num = 0;
linee = split(At/, $line);
$rcomp = reverse($mline[0]);
$rcomp =- tr/acgt/tgca/;
$m = makewildcards($mline[0]);
$rm = makewildcards($rcomp);
foreach $rawprom (@raw)
{
if ( $rawprom =- m/$m/g || $rawprom =~ m/$rm/g)
{
$num++;
}
}
print DICTOUT $mline[0],"\t",$num,"\n";

########## DEFINES EACH INPUT FOR THE CHI-SQUARE SUB-
ROUTINE

my $promsintest = $counter;
my $hitsintest = $num;
my $promsincontrol = $defline[3];
my $hitsincontrol = $mline[l];
if ($hitsincontrol == 0)
{
$hitsincontrol = 1; # fixes a divide by zero error (conservative
value)









}

########## CALCULATES CHI-SQUARE VALUE

my $exp = ($promsintest*$hitsincontrol)/$promsincontrol;
my $obs = $hitsintest;
my $chi = (($obs-$exp)*($obs-$exp))/$exp;
my valuee = pfromchi($chi);

########## ONLY PRINTS MOTIFS THAT HAVE A LOWER P-
VALUE THAN THE CUTOFF

if ( valuee <= $pvaluecutoff)
{
print OUT $mline[0],'\t$num\t",$mline[l],'\t$pvalue\n";
}
}
close OUT;
}

sub makewildcards
{
### Translates any n's to [acgt] for pattern matching
my ($m) = @_;
$m =- s/n/[actg]/g;
return "$m";
}

sub pfromchi
{
my ($chi) @_
my $df = 1;

########## CALCULATES VALUE
my $ncp = 0; # the optional non-centrality parameter
my $p = pchisq($chi, $df, $ncp);
$p = 1 $p;
return $p;














APPENDIX D
SOURCE CODE: MOTIFMAPPERP1V3.PL

#! /usr/bin/perl
# MotifMapperplv3.pl
# maps one or two classes of motifs onto a promoter in color
# each promoter is given a relative score based on significant bases
# prints a Motiffile that contains actual punitive motifs of a minimum base score and
motif length
# prints a matrix version of the color output for graphing in a spreadsheet.
# takes arguments from config.ini file
# internal config for:
# one class vs two classes
# file names and paths
# capitalization of significant bases
# promoter scores
# fontsize
4444444444444444444444444444444444444444


use strict;
use warnings;


my $configfile = 'c:\perl\programs\config.ini';
my @arguments = config(; ### grabs arguments from config.ini
my $arguments;
my $motiffile = $arguments[0];
my $motiffile2 = $arguments[l];
my profilee = $arguments[2];
my $outputfile = $arguments[3];
my $strengthoutputfile = $arguments[4];
my $matrixoutputfile = $arguments[5];
my $basethreashhold = $arguments[6];
my $minimummotiflength = $arguments[7];
my $capital = $arguments[8];
my $scoreprom = $arguments[9];
my $fontsize = $arguments[10];
my $mapone = $arguments[11];
my $maptwo = $arguments[12];


$mapone
$maptwo
$fontsize


uc($mapone);
uc($maptwo);
$fontsize*2;









$capital = uc($capital);
$scoreprom = uc($scoreprom);
my $line;
my $rawline;
my @rawlinearray;
my $annotation;
my $prom;
my @prom;
my $raw;
my @prommatrix;

########## OPENS AND FORMATS A PROMOTER ARRAY

open (PIN, profileile) || die; my @raw = ; close PIN;
$raw= join( ", @raw); $raw =- sAn/=/g; $raw =- s/>An>/g; $raw =~ s/=\n/\n/g;
chop($raw); @raw = split(/>/, $raw); shift(@raw);

########## selects the mapping of one class or two classes of motifs based on config

if($mapone eq "TRUE")
{
maponeclass);
}
elsif ($maptwo eq "TRUE")
{
maptwoclass);
}
else
{
print 'Either mapone or maptwo must be TRUE';
exit(0);
}

########## end of rtf file

print OUT "\\par\n";
print OUT "\}";
close(OUT);
close(STRENGTHOUT);
close(MATRIXOUT);
exit(0);


# Subroutines #
#################################










sub config
{
open (CIN, "$configfile"); my @config = ; close CIN;
my $config =join( ", @config );
my $arguments;
my @arguments;
if ( $config =- mA[MOTIFMAPPER\](.+)\[EOF\]/is )
{
$arguments = $1;
}
@arguments = split(An/, $arguments);
foreach (@arguments)
{
if( $_ =mA=(.+)/is)
{
$ =$1;
}
}
shift(@arguments);
return @arguments;
}

sub maponeclass
{
########## OPENS AND FORMATS MOTIF ARRAY 1

open (MIN, $motiffile) 1| die; my @listl = ; close MIN;
my $listl =join(',', @listl ); $listl = lc($listl); $listl = sAn//g; @listl = split(/,/,
$listl);
@listl = revcomp(@listl);

########## OPENS AND FORMATS A RTF FILE FOR
COLOR, STRENGTH AND MATRIX OUTPUT

open (STRENGTHOUT, ">".$strengthoutputfile);
print STRENGTHOUT
"\{\\rtfl\\ansi\\ansicpgl252\\deff0\\deflang 033\{\\fonttbl\{\\f0\\fmodern\\fprql\\fcharset
0 Courier New\;\} \}\n\\viewkind4\\ucl\\pard\\f0\\fs$fontsize ";

open (OUT, ">".$outputfile);
print OUT
"\{\\rtfl\\ansi\\ansicpgl252\\deff0\\deflang 033\{\\fonttbl\ {\\f0\\fmodern\\fprql\\fcharset
0 Courier New\;\} \}\n\{\\colortbl
\;\\red255 \\green\\blue0\;\\red 28\green\\blue 128 \;\\red0\\green0\\blue25 \;\\red255\\g
reen0\\blue255\;\\red0\\greenl28\\bluel28\;\}\n\\viewkind4\\uc \\pard\\f0\\fs$fontsize ";










open (MATRIXOUT, ">".$matrixoutputfile);

########## maps motifs


foreach $rawline (@raw)
{
@rawlinearray = split(/=/, $rawline);
$annotation = $rawlinearray[0];
$prom = $rawlinearray[l];
$prom =~ s/\s//g;
$prom =~ tr/ACGT/acgt/;
my $seq = $prom;
$prom =~ s/a/a0,/g; $prom =~ s/t/t0,/g; $prom
s/g/g0,/g;
@prom = split(/,/, $prom);


s/c/c0,/g; $prom


degenerate motif matched against promoter
makes reverse comps


my @tobemapped = getrealmatches($seq, @listl);
@tobemapped = sort(@tobemapped);
my $prev = "not equal to $tobemapped[0]";
@tobemapped = grep($_ ne $prev && ($prev = $_, 1), @tobemapped);


Finds the position of the motif
increases the value of the base
use significant n's if true in config


my @nline;
my $motifbase;
my $position;
foreach $line (@tobemapped)
{
@nline = split(/,/, $line);
$position = $nline[l];
@nline = split(//, $line);
foreach $motifbase (@nline)


if($motifbase eq "a" II $motifbase eq "c" II $motifbase eq


"t" II $motifbase eq "g")


my $base = substr($prom[$position],0,1);
$prom[$position] =
$base.(substr($prom[$position], ,length($prom[$position]) 1) + 1);


##########
##########


##########
##########
##########








$position++;
}
}

########## capitalizes significant bases if set to true in config

if($capital eq "TRUE")
{
my $arraylength = scalar @prom;
for (my $i = 0; $i < $arraylength; $i++)
{
if( substr($prom[$i],1,length($prom[$i]) 1) >= 1)
{
my $ucase = uc(substr($prom[$i],0,1));
$prom[$i] =
$ucase.substr($prom[$i],l,length($prom[$i]) 1);
}
}
}

########## prints formatted promoter array (@prom) to a text file
using color
########## @prom carries bases, number of hits and position
information.

printrtf ($scoreprom, @prom);
}
printmatrix (@prommatrix);
}

sub maptwoclass
{
########## OPENS STRENGTH FILE FOR OUTPUT NOT REALLY USED
FOR TWOCLASS
open (STRENGTHOUT, ">".$strengthoutputfile);
print STRENGTHOUT
"\{\\rtfl\\ansi\\ansicpgl252\\deff0\\deflang 033\{\\fonttbl\{\\f0\\fmodern\\fprql\\fcharset
0 Courier New\;\} \}\n\\viewkind4\\ucl\\pard\\f0\\fs$fontsize ";

########## OPENS AND FORMATS MOTIF ARRAY 1

open (MIN, $motiffile); my @listl = ; close MIN;
my $listl =join(',', @listl ); $listl =- s/\n//g; @listl = split(/,/, $listl);
@listl = revcomp(@listl);

########## OPENS AND FORMATS MOTIF ARRAY 2










open (MIN, $motiffile2); my @list2
my $list2 =join( ',', @list2 ); $list2
@list2 = revcomp(@list2);


; close MIN;
s/\n//g; @list2 = split(/,/, $list2);


########## OPENS AND FORMATS A RTF FILE FOR OUTPUT

open (OUT, ">".$outputfile);
print OUT
"\{\\rtfl\\ansi\\ansicpgl252\\deff0\\deflang 033\{\\fonttbl\ {\\f0\\fmodern\\fprql\\fcharset
0 Courier New\;\} \}\n\{\\colortbl
\;\\red255 \\green\\blue0\;\\red 28\green\\blue 128 \;\\red0\\green0\\blue25 \;\\red255\\g
reen0\\blue255\;\\ed0\\reen28\\blue 128\;\} \n\\viewkind4\\uc \\pard\\f0\\fs$fontsize ";

########## maps motifs


foreach $rawline (@raw)
{
@rawlinearray = split(/=/, $rawline);
$annotation = $rawlinearray[0];
$prom = $rawlinearray[l];
$prom =~ s/\s//g;
$prom =~ tr/ACGT/acgt/;
my $seq = $prom;
$prom =~ s/a/a0,/g; $prom =~ s/t/t0,/g; $prom
s/g/g0,/g;
@prom = split(/,/, $prom);


s/c/cO,/g; $prom


degenerate motifl matched against promoter
makes reverse comps


my @tobemappedl = getrealmatches($seq, @listl);
@tobemappedl = sort(@tobemappedl);
my $prevl = "not equal to $tobemappedl[0]";
@tobemappedl = grep($_ ne $prevl && ($prevl = $_, 1),
@tobemappedl);


degenerate motif2 matched against promoter
makes reverse comps


my @tobemapped2 = getrealmatches($seq, @list2);
@tobemapped2 = sort(@tobemapped2);
my $prev2 = "not equal to $tobemapped2[0]";
@tobemapped2 = grep($_ ne $prev2 && ($prev2 = $_, 1),
@tobemapped2);


##########
##########


##########
##########








########## Finds the position of the motif
########## increases the value of the base
my linee;
my @nlinel;
my $motifbasel;
my positionn;
foreach $linel (@tobemappedl)
{
@nlinel = split(/,/, $linel);
positionl = $nlinel [1];
@nlinel = split(//, $linel);
foreach $motifbasel (@nlinel)
{
if($motifbasel eq "a" I| $motifbasel eq "c" || $motifbasel
eq "t" || $motifbasel eq "g")
{
my $basel = substr($prom[$positionl],O,1);
$prom[$positionl] = $basel.1;
}
positionnl+;
}
}
my linee;
my @nline2;
my $motifbase2;
my positionn;
foreach $line2 (@tobemapped2)
{
@nline2 = split(/,/, $line2);
position2 = $nline2[1];
@nline2 = split(//, $line2);
foreach $motifbase2 (@nline2)
{
if($motifbase2 eq "a" I| $motifbase2 eq "c" || $motifbase2
eq "t" || $motifbase2 eq "g" )
{
if(
substr($prom[$position2], 1,length($prom[$position2]) 1) == 0 )
{
my $base2 = substr($prom[$position2],0,1);
$prom[$position2] = $base2.3;
}
elsif(
substr($prom[$position2],1,length($prom[$position2]) 1) == 1 )
{
my $base2 = substr($prom[$position2],0,1);








$prom[$position2] = $base2.2;
}
}
positionn2+;
}
}

########## capitalizes significant bases if set to true in config

if($capital eq "TRUE")
{
my $arraylength = scalar @prom;
for (my $i = 0; $i < $arraylength; $i++)
{
if( substr($prom[$i],1,length($prom[$i]) 1) >= 1)
{
my $ucase = uc(substr($prom[$i],0,1));
$prom[$i] =
$ucase.substr($prom[$i],l,length($prom[$i]) 1);
}
}
}

########## prints formatted promoter array (@prom) to a text file
using color
########## @prom carries bases, number of hits and position
information.

printrtf ($scoreprom, @prom);
}
}

sub getrealmatches
{
my ($seq, @1)= @_;
my @tobemapped;
my $motif;
my $term = 0;
foreach $motif (@1)
{

my $m = makewildcards($motif);
while( $seq =- m/$m/g)
{
my $length = length($');
unshift(@tobemapped, $motif.",". $length);









}
}
return @tobemapped;
}

sub makewildcards
{
my ($m)= @_;
$m =- s/n/[actg]/g;
return "$m";
}

sub revcomp
{
my (@rc) @_;
my @fulllist;
my $revcomp;
foreach $line (@rc)
{
unshift(@fulllist, $line);
$revcomp = reverse $line;
$revcomp =- tr/acgt/tgca/;
unshift(@fulllist, $revcomp);
}
return @fulllist;
}

sub printrtf
{
my ($scoreprom, @prom)= @_;
buildprommatrix(@prom);
my $basescore = 0;
my $score = 0;
push (@prom, "z0");
my $arraylength = scalar @prom 1;
my $i = 0;
print OUT '\\b ",$annotation,'\bO\\par\n";
while ($i < $arraylength)
{
if( substr($prom[$i], $,length($prom[$i]) 1)
substr($prom[$i+ 1], 1,length($prom[$i+ 1]) 1))
{
print OUT substr($prom[$i],0,1);
}
elsif ( substr($prom[$i], ,length($prom[$i]) 1) !
substr($prom[$i+l],l,length($prom[$i+l]) 1))









{
if( substr($prom[$i+l],l,length($prom[$i+l]) 1) >= 5)
{
print OUT substr($prom[$i],0,1), '\\cf5 ";
}
elsif( substr($prom[$i+l], ,length($prom[$i+l]) 1) < 5)
{
print OUT substr($prom[$i],0,1), '\cf",
substr($prom[$i+l],l,length($prom[$i+l]) 1), ";
}
}
$i++;
}
print OUT '\\par\n";
if ($scoreprom eq "TRUE")
{
foreach $line (@prom)
{
$basescore = substr($line, 1,length($line) 1);
if ($basescore > 0)
{
$score = $score + $basescore;
}
}
print OUT "score \:", $score;
}
print OUT '\\par\n\\par\n";

#### base Strength file

my @annotationarray = split(As {3,3 }/, $annotation);
print STRENGTHOUT '\\b ",$annotationarray[0],'\\b0\\par\n";
$i=0;
my $zerocount = 0;
my @strengthmotif;
my $strengthmotif;
my $strengthmotifsize;
my $strengthline;
my $strengthmotifpos=0;
my $strengthmotifscore=0;
while ($i < $arraylength)
{
if (substr($prom[$i+1], ,length($prom[$i]) 1) >= $basethreashhold &
$zerocount >=2 | $i == $arraylength 1 )








if (scalar @strengthmotif >= 4)
{
for (my $j = 0; $j<2; $j++)
{
if(
substr($strengthmotif[0], 1,length($strengthmotif[0]) 1) <$basethreashhold)
{
shift @strengthmotif;
}
$strengthmotifsize = scalar @strengthmotif;

if ( substr($strengthmotif[$strengthmotifsize-
], 1 ,length($strengthmotif[$strengthmotifsize-1]) 1) <$basethreashhold)
{
pop @strengthmotif;
}
}
}
if (scalar @strengthmotif >= $minimummotiflength)
{
print STRENGTHOUT '\\par\n";
print STRENGTHOUT "", $strengthmotifpos + scalar
@strengthmotif- 1, '\t";
foreach $strengthline (@strengthmotif)
{
$strengthmotifscore = $strengthmotifscore +
substr($strengthline, 1,length($strengthline)- 1);
print STRENGTHOUT substr($strengthline,0,1);
#print STRENGTHOUT
substr($strengthline, 1,length($strengthline)-1);
}
print STRENGTHOUT '\t", $strengthmotifscore;
$strengthmotifscore = 0;
}
@strengthmotif="";
shift @strengthmotif;
$zerocount = 0;
}
if (substr($prom[$i], 1,length($prom[$i]) 1) >= $basethreashhold)
{
push( @strengthmotif,$prom[$i] );
$strengthmotifpos = $arraylength $i;
}
elsif( $i>=1 & substr($prom[$i-1],l,length ($prom[$i-1])-1) >=
$basethreashhold)
{








push( @strengthmotif,$prom[$i] );
}
elsif ( substr($prom[$i+l],l,length ($prom[$i+l])-1) >= $basethreashhold)
{
push( @strengthmotif,$prom[$i] );
}
elsif (substr($prom[$i], 1,length($prom[$i]) 1) < $basethreashhold)
{
$zerocount++;
}
$i++;
}
print STRENGTHOUT '\\par\n\\par\n";
}
sub buildprommatrix
{
my (@buildmatrix) @_;
my $buildmatrix =join( ", @buildmatrix );
$buildmatrix = lc($buildmatrix);
$buildmatrix =- tr/actg/>>>>/;
push( @prommatrix,$annotation. $buildmatrix );
}

sub printmatrix
{
my (@prommatrix)= @_;
my $prommatrix;
my $j=l;
while ( $prommatrix[0] =- m/>/g)
{
$j++;
}
for( my $i=0; $i<$j; $i++)
{
my $prommatrixline;
foreach $prommatrixline (@prommatrix)
{
my $prommatrixlinearray;
my @prommatrixlinearray = split(/>/, $prommatrixline);
print MATRIXOUT $prommatrixlinearray[$i],'\t";
}
print MATRIXOUT '"\n";
}



















APPENDIX E
REPRESSED ABA REGULATED GENES

17867_atll#F6F22.30#At2gl9670 putative arginine N-
methyltransferase;ICOORDS: 8449663 8447827
aacaggaaactgaaaaaatgtaatttcagaagcaaggcatccaaaagcaTAAGgCtTTtCaCACATATCaA
gctaaatcactaaaaccctagtcctcaattccagactaacaaatcgaacgaatccgaattatggaaaaatt
aagcaattcaaaacctaaAaaGCATTATCGCAGTGAaGAGatttcaaattagatctgagactcaaacacat
aaaaaaacaaactttaaacccctaaaaaccaaaatccaaaagaGCgAaGAAgaattttagaccaaaatatc
agaatcttaaacgaaataaaagaagaatgaatcggtgatcgtaagcaagccataccttttggccgtgctca
ccactcaccaCTaCcAGCgtcagaatcttttcctccaaatttCATctACActGCctTATAattacgacaat
aaccgaaccggatcggtataaaccaaaccggtttaactgtcaaaaacaattgaaccggcggctgaaATAGA
atCATACACATAaCATaAAGttCGTATCcAtcctaaaaccctaaatctttttgcttattcgaatctgttAG
CCATCCAaaacctctctCTtTGCCgcaaaaaa
score :138

12483_atl12484_g_at I#F13M22.19#At2g37690 putative
phosphoribosylaminoimidazole carboxylase;ICOORDS: 15754654 15758783
gGCATTtTCttccgtctcctcctttccgtctgataaaaatccgatctttggtgcgaccaaatcaaatCGTg
GAgCctaatcaaaaaccacatctcatcatacccggttcggttttcttaaccgaatcaaatTATacATAaaa
cggattaggcttctccggttttcgactgcgttttgtataaaacctctcaaaaaccttttcttctctccccg
tcagaccgatcttctctctcctcgccgccgcctccaaatccggaagaagcttttttccacctctattatcg
aacttGCttTATAaaaatctaacctTtGgACCTgactttccgacttctctctctgtCTAGCTgcttctctC
aCCCaGGtttgccttccTctCAAGTtGaagctgcggctttatTAGtTtAGctattgactctaagaatGTTA
CTTGaAtcCATtcACAtgagcttcttacgtcgttttcattggtttagctattgacgtTGaCTTGGgttttg
cttgagttcattctctattgtggtctaccatactttttCAtGTTACtTGattgcaTATttATAaaacggtg
ctaaaagtattccgttttgtAtaGCATTcTag
score :144

16412 s at I#F18E5.20#At4g21400 serine threonine protein kinase -
like protein;receptor protein kinase (IRK1), Ipomoea trifida,
gb:U209481COORDS: 10366207 10363716
atcGTTACtTGatttcgttttaAGGTaCaAtttgatagaagTATatATAtaTGtcAGTAagtttagaaaat
TgGATAaGagatttatattgagtccacataTATaaATAtgagagttgaaagatgatccactaatCATctAC
AgaagtaaaagaaattgtaaaataaATGGCTTAGACgaaattagatggttcctagttagtcttatattttC
tAGTaGGtttgaaaagtattcaaactctttagttttctttcctggtactTGTGgGgAaagaagTGAGCAAT
GAAcacttctaatttaaataacgttttcacaCCAAGTCGttagtttcATAttGTGagtctttgaatcTACT
aCCATCATGGACCccCacaacctCATaAAGactCCACTTGtGtttgCTaCTgGGttttataattgttattg
atcaacaagttttgttctttctgctgttggatattgattTAGtTCAaagaccatactgcttttgaataatt
ccaaGGTCAAGTTAGaTtgacttcttccatttttagactcgttattacattctggggttttctcatataga
ttacagaagaagaaaaaagCTtCaAGCgtcca
score :222

20345 atll#T15B16.5#At4g01700 putative chitinase;similar to peanut
type II chitinase, GenBank accession number X82329, E.C.
3.2.1.141COORDS: 732487 731413
acttcaaaaacaattctttgatagttttgttTTCATTGCtaatatttctctcaatttttttttattgtttt
tatttttgtatagaatagtattcaaaatgaaactataattcagtatggaacactttctgttttatttaata
tttacttcataatgaattatgtttcaaattttcaaagaataTATttATAtTTGaAtttggtTaGtACCTtc
caaagaagcttaatTcCcCACAttaaaattCaACTTGGtcttcatattaacattattcaattgaattcaga








70


ctgaaaTGGTtCACttTATctggtttgaaatatatagtcatGCATTtTCcaaatcaagaatgaatcttcttg
aaagaaaaaaaatgagaaaaaagaagaaggagaggaatgggtCTCcTaACaatgaaaaagagaataaaccc
ttaaATAaTTGCAAcTATataaaataaaaagtataaactccactaaaaaaagtcaaaaCGTaGAcCaaaac
aaaaaCAtGTAACTAAGtCgTccactactacatctctatGacTCTATataacaaCaAAGcCAaaaccacat
cTtCtCACAcaaaacacaaaaaagaacaaata
score :144

18228_atl #MDC8.16#At3gl6530 putative lectin;similar to lectin
SP:P02874 (Onobrychis viciifolia); contains Pfam profile: PF00139
legume lectins beta domainICOORDS: 5625415 5624585
ttattagcatgattacttttatctttcttttttatcaaaaaactttttaatgagaTTGCCaAacgtttttt
tgcaTcCaCGTAaactcctcTGaAcGTAaacttttccagagtgtatttatgtggacgtcagaaaaacgtat
tttgcaTcCACGTAACgAaaaTAGtTCAtataaactaaaagtggctAtGtCTTACAgaTAtgtacacgacg
catgaaattcgtaatgtttcgtagagaaaggggacaagaaaatataattaatcaattattattcaaaacaa
tttataatcgattttttttttttcttataattaattAtTACTcGaatcaattatattttatcaaaagaata
aaattaattTGTtcATGacatctgatgGGTCcAtTtTCACTGCTAGGccctttgtggtggcgtccgatctg
aatcgaatCCACTTGGCttaatttAtTtGACCatttttcgaagtataaatttgaatattcaaaCTcGATGG
CTTAcgttatcccttatataaaCACatTATcaaccTcCAAaTAttttcatcatcaatctttcaatcacaaa
aaacaacaacaccaaaaaaaaaacgcacacaa
score :168

18235_atl #T7N9.8#Atlg27020 unknown protein;ICOORDS: 9378756 -
9380540
ttttgaCTGAtCTAGTcGGacataaccaaacacccatttactgttattatcatctggaaacccaatgagaa
gataagagttttatgaagcaCaCaAGTGagaaacaaaaataaggaacatatttggttGgaTCTATcattaa
accaaAGaTAGATAAGAGGGaaacgaaactggagtaataatcaaaatacaatctaccatgTaGATAgGgaa
AaTACTcGgtggtttTAgTTGaAaaatgtaaaTAgTTGaAagttatcaaaattttctaatatttaaacacc
aaAgTtGACCAAGATGGcaaaaaggcaagaaacaagaaaatgaaagacaactatcgttatgatacaaAtTt
GACCaaaatgaagatatgataatgtttccaaaattctcgaTGCAAGTAGTTcGCTCAattatTtGATAaGa
atgatttcCCCACTAGCTCAataaaaaacaattaatccttgtacgcaagttatcatagtgaaGCATTGACA
cTGtgacCCTtGGCATTATAaattacgaatttaagataaagaagagaagaatcaagcaaaagttgtgtttc
tCATaAAGatctcaaagatattcctactacac
score :282

17957_at|l#F26B6.25#At2g23600 putative acetone-cyanohydrin
lyase;ICOORDS: 9992188 9990872
aaaattatgttttatggtttctccattgatttaaacttataggatacagatttataCACcaTATttcatga
taatctctatcaaaataatgaatttataagcaatcgagtaaaTACAAaTATAGgcGCCgaatggagatgta
cggtacaactgcgaattttttTAAGtCaTaagtaatttttaaaaaatgttatctctattttTaGTTACGAG
TAaTtgttttaatattatatcaaaaattaatattattgTCCAACTAGtgtAtTTGACCTGgtacatgaca
cggaacaaacatctaacaCaACTTGtCAAGTtGtatatcactgtaacatcagctttaattagaaagatact
tttattttGCATTtTaaTAagTGTAaagtttcttatccgcagagccaaaatgtcgcttttaaaaagaaaaa
ttccaaagttatttCTTtATGcaagaaaaaattcgaaattttaattatcacgtaaaatcataaaagaatct
caaaggaaagcgacaattttggaaaattcaacgataaGCttTATAatttttctacttatgagtggaTGTGT
GAAGAGtagggaaaaagtacgaaaggaaaata
score :168

20114_atll#FllI11.30#At4g34790 putative protein;auxin-induced
protein 10A, Glycine max., PIR:JQ1099|COORDS: 15559030 15559356
tcacacatattttaatataatactaaaagatttgttatttcaaaattatatatggaaaaCCATCTCGTtAC
tcatctttCgTCCcAGcAGtacGGCccCTAgtttgagagcgaaggaaaatatactttaaaagactttggaa
cttatTATacATActctTtCAAcTAtaacaatctccaatattgctaaaagTgGCTCAagaACTGGACCcaC
ttagtctgatCCAAGTCACatTATtcaccatttcgaggctctaatttaaTATaaATAcCTAaCTtttTATa
tATAtTTGCAatCATatACAagaaacttatgttatgtGCATTtTatttttatataaagaTGAGCTAAGcCA
TgTGAGCcAGATGGatgacggGAGGGCccaacagcccacaactttttttttttttttttagtaatcaaaat
atcgcatcgtactgtttttttaGCATTGAaagtcttttccttatcttgaagttacatactttaagttcact








71


acTATaaATAGAaaCttcttcCCCTCgTtgtcTatCAAGTcatttactttttaaaaaaatttacctaaaca
aaTtGATAaGttccaaacataaataagaagac
score :228

17425_atll#F6I18.290#At4g30800 ribosomal protein Sll -
like;ribosomal protein S11, Arabidopsis thaliana,PIR2:C355421COORDS:
13965709 13966885
tttCACgtTATccaacttTGagAGTActgtccttgtgttggccaataagtggttgaatatatggtaaaagt
ataaaagctcCCATCATGTTGGACCTACTAGCTttGCTATATATAaaATGtCCTGTCAtTGCataacgtac
cactttctgaatctgtttTAcgTGCCTATCaAtgatttgtctttaacacttaccgtaatattcTACAAtTA
agttcaagTGCCTAGattttctaaatgtgtataaagacattaTATAtgGCaaatgatcctttagataacca
tttcatttaaatcaagcattcgattcGCATTcATagataaaatactattcaccGGTCcAGTtctaactAGC
CATCCAGtctggttagaaagcaaaattgtttattttattttaggttcagttttcagatcggttttctaaca
ctgttttggttTGGTtCtCGTAtcggttcaaccggattggatagtCATaAAGaatttgaagtccttaatta
agcccaaaaCAAGTaGagcccaTATatATACTaaCAaagccgtgattagggtttttggttttcgtcagaag
atttcaaggtttagggtttctgcaattcaacc
score :252

20372 atll#T9A21.50#At4gl8200 putative
protein;W30DMY32F,W15DMY32FICOORDS: 9034215 9039562
agaagaccaagagcttcctcagtctaaagaagaagaagaacaaaaaCAAGTaGataccattcatGTCcAaG
CtTAGGCAAAGatccaagaaaaccaaaagcttttgtaatttgaaccagtttaatctaaaataaaaatgagt
TATatATAttgTAGGCAcTACAAaTAaaattggtaagtgtctcaaggtctcacagtCTTcATGtatctttg
ttgacttctgaatCCAAGtCAAGTgtacttATGgCCTtcccgtcttgttgtatcgttgttgacttccaaat
ccacaaaatagatTACTaaCAtaacaaaaaCaAGTAaTaattttttttTttCAAGTGGACgGaCACTTGAG
CcAataagattagaataagatacaacaatatcaaGcgTCTATatatctgttttcaccaaactcagcctact
ttaacaacgtgtgttataagaagttttcttaaaaccgccaattaaaaggtttcaataattCaAAGtCAacg
gtaaacatctttacagaaatatgatgcacttgttcccaccaaccttcaTATatATAaaagctttaccaaaa
ctctgttTGTGgGaAgtcctcttagagtggtc
score :174

12430_atl #F13M14.20#At3gl0520 class 2 non-symbiotic
hemoglobin;identical to class 2 non-symbiotic hemoglobin GB:AAB82770
(Arabidopsis thaliana)ICOORDS: 3277772 3276523
cgagcgaacagctcaatataataggaattgtcagcttctcttaCATgAAGctgtgagttttttctttttct
ttttgtgtcttactaaatatttttttattaactactttttaaGCgAtGAAttgaatttattgtcactatt
tttctttcttcttaatTATAcaGCtgatttTGTGTGaAAGTAgaaaagaacaaatgattgaagctatgCgg
GATGGagattttactacgcagaaGAcAATGCaagtttttatttatctttgtttgttccttttaatcttaac
tTAtTTGTAtcaAtCTAaCTcatgtattatcctacgtctatctagactgatcGGACgGaCATAatGTGTAT
ccATAtTTCtTgGCTACGcgtgtccaccttttagagaCTaTGCCttTAggtagtagatgttttactacaaa
ataaacatatttagtcaaataaaataaaatttgagagaatcttctacataagTAGCTCAcagacccaacca
aaggaccattgaataccaTATatATATAGAtaCacagacatataaacacacaaatattcgtgtttttttca
aactgtgagagaaaaagaaagagagaaagaga
score :138

19815_at I#F7A19.31#Atlgl4210 putative ribonuclease;contains
Ribonuclease T2 family histidine protein motiflCOORDS: 4857937 -
4856898
tatcttttgtaacttctcttctcctttttagagtccaaaccaaattcatggggattctaaaatctaaaaTt
CAAaTAagataaaccttcgtatattggtaaaattccTAATGCatcatacgtttcataaaacttcagcccta
atgGcTAATGCataagtttaacagaatttggtcagtcacggtacaatagttttttaATAatGTGctaaagt
attattTAGtTtAGtattATAGAaaCaataatcggaatgtagtagaatcgtttaatctccttTtGAGGGCA
TTATCggatcccagttcaagccaatgttaccaAgaGCATTgTTGcATGGCATTTATctgttacaaaaGTCc
ATcCgattaagtttcacaaagagaattaatgacgcagcgtaatCATtAAGgTTGCCTATAtttttttaGaC
AAGTGGaGaACCAAGTtGagaataataaattattaaacaaattaagaaaatattctttatatattcatttt
tggaaacaattgattgtgaggtacctttcccttttTACAtaTAgattgggaaataaattcaaggaaacaaa
aaataatcaagaaacacagagagcaaaacaca








72


score :210

16150_atl #T1P17.70#At4gl2480 pEARLI 1;|COORDS: 6371358 6370852
atttttttccgaggaagttcgagaattttgttagcatgtcttcctCTtCTgGGaccagtttgtgataaaac
acatcctctcggaaaagagtgtagagcacaacttcctctcgaatgtactaagaCCACTaGactaacgtATA
GAagCTctCAAGTaaaaTgGcTACGatccaaagagaatcTGaAgGTAtgtGCAaTGAGGtCATgaaCCATC
ATGATGGtggtgataataacagattaAcaGCATTGACAatttgaaaataatagtaatatgaacgcacaaat
catatttatttcttaaatagaaatgttttacaaaaacgattaatgtctaaattaattcaAGGTTCtACGaa
tctaCATaAAGgaaagtagaaaggtcagaattTGTAtATGTAGATAAGGGCAaaTAaataaataaacagat
attttgtagaatTGCAAaTATatgtgaataatcaaatataatagaaCAAGTTGGTCCTCtTcACatccttc
taaaaccctataagTaCcCACAccctctcttcatatatcttcatctctcactctctCaAAGaCActgaata
aatccttaaaacaaacttttgaaagaaaaaaa
score :264

16610_s_at I#F14D16.20#Atlgl9050 response regulator 5,
putative;similar to response regulator 5 GI:3953599 from (Arabidopsis
thaliana)ICOORDS: 6579078 6577919
aggaaaatacaaaatacaatgttgaagttaaaatgaattagaagtagattaatTGAGCaAtaaccaccgg
tttTAGaTCAttgaTtGATATGtAAGTAaAATGCaaagTAGtaGCCGACTTGGCAAAGAAaaagaaaaaag
aagagagaggagggtaaaactaaaataaaattactaaaagcagaaggaaagttcCAGGTCACGAGGGagta
gtcgctgtcggattgaaagaaacagaccaaacaaagatctttgaaatcttaaaaaagattcggtgcccaaa
gattttgacctttgaagaccaatttgattatttgtgttttctgacacttcccaaacacataaCATcaACAC
TTGACCcatttttaagattttgtttcctcttcaaagatcttaccccgcaagatccgatcttccaaatctct
ctctatctactaTACtAGTAcTtattaaaccccaccacctctcccttcttatttatgtcacCTaAcCTAaC
TtcttcttctttcattttatcacattcatttcctttgtcttcaatccaaagctttcctCTtCTTGGaacca
atctgctctcttttttattctgagtttgacaa
score :192

18928_atll#F18019.1#At2g43620 putative endochitinase;|COORDS:
18043568 18042497
acgcatgacTAtTTGgAaattgcaataattgagttggatttttctaattttggttgattttgattTATaaA
TAGAaaCattttggcttcactagtcatttTtCtCACAattccatacaatttttgttaaaaatcaaagTAAG
aCtTtaaaagaacgttcTaAATGCtaTattAgTtGACCaaaAAaAATGCtaTattagctaacaatatcgtt
TGAGCTAattaacaaaaACTTGgaActattcaatagaaaaatctcaaacgttTGAaCTAAtCTAaacttga
ttatctcaatcaagtttttatgagaatgattttCaTCCAAGTAACTTGgCtctttaaaattttgattacat
attcgtttttgatCTGAtCTAtgaCCgaCATGgaatttCTCaTaACgacaagagaaaaaactgTGTCATTG
ACttttgttaagtggtacaaagTGGCATTGACTTtGactcagaaaaaGCCAatCAataatcgtgaaagatg
tctaacactgatcaatatttcaatttGAATaGACCaaatttacacTATaaATACATcaACAcaccttcttc
atttcTtCaCACAacaccctcCATacACAaaa
score :282

13103 atl #T3A5.120#At3g50740 UTP-glucose glucosyltransferase like
protein;UTP-glucose glucosyltransferase, Manihot esculenta,
PIR:S41951|COORDS: 18733811 18732375
tatttatctaacatttttatCTCaTaACtCGTATCgAtgATGgCCTGtttcgaatttgtttagcgatgtgt
ttgtcagtttaataagtacaagaaaCTcAtCTAagaatCTTTATGtATAcatgacatgacatactatatat
gatGCATTATGGCCTATGAGCTAGTAaTtcacataTATACATATACAAGTcGGaagcttaccattattatc
ttctGGTCgACGCATTcAacatttttatcattcacgcgacaacttatgtGGTCaATTCttttctaTATaaA
TAaataCACgaTATggaTgCAAtTAacttagaTACATtTACActtccatgtttatatttacttttcttgta
aaacgattaaagctggtatatgatatcaaaaataaaacaaaaaaaatgaaagtgataTATAGATAAGaCtT
taaagaTACTTaCAaatgagttataaagtcttttGgtTCTATatattcaactcACTTGCAAcTATgagaga
gtaaaccgacacaataacaacaacaacaacaacaaaaaaaaaaaaaaaaaaggccaacgttcagTGAtCTA
gGCtAtGAAgattacaaaaccacatgtggcca
score :258








73


16465_at116916_s_at I#T22P11.80#At5g02490 dnaK-type molecular
chaperone hsc70.1 like;dnaK-type molecular chaperone hsc70.1,
Arabidopsis thaliana, PIR:S463021COORDS: 552563 550294
aagctacggaTAtTTGaAaCATatACAaaacagtgaaatcagtaggactgtgaattttcttatcctcaaag
cctttgtagtgtaacagacaaatcaaataaatagcttcgcttatacCTaTGCCcaatcagtaaaaagaaaa
gcctcacctcggcccacagcCAaTGACAgcctttagtAtGtCTTActggcccattactctactacgaaatc
ttacgTGtCAAGTAaTaaataAaTcGACCccacatatgaacgtTATAgaGCagttttccaattttcaacac
gcgcttttcagtttttggacaatatcactttttcccttttcaaaatttggacaatatcagttttttcctct
ttttttccaggaacATAtTTGCCTAGGatttggaaaagGaaTCTATcaCTtAgCTAccgtcagatcaacct
caaacataatccaacggtTATaaATAattacagaacattctggaaactcaagtctcaGCctTATAaaaaac
aagacttcaacgctctttcttcacaaacctaaaaACCTAGCTctattctttctctttgctgctcttaaact
ttTTCtTaGCtttttctatcttcgtgtgataa
score :162

19363_at I#F14N22.12#At2g42610 unknown protein;|COORDS: 17696547 -
17697080
acaaatccacaacgttttttcacatcctTATatATAtgtgTGTGtGtAaatattgcttcgtccaacaagaa
gagtattgtttgtagttttttaatcaatagaaaatgttatcatacaacccacaaTGAGCTAaCTaacacat
tTccCAAGTataaaatccctttagaaaactatatatgatcttgagctgaagTcCAAaTAgtctctgactct
ctgaaatcgatgtaaGCATTGAatcttctaaccattataaagaCATatACAgtatccaattttgaaatcgt
ggtggagaactcaaacttggtgatctttcaacaaaaactatgtaaatctttttagccaatgaatttgatta
ataaatactgatacgtaatTAGTTAAGAGagagagaggaagaaagaaaacaaaattttgaaagggcccata
acATGtCCTctctaacgtcaacatcatctgctactatTACTTtCAccctctcatcaaactcactcaacctt
tcatctttctttcttattacataattcttcagatccataaaacaaattgatcccttaaccttcttcaccaa
catctctcttacactcaacaagagatcatatc
score :90

14779_at I#F23F1.7#At2g30010 hypothetical protein;|COORDS: 12754375
- 12757768
aaattggtaatttCATTGACCTAGCTTtGACtgagtatcaccaCAGTGACGTGgctttttcacaattataa
aaaaaatgatcatattctagaCATGgAGGTAaagtaaaGCaAaGAAtctGCATTtTTgtattttctctaag
gaacaattacaaaatcttgataccttgtgagatCATaAAGaaggtcgttaactcgttatctattaTATttA
TAtaccttgaattcTtGATACGtttgttttacctttttttcTATtgATAatacgtttgaagtttgttttac
tgttacacattgattttatatgtgtttcccaaaaccgtttaagtttcagagtccggagcagtaactcgtaa
ttcactAaTACTgGTAttaaatgttgttatttatgggtaaagcataaatacttaaaTtCAAtTAttataag
aaaataaattaaaACccGACCccatattttttcgtccttttaaaagatggagtcctcgaaaacattacgtat
tagtaagttccatgcaaaggacaaacatataataacaacaacttctcttctttactcgtttctctctctct
cttttcttttcttttCTtCaAGCgcttcttta
score :150

20698 atI #T3G21.10#At2g40330 unknown protein;ICOORDS: 16794247 -
16793720
aaaacGTCcATtCtaattatAgTAATGCttgtgTgGtTACGacaacatgctcattcattgtaataaactgg
tgaaaTATAtaGCacCATTGACAaatagacagcctcagtaacaaaGCaAgGAAaaacaaaattaaaattta
ataaccggttatattaaatatcGaGaACCAgtccgtagaatttcatgggaaaccggGcCGGTCCttgatat
aaagaaagagagagacgtttcttaactgaagcagtacacaaactccaaacaaaTCCCAAGTGGacaaccga
agaacccacacaAaCTAaCTctctttcctaatccacATACTTGCATTtTTATatATAaacactctgtcctt
atatagttctgtatatTACATGTAaatatctctcattaatacaacctcacgaagaaaaccatttgttttct
tagagagaGCcAaGAAtattaaaagagatatggaaaaagatttgctttAATAATGCCaAcGTCgATaCag
tttcagagatcctccaccgccgcagaagcagccaacgccaccgtaagaaactatccccaccaccatcagaa
acAGGTtCaAaaagtgagcctcacgcgcggga
score :216

15531 i at115532 r atl #T22A6.170#At4g24340 putative protein;storage
protein Populus deltoides, PIR2:S315801COORDS: 11571982 11573646
atgtagagagattttagaaaaattgttaaatgtcaaatatttaattcgaagattttttttctaattattaa
aaacaaatattaaaatCCAGTGGCAaccattgtaaataagtCTcAaCTAcagaatttatttcacaaaatgg








74


ctgcaaaaatgTATatATAgatgtttaaagttaatttaatagccaaattaatatcttctttcTATccATAa
aatcataccctaaggtttactgagtttattaggatgtatcaattaaagcaaaacatcccttctaattaatc
attttgagtgactgattaataaggAGGaCATgttaaTAtaTGTAgaaGCATTAAaaCAcGATGGtctTAAG
GCATTAAGctattaagaatttttaaaaagtttatgcatggtctcaCAcTGACgaataaaatacgttatcat
ctttgatcacgaataaaaaatattaatgaaacgagaatcatgtgtaaagaagaaagttcatactagtttcg
gctcatgtgagccttgtagaggttTGaCTTGGAtGCTATATATATGTAgacaTGAGCTAGGCATTTATctc
aaaccaaccaaccaatatatcaGAtGGCTtcc
score :210

18396_at |#T31E10.2#At2g34640 unknown protein;ICOORDS: 14532888 -
14530604
atcacatattgatttTAtTTGaAAGTATATcaATAtaataaactgtttagaagatgttTTGCCaAaaaaga
aaaagatgttttgttCTtCTTGGAaGtgtttgttaaagcccaagagagaCTtCaAGCtcacaatttctcaa
agatccattAaTACTgGgccttAGtTAGaTataCCAAGcCcatatactttCTCtTcACaaattcccgttct
aataccggcgcgtcgtgaaatctagagattttgccgtcaccagaatatgttaataaacgcaggaggagata
aatggataaacccaaaaacTGaCAAGTgaaaaccttttgaagccaaaacccccttcaaaatcaagctccat
gatagaaaaCCCTCtTcctacataatctcctggtaaaacaaaaaggttagtctttttctcctcaacttcga
gtgtgaagcgtttgtAAaAATGCccaaaaaaaTcGtACCTtgttatggacaatttgttgtatcaggttctt
cttatctgggttttatgttttgtactagtttgaaactgattctgagcaTAaTTGaAtcggttgttttTGTa
aATGATAcaGTGagaagtagaaagagtgtaaa
score :132

15137 s atll#F16B22.35#At2g44790 phytocyanin;identical to
GB:U904281COORDS: 18411773 18410723
tcaaatactctgtttTAAaGGCATTAAaaATAacTGCgtttcagaaaaatattgaaattttagctgatctt
ttgctacaaatttaaggaatCTTGGCACCTgcaGaaTCTATaacatgttCATtAAGTAATGCaaTagttaT
ACAAtTAtacatTAtTTGcAtcatacttatattatagtgatattaacaaacccatgttctcagcacacttt
tacgtagaaaaacataaaaacccaaataggaagaaGCCActCAtaaggataatgggtttatataattcaca
GCaAaGAAAGCCaTCgAactattcgattaattatccattctttttttttttagttTGaAtGTAtaagaaca
aagagTtGtTACGcatcaTGaCAATGtCTTAGaaaacaaaagaaatgaataaaaaagtaaaacgaaaaata
aaaagtgaggatgaagttgttgaatgagttggcgaggcggcgactttttcatacattccatttacttaatt
cctaaagtccTtCtCACAtctctttgttatataatgacaccataaccatttcttCTCtTcACaatctttac
aagaatatctctcttctacagtaaacaaaaaa
score :156

12777 i at112778 r at |#F15I1.8#Atlg54000 myrosinase-associated
protein, putative;similar to myrosinase-associated protein GI:1769967
from (Brassica napus) COORDS: 19429049 19427232
agtgttacCATttACACccTATatttccgtgaaatttgtatgtttaattctctttctTGCCtcTAAATGCa
cgatcttttcagcacttcgaagaggttaaagtgacttttacccgatgagaagggaatTGtgTGGCAgaTAg
aattGCTAGGgaatctctttcttttctgaattatgatcccaaattgtattcTATAttGCcggattgggtta
aagctTAtgTGTAttcagactcttgtgTAaTTGTAagttgttgttgaaccaaaaaaaaaaaagcacgatct
ttctTGGTACtCGTAgtaacaatagatattttgttgagaattaaaaataaagtagaagagaaaCcAGTAAT
GCgtcagacgtgattaagcgaactatataatataattgatcgataatcgtttgctagtgcgttaaaTAggT
GTAtagtaaaatgaacctcgcgctaCACgtTATTAtTTGTAtagtaaaaTGAaCTAttattTTCATTGCCA
AattgtAgTACTTGCAATATAaATAtaacgaatcGgTCTACGAGCtAAGgCAcaaaagcaatcttcttctc
ttttcacagttctgtttctccctctctctaaa
score :216

15162_atl #F7018.21#At3g04720 hevein-like protein precursor (PR-
4);identical to hevein-like protein precursor GB:P43082 (Arabidopsis
thaliana), similar to wound-induced protein (WIN2) precursor GB:P09762
(Solanum tuberosum); Pfam HMM hit: chitinbinding proteins|COORDS:
1286539 1285699
tcatTAtcTGTAtCgcGATGGggaagatacgcaatgcttagttataaaccgtccgatcagaTGaCTTtGac
CAgTGACcttccgttcaaagaatatcgacggttcttagacacatttttcaccggagtctcaccctcaCAGG
tCATcttaatTtCtCGTAaaagagcatattacattgatattccaaattataccacttcataatttattttt








75


atttTcCAAaTAaaattaatttattaaaacttcagcactccaattctgtagagaaattttaagaggagaaa
cccaattttcttttattttgtaacgaatctcttttaaaaaatctaacaacagaattatatattttcatta
ttctatcgaatccaacgaaatttcaataTACAgaTATctATATACAAaTAtttaaCCtaCATGTGtGtAtc
atatggtttaagCACGTTACcTGTAACgtatgtaattaaatcaattgcgtagaatcacaagaaaaacatta
ttatataattaactattgatcctataattttattattttcaTATatATAtagataTATAGATAGATGCATT
AGACCaccaagaaaacaAaGaCTTATCgAtca
score :180

19424_atll#F3I6.22#Atlg24280 glucose-6-phosphate 1-
dehydrogenase;strongly similar to GB:S71245, location of EST gb Z35060
and gblT045911COORDS: 8609492 8612380
ttgtaatggaaaaatatttagcaaatagattataaacttacatgaGaCAAGTATaaATAattattataaac
ttattaagtttaagatcaaggcttttgtgcaatgtatcaatgaatgttagatgtgatatgatgaaagcaat
gttttaaacacatacataGTCAtTGatcggaatgtgtgttattAgaAATGCaTGCCTAaGccgatagggtt
atctatgtttgGtCTTGGacattatagccaaatttcgaatctaattcttccaatatatatttttttttttt
tgcttagGGCcACTACTAGTAtTgCtTATCaAttttaagagctcatGAaAATGCaacaatatagTAgTTGc
AaatccttgtttcaagagaaatcaaagggCCACTTGtGAattgaataataataATAtTTGCAAaTAacctt
tcactaaaccataccaacaaaaccacacagatTtGGCAAAGACAtaacctttgggagacgtgaaaaggctc
aaaatttgacaattgtccttacaaatTcGCTCAttagtgcaattgtgagatttgtttgcatccaaatccaa
ttcataactcacactcgtctcaaattcgaaaa
score :168

16614 s at119456 i atll#F3I6.19#Atlg24260 putative floral homeotic
protein, AGL9;strongly similar to GB:022456, MADS-box protein, Location
of EST gb H370531COORDS: 8595859 8593787
ggttttttgtaggattTtCAAtTAttaatctctataattcgaTGAaCTAagtaaaaaagcatcaaactTTC
TTGGCAgaatcacatttttctCTaAaCTAaatatggactgaaattgaaaaattaaaCCACTAGCTAGaATA
aaGTGttggtgagagtggaactctaatttctctcctttactaatTATgtATAaacacAAaAATGCaccaaa
tttttaggtttgaaaatatctaagcaTgGATAgGgtaattaacattttttctttcaattttgcaaTAtTTG
aAtaaatcctAtGAGGGtcttTGGTaCaCaaTAaTTGgAgggtataTAGtTgAGtcTGagAGTAtattaga
aagagaataTTtCAAGTAaTgaagctgacatgttTAtaTGTActttgagagaagtgttgtgagatttgtac
aaaTGTAtATGTAcactttaaaaaGCaaTATAAGaTAGaTaaaaaaaatataaagaaaaaaagaaagaaag
aaagaaagaaaggagagggctcaTATatATAtagaattgcttGCaAgGAAagagagagagagagattgaga
tatcttttgggagaggagaaagaaaaagaaaa
score :174

12523_atl #F10D13.18#Atlg69530 expansion (At-EXPl);identical to
expansion (At-EXP1) (Arabidopsis thaliana) GI:10417021COORDS: 25353446 -
25354463
aatatattatcccaaggtaaagttggTGTttATGtgtgtTtGAaGGCgcctgaaaaaaATAGAaaCcacac
gactcgtctTGtAtGTAGCTCAcatgTGTCTTTGCCtctttgctttttcaTGCCatTAaatattaaatctt
cagcggttttTGGTcCcCaacaataaaaaatatttttcttactatattctaTAGtTCAttaatattacatt
gaataaacctaatgttttttaatatgaaaaaccagaaaaataaaAttGCATTcggagttttGGTCttGTtt
tgaaaagaaaagaaaaaagtggaAGCTAGTTGCATTAACAttaaaCAgTGgCAaaaacgtaaatcattcac
tcactaagtttttctataaatTGAaCTActcCCCTCcTctgctttttccaattctaaaccaaacaacagat
tctcataatcatctcttcttttttcCTCtTtACgaaaagaagaaagatcaacCTTCCAAGTAaTcatttt
ctttctctctctcaacacaccaattcactagttttagcttcacaaaatgTGAtCTAaCTtcatttaccta
tatgcaggtttacacaaaaagaaaaaagaacg
score :186

13538_atll#F21C20.130#At4g20780 calcium-binding protein -
like;calcium-binding protein, Solanum tuberosum, gb:L028301COORDS:
10098382 10097807
tcgattaaaaattcggtaactactacgacGTgACTTGtCtcttctccgttgaaaAGCCATCGACAgttatg
tatgagtgtatcatttatCaACTTGaaAgcaatttcagtttttaaaataaTcCAAaTActaataaaccaga
aattaacccatcatcAaaGCATTAATaatcttcattaccatttgaaaccgaaacttgtatcacaattaaat
aacgatcataaaattagacgtttttttttctctatctccattttCCcGGGgGaaaaaagatcggatttttg











ttctaaTGTCAtTGtCAaaatattattcgacagatcacaaaaaaggtgaccgttttttttttttttttttt
TtGGCAAaaaaaaaggtgacctttttatttTGaCAtTGatAGTAtgaattatggggtctgatGCTaGaAGt
tcttaCATGgcGGGCCAgaCAAAGTCAATGCcggcaacaacggactccTACAcaTAaattttttaattcac
caattgtttctcttctataattAGCCCTCaTcACtcaatttctcaatctcagattataattccacaaacga
acaagaagaagaagaagaacacaagataacaa
score :222

16522 atll#MGI19.6#At5g63850 amino acid transporter AAP4 (pir
IS51169);ICOORDS: 24845320 24847200
catgggattaatggatgctaaaagaataaaaCACTcGaGaagaagagTACACGTAaaatcactcacacaca
aagaatgagagatgggagaagaatactacaaagtgtttgttttttttaaatcggctcatatataaatttgg
gcgatactctacTaCcCACAtTATtcttttaaacaatgatataaaTGTGtATGGCCTAtGtagccgaactt
tttacatatcTGaCCTGTATatATAtaataattatttatcaaagaaattagagaaagattcttccacttTt
GAtGGCTTAtaaaagatgatgacccaaaacactcAaAATGCagcagcaactgctctctcaccacccatctc
ctcatttctcttttGCATTtcTttctttttttctgcAttGCATTcTTttgaggggtttaattttctgcata
gctttgtctaatctcttagagctcaataagagaAGGTaCtAtaactgatctctCCCTCTTtCAAGTttttt
ttttgtttggtttatagaaaactataaccgccattatccgttagttttaaccgttttttgaaactagtgaa
tcCTCaTaACtttggtttctcactaaaaacag
score :144

15859_at I#T17D12.13#At2g28570 unknown protein;|COORDS: 12190606 -
12190842
acacacacacacgTATtcATAagcttcttagaacacCtCcAGTGGtgtttcttAaaGCATTcTTgaatatt
catgacaaaagtacaaggtattataaaaTATtaATAaaatataATAtgTGCCccTAgaaacCATTGACCCA
cCATGCATTcTgaataaaaccgaaccttaccccatttcTGAGCaAtttttccacagaagaccaattccta
atcatgtttaataattaattaatttggattttggAgCTAtCTttactctTTCTTTGCCTATAaattaaact
ccatctctcttcttatccttctacttttctttcctcttctcagcaaaaaaacatttttagtaacatttgtt
ctaataagaaatttgtctcttgctttaacaagaagcaagactagcctTAtcTGTAcctttctttctgaatt
aaagaaccctttaagattcatatgattttcttaattcttttttaTAtaTGTAgatttgtttcttaaaCACt
aTATttatgttttacaataatatgttttatatgttttcaacttctttgtagatttgcttctTaCaCACActa
atgttttatctttgcagagaaACTTGaCAttt
score :168

17963_at I#T1P17.60#At4gl2470 pEARLI 1-like protein;Arabidopsis
thaliana pEARLI 1 mRNA, PID:g8717801COORDS: 6366337 6365852
taatttatGCATTAAagaatcttaattaaattCATaaACAtTACAtaTAtattaatcatatattaTATacA
TAtcacaaattttcgagtaagtttctaaatactttgggtgtgtctccatggccgggctcagcttgattata
actacactaggaatttattatAtTACTcGACCtgacgtATAGAAGCCGTCAAGTaaaagaGCtAcGAAcca
agataaatctgaatatcttgtgcatccacgaactCATGATGGtgttgataataAcaGCATTGACAACTTGa
CAttaatacttatATgAATGCACGTATatATAaagtattttcttaattttaaaatggtttacaaaaaacaa
aaTcCAAaTAatttcaAGGTTCtACGTACCTAGCTACaTgCAtataggaaaaatggtcagaattTATatAT
ATATatATATATatATACAAaTAatTACAAaTATctATAaataaattttaaactaaatCAAGTTGGTCCtC
ttcccatccttctaaaatccTATaaATAccaacatcttctcttCaTATCtAtttattcaataacccttaca
acaccgaatataactttgaaaaaaaaaacaaa
score :360

13587_at I#F21M11.2#Atlg04040 unknown protein;Similar to acid
phosphatase; Location of ESTs 110C2T7 gb T42036, and 110C2XP,
gblAI1002451COORDS: 1043820 1042565
tcatgatgtacatatttcaaatcctaaaaattaatttccaacttcatttgtTAaTTGcAatgtgtatacta
tttgTGCCgaTAgtcaaatacatgaTACATcTACAattcTACaTaCAttatgtaattatacaCtTtGGCAA
aTATAtcGCcactaatcgaggaaaacaaTtGCTCAtaccatcatactttttttattttttTtGATAaGttc
atacaatcataCTTCTTGGCAAGTTGGCAgtTAattatctttaatttatccaaatcttttagtttcacccg
catattaagttaaactttcataattttgcctcccataaaatgttggaaAGCTAGaTCATAtcGTGcaTGaA
tGTAtagtaaacaacTATatATAtttttaaaaatactataatgtataattacaaccaaattattattttat
ttatgaaaaaaataatgaaacaccaaaaactttgattcagatttgtgtattaattatttttctttattgtt








77



agggctgccagtttgactcttctacttataaaagcCTCtTTACaCACAaacaCATacACACTTGtGAtcaa
aatctcaagatccaaaactcttctttcaaaca
score :204

16818_at I#F18E5.30#At4g21410 serine threonine kinase like
protein;ICOORDS: 10369523 10366961
atttctttgatGCATTAACAaaGCATTcAgatttgaggtctcattcagcgagtaagatCCAAGaCATaAAG
atcttccacaatccgcacctcataagtataaaTtCAAaTAcagcAgGAGGGatgtgaaaaaaaaaaggagt
ctCTaGGATGtCCTtaagtagtttatttcccattaatggaagattagttttagtttCtAGTtGGttaaagt
ttcagactttgagttcttattggtcctggagagaaaagaagtgagaaatataaacttattttaaatgtttt
CttGTAACttataacttcttatTtCaCACAcCGTTGACCTGaTTCgTtGCaccaactcctaaatttttata
ttgaaatttttatatatcatCATGcaGGacccaacaagcttacaatGgCtGTCCACTTGtGtctgtctgtt
actgaaatttataattgtttttgatcagCaACTTGtgttctttcggctgTGAGCTAGtTtaaagaacacac
ggcttttttttaaGGTCaAaTtaggtcgacatttaccatataaagaCTCaTtACaCACTgGtGggtttctc
aTAGaTtAGagaaaaaaagCTtCaAGCtttca
score :210

18298_atll#T25K17.10#At4g26200 1-aminocyclopropane-l-carboxylate
synthase -like protein;ACC synthase Malus domestic, U738161COORDS:
12239804 12241443
taacaatattgtattagtgaaggattattaattacgagactattatctaaAacGCATTAgTgacccttttg
gatttaaatcgTAGcaGCCgactaaaccaaaatatttagaaagttttccaaaaacaaacaacattaaaaca
catttaaactcgtacacaataATAGAaaCTaAaCTAtttttagcagactaaagtctaaacaacattacgtt
taaaaattgataatcagtgttttatagaactggcgatctcctaaagatgaaccaCCgAGaAGtttcaagat
aaaaacaaaaataaaaataGCCAaaCActCTTATCTAttcatccatttgCAGGTCAATGACACcgTATttt
tccacaatcgaatcttccttattctttacacaaacttcttctcTGtATGTAAGTAaactaatattaaaatt
gaACTTGtCAaagactttCctCTTGGTCacGTctGCTATATATActccttCATtCACACAaatgtttaact
ccaacacacttctctttctctcaacctctctccctctctctctccgattgtgtttttagagctGTcACGTG
tcacgtgtcAGGTtCgAagctctaagaaaaaa
score :174

16298_atil#T805.60#At4g21850 putative protein;CGI-131 protein, Homo
sapiens, AF1518891COORDS: 10556637 10555097
catatataatttgaaaaacatgtttttgtttagtgtctaagtTATaTATActGCtatactcaaattcattc
attcaaaacgtaTGAaCTAaagttttaatgtCACatTATattttctactagaatataatgtcaaaggaaag
gaaaaTAAGgCtTtagtcaacaaaaGGCttCTACATGTAACGAGacaaatatgaaataagagaagagaaaa
agcgTGCCccTATATttATAaaaatattacacaacctgacaaccatcaaagTcCaCACAGGcCAcaGcCTT
GGtaaggttttttttttgtcacctttctataatATAcaTGCgtcaattagatctaGaTCtACGtgtgagtt
gTAaTTGTAacCATatACAatcatttttttattattatggtttcttaggtgttaggaagattcatattgtc
acatttcagacgaaatagaataaaatatgatTGTCtATGCgtatttaaacccccctttgtagttcttctac
tcactcacaaaattaattcttcttctcttttcttttcttaaaagagctcaaaacACacGACCatctttccg
ctgtcTTCtTaGCttCTTATCTAccgtaatca
score :192

17485 s atll#d14170c#At4g16260 beta-1,3-glucanase class I
precursor;ICOORDS: 8165944 8164797
agttCATaAAGtcaaaaaatcagtaaccaattcatgagtggaattatacttataCAaTGACAcTGtttag
gagCTTATCTAaattgcacaagttttGTTACACGTAaaacacaaatatgccaaattagacatgtttctcgt
cattctaaaaCATtaACAaatatttaCATAGACgCagtcaaataaatcaaatttcgtatgataaaaaaaaa
acaaaaaaaatagcaaggttgattTGTCAATGCtggcaaaaataagcaATGtCCTaattTctCAAGTTGGa
aagtgacatttctctacccattttttgaaaagatattctcttttgcatataagatttgataaattgaacgc
ATAtTTGCActcaagctaacgttatgtacttgAtTACTTGACCcgactaAtGaCTTAtTGaCTTtGattaa
taaacaaagatattagccactccattctacattatatacaatctatacGCATCTATatATAtctactaaaa
aaatTGtAAGTACaTaCATATCaAtgttaatttgtatataaggaGCtAaGAAcaaacccaatTAGGCAACT
Agcaattgctaaaacacgtaagatctcaaata
score :288








78


12312_atl12313_g_at I#F22K18.20#At4g24780 putative pectate
lyase;pectate lyase, Musa acuminata, PATX:E2098761COORDS: 11736719 -
11735129
gttcTAcaTGCCcaatcaagaaaaagtcaaacttttcTCAATGCcgactttactgttctaaaatgtatgaa
aatacacattaataTACAAtTATaaATACaTGCAAaTATatattgttataaactatataatgGTTACTTGA
TAtGgatccttataCTtAaCTAaCTtaacaccaaCtCCCtGGaTCTATaatgtatatattgggctaaaCAG
GTCATGTggaagtTAtTTGaAaagtaCACCaTATCgAcctaattagattggaggctttttatttagttttt
tgtcaacaattagatttgagtaacggatgtaaccatattatttgttcctaCCTCCTAGGCATTATTatagt
ataaggtttttttttaactaaagatgatttaaaaaaaaaaaagTgGgACCTaaaccttaaaaattccaata
acaaatagtatagaggtcctttttctctttatataaacccccaaaatttcctcttctctgtattctccact
caatacttatttcctctgtcttccttcgcttCCCTCtTctctcaccacaatccgagagagagacagtactg
taatccaagtgagagagagagagaaatgaaaa
score :216

20239_g_atl20238_atil#MMM17.26#At3g13790 beta-fructofuranosidase
1;identical to GB:S37212 from (Arabidopsis thaliana)ICOORDS: 4535836 -
4533089
gagcaagtaaaacatattctTACAAaTAacTtaCAAGTTACATGTAtaacaacactcttatctttatcatg
taaacaaaaaaaaacgctatTAaTTGTAaatatttcatctgtactttgtaaatttgacatgacaaaaaatt
tagatacttctaaaaatgtacaactttttagccgttagaaagaaaaggtgtacaacttttcaatacgaatt
cttatattaacttttgttccttcagtttTATACAATTAGTCAATGACAtTGaaaaaccagcacaaactcct
tcagttgatcaccaATAGAaaCaacaaaatacagtataatatttctaatttttagtgaaaatttacgaATA
ttGTGgaaacaacttttatagatgtctgagtctgacttaTGtcAGTAttattttttcttcacttataGCag
TATAacattcattatttcacATTAATGCcaaaaaaaaattaaaaactttattatcaaatataatattcaaa
tctcgatattgacaggtttGCCCTCgTcCTTTGCCTATAaatttgacaTGATCTACGTtCAttcaaacaaa
aaCCAAGcCAcaaagaaattaaaTaCaCACAa
score :228

12879_atl #F12G12.27#Atlg33960 AIG1;identical to AIG1 (exhibits
RPS2- and avrRpt2-dependent induction early after infection with
Pseudomonas) GB:U40856 (Arabidopsis thaliana) (Plant Cell 8 (2), 241-
249 (1996))ICOORDS: 12346670 12348299
ttgtttgTACTTtCAtattctgattatccaaattttctgtttcttaatttaattcctctttattagtttca
ttcatttttctttttgtttcttagtatttcttCtAGTGGtcaccaatctaaaacttcaTATAATGCagTct
aaaaaaggatcctTAtaTGTAagactaattatgtatttatttattattatatagattttaTATttATAttT
AtTTGaAgaaaaaaggactttcaattgttctatttaaACTTGaaAaCTAGCTTgATGatgaaccacttcTG
CCTAGCatctGTtAaGAGctaccagaaaaatcaagttcgtcattttcacaccttttttttttttttttaaa
tttctcatcgatcaatttgtatggtttacgatTAcgTGCCCTCtAagctttctctgtatttcatgtgattt
taattgacaacttacaagctttaagtttccagtttttCTTcATGtgattatattttccttttctcaaacat
CGTAaCcATACACGTAtagctttttcaacgtttttatgtTGAtCTAttagctaataattaacatttTACGt
GaAaaatgaagggtttaaggaaatttcagcaa
score :168

16342_at I#T1F9.13#Atlg61380 hypothetical protein;similar to
putative serine threonine kinase GI:4585880 from (Arabidopsis
thaliana)ICOORDS: 21860819 21857695
aagtttgcaacttttggCTaCaAGCAcaTATttttctgagtcgtcGTCAcTGtctctgtctcttcaaagtc
ccacgaTAgTTGaAgtttactttgaatcTgGCTCAtctgaattatGCATTGAatgtgttcgatgaaaTGCC
TAaacgagaatttaaattggttttatcattcttttgttttttgctggtcttctaaggttttggatcttcac
tttgCaACTTGcaAaacgtttgtcggtaatttccagttttgtgttgtccttctcgAAgAATGCaaaagctc
aaCTTCTTGGCTTaGtagaaacgagaaagttgacttatGaGtACCACACgtTATtttctgtggacagtaag
gttgggagattctttgtcaaggtcgtctattattttattatttttgtttctctctttgctggtcaataggg
tattgtttAgTACTTGTAAtattcaaagaagtagtGtGaACCAagatctagatatttaattatctcaggat
acgttaacataaaagtttttttttctcttggtcTCAATGCtCACtcTATcccCATgAAGttTtGCTCAtac
tcaagttatcttgcagagaaatagaaaaaaga
score :174








79


19720_atll#T22J18.14#Atlg22690 putative gibberellin-regulated
protein;contains similarity to gibberellin-regulated protein 2
precursor (GAST1) homolog gb U11765 from A. thalianalCOORDS: 8027326 -
8027959
aagaaacctatgaaaacaccaatAcaAATGCgaTattgttttcaGtTCgACGtttcatgtttgttagaaaa
tttctaatgacgtttgtataaaatagacaattaaacGCCAaaCActacatctgtgttttcgaacaatattg
cgtctgcgtttccttcAtCTAtCTctctCAgTGtCAcaatgtCTGAaCTAagagacagctgtaaactatCA
TTAAGaCaTaaactaccaaagtatcaagctaatgtaaaaatTACTctCAttTcCACGTAACaAattgAGtT
AGcTtaagatattagtgaaactaggtttgaattttcttcttcttcttccatgcatcctccgaaaaaaGgGa
ACCAatcaaaactgtttgCaTATCaAactccaacactttacaGcaAATGCaaTctataatctgtgatttat
ccaataaaaacctgtgatttaTGttTGGCtccaGCgAtGAAaGTCtATGCatgtgatctctatCCAaCATG
agtaattgttcagaaaataaaaagtagctgaaatGTATCTATAtaaagaatCATccACAAGTAcTattTtC
aCACActacttcaaaatcactactcaagaaat
score :204

19450_at I#F17M19.3#Atlg71880 sucrose transport protein
SUC1;identical to GB:S38197 from (Arabidopsis thaliana)ICOORDS:
26265752 26267518
ccaaacaaaataatatgttctccgataatcagatgtgacttatgtgatatagTACATATAaATAtacacact
gtttaaatttgtctacccaatcggaatcATAGAaaCtttattgtttacccaaacaaaacggtactgaatat
cggaactttttttattaaaaaaaaactgtgagagagaaattgaatcaaCgTCCAAGTCACTTGATGcaaga
aaaaagcgaaaccaattaaaTtCcCGTAaaaacagaacacaaaagaacaggagagttaatgttctaactga
cacgtgtccctaccttGCCAtaCActcacacaattaaaatttctaactctgtctcttatccgaaaaataat
catctccaagtgtaataagaaaatcaaaataaaaccttcatttcttcttcTTCcTcGCcTATaaATAcaac
tccattctctcatctcctacatcacaaaacaaaaacctcacttaaaaaaaaaaaacagaagacagaaaaaa
acaaaaaaaaaaagaaaaagaaaaataaacaaattttcttttcttttttttcctctaaaGttTCTATtttG
TCtATtCggtgtttttttttttacttcctgata
score :96

15985_atlI#MHJ24.8#At5g64100 peroxidase ATP3a (emb
CAA67340.1);ICOORDS: 24945888 24944650
cattttagtttttacaaaaaaaaaatcacattttagtgtttcttgaatagatttgatcacttttCTTaATG
tatcgttcattttttTcCAAtTAatcaactgcaattttggccgatttagctacatgtcgaaaaacactaat
cgatttttcttccagagttttaagatgctttacgtttataccattCACctTATattagttttcattttgtt
ttcgaccaaaactattgatccaaattattatAGCTAGGTACGAtttgaacaaactttgattctcttaccaa
tCATctACAacttaaaaccaAGaTAGcTTcGCTCAaaaagaaaaaaataatgacaacatgtataaTACAta
TAtgttcAaTGGACCGaCTCATCTACActgttattcaaactaattattttataaatgattactaaatcagc
ttattaaattcccataatttctgcgtcgtgTGCCgAaGttgCTCGTtACaattgttaTtCcCACAactttt
tTTGCCTATAtATAcaaaccctttaacatcaaactcaaaacacacaacaaacacaacttctacaagactca
aataGttTCTATttaattactaaaaagaaaaa
score :156

14039_atll#F3P11.19#At2g19590 1-aminocyclopropane-l-carboxylate
oxidase;ICOORDS: 8425903 8424788
aacaccacgCaTATCcAaatttggtgacaccagaataacacattTTACAAGTAcgcCAaTGACgcttagat
gggTAGtTtAGtgaagttttaaagaaaaaaaaaatcagtgaaaagaaaaacataatacacaattagaCATG
ATGGCTAGGtccatcgaatatgagtagggataagaaaaaaaaaaaaaaaaaaaaaaaactttccttgtgaa
gaagaaagaaacttcacataaatgatGTTACaaGgaaacacaatgggagtgacccaaaacCATtAAGaaat
tatgaaaaattcaaaACTTGtgAtcacatggttGtATgGACtgaacttggacttttcttcaatcctattg
ttcattttacatttccaaagcagattcttatacaattttttattcattaaacgtaatgatgcttcaaattc
aatgagttaacagaatagtgttgaaagaaactattgttttgCcTCCAAGTAGGCAACTATGCCTCcTaAC
acacaagagaaacagagatttaattttcacttaTGCCTATAaATActctttctagAttGCATTcAacaaat
ctgtgTGTGaGaAaaagtaaataaaaaagaga
score :216


















APPENDIX F
ACTIVATED ABA REGULATED GENES

15611 s at117406_ iatl #K24M7.4#At5g52310 low-temperature-induced
protein 78 (sp Q06738);ICOORDS: 20534755 20537152
tatttcatctacttcttttatcttcTAcCaGTAgaggaataaacaatatttagctcctttGTaaATACaaa
ttaattttcgttcttgacatcattcaattttaattttacgtataaaataaaagatcatacctattagaacg
attaaggagaaatacaattcgaatgagaaggatgtgCCGTtTGttataataaacagCcACACGACGTAaAC
GTaaaatgACCACATGatGGGccaatagacatgGaCCGACtactaataatagtaagttacattttaggatg
gaataaatatcataCCGACaTcagtttgaaagaaaagggaaaaaaagaaaaaataaataaaagatatacta
CCGACATGaGTtccaaaaagcaaaaaaaaagatcaaGCCGACACaGACACGcGTAGagagcaaaatgactt
tgacgtcacaccacgaaaacagacgcttcATACGTGTCCctttatctctctcagtctctctataaacttag
tgaGaCCCtCCtctgttttactcacaaatatgcaaactagaaaacaatcatcaggaataaagggtttgatt
acttctattggaaagaaaaaaatctttggaaa
score :228

15695 s at I#T27K22.8#At2gl8050 histone HI;|COORDS: 7794644 -
7795219
taaACAatTGTtttatgtatatattttttttttattagggttaaggatttttttctatttttgtttttaaa
tgtaataaaatttgaAACACaTGTaaatatcgtattaGTaaATACcgaccaaaaaaaatattgtattagta
aatttgacacatatcgcaatttttgtgagctaacaattttaaaaatcaaataagatgacgaacaaagctct
ggtttaaactttctcccatcaattttttcattaaaccaaatttaaccaattatttggcctaataactgcGt
cTACGTtattaagaataagaacttattttgtgtttcagtagaaaacACACtcGTtcacaaaatgcctagta
agagtaaaggacgatcaccgccaCCAaGTGtgtttCtCGGaTAAaCACaTGGaaTcCaGCCAttacttaaA
CGACACGTGTACGCtcatgatttattaatGCACACGTAatCgatcctctgacaaaaaccataacgaataca
gaaAacACACGaATacacttccctgcgctataaataagctagcacgaaaaaatttaacagatagagacaag
acaagcaaagcaacactttcactaatcctcta
score :348

18969_g_at|18968_atll#MUA2.12#At5g57550 endoxyloglucan transferase
(gb AAD45127.1);|COORDS: 22600210 22598881
cgaaatcaaaccggaattttgcatgtaatttgattggtgtCGTTACtttaaatctttaatccacaaaacaa
aatttactcgattttaGTATtaACcgaaccaattatagtttattgaaatttaattttaattctatcaaatt
gcatatatattcttgagttattttttataaaaatactgaaaccaactaaaataatagagttTGGCgGaAct
accgtaccaaatttgattgtatttggagtatcatttttgcaaacctaattagcctgaagactgagatatcc
ttgtccactcttatgaagaaccaatttaacaaggtgaaaaccagaatctctaaaccaaaCaTGGCAtcAac
tgaaccggatcaggCAGaCtTAaaccaaaacaaagaacaAGCACACGTaGCAtgaggcaaaattaAgCACa
TGcttgctttacttcaaaacaaaaaccagctgttcacagctaaaactacACAaGaGTcaCAaACGGcgaac
tatactacaaaaagactaagacttgcctcccttatataaaaccccccaacacataaggtcccaatgaatga
tttcaattctctatcttattgacacataaaca
score :102

19177_atl #MQJ16.4#At5g22500 male sterility 2-like protein (emb
CAA68191.1);ICOORDS: 7369180 7372555
gtttactgaattctatagctcttaccttgcacgactatgtcccaaggagaggaagtaccttaactataatt
ctgaacataattttgtctatcttggtgagtattatatgacctaaaccctttaataagaaaaagtataatac
tggcGTAACGtaataaattaacacaatcataagttgttgacaagcaaaaaaacatacataatttgtttaat
gagatatattagttatagttcttatgtcaaagtacaattatgcctaccaaaattaattaatgatttcaaca
ggaagtctgagATGatGGGCCGACGTGTAGTtACGTtTCttgaattgtgagagatggtatttattatactg
aagaaaacattatttactaaataaattttcatttcacatcttctgtaatcaatgcgggtagatgaagaagt








81


tGTtaATACgatggccaaccaataggatctcttttttggcgtttctatatatagtaacctcgactccaaaG
GCAttACGTGaCtcaataaaatcaagtcttttgtttccttttatccaaaaaaaaaaaaaagtcttgtgttt
ctcttaggttggttgagaatcatttcatttca
score :102

17929 s atl #MHM17.19#At5g57050 protein phosphatase 2C ABI2 (PP2C)
(sp 004719);ICOORDS: 22381546 22383129
ctaattactttgttgttcactaaaacatctcacattgtgctattttttttaataaataaagacctctctct
ttaaaactgattatcccctaaactatttgtttgtgagaaagaaagagaaaaaaaagtttttttggttggaa
attaattgagggagagagagagttcagaatctctaaaTGGCtGaAacgaattgcccagaatccaggaaact
ctgattttagtcctttTTtCCtCGgGaAaACGTtTGCtttttcttttttgtgtgtgtttggttctttaatt
gacaaatctctctctatctgtttCAtGTGcTagctaaggttgtcttcactttcctctaatgagtcatgagt
ttcttgtcttcatcttcatctaataaagttaacaacactctcatcaatttttattatgttcaaatctgtct
gacttttcgttcttttctcttaCaCCCAaGcgttttaacaatctaACAtcTGTctctttgaatacttagat
atccaacttcaatttctctcctttctcttcccaactttgattcctgatttgggtttttgttaaagttcaag
aaagttcttttttctttttttttcctccttta
score :60

12994 s at|13004_atll#T19K4.110#At4g35980 putative protein;physical
impedance induced protein, Zea mays, gb:AF0016351COORDS: 15998351 -
15995305
agaaaaaattttcatcatcataattgattataccttttaatcattttttttgccttttatgttttctaata
gaaaaaaaaaccaaagttttcctccatggtttaggtaagatccTACtGtTAaatagaatttcattTGGCAa
aACaaaGTcaATACagattaccattatcatagtcaattacacatcttaattacttgaataattcttgttta
gtgtgacactaaaaaagGTGGCAaAACaataaACAaaTGTacaatatacactcaacaaaaagaaattttgg
atataaacaaatttaatatttggataaaGCGcGGGAAaatctcctaaaatagttaaaaaaaaggttaaacc
ggcccggttcatagtcaaaaagtatgctcctcttgaaacacgcaaCGCCCACGTCTtCctatctccgagac
tccgacagcaaggttacttaacagttaatcaaactagattcgctttcttccatcactaatcttctgacttt
ttctatTtCTGCCccaaatctttttcaaattcatataaataaccaAACACtTGTtcttcttcattccccct
tcgtcttcctcctcctaaatcaccggcgataa
score :144

14062_at I#F17A22.43#At2g47780 unknown protein;ICOORDS: 19518609 -
19519397
gcgttggtctgaaatcaaactgtgagattcttaaaaggcgatcgttttacttttatgtcgccgaaacggaa
caaattCAaACGGtaaagccaactatactttgatcagttttgggttgggccgttgacactgttttgtgggc
ttctcagtcaGTGGtAaAcctaaaattaatataggaactcgaggattagggcccaatccattttAACACGT
GaTGatatcatgtaaggtttggagggaatatacaaaaagtggatgaatgatagcgaacgagaggtccttgt
gcttttttgatttaccctcgaGACtCGTGCCGACggagtggcCGTGGCtTtCGTgcaacagtctcgaccca
tacctCGACACGTGTTGaaaacggtccattaagagtaagACAtcTGTcaataggACAgcTGTcctaGaGGT
CACtcGTCtctcacaatccaatccctttcgcttaaattaaaactaaaatggagCaTGGCAGcAacaTGCCA
CGTGGCtcaccagtttcaaatctgagccgtccggtgtgatataagtctttttaaagagagagAgaGCGGTt
cgtttgtcTGCaACGgcaccaagtctttgaca
score :498

20042_atl #F28A23.230#At4g34010 putative protein;|COORDS: 15262345 -
15262509
tgcagttctggagaaagtgattgagagaaggcaaaaaaggatgataaagaatagggaatcagctgcaagaT
CCCgcGCtcgcaagcaagtgagtgtttgtttaaattttggagattaaagaaaccttaaaactgtgaccatg
ttatttactatttcactttcttgcttgaCAGgCtTAtacgatggaactggaagcagaaattgcgcaactca
aagaattgaatgaagagttgcagaagaaACAaGTGTgtctcgcttcttccctatcacaattaagaatctcg
agattttcatattttcttgaggttGTATtcACtgaccaaatgtttcatgcaggttgaaatcatggaaaagc
agaaaaatcaggtactgtcttgatttgaatatcctctatggttgttggctaggcttttaactctcactcat
aatgaattacacttttggacagtattctaagcttttgagtagaataGTGTaaGCtataccatgaagtgaga
catatcatcacatttttgaTtTCCCACtctgcataaagtatttaagatttgtgaatatgttgcaatgccaa
tttggatatttcatgagactaatctgacgagc
score :54








82




18701_s_at I#F1N13.100#At5gl5960 cold and ABA inducible protein
kinl;ICOORDS: 5132890 5133371
gaaaaaaacatgaaaaatacgggaggttcgGCaaACACaacatttaacttgccAaACGTaTCatctaacTt
TCCCACcttatacaaggaaccattttttcaataataaagtttttttttttttgtcttcgcaaataagagca
cgaaatgtttgccaaacgcatatgcaacaaaCCCACGTTACataattctgttTaCAGCCATAgagcaagct
atattgttaaagacctaaaaaaaactttactataacatatagaggcttcgagatatttcgaaagactcaac
ttatatataaataaactcaaaaagAaaACACGgAgGCgagaggatcatactctcacacagaaagagtcaca
ttattatatcctctaaaaaaccaaactaaaACGACACGTGaagtcttgatcagccgataaatagctaCCGA
CaTaaGGCAaaACtgatcgtaccatcaaatgtaatCCACGTGGTtttagatTACtCGTGGCAccACactcc
ctttagcctataaatataaaccattaagcccacatctcttctcatcatcactaaccaaaacacacttcaaa
aacgattttacaagaaataaatatctgaaaaa
score :330

18594_at I#F22L4.3#Atlg01470 hypothetical protein;contains
similarity to l-phosphatidylinositol-4-phosphate 5-kinase(AtPIP5Kl)
GI:3702691 from (Arabidopsis thaliana)ICOORDS: 172826 172295
tgggcttatggcctgtggcccatttaagtttgaccttaataccatcagaaacccataccaatACGgGTCa
TaagtgcattgtccatatagtgtACCGCaaTGCaTGGGAGtactactacattccaCCGACGTGcatATGGg
GGGaCatgGTCGGtCcattaaTCCCttGCgcatcttattccttacgcgaatacttttccctcttttagttt
tgtataatactttccatttttttggtattaaatttatcgtgtgactatctaaaaaggtatttaattatcta
gtttagctataactgcaaacaaaatatctattttcatgaatgatatatttttgaaataaaatattgttgtc
atGTAACGaaaaaatcaaaaatcaagcagAaGACACGTATaacCCACGctTtcactctcACtCGTGGaAAA
ccCtcACACGTAcaCaaaccattataaaattatatccatttacatcGaCCGACtTcataagaacatcgtaa
tcccatccgtttatatatatattccaagttaaacttcatatcatacacacaaacctaaaacaccgaaacaa
aaacaaagagatttaaacaagaagagttatta
score :216

12319_atl #T10P11.10#At4g02630 putative serine threonine protein
kinase;ICOORDS: 1150683 1152161
attgcgatgagggcagagttgcgagcagggatcgtagctctccCAGcCaTAtttgattcaactacacacga
taaacaattccagaacaaaaatcaatccagcgaagaagacaattatcgaaattgggtaaaaaccctgatca
attcctgatcttagattcgaagataaagcaaaagctcatagattttactcattacctctttttctctgttc
ttctctccttcttcgatttgatttgatagatagatttgtgtaaatcacaaccaaagaccacgcgcaaaaaa
aaaacttgttcAgaACACGatTCaattgacaatattaCcCCTGaCtatacctttaataacgatactaccct
cctttaataattgctcatTAaCaGTAaaaGTGGCAaAAgtgtaaatatatactaaaaataatgatgagtgc
aagtcccatcaaagccagcttgcctcttcgaGggTACGTttgtcttttcgacatctcttttcttgtctcta
cttcagaaaaacctcctgtgatgattattatatgtgaagcagaacaaaagcaaaaagagagagtcagagga
agaagaagaagaagaagagtcagtgagctgca
score :54

14832_atl #F9D16.70#At4g23600 tyrosine transaminase like
protein;tyrosine transaminase (EC 2.6.1.5) rat, EMBL:X027411COORDS:
11275155 11277383
aataaaatttaGTATtaACtattacgtcacattaaaatatacttgttttaaaattaaatataaatttatgc
AtGACACGGATAaccttgtaaagataataacaACGaGTAaatcatgaaatcgtaCGTTACtatttatattt
taaaatgGGCAaaACataacaaaaaacCaTGGGtGgttttcctcctctcGTCGGtCgaagacagagacaaa
aaatatttaaatacgtaagaatctttgcttcttcttgatgctgatgagtgatgagtaatcgacagagttat
tttgtttcttttgtataataaaggagattgcgactctacaagagagggctaatatagcatgcattaagatc
aatcatacagtttacttaaacaagatttaagatcaccgaaggctcgagattcgagatacattattttcttt
ttgtcttctatgttgatgttaaaTaCCCACGTGagatatcaaatgtttagaacaatgtgaaaaccatTGGC
AaaAaaatgaaaagtggttgggaattttcactatatattcgacttCGTaGCAatgagagttagtcacccaa
aaacaaaatctaatagagaaaaataagtagat
score :150








83


18935_atll#T26J12.4#Atlg23200 putative pectinesterase;similar to
GB:AAB57669, location of EST gb Z35063 and gblZ350621COORDS: 8227234 -
8229398
gaattatattaactgaaatagttaaatttttgctaattgttatactatttcaaatcacattttctgtACGT
GGattgttcttttccttatgatatgttatctttcatttggagtctcagactttctGACAtGTGGCAttACg
ggaaaggtgtgtgtctctcaaacttcttACACATGTGGcaggccattagccaaatctcatttgagtatcta
tactaatttgattattagcttgagttcctttCCtACGTagttgttttaatttacataaagctctaaattat
gttgtgttcaatgtttatgcaataagttcgaaccaatggccattgctaacttggattctataacatataat
ttgttcttaatttttatttttttgttcttaattttcgttttcatatgtttatatcaaacataaatcagacg
ctttGACGTGGCtGCATACGTGTATACattttcatttatgagattgaaagagagtatccatACAttTGTta
cttatacatgcatttatatataactctctacttccctaagaaaacaaTaTCCCACatctctactcatccac
attagtcaaagatctctattggtacttctcaa
score :180

15476_at I#F2G1.17#At2g21560 unknown protein;ICOORDS: 9179844 -
9179020
gttatttacaagacaACAaGaGTtatggtaatgatttttgcttgttaaaatataagctttatattactaaa
ctattatgattttgctttgatgatacaaatcatatatcttacatataaacGTaaATACaatcttaaaaata
caacaagacaatggagtctatggagtcaaatgaaaatataaggacacaaaactatgatttttattagattt
tcagttcaagtagattcataatttaagggtgaaagatgaattaatttttctgtttttgtcaaataagaaca
taaTAaCtGTAatccaattgaaaaaagaaatggttCGTGGaAAAagaaagcaaagcactcagtgactcttt
ataaatactctcattgcaaagttacattctttcaagctaacaaaaaacacacaaacaaagagagagagaaa
aacagacacacactcttctctgtttttttttgctccacaaacttaaaatcaaattctcctttttctctaca
acatctttgtttgatcttcaaagaagatttaagcttgaagttacaaagcttctctccttcagatcctactg
ttttaagttaatctttttcccttcaacgccaa
score :36

15776_atIl#T14C9.150#At5g25610 dehydration-induced protein
RD22;ICOORDS: 8768168 8765982
aaataactcgaaaatatctgaactaagttagtagttttaaaatataatcccggtttggaccgGGCAGTATg
tACttcaatacttgtgggttttgacgattttggatcggattgGGCGGGCCAgccagattgatctattacaa
atttCACCTGTCaacgctaactccgaacttaatcaaagattttgagctaaggaaaactaatcagtgatcac
ccaaagaaaacATtCGTGaataattgtttgctttcCaTGGCAGcAaaacaaataggacccaaataggaatg
tcaaaaaaaagaAaGACACGaaacgaagtagtataacgtaacacacaaaaataaactagagatattaaaAA
CACaTGTCcaCACaTGGatacaagagcatttaaggagcagaaGgCACGTagtggttagaaggtatgtgata
taattaatcggcccaaatagattggtaagtagTAgCCGTctatatcatccatactcatcataacttcaacc
tcagctcctttctactaaaacccttttactaataaattctACGTACACGTacCacttcttctcctcaaattc
atcaaacccatttctattccaactcccaaaaa
score :180

15629 s atl #FllA6.8#Atlgl7740 unknown protein;ICOORDS: 6101158 -
6104980
aagtcttgttctttcatgtttcttttcatatACACtTGTtaatgaataaagttgaaaaaATtCGTGaggaa
aagtccgaatctatcttctctgcaaaaataaaattaagtaaaagaaacataaagttatggtttaaagatgc
tttcggaaactccatcattttaGTGGGAgAatcacataaaagattggtcgtcaattatttttcttttatta
ataaggatgcacatgatgatgaatatTAaCCGTAatatactaatatagatgacaatgtatttcagtgctaa
attatagtaacaattaattgaatgcttaaatgaatcttttattatatactactttatatgtgttttttttg
ttttaattgagacaatgatgatgcacctagacaagaacgtcctactctctgtgCAtGTGcTtctcatttta
tttactcttttaccaaacaaactgttcaacttaatacataatgttggtagcccacaatgtgaaagattaaa
ttaaaactcatccattacctcctgtaagctctatattaaccactttagctcttctGCacACACtcacaaca
accaatcatatttattattaaaaaaaagagtc
score :60

16141_s_at|l #F19C14.3#Atlg58360 amino acid permease I;identical to
amino acid permease I GI:22641 from (Arabidopsis thaliana)ICOORDS:
20949311 20953001








84


ttagaacactataaattagttttacaagttcttagaaatgTAtCtGTAaatttcaaaaaggaaaaatatag
catttaattttgaagatttttttctacattatatatatgataaaaatattgtattttgtactttgtagtta
caaaaagtcattatatcaacaaatctaaatataaaatatttttctatatattactccaaattaactgtcag
aataaaaaagaagaataattattacagaatctgaacattaaaatcGtCCCtCCATatGtGGTCtCtgtcta
gtccaaaagcaatttacacaTCCCaaGCcgaaactatattaaataaacatttttttttctttaactaaaac
atttataacatttaacaataaaagttaaaaatCGaACACGTATaacgtatcatttattttttACGTATACGTcTtgt
tggcatatatgcttaaaaacttcattacatacatatacaagtatgtctatatatatgatattatGCaaACA
Caaatctgttgactataattagacttcttcatttactctctctctgacttaaaacatttattttatcttct
tcttgttctctctttctctttctctcatcact
score :114

15625_at I#F22G10.9#Atlg53580 glyoxalase II, putative;similar to
GI:1644427 from (Arabidopsis thaliana)ICOORDS: 19265928 19264226
agttttttaacaattatctcatgGATtCGTGGtATACCGTTACttaataacaattataaactgtaaaatat
aaatatttaataaaaataaaatttgcaagttttaatatatattacttttaaaaaataaatcgtcccgcgat
atACCGCGGTTAaaatctagtttgatatttatgaagttgtactcaaactcaaaggtcaaaaccaaatgcta
taatttcatctttgtaatggacctacaaatagaatcataacgaagatcctgactgttgacctaAacCGTGG
agtccacCAGaCaTAtgacttgttGTCGGCAacACtttggtttttagttaagattaagcccaatagcccaa
tataagcccattaGTGTagGCcctattaagtccatctacactcgaaaacttcatcatcatcttcaaggttt
aagattggcttcatgaatctccaatcactgttctcagatgattgtttaatcttatctgataaaaatcattt
gcttccataattgttactGAatCGTGcagaactgatcaatCACGgGTCCCACCacattgtttgtttcaaga
agctattaACGTCGGCtgattccaaatggtga
score :162

19987_atl #F6F22.16#At2gl9810 putative CCCH-type zinc finger
protein;ICOORDS: 8498968 8500047
aatatttttcacaataatttttacaaaattgacatcaatgaaataactaaataattttactactattttaa
atagcaaaataaACGGtTAtgagtgattttatctaaaaatgttagattatgtgttcccttaatttctaatt
aattaaaaatataaatttttagttttgtcgtttatcagcgtatctaaaagttttcttccaaTcTtCCACtg
ccttcggaactcaaagaaaattaaaatcaaatttttagtaatataacagccgctcactCACGgGGctgtca
cctaaaagtctatcacaaatcatCCACGTCaattaaatatccttgtggaaaggtctcaagccgctgtcctt
aacctctgcatcaCCACACGTCACtgtagcttctcactttcgatttTAACCGCGGTtgataaaagcagaa
atatttaaccatggttaggttttaacccaaaggtggtttaagtaTaTCCCACtactcagcatcagtttata
aagaagtaaagctttgccggagatttaGCaaACACaaacacaaaaaacaaaaccaactcagacgaatgtga
ttttttcttcttgagtgaattgttgtaaaaaa
score :108

14025_s_at118909_s_at|18908 i at I#F3L12.2#At2g04160 hypothetical
protein;ICOORDS: 1406689 1400446
caggtcataaattatcagtttTtCTGCCAaatatagcatcaatagaaactccattatTGGgaCGCaatGtG
ACCACtTGagcatcttatcatttttatttcaaaaccaccttattacactgagcttcccttttttcagcaac
aaattaatgtcaaatcaattattaaaaatacaatataaattctgggagaataacaagtatttattagtcat
tacacataaaaaaccgtattttggtgtttcgtagaaagtgatatcatactagtctataatgtgtttcacaa
aaaaaaaaaaaaaacttatatcaaaatcaaaatttcatgttcttgaaaagatttcaatatatttttttttt
ttgtcaatcgatattaatatagactcaaacgaccctaccccaaaagatattgtaatttcaatattttatta
acataaaagatCGCACACGGtacaaccaaaatagttttcgaaaccaataaaaaaggaaataaaaaccgatt
ctttttttttgcaatcccattataaaggagcaacgcacaagccacaagtaaaaagtgaaagtccaacaaag
atacaaagaagaagagagaagaagaagaagaa
score :60

14720 s at I#F14D7.2#Atlg35720 Ca2+-dependent membrane-binding
protein annexin;idenctical to GB:AAD34236 from (Arabidopsis
thaliana)ICOORDS: 13226651 13228286
aaaaaaaaaaactgttaaaagcatttttagataatggtcattgtgttactcctcacatatgaacattcaaa
taaagttttggatactgtcTAtCtGTAtatcgccagaattagtaagagatttcttatgcatagtaggagat
taaaaaaaaaaatgcatagtaggagatttgtaaagaaagaaataagtttttttttgtataatgagttctaa
tggaCAtCACGTtTGCAaGTGaTaataaaaaaatcttacACtCtTGTactctGTAACGTGTtaaaataata








85


caaaatggattataataacgaaatctaaacacttataatcttgcggatgattttgtgacaaccaatgaaac
cgttaatgcgagattacgatagttttttatgaaatcatgatttgtgcataaagttaagcaaatgctaaata
attcataaagtgtaagaAACACGTGGCAGAAgaattaataattcgtttgggatatttttgGTATATACgga
caGGCCACGTCGTGTCcTaaacctcttagcctttccctttataagtcaatcttGTGTCGGCttcgactCCC
aaCATacacaaaacactaaaagtagaagaaaa
score :294

15672 s at I#F14M13.13#At2g22470 unknown protein;ICOORDS: 9487342 -
9486947
taacagttgaaaattttgatagacatactatatatgaatatgaacttaaataatgacccatttttcgtata
atgttaattattTACtCGTaaacgcgttaTTtCCACGaaacattaGGCAaaACtcaagttaattTACGCCT
GGCAttGTAACGCGGTTAaccaaaaagcaaattacgcagagtcaaatcatatctaaaaaccaatataaacA
tAACACGTGTCaATACttaactgatctcagaattaacatcgttaaaggAaAACACGTGGCAGAgatctgtg
TAtCCGTtTGgtgctccttcatgtagatgattcttcaagaaaacttcaaaaactCaaACACGTCaagttta
agaaagaaaaaagacaacaattattttaaACCGCcaTtgaaaagctaagccatgttgtatttttgtatgTG
GttCGCatgattagtgtcacaccaataattaattattaactatttcccaacCAtCgCGTatatatagagct
ctcttctctcattgttctacaccatcaacaaaaataaagaagagtttataacattaaacagagagagtttc
aagattcagacaaagaagagataatctaaaaa
score :426

19060_atll#F1707.17#Atlg70300 potassium transporter,
putative;similar to potassium transporter GI:2654088 from (Arabidopsis
thaliana)ICOORDS: 25692154 25689404
ctttttctttctttctcaaaccattctctgtttctcaactcttcactttccagtggtaagatttgaatcat
gggtttctcaaattcttaacttttcacaaaagCCCacCATggaaatcgaatcaggaagttatcaaaacgcc
aaggtaagtcatcatcagatttgatcacaaagttcttacctttacgaaaactctcttctcTTtCCtCGtaa
tcatcgatatctgattttgttgattgcagaaggaatcatggagaacaGTATtaACattagcgtatcaaagc
ttaggtgttgtatacggagacttgagtatttcacctctatatgtatacaaaagcacattcgctgaagacat
tcatcactctgaatccaacgaagagatctttggtgttttgtctttcatcttttggacaatcACtCtTGTcc
ctcttcttaaAtACGTtTtCatagttcttagagcagatgataatggtgaaggaggcacttttgctctttac
tcgcttttatgtCGACACGCACGaGTCAACtCGTTACcgagttgtcaattggctgatgagcagcttattga
gtaTAaGaCTGattcaattggttcgtcttcga
score :108

13176_atl #T16L1.30#At4g33540 putative protein;ICOORDS: 15095371 -
15098167
atcgtctccgccaaactcctcatcaatctcagacccatcgcctccactgctttctTCTtCCACGTGaaaca
tcaatcaccgttggaaaacactgaagatctcgagattgtgattcagattcgtatctctgatccaaggaaac
aggattggaattggtgtttttgagagattgagagatggaagagagagattgatctacatacactggagagg
aCcTGGCACGAATgagaaagaaGCtTACACGTGTCCaatcatgattggattcgagactcacggtttaagga
aaaacaaaccagaccaaattaggcttaaccgctaaaaaaccgggttctcgttttgaaagattgagagagac
gatctacaaaggaggacAgGACcCGgCACGaATgagaagaaGCtTACACGTGTCCaatcaggattgaacga
tttaatcaagctTAaCCGTATGtaaaccggattttagctgggtccacaagtagtcaaatatagatttttta
atagtcaaataattttcataggggcgaagttcaagatgagttactacactcatcaaagctcacaaaaagag
aagagaagagacgaggatcaatcaccattctc
score :486

19673_g_at|19672_atl #F1I21.18#Atlg43160 AP2 domain containing
protein, putative;similar to AP2 domain containing protein GI:2281637
from (Arabidopsis thaliana)ICOORDS: 15592159 15592833
aaaGaGACCaCcgacgaatcattttgggttcacaaaattgtacttcgatttcTAaGcCTGaatgtgaacgC
ACGTtTttgaatatttCAACACGTGTTtcaatatttcattacatgcattataacataaatattacatctttg
agtctttaactagttgaccaacaaaaaaaaaaactttaactaagtctagctagttttgttactacatatat
aaaaacaaaaccgaaataaatatttaaaatttataatatatttgtgtggctaaatcaatcaACGTGTCaTg
aaggtctaattcaagttggtaaggaaatcttttgtttatgtccaTTTCCCACGTGTCactatttgtatgAC
GGcTAgagaaagacatgttgaattaactagtgactccggattatataagcaagcatctactaaaaagatag
gaacaacacaatttgattacactgagcaacacaaaactgGCGaaCCAACGTGaCtctaacgaagaaaccgg











caatggccagtatcactacaatgccgaagaaataacaagaatcataaacgagccagaatattatcccccgg
gttacaacttgTcTaCCACCGCaaTttcaaac
score :384

20390 s atll#T23015.3#At2g04350 putative acyl-CoA
synthetase;ICOORDS: 1515081 1518173
ggttatgggtGTAACGttttaatatttaaaatcgcaaaattggaACCGCttTtttaaactgagtgaaatat
agcgttttgaaaaaAgACGTtTCtggtggcttctgaattgaaaaatcagcgtttttacttttcacaatatt
caattggagaatctttagaacttcaacgaagtagcgtctccggaatttcagGtAaACGTGGCAAgAaaaag
cCAaACGGagaagcatCCACGTGTCCCtCCATtagccaTCTCCCACGTGTCCattaccgtaccgacgaaac
attccttaaccaaaggcttcccgaaaaaggacagagtccacaagcttcggattgttcacaacttcaaagca
tcgaaaaggtttaaactttacgtttttctctggaacccgattcgaagaaatcatcaagaagtttcctttca
ctgatttctcaatctattcgtaggtaacccaattccaaaAaACGTtTCttttttctcgatgccttagcttc
tttggattctgttgtgatctaccgattttgaatttgcttgatcgatcatcggatcactgtgattagtgtta
ttgttcctttcctttgtgtagctttcgcgctt
score :414

17962_atl #F7D19.21#At2g42790 putative citrate synthase;|COORDS:
17754533 17751674
CCGTtTGtattttccatcttttgattaatgataactataatttatgaagaacatccataaactcatgagtt
ttattttgaacgcaacatccatatttagctttacattcatttttttaataagatttCAGtCaTAgttcgaa
aaatgaATGttGGGttggttccgatggtaaagcgagccGTATcaACaaaaaaggcccaaataaaaaGTGGt
AtActgttgggcctaaagatgaatatacagaagcccatattaagacaattcggcccattcagcattgCGTa
GCAatcgaattgatatttagcaaatCGaGGaAAgcaattgactaactagataaTACtGaTAagtgggTCCC
ACataattctgacagcttagagatggagaagatcatcaccgttagtccaagttgggtcaataaggcttacg
ccgttggatatcaagaTAtCCGTtagatcGCaTACACGTGGATACtgttcaataCtCCTGaCcGCagGGGA
tAaGACACGTGTTaTattaacgaagagtaagaatggtttctaatattcttcgctccctttcttcttctt
ctctccttctttgaattttgattttggttgaa
score :348

19884_atIl#F2K11.25#Atlg63370 flavin-binding monooxygenase,
putative;contains Pfam profile: PF00743 Flavin-binding monooxygenase-
likelCOORDS: 22716779 22714626
atctttccaaaggttTCCCaaGCttctttaACCGCttTtcgaataGTGTctGCttctgatGAaACGTcTag
ctCAaGTGCTACGgcttggactccgatagcaccaaagctgttgatttcagaacagagagagttgagacgat
CaACACGACGaGcTGctgctacaatcttacaaccagctttgcatagatcaagacagatctctctccctata
ccagaggaAgCACCTGTCacaagaaccactttatctttaagttcacaccatggctccaattgcttcaatac
ctgcttctgtccacaaaaaaaacaaacaatcatcaagaaagtggaagggcaaaatgagagaaaagaaatct
cacagtttgatgattgctcatattgattgatgtttcactgtctcggtggagaagaaatacaatgACACaaG
TtatataataaaagcatttctctctatcaaccaAaACGTGGCAAtcTAcGtCTGtaataacaGATtCGTGT
aaGtttctaacacaccaaattgattattaaaaacattgtaatcataaattgtaACAaGTGTgtgtgattgt
gattcatagattctaatcaaagaacacacaaa
score :186

13036_atl #F23E12.140#At4g35300 putative sugar transporter
protein;sugar transporter, Arabidopsis thaliana, db xref
PID:gl4952731COORDS: 15763562 15760923
tcactaatcccataggcttctctctctgttcactgtccttccgcacaaaaaaaaaaagtattcatttttat
attctCACGGtTActtcgaaatctaGCtTtCGTtttctttttttttgatttaaaccaaaagggtaaacaat
gctttatgattgcttagtgacgccattgaagcagcggctcactgatagtggcttcTtCTGCCtccaattcc
aagttgctaaccctagttttcatctctctctctgtatatcatattctctcgtactttcttactttctcttc
taCtTGGtAGagataaaaagtagtcaaattttattcaaagttttgattttgacatctgaggtttcacctct
gttttttttttctagaactcctttgggtatgtgtcttttacgtcactaaagggtttgttatagtcttaggt
catgctttttggtgttatctctctctgttttatcttttagctcaCCAaGTGaTtgtgtctgattttcgaat
tttCAgGTGGattttgtttcactcactcatcgtcgttgatcagtgactctgtttttgcgaaattctcttca
gatttcttgataaaagatagaagacggaatta
score :48








87




20471_at I#F2G19.32#Atlg46768 AP2 domain containing protein
RAP2.1;similar to AP2 domain containing protein RAP2.1 GI:2281627 from
(Arabidopsis thaliana)ICOORDS: 16537949 16537515
ttttaaCCCatCATtaacatctctacttgaatcaaattTAaCCGaGaagtactggaccaaaaccaaACCAC
tTGattaataggagagcatctaactcaagataacattttgtattagtttatgtggtcctgtgataccaaca
ttcaagtctctgtcataccAaaACACGGGTAtttgtcttaataagcaacttaggaataatatatgaaacaa
tcatcatctatgaaaataaaatatttatcaagagtttgaattattgaatcaaatatttacaaaggttgatc
attttctaaaagttcttgatttgtatataaactgagttttcgtgattgtttttataatttttgtaagccat
ctgatctcgatgaatcttcagtattcaatttcaGTCGGtCtcctaaccatccatgacaaattcagtgataa
tatagtaaatagataacaagtGAgACGTacgaggttacttggctatttctacaaaaccaatatttAtGTCG
GCCatagttctatattaaatattgtcaccatacttaataaacttccattGCGccCCAaaagtttgtctttt
atcaatggaaagagaacaagaagagtctacga
score :120

12443_at I#T4L20.60#At4g34480 putative protein (fragment);ICOORDS:
15448478 15446450
acttatcatggatagaaaaagttgcaatcctgtttttggaatatgcagttatttatttaacctttttcttt
ggttttgttttgtatatgatGtGACCaCagtcaacataataattgtgagttgtacattgtggaatcACGaA
aGCtaaaattattggtggttgtgggtgaaactttgaagaatgtttaacttggactactgagaaatagcatg
gaaacgaacaaatggtGTATttACacactgaaatatgtgttttgcgatactaatcatcctatggaatacct
catgcttcaaaagcaaactcaaaaaacaaatgctctcttttcgattctcctaaagatttcctctccagaaa
tagcacttttcagtttccaaattagatgacgagaaaaatcaaaagtataaaattacaaaataaagaccgaa
acgagaacttaagctcaaGACttGTGaacagaagtCAACACGTGTAGTGTGGaAaAacaagcacaAGACGT
GTCtTtggagtttgtataTAaCCGTTACagtaaaaaacctcttctttaagtaacaacagcaacaacaacac
aattcctaaatctctctctctctcaccatcaa
score :270

15088_at I#F19G10.19#Atlg23040 unknown protein;Location of EST gb
AA395277 gb T448071COORDS: 8165023 8165457
ataaccattttttcaagataaaccatttagtacaacaaatatagtatgaaaattacaTACaGtTAtattag
tggaccactataatatacctctaaaacatgtataaattaatgtatagtttaagccttttgttgtggtggac
aacgGTGGGAttttaatcgctaattaatgattctacacaaaaaaaaataaaaaattaatggcgataaaaac
caaaatagttcATGagGGGatccacctaacatgat aaatacgaccaattagtgGaCCGACgACGaGTAca
gtgctccttaactaaccttcttcctaatttatcatttacgttcaGACtaGTGTATACttttatttaagtat
aatttcgtttaaaaaaatgactagaagcttacagaaaaattgttatatacacaatacactatttaataagt
ctatgaCTCCCAtGCacttgacttacGTAACGaGaacgcattcatcatcacctctcaattcattttCACGT
CTCCtcttaattttcaactctttaagctcaccatatatacaaaactctgtttttgattttctctttgatca
aaatcaaaaaacgccgatcacaagttcacaac
score :114

14990_atl #F6P23.2#At2gl6990 putative tetracycline transporter
protein;ICOORDS: 7331746 7336335
ttgaagtgctttatttgtaactgatccttattttcatgcatgtgagtgatatcaagttgtactttgttgtt
ttgaaatgcctagaaatgacaaAagCGTGTGGGAatggacataatcgtttttacaaaattGTGGGAatgga
catgatccttatttatggtttgtacgcgACGGaTAaatttttatttaattACGTAatCgtctggatttata
tatctttaatgcagattgaaataTGGCAGtACcaaattttttgcatagtcttaacttcaaatcgatcgtag
ttattacaaaatgtcacataattcgtaatgtgtaattaaaaggtactccctacacaagagatggtgactaa
tgatcgaGTCGGCCagaaaataccaaaaaagtcctcaaaagtctgatgtgatCTCCCAtGCttcaaccgaa
tttttggaataaagaatagattggaaaaataTcTgCCACattgtagtattttcaagtaagctttctttgtt
tgttcaataatcaaaagcaaaaacaaaaacaaaactctcaacactttttgtgtttattttctgttattttg
aatcgAaGTCGGCaagcataggctaaaaggtg
score :114

15058_s_at113242_atll#F20D10.100#At4g37980 cinnamyl-alcohol
dehydrogenase ELI3-1;ICOORDS: 16817150 16818782








88


tagacttgccaaaggggatgttagatatcgatttgtcattgacatttctaatacattggctgctactcgat
cttaattaaagtcgatgttctatatgtattcaaaataatctggatttcaaTCCCACaaaacttaaggatat
atatatatatatatatatagtctattttatataaatggagtatagtcaaataaatatgcattatcaacgat
atatagtcttctattacataGATACGTGGGAGttcacCCaACGTaGATACGTtcggttgaaacaagtcaat
ttcatcaatgcctcttccaaaaaaaaaacaaattgcattattgatgAaCACaTGcatcattatcaaatagg
ttggttaaaatgaccaagatgactaaagccaatcacactactaccagatcgagtaaccattagggaccatt
aattCACGTGGaCgtagtgaatatggtccttgtgaattaatGagTACGTaattgtcctcattcatatatgg
atcggttccacaaacatttcctgtataaaattctacatctttcctctcattattatctctacacttctcat
ctctcaatcccattcgcttattatttatccat
score :150

16524 atl #F15I1.19#Atlg54100 aldehyde dehydrogenase homolog,
putative;similar to aldehyde dehydrogenase homolog GI:913941 from
(Brassica napus)|COORDS: 19471537 19468119
attgttggtaatcgtttagtggacgagattgaatcaaaggttCAaGTGGTaatcgttttCtCCTGaCgcaa
aatcgaaagaaaaaagatcggtagCGTcGCAtcctaatcgggtgacccggaaaccaatagttgattcgttt
tagTGGCgGtAaaacccggtttgatgaacaaatattaatgggcctggcccatacgaggatatCGTGGCAA
tgtcgatggtaacaacaactcctctattcgggtttatgttgacccggaaaACGaAaGCatAGGACACGTGA
CACATGTGaTgtgagtgaagccaaaaataataatattgggaaaggatgaacacagcagctcaGCtTtCGTc
ttctccgtcaatccaataaaaaaatcagcaaccgttgtttgtttttaagctttttttacaaaagACGTACA
CGTCTCtctctctcactccctctttaagatcagaagctcatttcttcgatacgatcaaccattaggtgatt
tttttctctgatcttcgagttctgataattgctcttttttctctggctttgttatcgataatttctctgga
ttttctttctggggtgaatttttgcgcagaga
score :306

12767_at I#F21P24.18#At2g23120 unknown protein;|COORDS: 9790649 -
9790900
ataTcTtCCACaattaattataaatgccgctcctctgattttctcaccaattaaagaagaatcactaaatt
aacatgttctagtctcaaaaaaaaaaaacatgtaatagtctcaaggatgttctaaatgaaaaggctcaata
aaaccaaataatgaatcatcacaaaagctatgtcaacgaagacaaaatgcaagtgtggatatttttcagaa
actgcgtaaaacaccgtgaaaatcattcttcttttttttttttttggtcaaactctctacagagtttat
CAaGTGaTtttTACCGGTAcctgtttttccttgtttggttcaactcaattattcgccttaaacccacttta
agaaaagcccactgaaaagcctactttaaaattcaaccaaaaggcccataagtactacaGCCGACaTcagc
agacaacaaGCCACGTCgatctcgaggaaccgcgtaattaACGTGTCGGaAtcACACGaGccacCGTGTac
TagatagctccataaaaGCagACACatctcatgaagcagaacctaaaacctttaataGCAaACGaaaaata
aaaaagaagaaactaacaaaacaaaaaaaaaa
score :138

17842 i at117843 s atl #T20D16.15#At2g23220 putative cytochrome
P450;ICOORDS: 9833097 9835299
cgtagagatttgtaatctatttcattcattgtctaacatttactagaatattaacagattttctattagta
ttatttatacatatctcacacaaataataaaaataaaaaagagatcacatttaaattgataacatatgcaa
ttaaggaatgcactttgttgtttttcaatcaattTACGGGTAaaactaactctgatactaaaaatctctgc
ttcaataaaataataaaatatagtacatgtttttgttacgaaacaagtttcggattctgaatagtgatga
tacaataaata ttgtagttaactacaatgtatttatttaatcattattTTaCCcCGcacattttgacctctc
attttttttGTgtTGCCtattttcctaacttagctgaattcttgaactcaaagatcttgaatagctctatg
ctactgtgctggccaatctgaatttattaataatgtttgatatatAtTACACGTGTTtaattGCatGGGA
CGTGaatatatctttaGTAACGgGTGGGAtAaactccatgacaataccaaaatttgaactctcgcGCtTAC
GTtataaactaagctataacttctaacccccc
score :228

19442_at I#MVI11.13#At3gl9100 CDPK-related kinase;identical to
GB:2AAD38059 from (Arabidopsis thaliana)|COORDS: 6605677 6608976
tttataacctccctagttaattagtaataaaatatttaatttagtagtatcacaagatggagctttaaaag
ttatctttaatTCCCACttttgaaaatgataacgtccataaatacatttcaAaACGTtTttattctcccaa
taagatatcaaaaagacaatatcaacaacaaaaaagacaagaaaaaaagacaaaaaataatgatagataaa
atacaaaagcaaaaGaAaACGTaTagtattgcttagagaaagaacaaacatattgtgaattaaaaagaaaa








89


aaaaagagggacaaaagtagaaAGACGTGGCAAAAagcaaactaaatttcgaccaatccacatctaaaatc
taaccttcttcatcagatggaaggaagctacttcaccacccgcagaTtCTGCCAtGTGGcttctctctctt
ctacagctcacttcttctctctctgtctctctctaactacattgaaaccaaaccagagtgcttcaatgtca
ttgctgtgattctctttaagtcCAGaCaTAactcccatatcttcttcttctgtctttttttatttctttga
acggagaagaagaagaagaaggagaaggatat
score :150

20420_atll#T16H5.170#At4gl9810 putative chitinase;chitinase (EC
3.2.1.14) lysozyme (EC 3.2.1.17) PZ precursor, pathogenesis-related,
common tobacco, PIR2:S515911COORDS: 9730250 9728648
acccatacttgaagatgtgattcttcaaaaactaaatagtatcaaaattgtttgaagaaaaatatattatc
aaaattggtaaaaaacatattaaaaaaaTGGCAGtACtctgtagacagagactctccacaaccattattaa
cCCAACACGTCACatgcatggaaCtTGGCAGcAaccatacactttcactaatgagtcatTACCGCTATtag
tagccattggtttcttgttagacaagaaatcatataatatacgattaaaattgaatataaacaaaatcttt
cgtaagcaattggatattctataGGATACGTGaGCtgaatattcaactttgatataaatgaagttgaatat
gtgtctcttggtccttgctttcgcattcatattacACACtTGTGTgttgacaaaataaaatgcaaaatttt
gccccaactaagactttttctcgtttagcaaaaaacaaaaactaagacttttctctttcgtaagatattat
tcatagatttctcattttcaaagttgaattacgtaagaaagtaaatatactttttcaataaatagacaaca
aaccatctctgtttttcacagcaactcgaaaa
score :192

17014 s at I#T17M13.16#At2g02990 ribonuclease, RNS1;identical to
GB:U052061COORDS: 872712 873665
tctatctttttttgtcatctgaaattattatcgctcaaacgaagtaattctgaggaaagttgtttacaaac
tagttatttcattattgtctacttatataatagaattaaaaaaaattattgcttaatgcaatttagtttta
gataaaatcattaaacttaatagattatataagttagatatcaataattgggcttgcttaaaaacataaat
ataaaatattattgggcCGTTACGTGcATACaaaacgaaccttctaacaaACAaGTGTgaaCGTTACgact
tcaaaattaaaaaaaaacacaacaactatgtCcACACGTAatCtcatatgattcagattccaaggagaaca
aaattaaaaacaaatctcgtaaacatacatacacttcacataaaacaaaaggtacaGTATATACcataaat
ctccgagattcttttgatgtatctgtccatttcattattacacaaactaggaaactgatatctctctattc
acattcctctgattctatttctctttatatatattcacccattaaccatctcaatcttataaccctcaaaa
tcacaatcttctcttacaaaaaactttgaaag
score :84

15214 s atl16505_at I#T3F17.4#At2g46270 G-box binding bZIP
transcription factor;identical to PIR:S208851COORDS: 18949398 -
18951440
ctctttttcgctctggtttttttaagagagagaaagatgaaaatgcgtttaattgctgtttaggtttcga
attcgcgatttaaatttctgggtttctctctgtttaagcttcttcttcttcatcttctGCtTACGTtTCtt
cttcaaggagctttcggattcttgtaggtattgcgtaacttaaatggttcaaatagagattgtttggtcga
atagttttgtgatttggtttctctatcattcattgatcttctaagttttgtgaaaaTttTGCCAtttttga
tCGTtTGCttttgcatttacttgaattgaatgaaaaattataagttttgaatttgaacaaaaacagctaaa
gcgtagagaagaacagagtttctttttttgggtgCAtGTGaTtgtttcaggggtagaattgtccagtagtg
tctactttgttgattccttttgtcttcctgaaagttttctgaacataggatgtgagagagaaatactattt
tacccttctctgtctctatatttctcacacaacaacacccatccccatgattttgcagaaagagtcattgt
tctcttgaGTGGGAaAccttgaaaccattcct
score :54

15110 s at I#T23E18.12#Atlg76180 dehydrin, putative;similar to
dehydrin GI:975646 from (Arabidopsis thaliana)ICOORDS: 27800307 -
27799663
tcttactctatttaaagtttgaatcagttttctttaacttgataaaccgatcattgaactacaaagtccac
ataacaagtattaaaacaaacaaaaaaacttatgaaagtctaatatttaaattaaaaacAgtACACGccTt
gaacaaaccatttagtttgttttcactacttttgaaaaaaaactatctagttgccattcttatccattatt
tttTaTtCCACtaaattgatatcaGatCACGTAaGCaaactattaacaattaattaatgaattcttcttaa
tccaatataactttatctaattaacagaaaaatgataattaACGTGGGgacgaaggagaagatgccataaa
aagCGgGTCaTtttgtaatttcatataattagatatatgtttttaaattgcaaaaaaaaagtgtacaaaga








90



gcgtgaaatccgcggtggcttggacccctCcCGaGTCcTcaacactatataacacactttcttctctaatc
tccatcacctcttattttcatctgcttttgaatttcaaaccttcacataaaaaaaatacttttgaATcCGT
GTttcattcatcgaatctttttccgataacta
score :84

13950_atll#dl4810c#At4g17550 putative protein;|COORDS: 8744238 -
8742562
cagcaaactctcagttactcggagaaaATGgaGGGaagacaGTGGaAaAgagccttatctgcttttcgctt
cctattttgGCACACGTATGCcatcgCACGTGTgaattttctttatgggCCGTtTGttattgtgggccttt
agtatatcaatattaatataaatagatagtactatgatttataaaaatatcaataaaagaGTCGGCtaatg
acaCAGtCtTAttatgattacataaacctataatttgGgCCCaCCACCACGTCGGtCGTGTgGGCcgtcgc
ttcatGCGtaCCAaaGtGGTCaCCGTGGCAtgACAacTGTtGTtCTGCCttttggctctaccgtaacaaat
AGGACACGTGTCGcGTCtTacacgtttatgtCGTGTtcTatctctctctgtctctctctctcTcCcGCCAa
tatctctctccctttggtctcttctctctaatcgctctctctcaatcaaattaATtCGTGattctgattta
tcaaacactaaaacaaatggatttgggtatttcattttgatccgtaaaggttgattcttttttttttttgt
tatcattgaaaattttgttttgttgtgagaga
score :552

18560_atl #T3P18.7#Atlg62510 similar to 14KD proline-rich protein
DC2.15 precursor (sp P14009); similar to ESTs emblZ17709 and
emblZ47685;similar to hybrid proline-rich protein GB:CAA59472
GI:4454097 from (Catharanthus roseus) COORDS: 22348500 22348051
aaaaaaaaagattcataattttattaaaaattatagctgaacttactggatgttttttcttaagaaaactg
gacaatatattCCAaGTGtTatatgtatCCAaatcttatagaaaaGAgtCGTGGGAAActaaaggtgcaaaa
cttcttatttacctttttaacaaaaaaaaaaacttcttatttaccataatatacaaaatggacaatatata
ttccaaattttgcttttgtcaaTtTgCCACcacagagttatctAgCACtTGaatttaatcatataaattgt
tggaagaatcttctcgaacccagtctagaatcagcactaacatgctctgtttacacaggacattcatatat
caaataacctcttttTAtCaGTAatcaaaaataattataaaccaaaaattcagaatttcaccgatgttttt
cttaaataatcacatatatgaggtcataaacattctcggcatgtccttcaaactactataaaagcaacttc
gaaCCCtcCATctttctccaacatcccaattcacacataccaaagaaaacagtactgtttgtttttgaaac
ctctcctctataaccaaaagtgagattaacaa
score :72