Citation
Directed Evolution of DNA Polymerases

Material Information

Title:
Directed Evolution of DNA Polymerases
Creator:
HAVEMANN, STEPHANIE ANN ( Author, Primary )
Copyright Date:
2008

Subjects

Subjects / Keywords:
Amino acids ( jstor )
Cell lines ( jstor )
DNA ( jstor )
Emulsions ( jstor )
Gels ( jstor )
Genetic mutation ( jstor )
Libraries ( jstor )
Nucleotides ( jstor )
Plasmids ( jstor )
Polymerase chain reaction ( jstor )

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Stephanie Ann Havemann. Permission granted to University of Florida to digitize and display this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Embargo Date:
7/12/2007
Resource Identifier:
659860320 ( OCLC )

Full Text





DIRECTED EVOLUTION OF DNA POLYMERASES


By

STEPHANIE ANN HAVEMANN













A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2007
































Copyright 2007

by

Stephanie Ann Havemann


































To my family and my husband. Without your constant and vigilant support, I would not be
where I am today. Thank you!









ACKNOWLEDGMENTS

I would like to begin by thanking my advisor, Dr. Steven Benner for all of his wisdom and

guidance; it has been an honor and a privilege to study under his tutelage. His passion for all

facets of science, and how they can be intertwined, should serve as inspiration to us all.

I would like to thank the rest of my committee: Dr. Tom Lyons, for always having an open

door and a receptive ear when I had questions; Dr. Nemat Keyhani, whose enthusiasm for

science was contagious and whose knowledge of microbial genetics was extremely valuable; Dr.

Nicole Horenstein, whose constant support and knowledge helped guide me throughout my

graduate career; and Dr. Rob Ferl, whose eagerness to learn and share information about various

aspects of astrobiology helped me determine the field of study I wish to pursue.

Special thanks go to Dr. Eric Gaucher, Dr. Ryan Shaw, and Dr. Nicole Leal, all of whom

have worked closely with me over the past few years and who have assisted me in various

experimental designs and implementations. Eric performed the rational design of the Taq

mutants and was my source of knowledge for all things dealing with evolutionary biology. Ryan

and I worked closely to discern the best method of creating and isolating DNA from oil-in-water

emulsions; his idea of changing the composition of the oil layer drastically improved our yields.

Nicole assisted me in performing some of my primer-extension assays and was a valuable source

of information and never-ending support.

I am extremely grateful to Dr. Daniel Hutter for the synthesis of the

2'-deoxypseudothymidine-5'-triphosphate, to Dr. Shuichi Hoshika for the synthesis of the

pseudothymidine precursor, and to Dr. Ajit Kamath for the synthesis and purification of the

pseudouridine-containing oligonucleotides. Special appreciation also goes to Dr. Michael

Thompson for providing the wt taq gene and his suggestions for the purification of the

polymerase, and to Gillian Robbins for assisting on the growth curve studies.









I am also thankful to Dr. Art Edison and Omjoy Ganesh for their

assistance in the circular dichroism experiments. Finally, I would like to thank all the members

of the Benner group for their advice and discussions over the years, and Romaine Hughes,

without whom our group would be in total chaos.












TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATIONS

ABSTRACT

CHAPTER

1 INTRODUCTION

   What are Nucleic Acids?
   Rules of Complementarity
   DNA Helical Conformations
   Central Dogma of Molecular Biology
   What is AEGIS?
   Use of AEGIS Components
   Problems with AEGIS Components
   C-Glycosides
   Pseudouridine
   Pseudothymidine
   DNA Polymerases
   General Structure of Polymerases
   Polymerase Families
   Taq Polymerase
   Directed Evolution
   Mutagenic Libraries
   Systems of Directed Evolution
      Phage display
      Compartmentalized self-replication
   Research Overview

2 POLYMERASE INCORPORATION OF MULTIPLE C-GLYCOSIDES INTO DNA:
  PSEUDOTHYMIDINE AS A COMPONENT OF AN ALTERNATIVE GENETIC SYSTEM

   Introduction
   Materials and Methods
   Synthesis of Triphosphates and Oligonucleotides
   Circular Dichroism
   Standing Start Primer-Extension Assays
      Polymerase screen primer-extension assays
      Taq polymerase primer-extension assays
   Results
   Circular Dichroism
   Polymerase Screen Primer-Extension Assays
   Taq Polymerase Primer-Extension Assays
   Discussion

3 CREATION OF A RATIONALLY DESIGNED MUTAGENIC LIBRARY AND
  SELECTION OF THERMOSTABLE POLYMERASES USING WATER-IN-OIL EMULSIONS

   Introduction
   Materials and Methods
   DNA Sequencing and Analysis
   Construction of Plasmids
      Construction of pSW1
      Rationally designed mutagenic library (RD Library) creation
   Growth Curves and Cell Counts
   Purification of His(6)-wt Taq Polymerase
   Incorporation of dψUTP by RD Library
   Selection of Thermostable Mutants Using Water-In-Oil Emulsions
      Water-in-oil emulsions
      Re-cloning of selected mutants
   Results
   Growth Curves and Cell Counts
   Purification of His(6)-wt Taq Polymerase
   Incorporation of dψUTP by RD Library
   Selection and Identification of Thermostable Mutants Using Water-In-Oil Emulsions
   Discussion

4 DISTRIBUTION OF THERMOSTABILITY IN POLYMERASE MUTATION SPACE

   Introduction
   Materials and Methods
   DNA Sequencing and Analysis
   Bacterial Growth Conditions and Strains
   Synthesis of Triphosphates and Oligonucleotides
   Random Mutagenic Library (L4 Library) Creation
   Incorporation of dNTPs by RD and L4 Libraries at Various Temperatures
   Incorporation of dψUNTPs by RD Library at Optimal Temperatures
   Incorporation of dψUTP and dψTTP by co-Taq Polymerase at Various Melting Temperatures
   Results
   Random Mutagenic Library (L4 Library) Creation
   Incorporation of dNTPs by RD and L4 Libraries at Various Temperatures
   Incorporation of dψUNTPs by RD Library at Optimal Temperatures
   Incorporation of dψUTP and dψTTP by co-Taq Polymerase at Various Melting Temperatures
   Discussion

5 CONCLUSIONS

   DNA Helical Structure in the Presence of C-Glycosides
   Polymerase Screen for the Incorporation of C-glycosides
   Taq Polymerase Primer-Extension Assays
   Growth and Purification of Taq Polymerase
   Creation of co-Taq Polymerase Mutant Libraries
   Creation of the Rationally Designed Mutagenic Library (RD Library)
   Creation of the Random Mutagenic Library (L4 Library)
   Preliminary Studies of the Incorporation of dψUTP by the RD Library
   Incorporation of dNTPs by RD and L4 Libraries at Various Temperatures
   Incorporation of dψUTP by the RD Library at Optimal Temperatures
   Incorporation of dψUTP and dψTTP by co-Taq Polymerase at Various Temperatures
   Selection of Thermostable RD Mutants Using Water-In-Oil Emulsions
   Future Experimentation

APPENDIX

A SYNTHESIS OF PSEUDOTHYMIDINE AND PSEUDOTHYMIDINE-CONTAINING OLIGONUCLEOTIDES

B PHYLOGENETIC TREES OF FAMILY A POLYMERASES

C GENETIC CODE AND AMINO ACID ABBREVIATIONS

LIST OF REFERENCES

BIOGRAPHICAL SKETCH










LIST OF TABLES

Table

1-1 Comparison of the structural geometries of A, B, and Z-DNA forms.

1-2 Characteristics of the various polymerase families.

2-1 Oligonucleotides used in this study.

3-1 Oligonucleotides used in this study.

3-2 Rationally Designed (RD) Mutant Library.

3-3 Bacterial strains used in this study.

3-4 Incorporation of dψUTP at 94.0 °C by RD Library.

3-5 Mutations present after selection for active polymerases.

3-6 Breakdown of types of mutations present after selection.

4-1 Additional bacterial strains used in this study.

4-2 L4 Mutant Library.

4-3 Generation of full length PCR products from dNTPs by individual polymerases from
    the rationally designed (RD) Library at the indicated temperatures.

4-4 Generation of full length PCR products from dNTPs by individual polymerases from
    the randomly generated (L4) Library at the indicated temperatures.

4-5 Incorporation of dψUTP by RD Library at optimal temperatures.

4-6 Incorporation of dψUTP and dψTTP by co-Taq Polymerase at various temperatures.

C-1 The Genetic Code.

C-2 Amino acid abbreviations.











LIST OF FIGURES

Figure

1-1 The standard deoxyribonucleotides.

1-2 Puckering of the furanose ring of nucleosides into various envelope forms.

1-3 The central dogma of molecular biology.

1-4 The six hydrogen bond patterns in an artificially expanded genetic information
    system (AEGIS).

1-5 The Versant™ branched DNA assay.

1-6 An example of non-standard nucleobases coding for a non-standard amino acid.

1-7 Pseudouridine and pseudothymidine.

1-8 The polymerization reaction of deoxyribonucleotide triphosphates catalyzed by
    DNA polymerases.

1-9 Kinetic steps involved in the nucleotide incorporation pathway.

1-10 Locations of active site residues in Taq polymerase.

1-11 The staggered extension process (StEP) for rediversification of mutant libraries.

1-12 Phage display selection scheme.

1-13 General scheme for CSR.

2-1 A schematic representation of the CD spectra of A- and B-DNA forms.

2-2 The base pairing interactions between a standard A-T base pair and the non-standard
    ψT-A and ψU-A base pairs.

2-3 Representative CD Spectra.

2-4 Depiction of primer-extension assays used in the polymerase screen.

2-5 Family A polymerase screen.

2-6 Family B polymerase screen.

2-7 Incorporation of one to twelve consecutive dT, dψT, or dψU residues by Taq
    polymerase.

3-1 A phylogenetic tree of the Family A polymerases.

3-2 Locations of the 35 rationally designed (RD) sites in the Taq polymerase structure.

3-3 View of the pASK-IBA43plus plasmid.

3-4 View of the pSW1 plasmid.

3-5 View of the pSW2 plasmid.

3-6 Growth curves, cell counts, and expression of various E. coli TG-1 cell lines.

3-7 Purification and activity of His(6)-wt Taq polymerase.

3-8 Representative gels showing the amount of full-length PCR products generated with
    different dNTP/dψUNTP ratios and the indicated polymerases.

4-1 Epimerization of 2'-deoxypseudouridine.

4-2 Representative images of ethidium-bromide stained agarose gels resolving products
    arising from PCR amplification using standard dNTPs and three different
    polymerases.

4-3 Number of active RD and L4 mutants at various temperatures.

4-4 Generation of full length PCR product at 86.3 °C using dψUTP by the co-Taq
    polymerase and the RD polymerase in the SW29 cell line.

4-5 Generation of full length PCR product at 94.0 °C and 86.3 °C using dψUTP by the
    RD polymerase in the SW8 cell line.

4-6 Generation of full length PCR product at 86.3 °C by co-Taq polymerase using
    various TTP:dψUTP and TTP:dψTTP ratios.

4-7 Graphical comparisons of the band densities listed in Table 4-6.

A-1 Synthesis of pseudothymidine precursor.

B-1 A seed alignment of the Family A polymerases.

B-2 Inset of the phylogenetic tree of Family A polymerases (from Fig. 3-1) showing the
    location of Taq polymerase.

B-3 Inset of the phylogenetic tree of Family A polymerases (from Fig. 3-1) showing the
    location of some viral polymerases.









LIST OF ABBREVIATIONS

A             adenosine
AEGIS         artificially expanded genetic information system
Amp           ampicillin
APS           ammonium persulfate
ATP           adenosine triphosphate
bp            base pair
Bst           Bacillus stearothermophilus
C             cytosine
Ci            Curie (1 Ci = 3.7 x 10^10 Becquerel)
CD            circular dichroism
Cfe           cell-free extract
cfu           colony forming unit
CPM           counts per minute
CNT           counts
CSR           compartmentalized self-replication
DMSO          dimethyl sulfoxide
dN            deoxyribonucleoside (dA, dG, dC, T, ψT, ψU, etc.)
DNA           deoxyribonucleic acid
DNase I       deoxyribonucleic acid specific endonuclease
ds            double-stranded nucleic acid chain
DTT           1,4-dithio-DL-threitol
E. coli       Escherichia coli
EDTA          ethylenediaminetetraacetate
exo-          lacking 3'→5' exonuclease activity
FLP           full-length product
G             guanosine
HIV           human immunodeficiency virus type-1
hr            hours
HPLC          high performance liquid chromatography
isoC          deoxyisocytidine
isoG          deoxyisoguanosine
LB            Luria-Bertani medium
min           minutes
M-MuLV        Moloney murine leukemia virus
mRNA          messenger ribonucleic acid
MWCO          molecular weight cut-off
NMR           nuclear magnetic resonance
NSB           non-standard nucleobase
OD            optical density
PAGE          polyacrylamide gel electrophoresis
PCR           polymerase chain reaction
Pfu           Pyrococcus furiosus
PMSF          phenylmethylsulfonyl fluoride
PNK           polynucleotide kinase
REAP          reconstructing evolutionary adaptive paths
RNA           ribonucleic acid
RNase A       ribonucleic acid specific endonuclease
rRNA          ribosomal ribonucleic acid
RT            reverse transcriptase
SDS           sodium dodecylsulfate
s             seconds
StEP          staggered extension process
T             thymidine
ψT            pseudothymidine
Taq           Thermus aquaticus DNA Polymerase I
TBE           Tris / borate / EDTA buffer
TEMED         N,N,N',N'-tetramethylethylenediamine
Tet           tetracycline
Tris          tris(hydroxymethyl)aminomethane
Triton X-100  octyl phenol ethoxylate
tRNA          transfer ribonucleic acid
Tth           Thermus thermophilus
U             uracil
ψU            pseudouridine
UV            ultraviolet
wt            wild type









Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

DIRECTED EVOLUTION OF DNA POLYMERASES

By

Stephanie Ann Havemann

May 2007

Chair: Steven A. Benner
Major Department: Chemistry

To achieve the long-term goal of the Benner research group to create a synthetic biology

based on an Artificially Expanded Genetic Information System (AEGIS), polymerases that are

able to incorporate non-standard bases (NSBs) into DNA must be identified. In this dissertation,

a polymerase from Thermus aquaticus (Taq Polymerase) was identified that was able to

incorporate non-standard nucleotide analogs that contain a C-glycosidic linkage. This activity

was limited, meaning that the polymerase needed modification to support this goal. Further, we

asked whether sequential C-glycosides destabilized the duplex and altered its structure, to better

understand whether a synthetic biology based on C-glycoside nucleotides was possible.

To this end, two libraries of polymerases were created to identify mutations necessary to

alter the polymerases' ability to withstand high temperatures. One library was created by the

random mutagenesis of the taq gene; the other was rationally designed based on previous studies.

Seventy-four mutants from each library were screened for their ability to generate a full-length

polymerase chain reaction (PCR) product using standard nucleoside triphosphates at various

temperatures; the library of random mutants contained more thermostable polymerases than the

library obtained by rational design. Water-in-oil emulsions were then tested to determine

whether these, as artificial cells, might deliver thermostable polymerase variants from those used

in the screen. This identified difficulties in tools used to analyze the output of the library,

suggesting solutions that will guide future work. We also tested the individual components of

the rationally designed library for their ability to incorporate C-glycoside triphosphates in a PCR.

Structural studies with synthetic DNA containing multiple, consecutive C-glycosides showed no

change in conformation, at least not one that is detectable by circular dichroism.

These results represent a step towards the goal of creating an AEGIS-based synthetic

biology, an artificial chemical system that mimics emergent biological behaviors such as

replication, evolution, and adaptation. In addition, the mutant polymerases created in these

experiments are an inventory of polymerases useful in biotechnology, possibly allowing the

development of new, as well as improving on existing, clinical diagnostic techniques and helping

to facilitate a better understanding of polymerase-DNA interactions.









CHAPTER 1
INTRODUCTION

What are Nucleic Acids?

Deoxyribonucleic acid (DNA), one of the fundamental constituents of life, serves as a key

component for the storage and transfer of genetic information. It is built from four building

blocks, adenosine, guanosine, cytidine, and thymidine, all of which are comprised of a

nucleobase attached to a 2'-deoxyribose molecule (Fig. 1-1). Similarly, ribonucleic acid (RNA)

is also built from four building blocks, except that thymidine is replaced by uridine and the sugar

moiety is a ribose. When a phosphate group replaces the 5'-hydroxyl group of these molecules,

they become acids that can be linked by their phosphate groups, resulting in the formation of the

backbone of a nucleic acid strand. Genetic information is commonly stored in a double stranded

(ds) helix, which is formed when the nucleobases are paired by hydrogen bonds. These helical

duplex strands are aligned so that the chains are anti-parallel to one another; in other words, one

strand lies in the 5'→3' direction and the complement is in the 3'→5' orientation.

Rules of Complementarity

Watson and Crick proposed that the interactions between nucleobases are governed by two

rules of complementarity: size complementarity and hydrogen-bonding complementarity

(Watson and Crick, 1953a, Watson and Crick, 1953b). Size complementarity means that a large

purine, such as adenosine or guanosine, pairs with a small pyrimidine, like cytosine, thymidine,

or uridine. Hydrogen-bonding complementarity means that hydrogen bond donors from one

nucleobase pair with the hydrogen bond acceptors from another. With these rules, it is expected

that in the formation of nucleic acid duplexes, guanosine must pair with cytosine and adenosine

must pair with either thymidine or uridine.
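
The two rules, together with the anti-parallel alignment of the strands, can be illustrated with a short sketch. This is a minimal illustration written for this section, not a tool used in this work; the names DNA_PAIRS, is_size_complementary, and reverse_complement are assumptions introduced here.

```python
# Minimal sketch of the Watson-Crick pairing rules for standard DNA.
# All names below are illustrative assumptions, not part of any library.

DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}        # large, two-ring nucleobases
PYRIMIDINES = {"C", "T"}    # small, one-ring nucleobases


def is_size_complementary(base_a: str, base_b: str) -> bool:
    """True when one base is a purine and the other is a pyrimidine."""
    return (base_a in PURINES and base_b in PYRIMIDINES) or (
        base_a in PYRIMIDINES and base_b in PURINES
    )


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the pairing strand, also written in the 5'->3' direction.

    Because the two strands are anti-parallel, the complement is built
    by reading the input backwards and swapping each base for its partner.
    """
    return "".join(DNA_PAIRS[base] for base in reversed(strand_5_to_3.upper()))


if __name__ == "__main__":
    print(reverse_complement("GATTACA"))
```

Every pair in DNA_PAIRS joins a purine to a pyrimidine, so the lookup table satisfies size complementarity by construction; hydrogen-bonding complementarity is encoded only implicitly, through the choice of partners.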









DNA Helical Conformations

The conformation of a DNA duplex is often assumed to be described using one of three

abstract models: A-DNA, B-DNA, or Z-DNA (Saenger, 1984). The most common form of

DNA found in living organisms is presumed to be the B-DNA helix. A-DNA, whose geometries

are described in Table 1-1, is the common helical structure of dsRNA or dehydrated DNA,

but it can also be found when certain DNA sequences repeat (Ghosh and Bansal, 2003). The

only left-handed helix

known is the Z-DNA conformation, which appears to be a characteristic of alternating GC-rich

sequences that may help stabilize DNA during transcription (Rich and Zhang, 2003). Many

other helical conformations of DNA are possible, of course. Indeed, over twenty-six different

forms have been described in the literature to date (Egli, 2004, Ghosh and Bansal, 2003, Saenger,

1984). Nevertheless, for this work, we will reference the A-, B-, and Z-DNA models.

In actuality, the conformation of a DNA molecule must be described by examining the

structure atom by atom. Terms used to abstract the results of such an examination are described

in Table 1-1. Thus, the different types of helices are characterized by different geometries, such

as the number of base pairs per turn, the height of a turn, the rotation per base pair, the size and

depth of the major and minor grooves, and the type of sugar pucker. The sugar pucker refers to

the conformation of the sugar, which can exist in one of four envelope forms: C2'-endo, C2'-exo,

C3'-endo, and C3'-exo (Fig. 1-2) (Saenger, 1984).
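
Several of these geometric descriptors are related by simple arithmetic: the rotation per base pair is a full turn divided by the number of base pairs per turn, and the height of a turn is the rise per base pair multiplied by the same number. The sketch below works this out. The numerical values are approximate textbook figures assumed for illustration only; they are not taken from Table 1-1, and the HelixModel class is a name introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelixModel:
    """Idealized helix described by a few averaged parameters."""

    name: str
    bp_per_turn: float    # base pairs per helical turn
    rise_per_bp: float    # axial rise per base pair, in angstroms
    right_handed: bool

    @property
    def twist_deg(self) -> float:
        """Rotation per base pair in degrees; negative for a left-handed helix."""
        sign = 1.0 if self.right_handed else -1.0
        return sign * 360.0 / self.bp_per_turn

    @property
    def pitch(self) -> float:
        """Height of one full turn, in angstroms."""
        return self.bp_per_turn * self.rise_per_bp


# Approximate, assumed values for illustration (not the entries of Table 1-1).
MODELS = [
    HelixModel("A-DNA", bp_per_turn=11.0, rise_per_bp=2.6, right_handed=True),
    HelixModel("B-DNA", bp_per_turn=10.0, rise_per_bp=3.4, right_handed=True),
    HelixModel("Z-DNA", bp_per_turn=12.0, rise_per_bp=3.7, right_handed=False),
]

if __name__ == "__main__":
    for model in MODELS:
        print(f"{model.name}: twist {model.twist_deg:.1f} deg/bp, "
              f"pitch {model.pitch:.1f} angstroms")
```

Real duplexes deviate from these averages from one base-pair step to the next, which is why an atom-by-atom description is needed for anything beyond a coarse classification.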

In some cases, helical structures can be transformed from one conformation into another

simply by the modification of the humidity of the environment (for fibers) and/or the

concentrations of salt in the solution (Saenger, 1984). Helical structures can also be changed by

altering the chemical structure of the constituents. The conformation of the sugar pucker can

alter the helical form of the DNA by increasing or decreasing the distances between the

phosphate groups, thereby changing the number of base pairs per turn and the size of the

grooves. The C2'-endo conformation is usually found in B-DNA, while A-DNA prefers the

C3'-endo pucker. The major and minor grooves found in B-DNA can act as binding pockets for

polymerases, since they allow for the presentation of nucleobase hydrogen bond donors and

acceptors (Garrett and Grisham, 1999). The grooves presented by A-DNA are more

symmetrical, making it difficult for polymerases to gain access to these potential hydrogen-

bonding sites (Garrett and Grisham, 1999).

The conformation of a DNA helix can be assessed in several ways. X-ray crystallography

is, of course, the best way to identify the position of individual atoms, with nuclear magnetic

resonance (NMR) emerging as a preferred choice in solution. The general overall conformation

can be estimated, however, by circular dichroism (CD) (Ghosh and Bansal, 2003).

Central Dogma of Molecular Biology

Nucleic acids maintain genetic information inside a cell by means of replication and

transcription; translation uses this genetic information to create proteins. This sequence has been

called the central dogma of molecular biology by Crick (Fig. 1-3) (Crick, 1970). DNA is

transcribed by RNA polymerases into messenger RNA (mRNA), which is then translated into

proteins. The translation of the mRNA uses a combination of ribosomes, which are composed of

ribosomal RNA (rRNA) and proteins, and transfer RNA (tRNA), which carry amino acids to the

ribosomes. In situations where the genetic material is stored as RNA, such as in viruses, the

information is first converted back into DNA by enzymes known as reverse transcriptases prior

to being translated. DNA can replicate itself by employing enzymes known as DNA

polymerases, and RNA replicates itself using RNA polymerases.
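
The flow of information described above can be sketched in a few lines of code. This is a simplified illustration under stated assumptions: transcription is modeled by rewriting the coding (sense) strand with U in place of T, whereas an RNA polymerase actually reads the template strand, and CODON_TABLE holds only a handful of entries from the standard genetic code. The function names are introduced here for illustration.

```python
# Simplified sketch of transcription, translation, and reverse transcription.
# CODON_TABLE is a deliberately partial excerpt of the standard genetic code.

CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "UAA": "Stop",
    "UAG": "Stop",
    "UGA": "Stop",
}


def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA, modeled as replacing T with U in the coding strand."""
    return coding_strand.upper().replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """RNA -> DNA, the step carried out by reverse transcriptases."""
    return rna.upper().replace("U", "T")


def translate(mrna: str) -> list:
    """mRNA -> list of amino acids, read codon by codon until a stop codon.

    Codons missing from the partial table are reported as 'Xaa'.
    """
    peptide = []
    for start in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[start:start + 3], "Xaa")
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide


if __name__ == "__main__":
    gene = "ATGTTTGGCAAATAA"
    message = transcribe(gene)
    print(message, translate(message))
```

The sketch omits everything that makes the real process work, including the ribosome, the tRNAs, and the polymerases themselves; it only shows the direction in which sequence information moves.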

This feature of life raises an obvious question: Which came first, nucleic acids or proteins?

At first glance, the answer appears to be nucleic acids, since proteins cannot store genetic

information. Upon further study, one realizes that without proteins, the genetic material could

not be replicated. One possible answer to this question is that the nucleic acids were once able to

act both as storage molecules and as catalysts of their own replication.

The discovery of ribozymes and deoxyribozymes lends support to this theory by showing

that nucleic acid molecules are not limited to storing genetic information; they can also

catalyze reactions either within their own structure or upon other structures (Muller, 2006,

Emilsson and Breaker, 2002, Paul and Joyce, 2004). Many of these nucleic acid catalysts have

been created using non-standard nucleobases (NSBs) to add additional functionality to the

nucleic acid molecules (Muller, 2006).

What is AEGIS?

Using Watson and Crick's rules of complementarity and the requirement that the

nucleobases be joined with three hydrogen bonds, it is feasible to create an artificially expanded

genetic information system (AEGIS) containing eight additional nucleobases (Fig. 1-4), thereby

expanding the genetic alphabet from four to twelve letters (Switzer et al., 1989, Piccirilli et al.,

1990, Geyer et al., 2003). Since these bases retain the Watson and Crick geometry, they can be

incorporated into growing DNA strands via synthesis, primer-extension experiments, or the

polymerase chain reaction (PCR); the resulting strands can subsequently be used in a variety of

techniques.

Use of AEGIS Components

The importance of AEGIS components has already been illustrated in many ways. They have

been used in clinical diagnostics, to expand the genetic code, and to understand DNA and polymerase

interactions, and have even been implicated as a factor in the evolution of life on Earth. These

components have also been used in the first successful six-letter PCR reaction, lending support to

the development of a synthetic biology.

The powerful Versant™ branched-DNA assay, used to monitor the viral load of patients

infected with HIV, Hepatitis B, or Hepatitis C viruses, requires the use of at least two

non-standard nucleobases (NSBs) (Collins et al., 1997). This assay uses

5-methyl-2'-deoxyisocytidine (isoC) and 5-methyl-2'-deoxyisoguanosine (isoG) to decrease the non-specific

binding of a nucleic acid probe (Fig. 1-4), thereby increasing signal amplification relative to

noise by eight-fold over previous systems used (Fig. 1-5) (Huisse, 2004, Collins et al., 1997).

EraGen Biosciences (Madison, WI) is now using these AEGIS components in a similar

multiplexed system to identify newborns with cystic fibrosis (Johnson et al., 2004). These

assays have barely begun to scratch the surface of the potential clinical diagnostic uses of this

expanded genetic alphabet.

The current genetic code uses 64 three-letter codons to encode for the incorporation of 20

canonical amino acids (Appendix C); use of all twelve AEGIS nucleotides would allow for 1728

three-letter codes, and if the AEGIS components were functionalized, the possibilities would be

nearly endless. AEGIS components have already been used to encode the

incorporation of non-standard amino acids in ribosome-mediated translation. For example, in

1992 Bain et al. used isoC and isoG in a codon-anti-codon pair to generate peptides containing

the non-standard amino acid L-iodotyrosine (Bain et al., 1992). More recently, Hirao et al. used

the 2-amino-6-(2-thienyl)purine and pyridin-2-one bases in a codon-anti-codon pair in an in vitro

transcription study to generate peptides containing 3-chlorotyrosine (Fig. 1-6) (Hirao et al., 2002,

Hirao et al., 2006).
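
The codon arithmetic quoted above (64 codons from four letters, 1728 from twelve) can be checked directly. The following is a minimal sketch, assuming only that each position of a fixed-length codon may hold any letter of the alphabet; the function name `codon_count` is illustrative and does not come from the original work.

```python
def codon_count(alphabet_size: int, codon_length: int = 3) -> int:
    """Number of distinct codons of a given length over an alphabet.

    Assumes every position can hold any letter independently.
    """
    return alphabet_size ** codon_length


standard = codon_count(4)    # the four standard nucleotides
expanded = codon_count(12)   # all twelve AEGIS letters
print(standard, expanded)    # 64 1728
```

The same function shows how quickly the coding space grows with alphabet size; a six-letter alphabet, for instance, would give `codon_count(6)` three-letter codons.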

Some of these AEGIS components have also been used in the characterization of the

kinetic parameters of polymerases (Joyce and Benkovic, 2004, Sismour and Benner, 2005), and

in the first six-letter PCR, which was catalyzed by a mutant of the HIV reverse transcriptase

(Sismour et al., 2004). AEGIS components have also been used to better understand the

interactions between polymerases and DNA (Lutz et al., 1998, Joyce and Benkovic, 2004,

Hendrickson et al., 2004, Delaney et al., 2003). For example, studies have been performed using

a variety of different NSBs, such as those lacking minor-groove electrons (Hendrickson et al.,

2004) and those with a C-glycosidic linkage (Lutz et al., 1999), in order to identify

characteristics of nucleobases that are essential for correct incorporation by polymerases.

Problems with AEGIS Components

Although the AEGIS components retain Watson and Crick geometry, it is possible that

some of the features present on the NSBs, such as the absence of minor groove electrons or the

presence of C-glycosidic linkages, may present a challenge to polymerases. The ability of

polymerases to function in the absence of an unshared pair of electrons in the minor groove of

dsDNA, as seen in the pyDAD-puADA base pair (Fig. 1-4), was previously examined by

Hendrickson et al. (Hendrickson et al., 2004). In those studies, Hendrickson discovered that the

presence of electrons in the minor groove may only be necessary for exonuclease activity of

polymerases, and not for incorporation (Hendrickson et al., 2004). This, however, presents a

problem when trying to incorporate NSBs with efficiency and fidelity, since the polymerase has

no proofreading ability. Lutz et al. examined the ability of polymerases to function in the

presence of nucleosides exhibiting a C-glycosidic linkage, a carbon-carbon bond between the

nucleobase and sugar as seen in the pyDAD, pyAAD, and pyADD nucleosides (Fig. 1-4) (Lutz et

al., 1999). They also reported that polymerases with exonuclease activity were less likely to accept

the C-glycoside than were those lacking the proofreading ability, making replication with fidelity

difficult.

C-Glycosides

An N-glycoside is a nucleoside with a carbon-nitrogen bond linking the nucleobase to the

sugar; all standard nucleosides are therefore N-glycosides. However, three of the AEGIS

nucleosides use a carbon-carbon bond to join the nucleobase to the sugar, making these

nucleosides C-glycosides by definition (Fig. 1-4). This carbon-carbon linkage can cause a

structural change in the sugar pucker of the nucleoside, making it a C3'-endo pucker instead of a

C2'-endo pucker, possibly changing the form of the DNA from B-DNA to A-DNA (Davis, 1995).

Wellington and Benner detailed strategies by which these molecules can be chemically

synthesized in a recent review article (Wellington and Benner, 2006). C-glycosides have also

been found in vivo in various types of RNA, however (Charette and Gray, 2000). These C-

glycosides are of great interest, not only because of their presence in the AEGIS nucleosides, but

also for their clinical uses; many naturally occurring C-glycosides are antibiotics or antiviral

agents (Michelet and Genet, 2005, Zhou et al., 2006). More generally, C-glycosides can be used

in gene therapy (Li et al., 2003, Li et al., 2004).

Pseudouridine

Pseudouridine (ψU), the 5-ribosyl isomer of uridine (Fig. 1-7A), is present in both tRNA

and rRNA and is vital to the fitness of organisms (Raychaudhuri et al., 1998, Charette and Gray,

2000). This modified nucleoside, found in all three domains of life, was the first naturally

occurring NSB discovered (Charette and Gray, 2000), and is introduced into the RNA sequences

by the posttranscriptional modification of uridine (Argoudelis and Mizsak, 1976, Grosjean et al.,

1995). Pseudouridine has been reported to have a propensity to adopt a syn conformation around

the glycosyl bond when in solution, although the data supporting this are questionable; it is,

however, found only in the anti conformation when in a nucleic acid strand (Fig. 1-7B) (Lane et

al., 1995, Neumann et al., 1980). The anti conformation allows the coordination of a water

molecule between the 5' phosphate group of the ψU residue, the 5' phosphate group of the

preceding residue, and the N1-H of the ψU residue (Fig. 1-7C) (Arnez and Steitz, 1994). The

coordination of this water molecule results in an enhanced base stacking ability and a reduced

conformational flexibility of the RNA molecule, thus increasing the local rigidity of the RNA

(Charette and Gray, 2000, Davis, 1995).

Pseudouridine is thought to play several roles in Nature, as described in the review by

Charette and Gray (Charette and Gray, 2000). In tRNA, it is thought to play a critical role in the

binding of the tRNA to the ribosome during translation because it stabilizes the tRNA structure,

allowing tighter binding to occur, thereby increasing translational accuracy. Pseudouridine also

has been implicated in alternative codon usage in tRNA, and as a player in the folding of rRNA

and ribosome assembly by its contributions to RNA stability.

Pseudothymidine

Pseudothymidine (ψT), or 1-methylpseudouridine (Fig. 1-7D), was originally isolated from

Streptomyces platensis in 1976 by Argoudelis and Mizsak (Argoudelis and Mizsak, 1976). This

naturally occurring C-glycoside, found in RNA, is also thought to be created by a

posttranscriptional modification of uridine (Limbach et al., 1994). The first successful in vitro

transcription of ψT was performed by Piccirilli et al. using T7 RNA polymerase with a template

containing ψT and standard ribonucleosides (Piccirilli et al., 1991). Further studies, conducted

by Stefan Lutz, not only observed the ability of DNA polymerases to incorporate this NSB into a

growing DNA strand in primer-extension assays, but also challenged a polymerase to use ψT in

a PCR reaction that required the successful incorporation of up to three consecutive dψT

residues (Lutz et al., 1999). Since then, no further studies requiring the incorporation of this

C-glycoside into nucleic acids have been performed.

DNA Polymerases

DNA polymerases are the enzymes that perform template-directed DNA synthesis from

deoxyribonucleotides and an existing DNA template. These enzymes, essential for the

replication of the genetic information carried in all living organisms, were originally discovered

in 1956 by Arthur Kornberg (Kornberg et al., 1956), for which he was awarded a Nobel Prize in

1959. The synthesis of the complementary DNA strand always occurs in the 5'→3' direction

through the addition of the incoming nucleotide's triphosphate group onto the 3'-OH group of the

preceding nucleotide, releasing a pyrophosphate group in the process (Fig. 1-8) (Garrett and

Grisham, 1999, Lewin, 1997). After the successful replication of a DNA strand, the new strand

is complementary to the template strand, and identical in sequence to the strand originally paired with that template. Since all

DNA polymerases function in this manner, it is easy to comprehend that their structures are also

generally conserved.

General Structure of Polymerases

All DNA polymerases share a common structural framework that is often likened to

a right hand composed of three subdomains: the fingers, the palm, and the thumb. The fingers

domain is responsible for nucleotide recognition and binding, the thumb domain binds the DNA

substrate, and the palm domain is the catalytic center of the protein. It appears that this

framework is the same in all DNA polymerase families. It is not clear whether this represents

convergent or divergent evolution; there is no sequence similarity between, for example, Family

A and Family B polymerases that makes a case for their distant homology (Rothwell and

Waksman, 2005). In 1985, the laboratory of Thomas Steitz first solved the crystal structure of

the Klenow fragment, the C-terminal domain of the Escherichia coli DNA Polymerase I (Ollis et

al., 1985). Since then, the crystal structures of many different polymerases have been solved, not

only in their nascent states, but some with DNA or dNTPs and pyrophosphate bound to the

catalytic site (Rothwell and Waksman, 2005, Beese et al., 1993b, Beese et al., 1993a). It has also

been determined that during polymerization, divalent metal cations, such as Mg2+, are

coordinated in polymerase active sites to help activate the 3'-OH group for attack on the

incoming nucleotide (Steitz, 1999).

Features of polymerases that are not conserved throughout the families include both the

5'→3' and 3'→5' exonuclease subdomains that allow for proofreading, and other subunits used for

different types of repair. The exonuclease subdomains, when present, are the proofreading

centers of the polymerase. The 5'→3' exonuclease activity is usually involved in nick

translation, or the synthesis of DNA at a location where there is a break in the phosphodiester

bond of one strand (Perler et al., 1996). The 3'→5' exonuclease activity is the true

"proofreading" activity of the polymerase, responsible for the excision of a newly synthesized

mismatch (Perler et al., 1996).

The process by which a DNA polymerase adds an incoming nucleotide onto the 3'-

hydroxyl group of the preceding nucleoside involves many steps, which are only now being fully

understood. Figure 1-9 details the kinetic steps involved in this addition (Patel and Loeb, 2001,

Rothwell and Waksman, 2005). In Step 1, the polymerase (E) binds to the DNA primer:template

complex (TP); the polymerase then binds the incoming nucleotide triphosphate (dNTP) in Step 2.

The polymerase then undergoes a conformational change (E') in Step 3 that brings the various

components into positions that can support the chemistry of this reaction; this is the rate-limiting

step of polymerization. The polymerase performs the addition of the nucleotide, remains

complexed with the pyrophosphate, and undergoes another conformational change in Step 4.

The pyrophosphate group is released in Step 5; in Step 6, the polymerase can dissociate from the

DNA or translocate the substrate for another round of synthesis.

Polymerase Families

Based on sequence similarity, seven major families of homologous polymerases have been

classified (Patel and Loeb, 2001, Rothwell and Waksman, 2005): A, B, C, D, X, Y, and RT. The

most extensively studied are those of the Family A and Family B polymerases, but Table 1-2

identifies characteristics and representative polymerases of all seven families. Polymerases

behave differently not only between the families, but also within the families themselves, based

on their ability to repair, their processivity, and their fidelity. Processivity is defined as the

ability of the polymerase to continue catalysis without dissociating from the DNA (Kelman et al.,

1998); this is important when dealing with AEGIS components since it has been previously

shown that polymerases tend to "pause," or fall off the DNA, after the incorporation of an NSB

(Lutz et al., 1999, Sismour and Benner, 2005). Fidelity is the ability of the polymerase to select

and incorporate the correct complementary nucleoside opposite the template from a pool of

similar structures (Beard et al., 2002, Cline et al., 1996); this is important to AEGIS components

to guarantee that the newly replicated DNA contains the correct sequence.

Family A polymerases, which contain some of the prokaryotic, eukaryotic, and viral

polymerases, are best known for the E. coli DNA Pol I, Thermus aquaticus (Taq) Pol I, and the

T7 DNA polymerases (Perler et al., 1996). The E. coli DNA Pol I and Taq polymerases are

known as repair polymerases since they contain the 5'→3' exonuclease domains, while the T7 is

known as a replicative polymerase since it has a strong 3'→5' exonuclease activity (Rothwell

and Waksman, 2005, Kunkel and Bebenek, 2000).

Family B polymerases contain representatives from prokaryotic, eukaryotic, archaeal, and

viral polymerases; this is the only family of polymerases with members from all four of these

populations (Patel and Loeb, 2001). This family of polymerases is predominantly involved with

DNA replication, as opposed to repair, and exhibits extremely strong 3'→5' exonuclease

activities. In eukaryotes, these polymerases carry out the replication of chromosomal targets

during cell division. The most well known of the archaeal polymerases from this family,

Pyrococcus furiosus (Pfu) DNA Polymerase, has the lowest known error rate of all thermophilic

DNA polymerases that can be used for PCR amplification (mutational frequency/bp/duplication

is 1.3 x 10^-6) (Hogrefe et al., 2001, Cline et al., 1996).

Family C polymerases contain the bacterial chromosomal replicative polymerases, and

Family D polymerases are suggested to act as archaeal replicative polymerases (Patel and Loeb,

2001, Rothwell and Waksman, 2005). Family X polymerases are found in eukaryotes, and are

believed to play a role in the base-excision repair pathway that is important for correcting abasic

sites in DNA (Patel and Loeb, 2001, Rothwell and Waksman, 2005). Family Y polymerases,

found in prokaryotes, eukaryotes, and archaea, are part of a replicative complex, and function by

recognizing and bypassing lesions created by UV damage so that replication of the DNA is not

stalled (Zhou et al., 2001, Rothwell and Waksman, 2005). The last characterized family of

polymerases, the reverse transcriptases (RT), found in eukaryotes and viruses, catalyze the

conversion of RNA into DNA, but they can also replicate DNA templates (Najmudin et

al., 2000, Goldman and Marcy, 2001, Rothwell and Waksman, 2005).

Taq Polymerase

Thermus aquaticus, an organism found in thermal springs, hydrothermal vents, and even

hot tap water, was first isolated by Brock and Freeze in 1969 (Brock and Freeze, 1969). Taq

polymerase, a 94 kDa protein, was isolated from this organism by Chien et al. in 1976 (Chien et

al., 1976), and belongs to the Family A polymerases. This thermophilic polymerase has 5'→3'

exonuclease activity, but lacks the 3'→5' exonuclease activity required for the proofreading

ability, therefore giving this polymerase a low replication fidelity (about 8 x 10^-6 mutational

frequency/bp/duplication) (Cline et al., 1996). However, Taq is fairly processive with an

average incorporation of 40 nucleotides before dissociating from the DNA, and it has a quick

extension rate of about 100 nucleotides per second (Pavlov et al., 2004, Perler et al., 1996).
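
To give a feel for what these error rates mean in practice, the sketch below converts a mutational frequency per bp per duplication into an expected number of mutations per product molecule. It is an illustrative calculation only: the two rates are the values quoted in this chapter for Taq and Pfu, while the 2,500 bp target length, the 20 duplications, the Poisson assumption, and the function names are assumptions made here for the example.

```python
import math

# Error rates quoted in the text (mutational frequency per bp per duplication).
TAQ_ERROR_RATE = 8e-6
PFU_ERROR_RATE = 1.3e-6


def expected_mutations(error_rate: float, length_bp: int, duplications: int) -> float:
    """Mean number of mutations accumulated per product molecule."""
    return error_rate * length_bp * duplications


def error_free_fraction(error_rate: float, length_bp: int, duplications: int) -> float:
    """Fraction of products carrying no mutation, assuming Poisson-distributed errors."""
    return math.exp(-expected_mutations(error_rate, length_bp, duplications))


# Hypothetical example: a 2,500 bp target amplified through 20 duplications.
for name, rate in (("Taq", TAQ_ERROR_RATE), ("Pfu", PFU_ERROR_RATE)):
    mean = expected_mutations(rate, 2500, 20)
    clean = error_free_fraction(rate, 2500, 20)
    print(f"{name}: {mean:.3f} mutations per molecule, {clean:.1%} error-free")
```

Under these assumptions the proofreading enzyme leaves a clearly larger share of error-free product, which is the practical meaning of the difference in fidelity.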

Taq polymerase, one of the most extensively studied polymerases, was the first

thermostable polymerase to be used in PCR, thereby eliminating the need to add additional

polymerase after every round of PCR as was necessary when E. coli DNA Pol I was used for

thermocycling experiments (Saiki et al., 1988). In 1995, the Steitz laboratory was the first to

crystallize nascent Taq polymerase (Kim et al., 1995), and has since crystallized the polymerase

with DNA at the active site (Eom et al., 1996). These, and other studies, have allowed

researchers to identify the active site of the polymerase and the specific residues which contact

the DNA, the incoming nucleotides, or are involved in metal ion chelation (Eom et al., 1996, Fa

et al., 2004, Li et al., 1998b, Li et al., 1998a, Kim et al., 1995, Suzuki et al., 1996).

Due to Taq polymerase's lack of proofreading ability, it has been identified previously as a

candidate for replication of DNA containing non-standard nucleosides (Lutz et al., 1999). Taq

has been used to incorporate and/or replicate NSBs exhibiting C-glycosidic linkages (Lutz et al.,

1999), NSBs lacking an unshared pair of electrons in the minor groove (Hendrickson et al.,

2004), and nonpolar nucleoside isosteres (Morales and Kool, 2000). Directed evolution has

created Taq polymerase mutants that have been used to incorporate an even larger repertoire of

NSBs (Henry and Romesberg, 2005).

Directed Evolution

A recent review by Griffiths and Tawfik discussed the application of techniques developed

for the in vitro evolution of various proteins to increase their rate of catalysis, perform different

functions, and accept new substrates (Griffiths and Tawfik, 2006). These procedures all select

for desired enzyme characteristics from pools of millions of genes with schemes designed to link

genotype to phenotype. This provides a great advantage over the older methods of screening

mutant library members individually, because these approaches use a "one-pot" technique that

allows for the testing of a large number of variants (2 x 10s or more) at once (Griffiths and

Tawfik, 2006).

Other common features of these directed evolution systems include the development of a

mutagenic library, expression of this library, a high-throughput assay designed to identify

individuals with the desired characteristics, and a means for reshuffling mutants between rounds

of selection (Brakmann, 2005, Lutz and Patrick, 2004, Arnold and Georgiou, 2003a). The most

challenging part of any selection experiment is the design of the technique that will be used to

isolate variants with the desired characteristics (Brakmann, 2005), because "you get what you

select for." In other words, scientists may want to select for a specific characteristic of an

enzyme, but if the technique is not designed correctly, they may end up selecting for an enzyme

with a different characteristic.

Mutagenic Libraries

The first step in any directed evolution experiment is to create a large library of mutant

enzymes. There are many ways to accomplish this task, varying from the rational design of

mutations at selected sites to the random mutagenesis of residues along the length of the

sequence. Frances Arnold co-authored a book with George Georgiou that gave detailed

instructions on how to perform nineteen different techniques to generate libraries for directed

evolution (Arnold and Georgiou, 2003b). This book gave attention to standard error-prone PCR

techniques that use MnCl2 instead of MgCl2 in PCR reactions catalyzed by a polymerase with

low fidelity, such as Taq, and to methods that could be used for the rediversification of libraries

between rounds of selection, such as the staggered extension process (Fig. 1-11).

An important consideration when creating a true random library of mutants is the bias of

some techniques to create certain transitional or transversional mutations preferentially.

Transitional mutations occur when one purine-pyrimidine pair is replaced with another purine-

pyrimidine pair; this creates four possible transition mutations with the standard nucleotides.

Transversional mutations occur when a purine-pyrimidine pair is replaced by a

pyrimidine-purine pair, creating eight possible transversion mutations when using standard dNTPs. When

creating an unbiased library, sometimes it is necessary to use two or more methods in order to

allow for the same approximate percentage of transitional and transversional mutations to occur.
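
The counts of four transitions and eight transversions follow from enumerating every possible single-base change among the standard nucleotides. A minimal sketch of that enumeration is given below; the function name `classify` and the single-base (rather than base-pair) representation are choices made for this illustration.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(original: str, mutant: str) -> str:
    """Label a single-base substitution as a transition or a transversion.

    A transition keeps the base class (purine to purine, or pyrimidine to
    pyrimidine); a transversion switches between the two classes.
    """
    pair = {original, mutant}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# All 12 ordered substitutions among the four standard bases.
substitutions = list(permutations("ACGT", 2))
transitions = [s for s in substitutions if classify(*s) == "transition"]
transversions = [s for s in substitutions if classify(*s) == "transversion"]

print(len(transitions), len(transversions))  # 4 8
```

For example, `classify("A", "G")` is a transition, whereas `classify("A", "T")` is a transversion.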

The use of MnCl2 and Taq polymerase in an error-prone PCR allows for all four

transitions and all eight transversions to occur; however, the A-T to T-A transversion and A-T to

G-C transition tend to be more prevalent when using this technique (Vartanian et al., 1996,

Lin-Goerke et al., 1997, Arnold and Georgiou, 2003b). Biases such as this can be altered by

increasing or decreasing the concentrations of some of the nucleotides in the reaction. This

technique can be performed on a low budget, and can be easily modified to increase or decrease

the frequency of mutagenesis by altering the concentration of dNTPs or the number of PCR

cycles (Arnold and Georgiou, 2003b).

Another method of creating mutagenic libraries is by rational design. The random library

approach generates a large, diverse repertoire of polymerases, but a low number of active clones.

Guo et al. have shown that at least one-third of all random amino acid changes will result in the

inactivation of a protein (Guo et al., 2004), so it is likely that a protein with more than a few

random amino acid changes will be inactive. Furthermore, Guo et al. also calculated that

approximately 70% of random mutations in the active sites of polymerases will result in an

inactive polymerase variant (Guo et al., 2004). A desirable library for directed evolution

experiments would optimally have a large, diverse number of proteins with a high number of

active clones (Hibbert and Dalby, 2005). To generate a library such as this, the reconstructing

evolutionary adaptive paths (REAP) approach can be used (Gaucher, 2006); this approach allows

researchers to modify only the sites where functional divergence occurred within a family of

polymerases. In other words, sites that, in the historical evolution of the polymerase, had a split

"conserved but different" pattern of evolutionary variation, are chosen for modification. In

theory, this technique has a high probability to generate new activities and functions (Gaucher,

2006).
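
The argument that heavily mutated random clones are mostly inactive can be made concrete with a short calculation. The sketch below takes the figure attributed to Guo et al. above (at least one-third of random amino acid changes inactivate a protein) and assumes, purely for illustration, that each change acts independently; the function name and the independence assumption are not from the original work.

```python
def max_active_fraction(n_changes: int, p_inactivate: float = 1 / 3) -> float:
    """Upper bound on the fraction of clones still active after n random changes.

    Assumes each amino acid change independently inactivates the protein
    with probability p_inactivate (an illustrative simplification).
    """
    if n_changes < 0:
        raise ValueError("n_changes must be non-negative")
    return (1.0 - p_inactivate) ** n_changes


# Fraction of clones expected to remain active as random changes accumulate.
for n in (1, 2, 5, 10):
    print(n, round(max_active_fraction(n), 3))
```

Because the one-third figure is a lower bound on inactivation, the values printed are upper bounds on activity; this is the motivation for restricting changes to a small number of well-chosen sites.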

Systems of Directed Evolution

Some of the more common methods used in directed evolution experiments include phage

display (Fa et al., 2004), ribosome display (Yan and Xu, 2006), complementation (Arnold and

Georgiou, 2003a), and compartmentalized self-replication (CSR) (Ghadessy et al., 2001, Tawfik

and Griffiths, 1998). Two of these techniques, phage display and CSR (Henry et al., 2004), were

applied to the evolution of polymerases to increase thermostability (Ghadessy et al., 2001),

permit activity in the presence of an inhibitor (Ghadessy et al., 2001), and allow incorporation of

non-standard bases (Ghadessy et al., 2004, Fa et al., 2004, Xia et al., 2002). Both phage display

and CSR systems have been successfully used to evolve Taq polymerase in vitro (Ghadessy et

al., 2001, Ghadessy et al., 2004, Fa et al., 2004).

Phage display

The phage display directed evolution system was developed by attaching a fragment of

Taq polymerase and an oligonucleotide primer substrate to the exterior of a phage particle via its

minor phage coat protein pIII (Fa et al., 2004). Since there are approximately five of these coat

proteins per phage, all localized to one area on the phage coat, researchers were able to

successfully link phenotype to genotype (Fig. 1-12). The mutant polymerases were challenged

to add non-standard nucleosides and one biotinylated nucleoside onto the oligonucleotide primer

by template-directed synthesis; those polymerases with the ability to do so were immobilized on

streptavidin beads, and were recovered. The genes encoding the active polymerases were

identified by sequencing, or rediversified and shuttled into another round of selection. This

technique, while excellent for identifying polymerase mutants able to incorporate a small number

of non-standard bases, does not require the polymerase to perform a PCR; this would not be

conducive to the design of an AEGIS-based synthetic biology that requires the polymerase to

replicate its own gene.

Compartmentalized self-replication

Compartmentalized self-replication makes use of water-in-oil emulsions as a way to link

genotype to phenotype, and requires polymerase mutants to replicate their encoding gene in a

PCR reaction (Tawfik and Griffiths, 1998, Ghadessy et al., 2004, Ghadessy et al., 2001,

Williams et al., 2006), making it, in theory, an excellent technique for developing polymerases for a

synthetic biology. A library of polymerase gene variants is cloned and expressed in cells (Fig. 1-

13A); the bacterial cells containing the polymerases and their encoding genes are then suspended

in aqueous droplets in an oil emulsion. Each of these droplets, on average, contains one cell as

well as the primers and dNTPs/NSBs required for PCR (Fig. 1-13B). The thermostable

polymerase is released from the cell during the first denaturing cycle of PCR, allowing

replication of its encoding gene to proceed. Poorly adapted polymerases fail to replicate their

encoding gene, while better-adapted polymerases succeed in replication (Fig. 1-13C). The

resulting polymerase genes are then released from emulsions by extraction with ether; those

encoding the most active polymerases dominate these clones. A run-off PCR using standard

nucleotides prepares the DNA for recloning, which can then be subjected to another cycle of

selection (Fig. 1-13E).
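
The enrichment step described above, in which genes encoding more active polymerases come to dominate the recovered pool, can be illustrated with a toy model. The sketch below is not the selection protocol itself: it simply assumes that each compartment returns copies of its own gene in proportion to a per-variant yield, and the variant names, starting frequencies, and yields are hypothetical values chosen for the example.

```python
def csr_round(frequencies: dict, yields: dict) -> dict:
    """Return variant frequencies after one idealized round of CSR.

    Each variant's share of the recovered pool is its starting frequency
    weighted by the number of gene copies its compartment produces.
    """
    weighted = {name: frequencies[name] * yields[name] for name in frequencies}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no variant produced any product")
    return {name: weight / total for name, weight in weighted.items()}


# Hypothetical library: mostly inactive clones, with a rare well-adapted variant.
library = {"inactive": 0.90, "weak": 0.09, "strong": 0.01}
# Hypothetical copies of the encoding gene recovered per compartment
# (1.0 means the template is carried over without amplification).
copies = {"inactive": 1.0, "weak": 10.0, "strong": 100.0}

after_one = csr_round(library, copies)
after_two = csr_round(after_one, copies)
for name in library:
    print(name, round(library[name], 3), round(after_one[name], 3), round(after_two[name], 3))
```

Even in this simplified picture, a variant that starts as a small minority gains ground with each round, which is why repeated cycles of selection are used.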

CSR has been previously used to generate Taq polymerase variants that are more

thermostable (Ghadessy et al., 2001), have an increased resistance to inhibitors (Ghadessy et al.,

2001), and are able to incorporate various non-standard bases (Ghadessy et al., 2004). More

recently, Philipp Holliger and co-workers, who originally performed the aforementioned

selections, have modified this technique to change a selected region of the polymerase sequence,

and replicate that region in CSR reactions (Ong et al., 2006). This short-patch

compartmentalized self-replication reaction (spCSR) has already been used to develop Taq

polymerase variants able to function with both NTPs and dNTPs, and variants that are able to

incorporate NSBs with 2'-substitutions. This technique allows the researcher to mutate only the

active site of the polymerase, and then challenges the polymerase to amplify the region encoding

the active site; this makes it easier for polymerases with the ability to incorporate NSBs, but that

lack the catalytic efficiency and processivity, to be isolated from a pool of mutants. By reducing

the stringency of the initial selections, more clones can be isolated with the desired traits;

catalytic efficiency and processivity of the polymerase can be selected for later using the

polymerase sequence of the desired variant under normal CSR conditions.

Research Overview

To create an AEGIS, the first step should be to create or identify polymerases with the

ability to incorporate multiple, consecutive NSBs into a growing strand of dsDNA, efficiently

and faithfully. Rather than challenging a polymerase with a gamut of NSBs containing different

unique features, we decided to focus on one unique characteristic of AEGIS nucleosides, the

C-glycosidic linkage. Previous studies have shown that polymerases have a difficult time

incorporating non-standard base pairs containing a C-glycosidic linkage (Switzer et al., 1993,

Sismour et al., 2004). Therefore, two representative C-glycosides, 2'-deoxypseudouridine (dψU) and

2'-deoxypseudothymidine (dψT), were selected for study; each can base pair with a canonical

nucleotide, which decreases the strain on the polymerase (Lutz et al., 1999).

The research presented here began with the determination of the effect of multiple,

sequential C-glycosides on duplex DNA structure, to better understand the obstacles a

polymerase would have to overcome in order to incorporate bases exhibiting C-glycosides.

Next, a screening of a variety of Family A and Family B polymerases identified Taq as a

polymerase that exhibited a limited ability to incorporate non-standard bases that contain a C-

glycosidic linkage. However, further modification of the protein sequence of this enzyme was

needed to identify a mutant Taq polymerase with an increased ability to incorporate multiple,

sequential C-glycoside NSBs.

To achieve this, the second part of this dissertation focused on the creation of a rationally

designed (RD) library of 74 mutant Taq polymerases. Variants were screened for the ability to

incorporate dψU in a PCR amplification of their encoding gene. None of these variants were

shown to produce more full-length PCR product than the wild-type Taq polymerase. Only 18

variants showed any activity at all in this first test, even with standard dNTPs, under these

reaction conditions. The rationally designed library was then used to perform an initial selection,

using water-in-oil emulsions to select for the active mutant polymerases we identified in our

initial screen.

It was postulated that the low number of active variants in our RD library was due to a
decrease in the thermostability of the enzyme. After altering the PCR reaction conditions to test
this hypothesis, we were able to identify 33 active mutant polymerases in this library. Since this
library was rationally designed, we asked whether a randomly created library of polymerase
clones would retain thermostability more or less often than the clones in our RD library.
A random library (L4) was created for this purpose and was screened for activity at various
temperatures in PCR reactions; 39 clones were found to be active. This comparison of the
thermostability of the two libraries shows that the randomly created library has an enhanced
ability to retain polymerase thermostability when compared to our rationally designed library.

The RD library was designed to identify mutants able to incorporate non-standard bases,
and not to have a high degree of thermostability. Optimal temperatures for function in a PCR
were determined for each of the RD variants, and the mutants were then screened for their ability
to incorporate various concentrations of dψU at that optimal temperature. One mutant in the
pSW27 plasmid, containing the A597S, A740R, and E742V residue changes, was found
to generate, on average, 72% more product than wt Taq polymerase at all dψU concentrations
tested, at a temperature of 86.3 °C.

While dψU is a C-glycoside with the ability to pair with 2'-deoxyadenosine, it has been
shown to epimerize (Wellington and Benner, 2006, Cohn, 1960, Chambers et al., 1963). Since
dψT cannot epimerize, due to the presence of the extra methyl group, we performed a
comparative analysis of wt Taq polymerase's ability to cope with dψU and dψT at various
concentrations and at different temperatures in a PCR. Results indicated that epimerization
of the nucleotide may be what hinders the incorporation of dψU, and therefore dψU should not
be used as a model C-glycoside for directed evolution studies.

The results presented in this work represent a significant step towards the long-term goal
of creating an AEGIS-based synthetic biology. In addition, the repertoire of mutant polymerases
designed and created in these experiments will assist in creating an inventory of polymerases
useful in biotechnology, possibly allowing the development of new diagnostic techniques, the
improvement of existing ones, and a better understanding of polymerase-DNA interactions.






















[Figure 1-1 image: structures of 2'-deoxyadenosine and 2'-deoxyguanosine (top row) and of
2'-deoxythymidine and 2'-deoxycytosine (bottom row).]

Figure 1-1. The standard deoxyribonucleotides. The nucleobases pair based on the two rules of
complementarity: hydrogen-bonding complementarity, when the hydrogen bond
donor from one nucleobase pairs with the hydrogen bond acceptor from another, and
size complementarity, when a large purine (top row) pairs with a small pyrimidine
(bottom row) (Watson and Crick, 1953a, Watson and Crick, 1953b). Therefore,
2'-deoxyadenosine joins with 2'-deoxythymidine and 2'-deoxyguanosine joins with
2'-deoxycytosine. When a phosphate group replaces the 5'-hydroxyl group of these
molecules, they become acids and can be linked by their phosphate groups to create
a DNA strand.










Table 1-1. Comparison of the structural geometries of A, B, and Z-DNA forms.

Geometry                 A-DNA                        B-DNA               Z-DNA
Helical sense            Right-handed                 Right-handed        Left-handed
Helix diameter           2.6 nm                       2.0 nm              1.8 nm
Repeating unit           1 base pair                  1 base pair         2 base pairs
Rotation per base pair   34°                          36°                 60°/2
Rise per base pair       0.256 nm                     0.338 nm            0.38 nm
Base pairs per turn      11                           10                  12
Pitch per turn of helix  2.82 nm                      3.38 nm             4.56 nm
Major groove             Very narrow and very deep    Very wide and deep  Flat
Minor groove             Very broad and very shallow  Narrow and deep     Very narrow and deep
Sugar pucker             C3'-endo                     C2'-endo            C: C2'-endo & G: C2'-exo

*Data adapted from Saenger and Garrett & Grisham (Saenger, 1984, Garrett and Grisham, 1999).
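
The rise, base pairs per turn, and pitch rows of Table 1-1 are related by simple arithmetic: the pitch of one helical turn should equal the rise per base pair multiplied by the number of base pairs per turn. The short Python sketch below checks that relationship for the tabulated values. The dictionary layout and the function name are illustrative assumptions, not part of the original work.

```python
# Values transcribed from Table 1-1 (rise and pitch in nm).
HELIX_FORMS = {
    "A-DNA": {"rise_nm": 0.256, "bp_per_turn": 11, "pitch_nm": 2.82},
    "B-DNA": {"rise_nm": 0.338, "bp_per_turn": 10, "pitch_nm": 3.38},
    "Z-DNA": {"rise_nm": 0.38, "bp_per_turn": 12, "pitch_nm": 4.56},
}


def computed_pitch_nm(rise_nm, bp_per_turn):
    """Pitch of one helical turn = rise per base pair x base pairs per turn."""
    return rise_nm * bp_per_turn


for name, params in HELIX_FORMS.items():
    pitch = computed_pitch_nm(params["rise_nm"], params["bp_per_turn"])
    print(f"{name}: computed {pitch:.2f} nm, tabulated {params['pitch_nm']:.2f} nm")
```

For all three forms the computed pitch agrees with the tabulated pitch to within rounding.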











[Figure 1-2 image: furanose ring drawings for panels A through D.]

Figure 1-2. Puckering of the furanose ring of nucleosides into various envelope forms. In the
envelope form, four of the five atoms are coplanar and the remaining atom departs this
plane: A) a C2'-exo sugar pucker, B) a C2'-endo sugar pucker, C) a C3'-exo sugar
pucker, and D) a C3'-endo sugar pucker. B-DNA has a C2'-endo pucker, while
A-DNA exhibits a C3'-endo pucker (Saenger, 1984).


[Figure 1-3 image: flow diagram linking DNA, RNA, and protein. Labeled arrows: replication
(DNA-dependent DNA polymerases); transcription (DNA-dependent RNA polymerases); reverse
transcription (RNA-dependent DNA polymerases, or reverse transcriptases); replication
(RNA-dependent RNA polymerases, or replicases); and the step from RNA to protein (mRNA, tRNA,
and ribosomes).]


Figure 1-3. The central dogma of molecular biology (Lewin, 1997, Crick, 1970). Genetic
material, in the form of DNA, is first transcribed into RNA and then is translated
into proteins. On the occasion that genetic material is stored as RNA, it first
undergoes reverse transcription to create DNA before it is shuttled back into the
system.















[Figure 1-4 image: structures of the AEGIS base pairs, each labeled with its pyrimidine (py)
and purine (pu) hydrogen-bonding pattern and with the donor and acceptor groups marked.]

Figure 1-4. The six hydrogen bond patterns in an artificially expanded genetic information
system (AEGIS). These patterns are constrained by Watson and Crick's rules of
complementarity and by the requirement that the nucleobases be joined by three
hydrogen bonds (Switzer et al., 1989, Piccirilli et al., 1990, Geyer et al., 2003,
Benner, 2004, Watson and Crick, 1953a, Watson and Crick, 1953b). Purines are
denoted by "pu," pyrimidines by "py," hydrogen-bond acceptors by "A," hydrogen
bond donors by "D," and R indicates the point of attachment of the backbone. Note
the presence of a C-glycosidic linkage in the pyDAD, pyADD, and pyDDA
nucleotides.
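
The hydrogen-bonding rule in this caption can be illustrated with a few lines of Python: a purine pattern complements a pyrimidine pattern when every donor (D) faces an acceptor (A) and vice versa. This is a minimal sketch; the pattern list is taken from the labels and caption of Figure 1-4, while the function and variable names are illustrative assumptions.

```python
# Pyrimidine patterns named in Figure 1-4 (D = donor, A = acceptor).
PYRIMIDINE_PATTERNS = ["AAD", "ADA", "DAA", "DAD", "ADD", "DDA"]

# Pyrimidines that the caption identifies as C-glycosides.
C_GLYCOSIDE_PYRIMIDINES = {"DAD", "ADD", "DDA"}


def complementary_pattern(pattern):
    """Return the partner pattern: each donor becomes an acceptor and vice versa."""
    swap = {"D": "A", "A": "D"}
    return "".join(swap[site] for site in pattern)


def is_complementary(py_pattern, pu_pattern):
    """True when every position pairs one donor with one acceptor."""
    return len(py_pattern) == len(pu_pattern) and all(
        {py, pu} == {"D", "A"} for py, pu in zip(py_pattern, pu_pattern)
    )


for py in PYRIMIDINE_PATTERNS:
    pu = complementary_pattern(py)
    note = " (C-glycoside)" if py in C_GLYCOSIDE_PYRIMIDINES else ""
    print(f"py{py}{note} pairs with pu{pu}")
```

Size complementarity is implied by always pairing a pyrimidine with a purine; the sketch only checks the donor and acceptor positions.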




















[Figure 1-5 image: schematic of the assay on a solid support. Legend: analyte DNA, capture
strand, branched DNA, NSB-containing duplex.]

Figure 1-5. The Versant™ branched DNA assay. This assay exploits the pairing of
non-standard bases (NSBs) to reduce the signal to noise ratio 8-fold over a previous
version of the assay that did not use NSBs (Huisse, 2004, Collins et al., 1997). The
branched DNA assay is used to monitor the viral load counts of patients with the
HIV, Hepatitis B, or Hepatitis C viruses (Collins et al., 1997).











[Figure 1-6 image: A) structures of pyridin-2-one (y) and 2-amino-6-(2-thienyl)purine (s), each
attached to ribose; B) scheme showing transcription of DNA into mRNA and translation, via tRNA,
into a protein containing 3-chlorotyrosine.]

Figure 1-6. An example of non-standard nucleobases coding for a non-standard amino acid.
This shows the transcription and translation (seen in B) of the non-standard base pair
(seen in A and denoted as s and y) to generate a protein containing the non-standard
amino acid 3-chlorotyrosine. This picture is adapted from Hirao et al. (Hirao et al.,
2002, Hirao et al., 2006).
























[Figure 1-7 image: chemical structures for panels A through D.]

Figure 1-7. Pseudouridine and pseudothymidine. A) This naturally occurring C-glycoside,
found in RNA, is thought to be created by a posttranscriptional isomerization of
uridine (Argoudelis and Mizsak, 1976, Grosjean et al., 1995). B) Pseudouridine has
a propensity to adopt a syn conformation around the glycosyl bond when in solution,
but it is only found in the anti conformation when in a nucleic acid strand (Lane et
al., 1995, Neumann et al., 1980). C) The anti conformation allows for the
coordination of a water molecule between the 5' phosphate group of the ψU residue,
the 5' phosphate group of the preceding residue, and the N1-H of the ψU residue
(Arnez and Steitz, 1994). The coordination of this water molecule results in an
enhanced base stacking ability and a reduced conformational flexibility of the RNA
molecule, thus increasing the local rigidity of the RNA (Charette and Gray, 2000,
Davis, 1995). D) The structure of pseudothymidine (1-methylpseudouridine). This
naturally occurring C-glycoside, found in RNA, is also thought to be created by a
posttranscriptional modification of uridine (Limbach et al., 1994).















[Figure 1-8 image: reaction scheme.]

Figure 1-8. The polymerization reaction of deoxyribonucleotide triphosphates catalyzed by
DNA polymerases. The triphosphate of the incoming nucleotide is linked to the
3'-hydroxyl group of the preceding nucleoside, releasing a pyrophosphate in the
process; DNA synthesis therefore proceeds in the 5'→3'
direction (Garrett and Grisham, 1999).


E + TP -(1)-> E·TP -(2)-> E·TP·dNTP -(3)-> E'·TP·dNTP -(4)-> E·TP(n+1)·PPi -(5)-> E·TP(n+1) + PPi -(6)-> E·TP(n+1)

Figure 1-9. Kinetic steps involved in the nucleotide incorporation pathway, that is, in the
addition of a nucleotide onto a growing DNA strand (Patel and Loeb,
2001, Rothwell and Waksman, 2005). In Step 1, the polymerase (E) binds to the
DNA primer:template complex (TP); the polymerase then binds the incoming
nucleotide triphosphate (dNTP) in Step 2. The polymerase then undergoes a
conformational change (E') in Step 3 that brings the various components into
positions that can support the chemistry of this reaction; this is the rate-limiting step
of polymerization. The polymerase performs the addition of the nucleotide, remains
complexed with the pyrophosphate (PPi), and undergoes another conformational change in
Step 4. The pyrophosphate group is released in Step 5; in Step 6, the polymerase can
dissociate from the DNA or translocate the substrate for another round of synthesis.









[Figure 1-10 image: amino acid sequence of Taq polymerase, printed in rows of 50 residues
numbered from 1 to 801, with the active site residues highlighted in blue and red.]

Figure 1-10. Locations of active site residues in Taq polymerase. Residues shown in blue are
involved in contacting the DNA during polymerization; those shown in red indicate
residues involved in metal ion coordination (Eom et al., 1996, Fa et al., 2004, Li et
al., 1998b, Li et al., 1998a, Kim et al., 1995, Suzuki et al., 1996).















Table 1-2. Characteristics of the various polymerase families.

Family  Domains containing polymerase              Representative polymerases                 General use         Fidelity
A       Prokaryotes, Eukaryotes, Viruses           E. coli DNA Pol I; Taq Pol I; T7 DNA Pol   Repair              Good
B       Prokaryotes, Eukaryotes, Archaea, Viruses  Pfu DNA Pol I; Eukaryotic DNA Pol α        Replicative         Excellent
C       Prokaryotes                                E. coli Pol III(α)                         Replicative         Excellent
D       Archaea                                    Pfu DNA Pol II                             Replicative         Excellent
X       Eukaryotes                                 Eukaryotic DNA Pol β                       Repair              N/A
Y       Prokaryotes, Eukaryotes, Archaea           E. coli DNA Pol IV; E. coli DNA Pol V      Replicative/Repair  Poor
RT      Eukaryotes, Viruses                        HIV-RT; M-MuLV-RT; Eukaryotic telomerases  Replicative         Good











[Figure 1-11 image: schematic panels illustrating the steps described in the caption.]


Figure 1-11.


The staggered extension process (StEP) for rediversification of mutant libraries.
This process has already been successfully used to rediversify libraries between
rounds of selection in CSR reactions (Arnold and Georgiou, 2003b, Zhao et al.,
1998, Ghadessy et al., 2001). A) Denatured template genes are primed with the
same primer. B) Short fragments are produced by brief primer-extension. C) In
the next cycle, fragments randomly prime the templates and extend further. D)
This process is repeated until full-length genes are produced. E) Full-length genes
are then purified, amplified, and recloned into a vector for another round of
selection.
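
The recombination logic of StEP can be illustrated with a toy simulation. The Python sketch below is a deliberately simplified model under stated assumptions: the parent genes are aligned and of equal length, each cycle extends the growing strand by a short random number of bases, and the strand copies the chosen parent directly rather than its complement. The function name, the `max_extension` parameter, and the example parents are hypothetical and are not taken from the original work.

```python
import random


def step_recombine(parents, rng, max_extension=5):
    """Grow one strand by repeated short extensions, switching templates at random.

    Before each extension the growing strand "anneals" to a randomly chosen
    parent and copies the next few positions from it.
    """
    length = len(parents[0])
    if any(len(parent) != length for parent in parents):
        raise ValueError("this toy model assumes aligned parents of equal length")
    product = []
    while len(product) < length:
        template = rng.choice(parents)           # random template switch
        n_bases = rng.randint(1, max_extension)  # brief primer extension
        start = len(product)
        product.extend(template[start:start + n_bases])
    return "".join(product)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
parents = ["A" * 20, "C" * 20]  # hypothetical parent genes, easy to tell apart
chimera = step_recombine(parents, rng)
print(chimera)
```

Each position of the full-length product comes from one of the parents, so the printed strand is a mosaic of the two, which is the rediversification that the figure describes.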











[Figure 1-12 image: panels A through E of the selection scheme. Recoverable labels: polymerase
and acidic peptide on pIII; basic peptide-DNA duplex; dATP, dGTP, and Biotin-16-dCTP;
streptavidin-coated beads; biotin; streptavidin; DNase I cleavage.]


Figure 1-12.


Phage display selection scheme. This details the scheme used in the directed
evolution of a Taq polymerase fragment to incorporate non-standard nucleosides
into a growing DNA strand (Fa et al., 2004). A) A phage particle is displaying an
acidic peptide and a mutant polymerase on the pIII minor coat protein of the phage.
These coat proteins are localized to one area on the phage molecule, allowing
genotype to be linked to phenotype. B) The primer-template complex is attached to
the phage particle via a basic peptide, which links with the acidic peptide displayed
on the coat protein. C) The polymerase incorporates modified nucleotides in a
primer-extension assay, which terminates with the addition of a biotinylated
standard nucleotide. D) The biotin tag is captured by streptavidin and the entire
complex is immobilized on magnetic beads, allowing those phage particles
displaying inactive polymerases to be washed away. E) DNase I is used to
dissociate the phage complex from the DNA strands, allowing the phage displaying
the active polymerase to be captured in an elution. The genes encoding the active
polymerases can then be identified by sequencing and/or rediversified and shuttled
into another round of selection.




















[Figure 1-13 image: panels A through E of the CSR scheme. Recoverable labels: E. coli cell;
suspend in water-oil emulsion; water drop in oil containing plasmid, nucleotide triphosphates,
and PCR primers; heating in first PCR cycle lyses cell; temperature cycle; extract oil away;
many copies of gene; run-off with standard nucleotides and reclone for next round of selection.]

Figure 1-13. General scheme for CSR. CSR allows for the selection of polymerases with an
ability to incorporate an unnatural nucleotide using water-in-oil emulsions. A) A
library of polymerase gene variants is cloned and expressed in E. coli. Spheres
represent active polymerase molecules inside of a bacterial cell. B) The bacterial
cells containing the polymerases and their encoding genes are suspended in
aqueous droplets in an oil emulsion. C) The thermostable polymerase enzyme and
encoding gene are released from the cell during the first denaturing cycle of PCR,
allowing self-replication to proceed. D) The resulting mixture of polymerase genes
is released by extraction with ether. E) A single run-off PCR with standard
nucleotides prepares the DNA for recloning and another cycle of selection.









CHAPTER 2
POLYMERASE INCORPORATION OF MULTIPLE C-GLYCOSIDES INTO DNA:
PSEUDOTHYMIDINE AS A COMPONENT OF AN ALTERNATIVE GENETIC SYSTEM

Introduction

Each of the four standard nucleobases found in natural DNA (adenine, guanine, cytosine,
and thymine) is joined to its sugar via a carbon-nitrogen bond. This, by definition, makes

standard nucleotides N-glycosides. The nature of the glycosidic linkage is believed to have

consequences on the detailed conformation of the nucleoside, including through the operation of

the anomeric effect. In particular, the nature of the glycosidic bond may influence the puckering

of the sugar.

Unlike the standard nucleotides, the nucleotides that allow artificially expanded genetic

information systems (AEGIS) to be created are frequently C-glycosides, which have a
carbon-carbon bond between the nucleobase and the sugar. This is exemplified in the case of
non-standard pyrimidines that present the Donor-Donor-Acceptor, Donor-Acceptor-Donor, and
Acceptor-Donor-Donor hydrogen bonding patterns seen in Figure 1-4. If replacing the
N-glycosidic linkage by a C-glycosidic linkage changes features of the nucleoside that are

important specificity determinants for polymerases, problems are created for those seeking to

expand the genetic alphabet artificially and develop a synthetic biology from an expanded

genetic alphabet.

Reverse transcriptases have an ability to process both DNA and RNA, whose sugars have

different conformations. Reverse transcriptases, therefore, should be able to accept components

of an artificially expanded genetic information system that incorporate C-glycosides. Perhaps it
is not surprising that the first reported example of PCR amplification of a six-letter genetic
alphabet, where one of the extra two letters was a C-glycoside, exploited HIV-RT (Sismour et al.,
2004).









When attempting to develop a synthetic biology using C-glycosides, the physical structure
of the DNA must be considered, especially since the presence of multiple, sequential
C-glycosides could alter the structure and stability of duplex DNA. Previous studies have
shown that poly(U)·poly(A) helices favor the A-DNA form while poly(T)·poly(A) helices
display perfect B-DNA structure (Ivanov et al., 1973, Saenger, 1984, Chandrasekaran and
Radha, 1992). Circular dichroism was employed to infer the secondary structure of our DNA,
since the spectra generated by A-DNA and B-DNA are quite different (Fig. 2-1) (Ivanov et al.,
1973). Duplex DNA containing one to twelve consecutive dA-dψU base pairs was studied, and
all of the duplexes were found to remain in the B-DNA form.

To take the next step towards a synthetic biology with an expanded genetic alphabet, it
would be desirable to have DNA polymerases that accept multiple C-glycoside nucleotides. To
determine whether natural DNA polymerases have this capability and the extent to which this
capability is conserved, four Family A DNA polymerases and four Family B DNA polymerases
were screened for their ability to incorporate multiple 2'-deoxypseudothymidine-5'-triphosphate
(dψTTP) and 2'-deoxypseudouridine-5'-triphosphate (dψUTP) residues across from template dA. These
C-glycosides are steric analogs of thymidine-5'-triphosphate (TTP) and present the same
hydrogen bonding pattern to a complementary strand as TTP (Fig. 2-2). Consequently, they
should serve as a relatively specific probe for this non-standard structural feature.

In these experiments, all of the polymerases tested were able to incorporate both
C-glycosides to an extent, but there was room for improvement in some, such as Taq. To
determine the extent of Taq polymerase's ability to incorporate the C-glycosides, it was screened
for its ability to incorporate anywhere from one to twelve consecutive dψTTP or dψUTP residues
across from template dA.









Materials and Methods

Synthesis of Triphosphates and Oligonucleotides

Dr. Shuichi Hoshika, from the Foundation for Applied Molecular Evolution (FfAME,
Gainesville, Florida), synthesized the pseudothymidine precursor as described in Appendix A.
Dr. Daniel Hutter (FfAME) synthesized 2'-deoxypseudothymidine-5'-triphosphate (dψTTP) as
described in Appendix A. 2'-Deoxypseudouridine-5'-triphosphate (dψUTP) was purchased
from TriLink BioTechnologies (San Diego, California). Standard deoxynucleotide triphosphates
(dNTPs), namely 2'-deoxyadenosine-5'-triphosphate (dATP), 2'-deoxycytidine-5'-triphosphate
(dCTP), 2'-deoxyguanosine-5'-triphosphate (dGTP), and thymidine-5'-triphosphate (TTP),
were purchased from Promega Corporation (Madison, Wisconsin). Triphosphate solutions
identified as dψTNTPs comprised dATP, dCTP, dGTP, and dψTTP, while those
identified as dψUNTPs contained dATP, dCTP, dGTP, and dψUTP.

The oligonucleotides used for these experiments are listed in Table 2-1. Those sequences
containing only standard nucleotides were commercially obtained from Integrated DNA
Technologies (Coralville, Iowa) as desalted or PAGE (polyacrylamide gel electrophoresis)
purified oligonucleotides. Those oligonucleotides containing dψU were synthesized by Dr. Ajit
Kamath (University of Florida, Gainesville, Florida) and were prepared using standard
monomers and reagents (Glen Research, Sterling, Virginia) on an Expedite 8909 DNA
Synthesizer (PerSeptive Biosystems, Inc., Framingham, Massachusetts). The crude products
were digested, with agitation, in 1 mL of concentrated ammonium hydroxide at 55 °C for 16 hr
to release and deprotect the oligonucleotide (Sambrook et al., 1989). The mixtures were briefly
centrifuged and the supernatants were passed through 2 µm cellulose acetate syringe filters. The
residual products were washed three times with 1 mL portions of sterile water. The combined
filtrates were lyophilized to dryness, purified by polyacrylamide gel electrophoresis
(PAGE), and isolated by reversed-phase chromatography on a silica gel as described previously
(Sambrook et al., 1989).

Circular Dichroism

Each template, containing one through twelve consecutive dA or dψU residues (T-13
through T-22 or T-23 through T-34, respectively), was annealed to its complement template,
containing consecutive dT or dA residues (T-35 through T-46 or T-47 through T-58,
respectively). Reactions contained 5 nmol of each template and 290 µL of CD buffer (1 M
NaCl, 10 mM Na2HPO4, 1 mM Na2EDTA at pH 7.0) for a total volume of 300 µL. The mixtures
were incubated for 5 min at 96 °C and allowed to cool to room temperature over the course of 1
hr.
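
As a worked check of the quantities above, the strand concentration in each CD sample follows directly from the amount of template and the total volume. The Python sketch below performs that conversion; the function name is an illustrative assumption, and the duplex concentration quoted in the comment assumes complete annealing of the two strands.

```python
def concentration_um(amount_nmol, volume_ul):
    """Convert an amount in nmol and a volume in µL to a concentration in µM.

    nmol per µL is numerically equal to mM, so multiply by 1000 to obtain µM.
    """
    return amount_nmol / volume_ul * 1000.0


# 5 nmol of each strand in a total volume of 300 µL (values from the text).
strand_um = concentration_um(5.0, 300.0)
# If the two strands anneal completely, the duplex is at the same concentration.
print(f"Each strand: {strand_um:.1f} µM")
```

Under these assumptions each strand, and therefore the duplex, is present at roughly 17 µM.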

The CD spectra from 200 to 320 nm, using a wavelength step of 1 nm, were measured in a
nitrogen atmosphere at 25 °C in a 0.1 cm pathlength cuvette, using an Aviv Model 215 Circular
Dichroism Spectrometer (Proterion Corporation, Inc., Piscataway, NJ). Scans were performed in
triplicate for each sample mixture and the data were averaged.

Standing Start Primer-Extension Assays

Radiolabeled primer was prepared by incubating 0.5 nmol P-1, 100 µCi γ-32P-ATP, 1X T4
Polynucleotide Kinase (PNK) Buffer, 50 U T4 PNK (New England BioLabs, Beverly,
Massachusetts), and sterile dH2O in a final volume of 100 µL, for 1 hr at 37 °C. The
radiolabeled primer was purified using the QIAquick Nucleotide Removal Kit (Qiagen,
Valencia, California) and eluted from the column in 100 µL Buffer EB (10 mM Tris-HCl, pH
8.5).









Radiolabeled template, to depict the location of full-length product (FLP), was prepared by
incubating 50 pmol T-4, 10 µCi γ-32P-ATP, 1X T4 PNK Buffer, 25 U T4 PNK, and sterile dH2O
in a final volume of 50 µL, for 1 hr at 37 °C. The radiolabeled T-4 was purified using the
QIAquick Nucleotide Removal Kit, and eluted from the column in 50 µL Buffer EB. 200 µL
DNA PAGE Loading Dye (98% formamide, 10 mM EDTA, 1 mg/mL xylene cyanol, and 1
mg/mL bromophenol blue) was added to the 1 µM radiolabeled T-4 for a final concentration of
0.2 µM radiolabeled T-4.

Radiolabeled 10 base-pair (bp) ladder was prepared by first incubating 1.95 µg 10 bp DNA
Step Ladder (Promega Corporation), 30 µCi γ-32P-ATP, 1X T4 PNK Buffer, and sterile dH2O in
a final volume of 27 µL, for 1 min at 90 °C. Immediately following, 30 U T4 PNK was added
and the mixture was incubated for 30 min at 37 °C. The radiolabeled 10 bp ladder was purified
using the QIAquick Nucleotide Removal Kit, and eluted from the column in 30 µL Buffer EB.
120 µL DNA PAGE Loading Dye was added to the 65 ng/µL radiolabeled 10 bp DNA Ladder
for a final concentration of 13 ng/µL radiolabeled 10 bp DNA Ladder.
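
The stock and final concentrations quoted for the radiolabeled T-4 template and the 10 bp ladder follow from a simple dilution calculation. The Python sketch below reproduces them, assuming complete recovery of the DNA from the purification column (an assumption made here for illustration); the function name is hypothetical.

```python
def diluted_concentration(stock_conc, stock_volume_ul, added_volume_ul):
    """Concentration after adding diluent: C2 = C1 * V1 / (V1 + V_added)."""
    return stock_conc * stock_volume_ul / (stock_volume_ul + added_volume_ul)


# T-4 template: 50 pmol eluted in 50 µL is 1 pmol/µL, i.e. 1 µM.
t4_stock_um = 50.0 / 50.0
t4_final_um = diluted_concentration(t4_stock_um, 50.0, 200.0)

# 10 bp ladder: 1.95 µg (1950 ng) eluted in 30 µL.
ladder_stock = 1950.0 / 30.0
ladder_final = diluted_concentration(ladder_stock, 30.0, 120.0)

print(f"T-4: {t4_stock_um:.1f} µM stock, {t4_final_um:.1f} µM after adding dye")
print(f"Ladder: {ladder_stock:.0f} ng/µL stock, {ladder_final:.0f} ng/µL after adding dye")
```

Both results match the values stated in the text (0.2 µM and 13 ng/µL).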

Polymerase screen primer-extension assays

Klenow Fragment (3'→5' exo-), Bst DNA Polymerase (Large Fragment), Taq DNA
Polymerase, VentR® (exo-) DNA Polymerase, Deep VentR® (exo-) DNA Polymerase, and
Therminator™ DNA Polymerase were purchased from New England BioLabs. Tth DNA
Polymerase was purchased from Promega Corporation. Pfu (exo-) DNA Polymerase was
purchased from Stratagene (La Jolla, California). Buffers used in these experiments were
supplied by the manufacturer as follows: reactions using Bst, Taq, Tth, Vent (exo-), Deep Vent
(exo-), and Therminator were performed in 1X ThermoPol Buffer (20 mM Tris-HCl (pH 8.8), 10
mM (NH4)2SO4, 10 mM KCl, 2 mM MgSO4, 0.1% Triton X-100); Klenow (exo-) reactions were
performed in 1X NEBuffer 2 (10 mM Tris-HCl (pH 7.9), 50 mM NaCl, 10 mM MgCl2, 1 mM
dithiothreitol); and reactions using Pfu (exo-) were performed in 1X Cloned Pfu Buffer (20 mM
Tris-HCl (pH 8.8), 2 mM MgSO4, 10 mM KCl, 10 mM (NH4)2SO4, 0.1% Triton X-100, 0.1
mg/mL nuclease-free Bovine Serum Albumin). Optimal temperatures for polymerase function
were 37 °C for Klenow (exo-), 65 °C for Bst, and 72 °C for Taq, Tth, Vent (exo-), Deep Vent
(exo-), Pfu (exo-), and Therminator.

T-4 Primer-Template complex was prepared by mixing 25 pmol radiolabeled P-1, 200
pmol non-radiolabeled P-1, and 300 pmol non-radiolabeled T-4, in a final volume of 15 µL. The
mixture was incubated for 5 min at 96 °C and allowed to cool to room temperature over the
course of 1 hr.

For primer-extension assays, 1.5 µL of the primer-template complex, 1X of the appropriate
manufacturer's supplied buffer, 1 U/µL of the appropriate polymerase, and sterile dH2O were
used in a final volume of 9 µL. Reactions were then incubated at the appropriate temperature for
30 s. Each reaction was initiated by adding 1 µL of one of the following: 1 mM dTTP, 1 mM
dψTTP, 1 mM dψUTP, 1 mM dNTPs, 1 mM dψTNTPs, or 1 mM dψUNTPs, and incubated for
two more minutes at the appropriate temperature. Reactions were immediately quenched with 5
µL of DNA PAGE Loading Dye. Samples (1 µL) were resolved on denaturing PAGE gels (7 M
urea and 20% 40:1 acrylamide:bisacrylamide) and analyzed on a Molecular Imager FX System
(Bio-Rad, Hercules, California).
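
The final concentrations in each 10 µL primer-extension reaction (9 µL of pre-incubated mix plus 1 µL of triphosphate) can be worked out from the quantities given above. The Python sketch below is an illustrative calculation only: the function name is hypothetical, and the text does not say whether "1 mM dNTPs" refers to each nucleotide or to the total, so the triphosphate value is simply the dilution of a 1 mM stock.

```python
def final_concentration_um(stock_um, added_ul, final_ul):
    """Concentration of a component after dilution into the final reaction volume."""
    return stock_um * added_ul / final_ul


FINAL_VOLUME_UL = 9.0 + 1.0  # pre-incubated mix plus initiating triphosphate

# Primer-template stock: (25 + 200) pmol primer and 300 pmol template in 15 µL.
primer_stock_um = (25.0 + 200.0) / 15.0   # pmol/µL is numerically µM
template_stock_um = 300.0 / 15.0

primer_um = final_concentration_um(primer_stock_um, 1.5, FINAL_VOLUME_UL)
template_um = final_concentration_um(template_stock_um, 1.5, FINAL_VOLUME_UL)
triphosphate_um = final_concentration_um(1000.0, 1.0, FINAL_VOLUME_UL)

print(f"Primer: {primer_um:.2f} µM, template: {template_um:.2f} µM")
print(f"Triphosphate from the 1 mM stock: {triphosphate_um:.0f} µM")
print(f"Primer:template molar ratio: {primer_stock_um / template_stock_um:.2f}")
```

The template is in molar excess over the primer (ratio 0.75), which favors annealing of every primer to a template.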

Taq polymerase primer-extension assays

Primer-Template complexes were prepared by mixing 25 pmol radiolabeled P-1, 200 pmol
non-radiolabeled P-1, and 300 pmol of non-radiolabeled template (T-1 through T-12), in a final
volume of 15 µL. The mixtures were incubated for 5 min at 96 °C and allowed to cool to room
temperature over the course of 1 hr.

For primer-extension assays, 1.5 µL of the appropriate primer-template complex, 1X
ThermoPol buffer, 1 U/µL Taq Polymerase, and sterile dH2O were used in a final volume of 9
µL. Reactions were then incubated at 72 °C for 30 s. Each reaction was initiated by adding 1 µL
of one of the following: 1 mM dNTPs, 1 mM dψTNTPs, or 1 mM dψUNTPs, and incubated for
two more minutes at 72 °C. Reactions were immediately quenched with 5 µL of DNA PAGE
Loading Dye. Samples (1 µL) were resolved on denaturing PAGE gels (7 M urea and 20% 40:1
acrylamide:bisacrylamide) and analyzed on a Molecular Imager FX System (Bio-Rad).

Results

Circular Dichroism

Duplexes were formed by annealing each template (T-13 through T-34) to its complement
sequence (T-35 through T-58), creating twelve control helices containing only thymidine and
twelve helices containing pseudouridine. Figure 2-3[A-E] shows a representative set of these
spectra, specifically the spectra of duplexes containing 1, 3, 6, 9, or 12 A-ψU base pairs. When
compared to the spectra seen in Figure 2-1, all spectra are consistent with B-DNA being the
overall conformation of all duplexes. In addition, the spectra of the oligonucleotides
containing the dA-dψU base pairs are similar in pattern to the spectra of those containing the
dA-dT base pairs.

Polymerase Screen Primer-Extension Assays

Four Family A and four Family B polymerases were screened for their ability to efficiently
incorporate non-standard bases exhibiting a C-glycosidic linkage. Polymerases
were tested in both 4-base and 13-base extension assays, and were challenged to incorporate
(4-base) or incorporate and extend beyond (13-base) four consecutive dT, dψT, or dψU residues
across from template dA under the polymerases' optimal conditions (Fig. 2-4). Reactions used
TTP, dψTTP, or dψUTP in the 4-base extensions and either dNTPs, dψTNTPs, or dψUNTPs for
the 13-base extension reactions. Family A polymerases (Fig. 2-5[A-B]) were represented by
Klenow (exo-), Bst, Taq, and Tth; Family B polymerases (Fig. 2-6[A-B]) were represented by
Vent (exo-), Deep Vent (exo-), Pfu (exo-), and Therminator.

Pfu (exo-) was the only polymerase that was not able to generate FLP when challenged to
incorporate and extend beyond both of the non-standard bases. All other Family A and Family B
polymerases were able to incorporate the four consecutive non-standard bases (NSBs) and
extend beyond them, to some measure, to generate FLP. Bst and Therminator polymerases
appeared to have consumed almost all of the primer in the course of their reactions, generating
large amounts of FLP, with all of the different NTPs tested. Klenow (exo-) and Vent (exo-) also
did an exceptional job of incorporating the NSBs, but the remaining polymerases
appeared to have difficulty, given the intensity of the pause sites relative to the intensity of the
FLP bands.

Taq Polymerase Primer-Extension Assays

To replicate its own encoding polymerase gene, Taq polymerase would be required to
incorporate and extend beyond four consecutive dT, dψT, or dψU residues. In these
experiments, Taq polymerase was challenged to incorporate and extend beyond up to twelve
consecutive dT/dψT/dψU residues opposite template dA. These results (Fig. 2-7[A-B]) show
that Taq appears to have some difficulty incorporating twelve consecutive dT
residues, as evidenced by the pausing in those lanes, but it is still able to generate FLP (N+13).
It is also apparent that Taq has difficulty incorporating multiple consecutive C-glycoside
residues, since it was not able to generate FLP when forced to incorporate five or more dψT or
dψU residues. However, it does generate a small amount of FLP when challenged to insert four
consecutive dT, dψT, or dψU residues, and therefore should be able to replicate its own gene
using a C-glycoside substitute for TTP.
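
The argument above depends on the longest run of consecutive template dA residues that a polymerase must copy. The Python sketch below shows one way to find that run in a template sequence and compare it with the observed limit of four consecutive C-glycosides. The function names and the example sequence are hypothetical; the sequence is not taken from the Taq gene.

```python
def longest_run(sequence, base):
    """Length of the longest stretch of consecutive `base` characters in `sequence`."""
    best = current = 0
    for nucleotide in sequence.upper():
        current = current + 1 if nucleotide == base else 0
        best = max(best, current)
    return best


def within_observed_limit(sequence, base="A", limit=4):
    """True if no run of `base` exceeds `limit` (four in the Taq results above)."""
    return longest_run(sequence, base) <= limit


example_template = "GCTAAAAGGCATTAAC"  # hypothetical sequence for illustration
print(longest_run(example_template, "A"))       # longest run of template dA
print(within_observed_limit(example_template))  # can Taq be expected to copy it?
```

A complete analysis would scan both strands of the gene, since each strand serves as a template during PCR.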

Discussion

It was first necessary to determine if the presence of multiple dψU residues in
double-stranded DNA would perturb the helical structure to a point where there is a phase transition
from B-DNA to A-DNA, perhaps making it difficult for polymerases to replicate the DNA. It is
well known that poly(U)·poly(A) favors A-DNA helices, while poly(T)·poly(A) favors B-DNA
helices (Ivanov et al., 1973, Saenger, 1984, Chandrasekaran and Radha, 1992).

The distinctive differences in the CD spectra between the canonical A-duplex and the canonical
B-duplex structures involve a shift of the positive portion of the spectrum to shorter wavelengths,
to 267 nm for the A-form compared to 275 nm for the B-form (Ivanov et al., 1973). A
shift of similar magnitude is seen in the negative portion. Further, A-DNA shows a
stronger Cotton effect than B-DNA. Therefore, to determine whether the addition of
C-glycosidic units tends to drive the conformation of the duplex from B towards A, we look for an
increase in the Cotton effect and a shift towards shorter wavelengths.

Circular dichroism was performed on 24 duplex DNA molecules containing anywhere

from one to twelve consecutive dyU~dA or dTedA base pairs. The observed spectra (Fig. 2-

3 [A-E]) were compared to those in Figure 2-1, the reference spectra for canonical A and B

duplexes. In all spectra containing dyU, the wavelength shifted marginally (ca. 4 nm) towards

longer wavelengths. This shift does not display a trend, however. The shift is the same no

matter how many dyU units are incorporated into the strand.
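
The wavelength criterion used above can be written as a small calculation. The following Python sketch is illustrative only. The function name and the example reading of 279 nm are assumptions made here (the text reports a shift of ca. 4 nm toward longer wavelengths but does not tabulate peak positions); the 275 nm and 267 nm reference maxima are the values quoted from Ivanov et al. (1973).

```python
# Positive-band maxima quoted in the text for the canonical forms (Ivanov et al., 1973).
B_FORM_MAX_NM = 275.0
A_FORM_MAX_NM = 267.0


def fraction_of_b_to_a_shift(observed_max_nm: float) -> float:
    """Express an observed positive-band maximum on a B-to-A scale.

    0.0 corresponds to the canonical B-form maximum, 1.0 to the canonical
    A-form maximum. Negative values mean the band moved to longer
    wavelengths, i.e. away from the A-form.
    """
    return (B_FORM_MAX_NM - observed_max_nm) / (B_FORM_MAX_NM - A_FORM_MAX_NM)


# Hypothetical reading: a maximum displaced ca. 4 nm to longer wavelength.
example = fraction_of_b_to_a_shift(279.0)
print(f"fraction of B-to-A shift: {example:+.2f}")
```

A negative value, as in this hypothetical reading, is in the opposite direction from the shift expected for a B-to-A transition.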









The only possible trend is a change in the relative intensities of the positive (at 275 nm) and

negative (at 246 nm) bands (Ivanov et al., 1973). Here the intensities of the 246 nm

band and the 275 nm band both decrease. As concentrations were carefully controlled, we do not

believe that this reflects a change in the concentration of the oligonucleotides. This is also

suggested by the intensity of signals at lower wavelengths, although these are notoriously

compromised by any trace of impurity. Disregarding this detail, the trend is the opposite of what

one expects for the conversion of the duplex structure from canonical B to canonical A.

These results provide no evidence that addition of dyU units causes the duplex structure to

change from a B-DNA to an A-DNA conformation. Thus, there was no evidence to suggest that

there would be a conformational problem with the duplex structure when incorporating multiple,

sequential C-glycosides. It should be mentioned, however, that CD is indicative only of the

gross properties of the system; it does not provide information about detailed structure. It is

conceivable that the conformation is changed in a different or more subtle way.

Nevertheless, these results encouraged us to test polymerases for their ability to work with

C-glycosides. Polymerases that already display some of the desired catalytic activity, in this case

the incorporation of the C-glycosides, should facilitate the evolution and/or creation of an

AEGIS. Previous studies have shown that polymerases are able to incorporate up to three C-

glycosides, but have not tested their ability to incorporate the longer runs of sequential C-

glycosides that would be required for an AEGIS (Lutz et al., 1999, Sismour et al., 2004, Piccirilli

et al., 1991). Accordingly, four Family A polymerases, Klenow (exo-), Bst, Taq, and Tth, and

four Family B polymerases, Deep Vent (exo-), Vent (exo-), Pfu (exo-), and Therminator, were

screened for the ability to incorporate TTP, dyTTP, and dyUTP across from template dA in

both 4-base and 13-base primer extension assays (Fig. 2-5[A-B] and Fig. 2-6[A-B]). In the 4-









base extension assay, polymerases were challenged to incorporate four consecutive TTP,

dyTTP, or dyUTP across from template dA during two-minute incubations at the optimal

temperature for each enzyme. The 13-base assay, incubated as described above, took place in

the presence of dCTP, dGTP, dATP, and TTP, dyTTP, or dyUTP, and required incorporation

and extension beyond the four consecutive TTP, dyTTP, or dyUTP.

The Bst and Therminator polymerases appeared to have worked extremely well and

consumed almost all of the primer in the course of all of their reactions, while Pfu (exo-) did not

appear to generate any 13-base FLP when presented with either of the two NSBs. All other

polymerases generated varying amounts of FLP with both of the NSBs, suggesting that any of

the aforementioned polymerases could be potential candidates for adaptation to an AEGIS, based

on the qualification that the polymerase must already be able to incorporate C-glycosides.

However, two of these polymerases, Klenow (exo-) and Bst, are not thermostable and thus could

not be used for PCR. In addition, according to the manufacturer, Therminator is not recommended

for any applications except DNA sequencing and primer-extension reactions. These three

polymerases were therefore unlikely candidates for future studies. As in previous studies, Taq was selected as

the best polymerase candidate to undergo further testing since it so readily accepted the

consecutive non-standard bases (Lutz et al., 1999).

In an AEGIS system, a polymerase would be required to replicate its own encoding gene

with efficiency and fidelity. In order for Taq to replicate its encoding polymerase gene, it would

be required to incorporate four consecutive dyT or dyU across from template dA. Since we

have already shown that Taq can in fact incorporate and extend beyond four consecutive C-

glycosides, we next tested its ability to incorporate and extend beyond up to twelve consecutive

dyT-dA or dyU-dA base pairs. Primer extension experiments were performed under optimal










polymerase conditions using templates T-1 through T-12. Based on the results of the study (Fig.

2-7[A-B]), Taq polymerase will not readily incorporate and extend beyond more than five

consecutive C-glycosides to generate FLP. If this polymerase is to be used as a potential

candidate for an AEGIS system, it must be modified, possibly by directed evolution experiments,

so that it can incorporate more of these non-standard bases.


































[Figure 2-1 plot: CD signal versus wavelength (nm); x-axis ticks at 220, 260, and 300 nm.]


Figure 2-1. A schematic representation of the CD spectra of A- and B-DNA forms. The dotted
line indicates the position of the absorption maxima (adapted from Ivanov et al., 1973).




















[Figure 2-2 image: chemical structures of the T-A, yT-A, and yU-A base pairs, with the major and minor grooves labeled.]

Figure 2-2. The base pairing interactions between a standard A-T base pair and the non-standard
yT-A and yU-A base pairs. Note the C-glycosidic bond (shown in blue) between
the base and the sugar in both yT and yU.





Oligo Sequence (5'-3' Direction)


Purification


:AG AGA CGlly CIA 1AG
:GG ACG Allpy CTA TAG
:GG CGA Ilnly CTA TAG
;GC GAlly lnly CTA TAG
;CG Allpy Ilnly CTA TAG
:GA AAI nliqAJI CTA TA(


*The y represents the incorporation of a pseudouridine residue.


Table 2-1. Oligonucleotides used in this study.


























[Figure 2-3 plots, panels A-E: CD signal versus wavelength (nm). Each panel overlays a duplex containing 1, 3, 6, 9, or 12 dA-dT base pairs with the duplex containing the same number of dA-pseudoU base pairs.]


Figure 2-3. Representative CD spectra. Circular dichroism spectra of select double-stranded
templates with their complements containing varying amounts of dA-dT or dA-dyU
base pairs at 25 °C. All of the spectra above are indicative of B-DNA (Ivanov et al.,
1973). Note that the conformation does not dramatically change as the amount of
yU is increased. (A) The spectra of duplexes containing 1 dA-dT base pair vs. 1
dA-dyU base pair. (B) The spectra of duplexes containing 3 dA-dT base pairs vs. 3
dA-dyU base pairs. (C) The spectra of duplexes containing 6 dA-dT base pairs vs. 6
dA-dyU base pairs. (D) The spectra of duplexes containing 9 dA-dT base pairs vs.
9 dA-dyU base pairs. (E) The spectra of duplexes containing 12 dA-dT base pairs
vs. 12 dA-dyU base pairs.











[Figure 2-4 schematic: primer P-1 annealed to template T-4. 4-base extension FLP: only TTP, dyTTP, or dyUTP is present. 13-base extension FLP: dNTPs, dyTNTPs, or dyUNTPs are present.]


Figure 2-4. Depiction of primer-extension assays used in the polymerase screen. In the 4-base
extension assays, polymerases were challenged to incorporate up to four consecutive
dT, dyT, or dyU residues across from template dA. In the 13-base extension
assays, the polymerases were forced to incorporate and extend beyond those first
four residues.




























[Figure 2-5 gel images, panels A and B. Lanes: Klenow (exo-), Bst, Taq, and Tth polymerases. Size markers: 20 bp and 30 bp. Labeled bands: N, N+4, and N+13.]


Figure 2-5. Family A polymerase screen. Unextended primer is at position N; N+4 is the full-
length product (FLP) for the 4-base extension assays; N+13 is the FLP for the 13-
base extension assays. Final concentrations: TTPs/dyTTPs/dyUTPs/dNTPs/
dyTNTPs/dyUNTPs (100 µM), radiolabeled P-1 (2.5 pmol), non-radiolabeled P-1
(20 pmol), non-radiolabeled template T-4 (30 pmol), and appropriate polymerase (1
U). The mixtures were prewarmed to the polymerase's optimal temperature for 30 s
and initiated with the appropriate NTP mixture. The mixtures were incubated at the
polymerase's optimal temperature for 2 min and immediately terminated with DNA
PAGE Loading Dye (formamide, EDTA, and dyes). An aliquot (1 µL) was loaded
onto denaturing polyacrylamide gels (20%, 7 M urea) and resolved. A) The
incorporation and extension of dT and dyT by various Family A polymerases. All
polymerases were able to incorporate and extend beyond the four consecutive A-T
or A-yT base pairs to generate some FLP in both the 4-base and 13-base extension
assays. Klenow (exo-) and Bst most likely generated higher amounts of yT-
containing FLP since their optimal temperatures are lower than that of Taq and Tth.
B) The incorporation and extension of dT and dyU by various Family A
polymerases. All polymerases were able to incorporate and extend beyond the four
consecutive A-T or A-yU base pairs to generate some FLP in both the 4-base and
13-base extension assays. Klenow (exo-) and Bst most likely generated higher
amounts of yU-containing FLP since their optimal temperatures are lower than that
of Taq and Tth.


[Figure 2-6 gel images, panels A and B. Lanes: Vent (exo-), Deep Vent (exo-), Pfu (exo-), and Therminator polymerases. Size markers: 20 bp and 30 bp. Labeled bands: N, N+4, and N+13.]


Figure 2-6. Family B polymerase screen. Unextended primer is at position N; N+4 is the full-
length product (FLP) for the 4-base extension assays; N+13 is the FLP for the 13-
base extension assays. Final concentrations: TTPs/dyTTPs/dyUTPs/dNTPs/
dyTNTPs/dyUNTPs (100 µM), radiolabeled P-1 (2.5 pmol), non-radiolabeled P-1
(20 pmol), non-radiolabeled template T-4 (30 pmol), and appropriate polymerase (1
U). The mixtures were prewarmed to the polymerase's optimal temperature for 30 s
and initiated with the appropriate triphosphate mixture. The mixtures were
incubated at the polymerase's optimal temperature for 2 min and immediately
terminated with DNA PAGE Loading Dye (formamide, EDTA, and dyes). An
aliquot (1 µL) was loaded onto denaturing polyacrylamide gels (20%, 7 M urea) and
resolved. A) The incorporation and extension of dT and dyT by various Family B
polymerases. All polymerases, except Pfu (exo-), were able to incorporate and
extend beyond the four consecutive A-T or A-yT base pairs to generate some FLP
in both the 4-base and 13-base extension assays. Pfu (exo-) was able to generate
FLP in the 4-base assay, but not the 13-base assay. Therminator was extremely
adept at incorporating the dyT residues, as depicted by the low levels of unextended
primer remaining in those lanes. B) The incorporation and extension of dT and dyU
by various Family B polymerases. All polymerases, except Pfu (exo-), were able to
incorporate and extend beyond the four consecutive A-T or A-yU base pairs to
generate some FLP in both the 4-base and 13-base extension assays. Pfu (exo-) was
able to generate FLP in the 4-base assay, but not the 13-base assay. Therminator
was extremely adept at incorporating the dyU residues, as depicted by the low levels
of unextended primer remaining in those lanes.


[Figure 2-7 gel images, panels A and B. Size markers: 30 bp and 40 bp. Labeled bands: N+4, N+13, and N+14.]


Figure 2-7. Incorporation of one to twelve consecutive dT, dyT, or dyU residues by Taq
polymerase. Unextended primer is at position N; FLP is denoted by N+13 in all of
these assays (see Table 2-1 for oligonucleotides used). Final concentrations:
dNTPs/dyTNTPs/dyUNTPs (100 µM), radiolabeled P-1 (2.5 pmol), non-
radiolabeled P-1 (20 pmol), non-radiolabeled templates T-1 through T-12 (30 pmol),
and Taq polymerase (1 U). The mixtures were prewarmed to 72 °C for 30 s and
initiated with the appropriate NTP mixture. The mixtures were incubated at 72 °C
for 2 min and immediately terminated with DNA PAGE Loading Dye (formamide,
EDTA, and dyes). An aliquot (1 µL) was loaded onto denaturing polyacrylamide
gels (20%, 7 M urea) and resolved. A) The incorporation and extension of 1 to 12
dT or dyT residues across from template A by Taq polymerase. It appears that very
little to no FLP is generated after the incorporation of five or more consecutive
dyTs. B) The incorporation and extension of 1 to 12 dT or dyU residues across
from template A by Taq polymerase. It appears that very little to no FLP is
generated after the incorporation of five or more consecutive dyUs.


[Figure 2-7, panel B lane groups: incorporation of 1 to 12 consecutive TTP across from template A; incorporation of 1 to 12 consecutive dyUTP across from template A.]









CHAPTER 3
CREATION OF A RATIONALLY DESIGNED MUTAGENIC LIBRARY AND SELECTION
OF THERMOSTABLE POLYMERASES USING WATER-IN-OIL EMULSIONS

Introduction

To create synthetic biology using an artificially expanded genetic information system

(AEGIS), a polymerase that is capable of incorporating non-standard bases (NSBs) is

needed. Unfortunately, studies have not found an extant thermostable polymerase able to

incorporate a variety of NSBs with efficiency and fidelity. Polymerases usually perform more

efficiently with one type of NSB than they do with another (Hendrickson et al., 2004, Leal et al.,

2006, Roychowdhury et al., 2004).

Directed evolution may help to rectify this situation and allow us to mutate an existing

polymerase to generate one with an increased ability to incorporate a variety of NSBs (Ghadessy

et al., 2001, Ghadessy et al., 2004). Therefore, we became interested in directed evolution as a

way to modify Taq polymerase to better incorporate NSBs, specifically ones exhibiting a C-

glycosidic linkage.

Taq polymerase, a member of the Family A polymerases, has already been successfully

evolved by directed evolution to incorporate various other NSBs (Ghadessy et

al., 2001, Ghadessy et al., 2004, Fa et al., 2004). Ghadessy et al. provided a procedure for doing

so using water droplets in oil (Ghadessy et al., 2004, Ghadessy et al., 2001); these served as

artificial cells. They began with large, diverse random libraries of the Taq polymerase, with

approximately 7 amino acid residue replacements. Ghadessy et al. found that three to four

rounds of selection were sufficient to identify a polymerase able to incorporate various NSBs

using these random libraries.

This result was initially surprising, as Guo et al. have shown that approximately one-third of

all random multiple amino acid changes will result in the inactivation of a protein, and that 70%









of random changes in the active site of a polymerase will also result in inactivation (Guo et al.,

2004). This implies that a protein having more than a few random amino acid changes has a

high likelihood of being inactive. One might have expected that a very large fraction of the

variants created by Ghadessy et al. would have been inactive, especially at high temperatures,

and this expectation is consistent with results reported below.
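
The expectation stated above can be illustrated with a back-of-the-envelope calculation. The Python sketch below is not taken from Guo et al. or from this study. It assumes, for illustration only, that the one-third inactivation figure applies to each replacement separately and that replacements act independently; both are simplifying assumptions, and the function name is hypothetical.

```python
def fraction_expected_active(n_changes: int, p_inactivate: float = 1.0 / 3.0) -> float:
    """Fraction of variants expected to stay active after n_changes replacements.

    Assumption (not from the source): each replacement independently
    inactivates the protein with probability p_inactivate.
    """
    return (1.0 - p_inactivate) ** n_changes


# Compare a few mutational loads, including the ca. 7 replacements
# quoted for the random libraries.
for n in (1, 3, 4, 7):
    print(f"{n} replacement(s): {fraction_expected_active(n):.3f} expected active")
```

Under these assumptions, only about 6% of variants carrying seven random replacements would be expected to remain active, which is the sense in which a large inactive fraction might have been anticipated.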

This raises a general question: What is the likelihood that a library contains a protein

having a novel but desirable property? A desirable library for directed evolution would optimally have a

large, diverse number of proteins with a high number of active clones (Hibbert and Dalby, 2005).

One approach to achieving this goal involves the selection of sites to introduce replacements.

For example, if replacements throughout the protein are equally likely to lower thermal stability,

while replacements in sites near the active site are more likely to change catalytic behavior, it

makes sense to focus randomization in residues near the active site (Arnold and Georgiou,

2003b, Arnold and Georgiou, 2003a, Fa et al., 2004, Miller et al., 2006, Ghadessy et al., 2004,

Ghadessy et al., 2001).

An alternative approach recognizes that natural history has already explored polymerase

"sequence space." Much of this natural history is available to us in genomic sequence databases.

This permits an approach, originally called "evolutionary guidance," that extracts information

from that history to identify sites that are more likely to influence behavior in a way that is

desired, and less likely to damage the enzyme (Allemann et al., 1991, Presnell and Benner,

1988).

Eric Gaucher, at the Foundation for Applied Molecular Evolution (FfAME), recently

took this approach a step further under the reconstructing evolutionary adaptive paths

(REAP) rubric (Gaucher, 2006). He identified sites where functional divergence occurred within









a family of polymerases, but where natural history suggested that the site was under strong

selective pressure. In theory, this has the highest probability to generate new activities and

functions.

Using the sites identified by the REAP approach, the Type II sequence divergence of the

Family A polymerases was studied (Gu, 2002, Gu, 1999). In this approach, sites were identified

that had a split "conserved but different" pattern of historical evolutionary variation, and had

been previously suggested to lead to a change in the function or behavior of the polymerase.

Using Pfam (Fig. 3-2), a total of 57 amino acid changes across 35 sites within the 719 members

of Family A polymerases that were available were identified (Bateman, 2006, Finn et al., 2006).

The 35 sites for mutational studies, distributed as seen in Figure 3-2, were derived from these

analyses, and from sequences discussed in a recent review by Henry and Romesberg on the

evolution of novel polymerase activities (Henry and Romesberg, 2005). The 57 replacement

amino acid residues were selected based on the Family A viral polymerase sequences at the 35

mutational sites. The viral sequences were exploited since the literature indicates that viral

polymerases are more adept at incorporating NSBs than other polymerases (Sismour et al., 2004,

Leal et al., 2006, Horlacher et al., 1995), and ancient viruses have also been implicated in the

origins of cellular DNA replication machinery (Forterre, 2006).

The company DNA 2.0 created and synthesized the rationally designed (RD) library

containing 74 different mutants using the 57 amino acid changes identified by REAP, in various

combinations to yield three or four amino acid mutations per sequence. In addition to creating

the library, DNA 2.0 also designed and generated a version of the wt taq polymerase gene that

was optimized for codon usage in E. coli cells (co-Taq polymerase). The optimization of codon

usage results in higher expression levels of the protein within the cell (Gustafsson et al., 2004).
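
As a rough check on the library design described above, the numbers in the text (74 mutants, 57 amino acid changes, three or four changes per sequence) imply how often each change must recur across the library. The Python sketch below simply does this arithmetic; the even-distribution assumption and the function name are introduced here for illustration.

```python
N_MUTANTS = 74   # mutant genes in the RD library (from the text)
N_CHANGES = 57   # distinct amino acid changes (from the text)


def mean_uses_per_change(changes_per_gene: int) -> float:
    """Average number of library members carrying each change.

    Assumes every gene carries exactly changes_per_gene changes and that
    the changes are spread evenly, which is an idealization.
    """
    return N_MUTANTS * changes_per_gene / N_CHANGES


low = mean_uses_per_change(3)
high = mean_uses_per_change(4)
print(f"each change appears in about {low:.2f} to {high:.2f} mutants on average")
```

On this idealized count, each of the 57 changes would be sampled in roughly four to five library members.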









Each of these 75 polymerases (co-Taq and the 74 mutants) was tested for its ability to

incorporate increasing concentrations of a representative C-glycoside (Fig. 2-3), 2'-

deoxypseudouridine-5'-triphosphate (dyUTP). None were able to incorporate dyUTP more

efficiently than the co-Taq polymerase, and only eighteen of the 74 mutants of the RD Library

showed activity with the canonical dNTPs under the conditions with which they were presented.

Selections require that some members of the library perform differently than the original

protein of interest (Arnold and Georgiou, 2003a, Lutz and Patrick, 2004). We did not perform a

selection to identify a polymerase with an increased ability to incorporate dyUTP, since we

determined there were no clones in the RD Library that functioned with the NSB better than co-

Taq polymerase. In order to demonstrate our laboratory's ability to perform in vitro selections,

we decided to select for the eighteen mutant polymerases that exhibited activity with dNTPs from

the pool of 74 mutants.

To perform our selection experiments, we used a variation of the compartmentalized self-

replication (CSR) method developed in the laboratories of Griffiths and Holliger to create water-

in-oil emulsions as a way to link genotype to phenotype (Miller et al., 2006, Tawfik and

Griffiths, 1998, Ghadessy et al., 2001, Ghadessy et al., 2004). This method (Fig. 1-13) uses cells

expressing the polymerase as the sole source of polymerase and plasmid template in a PCR

reaction, which takes place inside the aqueous phase of the emulsion. Inactive polymerases fail

to replicate their encoding gene, so they are effectively selected against after the extraction of

products from the emulsion.

After our selection, products were recloned into the expression vector using a version of

the megaprimer PCR method (Miyazaki and Takenouchi, 2002). As this protocol generated

products that were crossover mutations, sequencing of the products provided a list of the









mutations that survived the selection, without providing information about which mutations were

associated with each other. The megaprimer PCR is, nevertheless, an effective method for

library rediversification between rounds of selection.

Materials and Methods

DNA Sequencing and Analysis

DNA sequencing was carried out by the University of Florida Interdisciplinary Center for

Biotechnology Research, DNA Sequencing Core Facility using an ABI 3130xl Genetic Analyzer

(Applied Biosystems, Foster City, California) and primers P-6 through P-9 (Table 3-1). BLAST

2 software was used for sequence similarity searching (Tatusova and Madden, 1999); Derti's

Reverse and/or complement DNA sequences website was used to find the reverse complement of

various DNA strands (Derti, 2003); and ExPASy's translate tool was used to translate DNA

sequences into their amino acid counterparts (Swiss Institute of Bioinformatics, 1999).

Construction of Plasmids

Construction of pSW1

The gene (wt taq) encoding wt Taq polymerase was cloned from a vector generously

donated by Dr. Michael Thompson (UNC, Chapel Hill, North Carolina) using primers P-2 and P-3.

The product was digested with the SacII and NcoI restriction enzymes (New England

BioLabs, Beverly, Massachusetts) according to the manufacturer's protocol. The restricted wt taq

was then ligated into the identically digested pASK-IBA43plus vector (IBA GmbH, St. Louis,

Missouri) (Fig. 3-3), using T4 DNA ligase (New England BioLabs) according to the manufacturer's

protocol (16 °C overnight with a 4:1 insert:vector ratio) to make the new plasmid pSW1 (Fig. 3-4),

adding an N-terminal hexahistidine tag onto the wt taq gene (His(6)-wt Taq). Plasmid

constructs were verified by restriction digest analysis, using the enzymes BamHI and NcoI

according to the manufacturer's protocol (New England BioLabs), as well as by sequencing.









Rationally designed mutagenic library (RD Library) creation

DNA 2.0 (Menlo Park, California) synthesized a variant of the wt taq polymerase gene

(co-taq) that was optimized for the codon usage of E. coli, which was then used to construct the

pSW2 plasmid (Fig. 3-5). Plasmids pSW3 through pSW76 (Table 3-2) were designed by Dr. Eric

Gaucher (Foundation for Applied Molecular Evolution, Gainesville, Florida) and DNA 2.0 using

the REAP approach. Sequence alignments and phylogenetic tree construction of 719 Family A

polymerase protein sequences were generated using the Pfam website (Bateman, 2006, Finn et

al., 2006). Type II functional divergence between the bacterial/eukaryotic Family A

polymerases and the viral Family A polymerases was estimated with DIVERGE 2.0 software

(Gu, 2002, Gu, 1999). The 35 sites for mutational studies were derived from these analyses, as

well as sequences discussed in Henry and Romesberg (Henry and Romesberg, 2005); the

replacement amino acid residues were selected based on the viral sequences at those sites. The

sites chosen are all located in or near the active site of the polymerase.

DNA 2.0 randomized the mutations throughout the 74 sequences so they were equally

distributed (3 to 4 amino acid changes per gene). In addition to the synthesis of the genes, DNA

2.0 cloned all 75 of these plasmids (co-taq and 74 mutants) into the pASK-IBA43plus vector

using the SacII and NcoI restriction sites. Plasmid constructs were verified both by restriction

digest analysis, using the enzymes BamHI and NcoI according to the manufacturer's protocol

(New England BioLabs), and by sequencing.

Growth Curves and Cell Counts

The bacterial strains used in this study are listed in Table 3-3. The rich medium used in these

studies was Luria-Bertani (LB) medium (Difco Laboratories, Detroit, Michigan) (Miller, 1972).

Ampicillin was provided in liquid or solid medium at a final concentration of 100 µg/mL.

Plasmids were transformed into the E. coli TG-1 cell line according to the manufacturer's protocol









(Zymo Research, Orange, California). Cell growth was determined by measuring optical density

at 550 nm using a SmartSpec Plus Spectrophotometer (Bio-Rad, Hercules, California).

Anhydrotetracycline (2 mg/mL stock in N,N-dimethylformamide) was used at a final

concentration of 0.2 ng/µL to induce expression.
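
As a worked example of the inducer dilution, the stock and final concentrations given above fix the volume of stock needed per culture. The Python sketch below applies C1·V1 = C2·V2 to a 100 mL culture (the culture volume used in these growth experiments); it neglects the small volume of stock added, and the function name is introduced here for illustration.

```python
def stock_volume_ul(stock_ng_per_ul: float, final_ng_per_ul: float, culture_ml: float) -> float:
    """Volume of stock (in µL) needed to reach the final concentration.

    Uses C1*V1 = C2*V2 and ignores the volume contributed by the stock
    itself, which is negligible at this dilution.
    """
    culture_ul = culture_ml * 1000.0
    return final_ng_per_ul * culture_ul / stock_ng_per_ul


STOCK_NG_PER_UL = 2.0 * 1000.0   # 2 mg/mL is 2000 ng/µL
FINAL_NG_PER_UL = 0.2            # final concentration given in the text

volume = stock_volume_ul(STOCK_NG_PER_UL, FINAL_NG_PER_UL, culture_ml=100.0)
dilution_factor = STOCK_NG_PER_UL / FINAL_NG_PER_UL
print(f"add about {volume:.1f} µL of stock per 100 mL culture "
      f"(a {dilution_factor:.0f}-fold dilution)")
```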

Inocula for the growth experiments were prepared as follows: bacterial strains were grown

overnight (14.25 hrs) at 37 °C and 250 rpm in LB medium (supplemented with ampicillin, if

applicable) in 14 mL 2059 Falcon Tubes (BD Biosciences, San Jose, California). Cells (1 mL)

from the 5 mL overnight culture were used to inoculate 100 mL LB or LB-Amp cultures in 500

mL baffled flasks. Cultures were grown at 37 °C and 250 rpm for 8.75 hrs. Cell counts were

measured by performing a dilution series using 10-fold dilutions of the cells in 0.85% NaCl.

Dilutions were plated onto LB plates (supplemented with ampicillin, if applicable), grown

overnight at 37 °C, and colonies were counted the next morning to determine the number of

colony-forming units per milliliter of culture (cfu/mL).
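
The plate-count calculation implied above can be sketched as follows. This is a generic illustration, not the authors' worksheet: the plated volume (0.1 mL), the colony count, and the dilution step in the example are hypothetical values, since the text does not report them.

```python
import math


def cfu_per_ml(colonies: int, tenfold_steps: int, plated_volume_ml: float) -> float:
    """Colony-forming units per mL of the original culture.

    colonies:          colonies counted on the plate
    tenfold_steps:     number of serial 10-fold dilutions before plating
    plated_volume_ml:  volume of the diluted sample spread on the plate
    """
    dilution_factor = 10 ** tenfold_steps
    return colonies * dilution_factor / plated_volume_ml


# Hypothetical plate: 150 colonies from 0.1 mL of the sixth 10-fold dilution.
estimate = cfu_per_ml(colonies=150, tenfold_steps=6, plated_volume_ml=0.1)
print(f"{estimate:.2e} cfu/mL")
```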

Samples of cells were taken at various time points to determine the levels of protein

expression, before and after induction. 2X SDS-PAGE loading dye (62.5 mM Tris-HCl, pH 6.8, 25% glycerol, 2%

SDS, 0.01% bromophenol blue, 5% β-mercaptoethanol (Laemmli, 1970)) was added

to the samples, and to 50 U Taq Polymerase (New England BioLabs). Samples were boiled for 8

minutes, then loaded onto a Tris-HCl Ready Gel (7.5%, Bio-Rad) and resolved for 45 min at 200

V. Gels were stained via the Fairbanks Method (Fairbanks et al., 1971).

Purification of His(6)-wt Taq Polymerase

The SW3 cell line was grown overnight in 5 mL of LB-Amp broth for 14.25 hr at 37 °C

and 250 rpm in 14 mL 2059 Falcon Tubes (BD Biosciences). Approximately 2 x 10^8 colony-

forming units (cfu), roughly equal to 500 µL of a culture with an OD550nm of 4.0, were used to









inoculate two 100 mL cultures of LB-Amp in 500 mL baffled flasks. These cultures were grown

at 37 °C and 250 rpm for 3.75 hrs to an approximate OD550nm of 1.8, and were then induced by

addition of anhydrotetracycline (0.2 ng/µL final concentration). The cells were allowed to grow

for an additional 5 hrs to an approximate OD550nm of 3.5. Samples of the uninduced and

induced cells were taken and stored at -20 °C for further analysis.

Cultures were then combined and the cells harvested by centrifugation (9000 rpm, 10 min,

4 °C). The SW3 cells were washed in 40 mL of Cell Harvest Buffer (50 mM Tris-HCl, pH 7.9,

50 mM dextrose, 1 mM EDTA, 4 °C) and centrifuged again (8000 rpm, 10 min, 4 °C). The cell

pellet was then resuspended in Cell Lysis Buffer (20 mM Tris-HCl, pH 7.9, 50 mM NaCl, 5 mM

imidazole, 1 mg/mL lysozyme, 5 µg/mL DNaseI, and 10 µg/mL RNaseI) at a concentration of 2

mL/gram of cells.

The cells were gently lysed by rocking (GyroMini Nutating Mixer) at ambient temperature

for 15 min; the proteins were then denatured by heating to 75 °C for 20 min. The lysed cells

were centrifuged (39,000 x g, 10 min, 4 °C) and the cell-free extract (cfe) removed and placed

into a clean tube. The cfe was then sonicated with six 10 s bursts at 71% output with 10 s

cooling periods at 4 °C between each burst (Model 500 Sonic Dismembrator with a 1/2 inch

tapped horn with flat tip, Fisher Scientific, Suwannee, Georgia). The cfe was centrifuged

(39,000 x g, 10 min, 4 °C) and the supernatant (cleared cfe) was removed.

The cleared cfe was added to 1 mL of a 50% Ni-NTA slurry (Qiagen, Valencia, California)

and incubated at 4 °C for 60 min with gentle mixing (GyroMini Nutating Mixer). The lysate-Ni-

NTA mixture was loaded onto a Poly-Prep Column (Bio-Rad, Hercules, California) and allowed

to settle for 10 min at 4 °C. A portion of the flow-through (10 µL) was then collected and saved

for analysis. The column was washed twice with 4 mL of Ni-NTA Wash Buffer (20 mM Tris-

HCl, pH 7.9, 50 mM NaCl, 60 mM imidazole) and a portion of the flow-through (10 µL) was

saved for future analysis. The protein was eluted four times (0.5 mL each) with Ni-NTA Elution

Buffer (10 mM Tris-HCl, pH 7.9, 250 mM NaCl, 500 mM imidazole) and portions of each (10

µL) were saved for future analysis at -20 °C. 2X SDS-PAGE loading dye was added to each of

the samples mentioned above. Samples were prepared, resolved, and stained as described in the

previous section, and the elutions containing the majority of the purified His(6)-wt Taq

polymerase were identified.

Elution fractions 2 through 4 were combined and loaded into a Slide-A-Lyzer 10K MWCO 0.5-3 mL

Dialysis Cassette (Pierce, Rockford, Illinois) that was pre-hydrated in Taq Dialysis Buffer

A (50 mM Tris-HCl, pH 8.0, 50 mM KCl, 0.1 mM EDTA, 0.5 mM PMSF, 0.5% Nonidet-P40,

0.5% Triton X-100). The sample was dialyzed at 4 °C for 4 hrs against 500 mL of Taq Dialysis

Buffer A. It was then dialyzed for another 4 hrs at 4 °C against 500 mL of Taq Dialysis Buffer B

(50 mM Tris-HCl, pH 8.0, 50 mM KCl, 0.1 mM EDTA, 0.5 mM PMSF, 0.5% Nonidet-P40,

0.5% Triton X-100, 1 mM DTT). Finally, it was dialyzed for 8 hrs at 4 °C against 1 L of Taq

Storage Buffer (50 mM Tris-HCl, pH 8.0, 50 mM KCl, 1 mM DTT, 0.1 mM EDTA, 0.5 mM

PMSF, 0.5% Nonidet-P40, 0.5% Triton X-100, 50% glycerol). The sample was

removed and the protein concentration was determined using the Bio-Rad Protein Assay

Dye Reagent according to the manufacturer's instructions.

The purified His(6)-wt Taq polymerase and Taq polymerase (New England BioLabs) were

used in separate PCR reactions. The same concentration of each polymerase (enough protein to

equate to 3 U of Taq polymerase from New England BioLabs) was added to PCR reactions

containing: 1X Modified ThermoPol Buffer (2 mM Tris-HCl, pH 9, 10 mM KCl, 1 mM

(NH4)2SO4, 2.5 mM MgCl2, 0.2% Tween 20), 250 µM dNTPs, 1.0 µM P-4, 1.0 µM P-5, and 1

ng/µL pSW1. The PCRs (50 µL) were run under the following conditions: 5 min, 94 °C; (1 min,

94.0 °C; 1 min, 55.0 °C; 3 min, 72.0 °C) x 15 cycles; 7 min, 72.0 °C. Products were analyzed by

agarose gel electrophoresis and quantitated using the Molecular Imager Software (Bio-Rad).
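
For planning purposes, the thermal cycling program above can be written down as data and its total hold time summed. The Python sketch below is only an illustration of that arithmetic; the variable and function names are hypothetical, and ramp times between temperatures are ignored because the text does not give them.

```python
# (minutes, degrees C) steps taken from the cycling conditions in the text.
INITIAL_DENATURATION = (5, 94.0)
CYCLE_STEPS = [(1, 94.0), (1, 55.0), (3, 72.0)]  # denature, anneal, extend
N_CYCLES = 15
FINAL_EXTENSION = (7, 72.0)


def total_hold_minutes() -> int:
    """Sum of programmed hold times, excluding temperature ramps."""
    per_cycle = sum(minutes for minutes, _ in CYCLE_STEPS)
    return INITIAL_DENATURATION[0] + N_CYCLES * per_cycle + FINAL_EXTENSION[0]


print(f"total programmed hold time: {total_hold_minutes()} min")  # 87 min
```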

Incorporation of dyUTP by RD Library

2'-deoxypseudouridine-5'-triphosphate (dyUTP) was purchased from TriLink

BioTechnologies (San Diego, California). Standard deoxynucleotide triphosphates (dNTPs)

comprised 2'-deoxyadenosine-5'-triphosphate (dATP), 2'-deoxycytidine-5'-triphosphate

(dCTP), 2'-deoxyguanosine-5'-triphosphate (dGTP), and thymidine-5'-triphosphate (TTP) and

were purchased from Promega Corporation (Madison, Wisconsin). dyUNTPs comprised

dATP, dCTP, dGTP, and dyUTP.

Individual cultures (5 mL LB-Amp) of the SW5 - SW78 cell lines were grown for 14.25 hrs at 250
rpm and 37 °C in 14 mL 2059 Falcon Tubes (BD Biosciences). Approximately 2 x 10^8
colony-forming units (cfu), roughly equal to 500 µL of a culture with an OD550nm of 4.0, were
used to inoculate individual 100 mL cultures of LB-Amp in 500 mL baffled flasks. These cultures
were grown at 37 °C and 250 rpm for 3.75 hrs to an approximate OD550nm of 1.8, and were then
induced with anhydrotetracycline. The cells were allowed to grow for 1 hr longer to an
approximate OD550nm of 3.0.

Approximately 1 x 10^6 cfu (~2 µL cells) were used as the sole source of polymerase and
template in separate PCR reactions containing final concentrations of these constituents: 1X
Modified ThermoPol Buffer, 1.4 µM P-4, 1.4 µM P-5, 1.1 ng/µL RNaseA, and 6% DMSO. The final
concentration of nucleotide triphosphates added to each reaction was one of the following:
500 µM dNTPs; 500 µM dATP/dGTP/dCTP; 500 µM dATP/dGTP/dCTP + 450 µM TTP + 50 µM dψUTP;
10 µM dATP/dGTP/dCTP + 400 µM TTP + 100 µM dψUTP; 10 µM dATP/dGTP/dCTP + 350 µM TTP + 150 µM
dψUTP; 10 µM dATP/dGTP/dCTP + 300 µM TTP + 200 µM dψUTP; 10 µM dATP/dGTP/dCTP + 250 µM TTP +
250 µM dψUTP; 10 µM dATP/dGTP/dCTP + 200 µM TTP + 300 µM dψUTP; 10 µM dATP/dGTP/dCTP + 150 µM
TTP + 350 µM dψUTP; 10 µM dATP/dGTP/dCTP + 100 µM TTP + 400 µM dψUTP; 10 µM dATP/dGTP/dCTP +
50 µM TTP + 450 µM dψUTP; 500 µM dψUNTPs. The PCRs (50 µL) were run under the following
conditions: 5 min, 94 °C; (1 min, 94.0 °C; 1 min, 55.0 °C; 3 min, 72.0 °C) x 15 cycles; 7 min,
72.0 °C. Products were analyzed by agarose gel electrophoresis and quantitated using the
GeneTools Software, version 3.07 (SynGene, Cambridge, England).
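The mixed TTP/dψUTP reactions listed above follow a regular pattern: the two nucleotides always sum to 500 µM and dψUTP rises in 50 µM steps. The sketch below regenerates that series as a check on the list; the function name and constants are illustrative assumptions, and the concentration of dATP/dGTP/dCTP is deliberately left out because it is taken from the text rather than computed.

```python
TOTAL_UM = 500  # TTP + dψUTP (µM) in every mixed reaction listed in the text
STEP_UM = 50    # increment in dψUTP between successive reactions (µM)


def titration_series(total_um=TOTAL_UM, step_um=STEP_UM):
    """Return (TTP µM, dψUTP µM) pairs, from 450/50 down to 50/450."""
    return [(total_um - psi, psi) for psi in range(step_um, total_um, step_um)]


if __name__ == "__main__":
    for ttp, psi in titration_series():
        fraction = psi / (ttp + psi)
        print(f"TTP {ttp:3d} µM + dψUTP {psi:3d} µM (dψUTP fraction {fraction:.0%})")
```

The end points of the experiment (no dψUTP at all, and dψUTP with no TTP) are the control reactions named separately in the text, so they are not produced by this function.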

Selection of Thermostable Mutants Using Water-In-Oil Emulsions

Water-in-oil emulsions

The appropriate cell line was grown overnight in LB-Amp broth (5 mL) for 14.25 hr at 37 °C and
250 rpm in 14 mL 2059 Falcon Tubes (BD Biosciences). Approximately 2 x 10^8 colony-forming
units (cfu), roughly equal to 500 µL of a culture with an OD550nm of 4.0, were used to
inoculate a 100 mL culture of LB-Amp in a 500 mL baffled flask. These cultures were grown at
37 °C and 250 rpm for 3.75 hrs to an approximate OD550nm of 1.8, induced with
anhydrotetracycline, and allowed to grow for 1 hr longer to an approximate OD550nm of 3.0. The
volume of culture containing 2 x 10^8 cfu was determined; that volume was centrifuged (13,000
rpm, 2 min), the supernatant was removed, and the remaining pellet was stored on ice.
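The volume of culture that holds a target number of cells follows directly from the measured cell density. A minimal sketch of that arithmetic is given below; the function name is an illustrative assumption, and the density used in the example is hypothetical and would have to be measured for each culture.

```python
def culture_volume_ul(target_cfu, density_cfu_per_ml):
    """Volume of culture (µL) that contains target_cfu cells."""
    if density_cfu_per_ml <= 0:
        raise ValueError("cell density must be positive")
    return target_cfu / density_cfu_per_ml * 1000.0


if __name__ == "__main__":
    # Hypothetical density for an induced culture (cfu/mL); not a value from the text.
    example_density = 5.0e8
    volume = culture_volume_ul(2e8, example_density)
    print(f"Pellet {volume:.0f} µL of culture to obtain 2 x 10^8 cfu")
```

As a consistency check, 2 x 10^8 cfu in 500 µL, as quoted for the overnight culture, corresponds to a density of 4 x 10^8 cfu/mL.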

The aqueous phase of the emulsions was prepared by resuspending the cell pellet in a 200 µL
solution containing: 1X Modified ThermoPol Buffer, 500 µM dNTPs, 1.4 µM P-4, 1.4 µM P-5,
1.1 ng/µL RNaseA, and 6% DMSO. For control reactions without cells, 1 ng/µL of pSW2 and 10 U
Taq Polymerase were added to the aqueous phase. Reactions were stored on ice until further use.










To prepare the oil phase of the emulsions, Arlacel P135 (Uniqema, New Castle, Delaware) was
heated to 75 °C, as was mineral oil (Sigma-Aldrich, St. Louis, Missouri). The mineral oil was
mixed with the Arlacel P135 (1.5% v/v) in a 5 mL Corning Externally Threaded Cryogenic Vial
(Corning, Acton, Massachusetts) containing an 8 x 3 mm stir bar with pivot ring. The oil phase
was stirred at 1000 rpm on ice while the 200 µL aqueous phase was added drop-wise over a period
of 2 minutes. The emulsion was stirred for 5 min longer, then subjected to PCR [5 min, 94 °C;
(1 min, 94.0 °C; 1 min, 55.0 °C; 3 min, 72.0 °C) x 15 cycles; 7 min, 72.0 °C].

Products were extracted from the emulsions by adding two volumes of water-saturated ether. The
ether and emulsions were mixed by vortexing and centrifuged (5 min, 8000 rpm), and the aqueous
phases were extracted. To rid the aqueous phases of contaminating enzyme, the products were
purified using a QIAquick PCR Purification Kit (Qiagen) and eluted from the column in Qiagen
Buffer EB (50 µL). Products were separated using agarose gel electrophoresis; the product band
was extracted and then purified using a QIAquick Gel Extraction Kit (Qiagen). Samples were
eluted in Qiagen Buffer EB (50 µL), and product concentration was determined by measuring
absorption at 260 nm.

Re-cloning of selected mutants

The final products of the emulsions were used in an adaptation of the Miyazaki and Takenouchi
megaprimer PCR protocol (Miyazaki and Takenouchi, 2002). CSR products were digested with NcoI
and SacII according to the manufacturer's protocol (New England BioLabs). Digested samples
(10 ng in 1 µL) were added to a 49 µL PCR mixture (1X Native Pfu Buffer, 100 ng pSW2, 500 µM
dNTPs, 6% DMSO). The mixture was heated to 96 °C for 30 s prior to the addition of 0.05 U/µL
Native Pfu Polymerase (Stratagene, La Jolla, California). Samples were then subjected to PCR
[2 min, 96 °C; (30 s, 96.0 °C; 10 min, 68.0 °C) x 25 cycles; 30 min, 72.0 °C].

The template strands of DNA (pSW2 plasmid in the PCR) were digested with 2 U DpnI (New England
BioLabs) at 37 °C for 2.5 hrs. Reactions were cooled to room temperature, purified using a
Qiagen PCR Purification Kit, and eluted with Qiagen Buffer EB (30 µL). Purified products were
transformed into the E. coli DH5α cell line according to the manufacturer's protocol
(Invitrogen, Carlsbad, California). Fifty isolated colonies were selected after the
transformation (cell lines SW79 through SW128). Overnight 5 mL LB-Amp cultures (250 rpm, 37 °C)
were grown for each colony, and their plasmids were isolated using the QIAprep Spin Miniprep
Kit (Qiagen). Plasmid constructs were verified by restriction digest analysis, using the
enzymes BamHI and NcoI according to the manufacturer's protocol (New England BioLabs), and
mutations were determined by sequencing.

Results

Growth Curves and Cell Counts

Growth curves, cfu counts, and protein expression studies were performed on the SW1 - SW4 cell
lines to determine the optimal times for induction (Fig. 3-6[A-C]). The optimal time for
induction of both the SW3 and SW4 cell lines was found to be during late log phase, at an
optical density of approximately 1.8 at 550 nm. The optimal length of induction was 1 hr, due
to the rapid death of the cells after the induction of the taq gene, as evidenced by a drop in
the cfu/mL counts (Fig. 3-6B). Inductions longer than 1 hr, or induction at early to mid-log
phases, caused the cells to perish, owing to the toxicity of over-expressing a polymerase in
vivo (data not shown) (Moreno et al., 2005, Andraos et al., 2004). When the migration of the
recombinant Taq polymerases (His(6)-wt Taq and co-Taq) is compared to that of the Taq
Polymerase purchased from New England BioLabs, they all appear to have the same observed
molecular weight of 94 kDa on a Coomassie Blue stained SDS-PAGE (7.5%) gel (Fig. 3-6C).

Purification of His(6)-wt Taq Polymerase

The His(6)-wt Taq polymerase was purified from SW3 cells that were over-expressing the
His(6)-wt taq gene using nickel affinity chromatography. The polymerase was purified to a
single band on a Coomassie Blue stained SDS-PAGE (7.5%) gel (Fig. 3-7A), and elution fractions
2 - 4 were combined and concentrated via dialysis to generate a working stock of His(6)-wt Taq
polymerase. The protein concentration was determined to be 0.744 µg/µL, using the Bio-Rad
Protein Assay Dye Reagent. To verify that the purified His(6)-wt Taq polymerase amplifies DNA
in a PCR reaction as well as Taq polymerase (New England BioLabs) does, each of these
polymerases was used in separate, identical PCRs. The final concentration of polymerase
(5.5 µg/mL) in each reaction was kept constant. Figure 3-7B shows the products of these PCRs,
and after analysis it was determined that the densities of these two bands were almost
identical.
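The stock and final concentrations quoted above determine how much enzyme stock goes into each reaction. The sketch below works through that dilution, assuming the 50 µL reaction volume given in the methods; the function and parameter names are illustrative and are not part of the original protocol.

```python
def stock_volume_ul(final_ug_per_ml, reaction_ul, stock_ug_per_ul):
    """Volume of enzyme stock (µL) that gives final_ug_per_ml in reaction_ul."""
    protein_ug = final_ug_per_ml * reaction_ul / 1000.0  # µg of protein needed
    return protein_ug / stock_ug_per_ul


if __name__ == "__main__":
    volume = stock_volume_ul(
        final_ug_per_ml=5.5, reaction_ul=50.0, stock_ug_per_ul=0.744
    )
    print(f"{volume:.2f} µL of His(6)-wt Taq stock per 50 µL reaction")
```

Volumes this small are hard to pipette accurately, so in practice an intermediate dilution of the stock would likely be prepared first; that step is an assumption and is not described in the text.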

Incorporation of dψUTP by RD Library

In efforts to find a polymerase that can incorporate and extend beyond dψUs with higher
efficiency than the co-Taq polymerase, each of the mutant Taq polymerases in the RD Library was
tested for its ability to incorporate dψUTP across from template dA in PCR reactions containing
varying ratios of TTP to dψUTP. Reactions contained induced cells as the sole source of
polymerase and template plasmid, so active polymerases were forced to replicate their own
encoding gene (2603 bp).

Figure 3-8[A-B] shows the difference between the PCR products from the co-Taq polymerase screen
(Fig. 3-8A) and a representative (SW21) of the RD Library (Fig. 3-8B). In both of these
reactions, the polymerase could not produce full-length product (FLP) with concentrations of
dψUTP higher than 400 µM (final concentration). Based on the product band densities, it was
found that none of the active RD Library polymerases displayed a higher propensity for the
incorporation of dψUTP than the co-Taq polymerase (Table 3-4). It was also noted that only 18
of the 74 mutant polymerases tested showed activity with only dNTPs under these assay
conditions (Table 3-2 and Table 3-4).

Selection and Identification of Thermostable Mutants Using Water-In-Oil Emulsions

We pooled all 74 RD Library strains to perform a selection in water-in-oil emulsions to isolate
those 18 mutants that showed activity. After the products were isolated, they were used in a
modified version of the Miyazaki and Takenouchi megaprimer PCR protocol (Miyazaki and
Takenouchi, 2002), creating the full-length plasmid (pASK-IBA43plus with insert). Purified
products were transformed into the E. coli DH5α cell line; fifty clones were isolated,
sequenced, and compared to the co-Taq amino acid sequence (Table 3-5). Of these fifty clones,
22 showed no changes relative to the co-Taq sequence, and the remaining 28 had at least one
residue modified. Table 3-6 shows a breakdown of these mutations and states whether they are
random mutations or RD Library mutations. For the RD Library mutations, it is indicated
whether they are true RD Library sequences, RD Library sequences with additional mutations, RD
Library sequences with reversions to the co-Taq sequence, and/or crossovers between two or more
RD Library sequences. In addition, only 5% of the mutations found in these sequences are silent
(Table 3-6).

As a control, the selection was also performed using only cells expressing the co-Taq
polymerase. Five clones were submitted for sequencing following the megaprimer PCR protocol. Of
these five, four were the correct co-Taq polymerase sequence found in SW4, and the fifth
contained only two amino acid mutations in relation to the co-Taq sequence (data not shown).

Discussion

Previously, directed evolution experiments have defined mutations that allow Taq polymerase,
and other Family A polymerases, to be used in different situations; for example, a few allow
for the incorporation of non-standard bases, others are more thermostable, and some are
resistant to inhibitors (Ghadessy et al., 2001, Ghadessy et al., 2004, Henry and Romesberg,
2005). The design of our RD Library was based on mutations discussed in the review by Henry and
Romesberg (Henry and Romesberg, 2005), and was carried out by using the REAP approach with the
Family A polymerases. A library of 74 polymerases was designed, each containing three to four
amino acid mutations out of a pool of thirty-five possible mutations, in an attempt to identify
a polymerase with the ability to incorporate non-standard bases, exhibiting a C-glycosidic
linkage, with efficiency and fidelity.

It has been demonstrated previously that the over-expression of a polymerase in a cell can be
toxic and cause premature cell death (Moreno et al., 2005, Andraos et al., 2004). To circumvent
this problem, the gene encoding His(6)-wt Taq polymerase was optimized for codon usage in E.
coli and cloned into a tightly regulated plasmid (Skerra, 1994) in an attempt to express the
polymerase at higher levels only after induction. After appropriate expression conditions were
found, the members of the RD Library were individually tested for their ability to incorporate
dψUTP, a representative non-standard nucleotide exhibiting a C-glycosidic linkage. The
polymerases were challenged with increasing concentrations of dψUTP as the concentration of TTP
was decreased. None of the RD Library polymerases was able to incorporate dψUTP more
efficiently than the codon-optimized Taq polymerase. In the future, other possible mutation
sites and combinations of mutations may need to be made and tested to find a polymerase that
can accomplish this task. Interestingly, only eighteen of the 74 mutant polymerases tested
showed activity with standard dNTPs under these assay conditions.

Ideally, a selection would have been performed using the RD Library to identify polymerases
able to incorporate dψUTP with efficiency. Since none were able to incorporate the non-standard
base (NSB) more efficiently than the co-Taq polymerase, as evidenced by the densities of the
FLP bands, a selection was performed to identify those polymerases that showed activity with
the dNTPs under these assay conditions. A water-in-oil emulsion system, similar to that
described by Ghadessy et al. (Ghadessy et al., 2001), was used as a means to link genotype to
phenotype, forcing active polymerases to replicate their own genes in a PCR reaction. All 74
cell lines containing the RD Library were used in equal proportions to perform such a
selection. After products were extracted from the emulsion system, they were recloned into a
plasmid using a version of the megaprimer PCR (Miyazaki and Takenouchi, 2002).

The megaprimer PCR method was chosen as the method for recombining the polymerase genes with
the plasmid based on its "one pot" approach. After the final products are extracted from the
emulsions, all further recloning can take place in one reaction vessel and requires only one
purification step prior to transformation into a cell line. Other methods, using digestions and
ligations, require several purification steps between the various procedures, resulting in low
yields of final product.

After sequencing, it was noted that 22 out of the 50 clones sequenced contained the original
co-Taq polymerase sequence; 15 carried partial forms of the original RD Library sequences, and
only four were true RD Library sequences. The remaining nine sequences carried random mutations
most likely created during the PCR in the emulsions. This could be because Taq polymerase has
an error rate of approximately 8 x 10^-6 (mutational frequency/bp/duplication) (Cline et al.,
1996). It is also noteworthy that two of the 50 sequences (SW119 and SW122) contained
frameshift mutations, which Taq polymerase tends to introduce at a frequency of approximately
2.4 x 10^-5 per base pair (Tindall and Kunkel, 1988).
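To put the quoted error rate in context, the expected number of polymerase errors per gene copy can be estimated from the gene length (2603 bp) and the number of duplications. The sketch below is a rough upper-bound estimate under stated assumptions: it treats each of the 15 emulsion PCR cycles as one duplication, uses a Poisson model for the fraction of copies with at least one error, and ignores any errors introduced later by Pfu polymerase in the megaprimer PCR. None of these modelling choices comes from the original text.

```python
import math

GENE_LENGTH_BP = 2603   # length of the amplified gene, from the text
ERROR_RATE = 8e-6       # errors per bp per duplication (Cline et al., 1996)
MAX_DUPLICATIONS = 15   # assumption: one duplication per emulsion PCR cycle


def expected_mutations(length_bp, error_rate, duplications):
    """Mean number of errors accumulated per gene copy."""
    return length_bp * error_rate * duplications


def fraction_with_mutation(mean_mutations):
    """Poisson estimate of the fraction of copies with at least one error."""
    return 1.0 - math.exp(-mean_mutations)


if __name__ == "__main__":
    mean = expected_mutations(GENE_LENGTH_BP, ERROR_RATE, MAX_DUPLICATIONS)
    print(f"Expected errors per copy: {mean:.2f}")
    print(f"Copies with >= 1 error:  {fraction_with_mutation(mean):.0%}")
```

Because a real PCR rarely achieves a full duplication in every cycle, the true figure is likely lower than this estimate.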

Since the plasmid carrying the co-taq gene was only introduced during the megaprimer PCR, and
the plasmid used as template was digested with DpnI, recombinations and reversions of the
various sequences most likely occurred during the course of the megaprimer PCR reaction. This
would explain the high number of co-Taq polymerase clones identified after sequencing, as well
as the large number of sequences that contain various additions, reversions, and crossovers
relative to the original RD Library mutations.

Out of the four exact RD Library sequences that were recovered, only one coded for a mutant
that was previously shown to have activity in the assay using dψUTP. This could indicate that
the emulsions are breaking, allowing active polymerases to replicate the genes of inactive
polymerases. Further tests could be performed to confirm or refute this conclusion; an example
would be using two different cell lines in an emulsion, one expressing active polymerase and
one expressing inactive polymerase. Identification of the final product would allow us to
determine whether these emulsions are indeed rupturing. If this is the case, modifications
could be made to the oil phase of the emulsions, such as increasing the percentage of Arlacel
P135, to prevent this from occurring.

We have determined that the megaprimer PCR method would be an efficient way of introducing
diversity into a library between rounds of selection, but it is not an effective means for
recloning when trying to identify specific products. Once the stability of the emulsion system
is verified, and the recloning of the CSR products is performed using the standard
digestion/ligation/transformation protocol (Sambrook et al., 1989), it is likely that we will
be able to identify thermostable polymerases using this technique. The next step would be to
use this method with a random library, instead of a rationally designed library, to identify
thermostable polymerases and/or polymerases that can incorporate C-glycosides with efficiency
and fidelity. After several rounds of evolution, we may be able to identify a polymerase
capable of functioning with an AEGIS.



























[Figure 3-1 image: phylogenetic tree with two labeled groups, "Non-viral Polymerases" and
"Viral Polymerases".]

Figure 3-1. A phylogenetic tree of the Family A polymerases. This tree was generated using
Pfam (Bateman, 2006, Finn et al., 2006), and analyzed for sites that underwent Type
II functional divergence. Appendix B has parts of this tree expanded so that it is
readable.










































Figure 3-2. Locations of the 35 rationally designed (RD) sites in the Taq polymerase structure.
These held the mutations in the RD Library. There were 57 mutations made at these sites: sites
in red are those where the natural amino acid was replaced by one different amino acid, sites
in blue are those replaced by two different amino acids, and sites in green are those where
three residues were substituted for the original amino acid. Image created by Dr. Eric Gaucher
using the PyMOL Molecular Graphic System (DeLano, 2002).















Oligo   Sequence (5'-3' Direction)                        Purification
P-2     GAT GAC CGC GGT ATG CTG CCC CTC                   Desalted
P-3     CAT TAC AGA CCA TGG TCA CTC CTT GGC GGA G         Desalted
P-4     CAA ATG GCT AGC AGA GGA TCG CAT CAC CAT CAC       Desalted
P-5     CAG GTC AAG CTT ATT ATT TTT CGA ACT GCG GGT GGC   Desalted
P-6     GAG TTA TTT TAC CAC TCC CT                        Desalted
P-7     CGC AGT AGC GGT AAA CG                            Desalted
P-8     GAA AAC CGC GCG TAA ACT GC                        Desalted
P-9     CCT GGA ACA CGC GAA TCA GG                        Desalted

*All oligonucleotides were synthesized by Integrated DNA Technologies (Coralville, Iowa).


Table 3-1. Oligonucleotides used in this study.













[Figure 3-3 image: plasmid map of pASK-IBA43plus (3286 bps) showing the promoter, His tag,
Strep-tag, f1 origin, Tet-Repressor, and AmpR, with restriction sites marked.]

Figure 3-3. View of the pASK-IBA43plus plasmid. This plasmid was purchased from IBA
GmbH (St. Louis, Missouri) and it can generate an N-terminal hexahistidine tag and a
C-terminal Strep-tag. This high copy number plasmid is a tightly controlled
tetracycline expression system conferring ampicillin resistance.














[Figure 3-4 image: plasmid map of pSW1 (5723 bps) showing the promoter, wt taq insert, f1 ori,
Tet-Repressor, and AmpR, with restriction sites marked.]

Figure 3-4. View of the pSW1 plasmid. This is a ligation of the pASK-IBA43plus plasmid with
the His(6)-wt taq polymerase gene using the SacII and NcoI restriction sites. This
plasmid generates an N-terminal hexahistidine tag translated with the His(6)-wt taq gene.
This high copy number plasmid is a tightly controlled tetracycline expression system
conferring ampicillin resistance.





















[Figure 3-5 image: plasmid map of pSW2 (5723 bps) showing the promoter, co taq insert, f1
origin, Tet-Repressor, and AmpR, with restriction sites marked.]

Figure 3-5. View of the pSW2 plasmid. This is a ligation of the pASK-IBA43plus plasmid with
the codon-optimized taq polymerase gene using the SacII and NcoI restriction sites.
This plasmid generates an N-terminal hexahistidine tag translated with the co-taq gene.
This high copy number plasmid is a tightly controlled tetracycline expression system
conferring ampicillin resistance.
















Plasmid   DNA 2.0     Mutations Present
Name      Gene ID #   in RD Taq Library
pSW14     5351        S573E,D575F,F595V
pSW16     5353        S7E,D6 L,E JHH
pSW17     5356        404 ,T 11 J4E
pSW18     5357        94 ,F 64H,H60
pSW20     5359        S1 I, 60E,E 1 I
pSW21     5360        A54 ,E 1 I,H60
pSW22     5361        S1 I, 59,I 11
pSW24     5364        T 41 65,L6S
pSW31     5371        R5 I,R5 4 ,F 4L
pSW33     5375        S7H,F 64 ,R 43
pSW35     5377        T541A,F664L,R743A
pSW36     5378        T511V,R533I,D622A
pSW37     5379        A597S,I611E,Y668F
pSW38     5381        Y542E,F595W,L606C
pSW68     5421        Q579A,R657D,F664Y,A740R
pSW69     5422        R533I,K537I,A605K,L613I
pSW71     5425        D575T,F664H,E742V,R743A
pSW72     5426        A594C,I611E,F664L,A740S
pSW73     5427        N580S,L613A,A740S,R743A
pSW75     5429        V583K,E612I,L613D,Y668F
pSW76     5430        S573E,R584V,A594C,D622S
*The pink cells denote the sequences of polymerases showing activity. The blue cells signify the
sequences of polymerases that lack evidence of activity under these assay conditions. All are derivatives
of the co-taq gene and inserted into the pASK-IBA43plus vector. Mutations were designed by Dr. Eric
Gaucher (Foundation for Applied Molecular Evolution) and were synthesized and assembled by DNA
2.0.


41A,L606P,L61
42E,V583K,A60
,F595W,A605E,
.D575F.T,613A.


Table 3-2. Rationally Designed (RD) Mutant Library.




















Name    Strain         Genotype
SW1     E. coli TG-1   F' traD36 lacIq Δ(lacZ)M15 proA+B+ / supE Δ(hsdM-mcrB)5 (rk- mk- McrB-) thi Δ(lac-proAB)
SW2     E. coli TG-1   SW1/pASK-IBA43+
SW3     E. coli TG-1   SW1/pSW1 (pASK-IBA43+ with wt taq insert, Apr)
SW4     E. coli TG-1   SW1/pSW2 (pASK-IBA43+ with co taq insert, Apr)
SW5     E. coli TG-1   SW1/pSW3 (pASK-IBA43+ with co taq mutant 5339, Apr)
SW6     E. coli TG-1   SW1/pSW4 (pASK-IBA43+ with co taq mutant 5340, Apr)
SW7     E. coli TG-1   SW1/pSW5 (pASK-IBA43+ with co taq mutant 5342, Apr)
SW8     E. coli TG-1   SW1/pSW6 (pASK-IBA43+ with co taq mutant 5343, Apr)
SW9     E. coli TG-1   SW1/pSW7 (pASK-IBA43+ with co taq mutant 5344, Apr)
SW10    E. coli TG-1   SW1/pSW8 (pASK-IBA43+ with co taq mutant 5345, Apr)
SW11    E. coli TG-1   SW1/pSW9 (pASK-IBA43+ with co taq mutant 5346, Apr)
SW12    E. coli TG-1   SW1/pSW10 (pASK-IBA43+ with co taq mutant 5347, Apr)
SW13    E. coli TG-1   SW1/pSW11 (pASK-IBA43+ with co taq mutant 5348, Apr)
SW14    E. coli TG-1   SW1/pSW12 (pASK-IBA43+ with co taq mutant 5349, Apr)
SW15    E. coli TG-1   SW1/pSW13 (pASK-IBA43+ with co taq mutant 5350, Apr)
SW16    E. coli TG-1   SW1/pSW14 (pASK-IBA43+ with co taq mutant 5351, Apr)
SW17    E. coli TG-1   SW1/pSW15 (pASK-IBA43+ with co taq mutant 5352, Apr)
SW18    E. coli TG-1   SW1/pSW16 (pASK-IBA43+ with co taq mutant 5353, Apr)
SW19    E. coli TG-1   SW1/pSW17 (pASK-IBA43+ with co taq mutant 5356, Apr)
SW20    E. coli TG-1   SW1/pSW18 (pASK-IBA43+ with co taq mutant 5357, Apr)
SW21    E. coli TG-1   SW1/pSW19 (pASK-IBA43+ with co taq mutant 5358, Apr)
SW22    E. coli TG-1   SW1/pSW20 (pASK-IBA43+ with co taq mutant 5359, Apr)
SW23    E. coli TG-1   SW1/pSW21 (pASK-IBA43+ with co taq mutant 5360, Apr)
SW24    E. coli TG-1   SW1/pSW22 (pASK-IBA43+ with co taq mutant 5361, Apr)
SW25    E. coli TG-1   SW1/pSW23 (pASK-IBA43+ with co taq mutant 5363, Apr)
SW26    E. coli TG-1   SW1/pSW24 (pASK-IBA43+ with co taq mutant 5364, Apr)
SW27    E. coli TG-1   SW1/pSW25 (pASK-IBA43+ with co taq mutant 5365, Apr)
SW28    E. coli TG-1   SW1/pSW26 (pASK-IBA43+ with co taq mutant 5366, Apr)
SW29    E. coli TG-1   SW1/pSW27 (pASK-IBA43+ with co taq mutant 5367, Apr)
SW30    E. coli TG-1   SW1/pSW28 (pASK-IBA43+ with co taq mutant 5368, Apr)
SW31    E. coli TG-1   SW1/pSW29 (pASK-IBA43+ with co taq mutant 5369, Apr)
SW32    E. coli TG-1   SW1/pSW30 (pASK-IBA43+ with co taq mutant 5370, Apr)
SW33    E. coli TG-1   SW1/pSW31 (pASK-IBA43+ with co taq mutant 5371, Apr)
SW34    E. coli TG-1   SW1/pSW32 (pASK-IBA43+ with co taq mutant 5372, Apr)
SW35    E. coli TG-1   SW1/pSW33 (pASK-IBA43+ with co taq mutant 5375, Apr)
SW36    E. coli TG-1   SW1/pSW34 (pASK-IBA43+ with co taq mutant 5376, Apr)
SW37    E. coli TG-1   SW1/pSW35 (pASK-IBA43+ with co taq mutant 5377, Apr)
SW38    E. coli TG-1   SW1/pSW36 (pASK-IBA43+ with co taq mutant 5378, Apr)
SW39    E. coli TG-1   SW1/pSW37 (pASK-IBA43+ with co taq mutant 5379, Apr)
SW40    E. coli TG-1   SW1/pSW38 (pASK-IBA43+ with co taq mutant 5381, Apr)
SW41    E. coli TG-1   SW1/pSW39 (pASK-IBA43+ with co taq mutant 5382, Apr)
SW42    E. coli TG-1   SW1/pSW40 (pASK-IBA43+ with co taq mutant 5383, Apr)
SW43    E. coli TG-1   SW1/pSW41 (pASK-IBA43+ with co taq mutant 5384, Apr)
SW44    E. coli TG-1   SW1/pSW42 (pASK-IBA43+ with co taq mutant 5385, Apr)
SW45    E. coli TG-1   SW1/pSW43 (pASK-IBA43+ with co taq mutant 5387, Apr)
SW46    E. coli TG-1   SW1/pSW44 (pASK-IBA43+ with co taq mutant 5388, Apr)
SW47    E. coli TG-1   SW1/pSW45 (pASK-IBA43+ with co taq mutant 5389, Apr)
SW48    E. coli TG-1   SW1/pSW46 (pASK-IBA43+ with co taq mutant 5390, Apr)
SW49    E. coli TG-1   SW1/pSW47 (pASK-IBA43+ with co taq mutant 5391, Apr)
SW50    E. coli TG-1   SW1/pSW48 (pASK-IBA43+ with co taq mutant 5393, Apr)
SW51    E. coli TG-1   SW1/pSW49 (pASK-IBA43+ with co taq mutant 5395, Apr)
SW52    E. coli TG-1   SW1/pSW50 (pASK-IBA43+ with co taq mutant 5396, Apr)
SW53    E. coli TG-1   SW1/pSW51 (pASK-IBA43+ with co taq mutant 5397, Apr)
SW54    E. coli TG-1   SW1/pSW52 (pASK-IBA43+ with co taq mutant 5398, Apr)
SW55    E. coli TG-1   SW1/pSW53 (pASK-IBA43+ with co taq mutant 5399, Apr)
SW56    E. coli TG-1   SW1/pSW54 (pASK-IBA43+ with co taq mutant 5400, Apr)
SW57    E. coli TG-1   SW1/pSW55 (pASK-IBA43+ with co taq mutant 5401, Apr)
SW58    E. coli TG-1   SW1/pSW56 (pASK-IBA43+ with co taq mutant 5402, Apr)
SW59    E. coli TG-1   SW1/pSW57 (pASK-IBA43+ with co taq mutant 5405, Apr)
SW60    E. coli TG-1   SW1/pSW58 (pASK-IBA43+ with co taq mutant 5408, Apr)
SW61    E. coli TG-1   SW1/pSW59 (pASK-IBA43+ with co taq mutant 5409, Apr)
SW62    E. coli TG-1   SW1/pSW60 (pASK-IBA43+ with co taq mutant 5410, Apr)
SW63    E. coli TG-1   SW1/pSW61 (pASK-IBA43+ with co taq mutant 5411, Apr)
SW64    E. coli TG-1   SW1/pSW62 (pASK-IBA43+ with co taq mutant 5413, Apr)
SW65    E. coli TG-1   SW1/pSW63 (pASK-IBA43+ with co taq mutant 5414, Apr)
SW66    E. coli TG-1   SW1/pSW64 (pASK-IBA43+ with co taq mutant 5417, Apr)
SW67    E. coli TG-1   SW1/pSW65 (pASK-IBA43+ with co taq mutant 5418, Apr)
SW68    E. coli TG-1   SW1/pSW66 (pASK-IBA43+ with co taq mutant 5419, Apr)
SW69    E. coli TG-1   SW1/pSW67 (pASK-IBA43+ with co taq mutant 5420, Apr)
SW70    E. coli TG-1   SW1/pSW68 (pASK-IBA43+ with co taq mutant 5421, Apr)
SW71    E. coli TG-1   SW1/pSW69 (pASK-IBA43+ with co taq mutant 5422, Apr)
SW72    E. coli TG-1   SW1/pSW70 (pASK-IBA43+ with co taq mutant 5423, Apr)
SW73    E. coli TG-1   SW1/pSW71 (pASK-IBA43+ with co taq mutant 5425, Apr)
SW74    E. coli TG-1   SW1/pSW72 (pASK-IBA43+ with co taq mutant 5426, Apr)
SW75    E. coli TG-1   SW1/pSW73 (pASK-IBA43+ with co taq mutant 5427, Apr)
SW76    E. coli TG-1   SW1/pSW74 (pASK-IBA43+ with co taq mutant 5428, Apr)
SW77    E. coli TG-1   SW1/pSW75 (pASK-IBA43+ with co taq mutant 5429, Apr)
SW78    E. coli TG-1   SW1/pSW76 (pASK-IBA43+ with co taq mutant 5430, Apr)
        E. coli DH5α   F- φ80dlacZΔM15 Δ(lacZYA-argF)U169 recA1 endA1 hsdR17(rk-, mk+) gal- phoA supE44 λ- thi-1 gyrA96 relA1
SW79    E. coli DH5α   SW1/pCSRMut1 (pASK-IBA43+ with co taq CSR mut 1, Apr)
SW80    E. coli DH5α   SW1/pCSRMut2 (pASK-IBA43+ with co taq CSR mut 2, Apr)
SW81    E. coli DH5α   SW1/pCSRMut3 (pASK-IBA43+ with co taq CSR mut 3, Apr)
SW82    E. coli DH5α   SW1/pCSRMut4 (pASK-IBA43+ with co taq CSR mut 4, Apr)
SW83    E. coli DH5α   SW1/pCSRMut5 (pASK-IBA43+ with co taq CSR mut 5, Apr)
SW84    E. coli DH5α   SW1/pCSRMut6 (pASK-IBA43+ with co taq CSR mut 6, Apr)
SW85    E. coli DH5α   SW1/pCSRMut7 (pASK-IBA43+ with co taq CSR mut 7, Apr)
SW86    E. coli DH5α   SW1/pCSRMut8 (pASK-IBA43+ with co taq CSR mut 8, Apr)
SW87    E. coli DH5α   SW1/pCSRMut9 (pASK-IBA43+ with co taq CSR mut 9, Apr)
SW88    E. coli DH5α   SW1/pCSRMut10 (pASK-IBA43+ with co taq CSR mut 10, Apr)
SW89    E. coli DH5α   SW1/pCSRMut11 (pASK-IBA43+ with co taq CSR mut 11, Apr)
SW90    E. coli DH5α   SW1/pCSRMut12 (pASK-IBA43+ with co taq CSR mut 12, Apr)
SW91    E. coli DH5α   SW1/pCSRMut13 (pASK-IBA43+ with co taq CSR mut 13, Apr)
SW92    E. coli DH5α   SW1/pCSRMut14 (pASK-IBA43+ with co taq CSR mut 14, Apr)
SW93    E. coli DH5α   SW1/pCSRMut15 (pASK-IBA43+ with co taq CSR mut 15, Apr)
SW94    E. coli DH5α   SW1/pCSRMut16 (pASK-IBA43+ with co taq CSR mut 16, Apr)
SW95    E. coli DH5α   SW1/pCSRMut17 (pASK-IBA43+ with co taq CSR mut 17, Apr)
SW96    E. coli DH5α   SW1/pCSRMut18 (pASK-IBA43+ with co taq CSR mut 18, Apr)
SW97    E. coli DH5α   SW1/pCSRMut19 (pASK-IBA43+ with co taq CSR mut 19, Apr)
SW98    E. coli DH5α   SW1/pCSRMut20 (pASK-IBA43+ with co taq CSR mut 20, Apr)
SW99    E. coli DH5α   SW1/pCSRMut21 (pASK-IBA43+ with co taq CSR mut 21, Apr)
SW100   E. coli DH5α   SW1/pCSRMut22 (pASK-IBA43+ with co taq CSR mut 22, Apr)
SW101   E. coli DH5α   SW1/pCSRMut23 (pASK-IBA43+ with co taq CSR mut 23, Apr)
SW102   E. coli DH5α   SW1/pCSRMut24 (pASK-IBA43+ with co taq CSR mut 24, Apr)
SW103   E. coli DH5α   SW1/pCSRMut25 (pASK-IBA43+ with co taq CSR mut 25, Apr)
SW104   E. coli DH5α   SW1/pCSRMut26 (pASK-IBA43+ with co taq CSR mut 26, Apr)
SW105   E. coli DH5α   SW1/pCSRMut27 (pASK-IBA43+ with co taq CSR mut 27, Apr)
SW106   E. coli DH5α   SW1/pCSRMut28 (pASK-IBA43+ with co taq CSR mut 28, Apr)
SW107   E. coli DH5α   SW1/pCSRMut29 (pASK-IBA43+ with co taq CSR mut 29, Apr)
SW108   E. coli DH5α   SW1/pCSRMut30 (pASK-IBA43+ with co taq CSR mut 30, Apr)
SW109   E. coli DH5α   SW1/pCSRMut31 (pASK-IBA43+ with co taq CSR mut 31, Apr)
SW110   E. coli DH5α   SW1/pCSRMut32 (pASK-IBA43+ with co taq CSR mut 32, Apr)
SW111   E. coli DH5α   SW1/pCSRMut33 (pASK-IBA43+ with co taq CSR mut 33, Apr)
SW112   E. coli DH5α   SW1/pCSRMut34 (pASK-IBA43+ with co taq CSR mut 34, Apr)
SW113   E. coli DH5α   SW1/pCSRMut35 (pASK-IBA43+ with co taq CSR mut 35, Apr)
SW114   E. coli DH5α   SW1/pCSRMut36 (pASK-IBA43+ with co taq CSR mut 36, Apr)
SW115   E. coli DH5α   SW1/pCSRMut37 (pASK-IBA43+ with co taq CSR mut 37, Apr)
SW116   E. coli DH5α   SW1/pCSRMut38 (pASK-IBA43+ with co taq CSR mut 38, Apr)
SW117   E. coli DH5α   SW1/pCSRMut39 (pASK-IBA43+ with co taq CSR mut 39, Apr)
SW118   E. coli DH5α   SW1/pCSRMut40 (pASK-IBA43+ with co taq CSR mut 40, Apr)
SW119   E. coli DH5α   SW1/pCSRMut41 (pASK-IBA43+ with co taq CSR mut 41, Apr)
SW120   E. coli DH5α   SW1/pCSRMut42 (pASK-IBA43+ with co taq CSR mut 42, Apr)
SW121   E. coli DH5α   SW1/pCSRMut43 (pASK-IBA43+ with co taq CSR mut 43, Apr)
SW122   E. coli DH5α   SW1/pCSRMut44 (pASK-IBA43+ with co taq CSR mut 44, Apr)
SW123   E. coli DH5α   SW1/pCSRMut45 (pASK-IBA43+ with co taq CSR mut 45, Apr)
SW124   E. coli DH5α   SW1/pCSRMut46 (pASK-IBA43+ with co taq CSR mut 46, Apr)
SW125   E. coli DH5α   SW1/pCSRMut47 (pASK-IBA43+ with co taq CSR mut 47, Apr)
SW126   E. coli DH5α   SW1/pCSRMut48 (pASK-IBA43+ with co taq CSR mut 48, Apr)
SW127   E. coli DH5α   SW1/pCSRMut49 (pASK-IBA43+ with co taq CSR mut 49, Apr)
SW128   E. coli DH5α   SW1/pCSRMut50 (pASK-IBA43+ with co taq CSR mut 50, Apr)
        E. coli DH5α   SW1/pCSRwt1 (pASK-IBA43+ with co taq wt mut 1, Apr)
        E. coli DH5α   SW1/pCSRwt2 (pASK-IBA43+ with co taq wt mut 2, Apr)
        E. coli DH5α   SW1/pCSRwt3 (pASK-IBA43+ with co taq wt mut 3, Apr)
        E. coli DH5α   SW1/pCSRwt4 (pASK-IBA43+ with co taq wt mut 4, Apr)
        E. coli DH5α   SW1/pCSRwt5 (pASK-IBA43+ with co taq wt mut 5, Apr)

Table 3-3. Bacterial strains used in this study.

























Time (hr)
-*SW1 SW2 -SW3-U -tnSW3- -eSW4-U -*-SW4-1


Ti me (hr)
SW1 SW2 SW3-U SW3-I SW4-U SW4-I
0 1.36E+07 2.01E+07 6.10E+06 7.95E+06 6.00E+06 7.22E+06
1 5 .40E+07 1.31E+07 1.22E+07 8.73E+06 9.53E+06 1.04E+07
2 1.80E+08 3.04E+07 8.14E+07 5.21E+07 2.65E+07 3.09E+07
3 6.48E+08 8.63E+07 2.87E+08 2.79E+08 3.88E+08 1.90E+08
3.75 1.47E+09 3.22E+08 6.83E+08 4.93E+08 4.09E+08 4.36E+08
4.75 2.73E+09 7.95E+08 1.08E+09 2.36E+08 8.26E+08 5.66E+08
5.75 2.94E+09 1.01E+09 1.08E+09 5.07E+07 1.10E+09 6.29E+08
6.75 6.06E+09 1.39E+09 1.88E+09 1.22E+07 1.53E+09 5.01E+08
7.75 3.08E+09 1.11E+09 1.77E+09 7.39E+06 2.23E+09 7.63E+08
8.75 3.14E+09 1.14E+09 1.55E+09 3.90E+06 2.15E+09 5.69E+08


MWU U 13 14 U E



Rorrair psW RentpS


B)


Colony Counts (cfulmL)


Figure 3-6. Growth curves, cell counts, and expression of various E. coli TG-1 cell lines. The
SW3 (denoted SW3-I) and SW4 (denoted SW4-I) cell lines were induced after 3.75
hrs with a final concentration of 0.2 ng/µL anhydrotetracycline. A) Growth curves
of various cell lines. Samples were grown in LB media at 250 rpm and 37 °C for
8.75 hrs; cultures SW2 through SW4 were supplemented with ampicillin (100 µg/mL
final concentration). B) Colony counts (cfu/mL) of each of the cell lines in part A at the
various time points. Cells were grown on LB or LB-Amp agar overnight at 37 °C.
C) Coomassie Blue stained SDS-PAGE (7.5%) gel showing protein expression of
induced cells at various time points. U stands for uninduced cells, I-1 through I-4
indicate time-points at hours one through four after induction (t = 4.75 through t =
7.75 hrs), and NEB Taq depicts the migration of the 94 kDa Taq polymerase
purchased from New England BioLabs. Because the co-taq gene has been
codon-optimized for use in E. coli cells, the SW4 strain, which contains it, appears to
grow to a higher OD550nm than the SW3 strain containing the His(6)-wt taq gene.
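
The colony counts in panel B can be turned into an approximate doubling time. The following is a minimal sketch, not the author's analysis: it assumes purely exponential growth between two sampled time points, and the choice of the 1 hr to 3 hr window for SW1 is an illustrative assumption. The function name `doubling_time` is hypothetical.

```python
import math

# Colony counts (cfu/mL) for the SW1 cell line, copied from panel B of Figure 3-6.
# The 1 hr and 3 hr time points are an assumed window of exponential growth.
sw1_counts = {1.0: 5.40e7, 3.0: 6.48e8}


def doubling_time(t1, n1, t2, n2):
    """Doubling time (hr), assuming exponential growth between (t1, n1) and (t2, n2)."""
    return (t2 - t1) * math.log(2) / math.log(n2 / n1)


td = doubling_time(1.0, sw1_counts[1.0], 3.0, sw1_counts[3.0])
print(f"SW1 doubling time between 1 and 3 hr: {td * 60:.1f} min")
```

Under these assumptions the estimate comes out at roughly half an hour; a different window, or a fit across several time points, would give a somewhat different value.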











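[Figure 3-7 gel images. A) SDS-PAGE gel; molecular weight markers: 250, 150, 100, 75, 50, 37 kDa; Taq polymerase band at 94 kDa; lanes: MW, U, I-5, L, W-1, W-2, E-1, E-2, E-3, E-4. B) Agarose gel; size markers: 4000, 3000, 2000, 1550, 1400 bp; one lane per polymerase.]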


Figure 3-7. Purification and activity of His(6)-wt Taq polymerase. A) The purification of
His(6)-wt Taq polymerase from SW3 cells after five hours of induction. U = uninduced
cells; I-5 = cells after 5 hrs of induction; L = load from the Ni²⁺ column; W-1 and
W-2 = wash fractions from the column; E-1 through E-4 = elution fractions from the
column. Elution fractions 2 through 4 were combined and subjected to dialysis. B)
Products of PCRs comparing identical concentrations of Taq polymerase (New
England BioLabs) and His(6)-wt Taq polymerase. The amount of product generated
with each polymerase was almost identical: the density of the product
band using Taq polymerase was 1980 CNT/mm² and the density of the product band
using His(6)-wt Taq polymerase was 1925 CNT/mm².
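
To make "almost identical" concrete, the two band densities quoted in the caption can be compared directly. This is a small illustrative calculation using only those two values; the variable names are hypothetical.

```python
# Band densities (CNT/mm^2) quoted in the Figure 3-7 caption.
neb_taq_density = 1980.0   # commercial Taq polymerase (New England BioLabs)
his_taq_density = 1925.0   # His(6)-wt Taq polymerase purified from SW3 cells

# Yield of the purified enzyme relative to the commercial enzyme.
relative_yield = his_taq_density / neb_taq_density
percent_difference = (1.0 - relative_yield) * 100.0

print(f"Relative yield: {relative_yield:.3f}")
print(f"Difference: {percent_difference:.1f}%")
```

The two bands differ by less than 3% in density, which supports the caption's reading.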















































Figure 3-8. Representative gels showing the amount of full-length PCR products generated with
different dNTP/dψUNTP ratios and the indicated polymerases. Concentrations of
dNTPs/dψUNTPs listed are the starting concentrations (see Materials and Methods
for listing of final concentrations). All PCRs used 1 × 10⁶ cfu of cells expressing
polymerase as the sole source of polymerase and template plasmid for the reaction.
Polymerases were forced to replicate their own encoding gene (2603 bp). A)
Incorporation of various dNTP/dψUNTP ratios by co-Taq polymerase. FLP is not
generated beyond the ratio of 3 mM TTP/7 mM dψUTP. B) Incorporation of
various dNTP/dψUNTP ratios by a representative of the RD Library (SW21). FLP
is not generated beyond the ratio of 3 mM TTP/7 mM dψUTP.
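
The ratio series used in these gels can be written out explicitly. The sketch below assumes, based on the column headings of Table 3-4, that TTP and dψUTP always sum to a 10 mM starting concentration and vary in 1 mM steps; the function names are hypothetical, and the values are starting concentrations, not final ones.

```python
# Assumed total starting concentration of TTP + dψUTP (mM), inferred from the
# Table 3-4 column headings (9 mM dT/1 mM dψU through 1 mM dT/9 mM dψU).
TOTAL_MM = 10


def ratio_series(total_mm=TOTAL_MM):
    """Return (TTP mM, dψUTP mM) pairs from all TTP to all dψUTP in 1 mM steps."""
    return [(total_mm - psi, psi) for psi in range(0, total_mm + 1)]


def psi_fraction(ttp_mm, psi_mm):
    """Fraction of the thymidine-type triphosphate pool supplied as dψUTP."""
    return psi_mm / (ttp_mm + psi_mm)


# Last ratio at which full-length product (FLP) was reported in the caption.
LAST_FLP_RATIO = (3, 7)

for ttp, psi in ratio_series():
    print(f"{ttp:2d} mM TTP / {psi:2d} mM dψUTP -> {psi_fraction(ttp, psi):.0%} dψUTP")
```

Read this way, the caption says that both co-Taq polymerase and the SW21 representative produced full-length product up to a pool that is 70% dψUTP, and not beyond.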


[Figure 3-8 gel images, panels A) and B); size markers in each panel: 4000, 3000, 2000, 1550, 1400 bp; lane labels not recoverable.]














Raw Densities (CNT/mm²)
Cell Line  Substitutions  All dNTPs  9 mM dT/1 mM dψU  8 mM dT/2 mM dψU  7 mM dT/3 mM dψU  6 mM dT/4 mM dψU  5 mM dT/5 mM dψU  4 mM dT/6 mM dψU  3 mM dT/7 mM dψU  2 mM dT/8 mM dψU  1 mM dT/9 mM dψU  All dψUNTPs


Table 3-4. Incorporation of dψUTP at 94.0 °C by RD Library.




Full Text

PAGE 1

DIRECTED EVOLUTION OF DNA POLYMERASES

By

STEPHANIE ANN HAVEMANN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2007

PAGE 2

Copyright 2007 by Stephanie Ann Havemann

PAGE 3

To my family and my husband. Without your constant and vigilant support, I would not be where I am today. Thank you!

PAGE 4

ACKNOWLEDGMENTS

I would like to begin by thanking my advisor, Dr. Steven Benner, for all of his wisdom and guidance; it has been an honor and a privilege to study under his tutelage. His passion for all facets of science, and how they can be intertwined, should serve as inspiration to us all. I would like to thank the rest of my committee: Dr. Tom Lyons, for always having an open door and a receptive ear when I had questions; Dr. Nemat Keyhani, whose enthusiasm for science was contagious and whose knowledge of microbial genetics was extremely valuable; Dr. Nicole Horenstein, whose constant support and knowledge helped guide me throughout my graduate career; and Dr. Rob Ferl, whose eagerness to learn and share information about various aspects of astrobiology helped me determine the field of study I wish to pursue.

Special thanks go to Dr. Eric Gaucher, Dr. Ryan Shaw, and Dr. Nicole Leal, all of whom have worked closely with me over the past few years and who have assisted me in various experimental designs and implementations. Eric performed the rational design of the Taq mutants and was my source of knowledge for all things dealing with evolutionary biology. Ryan and I worked closely to discern the best method of creating and isolating DNA from oil-in-water emulsions; his idea of changing the composition of the oil layer drastically improved our yields. Nicole assisted me in performing some of my primer-extension assays and was a valuable source of information and never-ending support.

I am extremely grateful to Dr. Daniel Hutter for the synthesis of the 2'-deoxypseudothymidine-5'-triphosphate, to Dr. Shuichi Hoshika for the synthesis of the pseudothymidine precursor, and to Dr. Ajit Kamath for the synthesis and purification of the pseudouridine-containing oligonucleotides. Special appreciation also goes to Dr. Michael Thompson for providing the wt taq gene and his suggestions for the purification of the polymerase, and to Gillian Robbins for assisting on the growth curve studies.

PAGE 5

I am also thankful to Dr. Art Edison and Omjoy Ganesh for their assistance with the circular dichroism experiments. Finally, I would like to thank all the members of the Benner group for their advice and discussions over the years, and Romaine Hughes, without whom our group would be in total chaos.

PAGE 6

TABLE OF CONTENTS
page
ACKNOWLEDGMENTS .... 4
LIST OF TABLES .... 9
LIST OF FIGURES .... 10
LIST OF ABBREVIATIONS .... 12
ABSTRACT .... 15
CHAPTER
1 INTRODUCTION .... 17
  What are Nucleic Acids? .... 17
    Rules of Complementarity .... 17
    DNA Helical Conformations .... 18
    Central Dogma of Molecular Biology .... 19
  What is AEGIS? .... 20
    Use of AEGIS Components .... 20
    Problems with AEGIS Components .... 22
    C-Glycosides .... 23
      Pseudouridine .... 23
      Pseudothymidine .... 24
  DNA Polymerases .... 25
    General Structure of Polymerases .... 25
    Polymerase Families .... 27
    Taq Polymerase .... 28
  Directed Evolution .... 29
    Mutagenic Libraries .... 30
    Systems of Directed Evolution .... 32
      Phage display .... 32
      Compartmentalized self-replication .... 33
  Research Overview .... 34
2 POLYMERASE INCORPORATION OF MULTIPLE C-GLYCOSIDES INTO DNA: PSEUDOTHYMIDINE AS A COMPONENT OF AN ALTERNATIVE GENETIC SYSTEM .... 50
  Introduction .... 50
  Materials and Methods .... 52
    Synthesis of Triphosphates and Oligonucleotides .... 52
    Circular Dichroism .... 53
    Standing Start Primer-Extension Assays .... 53

PAGE 7

      Polymerase screen primer-extension assays .... 54
      Taq polymerase primer-extension assays .... 55
  Results .... 56
    Circular Dichroism .... 56
    Polymerase Screen Primer-Extension Assays .... 56
    Taq Polymerase Primer-Extension Assays .... 57
  Discussion .... 58
3 CREATION OF A RATIONALLY DESIGNED MUTAGENIC LIBRARY AND SELECTION OF THERMOSTABLE POLYMERASES USING WATER-IN-OIL EMULSIONS .... 70
  Introduction .... 70
  Materials and Methods .... 74
    DNA Sequencing and Analysis .... 74
    Construction of Plasmids .... 74
      Construction of pSW1 .... 74
      Rationally designed mutagenic library (RD Library) creation .... 75
    Growth Curves and Cell Counts .... 75
    Purification of His(6)-wt Taq Polymerase .... 76
    Incorporation of dψUTP by RD Library .... 79
    Selection of Thermostable Mutants Using Water-In-Oil Emulsions .... 80
      Water-in-oil emulsions .... 80
      Re-cloning of selected mutants .... 81
  Results .... 82
    Growth Curves and Cell Counts .... 82
    Purification of His(6)-wt Taq Polymerase .... 83
    Incorporation of dψUTP by RD Library .... 83
    Selection and Identification of Thermostable Mutants Using Water-In-Oil Emulsions .... 84
  Discussion .... 85
4 DISTRIBUTION OF THERMOSTABILITY IN POLYMERASE MUTATION SPACE .... 103
  Introduction .... 103
  Materials and Methods .... 105
    DNA Sequencing and Analysis .... 105
    Bacterial Growth Conditions and Strains .... 105
    Synthesis of Triphosphates and Oligonucleotides .... 106
    Random Mutagenic Library (L4 Library) Creation .... 106
    Incorporation of dNTPs by RD and L4 Libraries at Various Temperatures .... 108
    Incorporation of dψUNTPs by RD Library at Optimal Temperatures .... 108
    Incorporation of dψUTP and dψTTP by co-Taq Polymerase at Various Melting Temperatures .... 110
  Results .... 110
    Random Mutagenic Library (L4 Library) Creation .... 110
    Incorporation of dNTPs by RD and L4 Libraries at Various Temperatures .... 111

PAGE 8

    Incorporation of dψUNTPs by RD Library at Optimal Temperatures .... 112
    Incorporation of dψUTP and dψTTP by co-Taq Polymerase at Various Melting Temperatures .... 113
  Discussion .... 114
5 CONCLUSIONS .... 132
  DNA Helical Structure in the Presence of C-Glycosides .... 132
  Polymerase Screen for the Incorporation of C-glycosides .... 133
  Taq Polymerase Primer-Extension Assays .... 134
  Growth and Purification of Taq Polymerase .... 134
  Creation of co-Taq Polymerase Mutant Libraries .... 136
    Creation of the Rationally Designed Mutagenic Library (RD Library) .... 136
    Creation of the Random Mutagenic Library (L4 Library) .... 137
  Preliminary Studies of the Incorporation of dψUTP by the RD Library .... 137
  Incorporation of dNTPs by RD and L4 Libraries at Various Temperatures .... 138
  Incorporation of dψUTP by the RD Library at Optimal Temperatures .... 139
  Incorporation of dψUTP and dψTTP by co-Taq Polymerase at Various Temperatures .... 140
  Selection of Thermostable RD Mutants Using Water-In-Oil Emulsions .... 140
  Future Experimentation .... 141
APPENDIX
A SYNTHESIS OF PSEUDOTHYMIDINE AND PSEUDOTHYMIDINE-CONTAINING OLIGONUCLEOTIDES .... 144
B PHYLOGENETIC TREES OF FAMILY A POLYMERASES .... 147
C GENETIC CODE AND AMINO ACID ABBREVIATIONS .... 150
LIST OF REFERENCES .... 151
BIOGRAPHICAL SKETCH .... 158

PAGE 9

LIST OF TABLES
Table page
1-1 Comparison of the structural geometries of A, B, and Z-DNA forms .... 38
1-2 Characteristics of the various polymerase families .... 46
2-1 Oligonucleotides used in this study .... 64
3-1 Oligonucleotides used in this study .... 91
3-2 Rationally Designed (RD) Mutant Library .... 95
3-3 Bacterial strains used in this study .... 96
3-4 Incorporation of dψUTP at 94.0 °C by RD Library .... 100
3-5 Mutations present after selection for active polymerases .... 101
3-6 Breakdown of types of mutations present after selection .... 102
4-1 Additional bacterial strains used in this study .... 120
4-2 L4 Mutant Library .... 121
4-3 Generation of full length PCR products from dNTPs by individual polymerases from the rationally designed (RD) Library at the indicated temperatures .... 123
4-4 Generation of full length PCR products from dNTPs by individual polymerases from the randomly generated (L4) Library at the indicated temperatures .... 124
4-5 Incorporation of dψUTP by RD Library at optimal temperatures .... 126
4-6 Incorporation of dψUTP and dψTTP by co-Taq Polymerase at various temperatures .... 129
C-1 The Genetic Code .... 150
C-2 Amino acid abbreviations .... 150

PAGE 10

LIST OF FIGURES
Figure page
1-1 The standard deoxyribonucleotides .... 37
1-2 Puckering of the furanose ring of nucleosides into various envelope forms .... 39
1-3 The central dogma of molecular biology .... 39
1-4 The six hydrogen bond patterns in an artificially expanded genetic information system (AEGIS) .... 40
1-5 The Versant branched DNA assay .... 41
1-6 An example of non-standard nucleobases coding for a non-standard amino acid .... 42
1-7 Pseudouridine and pseudothymidine .... 43
1-8 The polymerization reaction of deoxyribonucleotides triphosphates catalyzed by DNA polymerases .... 44
1-9 Kinetic steps involved in the nucleotide incorporation pathway .... 44
1-10 Locations of active site residues in Taq polymerase .... 45
1-11 The staggered extension process (StEP) for rediversification of mutant libraries .... 47
1-12 Phage display selection scheme .... 48
1-13 General scheme for CSR .... 49
2-1 A schematic representation of the CD spectra of A- and B-DNA forms .... 62
2-2 The base pairing interactions between a standard A-T base pair and the non-standard ψT-A and ψU-A base pairs .... 63
2-3 Representative CD Spectra .... 65
2-4 Depiction of primer-extension assays used in the polymerase screen .... 66
2-5 Family A polymerase screen .... 67
2-6 Family B polymerase screen .... 68
2-7 Incorporation of one to twelve consecutive dT, dψT, or dψU residues by Taq polymerase .... 69
3-1 A phylogenetic tree of the Family A polymerases .... 89

PAGE 11

3-2 Locations of the 35 rationally designed (RD) sites in the Taq polymerase structure .... 90
3-3 View of the pASK-IBA43plus plasmid .... 92
3-4 View of the pSW1 plasmid .... 93
3-5 View of the pSW2 plasmid .... 94
3-6 Growth curves, cell counts and expression of various E. coli TG-1 cell lines .... 97
3-7 Purification and activity of His(6)-wt Taq polymerase .... 98
3-8 Representative gels showing the amount of full-length PCR products generated with different dNTP/dψUNTP ratios and the indicated polymerases .... 99
4-1 Epimerization of 2'-deoxypseudouridine .... 119
4-2 Representative images of ethidium-bromide stained agarose gels resolving products arising from PCR amplification using standard dNTPs and three different polymerases .... 122
4-3 Number of active RD and L4 mutants at various temperatures .... 125
4-4 Generation of full length PCR product at 86.3 °C using dψUTP by the co-Taq polymerase and the RD polymerase in the SW29 cell line .... 127
4-5 Generation of full length PCR product at 94.0 °C and 86.3 °C using dψUTP by the RD polymerase in the SW8 cell line .... 128
4-6 Generation of full length PCR product at 86.3 °C by co-Taq polymerase using various TTP:dψUTP and TTP:dψTTP ratios .... 130
4-7 Graphical comparisons of the band densities listed in Table 4-6 .... 131
A-1 Synthesis of pseudothymidine precursor .... 146
B-1 A seed alignment of the Family A polymerases .... 147
B-2 Inset of the phylogenetic tree of Family A polymerases (from Fig. 3-1) showing the location of Taq polymerase .... 148
B-3 Inset of the phylogenetic tree of Family A polymerases (from Fig. 3-1) showing the location of some viral polymerases .... 149

PAGE 12

LIST OF ABBREVIATIONS
A             adenosine
AEGIS         artificially expanded genetic information system
Amp           ampicillin
APS           ammonium persulfate
ATP           adenosine triphosphate
bp            base pair
Bst           Bacillus stearothermophilus
C             cytosine
Ci            Curie (1 Ci = 3.7 × 10¹⁰ Becquerel)
CD            circular dichroism
Cfe           cell-free extract
cfu           colony forming unit
CPM           counts per minute
CNT           counts
CSR           compartmentalized self-replication
DMSO          dimethyl sulfoxide
dN            deoxyribonucleoside (dA, dG, dC, T, ψT, ψU, etc.)
DNA           deoxyribonucleic acid
DNase I       deoxyribonucleic acid specific endonuclease
ds            double-stranded nucleic acid chain
DTT           1,4-dithio-DL-threitol
E. coli       Escherichia coli

PAGE 13

EDTA          ethylenediaminetetraacetate
exo-          lacking 3'→5' exonuclease activity
FLP           full-length product
G             guanosine
HIV           human immunodeficiency virus type-1
hr            hours
HPLC          high performance liquid chromatography
isoC          deoxyisocytidine
isoG          deoxyisoguanosine
LB            Luria-Bertani medium
min           minutes
M-MuLV        Moloney murine leukemia virus
mRNA          messenger ribonucleic acid
MWCO          molecular weight cut-off
NMR           nuclear magnetic resonance
NSB           non-standard nucleobase
OD            optical density
PAGE          polyacrylamide gel electrophoresis
PCR           polymerase chain reaction
Pfu           Pyrococcus furiosus
PMSF          phenylmethylsulfonyl fluoride
PNK           polynucleotide kinase
REAP          reconstructing evolutionary adaptive paths

PAGE 14

RNA           ribonucleic acid
RNase A       ribonucleic acid specific endonuclease
rRNA          ribosomal ribonucleic acid
RT            reverse transcriptase
SDS           sodium dodecylsulfate
s             seconds
StEP          staggered extension process
T             thymidine
ψT            pseudothymidine
Taq           Thermus aquaticus DNA Polymerase I
TBE           Tris / borate / EDTA buffer
TEMED         N,N,N',N'-tetramethylethylenediamine
Tet           tetracycline
Tris          tris(hydroxymethyl)aminomethane
Triton X-100  octyl phenol ethoxylate
tRNA          transfer ribonucleic acid
Tth           Thermus thermophilus
U             uracil
ψU            pseudouridine
UV            ultraviolet
wt            wild type

PAGE 15

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

DIRECTED EVOLUTION OF DNA POLYMERASES

By

Stephanie Ann Havemann

May 2007

Chair: Steven A. Benner
Major Department: Chemistry

To achieve the long-term goal of the Benner research group to create a synthetic biology based on an Artificially Expanded Genetic Information System (AEGIS), polymerases that are able to incorporate non-standard bases (NSBs) into DNA must be identified. In this dissertation, a polymerase from Thermus aquaticus (Taq polymerase) was identified that was able to incorporate non-standard nucleotide analogs that contain a C-glycosidic linkage. This activity was limited, meaning that the polymerase needed modification to support this goal. Further, we asked whether sequential C-glycosides destabilized the duplex and altered its structure, to better understand whether a synthetic biology based on C-glycoside nucleotides was possible.

To this end, two libraries of polymerases were created to identify mutations necessary to alter the polymerase's ability to withstand high temperatures. One library was created by the random mutagenesis of the taq gene; the other was rationally designed based on previous studies. Seventy-four mutants from each library were screened for their ability to generate a full-length polymerase chain reaction (PCR) product using standard nucleoside triphosphates at various temperatures; the library of random mutants contained more thermostable polymerases than the library obtained by rational design. Water-in-oil emulsions were then tested to determine whether these, as artificial cells, might deliver thermostable polymerase variants from those used

PAGE 16

in the screen. This identified difficulties in tools used to analyze the output of the library, suggesting solutions that will guide future work. We also tested the individual components of the rationally designed library for their ability to incorporate C-glycoside triphosphates in a PCR. Structural studies with synthetic DNA containing multiple, consecutive C-glycosides showed no change in conformation, at least not one that is detectable by circular dichroism.

These results represent a step towards the goal of creating an AEGIS-based synthetic biology, an artificial chemical system that mimics emergent biological behaviors such as replication, evolution, and adaptation. In addition, the mutant polymerases created in these experiments are an inventory of polymerases useful in biotechnology, possibly allowing the development of new, as well as improving on existing, clinical diagnostic techniques and helping to facilitate a better understanding of polymerase-DNA interactions.

PAGE 17

CHAPTER 1
INTRODUCTION

What are Nucleic Acids?

Deoxyribonucleic acid (DNA), one of the fundamental constituents of life, serves as a key component for the storage and transfer of genetic information. It is built from four building blocks, adenosine, guanosine, cytidine, and thymidine, all of which are comprised of a nucleobase attached to a 2'-deoxyribose molecule (Fig. 1-1). Similarly, ribonucleic acid (RNA) is also built from four building blocks, except that thymidine is replaced by uridine and the sugar moiety is a ribose. When a phosphate group replaces the 5'-hydroxyl group of these molecules, they become acids that can be linked by their phosphate groups, resulting in the formation of the backbone of a nucleic acid strand. Genetic information is commonly stored in a double stranded (ds) helix, which is formed when the nucleobases are paired by hydrogen bonds. These helical duplex strands are aligned so that the chains are anti-parallel to one another; in other words, one strand lies in the 5'→3' direction and the complement is in the 3'→5' orientation.

Rules of Complementarity

Watson and Crick proposed that the interactions between nucleobases are governed by two rules of complementarity: size complementarity and hydrogen-bonding complementarity (Watson and Crick, 1953a, Watson and Crick, 1953b). Size complementarity means that a large purine, such as adenosine or guanosine, pairs with a small pyrimidine, like cytosine, thymidine, or uridine. Hydrogen-bonding complementarity means that hydrogen bond donors from one nucleobase pair with the hydrogen bond acceptors from another. With these rules, it is expected that in the formation of nucleic acid duplexes, guanosine must pair with cytosine and adenosine must pair with either thymidine or uridine.
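
The two rules and the anti-parallel arrangement of the strands can be illustrated with a short sketch. This is an illustrative example for standard DNA only, not part of the original work; the names `PAIRS`, `is_size_complementary`, and `complement_strand` are hypothetical.

```python
# Standard nucleobases grouped by size (single-letter codes).
PURINES = {"A", "G"}          # large, two-ring bases
PYRIMIDINES = {"C", "T", "U"} # small, one-ring bases

# Watson-Crick partners in DNA (hydrogen-bonding complementarity).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_size_complementary(base_a, base_b):
    """True when exactly one of the two bases is a purine."""
    return (base_a in PURINES) != (base_b in PURINES)


def complement_strand(strand_5_to_3):
    """Return the complementary DNA strand, also written 5'->3'.

    Because the two strands are anti-parallel, the input is read in
    reverse while each base is swapped for its pairing partner.
    """
    return "".join(PAIRS[base] for base in reversed(strand_5_to_3))


print(complement_strand("ATGC"))
print(all(is_size_complementary(a, b) for a, b in PAIRS.items()))
```

Every pair in `PAIRS` satisfies the size rule, while a purine-purine pair such as A with G does not.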

PAGE 18

DNA Helical Conformations

The conformation of a DNA duplex is often assumed to be described using one of three abstract models: A-DNA, B-DNA, or Z-DNA (Saenger, 1984). The most common form of DNA found in living organisms is presumed to be the B-DNA helix. A-DNA is the common helical structure of dsRNA or dehydrated DNA, but it can also be found when certain DNA sequences repeat (Ghosh and Bansal, 2003). The only left-handed helix known is the Z-DNA conformation, which appears to be a characteristic of alternating GC-rich sequences that may help stabilize DNA during transcription (Rich and Zhang, 2003). Many other helical conformations of DNA are possible, of course. Indeed, over twenty-six different forms have been described in the literature to date (Egli, 2004, Ghosh and Bansal, 2003, Saenger, 1984). Nevertheless, for this work, we will reference the A-, B-, and Z-DNA models.

In actuality, the conformation of a DNA molecule must be described by examining the structure atom by atom. Terms used to abstract the results of such an examination are described in Table 1-1. Thus, the different types of helices are characterized by different geometries, such as the number of base pairs per turn, the height of a turn, the rotation per base pair, the size and depth of the major and minor grooves, and the type of sugar pucker. The sugar pucker refers to the conformation of the sugar, which can exist in one of four envelope forms: C2'-endo, C2'-exo, C3'-endo, and C3'-exo (Fig. 1-2) (Saenger, 1984).

In some cases, helical structures can be transformed from one conformation into another simply by the modification of the humidity of the environment (for fibers) and/or the concentrations of salt in the solution (Saenger, 1984). Helical structures can also be changed by altering the chemical structure of the constituents.
The conformation of the sugar pucker can alter the helical form of the DNA by increasing or decreasing the distances between the

PAGE 19

phosphate groups, thereby changing the number of base pairs per turn and the size of the grooves. The C2'-endo conformation is usually found in B-DNA, while A-DNA prefers the C3'-endo pucker. The major and minor grooves found in B-DNA can act as binding pockets for polymerases, since they allow for the presentation of nucleobase hydrogen bond donors and acceptors (Garrett and Grisham, 1999). The grooves presented by A-DNA are more symmetrical, making it difficult for polymerases to gain access to these potential hydrogen-bonding sites (Garrett and Grisham, 1999).

The conformation of a DNA helix can be assessed in several ways. X-ray crystallography is, of course, the best way to identify the position of individual atoms, with nuclear magnetic resonance (NMR) emerging as a preferred choice in solution. The general overall conformation can be estimated, however, by circular dichroism (CD) (Ghosh and Bansal, 2003).

Central Dogma of Molecular Biology

Nucleic acids maintain genetic information inside a cell by means of replication and transcription; translation uses this genetic information to create proteins. This sequence has been called the central dogma of molecular biology by Crick (Fig. 1-3) (Crick, 1970). DNA is transcribed by RNA polymerases into messenger RNA (mRNA), which is then translated into proteins. The translation of the mRNA uses a combination of ribosomes, which are composed of ribosomal RNA (rRNA) and proteins, and transfer RNAs (tRNA), which carry amino acids to the ribosomes. In situations where the genetic material is stored as RNA, such as in viruses, the information is first converted back into DNA by enzymes known as reverse transcriptases prior to being translated. DNA can replicate itself by employing enzymes known as DNA polymerases, and RNA replicates itself using RNA polymerases.

This feature of life raises an obvious question: Which came first, nucleic acids or proteins?
At first glance, the answer appears to be nucleic acids, since proteins cannot store genetic

PAGE 20

information. Upon further study, one realizes that without proteins, the genetic material could not be replicated. One possible answer to this question is that the nucleic acids were once able to act both as storage molecules and, like proteins, as catalysts of their own replication. The discovery of ribozymes and deoxyribozymes lends support to this theory by showing that nucleic acid molecules are not limited to storing genetic information; they can also catalyze reactions both within their own structure and upon other structures (Muller, 2006, Emilsson and Breaker, 2002, Paul and Joyce, 2004). Many of these nucleic acid catalysts have been created using non-standard nucleobases (NSBs) to add additional functionality to the nucleic acid molecules (Muller, 2006).

What is AEGIS?

Using Watson and Crick's rules of complementarity and the requirement that the nucleobases be joined with three hydrogen bonds, it is feasible to create an artificially expanded genetic information system (AEGIS) containing eight additional nucleobases (Fig. 1-4), thereby expanding the genetic alphabet from four to twelve letters (Switzer et al., 1989, Piccirilli et al., 1990, Geyer et al., 2003). Since these bases retain the Watson and Crick geometry, they can be incorporated into growing DNA strands via synthesis, primer-extension experiments, or the polymerase chain reaction (PCR), and the resulting DNA can subsequently be used in a variety of different techniques.

Use of AEGIS Components

The importance of AEGIS components has already been illustrated in many ways. They have been used in clinical diagnostics, to expand the genetic code, and to understand DNA-polymerase interactions, and they have even been implicated as a factor in the evolution of life on Earth. These components have also been used in the first successful six-letter PCR, lending support to the development of a synthetic biology.
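To give a feel for what moving from four to twelve letters means for information content, the following short Python sketch counts the number of distinct words of a given length that an alphabet can spell. The function name is hypothetical, and the choice of three-letter words simply mirrors the triplet codons of the standard genetic code.

```python
def word_count(alphabet_size: int, word_length: int = 3) -> int:
    """Number of distinct words of `word_length` letters over an alphabet."""
    return alphabet_size ** word_length


if __name__ == "__main__":
    # Four standard letters versus the twelve-letter expanded alphabet.
    for letters in (4, 12):
        print(letters, "letters ->", word_count(letters), "three-letter words")
```

The count grows as a power of the alphabet size, so tripling the number of letters multiplies the number of triplets by twenty-seven.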

PAGE 21

The powerful Versant branched-DNA assay, used to monitor the viral load of patients infected with HIV, Hepatitis B, or Hepatitis C viruses, requires the use of at least two non-standard nucleobases (NSBs) (Collins et al., 1997). This assay uses 5-methyl-2'-deoxyisocytidine (isoC) and 5-methyl-2'-deoxyisoguanosine (isoG) to decrease the non-specific binding of a nucleic acid probe (Fig. 1-4), thereby increasing signal amplification relative to noise by eight-fold over previous systems (Fig. 1-5) (Huisse, 2004, Collins et al., 1997). EraGen Biosciences (Madison, WI) is now using these AEGIS components in a similar multiplexed system to identify newborns with cystic fibrosis (Johnson et al., 2004). These assays have barely begun to scratch the surface of the potential clinical diagnostic uses of this expanded genetic alphabet.

The current genetic code uses 64 three-letter codons to encode the incorporation of 20 canonical amino acids (Appendix C); use of all twelve AEGIS nucleotides would allow for 1728 three-letter codons, and if the AEGIS components were functionalized, the possibilities would be nearly endless. AEGIS components have already been used to encode the incorporation of non-standard amino acids in ribosome-mediated translation. For example, Bain et al. used isoC and isoG in a codon-anticodon pair to generate peptides containing the non-standard amino acid L-iodotyrosine (Bain et al., 1992). More recently, Hirao et al. used 2-amino-(2-thienyl)purine and pyridine-2-one in a codon-anticodon pair in an in vitro transcription study to generate peptides containing 3-chlorotyrosine (Fig. 1-6) (Hirao et al., 2002, Hirao et al., 2006).

Some of these AEGIS components have also been used in the characterization of the kinetic parameters of polymerases (Joyce and Benkovic, 2004, Sismour and Benner, 2005), and in the first six-letter PCR, which was catalyzed by a mutant of the HIV-reverse transcriptase

PAGE 22

(Sismour et al., 2004). AEGIS components have also been used to better understand the interactions between polymerases and DNA (Lutz et al., 1998, Joyce and Benkovic, 2004, Hendrickson et al., 2004, Delaney et al., 2003). For example, studies have been performed using a variety of different NSBs, such as those lacking minor-groove electrons (Hendrickson et al., 2004) and those with a C-glycosidic linkage (Lutz et al., 1999), in order to identify characteristics of nucleobases that are essential for correct incorporation by polymerases.

Problems with AEGIS Components

Although the AEGIS components retain Watson and Crick geometry, it is possible that some of the features present on the NSBs, such as the absence of minor groove electrons or the presence of C-glycosidic linkages, may present a challenge to polymerases. The ability of polymerases to function in the absence of an unshared pair of electrons in the minor groove of dsDNA, as seen in the pyDAD-puADA base pair (Fig. 1-4), was previously examined by Hendrickson et al. (Hendrickson et al., 2004). In those studies, Hendrickson discovered that the presence of electrons in the minor groove may only be necessary for the exonuclease activity of polymerases, and not for incorporation (Hendrickson et al., 2004). This, however, presents a problem when trying to incorporate NSBs with efficiency and fidelity, since the polymerase then has no proofreading ability. Lutz et al. examined the ability of polymerases to function in the presence of nucleosides exhibiting a C-glycosidic linkage, a carbon-carbon bond between the nucleobase and sugar as seen in the pyDAD, pyAAD, and pyADD nucleosides (Fig. 1-4) (Lutz et al., 1999). They also reported that polymerases with exonuclease activity were less likely to accept the C-glycoside than were those lacking the proofreading ability, making replication with fidelity difficult.
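The names used above (pyDAD, puADA, and so on) encode the ring size and the pattern of hydrogen bond donors (D) and acceptors (A). A minimal Python sketch of the hydrogen-bonding rule follows; the function name is hypothetical, and the sketch assumes that the three positions are listed in the same order on both partners, as the pyDAD-puADA example suggests.

```python
# Size complementarity: a pyrimidine ("py") pairs with a purine ("pu").
RING_PARTNER = {"py": "pu", "pu": "py"}
# Hydrogen-bonding complementarity: a donor (D) faces an acceptor (A).
GROUP_PARTNER = {"D": "A", "A": "D"}


def partner(name: str) -> str:
    """Return the complementary nucleobase name, e.g. 'pyDAD' -> 'puADA'."""
    ring, pattern = name[:2], name[2:]
    if ring not in RING_PARTNER or len(pattern) != 3:
        raise ValueError(f"unrecognized nucleobase name: {name!r}")
    flipped = "".join(GROUP_PARTNER[group] for group in pattern)
    return RING_PARTNER[ring] + flipped


if __name__ == "__main__":
    for base in ("pyDAD", "pyAAD", "pyADD"):
        print(base, "pairs with", partner(base))
```

The function only applies the two complementarity rules to a name; it says nothing about whether a given nucleobase is chemically stable or accepted by a polymerase.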

PAGE 23

C-Glycosides

An N-glycoside is a nucleoside with a carbon-nitrogen bond linking the nucleobase to the sugar; all standard nucleosides are therefore N-glycosides. However, three of the AEGIS nucleosides use a carbon-carbon bond to join the nucleobase to the sugar, making these nucleosides C-glycosides by definition (Fig. 1-4). This carbon-carbon linkage can cause a structural change in the sugar pucker of the nucleoside, making it a C3'-endo pucker instead of a C2'-endo pucker, possibly changing the form of the DNA from B-DNA to A-DNA (Davis, 1995). Wellington and Benner detailed strategies by which these molecules can be chemically synthesized in a recent review article (Wellington and Benner, 2006). C-glycosides have also been found in vivo, however, in various types of RNA (Charette and Gray, 2000). These C-glycosides are of great interest, not only because of their presence in the AEGIS nucleosides, but also for their clinical uses; many naturally occurring C-glycosides are antibiotics or antiviral agents (Michelet and Genet, 2005, Zhou et al., 2006). More generally, C-glycosides can be used in gene therapy (Li et al., 2003, Li et al., 2004).

Pseudouridine

Pseudouridine (Ψ), the 5-ribosyl isomer of uridine (Fig. 1-7A), is present in both tRNA and rRNA and is vital to the fitness of organisms (Raychaudhuri et al., 1998, Charette and Gray, 2000). This modified nucleoside, found in all three domains of life, was the first naturally occurring NSB discovered (Charette and Gray, 2000) and is introduced into the RNA sequences by the posttranscriptional modification of uridine (Argoudelis and Mizsak, 1976, Grosjean et al., 1995). Pseudouridine has been reported to have a propensity to adopt a syn conformation around the glycosyl bond when in solution, although the data supporting this are questionable; it is, however, found only in the anti conformation when in a nucleic acid strand (Fig. 1-7B) (Lane et al., 1995, Neumann et al., 1980).
The anti conformation allows the coordination of a water

PAGE 24

molecule between the 5'-phosphate group of the Ψ residue, the 5'-phosphate group of the preceding residue, and the N1-H of the Ψ residue (Fig. 1-7C) (Arnez and Steitz, 1994). The coordination of this water molecule results in an enhanced base stacking ability and a reduced conformational flexibility of the RNA molecule, thus increasing the local rigidity of the RNA (Charette and Gray, 2000, Davis, 1995).

Pseudouridine is thought to play several roles in Nature, as described in the review by Charette and Gray (Charette and Gray, 2000). In tRNA, it is thought to play a critical role in the binding of the tRNA to the ribosome during translation because it stabilizes the tRNA structure, allowing tighter binding to occur, thereby increasing translational accuracy. Pseudouridine also has been implicated in alternative codon usage in tRNA, and as a player in the folding of rRNA and ribosome assembly through its contributions to RNA stability.

Pseudothymidine

Pseudothymidine (ΨT), or 1-methylpseudouridine (Fig. 1-7D), was originally isolated from Streptomyces platensis in 1976 by Argoudelis and Mizsak (Argoudelis and Mizsak, 1976). This naturally occurring C-glycoside, found in RNA, is also thought to be created by a posttranscriptional modification of uridine (Limbach et al., 1994). The first successful in vitro transcription of ΨT was performed by Piccirilli et al. using T7 RNA polymerase with a template containing ΨT and standard ribonucleosides (Piccirilli et al., 1991). Further studies, conducted by Stefan Lutz, not only observed the ability of DNA polymerases to incorporate this NSB into a growing DNA strand in primer-extension assays, but also challenged a polymerase to use ΨT in a PCR that required the successful incorporation of up to three consecutive dΨT residues (Lutz et al., 1999). Since then, no further studies requiring the incorporation of this C-glycoside into nucleic acids have been performed.

PAGE 25

DNA Polymerases

DNA polymerases are the enzymes that perform template-directed DNA synthesis from deoxyribonucleotides and an existing DNA template. These enzymes, essential for the replication of the genetic information carried in all living organisms, were originally discovered in 1956 by Arthur Kornberg (Kornberg et al., 1956), for which he was awarded a Nobel Prize in 1959. The synthesis of the complementary DNA strand always occurs in the 5'→3' direction through the addition of the incoming nucleotide's triphosphate group onto the 3'-OH group of the preceding nucleotide, releasing a pyrophosphate group in the process (Fig. 1-8) (Garrett and Grisham, 1999, Lewin, 1997). After the successful replication of a DNA strand, the new strand is complementary to the template (leading) strand, and identical to the lagging strand. Since all DNA polymerases function in this manner, it is easy to comprehend that their structures are also generally conserved.

General Structure of Polymerases

All DNA polymerases share a common structural framework that is commonly referred to as a right hand comprised of three subdomains: the fingers, the palm, and the thumb. The fingers domain is responsible for nucleotide recognition and binding, the thumb domain binds the DNA substrate, and the palm domain is the catalytic center of the protein. It appears that this framework is the same in all DNA polymerase families. It is not clear whether this represents convergent or divergent evolution; there is no sequence similarity between, for example, Family A and Family B polymerases that makes a case for their distant homology (Rothwell and Waksman, 2005).

In 1985, the laboratory of Thomas Steitz first solved the crystal structure of the Klenow fragment, the C-terminal domain of the Escherichia coli DNA Polymerase I (Ollis et al., 1985).
Since then, the crystal structures of many different polymerases have been solved, not only in their nascent states, but some with DNA or dNTPs and pyrophosphate bound to the

PAGE 26

catalytic site (Rothwell and Waksman, 2005, Beese et al., 1993b, Beese et al., 1993a). It has also been determined that during polymerization, divalent metal cations, such as Mg2+, are coordinated in polymerase active sites to help activate the 3'-OH group for attack on the incoming nucleotide (Steitz, 1999).

Features of polymerases that are not conserved throughout the families include both the 5'→3' and 3'→5' exonuclease subdomains that allow for proofreading, and other subunits used for different types of repair. The exonuclease subdomains, when present, are the proofreading centers of the polymerase. The 5'→3' exonuclease activity is usually involved in nick translation, or the synthesis of DNA at a location where there is a break in the phosphodiester bond of one strand (Perler et al., 1996). The 3'→5' exonuclease activity is the true proofreading activity of the polymerase, responsible for the excision of a newly synthesized mismatch (Perler et al., 1996).

The process by which a DNA polymerase adds an incoming nucleotide onto the 3'-hydroxyl group of the preceding nucleoside involves many steps, which are only now being fully understood. Figure 1-9 details the kinetic steps involved in this addition (Patel and Loeb, 2001, Rothwell and Waksman, 2005). In Step 1, the polymerase (E) binds to the DNA primer:template complex (TP); the polymerase then binds the incoming nucleotide triphosphate (dNTP) in Step 2. The polymerase then undergoes a conformational change (E) in Step 3 that brings the various components into positions that can support the chemistry of this reaction; this is the rate-limiting step of polymerization. The polymerase performs the addition of the nucleotide, remains complexed with the pyrophosphate, and undergoes another conformational change in Step 4. The pyrophosphate group is released in Step 5; in Step 6, the polymerase can dissociate from the DNA or translocate the substrate for another round of synthesis.
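The six steps above can be laid out as a simple ordered walk-through. The Python sketch below is illustrative only: the step descriptions paraphrase the text, the function name is hypothetical, and the model assumes that the polymerase binds the primer:template once (Step 1) and then translocates at Step 6 after every addition rather than dissociating. No rate constants are modeled.

```python
# The six kinetic steps summarized in the text (wording paraphrased).
STEPS = (
    "polymerase binds the primer:template complex",
    "polymerase binds the incoming dNTP",
    "conformational change (rate-limiting)",
    "nucleotide addition and second conformational change",
    "pyrophosphate release",
    "dissociation or translocation",
)


def extend(primer_length: int, additions: int) -> tuple[int, list[str]]:
    """Walk Steps 2-6 once per added nucleotide after one binding event.

    Returns the final primer length and a log of the steps visited.
    Assumes the polymerase never dissociates before the last addition.
    """
    log = [f"Step 1: {STEPS[0]}"]
    for _ in range(additions):
        for number in range(2, 7):
            log.append(f"Step {number}: {STEPS[number - 1]}")
        primer_length += 1  # one nucleotide gained per completed cycle
    return primer_length, log


if __name__ == "__main__":
    length, log = extend(primer_length=20, additions=2)
    print("\n".join(log))
    print("final primer length:", length)
```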

PAGE 27

Polymerase Families

Based on sequence similarity, seven major families of homologous polymerases have been classified (Patel and Loeb, 2001, Rothwell and Waksman, 2005): A, B, C, D, X, Y, and RT. The most extensively studied are the Family A and Family B polymerases, but Table 1-2 identifies characteristics and representative polymerases of all seven families. Polymerases behave differently not only between the families, but also within the families themselves, based on their ability to repair, their processivity, and their fidelity. Processivity is defined as the ability of the polymerase to continue catalysis without dissociating from the DNA (Kelman et al., 1998); this is important when dealing with AEGIS components, since it has been previously shown that polymerases tend to pause, or fall off the DNA, after the incorporation of an NSB (Lutz et al., 1999, Sismour and Benner, 2005). Fidelity is the ability of the polymerase to select and incorporate the correct complementary nucleoside opposite the template from a pool of similar structures (Beard et al., 2002, Cline et al., 1996); this is important for AEGIS components to guarantee that the newly replicated DNA contains the correct sequence.

Family A polymerases, which contain some of the prokaryotic, eukaryotic, and viral polymerases, are best known for the E. coli DNA Pol I, Thermus aquaticus (Taq) Pol I, and the T7 DNA polymerases (Perler et al., 1996). The E. coli DNA Pol I and Taq polymerases are known as repair polymerases since they contain the 5'→3' exonuclease domains, while the T7 polymerase is known as a replicative polymerase since it has a strong 3'→5' exonuclease activity (Rothwell and Waksman, 2005, Kunkel and Bebenek, 2000).

Family B polymerases contain representatives from prokaryotic, eukaryotic, archaeal, and viral polymerases; this is the only family of polymerases with members from all four of these populations (Patel and Loeb, 2001).
This family of polymerases is predominantly involved with DNA replication, as opposed to repair, and exhibits extremely strong 3'→5' exonuclease

PAGE 28

activities. In eukaryotes, these polymerases carry out the replication of chromosomal targets during cell division. The best known of the archaeal polymerases from this family, Pyrococcus furiosus (Pfu) DNA Polymerase, has the lowest known error rate of all thermophilic DNA polymerases that can be used for PCR amplification (mutational frequency/bp/duplication is 1.3 x 10^-6) (Hogrefe et al., 2001, Cline et al., 1996).

Family C polymerases contain the bacterial chromosomal replicative polymerases, and Family D polymerases are suggested to act as archaeal replicative polymerases (Patel and Loeb, 2001, Rothwell and Waksman, 2005). Family X polymerases are found in eukaryotes, and are believed to play a role in the base-excision repair pathway that is important for correcting abasic sites in DNA (Patel and Loeb, 2001, Rothwell and Waksman, 2005). Family Y polymerases, found in prokaryotes, eukaryotes, and archaea, are part of a replicative complex, and function by recognizing and bypassing lesions created by UV damage so that replication of the DNA is not stalled (Zhou et al., 2001, Rothwell and Waksman, 2005). The last characterized family of polymerases, the reverse transcriptases (RT), found in eukaryotes and viruses, catalyze the conversion of RNA into DNA, but they can also replicate DNA templates (Najmudin et al., 2000, Goldman and Marcy, 2001, Rothwell and Waksman, 2005).

Taq Polymerase

Thermus aquaticus, an organism found in thermal springs, hydrothermal vents, and even hot tap water, was first isolated by Brock and Freeze in 1969 (Brock and Freeze, 1969). Taq polymerase, a 94 kDa protein, was isolated from this organism by Chien et al. in 1976 (Chien et al., 1976), and belongs to the Family A polymerases.
This thermophilic polymerase has 5'→3' exonuclease activity, but lacks the 3'→5' exonuclease activity required for proofreading, giving it a low replication fidelity of about 8 x 10^-6 (mutational frequency/bp/duplication) (Cline et al., 1996). However, Taq is fairly processive with an

PAGE 29

average incorporation of 40 nucleotides before dissociating from the DNA, and it has a quick extension rate of about 100 nucleotides per second (Pavlov et al., 2004, Perler et al., 1996).

Taq polymerase, one of the most extensively studied polymerases, was the first thermostable polymerase to be used in PCR, thereby eliminating the need to add additional polymerase after every round of PCR, as was necessary when E. coli DNA Pol I was used for thermocycling experiments (Saiki et al., 1988). In 1995, the Steitz laboratory was the first to crystallize nascent Taq polymerase (Kim et al., 1995), and has since crystallized the polymerase with DNA at the active site (Eom et al., 1996). These and other studies have allowed researchers to identify the active site of the polymerase and the specific residues which contact the DNA, contact the incoming nucleotides, or are involved in metal ion chelation (Eom et al., 1996, Fa et al., 2004, Li et al., 1998b, Li et al., 1998a, Kim et al., 1995, Suzuki et al., 1996).

Due to Taq polymerase's lack of proofreading ability, it has been identified previously as a candidate for replication of DNA containing non-standard nucleosides (Lutz et al., 1999). Taq has been used to incorporate and/or replicate NSBs exhibiting C-glycosidic linkages (Lutz et al., 1999), NSBs lacking an unshared pair of electrons in the minor groove (Hendrickson et al., 2004), and nonpolar nucleoside isosteres (Morales and Kool, 2000). Directed evolution has created Taq polymerase mutants that have been used to incorporate an even larger repertoire of NSBs (Henry and Romesberg, 2005).

Directed Evolution

A recent review by Griffiths and Tawfik discussed the application of techniques developed for the in vitro evolution of various proteins to increase their rate of catalysis, perform different functions, and accept new substrates (Griffiths and Tawfik, 2006).
These procedures all select for desired enzyme characteristics from pools of millions of genes with schemes designed to link genotype to phenotype. This provides a great advantage over the older methods of screening

PAGE 30

mutant library members individually, because these approaches use a one-pot technique that allows for the testing of a large number of variants (2 x 10^8 or more) at once (Griffiths and Tawfik, 2006). Other common features of these directed evolution systems include the development of a mutagenic library, expression of this library, a high-throughput assay designed to identify individuals with the desired characteristics, and a means for reshuffling mutants between rounds of selection (Brakmann, 2005, Lutz and Patrick, 2004, Arnold and Georgiou, 2003a). The most challenging part of any selection experiment is the design of the technique that will be used to isolate variants with the desired characteristics (Brakmann, 2005), because "you get what you select for." In other words, scientists may want to select for a specific characteristic of an enzyme, but if the technique is not designed correctly, they may end up selecting for an enzyme with a different characteristic.

Mutagenic Libraries

The first step in any directed evolution experiment is to create a large library of mutant enzymes. There are many ways to accomplish this task, varying from the rational design of mutations at selected sites to the random mutagenesis of residues along the length of the sequence. Frances Arnold co-authored a book with George Georgiou that gave detailed instructions on how to perform nineteen different techniques to generate libraries for directed evolution (Arnold and Georgiou, 2003b). This book gave attention to standard error-prone PCR techniques that use MnCl2 instead of MgCl2 in PCR reactions catalyzed by a polymerase with low fidelity, such as Taq, and to methods that could be used for the rediversification of libraries between rounds of selection, such as the staggered extension process (Fig. 1-11).
An important consideration when creating a truly random library of mutants is the bias of some techniques to create certain transitional or transversional mutations preferentially.
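The distinction between the two classes of substitution can be made concrete with a short Python sketch. It uses the usual single-base definition, which is equivalent to the base-pair wording used in this chapter: a transition keeps the ring class (purine to purine, or pyrimidine to pyrimidine), while a transversion switches it. The function and variable names are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(original: str, mutant: str) -> str:
    """Label a single-base substitution as a transition or a transversion."""
    if original == mutant:
        raise ValueError("not a substitution")
    pair = {original, mutant}
    # Same ring class on both sides means the purine-pyrimidine
    # orientation of the base pair is preserved: a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Count every possible substitution among the four standard bases.
COUNTS = Counter(classify(a, b) for a, b in permutations("ACGT", 2))

if __name__ == "__main__":
    print(dict(COUNTS))
```

Enumerating all twelve ordered substitutions in this way gives the totals for each class, which is a convenient check when tallying the mutational spectrum of a sequenced library.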

PAGE 31

Transitional mutations occur when one purine-pyrimidine pair is replaced with another purine-pyrimidine pair; this creates four possible transition mutations with the standard nucleotides. Transversional mutations occur when a purine-pyrimidine pair is replaced by a pyrimidine-purine pair, creating eight possible transversion mutations when using standard dNTPs. When creating an unbiased library, it is sometimes necessary to use two or more methods in order to allow for the same approximate percentage of transitional and transversional mutations to occur.

The use of MnCl2 and Taq polymerase in an error-prone PCR allows for all four transitions and all eight transversions to occur; however, the A-T to T-A transversion and the A-T to G-C transition tend to be more prevalent when using this technique (Vartanian et al., 1996, Lin-Goerke et al., 1997, Arnold and Georgiou, 2003b). Biases such as this can be altered by increasing or decreasing the concentrations of some of the nucleotides in the reaction. This technique can be performed on a low budget, and can be easily modified to increase or decrease the frequency of mutagenesis by altering the concentration of dNTPs or the number of PCR cycles (Arnold and Georgiou, 2003b).

Another method of creating mutagenic libraries is by rational design. The random library approach generates a large, diverse repertoire of polymerases, but a low number of active clones. Guo et al. have shown that at least one-third of all random amino acid changes will result in the inactivation of a protein (Guo et al., 2004), so it is likely that a protein with more than a few random amino acid changes will be inactive. Furthermore, Guo et al. also calculated that approximately 70% of random mutations in the active sites of polymerases will result in an inactive polymerase variant (Guo et al., 2004).
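The consequence of these inactivation rates for a random library can be sketched with a short calculation. The Python example below assumes, purely for illustration, that each amino acid change inactivates the protein independently with a fixed probability; that independence assumption and the function name are not taken from Guo et al. Because the one-third figure is quoted as a lower bound, the result is best read as an upper bound on the active fraction.

```python
def max_fraction_active(changes: int, p_inactivate: float = 1 / 3) -> float:
    """Upper bound on the fraction of variants still active.

    Assumes each of `changes` random amino acid substitutions inactivates
    the protein independently with probability `p_inactivate`.
    """
    if changes < 0:
        raise ValueError("changes must be non-negative")
    return (1.0 - p_inactivate) ** changes


if __name__ == "__main__":
    for n in range(0, 7):
        whole = max_fraction_active(n)        # one-third figure, whole protein
        site = max_fraction_active(n, 0.70)   # 70% figure, active-site changes
        print(f"{n} changes: <= {whole:.3f} active overall, "
              f"~{site:.3f} if all changes fall in the active site")
```

Under these assumptions the active fraction falls geometrically with the number of changes, which is the quantitative reason a heavily mutagenized random library contains few active clones.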
A desirable library for directed evolution experiments would optimally contain a large, diverse set of proteins with a high number of active clones (Hibbert and Dalby, 2005). To generate a library such as this, the reconstructing

PAGE 32

evolutionary adaptive paths (REAP) approach can be used (Gaucher, 2006); this approach allows researchers to modify only the sites where functional divergence occurred within a family of polymerases. In other words, sites that, in the historical evolution of the polymerase, had a split "conserved but different" pattern of evolutionary variation are chosen for modification. In theory, this technique has a high probability of generating new activities and functions (Gaucher, 2006).

Systems of Directed Evolution

Some of the more common methods used in directed evolution experiments include phage display (Fa et al., 2004), ribosome display (Yan and Xu, 2006), complementation (Arnold and Georgiou, 2003a), and compartmentalized self-replication (CSR) (Ghadessy et al., 2001, Tawfik and Griffiths, 1998). Two of these techniques, phage display and CSR (Henry et al., 2004), were applied to the evolution of polymerases to increase thermostability (Ghadessy et al., 2001), permit activity in the presence of an inhibitor (Ghadessy et al., 2001), and allow incorporation of non-standard bases (Ghadessy et al., 2004, Fa et al., 2004, Xia et al., 2002). Both phage display and CSR systems have been successfully used to evolve Taq polymerase in vitro (Ghadessy et al., 2001, Ghadessy et al., 2004, Fa et al., 2004).

Phage display

The phage display directed evolution system was developed by attaching a fragment of Taq polymerase and an oligonucleotide primer substrate to the exterior of a phage particle via its minor phage coat protein pIII (Fa et al., 2004). Since there are approximately five of these coat proteins per phage, all localized to one area on the phage coat, researchers were able to successfully link phenotype to genotype (Fig. 1-12). The mutant polymerases were challenged to add non-standard nucleosides and one biotinylated nucleoside onto the oligonucleotide primer by template-directed synthesis; those polymerases with the ability to do so were immobilized on

PAGE 33

streptavidin beads, and were recovered. The genes encoding the active polymerases were identified by sequencing, or rediversified and shuttled into another round of selection. This technique, while excellent for identifying polymerase mutants able to incorporate a small number of non-standard bases, does not require the polymerase to perform a PCR; this would not be conducive to the design of an AEGIS-based synthetic biology that requires the polymerase to replicate its own gene.

Compartmentalized self-replication

Compartmentalized self-replication makes use of water-in-oil emulsions as a way to link genotype to phenotype, and requires polymerase mutants to replicate their encoding gene in a PCR (Tawfik and Griffiths, 1998, Ghadessy et al., 2004, Ghadessy et al., 2001, Williams et al., 2006). In theory, this makes it an excellent technique for developing polymerases for a synthetic biology. A library of polymerase gene variants is cloned and expressed in cells (Fig. 1-13A); the bacterial cells containing the polymerases and their encoding genes are then suspended in aqueous droplets in an oil emulsion. Each of these droplets, on average, contains one cell as well as the primers and dNTPs/NSBs required for PCR (Fig. 1-13B). The thermostable polymerase is released from the cell during the first denaturing cycle of PCR, allowing replication of its encoding gene to proceed. Poorly adapted polymerases fail to replicate their encoding gene, while better-adapted polymerases succeed in replication (Fig. 1-13C). The resulting polymerase genes are then released from emulsions by extraction with ether; those encoding the most active polymerases dominate these clones. A run-off PCR using standard nucleotides prepares the DNA for recloning, which can then be subjected to another cycle of selection (Fig. 1-13E).
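Because the link between genotype and phenotype depends on how cells are distributed among droplets, it can help to estimate droplet occupancy. The Python sketch below assumes that cells partition randomly and independently among droplets, so that the number of cells per droplet follows a Poisson distribution; this statistical model and the function names are assumptions made here, not part of the published CSR protocol.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k cells in a droplet for a given mean loading."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def occupancy(mean_cells_per_droplet: float) -> dict[str, float]:
    """Fractions of droplets that are empty, hold one cell, or hold several."""
    empty = poisson_pmf(0, mean_cells_per_droplet)
    single = poisson_pmf(1, mean_cells_per_droplet)
    return {"empty": empty, "single": single,
            "multiple": 1.0 - empty - single}


if __name__ == "__main__":
    for mean in (0.1, 0.5, 1.0):
        fractions = occupancy(mean)
        print(mean, {name: round(value, 3) for name, value in fractions.items()})
```

Under this model, droplets holding several cells allow one polymerase variant to amplify another variant's gene, so lowering the mean loading trades throughput for a cleaner genotype-phenotype linkage.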
CSR has been previously used to generate Taq polymerase variants that are more thermostable (Ghadessy et al., 2001), have an increased resistance to inhibitors (Ghadessy et al.,

PAGE 34

2001), and are able to incorporate various non-standard bases (Ghadessy et al., 2004). More recently, Philipp Holliger and co-workers, who originally performed the aforementioned selections, have modified this technique to change a selected region of the polymerase sequence and replicate that region in CSR reactions (Ong et al., 2006). This short-patch compartmentalized self-replication (spCSR) has already been used to develop Taq polymerase variants able to function with both NTPs and dNTPs, and variants that are able to incorporate NSBs with 2′-substitutions. This technique allows the researcher to mutate only the active site of the polymerase, and then challenges the polymerase to amplify the region encoding the active site; this makes it easier for polymerases that can incorporate NSBs, but that lack catalytic efficiency and processivity, to be isolated from a pool of mutants. By reducing the stringency of the initial selections, more clones can be isolated with the desired traits; catalytic efficiency and processivity of the polymerase can be selected for later using the polymerase sequence of the desired variant under normal CSR conditions.

Research Overview

To create an AEGIS, the first step should be to create or identify polymerases with the ability to incorporate multiple, consecutive NSBs into a growing strand of dsDNA, efficiently and faithfully. Rather than challenging a polymerase with a gamut of NSBs containing different unique features, we decided to focus on one unique characteristic of AEGIS nucleosides, the C-glycosidic linkage.
Previous studies have shown that polymerases have a difficult time incorporating non-standard base pairs containing a C-glycosidic linkage (Switzer et al., 1993, Sismour et al., 2004). Therefore, two representative C-glycosides that can base pair with a canonical nucleotide, and so decrease the strain on the polymerase, were selected for study: 2′-deoxypseudouridine (dψU) and 2′-deoxypseudothymidine (dψT) (Lutz et al., 1999).

PAGE 35

The research presented here began with the determination of the effect of multiple, sequential C-glycosides on duplex DNA structure, to better understand the obstacles a polymerase would have to overcome in order to incorporate bases exhibiting C-glycosides. Next, a screen of a variety of Family A and Family B polymerases identified Taq as a polymerase that exhibited a limited ability to incorporate non-standard bases that contain a C-glycosidic linkage. However, further modification of the protein sequence of this enzyme was needed to identify a mutant Taq polymerase with an increased ability to incorporate multiple, sequential C-glycoside NSBs.

To achieve this, the second part of this dissertation focused on the creation of a rationally designed (RD) library of 74 mutant Taq polymerases. Variants were screened for the ability to incorporate dψU in a PCR amplification of their encoding gene. None of these variants were shown to produce more full-length PCR product than the wild-type Taq polymerase. Only 18 variants showed any activity at all in this first test, even with standard dNTPs, under these reaction conditions. The rationally designed library was then used to perform an initial selection, using water-in-oil emulsions to select for the active mutant polymerases we identified in our initial screen.

It was postulated that the low number of active variants in our RD library was due to a decrease in the thermostability of the enzyme. After altering the PCR conditions to test this hypothesis, we were able to identify 33 active mutant polymerases in this library. Since this library was rationally designed, it was interesting to speculate as to whether a randomly created library of polymerase clones would tend to retain more or less thermostability than our RD library, as judged by the number of active clones.
A random library (L4) was created for this purpose, and was screened for activity at various temperatures in PCR reactions; 39

PAGE 36

clones were found to be active. This comparison of the thermostability of the two libraries shows that the randomly created library has an enhanced ability to retain polymerase thermostability when compared to our rationally designed library.

The RD library was designed to identify mutants able to incorporate non-standard bases, and not to have a high degree of thermostability. Optimal temperatures for function in a PCR were determined for each of the RD variants, and the mutants were then screened for their ability to incorporate various concentrations of dψU at that optimal temperature. One mutant in the pSW27 plasmid, containing the A597S, A740R, and E742V residue changes, was identified with the ability to generate, on average, 72% more product than wt Taq polymerase at all dψU concentrations tested, at a temperature of 86.3 °C.

While dψU is a C-glycoside with the ability to pair with 2′-deoxyadenosine, it has been shown to epimerize (Wellington and Benner, 2006, Cohn, 1960, Chambers et al., 1963). Since dψT cannot epimerize, due to the presence of the extra methyl group, we performed a comparative analysis of wt Taq polymerase's ability to cope with dψU and dψT at various concentrations and at different temperatures in a PCR. Results indicated that it may be the epimerization of the nucleotide that hinders the incorporation of dψU, and therefore dψU should not be used as a model C-glycoside for directed evolution studies.

The results presented in this work represent a significant step towards the long-term goal of creating an AEGIS-based synthetic biology. In addition, the repertoire of mutant polymerases designed and created in these experiments will assist in creating an inventory of polymerases useful in biotechnology, possibly allowing the development of new diagnostic techniques, the improvement of existing ones, and a better understanding of polymerase-DNA interactions.

PAGE 37

[Figure: chemical structures of 2′-deoxyadenosine, 2′-deoxyguanosine, 2′-deoxythymidine, and 2′-deoxycytosine]

Figure 1-1. The standard deoxyribonucleotides. The nucleobases pair based on the two rules of complementarity: hydrogen-bonding complementarity, when the hydrogen bond donor from one nucleobase pairs with the hydrogen bond acceptor from another, and size complementarity, when a large purine (top row) pairs with a small pyrimidine (bottom row) (Watson and Crick, 1953a, Watson and Crick, 1953b). Therefore, 2′-deoxyadenosine joins with 2′-deoxythymidine and 2′-deoxyguanosine joins with 2′-deoxycytosine. When a phosphate group replaces the 5′-hydroxyl group of these molecules, they become acids and can be linked by their phosphate groups to create a DNA strand.

PAGE 38

Table 1-1. Comparison of the structural geometries of A, B, and Z-DNA forms.

Geometry | A-DNA | B-DNA | Z-DNA
Helical sense | Right-handed | Right-handed | Left-handed
Helix diameter | 2.6 nm | 2.0 nm | 1.8 nm
Repeating unit | 1 base pair | 1 base pair | 2 base pairs
Rotation per base pair | 34° | 36° | 60°/2
Rise per base pair | 0.256 nm | 0.338 nm | 0.38 nm
Base pairs per turn | 11 | 10 | 12
Pitch per turn of helix | 2.82 nm | 3.38 nm | 4.56 nm
Major groove | Very narrow and very deep | Very wide and deep | Flat
Minor groove | Very broad and very shallow | Narrow and deep | Very narrow and deep
Sugar pucker | C3′-endo | C2′-endo | C: C2′-endo & G: C2′-exo

*Data adapted from Saenger and Garrett & Grisham (Saenger, 1984, Garrett and Grisham, 1999).
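As a quick consistency check on Table 1-1, the pitch of each helix should equal the rise per base pair multiplied by the number of base pairs per turn. The sketch below uses only the values listed in the table; the dictionary layout and function names are illustrative choices, not part of the original work.

```python
# Values transcribed from Table 1-1 (rise and pitch in nm).
HELIX_GEOMETRY = {
    "A-DNA": {"rise_nm": 0.256, "bp_per_turn": 11, "pitch_nm": 2.82},
    "B-DNA": {"rise_nm": 0.338, "bp_per_turn": 10, "pitch_nm": 3.38},
    "Z-DNA": {"rise_nm": 0.38, "bp_per_turn": 12, "pitch_nm": 4.56},
}


def pitch_from_rise(rise_nm, bp_per_turn):
    """Pitch of one helical turn = rise per base pair x base pairs per turn."""
    return rise_nm * bp_per_turn


def pitch_discrepancies(table=HELIX_GEOMETRY):
    """Return computed minus tabulated pitch (nm) for each DNA form."""
    return {
        form: pitch_from_rise(v["rise_nm"], v["bp_per_turn"]) - v["pitch_nm"]
        for form, v in table.items()
    }


if __name__ == "__main__":
    for form, delta in pitch_discrepancies().items():
        print(f"{form}: computed - tabulated = {delta:+.3f} nm")
```

The tabulated pitches agree with the computed ones to within rounding of the tabulated values.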

PAGE 39

[Figure: four furanose ring drawings, panels A through D, with ring positions 1′ through 5′ labeled]

Figure 1-2. Puckering of the furanose ring of nucleosides into various envelope forms. In the envelope form, four of the five atoms are coplanar; the remaining atom departs this plane: A) a C2′-exo sugar pucker, B) a C2′-endo sugar pucker, C) a C3′-exo sugar pucker, and D) a C3′-endo sugar pucker. B-DNA has a C2′-endo pucker, while A-DNA exhibits a C3′-endo pucker (Saenger, 1984).

Figure 1-3. The central dogma of molecular biology (Lewin, 1997, Crick, 1970). Genetic material, in the form of DNA, is first transcribed into RNA and then is translated into proteins. On the occasion that genetic material is stored as RNA, it first undergoes reverse transcription to create DNA before it is shuttled back into the system.

PAGE 40

Figure 1-4. The six hydrogen bond patterns in an artificially expanded genetic information system (AEGIS). These patterns are constrained by Watson and Crick's rules of complementarity and by the requirement that the nucleobases be joined by three hydrogen bonds (Switzer et al., 1989, Piccirilli et al., 1990, Geyer et al., 2003, Benner, 2004, Watson and Crick, 1953a, Watson and Crick, 1953b). Purines are denoted by pu, pyrimidines by py, hydrogen-bond acceptors by A, hydrogen bond donors by D, and R indicates the point of attachment of the backbone. Note the presence of a C-glycosidic linkage in the pyDAD, pyADD, and pyDDA nucleotides.

[Figure: structures of the six base pairs, labeled pyADA:puDAD (T:aminoA), pyDAA:puADD (C:G), pyDAD:puADA (X), pyAAD:puDDA (isoC:isoG), pyADD:puDAA, and pyDDA:puAAD]
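The pairing rule behind these labels can be written down compactly: a pyrimidine pairs with a purine (size complementarity), and every donor faces an acceptor (hydrogen-bonding complementarity). The helper below is a minimal sketch of that rule using the py/pu and A/D notation of the caption; the function name is a hypothetical choice for illustration.

```python
SWAP = {"A": "D", "D": "A"}  # a donor must face an acceptor and vice versa


def complement(label):
    """Return the partner of a label such as 'pyADA' under Watson-Crick-style rules.

    The first two characters give the ring size ('py' or 'pu'); the remaining
    three give the hydrogen-bond pattern read across the pairing face.
    """
    ring, pattern = label[:2], label[2:]
    if ring not in ("py", "pu"):
        raise ValueError(f"unknown ring type in {label!r}")
    if len(pattern) != 3 or any(c not in SWAP for c in pattern):
        raise ValueError(f"pattern must be three letters drawn from A/D: {label!r}")
    partner_ring = "pu" if ring == "py" else "py"  # size complementarity
    return partner_ring + "".join(SWAP[c] for c in pattern)


if __name__ == "__main__":
    for py in ("pyADA", "pyDAA", "pyDAD", "pyAAD", "pyADD", "pyDDA"):
        print(py, "pairs with", complement(py))
```

Applying the function twice returns the starting label, which reflects the symmetry of the two complementarity rules.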

PAGE 41

[Figure: schematic of the assay showing the solid support, capture strand, analyte DNA, branched DNA, signal molecules, and the NSB-containing duplex; annotation: "NSBs here improve the signal-to-noise ratio"]

Figure 1-5. The Versant branched DNA assay. This assay exploits the pairing of non-standard bases (NSBs) to improve the signal-to-noise ratio 8-fold over a previous version of the assay that did not use NSBs (Huisse, 2004, Collins et al., 1997). The branched DNA assay is used to monitor the viral load counts of patients with the HIV, Hepatitis B, or Hepatitis C viruses (Collins et al., 1997).

PAGE 42

[Figure: A) structures of 2-amino-6-(2-thienyl)purine (s) and pyridin-2-one (y); B) scheme running from DNA through transcription to mRNA, and through translation with a tRNA to 3-chlorotyrosine]

Figure 1-6. An example of non-standard nucleobases coding for a non-standard amino acid. This shows the transcription and translation (seen in B) of the non-standard base pair (seen in A and denoted as s and y) to generate a protein containing the non-standard amino acid 3-chlorotyrosine. This picture is adapted from Hirao et al. (Hirao et al., 2002, Hirao et al., 2006).

PAGE 43

[Figure: structures in panels A through D, including the anti and syn conformations of pseudouridine and a coordinated water molecule]

Figure 1-7. Pseudouridine and pseudothymidine. A) This naturally occurring C-glycoside, found in RNA, is thought to be created by a posttranscriptional isomerization of uridine (Argoudelis and Mizsak, 1976, Grosjean et al., 1995). B) Pseudouridine has a propensity to adopt a syn conformation around the glycosyl bond when in solution, but it is only found in the anti conformation when in a nucleic acid strand (Lane et al., 1995, Neumann et al., 1980). C) The anti conformation allows for the coordination of a water molecule between the 5′ phosphate group of the ψ residue, the 5′ phosphate group of the preceding residue, and the N1-H of the ψ residue (Arnez and Steitz, 1994). The coordination of this water molecule results in an enhanced base stacking ability and a reduced conformational flexibility of the RNA molecule, thus increasing the local rigidity of the RNA (Charette and Gray, 2000, Davis, 1995). D) The structure of pseudothymidine (1-methylpseudouridine). This naturally occurring C-glycoside, found in RNA, is also thought to be created by a posttranscriptional modification of uridine (Limbach et al., 1994).

PAGE 44

[Figure: chemical scheme showing a DNA polymerase joining an incoming deoxyribonucleotide triphosphate to the primer strand with release of pyrophosphate]

Figure 1-8. The polymerization reaction of deoxyribonucleotide triphosphates catalyzed by DNA polymerases. The triphosphate of the incoming group is linked to the 3′-hydroxyl group of the preceding nucleoside, releasing a pyrophosphate in the process; therefore, DNA synthesis requires synthesis of new molecules in the 5′→3′ direction (Garrett and Grisham, 1999).

E + TP → E-TP → E-TP-dNTP → E′-TP-dNTP → E-TP+1-PPi → E-TP+1 + PPi → E-TP+1 (Steps 1 through 6, left to right)

Figure 1-9. Kinetic steps involved in the nucleotide incorporation pathway. The kinetic steps involved in the addition of a nucleotide onto a growing DNA strand (Patel and Loeb, 2001, Rothwell and Waksman, 2005). In Step 1, the polymerase (E) binds to the DNA primer:template complex (TP); the polymerase then binds the incoming nucleotide triphosphate (dNTP) in Step 2. The polymerase then undergoes a conformational change (E′) in Step 3 that brings the various components into positions that can support the chemistry of this reaction; this is the rate-limiting step of polymerization. The polymerase performs the addition of the nucleotide, remains complexed with the pyrophosphate, and undergoes another conformational change in Step 4. The pyrophosphate group is released in Step 5; in Step 6, the polymerase can dissociate from the DNA or translocate the substrate for another round of synthesis.

PAGE 45

  1 MRGMLPLFEP KGRVLLVDGH HLAYRTFHAL KGLTTSRGEP VQAVYGFAKS
 51 LLKALKEDGD AVIVVFDAKA PSFRHEAYGG YKAGRAPTPE DFPRQLALIK
101 ELVDLLGLAR LEVPGYEADD VLASLAKKAE KEGYEVRILT ADKDLYQLLS
151 DRIHALHPEG YLITPAWLWE KYGLRPDQWA DYRALTGDES DNLPGVKGIG
201 EKTARKLLEE WGSLEALLKN LDRLKPAIRE KILAHMDDLK LSWDLAKVRT
251 DLPLEVDFAK RREPDRERLR AFLERLEFGS LLHEFGLLES PKALEEAPWP
301 PPEGAFVGFV LSRKEPMWAD LLALAAARGG RVHRAPEPYK ALRDLKEARG
351 LLAKDLSVLA LREGLGLPPG DDPMLLAYLL DPSNTTPEGV ARRYGGEWTE
401 EAGERAALSE RLFANLWGRL EGEERLLWLY REVERPLSAV LAHMEATGVR
451 LDVAYLRALS LEVAEEIARL EAEVFRLAGH PFNLNSRDQL ERVLFDELGL
501 PAIGKTEKTG KRSTSAAVLE ALREAHPIVE KILQYRELTK LKSTYIDPLP
551 DLIHPRTGRL HTRFNQTATA TGRLSSSDPN LQNIPVRTPL GQRIRRAFIA
601 EEGWLLVALD YSQIELRVLA HLSGDENLIR VFQEGRDIHT ETASWMFGVP
651 REAVDPLMRR AAKTINFGVL YGMSAHRLSQ ELAIPYEEAQ AFIERYFQSF
701 PKVRAWIEKT LEEGRRRGYV ETLFGRRRYV PDLEARVKSV REAAERMAFN
751 MPVQGTAADL MKLAMVKLFP RLEEMGARML LQVHDELVLE APKERAEAVA
801 RLAKEVMEGV YPLAVPLEVE VGIGEDWLSA KE

Figure 1-10. Locations of active site residues in Taq polymerase. Residues shown in blue are involved in contacting the DNA during polymerization; those shown in red indicate residues involved in metal ion coordination (Eom et al., 1996, Fa et al., 2004, Li et al., 1998b, Li et al., 1998a, Kim et al., 1995, Suzuki et al., 1996).
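Because each row of the sequence above is labeled with the position of its first residue, a substitution written in the usual wild-type/position/replacement form (for example, A597S) can be located and applied mechanically. The sketch below works on the row that begins at residue 551; the function name and the choice to operate on a single row are illustrative assumptions, not a description of how the library was actually built.

```python
import re

# Row of the Figure 1-10 sequence that begins at residue 551.
SEGMENT_START = 551
SEGMENT = (
    "DLIHPRTGRL" "HTRFNQTATA" "TGRLSSSDPN" "LQNIPVRTPL" "GQRIRRAFIA"
)


def apply_substitution(segment, start, mutation):
    """Apply a point substitution such as 'A597S' to a numbered sequence segment.

    `start` is the 1-based position of the first residue in `segment`.
    Raises ValueError if the notation is malformed, the position falls outside
    the segment, or the stated wild-type residue does not match the sequence.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - start
    if not 0 <= index < len(segment):
        raise ValueError(f"position {position} is outside this segment")
    if segment[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at {position}, found {segment[index]}"
        )
    return segment[:index] + replacement + segment[index + 1:]


if __name__ == "__main__":
    print(apply_substitution(SEGMENT, SEGMENT_START, "A597S"))
```

Checking the wild-type letter before substituting guards against off-by-one numbering errors when positions are read from a printed figure.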

PAGE 46

Table 1-2. Characteristics of the various polymerase families.

Family | Domains containing polymerase | Representative polymerases | General use | Fidelity
A | Prokaryotes, Eukaryotes, Viruses | E. coli DNA Pol I; Taq Pol I; T7 DNA Pol | Repair | Good
B | Prokaryotes, Eukaryotes, Archaea, Viruses | Pfu DNA Pol I; Eukaryotic DNA Pol α | Replicative | Excellent
C | Prokaryotes | E. coli Pol III (α) | Replicative | Excellent
D | Archaea | Pfu DNA Pol II | Replicative | Excellent
X | Eukaryotes | Eukaryotic DNA Pol β | Repair | N/A
Y | Prokaryotes, Eukaryotes, Archaea | E. coli DNA Pol IV; E. coli DNA Pol V | Replicative/Repair | Poor
RT | Eukaryotes, Viruses | HIV-RT; M-MuLV-RT; Eukaryotic telomerases | Replicative | Good

PAGE 47

[Figure: panels A through E]

Figure 1-11. The staggered extension process (StEP) for rediversification of mutant libraries. This process has already been successfully used to rediversify libraries between rounds of selection in CSR reactions (Arnold and Georgiou, 2003b, Zhao et al., 1998, Ghadessy et al., 2001). A) Denatured template genes are primed with the same primer. B) Short fragments are produced by brief primer-extension. C) In the next cycle, fragments randomly prime the templates and extend further. D) This process is repeated until full-length genes are produced. E) Full-length genes are then purified, amplified, and recloned into a vector for another round of selection.

PAGE 48

[Figure: panels A through E showing the Taq gene, phage pIII displaying the acidic peptide and Taq polymerase, the basic peptide-DNA duplex with primer P60 and template T28, extension with dATP, dGTP, and biotin-16-dCTP, capture on streptavidin-coated beads, and DNase I cleavage]

Figure 1-12. Phage display selection scheme. This details the scheme used in the directed evolution of a Taq polymerase fragment to incorporate non-standard nucleosides into a growing DNA strand (Fa et al., 2004). A) A phage particle displays an acidic peptide and a mutant polymerase on the pIII minor coat protein of the phage. These coat proteins are localized to one area on the phage, allowing genotype to be linked to phenotype. B) The primer-template complex is attached to the phage particle via a basic peptide, which links with the acidic peptide displayed on the coat protein. C) The polymerase incorporates modified nucleotides in a primer-extension assay, which terminates with the addition of a biotinylated standard nucleotide. D) The biotin tag is captured by streptavidin and the entire complex is immobilized on magnetic beads, allowing those phage particles displaying inactive polymerases to be washed away. E) DNase I is used to dissociate the phage complex from the DNA strands, allowing the phage displaying the active polymerase to be captured in an elution. The genes encoding the active polymerases can then be identified by sequencing and/or rediversified and shuttled into another round of selection.

PAGE 49

[Figure: panels A through E showing E. coli cells carrying the plasmid; suspension in a water-oil emulsion with PCR primers and dATP, dCTP, dGTP, TTP, and/or dψUTP; cell lysis by heating in the first PCR cycle; PCR temperature cycling giving many copies of the gene if the polymerase is active; extraction of the oil; and run-off with standard nucleotides and recloning for the next round of selection]

Figure 1-13. General scheme for CSR. CSR allows for the selection of polymerases with an ability to incorporate an unnatural nucleotide using water-in-oil emulsions. A) A library of polymerase gene variants is cloned and expressed in E. coli. Spheres represent active polymerase molecules inside of a bacterial cell. B) The bacterial cells containing the polymerases and their encoding genes are suspended in aqueous droplets in an oil emulsion. C) The thermostable polymerase enzyme and encoding gene are released from the cell during the first denaturing cycle of PCR, allowing self-replication to proceed. D) The resulting mixture of polymerase genes is released by extraction with ether. E) A single run-off PCR with standard nucleotides prepares the DNA for recloning and another cycle of selection.

PAGE 50

CHAPTER 2
POLYMERASE INCORPORATION OF MULTIPLE C-GLYCOSIDES INTO DNA: PSEUDOTHYMIDINE AS A COMPONENT OF AN ALTERNATIVE GENETIC SYSTEM

Introduction

Each of the four standard nucleobases found in natural DNA (adenine, guanine, cytosine, and thymine) is joined to its sugar via a carbon-nitrogen bond. This, by definition, makes standard nucleotides N-glycosides. The nature of the glycosidic linkage is believed to have consequences for the detailed conformation of the nucleoside, including through the operation of the anomeric effect. In particular, the nature of the glycosidic bond may influence the puckering of the sugar.

Unlike the standard nucleotides, the nucleotides that allow artificially expanded genetic information systems (AEGIS) to be created are frequently C-glycosides, which have a carbon-carbon bond between the nucleobase and the sugar. This is exemplified in the case of non-standard pyrimidines that present the Donor-Donor-Acceptor, Donor-Acceptor-Donor, and Acceptor-Donor-Donor hydrogen bonding patterns seen in Figure 1-4. If replacing the N-glycosidic linkage by a C-glycosidic linkage changes features of the nucleoside that are important specificity determinants for polymerases, problems are created for those seeking to expand the genetic alphabet artificially and develop a synthetic biology from an expanded genetic alphabet.

Reverse transcriptases have an ability to process both DNA and RNA, whose sugars have different conformations. Reverse transcriptases, therefore, should be able to accept components of an artificially expanded genetic information system that incorporate C-glycosides. Perhaps it is not surprising that the first reported example of PCR amplification of a six-letter genetic alphabet, where one of the extra two letters was a C-glycoside, exploited HIV-RT (Sismour et al., 2004).

PAGE 51

When attempting to develop a synthetic biology using C-glycosides, the physical structure of the DNA must be considered, especially since the presence of multiple, sequential C-glycosides can possibly alter the structure and stability of duplex DNA. Previous studies have shown that poly(U)·poly(A) helices favor the A-DNA form while poly(T)·poly(A) helices display perfect B-DNA structure (Ivanov et al., 1973, Saenger, 1984, Chandrasekaran and Radha, 1992). Circular dichroism was employed to infer the secondary structure of our DNA, since the spectra generated by A-DNA and B-DNA are quite different (Fig. 2-1) (Ivanov et al., 1973). Duplex DNA containing one to twelve consecutive dA-dψU base pairs was studied, and it was determined that all remained in the B-DNA form.

To take the next step towards a synthetic biology with an expanded genetic alphabet, it would be desirable to have DNA polymerases that accept multiple C-glycoside nucleotides. To determine whether natural DNA polymerases have this capability and the extent to which this capability is conserved, four Family A DNA polymerases and four Family B DNA polymerases were screened for their ability to incorporate multiple 2′-deoxypseudothymidine-5′-triphosphate (dψTTP) and 2′-deoxypseudouridine-5′-triphosphate (dψUTP) across from template dA. These C-glycosides are steric analogs of thymidine-5′-triphosphate (TTP) and present the same hydrogen bonding pattern to a complementary strand as TTP (Fig. 2-2). Consequently, they should serve as a relatively specific probe for this non-standard structural feature.

In these experiments, all of the polymerases tested were able to incorporate both C-glycosides to an extent, but there was room for improvement in some, such as Taq. To determine the extent of Taq polymerase's ability to incorporate the C-glycosides, it was screened for its ability to incorporate anywhere from one to twelve consecutive dψTTP or dψUTP across from template dA.

PAGE 52

Materials and Methods

Synthesis of Triphosphates and Oligonucleotides

Dr. Shuichi Hoshika, from the Foundation for Applied Molecular Evolution (FfAME, Gainesville, Florida), synthesized the pseudothymidine precursor as described in Appendix A. Dr. Daniel Hutter (FfAME) synthesized 2′-deoxypseudothymidine-5′-triphosphate (dψTTP) as described in Appendix A. 2′-Deoxypseudouridine-5′-triphosphate (dψUTP) was purchased from TriLink BioTechnologies (San Diego, California). Standard deoxynucleotide triphosphates (dNTPs), namely 2′-deoxyadenosine-5′-triphosphate (dATP), 2′-deoxycytidine-5′-triphosphate (dCTP), 2′-deoxyguanosine-5′-triphosphate (dGTP), and thymidine-5′-triphosphate (TTP), were purchased from Promega Corporation (Madison, Wisconsin). Triphosphate solutions identified as dψTNTPs comprised dATP, dCTP, dGTP, and dψTTP, while those identified as dψUNTPs contained dATP, dCTP, dGTP, and dψUTP.

The oligonucleotides used for these experiments are listed in Table 2-1. Those sequences containing only standard nucleotides were commercially obtained from Integrated DNA Technologies (Coralville, Iowa) as desalted or PAGE (Polyacrylamide Gel Electrophoresis) purified oligonucleotides. Those oligonucleotides containing dψU were synthesized by Dr. Ajit Kamath (University of Florida, Gainesville, Florida) and were prepared using standard monomers and reagents (Glen Research, Sterling, Virginia) on an Expedite 8909 DNA Synthesizer (PerSeptive Biosystems, Inc., Framingham, Massachusetts). The crude products were digested, with agitation, in 1 mL of concentrated ammonium hydroxide at 55 °C for 16 hr to release and deprotect the oligonucleotide (Sambrook et al., 1989). The mixtures were briefly centrifuged and the supernatants were passed through 2 µm cellulose acetate syringe filters. The residual products were washed three times with 1 mL portions of sterile water. The combined

PAGE 53

filtrates were lyophilized to dryness, purified by polyacrylamide gel electrophoresis (PAGE), and isolated by reversed-phase chromatography on a silica gel as described previously (Sambrook et al., 1989).

Circular Dichroism

Each template, containing one through twelve consecutive dA or dψU residues (T-13 through T-22 or T-23 through T-34, respectively), was annealed to its complement template, containing consecutive dT or dA residues (T-35 through T-46 or T-47 through T-58, respectively). Reactions contained 5 nmol of each template and 290 µL of CD buffer (1 M NaCl, 10 mM Na2HPO4, 1 mM Na2EDTA at pH 7.0) for a total volume of 300 µL. The mixtures were incubated for 5 min at 96 °C and allowed to cool to room temperature over the course of 1 hr. The CD spectra from 200 to 320 nm, using a wavelength step of 1 nm, were measured in a nitrogen atmosphere at 25 °C in a 0.1 cm pathlength cuvette, using an Aviv Model 215 Circular Dichroism Spectrometer (Proterion Corporation, Inc., Piscataway, NJ). Scans were performed in triplicate for each sample mixture and the data were averaged.

Standing Start Primer-Extension Assays

Radiolabeled primer was prepared by incubating 0.5 nmol P-1, 100 µCi γ-32P-ATP, 1X T4 Polynucleotide Kinase (PNK) Buffer, 50 U T4 PNK (New England BioLabs, Beverly, Massachusetts), and sterile dH2O in a final volume of 100 µL, for 1 hr at 37 °C. The radiolabeled primer was purified using the QIAquick Nucleotide Removal Kit (Qiagen, Valencia, California) and eluted from the column in 100 µL Buffer EB (10 mM Tris-HCl, pH 8.5).
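The amounts and volumes quoted above fix the nominal strand concentrations. As a minimal sketch of that arithmetic: 5 nmol of each template in 300 µL corresponds to roughly 16.7 µM of each strand in the CD samples, and 0.5 nmol of primer eluted in 100 µL would be 5 µM if recovery from the column were complete. Complete recovery is an assumption made here for illustration, and the helper name is hypothetical.

```python
def concentration_um(amount_nmol, volume_ul):
    """Convert an amount in nmol and a volume in microliters to micromolar.

    nmol per microliter equals mmol per liter (mM), so multiply by 1000 for uM.
    """
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount_nmol / volume_ul * 1000.0


if __name__ == "__main__":
    # CD samples: 5 nmol of each strand in a 300 uL total volume.
    print(f"CD strand concentration: {concentration_um(5, 300):.2f} uM")
    # Labeled primer: 0.5 nmol eluted in 100 uL (assumes complete recovery).
    print(f"Primer concentration:    {concentration_um(0.5, 100):.2f} uM")
```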

PAGE 54

Radiolabeled template, to depict the location of full-length product (FLP), was prepared by incubating 50 pmol T-4, 10 µCi γ-32P-ATP, 1X T4 PNK Buffer, 25 U T4 PNK, and sterile dH2O in a final volume of 50 µL, for 1 hr at 37 °C. The radiolabeled T-4 was purified using the QIAquick Nucleotide Removal Kit, and eluted from the column in 50 µL Buffer EB. 200 µL DNA PAGE Loading Dye (98% formamide, 10 mM EDTA, 1 mg/mL xylene cyanol, and 1 mg/mL bromophenol blue) was added to the 1 µM radiolabeled T-4 for a final concentration of 0.2 µM radiolabeled T-4.

Radiolabeled 10 base-pair (bp) ladder was prepared by first incubating 1.95 µg 10 bp DNA Step Ladder (Promega Corporation), 30 µCi γ-32P-ATP, 1X T4 PNK Buffer, and sterile dH2O in a final volume of 27 µL, for 1 min at 90 °C. Immediately following, 30 U T4 PNK was added and the mixture was incubated for 30 min at 37 °C. The radiolabeled 10 bp ladder was purified using the QIAquick Nucleotide Removal Kit, and eluted from the column in 30 µL Buffer EB. 120 µL DNA PAGE Loading Dye was added to the 65 ng/µL radiolabeled 10 bp DNA Ladder for a final concentration of 13 ng/µL radiolabeled 10 bp DNA Ladder.

Polymerase screen primer-extension assays

Klenow Fragment (3′→5′ exo-), Bst DNA Polymerase (Large Fragment), Taq DNA Polymerase, VentR (exo-) DNA Polymerase, Deep VentR (exo-) DNA Polymerase, and Therminator DNA Polymerase were purchased from New England BioLabs. Tth DNA Polymerase was purchased from Promega Corporation. Pfu (exo-) DNA Polymerase was purchased from Stratagene (La Jolla, California). Buffers used in these experiments were supplied by the manufacturer as follows: reactions using Bst, Taq, Tth, Vent (exo-), Deep Vent (exo-), and Therminator were performed in 1X ThermoPol Buffer (20 mM Tris-HCl (pH 8.8), 10 mM (NH4)2SO4, 10 mM KCl, 2 mM MgSO4, 0.1% Triton X-100); Klenow (exo-) reactions were

PAGE 55

performed in 1X NEBuffer 2 (10 mM Tris-HCl (pH 7.9), 50 mM NaCl, 10 mM MgCl2, 1 mM dithiothreitol); and reactions using Pfu (exo-) were performed in 1X Cloned Pfu Buffer (20 mM Tris-HCl (pH 8.8), 2 mM MgSO4, 10 mM KCl, 10 mM (NH4)2SO4, 0.1% Triton X-100, 0.1 mg/mL nuclease-free Bovine Serum Albumin). Optimal temperatures for polymerase function were 37 °C for Klenow (exo-), 65 °C for Bst, and 72 °C for Taq, Tth, Vent (exo-), Deep Vent (exo-), Pfu (exo-), and Therminator.

T-4 Primer-Template complex was prepared by mixing 25 pmol radiolabeled P-1, 200 pmol non-radiolabeled P-1, and 300 pmol non-radiolabeled T-4, in a final volume of 15 µL. The mixture was incubated for 5 min at 96 °C and allowed to cool to room temperature over the course of 1 hr.

For primer-extension assays, 1.5 µL of the primer-template complex, 1X of the appropriate manufacturer-supplied buffer, 1 U/µL of the appropriate polymerase, and sterile dH2O were used in a final volume of 9 µL. Reactions were then incubated at the appropriate temperature for 30 s. Each reaction was initiated by adding 1 µL of one of the following: 1 mM dTTP, 1 mM dψTTP, 1 mM dψUTP, 1 mM dNTPs, 1 mM dψTNTPs, or 1 mM dψUNTPs, and incubated for two more minutes at the appropriate temperature. Reactions were immediately quenched with 5 µL of DNA PAGE Loading Dye. Samples (1 µL) were resolved on denaturing PAGE gels (7 M urea and 20% 40:1 acrylamide:bisacrylamide) and analyzed on a Molecular Imager FX System (Bio-Rad, Hercules, California).

Taq polymerase primer-extension assays

Primer-Template complexes were prepared by mixing 25 pmol radiolabeled P-1, 200 pmol non-radiolabeled P-1, and 300 pmol of non-radiolabeled template (T-1 through T-12), in a final

PAGE 56

56 volume of 15 L. The mixtures were incubated for 5 min at 96 C and allowed to cool to room temperature over the course of 1 hr. For primer-extension assays, 1.5 L of the appropriate primer-template complex, 1X ThermoPol buffer, 1 U/ L Taq Polymerase, and sterile dH2O were used in a final volume of 9 L. Reactions were then incubated at 72 C for 30 s. Each reaction was initiated by adding 1 L of one of the following: 1 mM dNTPs, 1 mM d TNTPs, or 1 mM d UNTPs, and incubated for two more minutes at 72 C. Reactions were immediately quenched with 5 L of DNA PAGE Loading Dye. Samples (1 L) were resolved on denaturing P AGE gels (7 M Urea and 20% 40:1 acrylamide: bisacrylamide) and analyzed on a Molecular Imager FX System (Bio-Rad). Results Circular Dichroism Duplexes were formed by annealing each template (T-13 through T-34) to its complement sequence (T-35 through T-58) creating twelve co ntrol helices containing only thymidine and twelve helices containing pseudouridine. Figure 2-3[A-E] shows a repres entative set of these spectra, specifically the spectra of duplexes containing 1, 3, 6, 9, or 12 AU base pairs. When compared to the spectra seen in Figure 2-1, all spectra are consiste nt with B-DNA being the overall conformation of all duplexe s. In addition, th e spectra representing the oligonucleotides containing the dA-d U base pairs are similar to the patter ns of the spectra containing the dA-dT base pairs. Polymerase Screen Primer-Extension Assays Four Family A and four Family B polymeras es were screened for their ability to incorporate non-standard bases e xhibiting a C-glycosidic linkage with efficiency. Polymerases were tested in both 4-base and 13-base extension assays, and were challenged to incorporate (4-

PAGE 57

bases) or incorporate and extend beyond (13-bases) four consecutive dT, dψT, or dψU residues across from template dA under each polymerase's optimal conditions (Fig. 2-4). Reactions used TTP, dψTTP, or dψUTP in the 4-base extensions and either dNTPs, dψTNTPs, or dψUNTPs in the 13-base extension reactions. Family A polymerases (Fig. 2-5[A-B]) were represented by Klenow (exo-), Bst, Taq, and Tth; Family B polymerases (Fig. 2-6[A-B]) were represented by Vent (exo-), Deep Vent (exo-), Pfu (exo-), and Therminator.

Pfu (exo-) was the only polymerase that was not able to generate full-length product (FLP) when challenged to incorporate and extend beyond both of the non-standard bases. All other Family A and Family B polymerases were able to incorporate the four consecutive non-standard bases (NSBs) and extend beyond them, to some measure, to generate FLP. Bst and Therminator polymerases appeared to have consumed almost all of the primer in the course of their reactions, generating large amounts of FLP with all of the different NTPs tested. Klenow (exo-) and Vent (exo-) also incorporated the NSBs exceptionally well, but the remainder of the polymerases appeared to have difficulty, given the intensity of the pause sites relative to the intensity of the FLP bands.

Taq Polymerase Primer-Extension Assays

To replicate its own encoding polymerase gene, Taq polymerase would be required to incorporate and extend beyond four consecutive dT, dψT, or dψU residues. In these experiments, Taq polymerase was challenged to incorporate and extend beyond up to twelve consecutive dT/dψT/dψU residues opposite template dA. From these results (Fig. 2-7[A-B]), it was determined that Taq appears to have some difficulty incorporating twelve consecutive dT residues, as evidenced by the pausing in those lanes, but it is still able to generate FLP (N+13). It is also apparent that Taq has difficulty incorporating multiple consecutive residues of C-

PAGE 58

glycosides, since it was not able to generate FLP when forced to incorporate five or more dψT or dψU residues. However, it does generate a small amount of FLP when challenged to insert four consecutive dT, dψT, or dψU residues, and therefore should be able to replicate its own gene using a C-glycoside substitute for TTP.

Discussion

It was first necessary to determine whether the presence of multiple dψU residues in double-stranded DNA would perturb the helical structure to the point where there is a phase transition from B-DNA to A-DNA, perhaps making it difficult for polymerases to replicate the DNA. It is well known that poly(U)·poly(A) favors A-form helices, while poly(T)·poly(A) favors B-form helices (Ivanov et al., 1973, Saenger, 1984, Chandrasekaran and Radha, 1992).

The distinctive difference in the CD between the canonical A-duplex and the canonical B-duplex structures involves a shift of the positive portion of the spectrum to shorter wavelengths, to 267 nm for the A-form compared to 275 nm for the B-form (Ivanov et al., 1973). A similar shift with a similar magnitude is seen in the negative portion. Further, A-DNA shows a stronger Cotton effect than B-DNA. Therefore, to determine whether the addition of C-glycosidic units tends to drive the conformation of the duplex from B towards A, we look for an increase in the Cotton effect and a shift towards shorter wavelengths.

Circular dichroism was performed on 24 duplex DNA molecules containing anywhere from one to twelve consecutive dψU-dA or dT-dA base pairs. The observed spectra (Fig. 2-3[A-E]) were compared to those in Figure 2-1, the reference spectra for canonical A and B duplexes. In all spectra containing dψU, the wavelength shifted marginally (ca. 4 nm) towards longer wavelengths. This shift does not display a trend, however; it is the same no matter how many dψU units are incorporated into the strand.

PAGE 59

The only possible trend is a change in the relative intensities of the positive (at 275 nm) and negative (at 264 nm) bands (Ivanov et al., 1973). Here the intensities of the 246 nm band and the 275 nm band both decrease. As concentrations were carefully controlled, we do not believe that this reflects a change in the concentration of the oligonucleotides. This is also suggested by the intensity of signals at lower wavelengths, although these are notoriously compromised by any trace of impurity. Disregarding this detail, the trend is the opposite of what one expects for the conversion of the duplex structure from canonical B to canonical A.

These results provide no evidence that addition of dψU units causes the duplex structure to change from a B-DNA to an A-DNA conformation. Thus, there was no evidence to suggest that there would be a conformational problem with the duplex structure when incorporating multiple, sequential C-glycosides. It should be mentioned, however, that CD is indicative only of the gross properties of the system; it does not provide information about detailed structure. It is conceivable that the conformation is changed in a different way, or in some subtle way.

Nevertheless, these results encouraged us to test polymerases for their ability to work with C-glycosides. Polymerases that already display some of the desired catalytic activity, in this case the incorporation of the C-glycosides, should facilitate the evolution and/or creation of an AEGIS. Previous studies have shown that polymerases are able to incorporate up to three C-glycosides, but have not tested their ability to incorporate more than three sequential C-glycosides, as would be required for an AEGIS (Lutz et al., 1999, Sismour et al., 2004, Piccirilli et al., 1991).

Accordingly, four Family A polymerases, Klenow (exo-), Bst, Taq, and Tth, and four Family B polymerases, Deep Vent (exo-), Vent (exo-), Pfu (exo-), and Therminator, were screened for the ability to incorporate TTP, dψTTP, and dψUTP across from template dA in both 4-base and 13-base primer extension assays (Fig. 2-5[A-B] and Fig. 2-6[A-B]). In the 4-

PAGE 60

base extension assay, polymerases were challenged to incorporate four consecutive TTP, dψTTP, or dψUTP across from template dA during two-minute incubations at the optimal temperature for each enzyme. The 13-base assay, incubated as described above, took place in the presence of dCTP, dGTP, dATP, and TTP, dψTTP, or dψUTP, and required incorporation of and extension beyond the four consecutive TTP, dψTTP, or dψUTP.

The Bst and Therminator polymerases appeared to have worked extremely well and consumed almost all of the primer in the course of all of their reactions, while Pfu (exo-) did not appear to generate any 13-base FLP when presented with either of the two NSBs. All other polymerases generated varying amounts of FLP with both of the NSBs, suggesting that any of the aforementioned polymerases could be potential candidates for adaptation to an AEGIS, based on the qualification that the polymerase must already be able to incorporate C-glycosides. However, two of these polymerases, Klenow (exo-) and Bst, are not thermostable and thus could not be used in PCR, and, according to the manufacturer, Therminator is not recommended for any applications except DNA sequencing and primer-extension reactions. These three polymerases are therefore unlikely candidates for future studies. As in previous studies, Taq was selected as the best polymerase candidate to undergo further testing since it so readily accepted the consecutive non-standard bases (Lutz et al., 1999).

In an AEGIS system, a polymerase would be required to replicate its own encoding gene with efficiency and fidelity. In order for Taq to replicate its encoding polymerase gene, it would be required to incorporate four consecutive dψT or dψU across from template dA. Since we have already shown that Taq can in fact incorporate and extend beyond four consecutive C-glycosides, we next tested its ability to incorporate and extend beyond up to twelve consecutive dψT-dA or dψU-dA base pairs. Primer extension experiments were performed under optimal

PAGE 61

polymerase conditions using templates T-1 through T-12. Based on the results of the study (Fig. 2-7[A-B]), Taq polymerase will not readily incorporate and extend beyond more than five consecutive C-glycosides to generate FLP. If this polymerase is to be used as a potential candidate for an AEGIS system, it must be modified, possibly by directed evolution experiments, so that it can incorporate more of these non-standard bases.

PAGE 62

[Figure: schematic CD spectra of A-DNA and B-DNA; x-axis: Wavelength (nm), 220 to 300 nm]

Figure 2-1. A schematic representation of the CD spectra of A- and B-DNA forms. The dotted line indicates the position of the absorption maxima (adapted from Ivanov et al., 1973).

PAGE 63

Figure 2-2. The base-pairing interactions between a standard A-T base pair and the non-standard ψT-A and ψU-A base pairs. Note the C-glycosidic bond (shown in blue) between the base and the sugar in both ψT and ψU.

PAGE 64

Table 2-1. Oligonucleotides used in this study.

Oligo   Sequence (5' to 3' direction)   Purification
P-1     GCG TAA TAC GAC TCA CTA TAG   PAGE
T-1     GTT CCT GTG TCG ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-2     TTC CTG TGT CGA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-3     TCC TGT GTC GAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-4     CCT GTG TCG AAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-5     CTG TGT CGA AAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-6     TGT GTC GAA AAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-7     GTG TCG AAA AAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-8     TGT CGA AAA AAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-9     GTC GAA AAA AAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-10    TCG AAA AAA AAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-11    CGA AAA AAA AAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-12    GAA AAA AAA AAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-13    CGG CGT AAA CTA TAG TGA GTC GTA TTA CGC   Desalted
T-14    GGC GTA AAA CTA TAG TGA GTC GTA TTA CGC   Desalted
T-15    GCG TAA AAA CTA TAG TGA GTC GTA TTA CGC   Desalted
T-16    CGT AAA AAA CTA TAG TGA GTC GTA TTA CGC   Desalted
T-17    GTA AAA AAA CTA TAG TGA GTC GTA TTA CGC   Desalted
T-18    GAA AAA AAA CTA TAG TGA GTC GTA TTA CGC   Desalted
T-19    GTT CAA AAA AAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-20    GTC AAA AAA AAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-21    GCA AAA AAA AAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-22    GAA AAA AAA AAA ACT ATA GTG AGT CGT ATT ACG C   Desalted
T-23    CAG AGA CG CTA TAG TGA GTC GTA TTA CGC   PAGE
T-24    CGG ACG A CTA TAG TGA GTC GTA TTA CGC   PAGE
T-25    CGG CGA CTA TAG TGA GTC GTA TTA CGC   PAGE
T-26    GGC GA CTA TAG TGA GTC GTA TTA CGC   PAGE
T-27    GCG A CTA TAG TGA GTC GTA TTA CGC   PAGE
T-28    CGA CTA TAG TGA GTC GTA TTA CGC   PAGE
T-29    GA CTA TAG TGA GTC GTA TTA CGC   PAGE
T-30    G CTA TAG TGA GTC GTA TTA CGC   PAGE
T-31    GAA C CT ATA GTG AGT CGT ATT ACG C   PAGE
T-32    GAC CT ATA GTG AGT CGT ATT ACG C   PAGE
T-33    GC CT ATA GTG AGT CGT ATT ACG C   PAGE
T-34    G CT ATA GTG AGT CGT ATT ACG C   PAGE
T-35    GCG TAA TAC GAC TCA CTA TAG TCG ACA CAG   Desalted
T-36    GCG TAA TAC GAC TCA CTA TAG TTA CGA CCG   Desalted
T-37    GCG TAA TAC GAC TCA CTA TAG TTT ACG CCG   Desalted
T-38    GCG TAA TAC GAC TCA CTA TAG TTT TAC GCC   Desalted
T-39    GCG TAA TAC GAC TCA CTA TAG TTT TTA CGC   Desalted
T-40    GCG TAA TAC GAC TCA CTA TAG TTT TTT ACG   Desalted
T-41    GCG TAA TAC GAC TCA CTA TAG TTT TTT TAC   Desalted
T-42    GCG TAA TAC GAC TCA CTA TAG TTT TTT TTC   Desalted
T-43    GCG TAA TAC GAC TCA CTA TAG TTT TTT TTT GAA C   Desalted
T-44    GCG TAA TAC GAC TCA CTA TAG TTT TTT TTT TGA C   Desalted
T-45    GCG TAA TAC GAC TCA CTA TAG TTT TTT TTT TTG C   Desalted
T-46    GCG TAA TAC GAC TCA CTA TAG TTT TTT TTT TTT C   Desalted
T-47    GCG TAA TAC GAC TCA CTA TAG ACG TCT CTG   Desalted
T-48    GCG TAA TAC GAC TCA CTA TAG AAT CGT CCG   Desalted
T-49    GCG TAA TAC GAC TCA CTA TAG AAA TCG CCG   Desalted
T-50    GCG TAA TAC GAC TCA CTA TAG AAA ATC GCC   Desalted
T-51    GCG TAA TAC GAC TCA CTA TAG AAA AAT CGC   Desalted
T-52    GCG TAA TAC GAC TCA CTA TAG AAA AAA TCG   Desalted
T-53    GCG TAA TAC GAC TCA CTA TAG AAA AAA ATC   Desalted
T-54    GCG TAA TAC GAC TCA CTA TAG AAA AAA AAC   Desalted
T-55    GCG TAA TAC GAC TCA CTA TAG AAA AAA AAA GTT C   Desalted
T-56    GCG TAA TAC GAC TCA CTA TAG AAA AAA AAA AGT C   Desalted
T-57    GCG TAA TAC GAC TCA CTA TAG AAA AAA AAA AAG C   Desalted
T-58    GCG TAA TAC GAC TCA CTA TAG AAA AAA AAA AAA C   Desalted

*The ψ symbols represent the incorporation of a pseudouridine residue.

PAGE 65

Figure 2-3. Representative CD spectra. Circular dichroism spectra of select double-stranded templates with their complements, containing varying amounts of dA-dT or dA-dψU base pairs, at 25 °C. All of the spectra are indicative of B-DNA (Ivanov et al., 1973). Note that the conformation does not dramatically change as the amount of ψU is increased.
(A) The spectra of duplexes containing 1 dA-dT base pair vs. 1 dA-dψU base pair.
(B) The spectra of duplexes containing 3 dA-dT base pairs vs. 3 dA-dψU base pairs.
(C) The spectra of duplexes containing 6 dA-dT base pairs vs. 6 dA-dψU base pairs.
(D) The spectra of duplexes containing 9 dA-dT base pairs vs. 9 dA-dψU base pairs.
(E) The spectra of duplexes containing 12 dA-dT base pairs vs. 12 dA-dψU base pairs.

[Figure panels A through E: x-axis: Wavelength (nm), 200 to 320 nm; y-axis: ABS; each panel overlays the dA-dT duplex and the dA-dψU duplex]

PAGE 66

Figure 2-4. Depiction of primer-extension assays used in the polymerase screen. In the 4-base extension assays, polymerases were challenged to incorporate up to four consecutive dT, dψT, or dψU residues across from template dA. In the 13-base extension assays, the polymerases were forced to incorporate and extend beyond those first four residues.
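The assay geometry can be checked directly from the sequences of P-1 and T-4 listed in Table 2-1. The short sketch below is only an illustration (the helper names are invented here, and the standard base T stands in for ψT or ψU): it confirms that P-1 anneals to the 3' end of T-4 and derives the bases a polymerase must add to reach full-length product.

```python
# Sequences copied from Table 2-1; helper names are illustrative only.
P1 = "GCGTAATACGACTCACTATAG"                # primer P-1, 5'->3'
T4 = "CCTGTGTCGAAAACTATAGTGAGTCGTATTACGC"   # template T-4, 5'->3'

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extension_product(primer: str, template: str) -> str:
    """Return the bases (5'->3') added to the primer on full extension."""
    site = reverse_complement(primer)
    if not template.endswith(site):
        raise ValueError("primer does not anneal to the 3' end of the template")
    # Everything 5' of the primer-binding site is copied by the polymerase.
    overhang = template[: len(template) - len(site)]
    return reverse_complement(overhang)


added = extension_product(P1, T4)
print(added, len(added))  # TTTTCGACACAGG 13
```

The first four added bases pair with the four template dA residues (the N+4 product), and the remaining nine bases bring the primer to N+13.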

PAGE 67

Figure 2-5. Family A polymerase screen. Unextended primer is at position N; N+4 is the full-length product (FLP) for the 4-base extension assays; N+13 is the FLP for the 13-base extension assays. Final concentrations: TTPs/dψTTPs/dψUTPs/dNTPs/dψTNTPs/dψUNTPs (100 µM), radiolabeled P-1 (2.5 pmol), non-radiolabeled P-1 (20 pmol), non-radiolabeled template T-4 (30 pmol), and appropriate polymerase (1 U). The mixtures were prewarmed to the polymerase's optimal temperature for 30 s and initiated with the appropriate NTP mixture. The mixtures were incubated at the polymerase's optimal temperature for 2 min and immediately terminated with DNA PAGE Loading Dye (formamide, EDTA, and dyes). An aliquot (1 µL) was loaded onto denaturing polyacrylamide gels (20%, 7 M urea) and resolved.
A) The incorporation and extension of dT and dψT by various Family A polymerases. All polymerases were able to incorporate and extend beyond the four consecutive A-T or A-ψT base pairs to generate some FLP in both the 4-base and 13-base extension assays. Klenow (exo-) and Bst most likely generated higher amounts of ψT-containing FLP since their optimal temperatures are lower than those of Taq and Tth.
B) The incorporation and extension of dT and dψU by various Family A polymerases. All polymerases were able to incorporate and extend beyond the four consecutive A-T or A-ψU base pairs to generate some FLP in both the 4-base and 13-base extension assays. Klenow (exo-) and Bst most likely generated higher amounts of ψU-containing FLP since their optimal temperatures are lower than those of Taq and Tth.
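The final amounts quoted in the legend follow from the volumes given in the methods. As a minimal sketch of that arithmetic (the function names are invented here, and the 10 µL final volume is an assumption: the 9 µL reaction plus the 1 µL triphosphate addition):

```python
def aliquot_pmol(total_pmol: float, aliquot_ul: float, stock_ul: float) -> float:
    """Amount of an oligonucleotide carried over in an aliquot of the annealed complex."""
    return total_pmol * aliquot_ul / stock_ul


def final_conc_um(stock_mm: float, added_ul: float, final_ul: float) -> float:
    """Final concentration (µM) after diluting a stock given in mM."""
    return stock_mm * 1000.0 * added_ul / final_ul


# 1.5 µL taken from the 15 µL primer-template complex
print(aliquot_pmol(25, 1.5, 15))    # radiolabeled P-1
print(aliquot_pmol(200, 1.5, 15))   # non-radiolabeled P-1
print(aliquot_pmol(300, 1.5, 15))   # template

# 1 µL of a 1 mM triphosphate stock in an assumed 10 µL final volume
print(final_conc_um(1.0, 1.0, 10.0))
```

Under these assumptions the values come out to 2.5, 20, and 30 pmol and 100 µM, in agreement with the legend.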

PAGE 68

Figure 2-6. Family B polymerase screen. Unextended primer is at position N; N+4 is the full-length product (FLP) for the 4-base extension assays; N+13 is the FLP for the 13-base extension assays. Final concentrations: TTPs/dψTTPs/dψUTPs/dNTPs/dψTNTPs/dψUNTPs (100 µM), radiolabeled P-1 (2.5 pmol), non-radiolabeled P-1 (20 pmol), non-radiolabeled template T-4 (30 pmol), and appropriate polymerase (1 U). The mixtures were prewarmed to the polymerase's optimal temperature for 30 s and initiated with the appropriate triphosphate mixture. The mixtures were incubated at the polymerase's optimal temperature for 2 min and immediately terminated with DNA PAGE Loading Dye (formamide, EDTA, and dyes). An aliquot (1 µL) was loaded onto denaturing polyacrylamide gels (20%, 7 M urea) and resolved.
A) The incorporation and extension of dT and dψT by various Family B polymerases. All polymerases, except Pfu (exo-), were able to incorporate and extend beyond the four consecutive A-T or A-ψT base pairs to generate some FLP in both the 4-base and 13-base extension assays. Pfu (exo-) was able to generate FLP in the 4-base assay, but not the 13-base assay. Therminator was extremely adept at incorporating the dψT residues, as shown by the low levels of unextended primer remaining in those lanes.
B) The incorporation and extension of dT and dψU by various Family B polymerases. All polymerases, except Pfu (exo-), were able to incorporate and extend beyond the four consecutive A-T or A-ψU base pairs to generate some FLP in both the 4-base and 13-base extension assays. Pfu (exo-) was able to generate FLP in the 4-base assay, but not the 13-base assay. Therminator was extremely adept at incorporating the dψU residues, as shown by the low levels of unextended primer remaining in those lanes.

PAGE 69

Figure 2-7. Incorporation of one to twelve consecutive dT, dψT, or dψU residues by Taq polymerase. Unextended primer is at position N; FLP is denoted by N+13 in all of these assays (see Table 2-1 for oligonucleotides used). Final concentrations: dNTPs/dψTNTPs/dψUNTPs (100 µM), radiolabeled P-1 (2.5 pmol), non-radiolabeled P-1 (20 pmol), non-radiolabeled templates T-1 through T-12 (30 pmol), and Taq polymerase (1 U). The mixtures were prewarmed to 72 °C for 30 s and initiated with the appropriate NTP mixture. The mixtures were incubated at 72 °C for 2 min and immediately terminated with DNA PAGE Loading Dye (formamide, EDTA, and dyes). An aliquot (1 µL) was loaded onto denaturing polyacrylamide gels (20%, 7 M urea) and resolved.
A) The incorporation and extension of 1 to 12 dT or dψT residues across from template A by Taq polymerase. Very little to no FLP appears to be generated after the incorporation of five or more consecutive dψTs.
B) The incorporation and extension of 1 to 12 dT or dψU residues across from template A by Taq polymerase. Very little to no FLP appears to be generated after the incorporation of five or more consecutive dψUs.

PAGE 70

CHAPTER 3
CREATION OF A RATIONALLY DESIGNED MUTAGENIC LIBRARY AND SELECTION OF THERMOSTABLE POLYMERASES USING WATER-IN-OIL EMULSIONS

Introduction

To create synthetic biology using an artificially expanded genetic information system (AEGIS), a polymerase that is capable of incorporating non-standard bases (NSBs) is needed. Unfortunately, studies have not found an extant thermostable polymerase able to incorporate a variety of NSBs with efficiency and fidelity. Polymerases usually perform more efficiently with one type of NSB than they do with another (Hendrickson et al., 2004, Leal et al., 2006, Roychowdhury et al., 2004). Directed evolution may help to rectify this situation and allow us to mutate an existing polymerase to generate one with an increased ability to incorporate a variety of NSBs (Ghadessy et al., 2001, Ghadessy et al., 2004). Therefore, we became interested in directed evolution as a way to modify Taq polymerase to better incorporate NSBs, specifically ones exhibiting a C-glycosidic linkage.

Taq polymerase, a member of the Family A polymerases, has already been successfully evolved to incorporate various other NSBs using directed evolution (Ghadessy et al., 2001, Ghadessy et al., 2004, Fa et al., 2004). Ghadessy et al. provided a procedure for doing so using water droplets in oil (Ghadessy et al., 2004, Ghadessy et al., 2001); these served as artificial cells. They began with large, diverse random libraries of the Taq polymerase, with approximately 7 amino acid residue replacements. Ghadessy et al. found that three to four rounds of selection were sufficient to identify a polymerase able to incorporate various NSBs using these random libraries.

This result was initially surprising, as Guo et al. have shown that approximately one-third of all random multiple amino acid changes will result in the inactivation of a protein, and that 70%

PAGE 71

of random changes in the active site of a polymerase will also result in inactivation (Guo et al., 2004). This implies that a protein having more than a few random amino acid changes has a high likelihood of being inactive. One might have expected that a very large fraction of the variants created by Ghadessy et al. would have been inactive, especially at high temperatures, and this expectation is consistent with results reported below.

This raises a general question: What is the likelihood that a library contains a protein having a novel but desirable property? A desirable library for directed evolution would optimally have a large, diverse number of proteins with a high number of active clones (Hibbert and Dalby, 2005). One approach to achieving this goal involves the selection of sites at which to introduce replacements. For example, if replacements throughout the protein are equally likely to lower thermal stability, while replacements in sites near the active site are more likely to change catalytic behavior, it makes sense to focus randomization on residues near the active site (Arnold and Georgiou, 2003b, Arnold and Georgiou, 2003a, Fa et al., 2004, Miller et al., 2006, Ghadessy et al., 2004, Ghadessy et al., 2001).

An alternative approach recognizes that natural history has already explored polymerase sequence space. Much of this natural history is available to us in genomic sequence databases. This permits an approach, originally called evolutionary guidance, that extracts information from that history to identify sites that are more likely to influence behavior in a way that is desired, and less likely to damage the enzyme (Allemann et al., 1991, Presnell and Benner, 1988).

Eric Gaucher, at the Foundation for Applied Molecular Evolution (FfAME), recently developed this approach a step further under the reconstructing evolutionary adaptive paths (REAP) rubric (Gaucher, 2006). He identified sites where functional divergence occurred within

PAGE 72

a family of polymerases, but where natural history suggested that the site was under strong selective pressure. In theory, this has the highest probability of generating new activities and functions.

Using the sites identified by the REAP approach, the Type II sequence divergence of the Family A polymerases was studied (Gu, 2002, Gu, 1999). In this approach, sites were identified that had a split, "conserved but different" pattern of historical evolutionary variation, and had been previously suggested to lead to a change in the function or behavior of the polymerase. Using Pfam (Fig. 3-2), a total of 57 amino acid changes across 35 sites were identified within the 719 available members of the Family A polymerases (Bateman, 2006, Finn et al., 2006).

The 35 sites for mutational studies, distributed as seen in Figure 3-2, were derived from these analyses, and from sequences discussed in a recent review by Henry and Romesberg on the evolution of novel polymerase activities (Henry and Romesberg, 2005). The 57 replacement amino acid residues were selected based on the Family A viral polymerase sequences at the 35 mutational sites. The viral sequences were exploited since the literature indicates that viral polymerases are more adept at incorporating NSBs than other polymerases (Sismour et al., 2004, Leal et al., 2006, Horlacher et al., 1995), and ancient viruses have also been implicated in the origins of cellular DNA replication machinery (Forterre, 2006).

The company DNA 2.0 created and synthesized the rationally designed (RD) library, containing 74 different mutants, using the 57 amino acid changes identified by REAP in various combinations to yield three or four amino acid mutations per sequence. In addition to creating the library, DNA 2.0 also designed and generated a version of the wt taq polymerase gene that was optimized for codon usage in E. coli cells (co-Taq polymerase). The optimization of codon usage results in higher expression levels of the protein within the cell (Gustafsson et al., 2004).
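The library composition described above (57 replacements spread over 74 variants carrying three or four changes each) constrains how often each replacement can appear. The sketch below works through that arithmetic under one loudly stated assumption that is not in the text: every replacement is used the same whole number of times. The function name is hypothetical and the result says nothing about how DNA 2.0 actually assigned the mutations.

```python
def feasible_splits(n_variants: int, n_replacements: int):
    """List (uses per replacement, variants with 3 changes, variants with 4 changes).

    Assumes each replacement is used equally often and every variant
    carries exactly three or four replacements.
    """
    results = []
    min_slots, max_slots = 3 * n_variants, 4 * n_variants
    uses = 1
    while uses * n_replacements <= max_slots:
        total = uses * n_replacements
        if total >= min_slots:
            with_four = total - min_slots      # variants needing a 4th change
            with_three = n_variants - with_four
            results.append((uses, with_three, with_four))
        uses += 1
    return results


print(feasible_splits(74, 57))  # [(4, 68, 6), (5, 11, 63)]
```

Under this assumption each replacement would appear four or five times in the library, so any single replacement is sampled in only a handful of sequence contexts.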

PAGE 73

Each of these 75 polymerases (co-Taq and the 74 mutants) was tested for its ability to incorporate increasing concentrations of a representative C-glycoside (Fig. 2-3), 2'-deoxypseudouridine-5'-triphosphate (dψUTP). None were able to incorporate dψUTP more efficiently than the co-Taq polymerase, and only eighteen of the 74 mutants of the RD Library showed activity with the canonical dNTPs under the conditions with which they were presented.

Selections require that some members of the library perform differently than the original protein of interest (Arnold and Georgiou, 2003a, Lutz and Patrick, 2004). We did not perform a selection to identify a polymerase with an increased ability to incorporate dψUTP, since we determined there were no clones in the RD Library that functioned with the NSB better than co-Taq polymerase. In order to demonstrate our laboratory's ability to perform in vitro selections, we decided to select for the eighteen mutant polymerases that exhibited activity with dNTPs from the pool of 74 mutants.

To perform our selection experiments, we used a variation of the compartmentalized self-replication (CSR) method developed in the laboratories of Griffiths and Holliger to create water-in-oil emulsions as a way to link genotype to phenotype (Miller et al., 2006, Tawfik and Griffiths, 1998, Ghadessy et al., 2001, Ghadessy et al., 2004). This method (Fig. 1-13) uses cells expressing the polymerase as the sole source of polymerase and plasmid template in a PCR reaction, which takes place inside the aqueous phase of the emulsion. Inactive polymerases fail to replicate their encoding gene, so they are effectively selected against after the extraction of products from the emulsion.

After our selection, products were recloned into the expression vector using a version of the megaprimer PCR method (Miyazaki and Takenouchi, 2002). As this protocol generated products that were crossover mutations, sequencing of the products provided a list of the

PAGE 74

mutations that survived the selection, without providing information about which mutations were associated with each other. The megaprimer PCR is, nevertheless, an effective method for library rediversification between rounds of selection.

Materials and Methods

DNA Sequencing and Analysis

DNA sequencing was carried out by the University of Florida Interdisciplinary Center for Biotechnology Research, DNA Sequencing Core Facility, using an ABI 3130xl Genetic Analyzer (Applied Biosystems, Foster City, California) and primers P-6 through P-9 (Table 3-1). BLAST 2 software was used for sequence similarity searching (Tatusova and Madden, 1999); Derti's "Reverse and/or complement DNA sequences" website was used to find the reverse complement of various DNA strands (Derti, 2003); and ExPASy's Translate tool was used to translate DNA sequences into their amino acid counterparts (Swiss Institute of Bioinformatics, 1999).

Construction of Plasmids

Construction of pSW1

The gene (wt taq) encoding wt Taq polymerase was cloned from a vector generously donated by Dr. Michael Thompson (UNC, Chapel Hill, North Carolina) using primers P-2 and P-3. The product was digested with the SacII and NcoI restriction enzymes (New England BioLabs, Beverly, Massachusetts) according to the manufacturer's protocol. The restricted wt taq was then ligated into the identically digested pASK-IBA43plus vector (IBA GmbH, St. Louis, Missouri) (Fig. 3-3), using T4 DNA ligase (New England BioLabs) according to the manufacturer's protocol (16 °C overnight with a 4:1 insert:vector ratio), to make the new plasmid pSW1 (Fig. 3-4) and add an N-terminal hexahistidine tag onto the wt taq gene (His(6)wt Taq). Plasmid constructs were verified by restriction digest analysis, using the enzymes BamHI and NcoI according to the manufacturer's protocol (New England BioLabs), as well as by sequencing.
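For a ligation set up at a 4:1 insert:vector ratio, the mass of insert follows from the standard molar-ratio relation: insert mass = vector mass × (insert length / vector length) × ratio. The sketch below assumes the ratio is molar, and the fragment lengths and vector mass are placeholder values chosen only for illustration; they are not the sizes of wt taq or pASK-IBA43plus and are not taken from the text.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   molar_ratio: float = 4.0) -> float:
    """Mass of insert (ng) giving the requested insert:vector molar ratio."""
    return vector_ng * (insert_bp / vector_bp) * molar_ratio


# Placeholder values (hypothetical): 60 ng of a 3000 bp vector, 2500 bp insert
mass = insert_mass_ng(vector_ng=60, vector_bp=3000, insert_bp=2500)
print(f"{mass:.0f} ng of insert")
```

Because the relation scales with fragment length, a shorter insert needs proportionally less mass to reach the same molar excess over the vector.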

PAGE 75

Rationally designed mutagenic library (RD Library) creation

DNA 2.0 (Menlo Park, California) synthesized a variant of the wt taq polymerase gene (co-taq) that was optimized for the codon usage of E. coli, which was then used to construct the pSW2 plasmid (Fig. 3-5). Plasmids pSW3 through pSW76 (Table 3-2) were designed by Dr. Eric Gaucher (Foundation for Applied Molecular Evolution, Gainesville, Florida) and DNA 2.0 using the REAP approach. Sequence alignments and phylogenetic tree construction of 719 Family A polymerase protein sequences were generated using the Pfam website (Bateman, 2006, Finn et al., 2006). Type II functional divergence between the bacterial/eukaryotic Family A polymerases and the viral Family A polymerases was estimated with DIVERGE 2.0 software (Gu, 2002, Gu, 1999). The 35 sites for mutational studies were derived from these analyses, as well as from sequences discussed in Henry and Romesberg (Henry and Romesberg, 2005); the replacement amino acid residues were selected based on the viral sequences at those sites. The sites chosen are all located in or near the active site of the polymerase.

DNA 2.0 randomized the mutations throughout the 74 sequences so that they were equally distributed (3 to 4 amino acid changes per gene). In addition to synthesizing the genes, DNA 2.0 cloned all 75 of them (co-taq and the 74 mutants) into the pASK-IBA43plus vector using the SacII and NcoI restriction sites. Plasmid constructs were verified both by restriction digest analysis, using the enzymes BamHI and NcoI according to the manufacturer's protocol (New England BioLabs), and by sequencing.

Growth Curves and Cell Counts

The bacterial strains used in this study are listed in Table 3-3. The rich medium used in these studies was Luria-Bertani (LB) medium (Difco Laboratories, Detroit, Michigan) (Miller, 1972). Ampicillin was provided in liquid or solid medium at a final concentration of 100 µg/mL. Plasmids were transformed into the E. coli TG-1 cell line according to the manufacturer's protocol

PAGE 76

(Zymo Research, Orange, California). Cell growth was determined by measuring optical density at 550 nm using a SmartSpec Plus Spectrophotometer (Bio-Rad, Hercules, California). Anhydrotetracycline (2 mg/mL stock in N,N-dimethylformamide) was used at a final concentration of 0.2 ng/µL to induce expression.

Inocula for the growth experiments were prepared as follows: bacterial strains were grown overnight (14.25 hrs) at 37 °C and 250 rpm in LB medium (supplemented with ampicillin, if applicable) in 14 mL 2059 Falcon Tubes (BD Biosciences, San Jose, California). Cells (1 mL) from the 5 mL overnight culture were used to inoculate 100 mL LB or LB-Amp cultures in 500 mL baffled flasks. Cultures were grown at 37 °C and 250 rpm for 8.75 hrs. Cell counts were measured by performing a dilution series using 10-fold dilutions of the cells in 0.85% NaCl. Dilutions were plated onto LB plates (supplemented with ampicillin, if applicable) and grown overnight at 37 °C, and colonies were counted the next morning to determine the number of colony-forming units per milliliter of culture (cfu/mL).

Samples of cells were taken at various time points to determine the levels of protein expression before and after induction. 2X SDS-PAGE loading dye (62.5 mM Tris-HCl, pH 6.8, 25% glycerol, 2% SDS, 0.01% bromophenol blue, 5% β-mercaptoethanol (Laemmli, 1970)) was added to the samples, and to 50 U Taq polymerase (New England BioLabs). Samples were boiled for 8 minutes, then loaded onto a Tris-HCl Ready Gel (7.5%, Bio-Rad) and resolved for 45 min at 200 V. Gels were stained via the Fairbanks method (Fairbanks et al., 1971).

Purification of His(6)wt Taq Polymerase

The SW3 cell line was grown overnight in 5 mL of LB-Amp broth for 14.25 hr at 37 °C and 250 rpm in 14 mL 2059 Falcon Tubes (BD Biosciences). Approximately 2 × 10^8 colony-forming units (cfu), roughly equal to 500 µL of a culture with an OD550nm of 4.0, were used to

PAGE 77

inoculate two 100 mL cultures of LB-Amp in 500 mL baffled flasks. These cultures were grown at 37 °C and 250 rpm for 3.75 hrs to an approximate OD550nm of 1.8, and were then induced by addition of anhydrotetracycline (0.2 ng/µL final concentration). The cells were allowed to grow for an additional 5 hrs to an approximate OD550nm of 3.5. Samples of the uninduced and induced cells were taken and stored at -20 °C for further analysis. Cultures were then combined and the cells harvested by centrifugation (9000 rpm, 10 min, 4 °C).

The SW3 cells were washed in 40 mL of Cell Harvest Buffer (50 mM Tris-HCl, pH 7.9, 50 mM dextrose, 1 mM EDTA, 4 °C) and centrifuged again (8000 rpm, 10 min, 4 °C). The cell pellet was then resuspended in Cell Lysis Buffer (20 mM Tris-HCl, pH 7.9, 50 mM NaCl, 5 mM imidazole, 1 mg/mL lysozyme, 5 µg/mL DNaseI, and 10 µg/mL RNaseI) at a concentration of 2 mL/gram of cells. The cells were gently lysed by rocking (GyroMini Nutating Mixer) at ambient temperature for 15 min; the proteins were then denatured by heating to 75 °C for 20 min. The lysed cells were centrifuged (39,000 x g, 10 min, 4 °C) and the cell-free extract (cfe) was removed and placed into a clean tube. The cfe was then sonicated with six 10 s bursts at 71% output, with a 10 s cooling period at 4 °C between bursts (Model 500 Sonic Dismembrator with a 1/2 inch tapped horn with flat tip, Fisher Scientific, Suwanee, Georgia). The cfe was centrifuged (39,000 x g, 10 min, 4 °C) and the supernatant (cleared cfe) was removed.

The cleared cfe was added to 1 mL of a 50% Ni-NTA slurry (Qiagen, Valencia, California) and incubated at 4 °C for 60 min with gentle mixing (GyroMini Nutating Mixer). The lysate-Ni-NTA mixture was loaded onto a Poly-Prep Column (Bio-Rad, Hercules, California) and allowed to settle for 10 min at 4 °C. A portion of the flow-through (10 µL) was then collected and saved for analysis. The column was washed twice with 4 mL of Ni-NTA Wash Buffer (20 mM Tris-

PAGE 78

HCl, pH 7.9, 50 mM NaCl, 60 mM imidazole) and a portion of the flow-through (10 µL) was saved for future analysis. The protein was eluted four times (0.5 mL each) with Ni-NTA Elution Buffer (10 mM Tris-HCl, pH 7.9, 250 mM NaCl, 500 mM imidazole) and portions of each (10 µL) were saved for future analysis at -20 °C. 2X SDS-PAGE loading dye was added to each of the samples mentioned above. Samples were prepared, resolved, and stained as described in the previous section, and the elutions containing the majority of the purified His(6)wt Taq polymerase were identified.

Elution fractions 2-4 were combined and loaded into a Slide-A-Lyzer 10K MWCO 0.5-3 mL Dialysis Cassette (Pierce, Rockford, Illinois) that was prehydrated in Taq Dialysis Buffer A (50 mM Tris-HCl, pH 8.0, 50 mM KCl, 0.1 mM EDTA, 0.5 mM PMSF, 0.5% Nonidet-P40, 0.5% Triton X-100). The sample was dialyzed at 4 °C for 4 hrs against 500 mL of Taq Dialysis Buffer A. It was then dialyzed for another 4 hrs at 4 °C against 500 mL of Taq Dialysis Buffer B (50 mM Tris-HCl, pH 8.0, 50 mM KCl, 0.1 mM EDTA, 0.5 mM PMSF, 0.5% Nonidet-P40, 0.5% Triton X-100, 1 mM DTT). Finally, it was dialyzed for 8 hrs at 4 °C against 1 L of Taq Storage Buffer (50 mM Tris-HCl, pH 8.0, 50 mM KCl, 1 mM DTT, 0.1 mM EDTA, 0.5 mM PMSF, 0.5% Nonidet-P40, 0.5% Triton X-100, 50% glycerol). The sample was removed and quantitated; the protein concentration was determined using the Bio-Rad Protein Assay Dye Reagent according to the manufacturer's instructions.

The purified His(6)wt Taq polymerase and Taq polymerase (New England BioLabs) were used in separate PCR reactions. The same concentration of each polymerase (enough protein to equate to 3 U of Taq polymerase from New England BioLabs) was added to PCR reactions containing: 1X Modified ThermoPol Buffer (2 mM Tris-HCl, pH 9, 10 mM KCl, 1 mM (NH4)2SO4, 2.5 mM MgCl2, 0.2% Tween 20), 250 µM dNTPs, 1.0 µM P-4, 1.0 µM P-5, and 1

PAGE 79

ng/µL pSW1. The PCRs (50 µL) were run under the following conditions: 5 min, 94 °C; (1 min, 94.0 °C; 1 min, 55.0 °C; 3 min, 72.0 °C) x 15 cycles; 7 min, 72.0 °C. Products were analyzed by agarose gel electrophoresis and quantitated using the Molecular Imager Software (Bio-Rad).

Incorporation of dψUTP by RD Library

2'-Deoxypseudouridine-5'-triphosphate (dψUTP) was purchased from TriLink BioTechnologies (San Diego, California). Standard deoxynucleotide triphosphates (dNTPs) comprised 2'-deoxyadenosine-5'-triphosphate (dATP), 2'-deoxycytidine-5'-triphosphate (dCTP), 2'-deoxyguanosine-5'-triphosphate (dGTP), and thymidine-5'-triphosphate (TTP) and were purchased from Promega Corporation (Madison, Wisconsin). dψUNTPs comprised dATP, dCTP, dGTP, and dψUTP.

Individual cultures (5 mL LB-Amp) of the SW5-SW78 cell lines were grown for 14.25 hrs at 250 rpm and 37 °C in 14 mL 2059 Falcon Tubes (BD Biosciences). Approximately 2 x 10^8 colony-forming units (cfu), roughly equal to 500 µL of a culture with an OD550nm of 4.0, were used to inoculate individual 100 mL cultures of LB-Amp in 500 mL baffled flasks. These cultures were grown at 37 °C and 250 rpm for 3.75 hrs to an approximate OD550nm of 1.8, and were then induced with anhydrotetracycline. The cells were allowed to grow for 1 hr longer to an approximate OD550nm of 3.0. Approximately 1 x 10^6 cfu (~2 µL cells) were used as the sole source of polymerase and template in separate PCR reactions containing final concentrations of these constituents: 1X Modified ThermoPol Buffer, 1.4 µM P-4, 1.4 µM P-5, 1.1 ng/µL RNaseA, and 6% DMSO. The final concentration of nucleotide triphosphates added to each reaction was one of the following: 500 µM dNTPs; 500 µM dATP/dGTP/dCTP; 500 µM dATP/dGTP/dCTP + 450 µM TTP + 50 µM dψUTP; 10 µM dATP/dGTP/dCTP + 400 µM TTP + 100 µM dψUTP; 10 µM

PAGE 80

dATP/dGTP/dCTP + 350 µM TTP + 150 µM dψUTP; 10 µM dATP/dGTP/dCTP + 300 µM TTP + 200 µM dψUTP; 10 µM dATP/dGTP/dCTP + 250 µM TTP + 250 µM dψUTP; 10 µM dATP/dGTP/dCTP + 200 µM TTP + 300 µM dψUTP; 10 µM dATP/dGTP/dCTP + 150 µM TTP + 350 µM dψUTP; 10 µM dATP/dGTP/dCTP + 100 µM TTP + 400 µM dψUTP; 10 µM dATP/dGTP/dCTP + 50 µM TTP + 450 µM dψUTP; 500 µM dψUTPs. The PCRs (50 µL) were run under the following conditions: 5 min, 94 °C; (1 min, 94.0 °C; 1 min, 55.0 °C; 3 min, 72.0 °C) x 15 cycles; 7 min, 72.0 °C. Products were analyzed by agarose gel electrophoresis and quantitated using the GeneTools Software, version 3.07 (SynGene, Cambridge, England).

Selection of Thermostable Mutants Using Water-In-Oil Emulsions

Water-in-oil emulsions

The appropriate cell line was grown overnight in LB-Amp broth (5 mL) for 14.25 hr at 37 °C and 250 rpm in 14 mL 2059 Falcon Tubes (BD Biosciences). Approximately 2 x 10^8 colony-forming units (cfu), roughly equal to 500 µL of a culture with an OD550nm of 4.0, were used to inoculate a 100 mL culture of LB-Amp in a 500 mL baffled flask. These cultures were grown at 37 °C and 250 rpm for 3.75 hrs to an approximate OD550nm of 1.8, induced with anhydrotetracycline, and allowed to grow for 1 hr longer to an approximate OD550nm of 3.0. The amount of culture containing 2 x 10^8 cfu was determined; that amount was centrifuged (13,000 rpm, 2 min), the supernatant was removed, and the remaining pellet was stored on ice.

The aqueous phase of the emulsions was prepared by resuspending the cell pellet in a 200 µL solution containing: 1X Modified ThermoPol Buffer, 500 µM dNTPs, 1.4 µM P-4, 1.4 µM P-5, 1.1 ng/µL RNaseA, and 6% DMSO. For control reactions, without cells, 1 ng/µL of pSW2 and 10 U Taq Polymerase were added to the aqueous phase. Reactions were stored on ice until further use.
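The culture volumes quoted in this section follow from a simple proportionality. As a rough sketch only, assuming that cell density scales linearly with OD550nm (an assumption; the text gives just one reference point, about 2 x 10^8 cfu in about 500 µL at an OD550nm of 4.0), the volume holding a target number of cells could be estimated as below. The function name and the linear model are illustrative and are not part of the original protocol.

```python
# Reference point quoted in the text: ~2 x 10^8 cfu in ~500 uL at OD550nm 4.0.
REF_CFU = 2e8
REF_VOLUME_UL = 500.0
REF_OD550 = 4.0

# ASSUMPTION (not from the source): cfu per uL scales linearly with OD550nm.
CFU_PER_UL_PER_OD = REF_CFU / (REF_VOLUME_UL * REF_OD550)


def culture_volume_ul(target_cfu: float, od550: float) -> float:
    """Estimate the culture volume (uL) that contains target_cfu cells."""
    if od550 <= 0:
        raise ValueError("od550 must be positive")
    return target_cfu / (CFU_PER_UL_PER_OD * od550)


if __name__ == "__main__":
    # Volume expected to hold 2 x 10^8 cfu after induction, at OD550nm ~3.0.
    print(f"{culture_volume_ul(2e8, 3.0):.0f} uL")
```

Under this linear assumption, a culture at an OD550nm of 3.0 would need roughly 670 µL to supply 2 x 10^8 cfu; in practice the conversion should be calibrated against plate counts such as those described for the growth experiments.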

PAGE 81

To prepare the oil phase of the emulsions, Arlacel P135 (Uniqema, New Castle, Delaware) was heated to 75 °C, as was mineral oil (Sigma-Aldrich, St. Louis, Missouri). The mineral oil was mixed with the Arlacel P135 (1.5% v/v) in a 5 mL Corning Externally Threaded Cryogenic Vial (Corning, Acton, Massachusetts) containing an 8 x 3 mm stir bar with pivot ring. The oil phase was stirred at 1000 rpm on ice while the 200 µL aqueous phase was added drop-wise over a period of 2 minutes. The emulsion was stirred for 5 min longer, then subjected to PCR [5 min, 94 °C; (1 min, 94.0 °C; 1 min, 55.0 °C; 3 min, 72.0 °C) x 15 cycles; 7 min, 72.0 °C].

Products were extracted from the emulsions with the addition of two volumes of water-saturated ether. The ether and emulsions were mixed by vortexing and centrifuged (5 min, 8000 rpm), and the aqueous phases were extracted. To rid the aqueous phases of contaminating enzyme, the products were purified with a QIAquick PCR Purification Kit (Qiagen) and eluted from the column in Qiagen Buffer EB (50 µL). Products were separated using agarose gel electrophoresis; the product band was extracted and then purified using a QIAquick Gel Extraction Kit (Qiagen). Samples were eluted in Qiagen Buffer EB (50 µL), and product concentration was determined by measuring absorption at 260 nm.

Re-cloning of selected mutants

The final products of the emulsions were used in an adaptation of the Miyazaki and Takenouchi megaprimer PCR protocol (Miyazaki and Takenouchi, 2002). CSR products were digested with NcoI and SacII according to the manufacturer's protocol (New England BioLabs). Digested samples (10 ng in 1 µL) were added to a 49 µL PCR mixture (1X Native Pfu Buffer, 100 ng pSW2, 500 µM dNTPs, 6% DMSO). The mixture was heated to 96 °C for 30 s prior to the addition of 0.05 U/µL Native Pfu Polymerase (Stratagene, La Jolla, California). Samples were

PAGE 82

then subjected to PCR [2 min, 96 °C; (30 s, 96.0 °C; 10 min, 68.0 °C) x 25 cycles; 30 min, 72.0 °C]. The template strands of DNA (pSW2 plasmid in the PCR) were digested with 2 U DpnI (New England BioLabs) at 37 °C for 2.5 hrs. Reactions were cooled to room temperature, purified using a Qiagen PCR Purification Kit, and eluted with Qiagen Buffer EB (30 µL). Purified products were transformed into the E. coli DH5α cell line according to the manufacturer's protocol (Invitrogen, Carlsbad, California). Fifty isolated colonies were selected after the transformation (cell lines SW79 through SW128). Overnight 5 mL LB-Amp cultures (250 rpm, 37 °C) were grown for each colony, and their plasmids were isolated using the QIAprep Spin Miniprep Kit (Qiagen). Plasmid constructs were verified by restriction digest analysis, using the enzymes BamHI and NcoI according to the manufacturer's protocol (New England BioLabs), and mutations were determined by sequencing.

Results

Growth Curves and Cell Counts

Growth curves, cfu counts, and protein expression studies were performed on the SW1-SW4 cell lines to determine the optimal times for induction (Fig. 3-6[A-C]). The optimal point for induction for both the SW3 and SW4 cell lines was found to be during late log phase, at an optical density of approximately 1.8 at 550 nm. The optimal length of induction was 1 hr, due to the rapid death of the cells after induction of the taq gene, as evidenced by a drop in the cfu/mL counts (Fig. 3-6B). Inductions longer than 1 hr, or induction at early to mid-log phase, caused the cells to perish, owing to the toxicity of over-expressing a polymerase in vivo (data not shown) (Moreno et al., 2005, Andraos et al., 2004). When the migration of the recombinant Taq polymerases (His(6)wt Taq and co-Taq) is compared to that of the Taq

PAGE 83

Polymerase purchased from New England BioLabs, they all appear to have the same observed molecular weight of 94 kDa on a Coomassie Blue stained SDS-PAGE (7.5%) gel (Fig. 3-6C).

Purification of His(6)wt Taq Polymerase

The His(6)wt Taq polymerase was purified from SW3 cells that were over-expressing the His(6)wt taq gene using nickel affinity chromatography. The polymerase was purified to a single band on a Coomassie Blue stained SDS-PAGE (7.5%) gel (Fig. 3-7A), and elution fractions 2-4 were combined and concentrated via dialysis to generate a working stock of His(6)wt Taq polymerase. The protein concentration was determined to be 0.744 µg/µL, using the Bio-Rad Protein Assay Dye Reagent. To verify that the purified His(6)wt Taq polymerase amplifies DNA in a PCR reaction in a manner similar to Taq polymerase (New England BioLabs), each polymerase was used in separate, otherwise identical PCRs. The final concentration of polymerase (5.5 µg/mL) in each reaction was kept constant. Figure 3-7B shows the products of these PCRs; after analysis, it was determined that the densities of the two bands were almost identical.

Incorporation of dψUTP by RD Library

In an effort to find a polymerase that can incorporate and extend beyond dψUs with higher efficiency than the co-Taq polymerase, each of the mutant Taq polymerases in the RD Library was tested for its ability to incorporate dψUTP across from template dA in PCR reactions containing varying ratios of TTP to dψUTP. Reactions contained induced cells as the sole source of polymerase and template plasmid, so active polymerases were forced to replicate their own encoding gene (2603 bp). Figure 3-8[A-B] shows the difference between the PCR products from the co-Taq polymerase screen (Fig. 3-8A) and a representative (SW21) of the RD Library (Fig. 3-8B). In

PAGE 84

both of these reactions, the polymerase could not produce full-length product (FLP) with concentrations of dψUTP higher than 400 µM (final concentration). Based on the product band densities, it was found that none of the active RD Library polymerases displayed a higher propensity for the incorporation of dψUTP than the co-Taq polymerase (Table 3-4). It was also noted that only 18 of the 74 mutant polymerases tested showed activity with only dNTPs under these assay conditions (Table 3-2 and Table 3-4).

Selection and Identification of Thermostable Mutants Using Water-In-Oil Emulsions

We pooled all 74 RD Library strains to perform a selection in water-in-oil emulsions to isolate those 18 mutants that showed activity. After the products were isolated, they were used in a modified version of the Miyazaki and Takenouchi megaprimer PCR protocol (Miyazaki and Takenouchi, 2002), creating the full-length plasmid (pASK-IBA43plus with insert). Purified products were transformed into the E. coli DH5α cell line; fifty clones were isolated, sequenced, and compared to the co-Taq amino acid sequence (Table 3-5). Of these fifty clones, 22 showed no changes relative to the co-Taq sequence, and the remaining 28 had at least one residue modified. Table 3-6 shows a breakdown of these mutations, and states whether they are random mutations or RD Library mutations. In the case of the RD Library mutations, it is indicated whether they are true RD Library sequences, RD Library sequences with additional mutations, RD Library sequences with reversions to the co-Taq sequence, and/or crossovers between two or more RD Library sequences. In addition, only 5% of the mutations found in these sequences are silent (Table 3-6).

As a control, the selection was also performed using only cells expressing the co-Taq polymerase. Five clones were submitted for sequencing following the megaprimer PCR protocol. Of these five, four were the correct co-Taq polymerase sequence found in SW4, and

PAGE 85

the fifth contained only two amino acid mutations in relation to the co-Taq sequence (data not shown).

Discussion

Previously, directed evolution experiments have defined mutations that allow Taq polymerase, and other Family A polymerases, to be used in different situations; for example, a few allow for the incorporation of non-standard bases, others are more thermostable, and some are resistant to inhibitors (Ghadessy et al., 2001, Ghadessy et al., 2004, Henry and Romesberg, 2005). The design of our RD Library was based on mutations discussed in the review by Henry and Romesberg (Henry and Romesberg, 2005), and was carried out using the REAP approach with the Family A polymerases. A library of 74 polymerases was designed, each containing three to four amino acid mutations out of a pool of thirty-five possible mutations, in an attempt to identify a polymerase able to incorporate non-standard bases exhibiting a C-glycosidic linkage with efficiency and fidelity.

It has been demonstrated previously that the over-expression of a polymerase in a cell can cause toxicity and premature cell death (Moreno et al., 2005, Andraos et al., 2004). To circumvent this problem, the gene encoding His(6)wt Taq polymerase was optimized for codon usage in E. coli and cloned into a tightly regulated plasmid (Skerra, 1994) in an attempt to express the polymerase at higher levels only after induction.

After appropriate expression conditions were found, the members of the RD Library were individually tested for their ability to incorporate dψUTP, a representative non-standard nucleotide exhibiting a C-glycosidic linkage. The polymerases were challenged with increasing concentrations of dψUTP as the concentration of TTP was decreased. None of the RD Library polymerases were able to incorporate dψUTP more efficiently than the codon-optimized Taq

PAGE 86

sequence. In the future, other possible mutation sites and combinations of mutations may need to be made and tested to find a polymerase that can accomplish this task. Interestingly, only eighteen of the 74 mutant polymerases tested showed activity with standard dNTPs under these assay conditions.

Ideally, a selection would have been performed using the RD Library to identify polymerases able to incorporate dψUTP with efficiency. Since none were able to incorporate the NSB more efficiently than the co-Taq polymerase, as evidenced by the densities of the FLP bands, a selection was instead performed to identify those polymerases that showed activity with the dNTPs under these assay conditions. A water-in-oil emulsion system similar to that described by Ghadessy et al. (Ghadessy et al., 2001) was used as a means to link genotype to phenotype, forcing active polymerases to replicate their own genes in a PCR reaction. All 74 cell lines containing the RD Library were used in equal proportions to perform such a selection. After products were extracted from the emulsion system, they were recloned into a plasmid using a version of the megaprimer PCR (Miyazaki and Takenouchi, 2002).

The megaprimer PCR method was chosen as the method for recombining the polymerase genes with the plasmid based on its "one-pot" approach. After the final products are extracted from the emulsions, all further recloning can take place in one reaction vessel, with only one purification step prior to transformation into a cell line. Other methods, using digestions and ligations, require several purification steps between the various procedures, resulting in low yields of final product.

After sequencing, it was noted that 22 out of the 50 clones sequenced contained the original co-Taq polymerase sequence; 15 carried partial forms of the original RD Library sequences, and only four were true RD Library sequences. The remaining nine sequences were

PAGE 87

random mutations most likely created during the PCR in the emulsions. This could be due to the fact that Taq polymerase has an error rate of approximately 8 x 10^-6 (mutational frequency/bp/duplication) (Cline et al., 1996). It is also noteworthy that two of the 50 sequences (SW119 and SW122) contained frameshift mutations, which occur at a frequency of approximately 2.4 x 10^-5 per base pair when using Taq polymerase (Tindall and Kunkel, 1988).

Since the plasmid carrying the co-taq gene was introduced only during the megaprimer PCR, and the plasmid used as template was digested with DpnI, we concluded that recombinations and reversions of the various sequences most likely occurred during the megaprimer PCR itself. This would explain the high number of co-Taq sequences recovered, as well as the large number of sequences that contain various additions, reversions, and crossovers relative to the original RD Library mutations.

Out of the four exact RD Library sequences that were recovered, only one coded for a mutant that was previously shown to have activity in the assay using dψUTP. This could indicate that the emulsions are breaking, allowing active polymerases to replicate the genes of inactive polymerases. Further tests could be performed to confirm or refute this conclusion; for example, two different cell lines could be used in an emulsion, one expressing active polymerase and one expressing inactive polymerase. Identification of the final product would allow us to determine whether these emulsions are indeed rupturing. If this is the case, modifications could be made to the oil phase of the emulsions, such as increasing the percentage of Arlacel P135, to prevent this from occurring.

We have determined that the megaprimer PCR method would be an efficient way of introducing diversity into a library between rounds of selection, but it is not an effective means

PAGE 88

for recloning if trying to identify specific products. Once the stability of the emulsion system is verified, and the recloning of the CSR products is performed using the standard digestion/ligation/transformation protocol (Sambrook et al., 1989), it is likely that we will be able to identify thermostable polymerases using this technique. The next step would be to use this method with a random library, instead of a rationally designed library, to identify thermostable polymerases and/or polymerases that can incorporate C-glycosides with efficiency and fidelity. After several rounds of evolution, we may be able to identify a polymerase capable of functioning with an AEGIS.
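The error-rate argument made earlier in this Discussion can be put into rough numbers. The sketch below multiplies the quoted Taq error rate (8 x 10^-6 per bp per duplication; Cline et al., 1996) by the length of the amplified gene (2603 bp) and the 15 cycles of the emulsion PCR. Two simplifications are assumptions of this sketch, not statements from the text: every cycle is treated as one full duplication, which makes the result an upper bound, and mutations are treated as Poisson-distributed. The function names are illustrative.

```python
import math

TAQ_ERROR_RATE = 8e-6   # mutations per bp per duplication (Cline et al., 1996)
GENE_LENGTH_BP = 2603   # length of the polymerase gene amplified in the emulsion
PCR_CYCLES = 15         # cycles used in the emulsion PCR


def expected_mutations(error_rate: float, length_bp: int, duplications: int) -> float:
    """Mean number of mutations accumulated per amplified copy."""
    return error_rate * length_bp * duplications


def prob_at_least_one(mean_mutations: float) -> float:
    """Poisson probability that a copy carries one or more mutations."""
    return 1.0 - math.exp(-mean_mutations)


if __name__ == "__main__":
    # ASSUMPTION: each cycle counts as one duplication (upper bound).
    mean = expected_mutations(TAQ_ERROR_RATE, GENE_LENGTH_BP, PCR_CYCLES)
    print(f"expected mutations per copy: {mean:.2f}")
    print(f"fraction of copies with >= 1 mutation: {prob_at_least_one(mean):.2f}")
```

Under these assumptions the estimate is roughly 0.3 nucleotide changes per copy, so a noticeable minority of recovered genes would be expected to carry at least one polymerase-introduced change. The figure counts all nucleotide changes, silent ones included, and ignores any errors added during the subsequent megaprimer PCR.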

PAGE 89

[Figure: phylogenetic tree. Labels: 0.1 (scale), Viral Polymerases, Non-viral Polymerases]

Figure 3-1. A phylogenetic tree of the Family A polymerases. This tree was generated using Pfam (Bateman, 2006, Finn et al., 2006), and analyzed for sites that underwent Type II functional divergence. Appendix B has parts of this tree expanded so that it is readable.

PAGE 90

Figure 3-2. Locations of the 35 rationally designed (RD) sites in the Taq polymerase structure. These sites held the mutations in the RD Library. There were 57 mutations made at these sites: sites in red are those where the natural amino acid was replaced by one different amino acid, sites in blue are those where it was replaced by two different amino acids, and sites in green are those where three residues were substituted for the original amino acid. Image created by Dr. Eric Gaucher using the PyMOL Molecular Graphics System (DeLano, 2002).

PAGE 91

Table 3-1. Oligonucleotides used in this study.

Oligo  Sequence (5'-3' Direction)                        Purification
P-2    GAT GAC CGC GGT ATG CTG CCC CTC                   Desalted
P-3    CAT TAC AGA CCA TGG TCA CTC CTT GGC GGA G         Desalted
P-4    CAA ATG GCT AGC AGA GGA TCG CAT CAC CAT CAC       Desalted
P-5    CAG GTC AAG CTT ATT ATT TTT CGA ACT GCG GGT GGC   Desalted
P-6    GAG TTA TTT TAC CAC TCC CT                        Desalted
P-7    CGC AGT AGC GGT AAA CG                            Desalted
P-8    GAA AAC CGC GCG TAA ACT GC                        Desalted
P-9    CCT GGA ACA CGC GAA TCA GG                        Desalted

*All oligonucleotides were synthesized by Integrated DNA Technologies (Coralville, Iowa).
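Several of these oligonucleotides carry recognition sequences for restriction enzymes that appear on the plasmid maps in this chapter, including the SacII and NcoI sites used for cloning. As an illustrative check, the sketch below scans the first four primers for the standard recognition sequences of SacII (CCGCGG), NcoI (CCATGG), NheI (GCTAGC), and HindIII (AAGCTT). The choice of enzymes to scan for and the helper name are assumptions of this sketch; the source does not state which site each primer was designed to introduce.

```python
# Standard recognition sequences (5'->3') for enzymes shown on the plasmid maps.
SITES = {
    "SacII": "CCGCGG",
    "NcoI": "CCATGG",
    "NheI": "GCTAGC",
    "HindIII": "AAGCTT",
}

# Primer sequences copied from Table 3-1.
OLIGOS = {
    "P-2": "GAT GAC CGC GGT ATG CTG CCC CTC",
    "P-3": "CAT TAC AGA CCA TGG TCA CTC CTT GGC GGA G",
    "P-4": "CAA ATG GCT AGC AGA GGA TCG CAT CAC CAT CAC",
    "P-5": "CAG GTC AAG CTT ATT ATT TTT CGA ACT GCG GGT GGC",
}


def find_sites(oligo: str) -> dict:
    """Return {enzyme: 0-based start index} for each site found in the oligo."""
    seq = oligo.replace(" ", "").upper()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


if __name__ == "__main__":
    for name, sequence in OLIGOS.items():
        print(name, find_sites(sequence))
```

The scan only checks the strand as written; because all four recognition sequences are palindromic, searching the reverse complement would give the same hits.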

PAGE 92

[Plasmid map: pASK-IBA43plus, 3286 bp. Restriction sites shown: Van91I, MscI, XbaI, NheI, SacII, EcoRI, Ecl136II, SacI, Acc65I, KpnI, SmaI, XmaI, BamHI, XhoI, AccI, SalI, BspMI, SbfI, PstI, NcoI, Eco47III, BstBI, HindIII, NaeI, NgoMIV, PsiI, EarI, XmnI, ScaI, PvuI, FspI, MunI, BglI, BpmI, BmrI, AhdI, NspI, Bpu10I, EcoNI, SnaBI, NruI, BsmI, NsiI, Ppu10I, OliI, BsgI, NdeI, SpeI, AlwNI. Features: Promoter, 6x-His Tag, Strep-Tag, f1 Origin, AmpR, Tet-Repressor.]

Figure 3-3. View of the pASK-IBA43plus plasmid. This plasmid was purchased from IBA GmbH (St. Louis, Missouri) and it can generate an N-terminal hexahistidine tag and a C-terminal Strep-tag. This high copy number plasmid is a tightly controlled tetracycline expression system conferring ampicillin resistance.

PAGE 93

[Plasmid map: pSW1, 5723 bp. Restriction sites shown: HincII, XbaI, SacII, AccI, AscI, BssHII, Tth111I, AatII, Acc65I, KpnI, BspMI, BstXI, FseI, PstI, BamHI, XcmI, PshAI, NcoI, BstBI, ScaI, PvuI, MunI, BmrI, AhdI, SnaBI, NruI, BsmI, NsiI, Ppu10I, NdeI, SpeI, AlwNI. Features: Promoter, wt taq, f1 ori, AmpR, Tet-Repressor.]

Figure 3-4. View of the pSW1 plasmid. This is a ligation of the pASK-IBA43plus plasmid with the His(6)wt taq polymerase gene using the SacII and NcoI restriction sites. This plasmid generates an N-terminal hexahistidine tag translated with the His(6)wt taq gene. This high copy number plasmid is a tightly controlled tetracycline expression system conferring ampicillin resistance.

PAGE 94

[Plasmid map: pSW2, 5723 bp. Restriction sites shown: MscI, HincII, XbaI, NheI, SacII, XcmI, BglII, Eco52I, Bpu1102I, SapI, ClaI, Ecl136II, SacI, BamHI, EcoRV, PmlI, BssHII, BseRI, NcoI, BstBI, PsiI, ScaI, BglI, AhdI, NsiI, Ppu10I, OliI, NdeI, SpeI. Features: Promoter, co-taq, f1 Origin, AmpR, Tet-Repressor.]

Figure 3-5. View of the pSW2 plasmid. This is a ligation of the pASK-IBA43plus plasmid with the codon-optimized taq polymerase gene using the SacII and NcoI restriction sites. This plasmid generates an N-terminal hexahistidine tag translated with the co-taq gene. This high copy number plasmid is a tightly controlled tetracycline expression system conferring ampicillin resistance.

PAGE 95

Table 3-2. Rationally Designed (RD) Mutant Library.

Plasmid Name  DNA 2.0 Gene ID #  Mutations Present in RD Taq Library
pSW3          5339               S573E,Y668F,A740S
pSW4          5340               Q486H,K537I,M670G
pSW5          5342               A605G,L613A,E739P
pSW6          5343               D575F,L606C,A740S
pSW7          5344               T511V,R584V,I611E
pSW8          5345               N480R,F595V,E742H
pSW9          5346               E517I,V583K,A597S
pSW10         5347               D575F,V583K,M670A
pSW11         5348               E517I,D607W,D622S
pSW12         5349               A594C,F664Y,A774H
pSW13         5350               F595W,L606P,D622S
pSW14         5351               S573E,D575F,F595V
pSW15         5352               S510I,A605K,L606S
pSW16         5353               S573E,D622L,E742H
pSW17         5356               N480R,T511V,Y542E
pSW18         5357               A594C,F664H,M670G
pSW19         5358               Q486H,D575T,N580S
pSW20         5359               S510I,A605E,E612I
pSW21         5360               A594C,E612I,M670A
pSW22         5361               S510I,Q579A,I611Q
pSW23         5363               A594T,L606C,R657D
pSW24         5364               T541A,A605G,L606S
pSW25         5365               E517G,K537I,L613A
pSW26         5366               K537I,Q579A,E742V
pSW27         5367               A597S,A740R,E742V
pSW28         5368               N580Q,A605E,L613I
pSW29         5369               N580S,F595V,A605G
pSW30         5370               N580S,D622L,A774H
pSW31         5371               R533I,R584V,F664L
pSW32         5372               Q486H,E517G,A605K
pSW33         5375               S573H,F664Y,R743A
pSW34         5376               D575T,N580Q,R584V
pSW35         5377               T541A,F664L,R743A
pSW36         5378               T511V,R533I,D622A
pSW37         5379               A597S,I611E,Y668F
pSW38         5381               Y542E,F595W,L606C
pSW39         5382               L606S,R657D,E739R
pSW40         5383               S573H,D575T,L613I
pSW41         5384               T541A,L606P,L613D
pSW42         5385               Y542E,V583K,A605E
pSW43         5387               E517I,F595W,A605E,I611E
pSW44         5388               T541A,D575F,L613A,D622A
pSW45         5389               T511V,A594C,L606S,A740R
pSW46         5390               Q486H,R533I,L606C,L613A
pSW47         5391               Q486H,F595V,D622A,F664Y
pSW48         5393               E517I,S573H,A605G,E612I
pSW49         5395               Y542E,R584V,A605K,E612I
pSW50         5396               D575T,A605E,L606C,D622A
pSW51         5397               A594T,L613A,F664Y,E742H
pSW52         5398               D575F,N580Q,W601G,D622S
pSW53         5399               K537I,L606P,A740S,E742H
pSW54         5400               A597S,W601G,L606S,F664H
pSW55         5401               S510I,E517G,D607W,I611E
pSW56         5402               S510I,V583K,R584V,L606P
pSW57         5405               N480R,R533I,A597S,M670G
pSW58         5408               E612I,D622L,F664L,E739P
pSW59         5409               I611Q,M670G,E739P,E742H
pSW60         5410               F595W,F664H,Y668F,E739P
pSW61         5411               A597S,A605G,D622A,F664L
pSW62         5413               L606P,I611E,E739R,R743A
pSW63         5414               D607W,I611Q,R657D,E742V
pSW64         5417               T541A,I611Q,L613I,D622L
pSW65         5418               K537I,S573H,N580S,D622S
pSW66         5419               N480R,S573E,D607W,A740R
pSW67         5420               D575T,L613D,E739R,A774H
pSW68         5421               Q579A,R657D,F664Y,A740R
pSW69         5422               R533I,K537I,A605K,L613I
pSW70         5423               T511V,E517G,L606C,F664Y
pSW71         5425               D575T,F664H,E742V,R743A
pSW72         5426               A594C,I611E,F664L,A740S
pSW73         5427               N580S,L613A,A740S,R743A
pSW74         5428               S510I,T511V,L613I,E739R
pSW75         5429               V583K,E612I,L613D,Y668F
pSW76         5430               S573E,R584V,A594C,D622S

*The pink cells denote the sequences of polymerases showing activity. The blue cells signify the sequences of polymerases that lack evidence of activity under these assay conditions. All are derivatives of the co-taq gene and inserted into the pASK-IBA43plus vector. Mutations were designed by Dr. Eric Gaucher (Foundation for Applied Molecular Evolution) and were synthesized and assembled by DNA 2.0.
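Each entry in the mutation column uses the conventional shorthand of original residue, position, and replacement residue (for example, S573E replaces serine 573 with glutamate). As a small illustration of how such a column could be processed, the sketch below parses the comma-separated entries into structured records; the class and function names are hypothetical, and only three rows of the table are used as sample input.

```python
import re
from typing import List, NamedTuple


class Mutation(NamedTuple):
    original: str      # one-letter code of the residue in the parent sequence
    position: int      # residue number
    replacement: str   # one-letter code of the substituted residue


_MUTATION_RE = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutations(field: str) -> List[Mutation]:
    """Parse a comma-separated field such as 'S573E,Y668F,A740S'."""
    mutations = []
    for token in field.split(","):
        match = _MUTATION_RE.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unrecognized mutation: {token!r}")
        mutations.append(
            Mutation(match.group(1), int(match.group(2)), match.group(3))
        )
    return mutations


if __name__ == "__main__":
    # Three sample rows copied from Table 3-2.
    rows = {
        "pSW3": "S573E,Y668F,A740S",
        "pSW6": "D575F,L606C,A740S",
        "pSW14": "S573E,D575F,F595V",
    }
    sites = sorted({m.position for f in rows.values() for m in parse_mutations(f)})
    print(sites)
```

Applied to the full table, the same parsing would allow the mutated sites and the distinct substitutions at each site to be tallied and compared with the counts given in the Figure 3-2 caption.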

PAGE 96

Table 3-3. Bacterial strains used in this study.

Name   Strain        Genotype
SW1    E. coli TG-1  F' traD36 lacIq Δ(lacZ)M15 proA+B+ / supE Δ(hsdM-mcrB)5 (rk- mk- McrB-) thi Δ(lac-proAB)
SW2    E. coli TG-1  SW1/pASK-IBA43+
SW3    E. coli TG-1  SW1/pSW1 (pASK-IBA43+ with wt taq insert, Apr)
SW4    E. coli TG-1  SW1/pSW2 (pASK-IBA43+ with co-taq insert, Apr)
SW5    E. coli TG-1  SW1/pSW3 (pASK-IBA43+ with co-taq mutant 5339, Apr)
SW6    E. coli TG-1  SW1/pSW4 (pASK-IBA43+ with co-taq mutant 5340, Apr)
SW7    E. coli TG-1  SW1/pSW5 (pASK-IBA43+ with co-taq mutant 5342, Apr)
SW8    E. coli TG-1  SW1/pSW6 (pASK-IBA43+ with co-taq mutant 5343, Apr)
SW9    E. coli TG-1  SW1/pSW7 (pASK-IBA43+ with co-taq mutant 5344, Apr)
SW10   E. coli TG-1  SW1/pSW8 (pASK-IBA43+ with co-taq mutant 5345, Apr)
SW11   E. coli TG-1  SW1/pSW9 (pASK-IBA43+ with co-taq mutant 5346, Apr)
SW12   E. coli TG-1  SW1/pSW10 (pASK-IBA43+ with co-taq mutant 5347, Apr)
SW13   E. coli TG-1  SW1/pSW11 (pASK-IBA43+ with co-taq mutant 5348, Apr)
SW14   E. coli TG-1  SW1/pSW12 (pASK-IBA43+ with co-taq mutant 5349, Apr)
SW15   E. coli TG-1  SW1/pSW13 (pASK-IBA43+ with co-taq mutant 5350, Apr)
SW16   E. coli TG-1  SW1/pSW14 (pASK-IBA43+ with co-taq mutant 5351, Apr)
SW17   E. coli TG-1  SW1/pSW15 (pASK-IBA43+ with co-taq mutant 5352, Apr)
SW18   E. coli TG-1  SW1/pSW16 (pASK-IBA43+ with co-taq mutant 5353, Apr)
SW19   E. coli TG-1  SW1/pSW17 (pASK-IBA43+ with co-taq mutant 5356, Apr)
SW20   E. coli TG-1  SW1/pSW18 (pASK-IBA43+ with co-taq mutant 5357, Apr)
SW21   E. coli TG-1  SW1/pSW19 (pASK-IBA43+ with co-taq mutant 5358, Apr)
SW22   E. coli TG-1  SW1/pSW20 (pASK-IBA43+ with co-taq mutant 5359, Apr)
SW23   E. coli TG-1  SW1/pSW21 (pASK-IBA43+ with co-taq mutant 5360, Apr)
SW24   E. coli TG-1  SW1/pSW22 (pASK-IBA43+ with co-taq mutant 5361, Apr)
SW25   E. coli TG-1  SW1/pSW23 (pASK-IBA43+ with co-taq mutant 5363, Apr)
SW26   E. coli TG-1  SW1/pSW24 (pASK-IBA43+ with co-taq mutant 5364, Apr)
SW27   E. coli TG-1  SW1/pSW25 (pASK-IBA43+ with co-taq mutant 5365, Apr)
SW28   E. coli TG-1  SW1/pSW26 (pASK-IBA43+ with co-taq mutant 5366, Apr)
SW29   E. coli TG-1  SW1/pSW27 (pASK-IBA43+ with co-taq mutant 5367, Apr)
SW30   E. coli TG-1  SW1/pSW28 (pASK-IBA43+ with co-taq mutant 5368, Apr)
SW31   E. coli TG-1  SW1/pSW29 (pASK-IBA43+ with co-taq mutant 5369, Apr)
SW32   E. coli TG-1  SW1/pSW30 (pASK-IBA43+ with co-taq mutant 5370, Apr)
SW33   E. coli TG-1  SW1/pSW31 (pASK-IBA43+ with co-taq mutant 5371, Apr)
SW34   E. coli TG-1  SW1/pSW32 (pASK-IBA43+ with co-taq mutant 5372, Apr)
SW35   E. coli TG-1  SW1/pSW33 (pASK-IBA43+ with co-taq mutant 5375, Apr)
SW36   E. coli TG-1  SW1/pSW34 (pASK-IBA43+ with co-taq mutant 5376, Apr)
SW37   E. coli TG-1  SW1/pSW35 (pASK-IBA43+ with co-taq mutant 5377, Apr)
SW38   E. coli TG-1  SW1/pSW36 (pASK-IBA43+ with co-taq mutant 5378, Apr)
SW39   E. coli TG-1  SW1/pSW37 (pASK-IBA43+ with co-taq mutant 5379, Apr)
SW40   E. coli TG-1  SW1/pSW38 (pASK-IBA43+ with co-taq mutant 5381, Apr)
SW41   E. coli TG-1  SW1/pSW39 (pASK-IBA43+ with co-taq mutant 5382, Apr)
SW42   E. coli TG-1  SW1/pSW40 (pASK-IBA43+ with co-taq mutant 5383, Apr)
SW43   E. coli TG-1  SW1/pSW41 (pASK-IBA43+ with co-taq mutant 5384, Apr)
SW44   E. coli TG-1  SW1/pSW42 (pASK-IBA43+ with co-taq mutant 5385, Apr)
SW45   E. coli TG-1  SW1/pSW43 (pASK-IBA43+ with co-taq mutant 5387, Apr)
SW46   E. coli TG-1  SW1/pSW44 (pASK-IBA43+ with co-taq mutant 5388, Apr)
SW47   E. coli TG-1  SW1/pSW45 (pASK-IBA43+ with co-taq mutant 5389, Apr)
SW48   E. coli TG-1  SW1/pSW46 (pASK-IBA43+ with co-taq mutant 5390, Apr)
SW49   E. coli TG-1  SW1/pSW47 (pASK-IBA43+ with co-taq mutant 5391, Apr)
SW50   E. coli TG-1  SW1/pSW48 (pASK-IBA43+ with co-taq mutant 5393, Apr)
SW51   E. coli TG-1  SW1/pSW49 (pASK-IBA43+ with co-taq mutant 5395, Apr)
SW52   E. coli TG-1  SW1/pSW50 (pASK-IBA43+ with co-taq mutant 5396, Apr)
SW53   E. coli TG-1  SW1/pSW51 (pASK-IBA43+ with co-taq mutant 5397, Apr)
SW54   E. coli TG-1  SW1/pSW52 (pASK-IBA43+ with co-taq mutant 5398, Apr)
SW55   E. coli TG-1  SW1/pSW53 (pASK-IBA43+ with co-taq mutant 5399, Apr)
SW56   E. coli TG-1  SW1/pSW54 (pASK-IBA43+ with co-taq mutant 5400, Apr)
SW57   E. coli TG-1  SW1/pSW55 (pASK-IBA43+ with co-taq mutant 5401, Apr)
SW58   E. coli TG-1  SW1/pSW56 (pASK-IBA43+ with co-taq mutant 5402, Apr)
SW59   E. coli TG-1  SW1/pSW57 (pASK-IBA43+ with co-taq mutant 5405, Apr)
SW60   E. coli TG-1  SW1/pSW58 (pASK-IBA43+ with co-taq mutant 5408, Apr)
SW61   E. coli TG-1  SW1/pSW59 (pASK-IBA43+ with co-taq mutant 5409, Apr)
SW62   E. coli TG-1  SW1/pSW60 (pASK-IBA43+ with co-taq mutant 5410, Apr)
SW63   E. coli TG-1  SW1/pSW61 (pASK-IBA43+ with co-taq mutant 5411, Apr)
SW64   E. coli TG-1  SW1/pSW62 (pASK-IBA43+ with co-taq mutant 5413, Apr)
SW65   E. coli TG-1  SW1/pSW63 (pASK-IBA43+ with co-taq mutant 5414, Apr)
SW66   E. coli TG-1  SW1/pSW64 (pASK-IBA43+ with co-taq mutant 5417, Apr)
SW67   E. coli TG-1  SW1/pSW65 (pASK-IBA43+ with co-taq mutant 5418, Apr)
SW68   E. coli TG-1  SW1/pSW66 (pASK-IBA43+ with co-taq mutant 5419, Apr)
SW69   E. coli TG-1  SW1/pSW67 (pASK-IBA43+ with co-taq mutant 5420, Apr)
SW70   E. coli TG-1  SW1/pSW68 (pASK-IBA43+ with co-taq mutant 5421, Apr)
SW71   E. coli TG-1  SW1/pSW69 (pASK-IBA43+ with co-taq mutant 5422, Apr)
SW72   E. coli TG-1  SW1/pSW70 (pASK-IBA43+ with co-taq mutant 5423, Apr)
SW73   E. coli TG-1  SW1/pSW71 (pASK-IBA43+ with co-taq mutant 5425, Apr)
SW74   E. coli TG-1  SW1/pSW72 (pASK-IBA43+ with co-taq mutant 5426, Apr)
SW75   E. coli TG-1  SW1/pSW73 (pASK-IBA43+ with co-taq mutant 5427, Apr)
SW76   E. coli TG-1  SW1/pSW74 (pASK-IBA43+ with co-taq mutant 5428, Apr)
SW77   E. coli TG-1  SW1/pSW75 (pASK-IBA43+ with co-taq mutant 5429, Apr)
SW78   E. coli TG-1  SW1/pSW76 (pASK-IBA43+ with co-taq mutant 5430, Apr)
SW79   E. coli DH5α  SW1/pCSRMut1 (pASK-IBA43+ with co-taq CSR mut 1, Apr)
SW80   E. coli DH5α  SW1/pCSRMut2 (pASK-IBA43+ with co-taq CSR mut 2, Apr)
SW81   E. coli DH5α  SW1/pCSRMut3 (pASK-IBA43+ with co-taq CSR mut 3, Apr)
SW82   E. coli DH5α  SW1/pCSRMut4 (pASK-IBA43+ with co-taq CSR mut 4, Apr)
SW83   E. coli DH5α  SW1/pCSRMut5 (pASK-IBA43+ with co-taq CSR mut 5, Apr)
SW84   E. coli DH5α  SW1/pCSRMut6 (pASK-IBA43+ with co-taq CSR mut 6, Apr)
SW85   E. coli DH5α  SW1/pCSRMut7 (pASK-IBA43+ with co-taq CSR mut 7, Apr)
SW86   E. coli DH5α  SW1/pCSRMut8 (pASK-IBA43+ with co-taq CSR mut 8, Apr)
SW87   E. coli DH5α  SW1/pCSRMut9 (pASK-IBA43+ with co-taq CSR mut 9, Apr)
SW88   E. coli DH5α  SW1/pCSRMut10 (pASK-IBA43+ with co-taq CSR mut 10, Apr)
SW89   E. coli DH5α  SW1/pCSRMut11 (pASK-IBA43+ with co-taq CSR mut 11, Apr)
SW90   E. coli DH5α  SW1/pCSRMut12 (pASK-IBA43+ with co-taq CSR mut 12, Apr)
SW91   E. coli DH5α  SW1/pCSRMut13 (pASK-IBA43+ with co-taq CSR mut 13, Apr)
SW92   E. coli DH5α  SW1/pCSRMut14 (pASK-IBA43+ with co-taq CSR mut 14, Apr)
SW93   E. coli DH5α  SW1/pCSRMut15 (pASK-IBA43+ with co-taq CSR mut 15, Apr)
SW94   E. coli DH5α  SW1/pCSRMut16 (pASK-IBA43+ with co-taq CSR mut 16, Apr)
SW95   E. coli DH5α  SW1/pCSRMut17 (pASK-IBA43+ with co-taq CSR mut 17, Apr)
SW96   E. coli DH5α  SW1/pCSRMut18 (pASK-IBA43+ with co-taq CSR mut 18, Apr)
SW97   E. coli DH5α  SW1/pCSRMut19 (pASK-IBA43+ with co-taq CSR mut 19, Apr)
SW98   E. coli DH5α  SW1/pCSRMut20 (pASK-IBA43+ with co-taq CSR mut 20, Apr)
SW99   E. coli DH5α  SW1/pCSRMut21 (pASK-IBA43+ with co-taq CSR mut 21, Apr)
SW100  E. coli DH5α  SW1/pCSRMut22 (pASK-IBA43+ with co-taq CSR mut 22, Apr)
SW101  E. coli DH5α  SW1/pCSRMut23 (pASK-IBA43+ with co-taq CSR mut 23, Apr)
SW102  E. coli DH5α  SW1/pCSRMut24 (pASK-IBA43+ with co-taq CSR mut 24, Apr)
SW103  E. coli DH5α  SW1/pCSRMut25 (pASK-IBA43+ with co-taq CSR mut 25, Apr)
SW104  E. coli DH5α  SW1/pCSRMut26 (pASK-IBA43+ with co-taq CSR mut 26, Apr)
SW105  E. coli DH5α  SW1/pCSRMut27 (pASK-IBA43+ with co-taq CSR mut 27, Apr)
SW106  E. coli DH5α  SW1/pCSRMut28 (pASK-IBA43+ with co-taq CSR mut 28, Apr)
SW107  E. coli DH5α  SW1/pCSRMut29 (pASK-IBA43+ with co-taq CSR mut 29, Apr)
SW108  E. coli DH5α  SW1/pCSRMut30 (pASK-IBA43+ with co-taq CSR mut 30, Apr)
SW109  E. coli DH5α  SW1/pCSRMut31 (pASK-IBA43+ with co-taq CSR mut 31, Apr)
SW110  E. coli DH5α  SW1/pCSRMut32 (pASK-IBA43+ with co-taq CSR mut 32, Apr)
SW111  E. coli DH5α  SW1/pCSRMut33 (pASK-IBA43+ with co-taq CSR mut 33, Apr)
SW112  E. coli DH5α  SW1/pCSRMut34 (pASK-IBA43+ with co-taq CSR mut 34, Apr)
SW113  E. coli DH5α  SW1/pCSRMut35 (pASK-IBA43+ with co-taq CSR mut 35, Apr)
SW114  E. coli DH5α  SW1/pCSRMut36 (pASK-IBA43+ with co-taq CSR mut 36, Apr)
SW115  E. coli DH5α  SW1/pCSRMut37 (pASK-IBA43+ with co-taq CSR mut 37, Apr)
SW116  E. coli DH5α  SW1/pCSRMut38 (pASK-IBA43+ with co-taq CSR mut 38, Apr)
SW117  E. coli DH5α  SW1/pCSRMut39 (pASK-IBA43+ with co-taq CSR mut 39, Apr)
SW118  E. coli DH5α  SW1/pCSRMut40 (pASK-IBA43+ with co-taq CSR mut 40, Apr)
SW119  E. coli DH5α  SW1/pCSRMut41 (pASK-IBA43+ with co-taq CSR mut 41, Apr)
SW120  E. coli DH5α  SW1/pCSRMut42 (pASK-IBA43+ with co-taq CSR mut 42, Apr)
SW121  E. coli DH5α  SW1/pCSRMut43 (pASK-IBA43+ with co-taq CSR mut 43, Apr)
SW122  E. coli DH5α  SW1/pCSRMut44 (pASK-IBA43+ with co-taq CSR mut 44, Apr)
SW123  E. coli DH5α  SW1/pCSRMut45 (pASK-IBA43+ with co-taq CSR mut 45, Apr)
SW124  E. coli DH5α  SW1/pCSRMut46 (pASK-IBA43+ with co-taq CSR mut 46, Apr)
SW125  E. coli DH5α  SW1/pCSRMut47 (pASK-IBA43+ with co-taq CSR mut 47, Apr)
SW126  E. coli DH5α  SW1/pCSRMut48 (pASK-IBA43+ with co-taq CSR mut 48, Apr)
SW127  E. coli DH5α  SW1/pCSRMut49 (pASK-IBA43+ with co-taq CSR mut 49, Apr)
SW128  E. coli DH5α  SW1/pCSRMut50 (pASK-IBA43+ with co-taq CSR mut 50, Apr)
SW129  E. coli DH5α  SW1/pCSRwt1 (pASK-IBA43+ with co-taq wt mut 1, Apr)
SW130  E. coli DH5α  SW1/pCSRwt2 (pASK-IBA43+ with co-taq wt mut 2, Apr)
SW131  E. coli DH5α  SW1/pCSRwt3 (pASK-IBA43+ with co-taq wt mut 3, Apr)
SW132  E. coli DH5α  SW1/pCSRwt4 (pASK-IBA43+ with co-taq wt mut 4, Apr)
SW133  E. coli DH5α  SW1/pCSRwt5 (pASK-IBA43+ with co-taq wt mut 5, Apr)
SW134  E. coli DH5α  F- φ80dlacZΔM15 Δ(lacZYA-argF)U169 recA1 endA1 hsdR17(rk-, mk+) gal- phoA supE44 thi-1 gyrA96 relA1


Figure 3-6. Growth curves, cell counts, and expression of various E. coli TG-1 cell lines. The SW3 (denoted SW3-I) and SW4 (denoted SW4-I) cell lines were induced after 3.75 hrs with a final concentration of 0.2 ng/µL anhydrotetracycline. A) Growth curves (OD at 550 nm versus time in hours) of the SW1, SW2, SW3-U, SW3-I, SW4-U, and SW4-I cultures. Samples were grown in LB media at 250 rpm and 37 °C for 8.75 hrs; cultures SW2 through SW4 were supplemented with ampicillin (100 µg/mL final concentration). B) Colony counts (cfu/mL) of each of the cell lines in part A at the various time points. Cells were grown on LB or LB-Amp agar overnight at 37 °C. C) Coomassie Blue stained SDS-PAGE (7.5%) gel showing protein expression of induced cells at various time points. U stands for uninduced cells, I-1 through I-4 indicate time points at hours one through four after induction (t = 4.75 through t = 7.75 hrs), and NEB Taq depicts the migration of the 94 kDa Taq polymerase purchased from New England BioLabs. Because its coding sequence has been optimized for use in E. coli cells, the SW4 strain, containing the cotaq gene, appears to grow to a higher OD550nm than the SW3 strain, containing the His(6)wt taq gene.

Panel B data. Colony counts (cfu/mL).

Time (hr)   SW1        SW2        SW3-U      SW3-I      SW4-U      SW4-I
0           1.36E+07   2.01E+07   6.10E+06   7.95E+06   6.00E+06   7.22E+06
1           5.40E+07   1.31E+07   1.22E+07   8.73E+06   9.53E+06   1.04E+07
2           1.80E+08   3.04E+07   8.14E+07   5.21E+07   2.65E+07   3.09E+07
3           6.48E+08   8.63E+07   2.87E+08   2.79E+08   3.88E+08   1.90E+08
3.75        1.47E+09   3.22E+08   6.83E+08   4.93E+08   4.09E+08   4.36E+08
4.75        2.73E+09   7.95E+08   1.08E+09   2.36E+08   8.26E+08   5.66E+08
5.75        2.94E+09   1.01E+09   1.08E+09   5.07E+07   1.10E+09   6.29E+08
6.75        6.06E+09   1.39E+09   1.88E+09   1.22E+07   1.53E+09   5.01E+08
7.75        3.08E+09   1.11E+09   1.77E+09   7.39E+06   2.23E+09   7.63E+08
8.75        3.14E+09   1.14E+09   1.55E+09   3.90E+06   2.15E+09   5.69E+08


Figure 3-7. Purification and activity of His(6)wt Taq polymerase. A) The purification of His(6)wt Taq polymerase from SW3 cells after five hours of induction. U, uninduced cells; I-5, cells after 5 hrs of induction; L, load from the Ni2+ column; W-1 and W-2, wash fractions from the column; E-1 through E-4, elution fractions from the column. Elution fractions 2 through 4 were combined and subjected to dialysis. B) Products of PCRs comparing identical concentrations of Taq polymerase (New England BioLabs) and His(6)wt Taq polymerase. The amount of product generated with each polymerase was almost identical: the density of the product band was 1980 CNT/mm^2 with Taq polymerase and 1925 CNT/mm^2 with His(6)wt Taq polymerase.


Figure 3-8. Representative gels showing the amount of full-length PCR product (FLP) generated with different dNTP/dψUNTP ratios and the indicated polymerases. Concentrations of dNTPs/dψUNTPs listed are the starting concentrations (see Materials and Methods for a listing of final concentrations). All PCRs used 1 x 10^6 cfu of cells expressing polymerase as the sole source of polymerase and template plasmid for the reaction. Polymerases were forced to replicate their own encoding gene (2603 bp). A) Incorporation of various dNTP/dψUNTP ratios by coTaq polymerase. FLP is not generated beyond the ratio of 3 mM TTP/7 mM dψUTP. B) Incorporation of various dNTP/dψUNTP ratios by a representative of the RD Library (SW21). FLP is not generated beyond the ratio of 3 mM TTP/7 mM dψUTP.


100Table 3-4. Incorporation of d UTP at 94.0 C by RD Library. All dNTPs 9 mM dT/ 1 mM d U 8 mM dT/ 2 mM d U 7 mM dT/ 3 mM d U 6 mM dT/ 4 mM d U 5 mM dT/ 5 mM d U 4 mM dT/ 6 mM d U 3 mM dT/ 7 mM d U 2 mM dT/ 8 mM d U 1 mM dT/ 9 mM d U All d UNTPs SW4Codon-Optimized (co) wt Taq2244256200537119956491535822125537958963718875264360000 SW8D575F,L606C,A740S5857775405894073273076032027871187885249119033000 SW10N480R,F595V,E742H54728116477917088022128626568497901493210000 SW11E517I,V583K,A597S9193408367227471386209954120591624536291919455000 SW12D575F,V583K,M670A247942667915530200640000000 SW14A594C,F664Y,A774H66993312996714253399426591523136200000 SW17S510I,A605K,L606S325360000000000 SW21Q486H,D575T,N580S13448771310961117464999965064049733351111256948284000 SW25A594T,L606C,R657D5092412015401197216989446402000000 SW27E517G,K537I,L613A1371805337232796022605000000 SW29A597S,A740R,E742V766112500823350791263933233934774463873122923000 SW30N580Q,A605E,L613I284020000000000 SW31N580S,F595V,A605G431840000000000 SW34Q486H,E517G,A605K93870362565049336539590640112363888301270000 SW36D575T,N580Q,R584V407260346463592044580000000 SW41L606S,R657D,E739R469252304702968629736000000 SW47T511V,A594C,L606S,A740R66112265382744535249300212299100000 SW72T511V,E517G,L606C,F664Y499069309148155466970995598317934125530000 SW76S510I,T511V,L613I,E739R216700172538119075121527522902695500000 Raw Densities (CNT/mm2) Substitutions Cell Line


Table 3-5. Mutations present after selection for active polymerases.

Cell Line   Mutations Present
SW79        -
SW80        E517G,A597S,A605G,D622A
SW81        E468G
SW82        L11P
SW83        -
SW84        A594C,F664Y
SW85        Q579A
SW86        -
SW87        F44S
SW88        -
SW89        T31M
SW90        -
SW91        V61L
SW92        -
SW93        V646I
SW94        D575T,F664H,F721L,E742V,R743A,E817G
SW95        I60T
SW96        -
SW97        -
SW98        R743A
SW99        I543T,A594C,E631G,F664Y,W703R
SW100       Q486H,F595V,D622A
SW101       -
SW102       -
SW103       Q486H,D575T,N580S
SW104       P336L,M371T,N580S,L613A
SW105       -
SW106       E504G,R533I,R584V,F664L,F697S
SW107       G195D,L230P,P552Q,D575T,F664L
SW108       L606P,I611E,E739R,R743A
SW109       -
SW110       -
SW111       K257E,S510I,D575F,L606C
SW112       -
SW113       -
SW114       M314I
SW115       A201E
SW116       N480R,T511A
SW117       -
SW118       Q486H,F595V,D622A,F664Y
SW119       S510I
SW120       -
SW121       G327S,H330Y,N480R,R533I
SW122       -
SW123       S510I,A605E,E612I
SW124       -
SW125       -
SW126       -
SW127       R533I,K537I,Q579A,E739R
SW128       -

* The dashes (-) represent polymerases with no amino acid mutations relative to the coTaq sequence.


102 Table 3-6. Breakdown of types of mutations present after selection. Silent Nonsilent Total SW79X------000 SW80--E517G,A597S, A605G,D622A E517G,A597S, A605G,D622A -088 SW81E468G -----011 SW82L11P -----011 SW83X------000 SW84---A594C,F664Y --033 SW85---Q579A --134 SW86X------000 SW87F44S -----011 SW88X------101 SW89T31M -----011 SW90X------000 SW91V61L -----011 SW92X------000 SW93V646I -----011 SW94--D575T,F664H, F721L,E742V, R743A,E817G ---01616 SW95I60T -----112 SW96X------000 SW97X------000 SW98---R743A --033 SW99--I543T,A594C, E631G,F664Y, W703R I543T,A594C, E631G,F664Y, W703R --066 SW100---Q486H,F595V, D622A --044 SW101X------000 SW102X------000 SW103-Q486H,D575T, N580S ----077 SW104--P336L,M371T, N580S,L613A P336L,M371T, N580S,L613A --077 SW105X------000 SW106--E504G,R533I, R584V,F664L, F697S ---189 SW107--G195D,L230P, P552Q,D575T, F664L G195D,L230P, P552Q,D575T, F664L G195D,L230P, P552Q,D575T, F664L -189 SW108-L606P,I611E, E739R,R743A ----11011 SW109X------000 SW110X------000 SW111--K257E,S510I, D575F,L606C K257E,S510I, D575F,L606C K257E,S510I, D575F,L606C -1910SW112X------000 SW113X------000 SW114M314I -----011 SW115A201E -----011 SW116---N480R,T511A --044 SW117X------000 SW118-Q486H,F595V, D622A,F664Y ----055 SW119---S510I --033 SW120X------000 SW121---G327A,H330Y, N480R,R533I --077 SW122-------011 SW123-S510I,A605E, E612I ----077 SW124X------000 SW125X------000 SW126X------000 SW127-----R533I,K537I, Q579A,E739R 01010 SW128X------000 Cell Line CodonOptimized Taq No RD Mutations (Random Taq Mutations) RD Library Variants # of DNA Mutations RD Variants + Add'l Mutations RD Variants with Reversions RD Recombinants with 1 Crossover RD Recombinants with 2 Crossovers An X indicates polymerases with no amino acid mutations relative to the coTaq sequence.


CHAPTER 4
DISTRIBUTION OF THERMOSTABILITY IN POLYMERASE MUTATION SPACE

Introduction

Recent years have seen a dramatic increase in the number of experiments that use directed evolution to optimize protein function. With the rise of directed evolution has come a proportional escalation in the number and type of approaches used to create the libraries for these selections. Many theories exist on how best to create a library that contains a large number of diverse, yet active, clones (Hibbert and Dalby, 2005, Arnold and Georgiou, 2003b). These theories contradict each other on fundamental levels. For example, some say it is best to use random mutagenesis throughout the entire gene (Drummond et al., 2005), while others think that it is better to perform random mutagenesis only within the region containing the active site of the protein (Park et al., 2005, Dalby, 2003). Conversely, some researchers believe that site-saturation mutagenesis at carefully selected sites generates the best results (Parikh and Matsumura, 2005), while a few consider that mutagenesis at specific sites with specific amino acids will allow for the creation of an optimal library (Crameri et al., 1998, Crameri et al., 1996, Castle et al., 2004).

Our laboratory is interested in pursuing the directed evolution of polymerases to incorporate non-standard nucleotides (NSBs), specifically those exhibiting a C-glycosidic linkage (Fig. 2-3), such as 2′-deoxypseudouridine (dψU) and 2′-deoxypseudothymidine (dψT). To determine what type of mutagenic library would best suit our needs, we compared two libraries for their ability to perform at high temperatures, which is a prerequisite for selection in emulsions under the Ghadessy et al. conditions and is also desirable for synthetic biology (Ghadessy et al., 2004, Ghadessy et al., 2001).


The first was the rationally designed (RD) polymerase library, designed by Dr. Eric Gaucher using the REAP approach as discussed in the previous chapter, in which carefully selected residues were changed into other specific amino acids. The second was a randomly generated library (L4) with mutations spread across the entire polymerase sequence. The L4 library was created using error-prone PCR with Taq polymerase and manganese chloride serving as the mutators (Arnold and Georgiou, 2003b). The starting gene was derived from the cotaq polymerase gene, which is the His(6)wt taq polymerase gene whose sequence had been optimized for codon usage in E. coli cells (Gustafsson et al., 2004).

The 74 Taq polymerase mutants in each library were first tested for their ability to incorporate dNTPs at various temperatures in a PCR, to determine the optimal temperature at which individual polymerases performed, judging by the generation of full-length PCR products. In this case, it appeared that random mutagenesis was better able to yield thermostable variants than the rational design method. Our RD library, however, was specifically designed for identifying polymerases with altered catalytic activities, and therefore targeted sites where changes would be more likely to decrease thermostability.

Then, variants from the RD library were tested for their ability to incorporate C-glycosides using mixtures of dψUTP and TTP in different ratios, both at 94.0 °C and at their optimal temperature. We identified only one mutant with an enhanced ability, relative to the coTaq polymerase, to incorporate dψUTP.

Finally, the ability of dψUTP to epimerize at high temperatures was of concern. Epimerization would lower the concentration of the β-anomer, which is the desired substrate for the polymerase (Fig. 4-1). Therefore, parallel experiments were performed with dψTTP, which is known not to epimerize (Wellington and Benner, 2006, Cohn, 1960, Chambers et al., 1963). These results suggested that dψU is epimerizing to generate the α-epimer, and therefore that dψU is not as suitable as a C-glycoside substrate in PCR experiments generally, or in directed evolution studies to develop thermostable polymerases having new catalytic activities.

Materials and Methods

DNA Sequencing and Analysis

DNA sequencing was carried out by the University of Florida Interdisciplinary Center for Biotechnology Research Sequencing DNA Core Facility on an ABI 3130xl Genetic Analyzer (Applied Biosystems, Foster City, California) using primers P-6 through P-9 (Table 3-1). BLAST 2 Sequences software was used for sequence similarity searching (Tatusova and Madden, 1999); Derti's "Reverse and/or complement DNA sequences" website was used to find the reverse complement of various DNA strands (Derti, 2003); and ExPASy's Translate tool was used to translate DNA sequences into their amino acid counterparts (Swiss Institute of Bioinformatics, 1999).

Bacterial Growth Conditions and Strains

The bacterial strains used in this study are listed in Table 3-3 (SW1, SW4 through SW78, and SW134) and in Table 4-4. The rich medium used in these studies was Luria-Bertani (LB) medium (Difco Laboratories, Detroit, Michigan) (Miller, 1972). Ampicillin was provided in liquid or solid medium at a final concentration of 100 µg/mL. Plasmids were transformed into the E. coli TG-1 cell line according to the manufacturer's protocol (Zymo Research, Orange, California). Cell growth was determined by measuring optical density at 550 nm using a SmartSpec Plus Spectrophotometer (Bio-Rad, Hercules, California). Anhydrotetracycline (2 mg/mL stock in N,N-dimethylformamide) was used at a final concentration of 0.2 ng/µL to induce expression.

Synthesis of Triphosphates and Oligonucleotides

Dr. Shuichi Hoshika, from the Foundation for Applied Molecular Evolution (FfAME, Gainesville, Florida), synthesized the pseudothymidine precursor as described in Appendix A. Dr. Daniel Hutter (also of FfAME) synthesized 2′-deoxypseudothymidine-5′-triphosphate (dψTTP) as described in Appendix A. 2′-Deoxypseudouridine-5′-triphosphate (dψUTP) was purchased from TriLink BioTechnologies (San Diego, California). The standard deoxynucleotide triphosphates (dNTPs), 2′-deoxyadenosine-5′-triphosphate (dATP), 2′-deoxycytidine-5′-triphosphate (dCTP), 2′-deoxyguanosine-5′-triphosphate (dGTP), and thymidine-5′-triphosphate (TTP), were purchased from Promega Corporation (Madison, Wisconsin). dψTNTP solutions were composed of dATP, dCTP, dGTP, and dψTTP, while dψUNTP solutions were composed of dATP, dCTP, dGTP, and dψUTP.

Random Mutagenic Library (L4 Library) Creation

DNA 2.0 (Menlo Park, California) synthesized a form of the His(6)wt taq polymerase gene (cotaq) that was optimized for the codon usage of E. coli, which was then used to construct the pSW2 plasmid (Fig. 3-3). Mutagenic PCR was performed on the cotaq gene to generate a library containing three to four amino acid changes per polymerase, in a fashion similar to that described by Arnold and Georgiou (Arnold and Georgiou, 2003b). The PCRs contained the following: 1X Mutagenic Taq Buffer (10 mM Tris-HCl, pH 8.3, 50 mM KCl, 15 mM MgCl2), 0.1 ng/µL pSW2, 200 µM dNTPs, 300 nM P-4, 300 nM P-5, 5 U Taq polymerase (New England BioLabs, Beverly, Massachusetts), and MnCl2 (115 µM). PCR conditions were as follows: 5 min, 94 °C; (30 s, 94.0 °C; 20 s, 55.0 °C; 3 min, 72.0 °C) x 15 cycles; 7 min, 72.0 °C.
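As a rough check on the mutagenic PCR program described above, the following Python sketch adds up its programmed hold times. It counts only the holds listed in the text; ramp times between temperatures depend on the thermal cycler and are not included, and the function name is illustrative rather than part of any protocol.

```python
def program_hold_seconds(initial_s, cycle_steps_s, cycles, final_s):
    """Total programmed hold time in seconds:
    initial hold + cycles * (sum of per-cycle steps) + final hold."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s


# Holds from the text: 5 min; (30 s, 20 s, 3 min) x 15 cycles; 7 min.
mutagenic_pcr_s = program_hold_seconds(
    initial_s=5 * 60,
    cycle_steps_s=[30, 20, 3 * 60],
    cycles=15,
    final_s=7 * 60,
)
mutagenic_pcr_min = mutagenic_pcr_s / 60  # programmed holds only, ramping excluded
```

The same helper can be applied to the other cycling programs in this chapter by changing the step list and cycle count.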


Products were purified with the QIAquick PCR Purification Kit (Qiagen, Valencia, CA), eluted with Qiagen Buffer EB (50 µL), and quantitated by absorbance at 260 nm using a SmartSpec Plus Spectrophotometer (Bio-Rad).

The mutagenic PCR products were used in an adaptation of the Miyazaki and Takenouchi megaprimer PCR protocol (Miyazaki and Takenouchi, 2002). Samples (10 ng/µL final concentration) were added to a PCR mixture (1X Native Pfu Buffer, 100 ng pSW2, 500 µM dNTPs, 6% DMSO). The mixture was heated to 96 °C for 30 s prior to the addition of 0.05 U/µL Native Pfu Polymerase (Stratagene, La Jolla, California). Samples were then subjected to PCR [2 min, 96 °C; (30 s, 96.0 °C; 10 min, 68.0 °C) x 25 cycles; 30 min, 72.0 °C]. The host strands of DNA (the pSW2 plasmid in the PCR) were digested with 2 U DpnI (New England BioLabs) at 37 °C for 2.5 hrs. Reactions were cooled to room temperature, purified using a QIAquick PCR Purification Kit (Qiagen), and eluted with Qiagen Buffer EB (30 µL).

Purified products were transformed into the E. coli DH5α cell line according to the manufacturer's protocol (Invitrogen, Carlsbad, California). Seventy-nine isolated colonies were selected after the transformation (cell lines SW135 through SW211). Each colony was grown in a separate overnight 5 mL LB-Amp culture (250 rpm, 37 °C) in 14 mL 2059 Falcon Tubes (BD Biosciences, San Jose, California). Their plasmids were isolated using the QIAprep Spin Miniprep Kit (Qiagen). Plasmid constructs were verified by restriction digest analysis, using the enzymes BamHI and NcoI according to the manufacturer's protocol (New England BioLabs), and mutations were determined by sequencing. The 74 L4 Library plasmids containing mutations were transformed into the E. coli TG-1 cell line (cell lines SW212 through SW285) according to the manufacturer's protocol (Zymo Research, Orange, California).
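The step of determining mutations by sequencing amounts to comparing each translated clone against the coTaq amino acid sequence. The sketch below shows one simple way to report differences in the notation used in this work (for example, E517G). It assumes the two protein sequences are already aligned and of equal length, so insertions and deletions are not handled; the function name and the toy sequences are hypothetical and are not taken from the coTaq gene.

```python
def call_substitutions(reference, variant):
    """Return substitutions such as 'E517G' (1-based positions) between two
    equal-length, pre-aligned protein sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length (indels are not handled)")
    return [
        f"{ref_aa}{pos}{var_aa}"
        for pos, (ref_aa, var_aa) in enumerate(zip(reference, variant), start=1)
        if ref_aa != var_aa
    ]


# Hypothetical 10-residue toy sequences for illustration only.
toy_reference = "MRGMLPLFEP"
toy_variant = "MRGMLALFEK"
toy_calls = call_substitutions(toy_reference, toy_variant)
```

A clone whose translated sequence is identical to the reference yields an empty list, which corresponds to the clones that retained the coTaq sequence.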


Incorporation of dNTPs by RD and L4 Libraries at Various Temperatures

A single isolated colony from each of the SW5 through SW78 (RD Library) and SW212 through SW285 (L4 Library) cell lines was used to inoculate 148 individual cultures (5 mL LB-Amp), which were grown for 14.25 hrs at 250 rpm and 37 °C in 14 mL 2059 Falcon Tubes (BD Biosciences). Approximately 2 x 10^8 colony-forming units (cfu), roughly equal to 500 µL of a culture with an OD550nm of 4.0, were used to inoculate individual 100 mL cultures of LB-Amp in 500 mL baffled flasks. These cultures were grown at 37 °C and 250 rpm for 3.75 hrs to an approximate OD550nm of 1.8, and were then induced with anhydrotetracycline. The cells were allowed to grow for 1 hr longer to an approximate OD550nm of 3.0.

Approximately 1 x 10^6 cfu (~2 µL cells) were used as the sole source of polymerase and template in separate PCRs containing final concentrations of these constituents: 1X Modified ThermoPol Buffer (2 mM Tris-HCl, pH 9, 10 mM KCl, 1 mM (NH4)2SO4, 2.5 mM MgCl2, 0.2% Tween 20), 500 µM dNTPs, 1.4 µM P-4, 1.4 µM P-5, 1.1 ng/µL RNaseA, and 6% DMSO. The PCRs (50 µL) were run under the following conditions: 5 min, X °C; (1 min, X °C; 1 min, 55.0 °C; 3 min, 72.0 °C) x 15 cycles; 7 min, 72.0 °C, where X was a denaturing temperature of 75.0 °C, 75.5 °C, 76.6 °C, 78.1 °C, 80.4 °C, 83.1 °C, 86.3 °C, 89.0 °C, 91.1 °C, 92.6 °C, 93.7 °C, or 94.0 °C. Products were analyzed by agarose gel electrophoresis and quantitated using the GeneTools Software, version 3.07 (SynGene, Cambridge, England).

Incorporation of dψUNTPs by RD Library at Optimal Temperatures

A single isolated colony from each of the cell lines (SW4 through SW78) that was active at one of the temperatures tested was used to inoculate 33 individual cultures (5 mL LB-Amp). These were grown for 14.25 hrs at 250 rpm and 37 °C in 14 mL 2059 Falcon Tubes (BD Biosciences). Approximately 2 x 10^8 cfu, roughly equal to 500 µL of a culture with an OD550nm of 4.0, were used to inoculate individual 100 mL cultures of LB-Amp in 500 mL baffled flasks. These cultures were grown at 37 °C and 250 rpm for 3.75 hrs to an approximate OD550nm of 1.8, and were then induced with anhydrotetracycline. The cells were allowed to grow for 1 hr longer to an approximate OD550nm of 3.0.

Approximately 1 x 10^6 cfu (~2 µL cells) were used as the source of polymerase and template in separate PCRs containing final concentrations of these constituents: 1X Modified ThermoPol Buffer, 1.4 µM P-4, 1.4 µM P-5, 1.1 ng/µL RNaseA, and 6% DMSO. One of the following sets of nucleotide triphosphates was added to the reactions (final concentrations): 500 µM dNTPs; 500 µM dATP/dGTP/dCTP; 500 µM dATP/dGTP/dCTP + 450 µM TTP + 50 µM dψUTP; 10 µM dATP/dGTP/dCTP + 400 µM TTP + 100 µM dψUTP; 10 µM dATP/dGTP/dCTP + 350 µM TTP + 150 µM dψUTP; 10 µM dATP/dGTP/dCTP + 300 µM TTP + 200 µM dψUTP; 10 µM dATP/dGTP/dCTP + 250 µM TTP + 250 µM dψUTP; 10 µM dATP/dGTP/dCTP + 200 µM TTP + 300 µM dψUTP; 10 µM dATP/dGTP/dCTP + 150 µM TTP + 350 µM dψUTP; 10 µM dATP/dGTP/dCTP + 100 µM TTP + 400 µM dψUTP; 10 µM dATP/dGTP/dCTP + 50 µM TTP + 450 µM dψUTP; 500 µM dψUTPs. The PCRs (50 µL) were run under the following conditions: 5 min, X °C; (1 min, X °C; 1 min, 55.0 °C; 3 min, 72.0 °C) x 15 cycles; 7 min, 72.0 °C, where X was each polymerase's optimal denaturing temperature of 86.3 °C, 89.0 °C, 91.1 °C, 92.6 °C, 93.7 °C, or 94.0 °C. Products were analyzed by agarose gel electrophoresis and quantitated using the GeneTools Software, version 3.07 (SynGene, Cambridge, England).
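The TTP/dψUTP mixtures listed above follow a regular pattern: the two triphosphates always sum to 500 µM, and dψUTP rises in 50 µM steps. The following Python sketch regenerates that series as a convenience for laying out reactions. The function name and defaults are illustrative assumptions, and the sketch covers only the TTP/dψUTP pair, not the other three triphosphates.

```python
def ttp_psi_utp_series(total_um=500, step_um=50):
    """Return (TTP, dpsiUTP) final concentrations in micromolar for each mixed
    reaction, keeping TTP + dpsiUTP equal to total_um."""
    pairs = []
    for psi_um in range(step_um, total_um, step_um):
        pairs.append((total_um - psi_um, psi_um))
    return pairs


series = ttp_psi_utp_series()
# Each tuple is (TTP, dpsiUTP) in micromolar, running from 450/50 down to 50/450.
```

The all-TTP and all-dψUTP end points are handled as separate reactions in the text and are deliberately left out of the generated series.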


Incorporation of dψUTP and dψTTP by coTaq Polymerase at Various Melting Temperatures

The SW4 cell line was used to inoculate an LB-Amp culture (5 mL in a 14 mL 2059 Falcon Tube; BD Biosciences). The culture was grown for 14.25 hrs (250 rpm shaking at 37 °C). Approximately 2 x 10^8 cfu of the resulting cell suspension (ca. 500 µL of a culture with an OD550nm of 4.0) was used to inoculate a secondary culture of LB-Amp (100 mL in a 500 mL baffled flask). The secondary culture was grown (37 °C, 250 rpm) for 3.75 hrs to an approximate OD550nm of 1.8. Expression of the polymerase was then induced with anhydrotetracycline. The cells were allowed to grow for 1 hr longer to an approximate OD550nm of 3.0.

The cells themselves (approximately 1 x 10^6 cfu, or ~2 µL cells) were used as the sole source of polymerase and template in separate PCRs. These reactions contained final concentrations of the following: 1X Modified ThermoPol Buffer, 1.4 µM P-4, 1.4 µM P-5, 1.1 ng/µL RNaseA, and 6% DMSO. The final concentrations of nucleotide triphosphates added to the reactions can be found in Table 4-6. The PCRs (50 µL) were run under the following conditions: 5 min, X °C; (1 min, X °C; 1 min, 55.0 °C; 3 min, 72.0 °C) x 15 cycles; 7 min, 72.0 °C, where X was a denaturing temperature of 86.3 °C, 89.0 °C, 91.1 °C, 92.6 °C, 93.7 °C, or 94.0 °C. Products were analyzed by agarose gel electrophoresis and quantitated using the GeneTools Software, version 3.07 (SynGene, Cambridge, England).

Results

Random Mutagenic Library (L4 Library) Creation

The L4 mutagenic library was created using the cotaq gene as the template sequence, and MnCl2 and Taq polymerase as the mutagens (Arnold and Georgiou, 2003b). Conditions were manipulated to create a library with approximately three amino acid changes per gene. After purification, the mutagenic PCR products were used in a variation of the Miyazaki and Takenouchi megaprimer PCR protocol (Miyazaki and Takenouchi, 2002), creating the full-length plasmid (pASK-IBA43plus with insert). Purified products were transformed into the E. coli DH5α cell line; 79 clones were isolated, sequenced, and compared to the coTaq amino acid sequence (Table 4-2). Of those 79 clones, only five retained the coTaq polymerase sequence; the remaining 74 contained at least one mutation. The plasmids containing the 74 mutant L4 genes were then transformed into the E. coli TG-1 expression cell line.

Incorporation of dNTPs by RD and L4 Libraries at Various Temperatures

To determine the optimal temperature for each of the cell lines in the RD and L4 Libraries, each of these mutant Taq polymerases was tested for its ability to incorporate dNTPs in PCRs at various temperatures ranging from 75.0 °C to 94.0 °C. Reactions contained induced cells (1 x 10^6 cfu) as the sole source of polymerase and template plasmid, so active polymerases were forced to replicate their own encoding gene (2603 bp).

Figure 4-2[A-C] shows the difference between the PCR products from the coTaq polymerase screen and representatives of the RD Library (SW17) and the L4 Library (SW251). In these, and in all of the other reactions screening various temperatures, no full-length product (FLP) was observed at a temperature lower than 86.3 °C. Based on the product band densities, the optimal temperature for each polymerase in these two libraries was determined (Table 4-3 and Table 4-4).

Of the 74 L4 mutants, 39 were active at one of the temperatures tested, as compared to only 33 of the 74 RD mutants. Figure 4-3[A-B] shows the distribution of the active polymerases in each library at the various temperatures. The average optimal temperature was 87.5 °C for the RD mutants and 89.0 °C for the L4 mutants. It is interesting to note that within the RD Library, if a mutant was active at 94.0 °C, it was active at the other five temperatures as well. This was not the case, however, with the L4 mutants; two of the mutants that were active at 94.0 °C were not active at the lower end of the spectrum. In addition, more mutants were stable at higher temperatures among the L4 variants than among the RD variants.

Incorporation of dψUNTPs by RD Library at Optimal Temperatures

To find a polymerase that can incorporate and extend beyond dψUs with higher efficiency than the coTaq polymerase, each of the active mutant Taq polymerases in the RD Library was tested for its ability to incorporate varying concentrations of dψUTP across from template dA in PCRs at its optimal temperature. Those polymerases that were not able to incorporate dNTPs at any temperature were not assayed in this experiment. Reactions contained induced cells (1 x 10^6 cfu) as the sole source of polymerase and template plasmid, again forcing active polymerases to replicate their own encoding gene (2603 bp). Control reactions, containing cells (SW4) expressing the coTaq polymerase, were performed at 86.3 °C and 89.0 °C for comparative purposes.

The raw densities of the FLP bands of the RD mutants at their optimal temperatures, as measured by GeneTools Software (ver. 3.07, SynGene), were compared to those generated by coTaq at that same temperature (Table 4-5). None of these polymerases showed an optimal temperature of 94.0 °C; they all had optimal temperatures of 86.3 °C or 89.0 °C. In addition, all but one of the RD mutants failed to incorporate dψUTP as efficiently as coTaq polymerase at their optimal temperature. The remaining mutant (SW29), however, showed an ability to generate FLP in all dNTP/dψUNTP ratios up to 72% more efficiently, on average, than coTaq polymerase at 86.3 °C (Fig. 4-4[A-C]).
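Comparisons such as "72% more efficiently, on average" rest on ratios of FLP band densities between a mutant and coTaq under the same conditions. The text does not state exactly how the average was formed, so the sketch below is only one plausible reading: it takes the percent difference lane by lane and averages over lanes where the reference produced a band. The function names and the example densities are hypothetical and are not values from Table 4-5.

```python
def percent_more(mutant_density, reference_density):
    """Percent by which the mutant band density exceeds the reference."""
    if reference_density <= 0:
        raise ValueError("reference density must be positive")
    return 100.0 * (mutant_density - reference_density) / reference_density


def mean_percent_more(mutant, reference):
    """Average the lane-by-lane percent difference, skipping lanes in which
    the reference produced no band (density 0)."""
    diffs = [percent_more(m, r) for m, r in zip(mutant, reference) if r > 0]
    if not diffs:
        raise ValueError("no lanes with a reference band")
    return sum(diffs) / len(diffs)


# Hypothetical densities for four nucleotide ratios (illustration only).
example_reference = [1000.0, 800.0, 500.0, 0.0]
example_mutant = [1500.0, 1600.0, 1000.0, 0.0]
example_mean = mean_percent_more(example_mutant, example_reference)
```

Averaging ratios lane by lane weights every nucleotide ratio equally; summing densities before dividing would instead weight the strongest bands most heavily, so the choice should match how the original figures were computed.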


Although the RD mutants were unable to perform as well as coTaq polymerase at their optimal temperatures, as compared to the coTaq FLP at that same temperature, they were able to generate, on average, 40% more FLP at their optimal temperature than when they were tested at 94.0 °C (Table 4-5 versus Table 3-4). Figure 4-5[A-C] shows representative (SW8) PCR products from the polymerase screen at 94.0 °C (Fig. 4-5A) and at the SW8 polymerase's optimal temperature of 86.3 °C (Fig. 4-5B). In the first set of reactions, the polymerase produced FLP with concentrations of dψUTP up to 350 µM; in the second set, the polymerase was able to generate FLP with concentrations of dψUTP up to 400 µM. This is graphically represented in Figure 4-5C.

Incorporation of dψUTP and dψTTP by coTaq Polymerase at Various Melting Temperatures

We performed a comparative analysis of the ability of coTaq polymerase to incorporate dψUTP or dψTTP in various concentrations at different temperatures. Table 4-6 shows the raw densities, as determined by the GeneTools Software (ver. 3.07, SynGene), of the FLP bands generated by these experiments. Once the dψTTP final concentration reached 300 µM, all of the FLPs generated averaged a 23% higher density than those produced when using dψUTP at identical concentrations. However, at the temperatures of 91.1 °C and 92.6 °C, all concentrations of dψTTP supported the synthesis of more FLP than dψUTP. In addition, at every temperature tested, FLP was present when dψTTP was in a 9:1 ratio with TTP, whereas with dψUTP the highest ratio to TTP at which FLP was observed was 8:2, and only when the temperature was 86.3 °C or 89.0 °C.

Representative gels of these experiments, at a temperature of 86.3 °C, can be seen in Figure 4-6[A-C]. In the first gel (Fig. 4-6A), we see that FLP was generated up to an 8:2 ratio of dψUTP to TTP, but was produced up to a 9:1 ratio of dψTTP to TTP (Fig. 4-6B). As is easily visualized in the chart (Fig. 4-6C), the amount of FLP formed was higher when dψUTP was present until the final concentration reached 300 µM; beyond that point, the level of FLP was higher in the presence of dψTTP. Graphical representations of the densities at the remaining five temperatures can be found in Figure 4-7[A-E].

Discussion

In previous studies (Chapter 3), we showed that only 24% of the RD Library mutants generated PCR products in the presence of dNTPs when the highest cycle temperature was 94.0 °C (Table 3-4). This was, perhaps, consistent with expectation. Replacements of this number, if randomly produced, might be expected to yield very few active variants (Guo et al., 2004), but this might be balanced through the selection of sites less likely to cause unfolding, according to the REAP hypothesis (Gaucher, 2006). Unexpectedly, we found that at least one of the 35 sites identified by the REAP approach had been previously shown to modify the thermostability of Taq polymerase (Ghadessy et al., 2001). It is possible that other mutations affected thermostability as well.

Therefore, polymerases in the RD Library were tested for their ability to incorporate dNTPs at temperatures ranging from 75.0 °C to 94.0 °C. Of the 74 clones tested, none were able to generate FLP below a temperature of 86.3 °C. This is most likely not due to any property of the polymerase, but rather to the inability of the duplex DNA strands to melt at these lower temperatures, considering that the melting temperature of the taq gene is ca. 88 °C. Thirty-three of the clones (45%), including the 18 previously shown to have activity (Table 3-4), were now able to incorporate dNTPs at lower temperatures, down to 86.3 °C (Table 4-3). This leads us to believe that some of these replacements, located in and around the active site, indeed lowered the thermostability of the coTaq polymerase.

It is also interesting to point out that the SW56 and SW74 cell lines, containing polymerases with the F664H or F664L mutations respectively (Table 4-3), refute Suzuki's theory that this phenylalanine residue can only be mutated to a tyrosine and retain activity (Suzuki et al., 1996). It may also be that the mutation to tyrosine allows the polymerase to retain its thermostability at 94.0 °C; Suzuki et al., however, did not test the thermostability of the mutants in their experiments, and the activity seen with the F664H and F664L mutations in this research was only at lower temperatures.

We wanted to determine whether a randomly created mutagenic library would be as likely to create clones that display a decrease in the thermostability of the protein. A random mutagenic library (L4) was created using MnCl2 and Taq polymerase (New England BioLabs) as the mutators (Table 4-2). We identified 74 clones with unique sequences, which were also tested for their ability to incorporate dNTPs at various temperatures. Thirty-nine (53%) of the clones were able to generate FLP at one of the temperatures tested, and twenty-eight (38% of the 74) were able to function at 94.0 °C (Table 4-4).

Comparison of the thermostability of polymerases from the two libraries found more thermostability in the L4 random library than in the RD library (Fig. 4-3[A-B]). Thirty-nine of the L4 mutants were active, with an average optimal temperature of 89.0 °C, while only thirty-three of the RD clones retained activity, with an average optimal temperature of 87.5 °C. It remains open whether this difference reflects the slightly greater number of replacements in the RD library members (3.5 residues/protein) over the L4 library members (3 residues/protein).

The ability to function at lower temperatures can be advantageous to the incorporation of non-standard bases (NSBs). The literature reports examples where NSBs are incorporated more efficiently at lower temperatures (Rappaport, 2004; Horlacher et al., 1995). We, therefore,

decided to test the incorporation of dψUTP at each of the RD mutants' optimal temperatures, comparing the results to those seen at 94.0 °C (Table 3-4). Table 4-5 shows the ability of the active RD mutants to incorporate various concentrations of dψUTP in a PCR with an optimal temperature of either 86.3 °C or 89.0 °C. Using this assay system, we identified a mutant (SW29) that was able to generate, on average, 72% more product than coTaq polymerase at a temperature of 86.3 °C (Fig. 4-4[A-C]). This mutant, containing the A597S, A740R, and E742V residue changes, produced FLP up to a final concentration of 400 µM dψUTP. The remaining 32 active polymerases that were tested were unable to incorporate the dψUTP as well as coTaq polymerase; they were, however, able to generate, on average, 40% more FLP at their optimal temperature than when they were tested at 94.0 °C (Table 4-5 versus Table 3-4). This lends strength to the theory that NSBs are easier to incorporate at lower temperatures.

Another reason dψUTP is more readily incorporated at lower temperatures could be its ability to epimerize (Fig. 4-1) (Wellington and Benner, 2006; Cohn, 1960; Chambers et al., 1963). Epimerization can occur at two stages. First, the dψUTP might epimerize, converting β-dψUTP (the substrate) to α-dψUTP (not a substrate). At worst, the α-dψUTP might be an inhibitor, but this possibility is not considered. The consequence of this conversion is to lower the concentration of the β-epimer, as well as its amount. If the concentration of the triphosphate is less than its Michaelis constant, this would slow the rate of primer extension. Here, the concentrations are 500 µM, not dramatically higher than Taq polymerase's reported Km values for dNTPs (16 µM) (Kong et al., 1993), but given the long extension time (3 min), it is doubtful that this is the origin of any effect seen.
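For a sense of scale, the effect of substrate depletion on rate can be estimated with the single-substrate Michaelis-Menten expression v/Vmax = [S]/(Km + [S]). The sketch below is only an illustration under assumptions not made in the text: that this simple rate law applies, that the reported dNTP Km (16 µM) can stand in for the unmeasured Km of dψUTP, and that half of the triphosphate has epimerized (a hypothetical figure). The function name `relative_rate` is likewise hypothetical.

```python
def relative_rate(substrate_um: float, km_um: float) -> float:
    """Michaelis-Menten rate as a fraction of Vmax: [S] / (Km + [S])."""
    return substrate_um / (km_um + substrate_um)


KM_UM = 16.0          # reported Km of Taq polymerase for dNTPs (Kong et al., 1993)
START_UM = 500.0      # triphosphate concentration stated in the text
DEPLETED_UM = 250.0   # hypothetical: half of the substrate epimerized

full = relative_rate(START_UM, KM_UM)
half = relative_rate(DEPLETED_UM, KM_UM)

print(f"rate at {START_UM:.0f} uM: {full:.3f} of Vmax")
print(f"rate at {DEPLETED_UM:.0f} uM: {half:.3f} of Vmax")
print(f"relative slowdown: {100 * (1 - half / full):.1f}%")
```

Under these assumptions the relative rate falls from about 0.97 to about 0.94 of Vmax, a slowdown of roughly 3%, which is in line with the doubt expressed above.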
Alternatively, if the PCR is run to the point of exhaustion of the triphosphates, then a lower yield of PCR product is expected simply because the conversion

of β-dψUTP to α-dψUTP leads to earlier exhaustion. Since the amplicon is about 56% GC, this is considered unlikely, as the triphosphates are present in equal concentrations, implying that dGTP and/or dCTP would be the first to be exhausted.

The alternative possibility is that the β-dψUTP is epimerizing at high temperatures after it is incorporated into the amplicon. Unlike with epimerization as the triphosphate, epimerized amplicon cannot simply be ignored. Rather, it creates serious problems with read-through by the polymerase; the amplicon may be lost to further PCR even with a single α-dψUTP.

In all experiments, the total concentration (T + dψUTP or T + dψTTP) was kept constant (500 µM). The total primer amount (1.4 µM each; 7 × 10^-11 mol, or about 4 × 10^13 molecules, per 50 µL assay) is approximately five orders of magnitude greater than the number of copies of the plasmids (about 300 copies per cell, and about 10^6 cells per assay). With 2603 nucleotides per amplicon, the triphosphates are nominally consumed after 15.5 PCR cycles; the primer is nominally consumed after 17 PCR cycles. This PCR was carried out for 15 rounds, but it is doubtful that nominal perfection (true doubling each round) was observed here, or anywhere in the PCR literature.

The results with dψUTP and dψTTP in various concentrations and at different temperatures are shown in Table 4-6. While errors in the first column, where the pseudo component concentration is zero, are large, trends across the series are consistent. It appears that dψTTP supports the formation of PCR product at higher concentrations than dψUTP. There is no obvious way to explain this if the limitation on product formation involves the exhaustion of either the triphosphate or the primer. Rather, it is consistent with a slow rate of epimerization of dψUTP once it is incorporated into the amplicon. With dψTTP, eventually at high concentrations, PCR product formation subsides. This is attributed to the cumulative effects of too many unnatural nucleotides present in the amplicon (approaching 25%).
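The nominal cycle counts discussed above can be approximated from the stated quantities. The following is a rough sketch, not the calculation used in the original work. It assumes perfect doubling each cycle, that each dNTP is present at the full 500 µM, and that dGTP (or dCTP) is limiting, with one G per G:C base pair of a 2603 bp amplicon at 56% GC. All variable names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23      # molecules per mole
VOLUME_L = 50e-6              # 50 uL assay
PRIMER_M = 1.4e-6             # 1.4 uM of each primer
DNTP_M = 500e-6               # assumed 500 uM of each triphosphate
AMPLICON_BP = 2603            # amplicon length in base pairs
GC_FRACTION = 0.56            # amplicon GC content
TEMPLATES = 300 * 1e6         # about 300 plasmid copies per cell, about 1e6 cells

# Primer: moles and molecules per assay, and the excess over template copies.
primer_mol = PRIMER_M * VOLUME_L
primer_molecules = primer_mol * AVOGADRO
primer_excess = primer_molecules / TEMPLATES
primer_cycles = math.log2(primer_excess)     # doublings until primer runs out

# Triphosphate: each G:C pair in a new duplex consumes one dGTP.
dgtp_molecules = DNTP_M * VOLUME_L * AVOGADRO
dgtp_per_duplex = GC_FRACTION * AMPLICON_BP
max_duplexes = dgtp_molecules / dgtp_per_duplex
dntp_cycles = math.log2(max_duplexes / TEMPLATES)

print(f"primer per assay: {primer_mol:.1e} mol")
print(f"primer excess over template: 10^{math.log10(primer_excess):.1f}")
print(f"nominal cycles to primer exhaustion: {primer_cycles:.1f}")
print(f"nominal cycles to dGTP exhaustion: {dntp_cycles:.1f}")
```

Under these assumptions the primer estimate comes out near 17 cycles and the triphosphate estimate near 15 cycles, in the same range as the nominal figures quoted above; the exact values depend on which nucleotide is taken as limiting and on how consumption per amplicon is counted.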

Numerous approaches for library design exist to support directed evolution; this study has compared two of them: A) a rationally designed library based on comparative sequence analysis of the active sites of Family A polymerases, and B) a randomly mutagenized library with no preference as to the location of mutations. Our studies have shown that we are more likely to generate active, thermostable mutants with a randomly mutated library than with our rationally designed library. This supports the conclusions drawn by Arnold and colleagues (Drummond et al., 2005), who determined that libraries with high error rates distributed throughout the entirety of the gene result in a higher than expected number of active and unique variants.

Our RD Library was designed to identify polymerases with an increased ability to incorporate NSBs, not for thermostability, so the "you get what you select for" theory may be applicable in this situation. We were able to identify one mutant with an increased ability over coTaq polymerase to incorporate dψUTP; this, however, is only one example of a C-glycoside. Since it is possible that dψUTP is epimerizing at the temperatures tested, perhaps the testing of other C-glycosides, such as dψTTP, can assist in the identification of more mutants with the capability of incorporating these, or other, NSBs. Additionally, useful information could be gained by testing all of our RD and L4 mutants for their ability to incorporate dψTTP, or other NSBs, not only about incorporation of NSBs, but also in regard to favorable library design.

Figure 4-1. Epimerization of 2'-deoxypseudouridine. 2'-Deoxypseudouridine can epimerize under acidic, basic, or even neutral conditions over time, in either the nucleoside or the oligonucleotide form. Polymerases will not incorporate the epimerized form of this nucleotide; therefore, use of the non-epimerizing 2'-deoxypseudothymidine is recommended.

Table 4-1. Additional bacterial strains used in this study.

Name   Strain        Genotype
SW135  E. coli DH5α  SW1/pL4Mut1 (pASK-IBA43+ with cotaq L4 mut 1, Apr)
SW136  E. coli DH5α  SW1/pL4Mut2 (pASK-IBA43+ with cotaq L4 mut 2, Apr)
SW137  E. coli DH5α  SW1/pL4Mut3 (pASK-IBA43+ with cotaq L4 mut 3, Apr)
SW138  E. coli DH5α  SW1/pL4Mut4 (pASK-IBA43+ with cotaq L4 mut 4, Apr)
SW139  E. coli DH5α  SW1/pL4Mut5 (pASK-IBA43+ with cotaq L4 mut 5, Apr)
SW140  E. coli DH5α  SW1/pL4Mut6 (pASK-IBA43+ with cotaq L4 mut 6, Apr)
SW141  E. coli DH5α  SW1/pL4Mut7 (pASK-IBA43+ with cotaq L4 mut 7, Apr)
SW142  E. coli DH5α  SW1/pL4Mut8 (pASK-IBA43+ with cotaq L4 mut 8, Apr)
SW143  E. coli DH5α  SW1/pL4Mut9 (pASK-IBA43+ with cotaq L4 mut 9, Apr)
SW144  E. coli DH5α  SW1/pL4Mut10 (pASK-IBA43+ with cotaq L4 mut 10, Apr)
SW145  E. coli DH5α  SW1/pL4Mut11 (pASK-IBA43+ with cotaq L4 mut 11, Apr)
SW146  E. coli DH5α  SW1/pL4Mut12 (pASK-IBA43+ with cotaq L4 mut 12, Apr)
SW147  E. coli DH5α  SW1/pL4Mut13 (pASK-IBA43+ with cotaq L4 mut 13, Apr)
SW148  E. coli DH5α  SW1/pL4Mut14 (pASK-IBA43+ with cotaq L4 mut 14, Apr)
SW149  E. coli DH5α  SW1/pL4Mut15 (pASK-IBA43+ with cotaq L4 mut 15, Apr)
SW150  E. coli DH5α  SW1/pL4Mut16 (pASK-IBA43+ with cotaq L4 mut 16, Apr)
SW151  E. coli DH5α  SW1/pL4Mut17 (pASK-IBA43+ with cotaq L4 mut 17, Apr)
SW152  E. coli DH5α  SW1/pL4Mut18 (pASK-IBA43+ with cotaq L4 mut 18, Apr)
SW153  E. coli DH5α  SW1/pL4Mut19 (pASK-IBA43+ with cotaq L4 mut 19, Apr)
SW154  E. coli DH5α  SW1/pL4Mut20 (pASK-IBA43+ with cotaq L4 mut 20, Apr)
SW155  E. coli DH5α  SW1/pL4Mut21 (pASK-IBA43+ with cotaq L4 mut 21, Apr)
SW156  E. coli DH5α  SW1/pL4Mut22 (pASK-IBA43+ with cotaq L4 mut 22, Apr)
SW157  E. coli DH5α  SW1/pL4Mut23 (pASK-IBA43+ with cotaq L4 mut 23, Apr)
SW158  E. coli DH5α  SW1/pL4Mut24 (pASK-IBA43+ with cotaq L4 mut 24, Apr)
SW159  E. coli DH5α  SW1/pL4Mut25 (pASK-IBA43+ with cotaq L4 mut 25, Apr)
SW160  E. coli DH5α  SW1/pL4Mut26 (pASK-IBA43+ with cotaq L4 mut 26, Apr)
SW161  E. coli DH5α  SW1/pL4Mut27 (pASK-IBA43+ with cotaq L4 mut 27, Apr)
SW162  E. coli DH5α  SW1/pL4Mut28 (pASK-IBA43+ with cotaq L4 mut 28, Apr)
SW163  E. coli DH5α  SW1/pL4Mut29 (pASK-IBA43+ with cotaq L4 mut 29, Apr)
SW164  E. coli DH5α  SW1/pL4Mut30 (pASK-IBA43+ with cotaq L4 mut 30, Apr)
SW165  E. coli DH5α  SW1/pL4Mut31 (pASK-IBA43+ with cotaq L4 mut 31, Apr)
SW166  E. coli DH5α  SW1/pL4Mut32 (pASK-IBA43+ with cotaq L4 mut 32, Apr)
SW167  E. coli DH5α  SW1/pL4Mut33 (pASK-IBA43+ with cotaq L4 mut 33, Apr)
SW168  E. coli DH5α  SW1/pL4Mut34 (pASK-IBA43+ with cotaq L4 mut 34, Apr)
SW169  E. coli DH5α  SW1/pL4Mut35 (pASK-IBA43+ with cotaq L4 mut 35, Apr)
SW170  E. coli DH5α  SW1/pL4Mut36 (pASK-IBA43+ with cotaq L4 mut 36, Apr)
SW171  E. coli DH5α  SW1/pL4Mut37 (pASK-IBA43+ with cotaq L4 mut 37, Apr)
SW172  E. coli DH5α  SW1/pL4Mut38 (pASK-IBA43+ with cotaq L4 mut 38, Apr)
SW173  E. coli DH5α  SW1/pL4Mut39 (pASK-IBA43+ with cotaq L4 mut 39, Apr)
SW174  E. coli DH5α  SW1/pL4Mut40 (pASK-IBA43+ with cotaq L4 mut 40, Apr)
SW175  E. coli DH5α  SW1/pL4Mut41 (pASK-IBA43+ with cotaq L4 mut 41, Apr)
SW176  E. coli DH5α  SW1/pL4Mut42 (pASK-IBA43+ with cotaq L4 mut 42, Apr)
SW177  E. coli DH5α  SW1/pL4Mut43 (pASK-IBA43+ with cotaq L4 mut 43, Apr)
SW178  E. coli DH5α  SW1/pL4Mut44 (pASK-IBA43+ with cotaq L4 mut 44, Apr)
SW179  E. coli DH5α  SW1/pL4Mut45 (pASK-IBA43+ with cotaq L4 mut 45, Apr)
SW180  E. coli DH5α  SW1/pL4Mut46 (pASK-IBA43+ with cotaq L4 mut 46, Apr)
SW181  E. coli DH5α  SW1/pL4Mut47 (pASK-IBA43+ with cotaq L4 mut 47, Apr)
SW182  E. coli DH5α  SW1/pL4Mut48 (pASK-IBA43+ with cotaq L4 mut 48, Apr)
SW183  E. coli DH5α  SW1/pL4Mut49 (pASK-IBA43+ with cotaq L4 mut 49, Apr)
SW184  E. coli DH5α  SW1/pL4Mut50 (pASK-IBA43+ with cotaq L4 mut 50, Apr)
SW185  E. coli DH5α  SW1/pL4Mut51 (pASK-IBA43+ with cotaq L4 mut 51, Apr)
SW186  E. coli DH5α  SW1/pL4Mut52 (pASK-IBA43+ with cotaq L4 mut 52, Apr)
SW187  E. coli DH5α  SW1/pL4Mut53 (pASK-IBA43+ with cotaq L4 mut 53, Apr)
SW188  E. coli DH5α  SW1/pL4Mut54 (pASK-IBA43+ with cotaq L4 mut 54, Apr)
SW189  E. coli DH5α  SW1/pL4Mut55 (pASK-IBA43+ with cotaq L4 mut 55, Apr)
SW190  E. coli DH5α  SW1/pL4Mut56 (pASK-IBA43+ with cotaq L4 mut 56, Apr)
SW191  E. coli DH5α  SW1/pL4Mut57 (pASK-IBA43+ with cotaq L4 mut 57, Apr)
SW192  E. coli DH5α  SW1/pL4Mut58 (pASK-IBA43+ with cotaq L4 mut 58, Apr)
SW193  E. coli DH5α  SW1/pL4Mut59 (pASK-IBA43+ with cotaq L4 mut 59, Apr)
SW194  E. coli DH5α  SW1/pL4Mut60 (pASK-IBA43+ with cotaq L4 mut 60, Apr)
SW195  E. coli DH5α  SW1/pL4Mut61 (pASK-IBA43+ with cotaq L4 mut 61, Apr)
SW196  E. coli DH5α  SW1/pL4Mut62 (pASK-IBA43+ with cotaq L4 mut 62, Apr)
SW197  E. coli DH5α  SW1/pL4Mut63 (pASK-IBA43+ with cotaq L4 mut 63, Apr)
SW198  E. coli DH5α  SW1/pL4Mut64 (pASK-IBA43+ with cotaq L4 mut 64, Apr)
SW199  E. coli DH5α  SW1/pL4Mut65 (pASK-IBA43+ with cotaq L4 mut 65, Apr)
SW200  E. coli DH5α  SW1/pL4Mut66 (pASK-IBA43+ with cotaq L4 mut 66, Apr)
SW201  E. coli DH5α  SW1/pL4Mut67 (pASK-IBA43+ with cotaq L4 mut 67, Apr)
SW202  E. coli DH5α  SW1/pL4Mut68 (pASK-IBA43+ with cotaq L4 mut 68, Apr)
SW203  E. coli DH5α  SW1/pL4Mut69 (pASK-IBA43+ with cotaq L4 mut 69, Apr)
SW202  E. coli DH5α  SW1/pL4Mut70 (pASK-IBA43+ with cotaq L4 mut 70, Apr)
SW203  E. coli DH5α  SW1/pL4Mut71 (pASK-IBA43+ with cotaq L4 mut 71, Apr)
SW204  E. coli DH5α  SW1/pL4Mut72 (pASK-IBA43+ with cotaq L4 mut 72, Apr)
SW205  E. coli DH5α  SW1/pL4Mut73 (pASK-IBA43+ with cotaq L4 mut 73, Apr)
SW206  E. coli DH5α  SW1/pL4Mut74 (pASK-IBA43+ with cotaq L4 mut 74, Apr)
SW207  E. coli DH5α  SW1/pL4Mut75 (pASK-IBA43+ with cotaq L4 mut 75, Apr)
SW208  E. coli DH5α  SW1/pL4Mut76 (pASK-IBA43+ with cotaq L4 mut 76, Apr)
SW209  E. coli DH5α  SW1/pL4Mut77 (pASK-IBA43+ with cotaq L4 mut 77, Apr)
SW210  E. coli DH5α  SW1/pL4Mut78 (pASK-IBA43+ with cotaq L4 mut 78, Apr)
SW211  E. coli DH5α  SW1/pL4Mut79 (pASK-IBA43+ with cotaq L4 mut 79, Apr)
SW212  E. coli TG-1  SW1/pL4Mut1 (pASK-IBA43+ with cotaq L4 mut 1, Apr)
SW213  E. coli TG-1  SW1/pL4Mut2 (pASK-IBA43+ with cotaq L4 mut 2, Apr)
SW214  E. coli TG-1  SW1/pL4Mut3 (pASK-IBA43+ with cotaq L4 mut 3, Apr)
SW215  E. coli TG-1  SW1/pL4Mut4 (pASK-IBA43+ with cotaq L4 mut 4, Apr)
SW216  E. coli TG-1  SW1/pL4Mut5 (pASK-IBA43+ with cotaq L4 mut 5, Apr)
SW217  E. coli TG-1  SW1/pL4Mut6 (pASK-IBA43+ with cotaq L4 mut 6, Apr)
SW218  E. coli TG-1  SW1/pL4Mut7 (pASK-IBA43+ with cotaq L4 mut 7, Apr)
SW219  E. coli TG-1  SW1/pL4Mut8 (pASK-IBA43+ with cotaq L4 mut 8, Apr)
SW220  E. coli TG-1  SW1/pL4Mut9 (pASK-IBA43+ with cotaq L4 mut 9, Apr)
SW221  E. coli TG-1  SW1/pL4Mut10 (pASK-IBA43+ with cotaq L4 mut 10, Apr)
SW222  E. coli TG-1  SW1/pL4Mut11 (pASK-IBA43+ with cotaq L4 mut 11, Apr)
SW223  E. coli TG-1  SW1/pL4Mut12 (pASK-IBA43+ with cotaq L4 mut 12, Apr)
SW224  E. coli TG-1  SW1/pL4Mut13 (pASK-IBA43+ with cotaq L4 mut 13, Apr)
SW225  E. coli TG-1  SW1/pL4Mut14 (pASK-IBA43+ with cotaq L4 mut 14, Apr)
SW226  E. coli TG-1  SW1/pL4Mut15 (pASK-IBA43+ with cotaq L4 mut 15, Apr)
SW227  E. coli TG-1  SW1/pL4Mut16 (pASK-IBA43+ with cotaq L4 mut 16, Apr)
SW228  E. coli TG-1  SW1/pL4Mut17 (pASK-IBA43+ with cotaq L4 mut 17, Apr)
SW229  E. coli TG-1  SW1/pL4Mut18 (pASK-IBA43+ with cotaq L4 mut 18, Apr)
SW230  E. coli TG-1  SW1/pL4Mut20 (pASK-IBA43+ with cotaq L4 mut 20, Apr)
SW231  E. coli TG-1  SW1/pL4Mut21 (pASK-IBA43+ with cotaq L4 mut 21, Apr)
SW232  E. coli TG-1  SW1/pL4Mut22 (pASK-IBA43+ with cotaq L4 mut 22, Apr)
SW233  E. coli TG-1  SW1/pL4Mut23 (pASK-IBA43+ with cotaq L4 mut 23, Apr)
SW234  E. coli TG-1  SW1/pL4Mut26 (pASK-IBA43+ with cotaq L4 mut 26, Apr)
SW235  E. coli TG-1  SW1/pL4Mut27 (pASK-IBA43+ with cotaq L4 mut 27, Apr)
SW236  E. coli TG-1  SW1/pL4Mut28 (pASK-IBA43+ with cotaq L4 mut 28, Apr)
SW237  E. coli TG-1  SW1/pL4Mut29 (pASK-IBA43+ with cotaq L4 mut 29, Apr)
SW238  E. coli TG-1  SW1/pL4Mut30 (pASK-IBA43+ with cotaq L4 mut 30, Apr)
SW239  E. coli TG-1  SW1/pL4Mut31 (pASK-IBA43+ with cotaq L4 mut 31, Apr)
SW240  E. coli TG-1  SW1/pL4Mut32 (pASK-IBA43+ with cotaq L4 mut 32, Apr)
SW241  E. coli TG-1  SW1/pL4Mut33 (pASK-IBA43+ with cotaq L4 mut 33, Apr)
SW242  E. coli TG-1  SW1/pL4Mut34 (pASK-IBA43+ with cotaq L4 mut 34, Apr)
SW243  E. coli TG-1  SW1/pL4Mut35 (pASK-IBA43+ with cotaq L4 mut 35, Apr)
SW244  E. coli TG-1  SW1/pL4Mut36 (pASK-IBA43+ with cotaq L4 mut 36, Apr)
SW245  E. coli TG-1  SW1/pL4Mut37 (pASK-IBA43+ with cotaq L4 mut 37, Apr)
SW246  E. coli TG-1  SW1/pL4Mut38 (pASK-IBA43+ with cotaq L4 mut 38, Apr)
SW247  E. coli TG-1  SW1/pL4Mut39 (pASK-IBA43+ with cotaq L4 mut 39, Apr)
SW248  E. coli TG-1  SW1/pL4Mut40 (pASK-IBA43+ with cotaq L4 mut 40, Apr)
SW249  E. coli TG-1  SW1/pL4Mut41 (pASK-IBA43+ with cotaq L4 mut 41, Apr)
SW250  E. coli TG-1  SW1/pL4Mut42 (pASK-IBA43+ with cotaq L4 mut 42, Apr)
SW251  E. coli TG-1  SW1/pL4Mut43 (pASK-IBA43+ with cotaq L4 mut 43, Apr)
SW252  E. coli TG-1  SW1/pL4Mut44 (pASK-IBA43+ with cotaq L4 mut 44, Apr)
SW253  E. coli TG-1  SW1/pL4Mut45 (pASK-IBA43+ with cotaq L4 mut 45, Apr)
SW254  E. coli TG-1  SW1/pL4Mut46 (pASK-IBA43+ with cotaq L4 mut 46, Apr)
SW255  E. coli TG-1  SW1/pL4Mut48 (pASK-IBA43+ with cotaq L4 mut 48, Apr)
SW256  E. coli TG-1  SW1/pL4Mut50 (pASK-IBA43+ with cotaq L4 mut 50, Apr)
SW257  E. coli TG-1  SW1/pL4Mut51 (pASK-IBA43+ with cotaq L4 mut 51, Apr)
SW258  E. coli TG-1  SW1/pL4Mut52 (pASK-IBA43+ with cotaq L4 mut 52, Apr)
SW259  E. coli TG-1  SW1/pL4Mut53 (pASK-IBA43+ with cotaq L4 mut 53, Apr)
SW260  E. coli TG-1  SW1/pL4Mut54 (pASK-IBA43+ with cotaq L4 mut 54, Apr)
SW261  E. coli TG-1  SW1/pL4Mut55 (pASK-IBA43+ with cotaq L4 mut 55, Apr)
SW262  E. coli TG-1  SW1/pL4Mut56 (pASK-IBA43+ with cotaq L4 mut 56, Apr)
SW263  E. coli TG-1  SW1/pL4Mut57 (pASK-IBA43+ with cotaq L4 mut 57, Apr)
SW264  E. coli TG-1  SW1/pL4Mut58 (pASK-IBA43+ with cotaq L4 mut 58, Apr)
SW265  E. coli TG-1  SW1/pL4Mut59 (pASK-IBA43+ with cotaq L4 mut 59, Apr)
SW266  E. coli TG-1  SW1/pL4Mut60 (pASK-IBA43+ with cotaq L4 mut 60, Apr)
SW267  E. coli TG-1  SW1/pL4Mut61 (pASK-IBA43+ with cotaq L4 mut 61, Apr)
SW268  E. coli TG-1  SW1/pL4Mut62 (pASK-IBA43+ with cotaq L4 mut 62, Apr)
SW269  E. coli TG-1  SW1/pL4Mut63 (pASK-IBA43+ with cotaq L4 mut 63, Apr)
SW270  E. coli TG-1  SW1/pL4Mut64 (pASK-IBA43+ with cotaq L4 mut 64, Apr)
SW271  E. coli TG-1  SW1/pL4Mut65 (pASK-IBA43+ with cotaq L4 mut 65, Apr)
SW272  E. coli TG-1  SW1/pL4Mut66 (pASK-IBA43+ with cotaq L4 mut 66, Apr)
SW273  E. coli TG-1  SW1/pL4Mut67 (pASK-IBA43+ with cotaq L4 mut 67, Apr)
SW274  E. coli TG-1  SW1/pL4Mut68 (pASK-IBA43+ with cotaq L4 mut 68, Apr)
SW275  E. coli TG-1  SW1/pL4Mut69 (pASK-IBA43+ with cotaq L4 mut 69, Apr)
SW276  E. coli TG-1  SW1/pL4Mut70 (pASK-IBA43+ with cotaq L4 mut 70, Apr)
SW277  E. coli TG-1  SW1/pL4Mut71 (pASK-IBA43+ with cotaq L4 mut 71, Apr)
SW278  E. coli TG-1  SW1/pL4Mut72 (pASK-IBA43+ with cotaq L4 mut 72, Apr)
SW279  E. coli TG-1  SW1/pL4Mut73 (pASK-IBA43+ with cotaq L4 mut 73, Apr)
SW280  E. coli TG-1  SW1/pL4Mut74 (pASK-IBA43+ with cotaq L4 mut 74, Apr)
SW281  E. coli TG-1  SW1/pL4Mut75 (pASK-IBA43+ with cotaq L4 mut 75, Apr)
SW282  E. coli TG-1  SW1/pL4Mut76 (pASK-IBA43+ with cotaq L4 mut 76, Apr)
SW283  E. coli TG-1  SW1/pL4Mut77 (pASK-IBA43+ with cotaq L4 mut 77, Apr)
SW284  E. coli TG-1  SW1/pL4Mut78 (pASK-IBA43+ with cotaq L4 mut 78, Apr)
SW285  E. coli TG-1  SW1/pL4Mut79 (pASK-IBA43+ with cotaq L4 mut 79, Apr)

Table 4-2. L4 Mutant Library.

Plasmid Name  Mutations Present in L4 Taq Library
pL4Mut1       L4Q,G16S,R91H,E292G,D575N,S620P
pL4Mut2       V110A
pL4Mut3       G197C,F269S,K790R
pL4Mut4       L409P,V615I,K828R
pL4Mut5       L27Q,L30Q,R263S,L273R,L409P
pL4Mut6       F89S,I160T,P261L,GAP
pL4Mut7       V38G,K222E,F255I,E407A,E691A
pL4Mut8       P552L,L765P
pL4Mut9       N482I
pL4Mut10      A83G,I135N,L285Q,Y336H,GAP
pL4Mut11      G393S,T444P,M670T,E710G
pL4Mut12      L789P
pL4Mut13      E6D,K351R,A797D,G821D
pL4Mut14      L122Q,D341G,A411V
pL4Mut15      I596M,M643V
pL4Mut16      A115P,L458R,H558P,H617R,L654P,K801E
pL4Mut17      Y113H,G276V,L409P
pL4Mut18      R693C,E731G,V812A
pL4Mut19      NONE
pL4Mut20      F44L,T183A,K194E,L291P
pL4Mut21      K337E
pL4Mut22      L362P,M441I,E517V
pL4Mut23      G81D,K203E,V446A,D634G
pL4Mut24      NONE
pL4Mut25      NONE
pL4Mut26      GAP
pL4Mut27      L279P,S287T
pL4Mut28      L12P,V133M,N217D,L266P,E300G,L546P,W703R,L825P
pL4Mut29      A231V,GAP
pL4Mut30      K203I,A268G,D544N,R633L,T753A,V763A
pL4Mut31      L285P,GAP
pL4Mut32      R91L,E534G
pL4Mut33      L221P,E264V,K528R,GAP
pL4Mut34      L777P,Stop830
pL4Mut35      N624Y
pL4Mut36      K337E
pL4Mut37      A126P,GAP
pL4Mut38      D185V,L491P,M643V
pL4Mut39      Y75C,K216E,S377T,A565V,M758T,E770D
pL4Mut40      GAP,GAP
pL4Mut41      E794G,M804V
pL4Mut42      L530P,K539N,L654P
pL4Mut43      F44I,E167G
pL4Mut44      Y392F,N412D,N562D,E649G
pL4Mut45      P809T,E227K
pL4Mut46      GAP
pL4Mut47      NONE
pL4Mut48      GAP
pL4Mut49      NONE
pL4Mut50      K216I,A455D,V651E
pL4Mut51      L108P,G197S,L377P,R390C,T503A,GAP
pL4Mut52      K125M
pL4Mut53      E167G,S309P,T719A,L814Q
pL4Mut54      W315C,T506P
pL4Mut55      GAP
pL4Mut56      Y158C,S309T,A404T,S540G,M758T
pL4Mut57      V38A,K337E,V796D
pL4Mut58      D101G
pL4Mut59      H330P,GAP
pL4Mut60      R91H,F561S
pL4Mut61      D493G
pL4Mut62      S121T,E599V,Y808H
pL4Mut63      A80V,R220C,G367C,D378N,R389L,S574G,G752D,M776L
pL4Mut64      L4R,I150N,K203R,K337E,Q563Stop,P647L,A774V
pL4Mut65      W425R,E771G,V796D
pL4Mut66      L13P,A565V
pL4Mut67      L93P,E156G,A213T,P299S,N580S,E771G,V796D
pL4Mut68      E109K,G209C,W240R,L373Q
pL4Mut69      K46E,V118A,L218P,I529T,Q579H
pL4Mut70      G184C,L491P,R556H
pL4Mut71      S309Y,D369G,F479S,I581V,A605T
pL4Mut72      E420D,E678G,K828E
pL4Mut73      K216I,T503A,T511M,Q589R,K759E
pL4Mut74      V780A
pL4Mut75      K759R
pL4Mut76      E109K,G209C,W240R,L373Q
pL4Mut77      F721L
pL4Mut78      Y42H,R220C,W425Stop,R590W
pL4Mut79      Y169C,T247A,D248G,E638G

*All are derivatives of the cotaq gene, and all are inserted into the pASK-IBA43plus vector. NONE means no mutations were found relative to the coTaq sequence; GAP denotes the presence of a frameshift mutation within the protein; and Stop indicates the presence of a Stop codon in the sequence.

Figure 4-2. Representative images of ethidium bromide-stained agarose gels resolving products arising from PCR amplification using standard dNTPs and three different polymerases. Cells expressing the indicated polymerase provided both the polymerase and the template plasmid for the reaction. Polymerases were therefore forced to replicate their own encoding gene (2603 bp) using primers P-4 and P-5. Optimal temperatures for each polymerase were determined by identifying the FLP band having the highest density. A) The coTaq polymerase, having an optimal temperature of 89.0 °C. B) The polymerase expressed in SW17 cells, having an optimal temperature of 89.0 °C. C) The polymerase expressed in SW251 cells, having an optimal temperature of 94.0 °C.
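The rule stated in the caption, taking the temperature whose FLP band has the highest density, can be written down in a few lines. This is a minimal sketch with a hypothetical function name and made-up density values; it is not the GeneTools workflow itself, and it treats a polymerase with no detectable band as having no optimum.

```python
from typing import Mapping, Optional


def optimal_temperature(densities: Mapping[float, float]) -> Optional[float]:
    """Return the temperature (C) with the densest FLP band, or None if no band."""
    if not densities or max(densities.values()) <= 0:
        return None
    # On ties, the first temperature listed with the maximum density is returned.
    return max(densities, key=lambda temp: densities[temp])


# Hypothetical raw densities (CNT/mm^2) keyed by highest cycle temperature.
example = {86.3: 120.0, 89.0: 310.0, 91.1: 280.0, 92.6: 150.0, 93.7: 40.0, 94.0: 0.0}
inactive = {86.3: 0.0, 89.0: 0.0, 94.0: 0.0}

print(optimal_temperature(example))
print(optimal_temperature(inactive))
```

With the made-up values above, the densest band lies at 89.0 °C, and the all-zero profile yields no optimum, matching the "-" entries used for inactive clones in Tables 4-3 and 4-4.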

123 Table 4-3. Generation of full le ngth PCR products from dNTPs by individual polymerases from the rationally designed (RD) Librar y at the indicated temperatures. Optimal Temp 86.389.091.192.693.794.0SW4Codon-Optimized (co) wt Taq89.0268194229250662705135257010124747212364200 SW5S573E,Y668F,A740S86.3316793560300000 SW6Q486H,K537I,M670G-000000 SW7A605G,L613A,E739P86.3260207325100251400114000 SW8D575F,L606C,A740S86.3351275533451223215369315530130026912914907 SW9T511V,R584V,I611E-000000 SW10N480R,F595V,E742H86.3178875974183210372981071869605564354588 SW11E517I,V583K,A597S89.0675098995847868690798423724526781743 SW12D575F,V583K,M670A86.3332982705726676270172780330258 SW13E517I,D607W,D622S-000000 SW14A594C,F664Y,A774H86.3168797715292991178437873870516390284889 SW15F595W,L606P,D622S86.33107200000 SW16S573E,D575F,F595V89.087879997897443058511256200 SW17S510I,A605K,L606S89.0206480722506151780581144747626625939332 SW18S573E,D622L,E742H-000000 SW19N480R,T511V,Y542E-000000 SW20A594C,F664H,M670G-000000 SW21Q486H,D575T,N580S86.3251691423776422415212200199520986861867511 SW22S510I,A605E,E612I-000000 SW23A594C,E612I,M670A-000000 SW24S510I,Q579A,I611Q-000000 SW25A594T,L606C,R657D89.0234369424146862041814189763314898401095528 SW26T541A,A605G,L606S86.31957603184683710809389678200 SW27E517G,K537I,L613A86.3172882514688641031245801755537510465426 SW28K537I,Q579A,E742V-000000 SW29A597S,A740R,E742V266993926726522649972230814422939932126919 SW30N580Q,A605E,L613I89.01055603173173514224634610047284828402 SW31N580S,F595V,A605G86.315212291380438119473546163010528243184 SW32N580S,D622L,A774H-000000 SW33R533I,R584V,F664L-000000 SW34Q486H,E517G,A605K89.0258296326735152412059229583523197942054398 SW35S573H,F664Y,R743A-000000 SW36D575T,N580Q,R584V89.011630319172316710714857210195277632 SW37T541A,F664L,R743A-000000 SW38T511V,R533I,D622A-000000 SW39A597S,I611E,Y668F89.0415640955460000 SW40Y542E,F595W,L606C-000000 SW41L606S,R657D,E739R89.072735927889782565988238419618471721518298 
SW42S573H,D575T,L613I-000000 SW43T541A,L606P,L613D-000000 SW44Y542E,V583K,A605E-000000 SW45E517I,F595W,A605E,I611E-000000 SW46T541A,D575F,L613A,D622A89.0514801861910000 SW47T511V,A594C,L606S,A740R89.0164626023878012248001216287118239571588298 SW48Q486H,R533I,L606C,L613A-000000 SW49Q486H,F595V,D622A,F664Y89.018892220666128711000 SW50E517I,S573H,A605G,E612I-000000 SW51Y542E,R584V,A605K,E612I-000000 SW52D575T,A605E,L606C,D622A86.37086000000 SW53A594T,L613A,F664Y,E742H86.321252962020864148773622525400 SW54D575F,N580Q,W601G,D622S86.319848531070720000 SW55K537I,L606P,A740S,E742H86.329671022375974566103000 SW56A597S,W601G,L606S,F664H86.36042231618500000 SW57S510I,E517G,D607W,I611E-000000 SW58S510I,V583K,R584V,L606P-000000 SW59N480R,R533I,A597S,M670G-000000 SW60E612I,D622L,F664L,E739P-000000 SW61I611Q,M670G,E739P,E742H-000000 SW62F595W,F664H,Y668F,E739P-000000 SW63A597S,A605G,D622A,F664L-000000 SW64L606P,I611E,E739R,R743A-000000 SW65D607W,I611Q,R657D,E742V-000000 SW66T541A,I611Q,L613I,D622L-000000 SW67K537I,S573H,N580S,D622S-000000 SW68N480R,S573E,D607W,A740R-000000 SW69D575T,L613D,E739R,A774H86.33541500000 SW70Q579A,R657D,F664Y,A740R-000000 SW71R533I,K537I,A605K,L613I-000000 SW72T511V,E517G,L606C,F664Y89607861899097748202686452668056716970 SW73D575T,F664H,E742V,R743A-000000 SW74A594C,I611E,F664L,A740S86.314672399162620000 SW75N580S,L613A,A740S,R743A-000000 SW76S510I,T511V,L613I,E739R8910118151256969984215820803736008655700 SW77V583K,E612I,L613D,Y668F-000000 SW78S573E,R584V,A594C,D622S-000000Cell Line Substitutions RawDensities(CNT/mm2) *The pink rows indicate the sequences of polymer ases that generated full-length PCR product at a temperature at 94.0 C and below. The green rows indicate th e sequences of polymerases that generated full-length PCR product at a temperature between 86.3 C and 93.7 C, but not at 94.0 C, suggesting thermal instability. The blue rows indicate the seque nces of polymerases that lack evidence of activity at any temperature.

124 Table 4-4. Generation of full le ngth PCR products from dNTPs by individual polymerases from the randomly generated (L4) Librar y at the indicated temperatures. Optimal Temp 86.389.091.192.693.794.0SW489.0268194229250662705135257010124747212364200 SW212pL4Q,G16S,R91H,E292G,D575N,S620P-000000 SW213V110A94.000629162696297776068789298 SW214G197C,F269S,K790R-000000 SW215pL409P,V615I,K828R-000000 SW216L27Q,L30Q,R263S,L273R,pL409P-000000 SW217F89S,I160T,P261L, GAP-000000 SW218V38G,K222E,F255I,E407A,E691A-000000 SW219P552L,L765P86.37549500000 SW220N482I91.1294562308802352467341925313543303106 SW221A83G,I135N,L285Q,Y336H, GAP-000000 SW222G393S,T444P,M670T,E710G-000000 SW223L789P89.019670623508036241000 SW224E6D,K351R,A797D,G821D-000000 SW225L122Q,D341G,A411V94.0276216318539401690406474409339439202 SW226I596M,M643V89.0197651122668752145291202764219638161765708 SW227A115P,pL458R,H558P,H617R,L654P,K801E-000000 SW228Y113H,G276V,pL409P-000000 SW229R693C,E731G,V812A89.022959072574544192912428735100 SW230F44L,T183A,K194E,L291P91.10152790181860171210136258109580 SW231K337E89.015565028302562594346246507024534822217280 SW232L362P,M441I,E517V-000000 SW233G81D,K203E,V446A,D634G-000000 SW234GAP-000000 SW235L279P,S287T92.6191930220367302023186214182719915951881015 SW236L12P,V133M,N217D,L266P,E300G,L546P,W703R,L825P-000000 SW237A231V, GAP-000000 SW238K203I,A268G,D544N,R633L,T753A,V763A86.3125304455440000 SW239L285P, GAP-000000 SW240R91L,E534G91.158916522023622313418208963620264481996181 SW241L221P,E264V,K528R,GAP-000000 SW242L777P,Stop830,Stop83189.0751487110538483247650416118801497316 SW243N624Y89.0337177805380736375693938672573595106 SW244K337E89.030555324944042346465218061820739382098704 SW245A126P, GAP-000000 SW246D185V,pL491P,M643V-000000 SW247Y75C,K216E,S377T,A565V,M758T,E770D89.01586157228832819739941367562456778125527 SW248GAP,GAP-000000 SW249E794G,M804V86.3956992887580701751485211214733116760 SW250L530P,K539N,L654P86.3544325413000000 
SW251F44I,E167G94.045033510722661113110117097211655281283853 SW252Y392F,N412D,N562D,E649G89.057813546638440068335711216585156511 SW253P809T,E227K89.08289921840442140841200550618722901813403 SW254GAP-000000 SW255GAP-000000 SW256K216I,A455D,V651E86.3385109294115251941130359478790 SW257L108P,G197S,L377P,R390C,T503A,GAP-000000 SW258K125M89.0495313796179723534627670527588704237 SW259E167G,S309P,T719A,L814Q-000000 SW260W315C,T506P86.32319736328423784000 SW261GAP-000000 SW262Y158C,S309T,A404T,S540G,M758T86.31952541187997717164861218814828628460152 SW263V38A,K337E,V796D86.312471459353354286515095800 SW264D101G89.0257503026354502621795256949025963922453437 SW265H330P, GAP-000000 SW266R91H,F561S86.3213187018214857736515127300 SW267D493G86.3286101728455382527108231985724127472320855 SW268S121T,E599V,Y808H86.31142750111471110423421019279945295915580 SW269A80V,R220C,G367C,D378N,R389L,S574G,G752D,M776L-000000 SW270pL4R,I150N,K203R,K337E,Q563Stop,P647L,A774V86.3410343261521904000 SW271W425R,E771G,V796D-000000 SW272L13P,A565V-000000 SW273L93P,E156G,A213T,P299S,N580S,E771G,V796D86.333845221210000 SW274E109K,G209C,W240R,L373Q91.16621178915921018264961926605490705172 SW275K46E,V118A,L218P,I529T,Q579H-000000 SW276G184C,pL491P,R556H-000000 SW277S309Y,D369G,F479S,I581V,A605T-000000 SW278E420D,E678G,K828E86.31392467122746198092254000612969440950 SW279K216I,T503A,T511M,Q589R,K759E-000000 SW280V780A89.0146360415253261383685130521012396261132704 SW281K759R89.0171381417365441434711149336013785101262243 SW282E109K,G209C,W240R,L373Q92.6163899229883278944333824180119193040 SW283F721L93.7104251415106171497583150548215585791448625 SW284Y42H,R220C,W425Stop,R590W-000000 SW285Y169C,T247A,D248G,E638G91.1552304643967674392593949395651340181Raw Densities(CNT/mm2) Cell Line Substitutions *The pink rows indicate the sequences of polymer ases that generated full-length PCR product at a temperature at 94.0 C and below. 
The green rows indicate the sequences of polymerases that generated full-length PCR product at a temperature between 86.3 °C and 93.7 °C, but not at 94.0 °C, suggesting thermal instability. The blue rows indicate the sequences of polymerases that lack evidence of activity at any temperature.
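The colour coding described in the footnotes to Tables 4-3 and 4-4 amounts to a three-way rule on each row of densities. A minimal sketch of that rule follows; the function name and the example rows are hypothetical, and any density above zero is assumed to count as a detectable FLP band.

```python
from typing import Mapping

HIGHEST_TEMP_C = 94.0  # highest cycle temperature screened


def row_colour(densities: Mapping[float, float]) -> str:
    """Classify a polymerase by where it produced full-length product."""
    if all(density <= 0 for density in densities.values()):
        return "blue"    # no evidence of activity at any temperature
    if densities.get(HIGHEST_TEMP_C, 0) > 0:
        return "pink"    # active at 94.0 C (and possibly below)
    return "green"       # active only below 94.0 C, suggesting thermal instability


# Hypothetical rows for illustration only.
print(row_colour({86.3: 5.0, 89.0: 9.0, 94.0: 2.0}))
print(row_colour({86.3: 5.0, 89.0: 1.0, 94.0: 0.0}))
print(row_colour({86.3: 0.0, 89.0: 0.0, 94.0: 0.0}))
```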

Figure 4-3. Number of active RD and L4 mutants at various temperatures. A) The number of polymerases from the RD Library (a total of 74) that show a FLP band after a PCR run at the indicated temperature. B) The number of polymerases from the L4 Library (a total of 74) that show a FLP band after a PCR run at the indicated temperature. Bar values at 86.3, 89.0, 91.1, 92.6, 93.7, and 94.0 °C: A) 33, 30, 24, 21, 18, and 17 active RD mutants; B) 37, 37, 35, 32, 29, and 28 active L4 mutants.
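The bar values of Figure 4-3 can be restated as shares of each 74-clone library, which makes the two libraries easier to compare. The sketch below is illustrative only: the counts are read from the bar labels as they survive in this extracted text, and the names `ACTIVE` and `retained_at_highest` are hypothetical.

```python
TEMPS_C = [86.3, 89.0, 91.1, 92.6, 93.7, 94.0]
TOTAL_CLONES = 74  # clones screened per library

# Active-clone counts per temperature, read from the Figure 4-3 bar labels.
ACTIVE = {
    "RD": [33, 30, 24, 21, 18, 17],
    "L4": [37, 37, 35, 32, 29, 28],
}


def retained_at_highest(counts: list) -> int:
    """Percent of clones active at the lowest temperature that remain active at the highest."""
    return round(100 * counts[-1] / counts[0])


for library, counts in ACTIVE.items():
    shares = {t: round(100 * n / TOTAL_CLONES) for t, n in zip(TEMPS_C, counts)}
    print(library, shares, f"retained at 94.0 C: {retained_at_highest(counts)}%")
```

On these counts, roughly half of the RD clones active at 86.3 °C are still active at 94.0 °C, against roughly three quarters for L4, which agrees with the greater thermostability reported for the random library.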

Table 4-5. Incorporation of dψUTP by RD Library at optimal temperatures.
Density columns, in order: All dNTPs; 9 mM dT/1 mM dψU; 8 mM dT/2 mM dψU; 7 mM dT/3 mM dψU; 6 mM dT/4 mM dψU; 5 mM dT/5 mM dψU; 4 mM dT/6 mM dψU; 3 mM dT/7 mM dψU; 2 mM dT/8 mM dψU; 1 mM dT/9 mM dψU; All dψUNTPs.
Cell Line | Substitutions | Optimal Temp (°C) | Raw Densities (CNT/mm2), column values run together in the source text
SW4 | Codon-Optimized (co) wt Taq | 86.3 | 2312987224579021186821882497198660714670549047812780053747400
SW4 | Codon-Optimized (co) wt Taq | 89.0 | 2897387275582726183082278423195625612299737194591657563293700
SW5 | S573E,Y668F,A740S | 86.3 | 47790255425013124617572513672025594245420000
SW7 | A605G,L613A,E739P | 86.3 | 1602269164626414012681060751775918415561175702567003200300
SW8 | D575F,L606C,A740S | 86.3 | 165171811603441175086125610514901729358595952031570664928700
SW10 | N480R,F595V,E742H | 86.3 | 2447053188390901937412211411113316459245212238214720100
SW11 | E517I,V583K,A597S | 89.0 | 152258190882289567580425210199947326453907501197772040200
SW12 | D575F,V583K,M670A | 86.3 | 244500000000000
SW14 | A594C,F664Y,A774H | 86.3 | 20425191602378142509812586171222901625304221786591123336700
SW15 | F595W,L606P,D622S | 86.3 | 00000000000
SW16 | S573E,D575F,F595V | 89.0 | 4991013011112295931484614725355941 00000
SW17 | S510I,A605K,L606S | 89.0 | 2533093208144322813081758469134736945164719613145945000
SW21 | Q486H,D575T,N580S | 86.3 | 1962394162200216466171759955180613910223595461301619263478400
SW25 | A594T,L606C,R657D | 89.0 | 844336562078403590220913125825545140000
SW26 | T541A,A605G,L606S | 86.3 | 2092635220753821145001849280165317911636365713551662054149100
SW27 | E517G,K537I,L613A | 86.3 | 10965915530515473874342275430882353269467349521000
SW29 | A597S,A740R,E742V | 86.3 | 27937532548317271680923546232534067197610513381444711157912200
SW30 | N580Q,A605E,L613I | 89.0 | 7227114541554309933462762199601368387143435934000
SW31 | N580S,F595V,A605G | 86.3 | 1423097148359513104191238333162965812267287135711933365845700
SW34 | Q486H,E517G,A605K | 89.0 | 2258399219216316951771831735173280411284605227431496573929500
SW36 | D575T,N580Q,R584V | 89.0 | 1553454950046602523935406929149147700000
SW39 | A597S,I611E,Y668F | 86.3 | 75644657530434816321083913937747287281600000
SW41 | L606S,R657D,E739R | 89.0 | 423788423788423788423788423788423788423788423788000
SW46 | T541A,D575F,L613A,D622A | 89.0 | 541353801726102432803675129931 00000
SW47 | T511V,A594C,L606S,A740R | 89.0 | 5434104473103963383829123215331396615505120913000
SW49 | Q486H,F595V,D622A,F664Y | 89.0 | 4535643213053092882059219984638405381840000
SW52 | D575T,A605E,L606C,D622A | 86.3 | 3382123322862953452557352052851153094462118348000
SW53 | A594T,L613A,F664Y,E742H | 86.3 | 178242012894521101912571901372153113447368700000
SW54 | D575F,N580Q,W601G,D622S | 86.3 | 15830121815276194069816510941410232846848395770804723643900
SW55 | K537I,L606P,A740S,E742H | 86.3 | 17545981788506163284412788791102959424994166452452632165200
SW56 | A597S,W601G,L606S,F664H | 86.3 | 1519620 000000000
SW69 | D575T,L613D,E739R,A774H | 86.3 | 373410000000000
SW72 | T511V,E517G,L606C,F664Y | 89.0 | 24472118650016105113593096028478522417712878000
SW74 | A594C,I611E,F664L,A740S | 86.3 | 130697610750337387814738352737451047174454029520000
SW76 | S510I,T511V,L613I,E739R | 89.0 | 437213262515229710991671689081019173743222388000
*The pink rows indicate the polymerases that showed activity with a temperature of 94.0 °C and lower; these data can be compared to those in Table 3-4. The green rows indicate the polymerases that showed activity with a temperature between 86.3 °C and 93.7 °C; these had no activity at 94.0 °C, suggesting thermal instability.

PAGE 127

Figure 4-4. Generation of full length PCR product at 86.3 °C using dψUTP by the coTaq polymerase and the RD polymerase in the SW29 cell line. Concentrations of dNTPs/dψUNTPs listed are the starting concentrations (see Materials and Methods for listing of final concentrations). A) Incorporation of various dNTP/dψUNTP ratios by coTaq polymerase. FLP is not generated beyond the ratio of 2 mM TTP/8 mM dψUTP. B) Incorporation of various dNTP/dψUNTP ratios by SW29 cells. FLP is not generated beyond the ratio of 2 mM TTP/8 mM dψUTP. C) A graphical comparison of the band densities in each of these gels. The red columns correspond to the bands in gel A; the blue columns represent those in gel B. Densities can also be found in Table 4-5.
[Panels A and B: gel images. Panel C: bar chart; x-axis: Ratios of dT/dψU (All dNTPs, 9 mM/1 mM through 1 mM/9 mM, All dψUNTPs); y-axis: Raw Density (CNT/mm2); series: SW4, SW29.]

PAGE 128

Figure 4-5. Generation of full length PCR product at 94.0 °C and 86.3 °C using dψUTP by the RD polymerase in the SW8 cell line. Concentrations of dNTPs/dψUNTPs listed are the starting concentrations (see Materials and Methods for listing of final concentrations). A) Incorporation of various dNTP/dψUNTP ratios by SW8 cells at 94.0 °C. FLP is not generated beyond the ratio of 3 mM TTP/7 mM dψUTP. B) Incorporation of various dNTP/dψUNTP ratios by SW8 cells at their optimal temperature of 86.3 °C. FLP is not generated beyond the ratio of 2 mM TTP/8 mM dψUTP. C) A graphical comparison of the band densities in each of these gels. The red columns correspond to the bands in gel A; the blue columns represent those in gel B. Densities can also be found in Table 3-4 and Table 4-3.
[Panels A and B: gel images. Panel C: bar chart; x-axis: Ratio of dT/dψU (All dNTPs, 9 mM/1 mM through 1 mM/9 mM, All dψUNTPs); y-axis: Raw Density (CNT/mm2); series: 94.0 °C, 86.3 °C.]

PAGE 129

Table 4-6. Incorporation of dψUTP and dψTTP by coTaq Polymerase at various temperatures.

dT/dψU series. Density columns, in order: 500 mM dNTPs; 450 mM dT/50 mM dψU; 400 mM dT/100 mM dψU; 350 mM dT/150 mM dψU; 300 mM dT/200 mM dψU; 250 mM dT/250 mM dψU; 200 mM dT/300 mM dψU; 150 mM dT/350 mM dψU; 100 mM dT/400 mM dψU; 50 mM dT/450 mM dψU; 500 mM dψUNTPs.
Cell Line | Melting Temp (°C) | Raw Densities (CNT/mm2), column values run together in the source text
SW4 | 86.3 | 2312987224579021186821882497198660714670549047812780053747400
SW4 | 89.0 | 2897387275582726183082278423195625612299737194591657563293700
SW4 | 91.1 | 6377503872434719694924142498951243695559027124000
SW4 | 92.6 | 455238323427258032222532153441623793700824186000
SW4 | 93.7 | 643793350500304969131233172586709103604825309000
SW4 | 94.0 | 2244256200537119956491535822125537958963718875264360000

dT/dψT series. Density columns, in order: 500 mM dNTPs; 450 mM dT/50 mM dψT; 400 mM dT/100 mM dψT; 350 mM dT/150 mM dψT; 300 mM dT/200 mM dψT; 250 mM dT/250 mM dψT; 200 mM dT/300 mM dψT; 150 mM dT/350 mM dψT; 100 mM dT/400 mM dψT; 50 mM dT/450 mM dψT; 500 mM dψTNTPs.
Cell Line | Melting Temp (°C) | Raw Densities (CNT/mm2), column values run together in the source text
SW4 | 86.3 | 131505617050271783891183784616645751361746114312811600867936121592480
SW4 | 89.0 | 1383077149902116827361506228171960816567901374745915308543815729100
SW4 | 91.1 | 805345860275820663804330826318729993593906425430278006588990
SW4 | 92.6 | 714674107472711322787828011109007713425545528463738200326378580
SW4 | 93.7 | 549273525350490532450728486603409646290744211733121997361270
SW4 | 94.0 | 364939431868517729363169446363332797302432239203112759347630

PAGE 130

Figure 4-6. Generation of full length PCR product at 86.3 °C by coTaq polymerase using various TTP:dψUTP and TTP:dψTTP ratios. Concentrations of dNTPs/dψUNTPs/dψTNTPs listed are the starting concentrations (see Materials and Methods for listing of final concentrations). A) Incorporation of various dNTP/dψUNTP ratios by coTaq polymerase at 86.3 °C. FLP is not generated beyond the ratio of 2 mM TTP/8 mM dψUTP. B) Incorporation of various dNTP/dψTNTP ratios by coTaq polymerase at 86.3 °C. FLP is not generated beyond the ratio of 1 mM TTP/9 mM dψTTP. C) A graphical comparison of the band densities in each of these gels. The red columns correspond to the bands in gel A; the blue columns represent those in gel B. Densities can also be found in Table 4-6.
[Panels A and B: gel images. Panel C: bar chart; x-axis: Ratio of dT to dψU/dψT (All dNTPs, 9 mM/1 mM through 1 mM/9 mM, All dψUNTPs/dψTNTPs); y-axis: Raw Density (CNT/mm2); series: dψU, dψT.]

PAGE 131

Figure 4-7. Graphical comparisons of the band densities listed in Table 4-6. All of the reactions were identical except for the temperature of the PCR and ratios of dT:dψU or dT:dψT. The red columns correspond to the densities of full-length PCR product bands present on gels containing the dψU studies. The blue columns correspond to the densities of the full-length PCR product bands present on gels containing the dψT studies. Data for the 86.3 °C PCR study is shown in Figure 4-6C. A) A graphical comparison of the band densities at 89.0 °C. B) A graphical comparison of the band densities at 91.1 °C. C) A graphical comparison of the band densities at 92.6 °C. D) A graphical comparison of the band densities at 93.7 °C. E) A graphical comparison of the band densities at 94.0 °C.
[Bar charts, panels A to E, one per temperature. x-axis: Ratio of dT to dψU or dψT (All dNTPs, 9 mM/1 mM through 1 mM/9 mM, All dψUNTPs/dψTNTPs); y-axis: Raw Density (CNT/mm2); series: dψU, dψT.]

PAGE 132

CHAPTER 5
CONCLUSIONS

The notion of creating an artificially expanded genetic information system (AEGIS) by adding extra letters to the DNA alphabet has sparked interest in determining which features of these non-standard nucleotides (NSBs) could hinder their incorporation by polymerases. Studies have been performed on the incorporation of a variety of NSBs, such as those lacking minor-groove electrons (Hendrickson et al., 2004), those with a C-glycosidic linkage (Lutz et al., 1999), and those that do not allow hydrogen bonds to form between the nucleobases (Delaney et al., 2003). However, prior to this study, only limited research had addressed the incorporation of multiple sequential NSBs. This dissertation focused on the stability of duplex DNA containing multiple C-glycosides, the ability of polymerases to incorporate multiple, sequential nucleotides containing a C-glycosidic linkage, and the directed evolution of polymerases to incorporate these nucleotides more efficiently and faithfully.

DNA Helical Structure in the Presence of C-Glycosides

Using circular dichroism (CD), previous studies of the helical structure of duplex DNA have shown that poly(U)·poly(A) helices favor the A-helical conformation while poly(dT)·poly(dA) helices display B-DNA structure (Ivanov et al., 1973, Saenger, 1984, Chandrasekaran and Radha, 1992). It was necessary to determine whether the presence of multiple C-glycosides in double-stranded DNA (dsDNA) would alter the conformation of the helix to the point of a phase transition from B-DNA to A-DNA, possibly leaving polymerases evolved to handle B-DNA helices unable to replicate DNA containing multiple C-glycosidic nucleotides. CD measurements were used to test dsDNA containing from one to twelve consecutive dψU·dA or dT·dA base pairs. Results indicated that at 25 °C, the addition of more dψU did not

PAGE 133

generate a trend in the CD spectra that might indicate a change from a B-helical conformation to an A-helical conformation. Relatively little difference was observed between the CD spectra of duplexes containing increasing numbers of 2′-deoxypseudouridine (dψU) nucleotides and those containing dT. These data suggest that gross conformational change should not prevent a polymerase from incorporating and replicating DNA containing these C-glycosides.

Polymerase Screen for the Incorporation of C-glycosides

Previous studies on the incorporation of C-glycosides required only that polymerases incorporate up to three consecutive 2′-deoxypseudothymidine (dψT) residues into a growing DNA strand (Lutz et al., 1999). To create an artificially expanded alphabet that freely incorporates C-glycosides, including the AEGIS alphabet that has three species with a C-glycosidic linkage (Fig. 1-4), polymerases would be required to incorporate more than three of these NSBs consecutively, efficiently, and faithfully. In the first part of this study, primer-extension assays were used to screen a number of Family A and Family B polymerases for their ability to incorporate and extend beyond four consecutive residues of each of the two representative C-glycosides, 2′-deoxypseudouridine (dψU) and 2′-deoxypseudothymidine (dψT). Studies described here showed that the Klenow (exo-), Bst Large Fragment, and Therminator polymerases performed exceptionally well in their ability to incorporate and fully extend beyond four consecutive dψT and dψU nucleotides. However, Klenow (exo-) and Bst are not thermostable, and thus cannot support PCR. Further, according to its manufacturer, Therminator is not recommended for any applications except DNA sequencing and primer-extension reactions. This means that none of these three polymerases was a likely candidate for future studies. Taq polymerase, however, which was also able to incorporate and

PAGE 134

extend beyond the four NSBs, albeit with less efficiency, is able to support high-temperature PCR. Taq was therefore selected as a candidate for further study. It is also interesting to note that, based on full-length product (FLP) band densities, the incorporation of dψTTP by polymerases appeared to be more efficient than the incorporation of dψUTP.

Taq Polymerase Primer-Extension Assays

If Taq is used as the starting point to obtain polymerases that accept C-glycosides, it must replicate its own encoding polymerase gene, forcing it to incorporate four consecutive dψT or dψU across from template dA, as this is the longest run of consecutive dAs in the taq polymerase gene. Since we had already shown that Taq can incorporate and extend beyond four consecutive C-glycosides, as seen in Chapter 2 of this dissertation, we next needed to test its ability to incorporate and extend beyond up to twelve consecutive dψT-dA or dψU-dA base pairs. Results showed that the production of FLP was terminated if it required the incorporation of more than five consecutive C-glycosides by Taq polymerase. The FLP band densities from these data showed that dψTTP was incorporated more efficiently than dψUTP. If this polymerase is to be used as a candidate for synthetic biology involving C-glycosides, it must first be modified by directed evolution experiments to allow it to incorporate more consecutive C-glycosides.

Growth and Purification of Taq Polymerase

A tightly regulated plasmid containing an N-terminal hexahistidine-tagged wt taq gene (His(6)wt Taq) was constructed and transformed into the E. coli TG-1 expression strain (Skerra, 1994). Growth and expression conditions were then optimized prior to using the cells in

PAGE 135

selection experiments. Previous studies showed that the expression of polymerases in vivo is toxic to the cells (Moreno et al., 2005, Andraos et al., 2004); this was also observed in these studies. Once the most favorable set of expression conditions was ascertained (a 1 hr expression following a late log phase induction), the His(6)wt Taq polymerase was purified via nickel chromatography and its activity was tested. Almost identical amounts of FLP were generated in a PCR when identical concentrations (ng/µL) of the purified His(6)wt Taq polymerase and Taq polymerase purchased from New England BioLabs were used; this indicates that the purified protein was indeed an active polymerase. It was noted that only a low level of His(6)wt Taq polymerase was being produced after 1 hr of induction, most likely because of polymerase toxicity. To rectify this situation, the gene encoding His(6)wt Taq polymerase was optimized for codon usage in E. coli (cotaq gene) by our collaborators, DNA 2.0 Inc (Gustafsson et al., 2004). The codon optimization does not affect the toxicity of the protein, but it does allow the E. coli cells to produce a greater amount of protein in the same amount of time. Once optimized, after one hour of induction, at least three times as much polymerase was produced, as evidenced by the density of the bands on a Coomassie blue stained SDS-PAGE (7.5%) gel (Fig. 3-6C). The cotaq gene was cloned into the tightly regulated plasmid with a histidine tag and transformed into E. coli cells, and its growth and expression conditions were compared to those of the His(6)wt Taq polymerase. The data revealed that under identical expression conditions, more polymerase was produced by cells expressing the coTaq polymerase than by those expressing the His(6)wt Taq polymerase. To maximize the formation of product in the directed evolution reactions, the cells containing coTaq polymerase were used in all further experiments.
This is the first example of a polymerase that has been optimized for the codon usage of the expression

PAGE 136

cell strain. The success in overproducing large quantities of active coTaq polymerase in E. coli, relative to the production of His(6)wt Taq, could be useful for other applications that require large amounts of protein, such as structural studies and commercial production.

Creation of coTaq Polymerase Mutant Libraries

The literature presents many different theories regarding the best methods to create a library most useful for directed evolution experiments. Such a library contains a large number of diverse, yet active clones (Hibbert and Dalby, 2005, Arnold and Georgiou, 2003b, Drummond et al., 2005, Park et al., 2005, Dalby, 2003, Parikh and Matsumura, 2005, Crameri et al., 1998, Crameri et al., 1996, Castle et al., 2004). For this dissertation, two of these methods were selected for comparative analysis. The first was a rationally designed (RD) library, generated by Dr. Eric Gaucher (FfAME), through the selection of specific replacement amino acids based on a combination of evolutionary analysis and previous functional studies. The second was a random library (termed L4), generated with mutations randomly spread across the whole polymerase sequence.

Creation of the Rationally Designed Mutagenic Library (RD Library)

The reconstructing evolutionary adaptive paths (REAP) approach was used to create the RD Library, allowing for modification at residues where Type II functional divergence occurred within a family of polymerases. In this approach, sites were identified that, in the historical evolution of the polymerase, had a split conserved but different pattern of evolutionary variation, and had previously been suggested to lead to a change in the function or behavior of the polymerase. Using this technique in combination with sequences discussed in a recent review on the evolution of novel polymerase activities (Henry and Romesberg, 2005), a total of 57 amino acid changes at 35 sites in the Taq polymerase sequence were chosen.
The 57

PAGE 137

replacement amino acid residues were selected from those found at those sites within the Family A viral polymerase sequences, as the literature has revealed that viral polymerases are more able to incorporate NSBs than other polymerases (Sismour et al., 2004, Leal et al., 2006, Horlacher et al., 1995). The FfAME collaborators at DNA 2.0 then created and synthesized the RD library containing 74 different mutant sequences; the 57 amino acid changes we specified were used in various combinations to yield three or four amino acid mutations per sequence. This approach to creating mutagenic libraries restricts the diversity based on evolutionary data, but in doing so was predicted to create a large number of active clones.

Creation of the Random Mutagenic Library (L4 Library)

The L4 random mutagenic library was created from the cotaq gene using error-prone PCR with the mutagens MnCl2 and Taq polymerase and with primers flanking either end of the gene (Arnold and Georgiou, 2003b). This allowed mutations to be located anywhere in the sequence of the gene. Rather than risk losing large quantities of the mutagenic PCR product during digestions, ligations, and purification, a variation of the megaprimer PCR protocol was used to create the full-length plasmids with inserts (Miyazaki and Takenouchi, 2002). This procedure was found to be extremely useful when creating libraries, as it generates crossover mutations and reversions, introducing more diversity. The 74 unique clones generated by these techniques contained approximately three amino acid changes per sequence, resulting from an average of 4.3 base mutations per gene. These combined procedures are recommended for creating future mutagenic libraries because of their simplicity, their low cost, and the ease with which they can be modified to increase or decrease the number of mutations per gene.
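As a rough illustration of how an average of 4.3 base mutations per gene might spread across individual clones, the sketch below assumes that error-prone PCR mutations follow a Poisson distribution. That distributional assumption and every name in the code are illustrative only; they are not part of the experimental work, and the 74 clones described here were chosen as unique sequences rather than sampled at random.

```python
import math

MEAN_BASE_MUTATIONS = 4.3  # average base mutations per gene reported for the L4 library
LIBRARY_SIZE = 74          # number of L4 clones characterized


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k mutations under an assumed Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_clones(k: int, mean: float = MEAN_BASE_MUTATIONS,
                    n_clones: int = LIBRARY_SIZE) -> float:
    """Expected number of clones (out of n_clones) carrying exactly k base mutations."""
    return n_clones * poisson_pmf(k, mean)


if __name__ == "__main__":
    for k in range(10):
        print(f"{k} mutations: p = {poisson_pmf(k, MEAN_BASE_MUTATIONS):.3f}, "
              f"expected clones = {expected_clones(k):.1f}")
```

Under this assumption, raising or lowering the mean (for example by changing the mutagen concentration) shifts the whole distribution, which is one way to reason about how many mutations per gene a modified protocol would be expected to give.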
Preliminary Studies of the Incorporation of dψUTP by the RD Library

Initially, members of the RD library were individually tested for their ability to incorporate increasing concentrations of dψUTP into PCR products. It was discovered that only 18 of the 74

PAGE 138

mutants tested were able to form FLP, even in the presence of only standard dNTPs. At this point, the design of the RD library was questioned, and it was noticed that at least one of the sites we had mutated had previously been shown to be involved in the thermostability of the Taq polymerase (Ghadessy et al., 2001). In addition, we designed our 57 replacement amino acid residues based on the sequences of Family A viral polymerases, which are only thermostable up to 37 °C. Therefore, it is very likely that the mutations introduced in this approach caused a decrease in the thermostability of the RD polymerase variants. The next step was to test the ability of the polymerase variants to function at a variety of temperatures.

Incorporation of dNTPs by RD and L4 Libraries at Various Temperatures

The mutants from each library were individually tested for their ability to form FLP at various temperatures in PCRs containing only standard dNTPs. We found that by lowering the temperature from 94.0 °C to 86.3 °C, the number of mutants generating PCR products increased. In the RD Library, the number of active mutant polymerases increased from 18 to 33; in the L4 Library, 39 mutants were active when the temperature was lowered, compared with only 27 active at a temperature of 94.0 °C. These results suggest that it is more likely to generate active, thermostable mutants with a randomly mutated library than with a rationally designed library. This supports the conclusions drawn by Arnold and colleagues (Drummond et al., 2005), who determined that libraries with mutations distributed throughout the entirety of the gene are more likely to result in active and unique variants than libraries in which the mutations are limited to the active site.
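The counts quoted in this section can be put on a common scale by converting them to the percentage of each 74-member library that was active. The short sketch below does only that arithmetic; the dictionary layout and function names are hypothetical conveniences, and the counts are the ones stated in the text above.

```python
LIBRARY_SIZE = 74  # clones tested per library

# Active clones reported in the text, keyed by library and PCR temperature (°C).
ACTIVE_COUNTS = {
    "RD": {94.0: 18, 86.3: 33},
    "L4": {94.0: 27, 86.3: 39},
}


def active_fraction(active: int, total: int = LIBRARY_SIZE) -> float:
    """Fraction of the library that produced a full-length product band."""
    if not 0 <= active <= total:
        raise ValueError("active count must lie between 0 and the library size")
    return active / total


def summarize(counts=ACTIVE_COUNTS) -> dict:
    """Percentage of active clones per library and temperature, to one decimal."""
    return {
        library: {temp: round(100 * active_fraction(n), 1)
                  for temp, n in by_temp.items()}
        for library, by_temp in counts.items()
    }


if __name__ == "__main__":
    for library, by_temp in summarize().items():
        for temp, percent in sorted(by_temp.items()):
            print(f"{library} library at {temp} °C: {percent}% active")
```

Expressed this way, roughly a quarter of the RD library and a little over a third of the L4 library were active at 94.0 °C, and both fractions rise when the temperature is lowered to 86.3 °C.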
Based on the generalization that approximately one-third of all random amino acid changes will result in the inactivation of a protein (Guo et al., 2004), and on the design we employed to create the RD Library, it was perhaps reasonable to expect that more active mutants would be present in the RD Library than in the randomly created L4 Library when the same number of

PAGE 139

clones were tested. Since this was not the case, the design of the RD Library must be examined. It was also reasonable to conclude that by focusing mutations in and around the active site, the risk of knocking out activity was increased, even though the sites chosen were known to be variable, and the residues chosen were known to function in the evolutionary history of the polymerase. This library, however, was designed with the incorporation of NSBs in mind, so we investigated the possibility that the RD variants would have an increased ability to incorporate the C-glycosides when compared to coTaq polymerase.

Incorporation of dψUTP by the RD Library at Optimal Temperatures

After the identification of the optimal temperature for each of the 33 active RD polymerases, we challenged the polymerases to incorporate increasing concentrations of dψUTP in PCRs at their optimal temperature. It was discovered that one RD mutant polymerase (pSW29: A597S, A740R, E742V) was able to incorporate dψUTP more efficiently than the coTaq polymerase. The A597S mutation has previously been shown to assist in the incorporation of rNTPs (Xia et al., 2002), while the E742 mutation contributed to the incorporation of various NSBs (Ghadessy et al., 2004). Since dψUTP is closely related to rUTP and is an NSB, it is arguable that these changes contributed greatly to the activity of this mutant. The remaining 32 polymerases tested were unable to incorporate dψUTP as well as coTaq polymerase; they were, however, able to generate more FLP at their optimal temperature than when they were tested previously at 94.0 °C.
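The one-third generalization cited earlier in this section (Guo et al., 2004) can be turned into a back-of-the-envelope estimate of how many clones in a 74-member library would be expected to remain active. The sketch below assumes, purely for illustration, that each amino acid change inactivates the protein independently with probability one-third; that independence assumption and the function names are not taken from the source.

```python
from fractions import Fraction

P_INACTIVATING = Fraction(1, 3)  # generalization cited from Guo et al., 2004


def fraction_active(n_changes: int,
                    p_inactivating: Fraction = P_INACTIVATING) -> Fraction:
    """Fraction of clones expected to stay active after n independent changes."""
    if n_changes < 0:
        raise ValueError("number of changes cannot be negative")
    return (1 - p_inactivating) ** n_changes


def expected_active(n_clones: int, n_changes: int) -> float:
    """Expected number of active clones in a library of n_clones."""
    return float(n_clones * fraction_active(n_changes))


if __name__ == "__main__":
    for n_changes in (3, 4):
        print(f"{n_changes} changes per clone: "
              f"{float(fraction_active(n_changes)):.3f} active, "
              f"about {expected_active(74, n_changes):.1f} of 74 clones")
```

With three changes per clone this simple model predicts about 22 active clones of 74, and with four changes about 15, figures that can be set beside the observed counts of active RD and L4 mutants reported above.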
It is possible that the increased ability to incorporate C-glycosides at lower temperatures can be attributed to the fact that NSBs are sometimes incorporated more efficiently at lower temperatures (Rappaport, 2004, Horlacher et al., 1995). Alternatively, it was considered that the dψUTP is epimerizing at the higher temperatures, thereby making it difficult for the polymerase

PAGE 140

to incorporate the base into a growing DNA strand (Wellington and Benner, 2006, Cohn, 1960, Chambers et al., 1963). Therefore, a test was designed to establish which of these explanations was correct.

Incorporation of dψUTP and dψTTP by coTaq Polymerase at Various Temperatures

The presence of the methyl group on dψTTP inhibits the epimerization of the C-glycoside (Wellington and Benner, 2006), so it was possible to perform a comparative analysis of the coTaq polymerase's ability to cope with dψUTP and dψTTP in various concentrations and at different temperatures. The coTaq polymerase was found able to incorporate final concentrations of dψTTP greater than those of dψUTP at all temperatures tested. This leads to the conclusion that the epimerization of the nucleotide is hindering the incorporation of dψUTP; consequently, dψUTP should not be used as a model C-glycoside in future studies.

Selection of Thermostable RD Mutants Using Water-In-Oil Emulsions

Selections require that some members of the library perform differently than the original protein of interest (Arnold and Georgiou, 2003a, Lutz and Patrick, 2004). One of the goals of this research is to evolve polymerases to incorporate various C-glycoside triphosphates efficiently and faithfully; thus it makes sense to perform an initial selection to identify mutants able to incorporate C-glycosides. After noting that the dψUTP was most likely epimerizing under our reaction conditions, and considering that our available quantities of dψTTP were limited, we decided instead to select for the eighteen mutant polymerases in the RD Library that exhibited activity with dNTPs at a temperature of 94.0 °C from the pool of the 74 RD mutants. In doing so, we were able to demonstrate our laboratory's ability to perform in vitro selections. A variation of the compartmentalized self-replication (CSR) method was used to create water-in-oil emulsions containing all 74 mutants, as a way to link genotype to phenotype (Fig. 1-

PAGE 141

13) (Miller et al., 2006, Tawfik and Griffiths, 1998, Ghadessy et al., 2001, Ghadessy et al., 2004). Products from the selection were recloned into the expression vector using the megaprimer PCR method previously discussed (Miyazaki and Takenouchi, 2002). Unfortunately, this protocol caused numerous crossovers, reversions, and additions, so we were not able to determine the true sequences of all the polymerases we isolated using the CSR. However, megaprimer PCR proved to be an effective method for library rediversification between rounds of selection. We were able to identify one mutant among the 50 clones we sequenced that coded for one of the eighteen variants previously shown to have activity under these reaction conditions. This demonstrates an ability to perform successful in vitro selections in our laboratory. If this selection were repeated, and products were cloned using the standard digestion, ligation, and purification techniques, it would most likely yield some or all of the eighteen sequences of active polymerases.

Future Experimentation

The results presented in this dissertation open the door for many future experiments. Further structural studies of DNA containing multiple sequential C-glycosides can expand upon the knowledge gathered here. Given that we now know 2′-deoxypseudouridine is not a good representative of a C-glycoside, we can create duplex DNA containing 2′-deoxypseudothymidine and perform similar circular dichroism studies to determine what helical form the DNA assumes. In addition, thermal duplex denaturation studies can be performed with these duplex structures containing dψT to ascertain the stability of the duplex DNA formed when multiple, sequential C-glycosides are present (Geyer et al., 2003).

PAGE 142

Additional study of rationally designed library creation is now possible. Since we now know that the use of viral residues to replace those of Taq polymerase at some sites may cause a decrease in protein thermostability, we can avoid making mutations at these sites in the design of future libraries, and possibly increase the percentage of active mutants in the library. Furthermore, in future libraries, we can consider mutating sites throughout the polymerase sequence that display Type II functional divergence, not just those in and around the active site. A distinct decrease in the amount of full-length PCR product was observed when the PCR-based assays were performed using increasing concentrations of the C-glycoside and decreasing concentrations of dT. A control reaction using only decreasing levels of dT and no thymidine analogue should be performed to determine how much of the full-length PCR product generated in future reactions actually contains the C-glycosides versus how much is produced using only dT. Another possible reason for this decrease in FLP formation with increasing C-glycoside concentration is the ethidium bromide dye used to aid in the visualization of the DNA. It is, perhaps, plausible that ethidium bromide cannot intercalate as efficiently when multiple C-glycosides are present. Therefore, a comparative analysis between the amounts of FLP formed when using ethidium bromide versus another fluorescent dye, such as the SYBR Safe DNA Gel Stain, could be performed. It would also be interesting to test the 74 L4 mutants for their ability to incorporate C-glycosides more efficiently than the coTaq polymerase. This information will also help determine whether the mutation of residues outside the active site is beneficial to the incorporation of NSBs. Moreover, if the 74 RD mutants were retested for their ability to incorporate increasing

PAGE 143

levels of dψTTP as opposed to dψUTP, we may find more than one polymerase that incorporates the NSBs more efficiently than coTaq. Once an acceptable library is created, in vitro evolution experiments can be performed to identify polymerases able to incorporate high levels of dψTTP, rather than testing each mutant individually. These selections will begin with a moderate ratio of dψT to dT, and the ratio will increase with each round of selection. Between rounds, we now know that our libraries can be rediversified using the megaprimer PCR protocol, thereby reducing the risk of large quantities of product being lost to the purification steps required for traditional recloning. After demonstrating our ability to perform a selection for polymerases that can incorporate C-glycosides, we can begin to apply these techniques to develop polymerases that can incorporate more NSBs. Directed evolution is already being used in industry to improve the quality of existing industrial enzymes and therapeutic treatments and to develop new ones (Chirumamilla et al., 2001, Douthwaite and Jermutus, 2006). If the combination of the rationally designed library with the modified CSR technique proves successful in the isolation of large numbers of active clones, there could be a commercial impact for this system. Right now it takes three to four rounds of selection to isolate clones with a desired trait, and each round takes at least one week; with our system, it is feasible that only one to two rounds of selection would be needed, thereby cutting the time in half. In addition, the use of synthetic gene libraries reduces the amount of time spent creating libraries de novo. With an improved ability to produce more clones with desired activity from smaller starting libraries, imagine how many products could be quickly isolated using these techniques.

PAGE 144

APPENDIX A
SYNTHESIS OF PSEUDOTHYMIDINE AND PSEUDOTHYMIDINE-CONTAINING OLIGONUCLEOTIDES

Synthesis of the 2′-deoxypseudothymidine (dψT) precursor was performed by Dr. Shuichi Hoshika according to the procedures previously set forth, with some modifications (actual scheme shown in Figure A-1) (Bhattacharya et al., 1995, Lutz et al., 1999, Zhang and Daves, 1992). The synthesis of 2′-deoxypseudothymidine-5′-triphosphate (dψTTP) was performed by Dr. Daniel Hutter according to the standard Ludwig-Eckstein procedure for triphosphate synthesis (Ludwig and Eckstein, 1989). It was purified by HPLC on a Waters Delta 600 with a Waters 2487 dual-wavelength absorbance detector controlled by Waters Millennium software. Initial purification was on an ion-exchange column [GE Healthcare HiPrep 16/10 DEAE FF column, eluent A = 10 mM NH4CO3, eluent B = 1 M NH4CO3, gradient from 0 to 80% B in 40 min, flow rate = 3 mL/min, Rt = 22 min] followed by reverse-phase HPLC [Waters NovaPak HR C18 column, 19x300 mm, eluent A = 25 mM triethylammonium acetate (TEAA) pH 7, eluent B = 10% CH3CN in 25 mM TEAA pH 7, gradient from 0 to 80% B in 32 min, flow rate = 5 mL/min, Rt = 16 min]. After lyophilization, it was twice re-dissolved in water and lyophilized again to remove excess TEAA. Analytical HPLC was performed to verify the purification [Waters Alliance 2695 with Waters 2996 PDA detector, controlled by Waters Millennium software; Dionex DNAPac PA-100 column, 4x250 mm, eluent A = 10 mM NH4CO3, eluent B = 500 mM NH4CO3, gradient from 0 to 40% B in 20 min, flow rate = 0.5 mL/min: Rt = 17 min]. NMR (Varian Mercury 300 MHz spectrometer): 1H-NMR (D2O, 300 MHz): δ (ppm, rel to HDO = 4.65) = 1.97 (ddd, J = 5.9, 9.9, 13.3 Hz, 1H); 2.13 (ddd, J = 2.6, 5.9, 13.3 Hz, 1H); 3.27 (s, 3H); 3.94-4.00 (m, 3H); 4.39-4.41 (m, 1H); 4.97 (dd, J = 5.9, 9.6 Hz, 1H); 7.65 (d, J = 0.8 Hz,

PAGE 145

145 1H). 31P-NMR (D2O, 121 MHz): (ppm, rel to external standard H3PO4 = 0) = -10.7 (d, J = 20 Hz, 1P); -11.2 (d, J = 20 Hz, 1P); -23.3 ( t J = 20 Hz, 1P).
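The gradient conditions quoted for the three HPLC steps can be converted into an approximate eluent composition at each reported retention time. The sketch below is only an illustration under stated assumptions: the gradient is taken to be perfectly linear, and the retention time is mapped directly onto the programmed gradient with no correction for system dwell volume or column dead time, so the true composition at elution would be somewhat lower. The function names are hypothetical and do not come from the thesis or from the instrument software.

```python
def percent_b(t_min, start_b, end_b, duration_min):
    """Programmed percent of eluent B at time t_min for a linear gradient.

    Assumes no dwell volume, so this is the composition leaving the pump,
    not necessarily the composition reaching the detector.
    """
    if not 0 <= t_min <= duration_min:
        raise ValueError("time lies outside the programmed gradient")
    return start_b + (end_b - start_b) * t_min / duration_min


def mixed_value(pct_b, value_a, value_b):
    """Linear mix of a property (e.g. mM salt, % CH3CN) at pct_b percent B."""
    frac_b = pct_b / 100.0
    return value_a * (1.0 - frac_b) + value_b * frac_b


# Ion exchange: 0 to 80% B in 40 min, Rt = 22 min, A = 10 mM, B = 1000 mM salt
iex_b = percent_b(22, 0, 80, 40)
iex_salt_mM = mixed_value(iex_b, 10, 1000)

# Reverse phase: 0 to 80% B in 32 min, Rt = 16 min, A = 0%, B = 10% CH3CN
rp_b = percent_b(16, 0, 80, 32)
rp_acn_pct = mixed_value(rp_b, 0, 10)

# Analytical ion exchange: 0 to 40% B in 20 min, Rt = 17 min, A = 10 mM, B = 500 mM
an_b = percent_b(17, 0, 40, 20)
an_salt_mM = mixed_value(an_b, 10, 500)

if __name__ == "__main__":
    print(f"Ion exchange:   {iex_b:.1f}% B, about {iex_salt_mM:.1f} mM salt")
    print(f"Reverse phase:  {rp_b:.1f}% B, about {rp_acn_pct:.1f}% CH3CN")
    print(f"Analytical IEX: {an_b:.1f}% B, about {an_salt_mM:.1f} mM salt")
```

Under these assumptions the triphosphate elutes at roughly 44% B on the preparative ion-exchange column, 40% B on the reverse-phase column, and 34% B on the analytical column.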

[Scheme graphics not reproduced: structures of compounds 8 to 18 in the synthesis of pseudothymidine.]

Scheme 2. a) Pd(OAc)2, Ph3As, Bu3N, MeCN; b) TBAF, AcOH, THF; c) NaBH(OAc)3, AcOH, MeCN.

Scheme 3. a) Ac2O, DMAP, DMF, 30 °C; b) MeI, N,O-bis(trimethylsilyl)acetamide, CH2Cl2, reflux, or MeI, (i-Pr)2NEt, DMF; c) K2CO3, MeOH; d) DMTrCl, pyridine; e) ClPN(i-Pr)2OCH2CH2CN, (i-Pr)2NEt, CH2Cl2; f) aq. AcOH, THF.

Yields listed in the schemes: 2 steps 65%; 2 steps 52%; 96%; 58%; 82%; 91%; 73%.

Figure A-1. Synthesis of pseudothymidine precursor. This scheme was designed by Dr. Shuichi Hoshika following the protocols set forth previously (Bhattacharya et al., 1995, Lutz et al., 1999, Zhang and Daves, 1992).

APPENDIX B
PHYLOGENETIC TREES OF FAMILY A POLYMERASES

The following are insets of the phylogenetic tree seen in Figure 3-1 and a seed alignment of twelve of the 719 Family A polymerases identified in this tree. These trees were generated using Pfam (Bateman, 2006, Finn et al., 2006), and analyzed for sites that underwent Type II functional divergence. In this approach, Dr. Eric Gaucher identified sites that showed a split, "conserved but different" pattern of historical evolutionary variation and that had previously been suggested to lead to a change in the function or behavior of the polymerase (Henry and Romesberg, 2005). Using Pfam, 57 amino acid changes across 35 sites were identified within the 719 members of Family A polymerases that were available to us (Bateman, 2006, Finn et al., 2006).

Figure B-1. A seed alignment of the Family A polymerases. This tree was generated using Pfam (Bateman, 2006, Finn et al., 2006), and displays twelve representatives of the major genera found in the 719 Family A polymerase sequences.
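The "conserved but different" criterion described in this appendix can be illustrated with a small sketch. It is not the procedure used in the thesis, which relied on the Pfam trees and an evolutionary analysis; it is only a simplified column-by-column check, and the function name and the toy sequences are invented for illustration. A column is reported when every sequence in one subfamily shares one residue, every sequence in the other subfamily shares a different residue, and neither contains a gap.

```python
def conserved_but_different_sites(group_a, group_b, gap="-"):
    """Return (position, residue_a, residue_b) for alignment columns that are
    invariant within each group but differ between the two groups.

    Positions are 1-based. All sequences must be aligned to the same length.
    """
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all aligned sequences must have the same length")

    sites = []
    for i in range(length):
        residues_a = {seq[i] for seq in group_a}
        residues_b = {seq[i] for seq in group_b}
        invariant = len(residues_a) == 1 and len(residues_b) == 1
        has_gap = gap in residues_a or gap in residues_b
        if invariant and not has_gap and residues_a != residues_b:
            sites.append((i + 1, next(iter(residues_a)), next(iter(residues_b))))
    return sites


# Toy aligned sequences (hypothetical, not taken from the Family A alignment)
subfamily_1 = ["MKTAYL", "MKTAYL", "MKSAYL"]
subfamily_2 = ["MRTAFL", "MRTAFL", "MRTAFL"]

if __name__ == "__main__":
    for pos, res_1, res_2 in conserved_but_different_sites(subfamily_1, subfamily_2):
        print(f"site {pos}: {res_1} in subfamily 1, {res_2} in subfamily 2")
```

In the toy data, column 3 is skipped because subfamily 1 is not invariant there, which is the distinction between a merely variable site and a candidate for functional divergence.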

Figure B-2. Inset of the phylogenetic tree of Family A polymerases (from Fig. 3-1) showing the location of Taq polymerase. This tree was generated using Pfam (Bateman, 2006, Finn et al., 2006).

Figure B-3. Inset of the phylogenetic tree of Family A polymerases (from Fig. 3-1) showing the location of some viral polymerases. This tree was generated using Pfam (Bateman, 2006, Finn et al., 2006).

APPENDIX C
GENETIC CODE AND AMINO ACID ABBREVIATIONS

Table C-1. The Genetic Code.

First     Second letter                                       Third
letter    U           C           A           G               letter
U         UUU Phe     UCU Ser     UAU Tyr     UGU Cys         U
          UUC Phe     UCC Ser     UAC Tyr     UGC Cys         C
          UUA Leu     UCA Ser     UAA Stop    UGA Stop        A
          UUG Leu     UCG Ser     UAG Stop    UGG Trp         G
C         CUU Leu     CCU Pro     CAU His     CGU Arg         U
          CUC Leu     CCC Pro     CAC His     CGC Arg         C
          CUA Leu     CCA Pro     CAA Gln     CGA Arg         A
          CUG Leu     CCG Pro     CAG Gln     CGG Arg         G
A         AUU Ile     ACU Thr     AAU Asn     AGU Ser         U
          AUC Ile     ACC Thr     AAC Asn     AGC Ser         C
          AUA Ile     ACA Thr     AAA Lys     AGA Arg         A
          AUG Met     ACG Thr     AAG Lys     AGG Arg         G
G         GUU Val     GCU Ala     GAU Asp     GGU Gly         U
          GUC Val     GCC Ala     GAC Asp     GGC Gly         C
          GUA Val     GCA Ala     GAA Glu     GGA Gly         A
          GUG Val     GCG Ala     GAG Glu     GGG Gly         G

Table C-2. Amino acid abbreviations.

Name             3-Letter Code    1-Letter Code
Alanine          Ala              A
Arginine         Arg              R
Asparagine       Asn              N
Aspartic acid    Asp              D
Cysteine         Cys              C
Glutamine        Gln              Q
Glutamic acid    Glu              E
Glycine          Gly              G
Histidine        His              H
Isoleucine       Ile              I
Leucine          Leu              L
Lysine           Lys              K
Methionine       Met              M
Phenylalanine    Phe              F
Proline          Pro              P
Serine           Ser              S
Threonine        Thr              T
Tryptophan       Trp              W
Tyrosine         Tyr              Y
Valine           Val              V
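Tables C-1 and C-2 can be combined into a short translation routine. The following is a minimal sketch rather than a tool used in the thesis: the names CODON_TABLE, THREE_LETTER, and translate are chosen here for illustration, the amino acid string simply encodes Table C-1 with the first, second, and third letters each running in the order U, C, A, G, and stop codons are written as "*".

```python
from itertools import product

BASES = "UCAG"
# Table C-1 read row by row: first letter, then second, then third (U, C, A, G)
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first letter U
    "LLLLPPPPHHQQRRRR"   # first letter C
    "IIIMTTTTNNKKSSRR"   # first letter A
    "VVVVAAAADDEEGGGG"   # first letter G
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Table C-2: one-letter code mapped to three-letter code
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(sequence, three_letter=False):
    """Translate an RNA (or DNA coding-strand) sequence from its first base.

    Translation stops at the first stop codon; trailing bases that do not
    form a complete codon are ignored.
    """
    rna = sequence.upper().replace("T", "U")
    residues = []
    for i in range(0, len(rna) - len(rna) % 3, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    if three_letter:
        return "-".join(THREE_LETTER[aa] for aa in residues)
    return "".join(residues)


if __name__ == "__main__":
    print(translate("AUGUUUGGCUAA"))
    print(translate("AUGUUUGGCUAA", three_letter=True))
```

The sketch reads a single frame starting at the first base and raises a KeyError on any character outside A, C, G, U, and T, which keeps ambiguous input from being silently mistranslated.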

LIST OF REFERENCES

Allemann, R. K., Presnell, S. R. and Benner, S. A. (1991) Protein Engineering, 4, 831-835.
Andraos, N., Tabor, S. and Richardson, C. C. (2004) Journal of Biological Chemistry, 279, 50609-50618.
Argoudelis, A. D. and Mizsak, S. A. (1976) Journal of Antibiotics, 29, 818-823.
Arnez, J. G. and Steitz, T. A. (1994) Biochemistry, 33, 7560-7567.
Arnold, F. H. and Georgiou, G. (Eds.) (2003a) Directed Enzyme Evolution: Screening and Selection Methods, Humana Press, Totowa, N.J.
Arnold, F. H. and Georgiou, G. (Eds.) (2003b) Directed Evolution Library Creation: Methods and Protocols, Humana Press, Totowa, N.J.
Bain, J. D., Switzer, C., Chamberlin, A. R. and Benner, S. A. (1992) Nature, 356, 537-539.
Bateman, A. (2006), Vol. 2006, Pfam, Sanger Institute, http://www.sanger.ac.uk/Software/Pfam/.
Beard, W. A., Shock, D. D., Vande Berg, B. J. and Wilson, S. H. (2002) Journal of Biological Chemistry, 277, 47393-47398.
Beese, L. S., Derbyshire, V. and Steitz, T. A. (1993a) Science, 260, 352-355.
Beese, L. S., Friedman, J. M. and Steitz, T. A. (1993b) Biochemistry, 32, 14095-14101.
Benner, S. A. (2004) Accounts of Chemical Research, 37, 784-797.
Bhattacharya, B. K., Devivar, R. V. and Revankar, G. R. (1995) Nucleosides & Nucleotides, 14, 1269-1287.
Brakmann, S. (2005) Cellular and Molecular Life Sciences, 62, 2634-2646.
Brock, T. D. and Freeze, H. (1969) Journal of Bacteriology, 98, 289-297.
Castle, L. A., Siehl, D. L., Gorton, R., Patten, P. A., Chen, Y. H., Bertain, S., Cho, H. J., Duck, N., Wong, J., Liu, D. L. and Lassner, M. W. (2004) Science, 304, 1151-1154.
Chambers, R. W., Kurkov, V. and Shapiro, R. (1963) Biochemistry, 2, 1192-1203.
Chandrasekaran, R. and Radha, A. (1992) Journal of Biomolecular Structure & Dynamics, 10, 153-168.
Charette, M. and Gray, M. W. (2000) International Union of Biochemistry and Molecular Biology Life, 49, 341-351.
Chien, A., Edgar, D. B. and Trela, J. M. (1976) Journal of Bacteriology, 127, 1550-1557.

Chirumamilla, R. R., Muralidhar, R., Marchant, R. and Nigam, P. (2001) Molecular and Cellular Biochemistry, 224, 159-168.
Cline, J., Braman, J. C. and Hogrefe, H. H. (1996) Nucleic Acids Research, 24, 3546-3551.
Cohn, W. E. (1960) Journal of Biological Chemistry, 235, 1488-1498.
Collins, M. L., Irvine, B., Tyner, D., Fine, E., Zayati, C., Chang, C. A., Horn, T., Ahle, D., Detmer, J., Shen, L. P., Kolberg, J., Bushnell, S., Urdea, M. S. and Ho, D. D. (1997) Nucleic Acids Research, 25, 2979-2984.
Crameri, A., Raillard, S. A., Bermudez, E. and Stemmer, W. P. C. (1998) Nature, 391, 288-291.
Crameri, A., Whitehorn, E. A., Tate, E. and Stemmer, W. P. C. (1996) Nature Biotechnology, 14, 315-319.
Crick, F. (1970) Nature, 227, 561-563.
Dalby, P. A. (2003) Current Opinion in Structural Biology, 13, 500-505.
Davis, D. R. (1995) Nucleic Acids Research, 23, 5020-5026.
Delaney, J. C., Henderson, P. T., Helquist, S. A., Morales, J. C., Essigmann, J. M. and Kool, E. T. (2003) Proceedings of the National Academy of Sciences of the United States of America, 100, 4469-4473.
DeLano, W. L. (2002) PyMOL, DeLano Scientific, http://www.pymol.org.
Derti, A. (2003), Vol. 2006, Reverse and/or complement DNA sequences, Harvard Medical School, http://arep.med.harvard.edu/labgc/adnan/projects/Utilities/revcomp.html.
Douthwaite, J. and Jermutus, L. (2006) Current Opinion in Drug Discovery & Development, 9, 269-275.
Drummond, D. A., Iverson, B. L., Georgiou, G. and Arnold, F. H. (2005) Journal of Molecular Biology, 350, 806-816.
Egli, M. (2004) Current Opinion in Chemical Biology, 8, 580-591.
Emilsson, G. M. and Breaker, R. R. (2002) Cellular and Molecular Life Sciences, 59, 596-607.
Eom, S. H., Wang, J. M. and Steitz, T. A. (1996) Nature, 382, 278-281.
Fa, M., Radeghieri, A., Henry, A. A. and Romesberg, F. E. (2004) Journal of the American Chemical Society, 126, 1748-1754.
Fairbanks, G., Steck, T. L. and Wallach, D. F. H. (1971) Biochemistry, 10, 2606-2617.

Finn, R. D., Mistry, J., Schuster-Bockler, B., Griffiths-Jones, S., Hollich, V., Lassmann, T., Moxon, S., Marshall, M., Khanna, A., Durbin, R., Eddy, S. R., Sonnhammer, E. L. L. and Bateman, A. (2006) Nucleic Acids Research, 34, D247-D251.
Forterre, P. (2006) Virus Research, 117, 5-16.
Garrett, R. H. and Grisham, C. M. (1999) Biochemistry, Harcourt Brace College Publishers, Fort Worth, TX.
Gaucher, E. A. (2006) In National Institute of Health STTR Phase 1 Grant Number 1 R41 GM074433-01, Foundation for Applied Molecular Evolution, Gainesville, FL.
Geyer, C. R., Battersby, T. R. and Benner, S. A. (2003) Structure, 11, 1485-1498.
Ghadessy, F. J., Ong, J. L. and Holliger, P. (2001) Proceedings of the National Academy of Sciences of the United States of America, 98, 4552-4557.
Ghadessy, F. J., Ramsay, N., Boudsocq, F., Loakes, D., Brown, A., Iwai, S., Vaisman, A., Woodgate, R. and Holliger, P. (2004) Nature Biotechnology, 22, 755-759.
Ghosh, A. and Bansal, M. (2003) Acta Crystallographica Section D-Biological Crystallography, 59, 620-626.
Goldman, M. and Marcy, D. (2001) In HIV-1 Reverse Transcriptase Tutorial, pp. 1-3.
Griffiths, A. D. and Tawfik, D. S. (2006) Trends in Biotechnology, 24, 395-402.
Grosjean, H., Constantinesco, F., Foiret, D. and Benachenhou, N. (1995) Nucleic Acids Research, 23, 4312-4319.
Gu, X. (1999) Molecular Biology and Evolution, 16, 1664-1674.
Gu, X. (2002), Vol. 2006, DIVERGE 2.0, Iowa State University, http://xgu.zool.iastate.edu/software.html.
Guo, H. H., Choe, J. and Loeb, L. A. (2004) Proceedings of the National Academy of Sciences of the United States of America, 101, 9205-9210.
Gustafsson, C., Govindarajan, S. and Minshull, J. (2004) Trends in Biotechnology, 22, 346-353.
Hendrickson, C. L., Devine, K. G. and Benner, S. A. (2004) Nucleic Acids Research, 32, 2241-2250.
Henry, A. A., Olsen, A. G., Matsuda, S., Yu, C. Z., Geierstanger, B. H. and Romesberg, F. E. (2004) Journal of the American Chemical Society, 126, 6923-6931.
Henry, A. A. and Romesberg, F. E. (2005) Current Opinion in Biotechnology, 16, 370-377.
Hibbert, E. G. and Dalby, P. A. (2005) Microbial Cell Factories, 4.

Hirao, I., Kimoto, M., Mitsui, T., Fujiwara, T., Kawai, R., Sato, A., Harada, Y. and Yokoyama, S. (2006) Nature Methods, 3, 729-735.
Hirao, I., Ohtsuki, T., Fujiwara, T., Mitsui, T., Yokogawa, T., Okuni, T., Nakayama, H., Takio, K., Yabuki, T., Kigawa, T., Kodama, K., Nishikawa, K. and Yokoyama, S. (2002) Nature Biotechnology, 20, 177-182.
Hogrefe, H. H., Cline, J., Lovejoy, A. E. and Nielson, K. B. (2001) In Hyperthermophilic Enzymes, Pt C, Vol. 334, pp. 91-116.
Horlacher, J., Hottiger, M., Podust, V. N., Hubscher, U. and Benner, S. A. (1995) Proceedings of the National Academy of Sciences of the United States of America, 92, 6329-33.
Huisse, F. (2004) Journal of Clinical Virology, 30, S26-S28.
Ivanov, V. I., Minchenk, L. E., Schyolki, A. K. and Poletaye, A. I. (1973) Biopolymers, 12, 89-110.
Johnson, S. C., Marshall, D. J., Harms, G., Miller, C. M., Sherrill, C. B., Beaty, E. L., Lederer, S. A., Roesch, E. B., Madsen, G., Hoffman, G. L., Laessig, R. H., Kopish, G. J., Baker, M. W., Benner, S. A., Farrell, P. M. and Prudent, J. R. (2004) Clinical Chemistry, 50, 2019-2027.
Joyce, C. M. and Benkovic, S. J. (2004) Biochemistry, 43, 14317-24.
Kelman, Z., Hurwitz, J. and O'Donnell, M. (1998) Structure, 6, 121-125.
Kim, Y., Eom, S. H., Wang, J. M., Lee, D. S., Suh, S. W. and Steitz, T. A. (1995) Nature, 376, 612-616.
Kong, H. M., Kucera, R. B. and Jack, W. E. (1993) Journal of Biological Chemistry, 268, 1965-1975.
Kornberg, A., Lehman, I. R., Bessman, M. J. and Simms, E. S. (1956) Biochimica Et Biophysica Acta, 21, 197-198.
Kunkel, T. A. and Bebenek, R. (2000) Annual Review of Biochemistry, 69, 497-529.
Laemmli, U. K. (1970) Nature, 227, 680-685.
Lane, B. G., Ofengand, J. and Gray, M. W. (1995) Biochimie, 77, 7-15.
Leal, N. A., Sukeda, M. and Benner, S. A. (2006) Nucleic Acids Research, 34, 4702-4710.
Lewin, B. (1997) Genes VI, Oxford University Press, New York.
Li, J. S., Fan, Y. H., Zhang, Y., Marky, L. A. and Gold, B. (2003) Journal of the American Chemical Society, 125, 2084-2093.

Li, J. S., Shikiya, R., Marky, L. A. and Gold, B. (2004) Biochemistry, 43, 1440-1448.
Li, Y., Kong, Y., Korolev, S. and Waksman, G. (1998a) Protein Science, 7, 1116-1123.
Li, Y., Korolev, S. and Waksman, G. (1998b) European Molecular Biology Organization Journal, 17, 7514-7525.
Limbach, P. A., Crain, P. F. and McCloskey, J. A. (1994) Nucleic Acids Research, 22, 2183-2196.
Lin-Goerke, J. L., Robbins, D. J. and Burczak, J. D. (1997) Biotechniques, 23, 409-12.
Ludwig, J. and Eckstein, F. (1989) Journal of Organic Chemistry, 54, 631-635.
Lutz, M. J., Horlacher, J. and Benner, S. A. (1998) Bioorganic & Medicinal Chemistry Letters, 8, 1149-1152.
Lutz, S., Burgstaller, P. and Benner, S. A. (1999) Nucleic Acids Research, 27, 2792-8.
Lutz, S. and Patrick, W. M. (2004) Current Opinion in Biotechnology, 15, 291-297.
Michelet, W. and Genet, J. P. (2005) Current Organic Chemistry, 9, 405-418.
Miller, J. H. (1972) Experiments in molecular genetics, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
Miller, O. J., Bernath, K., Agresti, J. J., Amitai, G., Kelly, B. T., Mastrobattista, E., Taly, V., Magdassi, S., Tawfik, D. S. and Griffiths, A. D. (2006) Nature Methods, 3, 561-570.
Miyazaki, K. and Takenouchi, M. (2002) Biotechniques, 33, 1033-1038.
Morales, J. C. and Kool, E. T. (2000) Journal of the American Chemical Society, 122, 1001-1007.
Moreno, R., Haro, A., Castellanos, A. and Berenguer, J. (2005) Applied and Environmental Microbiology, 71, 591-593.
Muller, U. F. (2006) Cellular and Molecular Life Sciences, 63, 1278-1293.
Najmudin, S., Cote, M. L., Sun, D. M., Yohannan, S., Montano, S. P., Gu, J. and Georgiadis, M. M. (2000) Journal of Molecular Biology, 296, 613-632.
Neumann, J. M., Bernassau, J. M., Gueron, M. and Trandinh, S. (1980) European Journal of Biochemistry, 108, 457-463.
Ollis, D. L., Brick, P., Hamlin, R., Xuong, N. G. and Steitz, T. A. (1985) Nature, 313, 762-766.
Ong, J. L., Loakes, D., Jaroslawski, S., Too, K. and Holliger, P. (2006) Journal of Molecular Biology, 361, 537-550.

Parikh, M. R. and Matsumura, I. (2005) Journal of Molecular Biology, 352, 621-628.
Park, S., Morley, K. L., Horsman, G. P., Holmquist, M., Hult, K. and Kazlauskas, R. J. (2005) Chemistry & Biology, 12, 45-54.
Patel, P. H. and Loeb, L. A. (2001) Nature Structural Biology, 8, 656-659.
Paul, N. and Joyce, G. F. (2004) Current Opinion in Chemical Biology, 8, 634-639.
Pavlov, A. R., Pavlova, N. V., Kozyavkin, S. A. and Slesarev, A. I. (2004) Trends in Biotechnology, 22, 253-260.
Perler, F. B., Kumar, S. and Kong, H. M. (1996) In Advances in Protein Chemistry, Vol. 48, pp. 377-435.
Piccirilli, J. A., Krauch, T., Moroney, S. E. and Benner, S. A. (1990) Nature, 343, 33-37.
Piccirilli, J. A., Moroney, S. E. and Benner, S. A. (1991) Biochemistry, 30, 10350-6.
Presnell, S. R. and Benner, S. A. (1988) Nucleic Acids Research, 16, 1693-702.
Rappaport, H. P. (2004) Biochemical Journal, 381, 709-717.
Raychaudhuri, S., Conrad, J., Hall, B. G. and Ofengand, J. (1998) RNA A Publication of the RNA Society, 4, 1407-1417.
Rich, A. and Zhang, S. G. (2003) Nature Reviews Genetics, 4, 566-572.
Rothwell, P. J. and Waksman, G. (2005) In Fibrous Proteins: Muscle and Molecular Motors, Vol. 71, pp. 401-440.
Roychowdhury, A., Illangkoon, H., Hendrickson, C. L. and Benner, S. A. (2004) Organic Letters, 6, 489-492.
Saenger, W. (1984) Principles of Nucleic Acid Structure, Springer-Verlag, New York.
Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B. and Erlich, H. A. (1988) Science, 239, 487-491.
Sambrook, J., Fritsch, E. and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
Sismour, A. M. and Benner, S. A. (2005) Nucleic Acids Research, 33, 5640-5646.
Sismour, A. M., Lutz, S., Park, J. H., Lutz, M. J., Boyer, P. L., Hughes, S. H. and Benner, S. A. (2004) Nucleic Acids Research, 32, 728-735.
Skerra, A. (1994) Gene, 151, 131-135.

Steitz, T. A. (1999) Journal of Biological Chemistry, 274, 17395-17398.
Suzuki, M., Baskin, D., Hood, L. and Loeb, L. A. (1996) Proceedings of the National Academy of Sciences of the United States of America, 93, 9670-9675.
Swiss Institute of Bioinformatics. (1999) Vol. 2006, Translate, ExPASy, http://www.expasy.ch/tools/dna.html.
Switzer, C., Moroney, S. E. and Benner, S. A. (1989) Journal of the American Chemical Society, 111, 8322-8323.
Switzer, C. Y., Moroney, S. E. and Benner, S. A. (1993) Biochemistry, 32, 10489-96.
Tatusova, T. A. and Madden, T. L. (1999) Fems Microbiology Letters, 177, 187-188.
Tawfik, D. S. and Griffiths, A. D. (1998) Nature Biotechnology, 16, 652-656.
Tindall, K. R. and Kunkel, T. A. (1988) Biochemistry, 27, 6008-6013.
Vartanian, J. P., Henry, M. and Wain-Hobson, S. (1996) Nucleic Acids Research, 24, 2627-2631.
Watson, J. D. and Crick, F. H. C. (1953a) Nature, 171, 964-967.
Watson, J. D. and Crick, F. H. C. (1953b) Nature, 171, 737-738.
Wellington, K. W. and Benner, S. A. (2006) Nucleosides, Nucleotides, and Nucleic Acids, 25, 1309-1333.
Williams, R., Peisajovich, S. G., Miller, O. J., Magdassi, S., Tawfik, D. S. and Griffiths, A. D. (2006) Nature Methods, 3, 545-550.
Xia, G., Chen, L. J., Sera, T., Fa, M., Schultz, P. G. and Romesberg, F. E. (2002) Proceedings of the National Academy of Sciences of the United States of America, 99, 6597-6602.
Yan, X. H. and Xu, Z. R. (2006) Drug Discovery Today, 11, 911-916.
Zhang, H. C. and Daves, G. D. (1992) Journal of Organic Chemistry, 57, 4690-4696.
Zhao, H. M., Giver, L., Shao, Z. X., Affholter, J. A. and Arnold, F. H. (1998) Nature Biotechnology, 16, 258-261.
Zhou, B. L., Pata, J. D. and Steitz, T. A. (2001) Molecular Cell, 8, 427-437.
Zhou, J., Yang, M. M., Akdag, A. and Schneller, S. W. (2006) Tetrahedron, 62, 7009-7013.

BIOGRAPHICAL SKETCH

Stephanie Ann Havemann was born in Akron, Ohio and raised in Beaufort, South Carolina. She attended Beaufort Academy for primary school, where she began competing in cheerleading, softball, and golf. She participated in these sports throughout her high school career at Beaufort High School, where she graduated in the top 10 of her senior class. She also served on the Science Academic Challenge Team for 3 years, and led her team to one silver and two gold medals.

She attended Mercer University in Macon, Georgia for her undergraduate career, obtaining a Bachelor of Science in Biology and another Bachelor of Science in Environmental Science in 2000. While there, she conducted a year of undergraduate research under Dr. Alan Smith characterizing the lipid transport proteins and pro-phenol oxidase of insects. She performed another semester of undergraduate research under the supervision of Dr. David Crowely, attempting to identify an excision repair gene of the archaeon Haloferax volcanii that was homologous to the E. coli uvrA gene. She was also the first non-engineering major at Mercer ever to participate in an engineering senior design project. Her three-person team designed and performed the initial construction of the Water Resource Monitor for the City of Macon, allowing the city to monitor the depth, temperature, and pH of the Ocmulgee River.

Her graduate career began in 2000, in the laboratory of Dr. Madeline Rasche at the University of Florida's Department of Microbiology & Cell Science. There, she devised and implemented an assay to detect the levels of methanopterin produced in various methanogenic and methylotrophic cells. She joined Dr. Steven Benner's laboratories in 2002 in the University of Florida's Department of Chemistry, where she studied the incorporation of non-standard bases into DNA. Her research focused on the directed evolution of polymerases to incorporate non-standard bases exhibiting a C-glycosidic linkage with efficiency and fidelity. She plans to continue her academic study as a post-doctoral research fellow.