
The Application of Semiempirical Methods in Drug Design

Permanent Link: http://ufdc.ufl.edu/UFE0021354/00001

Material Information

Title: The Application of Semiempirical Methods in Drug Design
Physical Description: 1 online resource (263 p.)
Language: english
Creator: Peters, Martin B
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2007

Subjects

Subjects / Keywords: drug, quantum, semiempirical
Chemistry -- Dissertations, Academic -- UF
Genre: Chemistry thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: The application of quantum mechanical methods in de novo drug design is currently quite limited in both scope and utility. This thesis outlines where these methods are placed in this process and where they can be improved on. Chapters one and two of this dissertation describe the drug development process and current methods used to calculate the free energy of receptor-ligand binding. Some of the computational tools used in drug design are discussed, such as scoring functions, molecular mechanics, quantum mechanics, semiempirical pair-wise energy decomposition, comparative binding energy analysis, the SE-COMBINE approach and popular 3D-QSAR approaches. The remaining chapters of this work describe the development and application of a package of computational chemistry C++ libraries called the Modeling ToolKit++ (MTK++). This toolkit was used to develop a new technique to superimpose drug-like molecules onto one another using a quantum mechanical score function. Obtaining the correct alignment of two molecules to reproduce the pose within a protein active site is a challenging problem. This new method was validated on almost 90 protein-ligand complexes for which x-ray crystallographic data were available. MTK++ was also used to develop a generalized tetrahedral zinc force field for metalloprotein molecular dynamics simulations. It is desirable to model metalloprotein systems using MM models because one can carry out simulations to address important structure/function and dynamics questions that are not currently attainable using QM and QM/MM based methods. Until now, force fields for metalloproteins were built by hand through a convoluted process. The creation of a computer program to do this removes the human error factor. This program was used to build force fields for 10 zinc tetrahedral active sites. This required the parameterization of bond and angle force constants and the calculation of partial charges. MTK++ was designed to automatically perceive metal centers and assign parameters necessary to carry out MM or MD calculations.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Martin B Peters.
Thesis: Thesis (Ph.D.)--University of Florida, 2007.
Local: Adviser: Merz, Kenneth Malcolm.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2007
System ID: UFE0021354:00001































© 2007 Martin B. Peters


































For Jane









ACKNOWLEDGMENTS

Words cannot describe my Jane. She is everything I could ask for. She has stood by me even when I left Ireland to pursue my dream of getting my PhD. Thank you, honey, for your love, support and the sacrifices you have made for us. I thank my mother for always giving me tremendous support and for her words of wisdom and encouragement. I would also like to thank my two brothers, Patrick and Francis, and my two sisters, Marian and Deirdre, for all their encouragement and support.

Kennie, thank you for giving me the opportunity to work with you; I have truly enjoyed the experience. I would like to express my gratitude to all Merz group members, especially Kaushik, Andrew, Ken, Kevin, and Duane, for their support and friendship. Also, I would like to acknowledge the effort of Mike Weaver, who helped by editing this dissertation.









TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATIONS

ABSTRACT

CHAPTER

1 INTRODUCTION

2 THEORY AND METHODS

2.1 Receptor-Ligand Binding Free Energy
2.2 Computational Drug Design
2.3 Molecular Mechanics
2.4 Quantum Mechanics
2.5 Ligand Based Drug Design
2.5.1 3D-QSAR with QM descriptors
2.5.2 Field-based Methods
2.5.3 Spectroscopic 3-D QSAR
2.5.4 Quantum QSAR and Molecular Quantum Similarity
2.6 Receptor Based Drug Design
2.7 Semiempirical Divide-And-Conquer Approach
2.8 Pairwise Energy Decomposition (PWD)
2.9 Quantum Mechanical Charge Models
2.10 Comparative Binding Energy Analysis (COMBINE)
2.11 SemiEmpirical Comparative Binding Energy Analysis (SE-COMBINE)
2.12 Graph Theory
2.13 Statistical Methods
2.14 Metalloproteins

3 MODELING TOOL KIT++

3.1 Introduction
3.2 Overview
3.2.1 Development
3.2.2 Library Hierarchy
3.2.3 Molecule Library
3.2.4 Graph Library
3.2.5 MM Library
3.2.6 GA Library
3.2.7 Statistics Library
3.2.8 Molecular Fragment Library
3.2.9 Parsers Library
3.3 Hybridization, Bond Order and Formal Charge Perception
3.4 Ring Perception
3.5 Addition of Hydrogen Atoms to Molecules
3.6 Conformational Sampling
3.7 Substructure Searching/Functionalize
3.8 Clique Detection/Maximum Common Pharmacophore
3.9 Superimposition
3.10 Conclusions

4 SEMIFLEXIBLE QUANTUM MECHANICAL ALIGNMENT OF DRUG-LIKE MOLECULES

4.1 Introduction
4.2 Implementation
4.2.1 Ligand Conformational Searching
4.2.2 Structural Alignment and Clique Selection
4.2.3 Semiempirical Similarity Score
4.3 Results and Discussion
4.3.1 Data Set
4.3.2 Carboxypeptidase A
4.3.3 Glycogen Phosphorylase
4.3.4 Immunoglobin
4.3.5 Streptavidin
4.3.6 Dihydrofolate Reductase
4.3.7 Trypsin
4.3.8 Estrogen Receptor
4.3.9 Peroxisome Proliferator-Activated Receptor γ
4.3.10 Human Carbonic Anhydrase II
4.3.11 Thrombin
4.3.12 Elastase
4.3.13 Thermolysin
4.4 Conclusions

5 METAL CLUSTER MOLECULAR MECHANICS PARAMETERIZATION

5.1 Introduction
5.2 Implementation
5.2.1 Equilibrium Bond Lengths and Angles
5.2.2 Force Constants
5.2.3 Point Charges
5.3 Zinc AMBER Force Field
5.3.1 Protein Data Bank Survey of Zinc Containing Proteins
5.3.2 Tetrahedral Zn Environment Force Field Parameterization
5.4 Conclusions

6 CONCLUSIONS

APPENDIX

A ALGORITHMS

A.1 Subgraph Isomorphism Algorithm
A.2 Maximum Common Pharmacophore

B AMBER GRADIENTS

B.1 Vector Math and Derivatives
B.2 AMBER First Derivatives
B.2.1 Bond
B.2.2 Angle
B.2.3 Dihedral
B.2.4 Electrostatic
B.2.5 van der Waals

C FRAGMENT LIBRARY

C.1 Terminal Fragments
C.2 Two Point Linker Fragments
C.3 Three Point Linker Fragments
C.4 Four Point Linker Fragments
C.5 Five Point Linker Fragments
C.6 Three Membered Ring Fragments
C.7 Four Membered Ring Fragments
C.8 Five Membered Ring Fragments
C.9 Six Membered Ring Fragments
C.10 Greater than Six Membered Ring Fragments
C.11 Fused Ring Fragments

REFERENCES

BIOGRAPHICAL SKETCH










LIST OF TABLES

Table

2-1 Correspondence between Graph Theory and Chemical Terminology
3-1 Disulfide Bond Prediction Parameters
3-2 Meng Atomic Covalent Radii
3-3 Labute Algorithm Upper Bound Bond Conditions
3-4 Labute Algorithm Atom Hybridization Assignment
3-5 Labute Algorithm Lower Bound Single Bond Lengths
3-6 Labute Algorithm Bond Weights
3-7 Hydrogen Bond Lengths
3-8 Hydrogen Bond Angles
3-9 Hydrogen Bond Dihedrals
3-10 Dihedral Angles Available based on Bond Type
4-1 Compound Alignment Literature
4-2 Protein-Ligand Data Set
4-3 Statistics of CuTieP Performance
4-4 Carboxypeptidase A Ligand Alignments
4-5 Glycogen Phosphorylase Ligand Alignments
4-6 Immunoglobin Ligand Alignments
4-7 Streptavidin Ligand Alignments
4-8 Dihydrofolate Reductase Ligand Alignments
4-9 Trypsin Ligand Alignments
4-10 Estrogen Receptor Ligand Alignments
4-11 PPARγ Ligand Alignments
4-12 40 Human Carbonic Anhydrase II Inhibitors
4-13 Human Carbonic Anhydrase II Results
4-14 Thrombin Ligand Alignments
4-15 Elastase Ligand Alignments
4-16 Thermolysin Ligand Alignments
5-1 Metal Ions in the Protein Data Bank
5-2 Published Metalloprotein Force Fields Using the Bonded Plus Electrostatics Model
5-3 Metal-Donor Bond Target Lengths
5-4 Ideal Angles Used to Calculate Root Mean Square Deviations for Tetrahedral, Square Planar, Trigonal Bipyramidal, Square Pyramid and Octahedral Geometries
5-5 Tetrahedral Zinc Primary Ligating Residues
5-6 Zn-CCCC Cluster Bond Lengths and Force Constants
5-7 Zn-CCCC Cluster Angles and Force Constants
5-8 Zn-CCCH Cluster Bond Lengths and Force Constants
5-9 Zn-CCCH Cluster Angles and Force Constants
5-10 Zn-CCCH Cluster Angles and Force Constants
5-11 Zn-CCHH Cluster Bond Lengths and Force Constants
5-12 Zn-CCHH Cluster Angles and Force Constants
5-13 Zn-CHHH Cluster Bond Lengths and Force Constants
5-14 Zn-CHHH Cluster Angles and Force Constants
5-15 Zn-HHHH Cluster Bond Lengths and Force Constants
5-16 Zn-HHHH Cluster Angles and Force Constants
5-17 Cysteine Charges using ChgModA for the Zn-CCCC, -CCCH, -CCHH, and -CHHH Clusters
5-18 Cysteine Charges using ChgModB for the Zn-CCCC, -CCCH, -CCHH, and -CHHH Clusters
5-19 Histidine Charges using ChgModA for the Zn-CCCC, -CCCH, -CCHH, and -CHHH Clusters
5-20 Histidine Charges using ChgModB for the Zn-CCCC, -CCCH, -CCHH, and -CHHH Clusters
5-21 Zn-HHHO Cluster Bond Lengths and Force Constants
5-22 Zn-HHHO Cluster Angles and Force Constants
5-23 Zn-HHOO Cluster Bond Lengths and Force Constants
5-24 Zn-HHOO Cluster Angles and Force Constants
5-25 Zn-HOOO Cluster Bond Lengths and Force Constants
5-26 Zn-HOOO Cluster Angles and Force Constants
5-27 Histidine and Water's Partial Charges using ChgModB for the Zn-HHHO, -HHOO, and -HOOO Clusters
5-28 Zn-HHHD and Zn-HHDD Cluster Bond Lengths and Force Constants
5-29 Zn-HHHD Cluster Angles and Force Constants
5-30 Zn-HHDD Cluster Angles and Force Constants
5-31 Histidine and Aspartate Residue Charges using ChgModB for the Zn-HHHD and -HHDD Clusters
C-1 Terminal Fragments
C-2 Two Point Linker Fragments
C-3 Three Point Linker Fragments
C-4 Four Point Linker Fragments
C-5 Five Point Linker Fragments
C-6 Three Membered Ring Fragments
C-7 Four Membered Ring Fragments
C-8 Five Membered Ring Fragments
C-9 Six Membered Ring Fragments
C-10 Greater than Six Membered Ring Fragments
C-11 Fused Ring Fragments










LIST OF FIGURES

Figure

2-1 Drug Development Process
2-2 The Iterative Drug Design Process
2-3 Thermodynamic Cycle of Receptor-Ligand Binding
2-4 Computational Component of Drug Design
2-5 Hierarchy of QM methods used in SBDD
2-6 NMR QSAR
2-7 The Classic "Pac-Man" Representation of Receptor-Ligand Binding
2-8 PWD Density Matrix Representation
2-9 Schematic Diagram of the Human Carbonic Anhydrase II inhibitor Fragmentation
2-10 SE-COMBINE Descriptor Table
2-11 Schematic Diagram of a Trypsin Inhibitor Fragmentation
2-12 SE-COMBINE Intermolecular Interaction Map (IMM)
2-13 Graph Theory I
2-14 Graph Theory II
2-15 Principal Component Analysis (PCA) Schematic Diagram of the Matrices and Vectors Involved
2-16 Partial Least Squares (PLS) Schematic Diagram of the Matrices and Vectors Involved
2-17 Most Common Amino Acid Residues which Bond to Metal Ions
2-18 Zinc Metalloproteins
2-19 Copper Metalloproteins
2-20 Homo-Nuclear Metalloproteins
2-21 Hetero-Nuclear Metalloproteins
3-1 Computational Drug Design
3-2 Library Hierarchy as Implemented in MTK++
3-3 Core Class Hierarchy of the Molecule Library as Implemented in MTK++
3-4 Class Hierarchy of the Parameters Component of the Molecule Class as Implemented in MTK++
3-5 Class Hierarchy of the Standard Library Component of the Molecule Class as Implemented in MTK++
3-6 Disulfide Bond in Proteins
3-7 The Structural Types of the Histidine Residue
3-8 Class Hierarchy of the Molecule Component of the Molecule Class as Implemented in MTK++
3-9 Class Hierarchy of the Graph Library as Implemented in MTK++
3-10 Class Hierarchy of the MM Library as Implemented in MTK++
3-11 Class Hierarchy of the GA Library as Implemented in MTK++
3-12 Class Hierarchy of the Statistics Library as Implemented in MTK++
3-13 Class Hierarchy of the Parsers Library as Implemented in MTK++
3-14 Hybridization, Bond Order, and Formal Charge Perception Using the Labute Algorithm
3-15 Ring Perception
3-16 Ring Perception Contd
3-17 Aromatic, Non-aromatic, and Anti-aromatic Rings
3-18 Hydrogen Bond
3-19 Rotatable Bond Types
3-20 Systematic Conformational Searching
3-21 Conformer Generation
3-22 Ullman Subgraph Isomorphism Illustration
3-23 Clique Detection Illustration
3-24 Molecular Superposition
4-1 Carboxypeptidase A Ligands
4-2 1CBX Conformer Analysis
4-3 Carboxypeptidase A Alignment Results
4-4 Glycogen Phosphorylase Ligands
4-5 Glycogen Phosphorylase Alignment Results
4-6 Immunoglobin Ligands
4-7 Immunoglobin Alignment Results
4-8 Streptavidin Ligands
4-9 Streptavidin Alignment Results
4-10 Dihydrofolate Reductase Ligands
4-11 Trypsin Inhibitors
4-12 Trypsin Alignment Results
4-13 Estrogen Receptor Ligands
4-14 Peroxisome Proliferator-Activated Receptor γ Agonists
4-15 HCA II Ligands
4-16 Thrombin Inhibitors
4-17 Elastase Ligands
4-18 Elastase Alignment Results
4-19 Thermolysin Inhibitors
4-20 Thermolysin Alignment Results
5-1 Approaches to Incorporate Metal Atoms into Molecular Mechanics Force Fields
5-2 MCPB Flow Diagram
5-3 Metal Ligand Geometries Perceived Using Harding's Rules
5-4 Zinc Coordination Geometry Distribution from the PDB
5-5 The Most Common Tetrahedral Zinc Coordinating Ligands Combination Distribution
5-6 Zn-S Bond Length Distributions in CCCC, CCCH, CCHH, and CHHH Tetrahedral Environments
5-7 Box Plots of Zn-S/N Bond Lengths in CCCC, CCCH, CCHH, CHHH, and HHHH Environments
5-8 Tetrahedral Zn-O(Asp/Glu) and Zn-N(His) Bond Length Distributions
5-9 ZAFF Flow Diagram
5-10 Zn-CCCC Cluster Models (PDB ID: 1A5T)
5-11 Zn-CCCH Cluster Models (PDB ID: 1A73 and 2GIV)
5-12 Zn-CCHH Cluster Models (PDB ID: 1A1F)
5-13 Zn-CHHH Cluster Models (PDB ID: 1CK7)
5-14 Zn-HHHH Cluster Models (PDB ID: 1PBO)
5-15 Correlation between Zn-S and Zn-N Bond Lengths and Calculated Force Constants through the Series CCCC, CCCH, CCHH, CHHH, and HHHH
5-16 Zn-HHHO Cluster Models (PDB ID: 1CA2)
5-17 Zn-HHOO Cluster Models (PDB ID: 1VLI)
5-18 Zn-HOOO Cluster Models (PDB ID: 1L3F)
5-19 Zn-HHHD and Zn-HHDD Cluster Models (PDB ID: 2USN and 1UOA)








LIST OF ABBREVIATIONS

Abbreviation

PDB         Protein Data Bank
DD          Drug Design
NDA         New Drug Application
IND         Investigational New Drug
FDA         Food and Drug Administration
ADME        Absorption, Distribution, Metabolism, and Excretion
SBDD        Structure-Based Drug Design
LBDD        Ligand-Based Drug Design
MM          Molecular Mechanics
QM          Quantum Mechanics
HF          Hartree Fock
DFT         Density Functional Theory
SE          SemiEmpirical
MNDO        Modified Neglect of Differential Overlap
AM1         Austin Model 1
PM3         Parametric Model 3
PDDG/PM3    Pairwise Distance Directed Gaussian modification of PM3
SCC-DFTB    Self-Consistent-Charge Density-Functional Tight-Binding
RBDD        Receptor-Based Drug Design
QSAR        Quantitative Structure Activity Relationship
MLR         Multiple Linear Regression
PCR         Principal Component Regression
PLSR        Partial Least Squares Regression
CNNs        Computer Neural Networks
HOMO        Highest Occupied Molecular Orbital
LUMO        Lowest Unoccupied Molecular Orbital
CODESSA     COmprehensive DEscriptors for Structural and Statistical Analysis
CoMFA       Comparative Molecular Field Analysis
CoMSIA      Comparative Molecular Similarity Indices Analysis
PLS         Partial Least Squares
PIE         Probe Interaction Energy
QSM         Quantum Similarity Measure
CSI         Carbó Similarity Index
QQSAR       Quantum QSAR
QSSA        Quantum Similarity Superposition Algorithm
QTMS        Quantum Topological Molecular Similarity
BCPs        Bond Critical Points
AIM         Atoms-In-Molecules
DnC         Divide-and-Conquer
NDDO        Neglect of Diatomic Differential Overlap
SASA        Solvent Accessible Surface Area
PWD         Pairwise Energy Decomposition
CNDO        Complete Neglect of Differential Overlap
CM1         Charge Model 1
CM2         Charge Model 2
RESP        Restrained ElectroStatic Potential
MK          Merz-Singh-Kollman
COMBINE     Comparative Binding Energy Analysis
SE-COMBINE  SemiEmpirical-Comparative Binding Energy Analysis
IMM         InterMolecular interaction Map
LOO         Leave-One-Out
PRESS       Predicted Residual Sum of Squares
SDEC        Standard Deviation of Error of Calculations
SDEP        Standard Deviation of Error Prediction
RMSD        Root Mean Squared Deviation
PCA         Principal Component Analysis
PC          Principal Component
CYS         Cysteine
MET         Methionine
ASP         Aspartic Acid
GLU         Glutamic Acid
HIS         Histidine
HCA II      Human Carbonic Anhydrase II
MTK++       Modeling Tool Kit++
API         Application Programming Interface
GA          Genetic Algorithm
BLAS        Basic Linear Algebra Subprograms
LAPACK      Linear Algebra PACKage
GAFF        Generalized AMBER Force Field
MEP         Molecular Electrostatic Potential
vdW         van der Waals
GFs         Gaussian Functions
RFO         Rational Function Optimization
RIPS        Random Incremental Pulse Search
BFGS        Broyden-Fletcher-Goldfarb-Shanno
SD          Steepest Descent
NR          Newton-Raphson
MCP         Maximum Common Pharmacophore
ASA         Atomic Shell Approximation
SA          Surface Area
MO          Molecular Orbital
DHFR        Dihydrofolate Reductase
PPARγ       Peroxisome Proliferator-Activated Receptor γ
ER          Estrogen Receptor
ESP         ElectroStatic Potential
UFF         Universal Force Field
CCSD        Crystallographic Structural Database
MCPB        Metal Center Parameter Builder









Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

THE APPLICATION OF SEMIEMPIRICAL METHODS IN DRUG DESIGN

By

Martin B. Peters

August 2007

Chair: Kenneth M. Merz Jr.
Major: Chemistry

The application of quantum mechanical methods in de novo drug design is currently

quite limited in both scope and utility. This thesis outlines where these methods are

placed in this process and where they can be improved on.

C'! lpters one and two of this dissertation describe the drug development process

and current methods used to calculate the free energy of receptor-ligand binding. Some

of the computational tools used in drug design are discussed such as scoring functions,

molecular mechanics, quantum mechanics, semiempirical pair-wise energy decomposition,

comparative binding energy analysis, the SE-COMBINE approach and popular 3D-QSARs

approaches.

The remaining chapters of this work describes the development and application of

a package of computational chemistry C++ libraries called the Modeling ToolKit++

(\ !TK++). This toolkit was used to develop a new technique to superimpose drug-like

molecules onto one another using a quantum mechanical score function. Obtaining the

correct alignment of two molecules to reproduce the pose within a protein active site

is a challenging problem. This new method was validated on almost 90 protein-ligand

complexes for which x-ray (< i --I i11. .graphic data was available.

MTK++ was also used to develop a generalized tetrahedral Zinc force field for

metalloprotein molecular dynamics simulations. It is desirable to model metalloprotein

systems using MM models because one can carry out simulations to address important









structure/function and dynamics questions that are not currently attainable using QM

and QM/MM based methods. Until now force fields for metalloproteins were built by hand

through a convoluted process. The creation of a computer program to do this removes the

human error factor. This program was used to build force fields for 10 Zinc tetrahedral

active sites. This required the parameterization of bond and angle force constants and

the calculation of partial charges. MTK++ was designed to automatically perceive metal

centers and assign parameters necessary to carry out MM or MD calculations.









CHAPTER 1
INTRODUCTION

Drug discovery has evolved from being serendipitous to a rather rational process of

design. High-throughput screening, combinatorial chemistry, the human genome project,

and computational methods have been developed to this end. Nonetheless, the cost of

creating a drug has increased exponentially over the last 50 years [1] without the number

of new drugs getting to the market increasing accordingly. The most plausible reason is a

lack of fundamental understanding of molecular recognition, binding and ultimately drug

delivery processes.

Computational medicinal chemistry spans a broad spectrum of disciplines including

theoretical, computational, and structural chemistries. Theoretical chemistry involves the

development of new and improved theories whereas computational chemistry entails the

application of established theoretical tools to chemical problems. Structural chemistry

techniques such as X-ray crystallography and NMR spectroscopy have played a significant

role in facilitating our understanding of molecular recognition and interaction. Although

computational medicinal chemistry cannot design new drugs on its own, it has been shown

that it can play a role in predicting binding free energies and geometries of receptor-ligand

complexes. Examples include the development of the HIV protease inhibitor, saquinavir,

by "in silico" design as a transition-state analogue [2] and the rational design of an

Angiotensin-Converting Enzyme (ACE) inhibitor called captopril [3].

The computational techniques employed to aid the drug design process include virtual

screening, docking, and scoring with the results or "hits" utilized by medicinal chemists

[4]. Computational methods vary in cost; screening can be carried out on large databases

of compounds, while scoring and docking are generally carried out on a smaller number

of structures. Screening attempts to predict physicochemical properties of molecules

such as aqueous solubility and by doing so reduces the number of molecules with poor

drug-like properties being synthesized. Docking is a technique of placing a drug candidate









into the active site of a receptor. The docked pose of a ligand in the active site of a

receptor can be scored using knowledge-, empirical-, or physics-based methods with the

latter being more expensive [5]. Also computational methods lend themselves to virtual

combinatorial chemistry which can be used to optimize the complementarities between a

receptor and a ligand. Although this can also be done experimentally, the main advantages of

the computational approach are reduced cost and time.

The computational prediction of binding free energies is still not an exact science.

However, utilizing current computer hardware and theoretical technologies, problems that

were hitherto reputedly unfeasible are now tractable. There are two areas where increased

computational power can be used for the accurate prediction of binding free energies:

increased sampling of conformational space, and interaction energy calculations using

complete Hamiltonians [6]. The use of both will be investigated in this thesis.

This dissertation describes the application of quantum mechanical methods in

bio- and medicinal chemistry. The following chapters describe the development of

computational chemistry modeling software, the flexible alignment of drug-like molecules,

and the generation of a Zinc force field for metalloprotein simulations and drug design

applications.

In Chapter 2 the industrial drug design process is outlined as an overview of why

computational tools are used in drug design. The thermodynamic basis and current

methods used to calculate the free energy of receptor-ligand binding are described. The

equations of binding are derived in order to reflect the current understanding of binding,

both experimentally and computationally. Some of the computational tools used in

Structure Based Drug Design (SBDD) are discussed, including scoring functions [4],

Molecular Mechanics (MM), Quantum Mechanics (QM), SemiEmpirical (SE) Pair-Wise

energy Decomposition (PWD) [7], the Comparative Binding Energy Analysis (COMBINE)

[8], and the SE-COMBINE [9] approaches. Popular 3D-QSARs approaches are also

outlined including CoMFA (Comparative Molecular Field Analysis) [10] and CoMSIA









(Comparative Molecular Similarity Indices Analysis) [11, 12] and two multivariate

statistical tools; Principal Component Regression (PCR) [13] and Partial Least Squares

(PLS) [14].

Chapter 3 outlines the design and development of the Modeling ToolKit++

(MTK++) package of C++ libraries for the use of QM methods in drug design. The

algorithms such as atomic hybridization and formal charge determination, bond order

and ring perception, substructure searching and clique detection are described in detail

with numerous illustrations. The impetus for this work was to create a computational

chemistry platform where QM methods could be conveniently incorporated in drug

discovery applications. This work was fundamental to this thesis and all modeling in later

chapters used this package.

The fourth chapter describes a method to flexibly align drug-like molecules onto

one another using a semiempirical scoring function. The alignment of two bodies is a

mathematical problem; however, the challenge is to reproduce the pose seen in x-ray

crystallographic studies. Traditionally, molecular superposition has been carried out using

empirical scoring functions due to their speed. The goal of this research was to investigate

the applicability of semiempirical methods in molecular alignment, and their ability to do

so was validated against over 80 protein-ligand complexes from the Protein Data Bank

(PDB) [15].

The fifth chapter outlines the development of a molecular mechanics force field

(FF) for tetrahedral Zinc metalloproteins suitable for the AMBER suite of programs [16].

Several issues regarding the modeling of metalloproteins were addressed. The first goal

was to develop software to conveniently handle metalloprotein structures. The program

MCPB (Metal Center Parameter Builder) was created to build and validate metalloprotein

FFs for use in molecular simulations to study structure, function, and dynamics. Secondly,

the automated perception of metal centers in proteins was undertaken and gave rise to

the program called pdbSearcher. This software was used to survey the PDB for all Zinc









containing metalloproteins. The most abundant primary shell combinations bound to Zn

atoms were extracted and the FFs generated with the resulting parameters analyzed in

detail.

Finally, Chapter 6 provides a brief summary of the work presented in this dissertation.

I hope this dissertation demonstrates the utility of current quantum mechanical

approaches in the areas of drug design and metalloprotein modeling. The use of quantum

mechanical methods in drug design can be viewed as the final frontier due to the fact

that these methods describe molecular interactions from first principles [6]. Nevertheless,

the use of quantum mechanics over classical approaches brings extra expense and so it is

necessary to show that these methods can be superior to simpler models.










CHAPTER 2
THEORY AND METHODS

Designing a drug (a molecule which affects biological processes without causing

injury) requires numerous steps from its inception to its introduction into the market.

This process takes approximately 10-15 years as shown in Fig. 2-1 and can cost on the

order of a half billion dollars. This is due to both the vastness of chemical space [17-21]

and the cost of research and testing [1].


[Figure 2-1 schematic. Stages: Basic Research (Compound Discovery); Preclinical Development (Safety Testing, Pharmacokinetics, Toxicology, Formulation Research, Process Development); IND Submission (IND Preparation); Clinical Development (Phases I, II, III); NDA Submission (NDA Preparation, FDA Review). Durations shown: ongoing, 3-4 years, 1 year, 6-8 years, 2-3 years. Compounds remaining: millions, 1000, 10, and finally a single compound.]




Figure 2-1. Drug Development Process. Adapted from http://www.netsci.org/. An
IND (Investigational New Drug) is prepared and submitted to the FDA (Food
and Drug Administration) at the end of the preclinical phase of drug
development. With good results from the clinical phase an NDA (New Drug
Application) is submitted to the FDA for approval to release the drug to the
general public.


The pre-clinical phase of Drug Design (DD) is carried out using an iterative process

(first three columns of Fig. 2-1). It starts with some knowledge of a target, i.e., a known

natural substrate or a crystal/NMR structure of the target or receptor. The target is

chosen based on some known chemical feature of a biological disease. The design cycle

takes many steps such as computational design, ligand design, synthesis, biochemical









evaluation, and crystallographic analysis, as shown in Fig.
2-2 [22]. During each cycle of this process different computational tools are used with

varying costs and accuracies which will be discussed in more detail later in this chapter.

At the end of the pre-clinical phase an IND (Investigational New Drug) is prepared to

allow a company to test the drug in humans. After the IND is approved by the FDA

(Food and Drug Administration) the clinical stage (4th column of Fig. 2-1) begins.

Phase I of the clinical trials tests the toxicity, pharmacokinetics or ADME (Absorption,

Distribution, Metabolism, and Excretion) properties, and dosage on approximately 50

healthy volunteers. Phase II evaluates the drug's effectiveness and side effects on volunteer

patients (approximately 500), and it is at this stage that most adverse effects of the drug's use are

observed. The final phase, Phase III, of clinical trials determines the effects of long term

use on a large pool of volunteer patients. After phase III a company prepares an NDA

(New Drug Application) and submits it to the FDA for approval to release the drug to

the general public. The NDA contains results of all clinical studies and once approved by

the FDA the drug can be marketed. After release the company carries out post-marketing

surveillance of the drug's effectiveness in a so-called phase IV.


[Figure 2-2 schematic. Elements of the cycle: Target Information, Crystallographic Analysis, Computational, Ligand Design, Synthesis, Biochemical Testing, Drug Lead.]


Figure 2-2. The Iterative Drug Design Process. Adapted from Babine and Bender [22].









Drug targets or receptors include enzymes, ion channels, nuclear hormone receptors,

and DNA, which interact with endogenous physiological substances such as hormones

and neurotransmitters. There are currently over 1200 drugs approved by the FDA for

therapeutic use in the United States, 25% of which target enzymes [23]. The majority

of enzyme-targeted drugs are enzyme-substrate based and most act via non-covalent

interactions. Drugs that mimic the effects of endogenous regulatory compounds are

called agonists, while compounds that do not have 100% activity are termed partial

agonists [24]. Drugs that bind to receptors but have no activity and prevent endogenous

compounds from binding are termed antagonists or inhibitors. There are two main

types of enzymatic inhibition, reversible and irreversible. Reversible inhibition occurs

through competitive, noncompetitive and uncompetitive mechanisms. Diuretics used

to control blood pressure and many anti-depressive agents, for example antagonists

of dopamine receptors, are reversible competitive inhibitors. These drugs compete for

the same binding site as the natural substrate, but the enzyme cannot process the

inhibitor, thus preventing catalytic activity. Non-competitive or allosteric inhibitors bind

to different regions of the enzyme and do not compete for the binding site. However, the

process of binding the inhibitor can change the shape of the active site thus preventing

catalytic activity. Uncompetitive inhibition takes place when the inhibitor only binds

the enzyme-substrate complex, consequently preventing catalysis. Irreversible inhibition

occurs when the inhibitor covalently attaches to the enzyme active site such as inhibitors

of Carbonic Anhydrase [25]. Structural determination of receptors or complexes is often

carried out by x-ray crystallography. It should be noted that the atomic positions from

crystallography have an associated error that is generally on the order of 1/6 of the

resolution (approximately 0.4 Å uncertainty for a 2.4 Å resolution structure) [22].

A fundamental understanding of the interactions between receptors and ligands is

necessary to the design of new drugs. These forces include ionic or electrostatic effects,

ion-dipole and dipole-dipole interactions, charge transfer, van der Waals, and hydrophobic









interactions. Molecules with high biological activity usually possess a shape that is

complementary (hydrophobic, electrostatic, and polar contacts are paired upon binding) to

that of the receptor's active site, as first proposed by Fischer ("lock-and-key" hypothesis).

2.1 Receptor-Ligand Binding Free Energy

In the simplest case, receptor-ligand binding corresponds to a single ligand molecule

forming a 1 : 1 complex with a receptor that contains only a single binding site as shown

in Eq. 2-1. R represents the receptor, L the ligand, and RL the complex, where $k_1$ and

$k_{-1}$ are the association and dissociation rate constants, respectively.


$$R + L \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} RL \tag{2-1}$$

At equilibrium, association of a receptor and ligand occurs at the same rate as dissociation

and the equilibrium constants, $K_a$ and $K_d$, can be defined as:

$$K_a = \frac{[RL]}{[R][L]} = \frac{1}{K_d} \tag{2-2}$$

It is a common practice to use Kd for practical reasons as it has units of concentration. Kd

is the concentration of free ligand at which half of the receptor binding sites at equilibrium

are occupied. Small values of Kd correspond to a high affinity between the receptor and

ligand.
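
The statement that Kd is the free ligand concentration giving half occupancy can be checked numerically. The following is a minimal sketch, assuming the simple 1:1 binding model of Eq. 2-2; the function name and the 1 nM value are illustrative and do not come from this work.

```python
def fraction_bound(free_ligand_conc, kd):
    """Fraction of receptor sites occupied at equilibrium for 1:1 binding.

    Rearranging Kd = [R][L]/[RL] gives [RL] / ([R] + [RL]) = [L] / (Kd + [L]).
    Both arguments must use the same concentration units.
    """
    return free_ligand_conc / (kd + free_ligand_conc)


kd = 1.0e-9  # hypothetical dissociation constant of 1 nM, in mol/L

half = fraction_bound(kd, kd)            # free ligand concentration equal to Kd
tenfold = fraction_bound(10.0 * kd, kd)  # free ligand at ten times Kd
print(half, tenfold)
```

With the free ligand concentration equal to Kd exactly half of the sites are occupied, and a tenfold excess raises the occupancy to 10/11.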

To gain a fundamental understanding of receptor-ligand binding one must begin with

a thermodynamic description. The Gibbs free energy is most often used in biochemistry as

binding experiments are carried out under conditions of constant temperature, pressure,

and number of particles. $\Delta G$ in Eq. 2-3 is the free energy change for the reaction, $\Delta H$ and

$\Delta S$ are the enthalpy and entropy changes, respectively, and $T$ is the temperature.


$$\Delta G = \Delta H - T\Delta S \tag{2-3}$$


The change in free energy can be expressed in terms of the equilibrium constant $K_d$ as follows:


$$\Delta G = \Delta G^{\circ} - RT \ln K_d \tag{2-4}$$










where $\Delta G^{\circ}$ is the standard state (1 M, 1 bar) free energy change and $R$ is the gas

constant. When complex association and dissociation reach equilibrium, $\Delta G = 0$, and the

expression takes the form:

$$\Delta G^{\circ} = RT \ln K_d \tag{2-5}$$
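
As a worked illustration of Eq. 2-5, the sketch below converts a dissociation constant into a standard binding free energy. It assumes a temperature of 298.15 K and Kd expressed relative to the 1 M standard state; the function name and the example Kd values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def standard_binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy in kcal/mol from Eq. 2-5: RT ln Kd.

    kd_molar is the dissociation constant in mol/L, taken relative to
    the 1 M standard state so that the logarithm is dimensionless.
    """
    return R_KCAL * temperature * math.log(kd_molar)


dg_nanomolar = standard_binding_free_energy(1.0e-9)   # hypothetical 1 nM binder
dg_micromolar = standard_binding_free_energy(1.0e-6)  # hypothetical 1 uM binder
print(round(dg_nanomolar, 2), round(dg_micromolar, 2))
```

Under these assumptions a nanomolar ligand corresponds to roughly -12.3 kcal/mol and a micromolar ligand to roughly -8.2 kcal/mol.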

Since free energy is a state function it can be calculated and compared with experimental

values. The free energy of binding, $\Delta G_{bind}$, is calculated by determining the free energy of

reactants, ($\Delta G_R + \Delta G_L$), and products, $\Delta G_{RL}$, separately. The superscript "$\circ$" is dropped

from the remaining equations for simplicity; however, it is implied.


$$\Delta G_{bind} = \Delta G_{RL} - (\Delta G_R + \Delta G_L) \tag{2-6}$$

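[Figure 2-3 schematic. Top row: R + L going to RL in the gas phase, with free energy $\Delta G^{gas}_{bind}$. Vertical legs: solvation of R, L, and RL. Bottom row: R + L going to RL in solution, with free energy $\Delta G^{solv}_{bind}$.]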

Figure 2-3. Thermodynamic Cycle of Receptor-Ligand Binding


Using the thermodynamic cycle in Fig. 2-3 and Eq. 2-6, the free energy of binding in

solution, $\Delta G^{solv}_{bind}$, can be fully decomposed in Eq. 2-7 [26]. $\Delta G^{gas}_{bind}$ is the free energy of

complexation in the gas phase. This term is dominated by the enthalpic contributions

from steric and electrostatic interactions. $\Delta\Delta G_{solv}$ is the solvation free energy of

complexation, which incorporates the desolvation of the ligand, $\Delta G^{L}_{solv}$, receptor, $\Delta G^{R}_{solv}$,

and complex, $\Delta G^{RL}_{solv}$.

$$\Delta G^{solv}_{bind} = \Delta G^{gas}_{bind} + \Delta\Delta G_{solv} \tag{2-7}$$


where $\Delta G^{gas}_{bind}$ and $\Delta\Delta G_{solv}$ are defined by equations 2-8 and 2-9.

$$\Delta G^{gas}_{bind} = \Delta H^{gas}_{bind} - T\Delta S^{gas}_{bind} \tag{2-8}$$

$$\Delta\Delta G_{solv} = \Delta G^{RL}_{solv} - \Delta G^{R}_{solv} - \Delta G^{L}_{solv} \tag{2-9}$$









For tight binding ligands the interactions in the complex are significantly stronger

than those of the receptor and ligand alone in solution. Also the favorable enthalpic

interactions must compensate the entropic loss of conformational degrees of freedom for

both the receptor and ligand plus the three rotational and three translational degrees

of freedom. It should be noted that small variations in a complex's stability ($\Delta G$) in

kcal/mol correspond to large differences in affinity ($K_d$). For example, a difference of

5 kcal/mol corresponds to more than three orders of magnitude variation in observed affinity.
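
The sensitivity of affinity to small free energy differences follows from the exponential relation between the two. The sketch below is illustrative only: it assumes 298.15 K, and the function name is not taken from this work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def affinity_ratio(delta_delta_g, temperature=298.15):
    """Ratio of two Kd values whose standard binding free energies
    differ by delta_delta_g (kcal/mol), i.e. exp(ddG / RT)."""
    return math.exp(delta_delta_g / (R_KCAL * temperature))


ratio = affinity_ratio(5.0)   # Kd ratio for a 5 kcal/mol difference
orders = math.log10(ratio)    # the same ratio as orders of magnitude
per_decade = R_KCAL * 298.15 * math.log(10.0)  # kcal/mol per factor of ten
print(round(orders, 2), round(per_decade, 2))
```

At this temperature each factor of ten in Kd costs about 1.36 kcal/mol, so a 5 kcal/mol difference spans between three and four orders of magnitude.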

2.2 Computational Drug Design

The computational components of the drug design process take place during the

initial stages of each iterative cycle as shown in Fig. 2-2, and the main reasons for their use are

to reduce costs and provide atomic level insight into receptor-ligand interaction. This

area can be broken down into two parts: Structure-Based Drug Design and Ligand-Based

Drug Design. The former requires structural knowledge of the receptor while the latter

does not and both will be discussed in detail below. The early iterations of the drug

design process involved the searching or screening of databases of molecules such as

ZINC [27, 28] and other combinatorial libraries [29] for compounds which may be active

against the target [30-32], thus separating drugs from non-drugs [33-35]. Screening can

involve similarity/dissimilarity searching [36] against a known active/inactive molecule.

Compounds can be compared to each other in 1D [37], 2D [38] or 3D [39, 40] with

the last technique being the most expensive. Simple counting techniques [41] such as

Lipinski's "rule-of-five" [42] are also used to filter out non-drug molecules. Screens are

used to predict ADME properties [43] such as aqueous solubility [44], hepatotoxicity [45],

P450 inhibition [46], and absorption [47]. Screens are also carried out to predict the

synthetic accessibility of compounds thus allowing for later functional group optimization

[48]. Subsequent "hits" from a screen serve as lead compounds for medicinal chemists.

De novo drug design [49-51] is another tool used to identify novel lead compounds. This









technique grows molecules in the active site of a receptor or of a pseudoreceptor derived from
the alignment of known active molecules [52].
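
A counting filter of the kind mentioned above, Lipinski's "rule-of-five", can be sketched in a few lines. The thresholds are the commonly quoted ones (molecular weight 500, logP 5, 5 hydrogen-bond donors, 10 hydrogen-bond acceptors). The class, the function names, the tolerance of one violation, and the descriptor values are assumptions made for illustration; in practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float       # molecular weight in g/mol
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def rule_of_five_violations(d):
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        d.mol_weight > 500.0,
        d.log_p > 5.0,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_filter(d, max_violations=1):
    """Keep a compound if it breaks at most max_violations thresholds
    (the tolerance is an adjustable assumption, not a fixed rule)."""
    return rule_of_five_violations(d) <= max_violations


# Hypothetical descriptor values, not taken from any real compound.
small_polar = Descriptors(mol_weight=300.0, log_p=2.5, h_bond_donors=2, h_bond_acceptors=5)
large_greasy = Descriptors(mol_weight=900.0, log_p=7.0, h_bond_donors=8, h_bond_acceptors=15)

print(passes_filter(small_polar), passes_filter(large_greasy))
```

Because the filter only counts precomputed properties, it costs almost nothing per compound, which is why such screens sit at the top of the funnel.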


[Figure 2-4 schematic (funnel). Target, followed by Database Screening (millions of compounds; milliseconds), Docking and Scoring (1000s of compounds; seconds/hours), and Lead Optimization (100s of compounds; seconds/hours).]


Figure 2-4. Computational Component of Drug Design. Timings are per compound.


Lead, or "drug-like", compounds are expected to have good pharmacokinetics
and be accessible to synthetic modification. The transition from a lead compound to a
drug candidate involves optimizing structural and chemical complementarities with the
receptor. Docking and scoring are tools to measure the complementarity between lead and
receptor [4]. Docking is the process by which a ligand structure is placed in the active site
of the receptor while scoring predicts the binding free energy of complex formation. Lead
optimization is often used to optimize the pharmacokinetics through functional group
substitution. A schematic of the computational aspect of drug design is shown in Fig. 2-4.
This is drawn as a funnel to highlight that the number of compounds decreases from the
top to bottom; however, most often the expense of computational tools used increases.









Various approaches have emerged to calculate or predict the binding free energy.

These have met with varying degrees of success. They include physics-, empirical-, and

knowledge-based scoring functions [5, 53-57], and various QSAR approaches [10, 11].

The results of empirical and knowledge-based scoring functions are highly dependent

on parameterization and the calculation of binding free energies of compounds unlike

those in the training set can yield spurious results. Physics based scoring functions try

to model each component of Eq. 2-7 from first principles. Physics-based techniques and

QSAR approaches are introduced in the following sections and their advantages and

disadvantages in determining the free energy of binding are discussed.

2.3 Molecular Mechanics

Molecular Mechanics (MM) force fields such as AMBER [16, 58-60], CHARMM

[61], MMFF [62-69], OPLS [70], and MM3 [71] can be used to calculate the enthalpic
component of the binding free energy between the receptor and ligand.

The AMBER energy function, Eq. 2-10, contains bond, angle, dihedral, and

non-bonded terms. The bond and angle terms are represented by harmonic expressions.

The van der Waals term is a 6-12 potential, and the electrostatic is expressed as a

Coulombic interaction with atom centered point charges.

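$$E_{total} = \sum_{bonds} K_r (r - r_{eq})^2 + \sum_{angles} K_\theta (\theta - \theta_{eq})^2 + \sum_{dihedrals} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right] + \sum_{i<j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\epsilon r_{ij}} \right] \tag{2-10}$$

where $K_r$ (kcal/(mol Å$^2$)) and $K_\theta$ (kcal/(mol radian$^2$)) are the force constants for bond

length and angle, respectively, while $r_{eq}$ and $\theta_{eq}$ are the equilibrium bond distances and

angles.

A truncated Fourier series represents the dihedral term, where $V_n$ is the barrier

height, $n$ is the periodicity, $\phi$ is the calculated dihedral angle and $\gamma$ is the phase difference.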









The fourth term describes the steric interaction as a Lennard-Jones potential, where

$r_{ij}$ is the distance between atoms $i$ and $j$. $A_{ij} = \epsilon_{ij} r_{ij}^{*12}$ and $B_{ij} = 2\epsilon_{ij} r_{ij}^{*6}$ are parameters

that define the shape of the potential, where $r_{ij}^{*} = r_i^{*} + r_j^{*}$ in Å, $r_i^{*}$ is the van der Waals

radius for atom $i$, $\epsilon_{ij} = \sqrt{\epsilon_i \epsilon_j}$ is the van der Waals well depth in kcal/mol,

and $q_i$ and $q_j$ are the atom-centered point charges. A rigorous derivation of the gradients of the

AMBER function is given in Appendix B.
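
The individual terms of an AMBER-type energy function are simple enough to evaluate by hand. The sketch below is not the MTK++ or AMBER implementation. It assumes the standard AMBER functional form, distances in Å, energies in kcal/mol, and an approximate Coulomb conversion constant of 332.0637 kcal Å/(mol e^2); all function names and parameter values are hypothetical.

```python
import math

COULOMB_CONSTANT = 332.0637  # approximate conversion to kcal Å / (mol e^2)


def harmonic(force_constant, value, equilibrium):
    """Bond or angle term: K (x - x_eq)^2 (note: no factor of 1/2)."""
    return force_constant * (value - equilibrium) ** 2


def dihedral(v_n, periodicity, phi, gamma):
    """Single Fourier term: (V_n / 2) [1 + cos(n phi - gamma)], angles in radians."""
    return 0.5 * v_n * (1.0 + math.cos(periodicity * phi - gamma))


def lennard_jones(r_ij, rstar_i, rstar_j, eps_i, eps_j):
    """6-12 term A_ij / r^12 - B_ij / r^6 built from per-atom radii and well depths."""
    rstar_ij = rstar_i + rstar_j
    eps_ij = math.sqrt(eps_i * eps_j)
    a_ij = eps_ij * rstar_ij ** 12
    b_ij = 2.0 * eps_ij * rstar_ij ** 6
    return a_ij / r_ij ** 12 - b_ij / r_ij ** 6


def coulomb(q_i, q_j, r_ij, dielectric=1.0):
    """Point-charge electrostatic term with charges in units of e."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r_ij)


# Hypothetical parameters: two identical atoms with r* = 1.9 Å and well depth 0.1 kcal/mol.
e_min = lennard_jones(3.8, 1.9, 1.9, 0.1, 0.1)   # evaluated at r_ij = r*_ij
e_bond = harmonic(300.0, 1.55, 1.53)             # bond stretched by 0.02 Å
e_tors = dihedral(2.0, 3, 0.0, 0.0)              # eclipsed threefold torsion
e_coul = coulomb(0.4, -0.4, 3.8)
print(e_min, e_bond, e_tors, e_coul)
```

With these definitions the Lennard-Jones energy at the separation r*_ij equals minus the well depth, which is a convenient check that A_ij and B_ij have been built consistently.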

2.4 Quantum Mechanics

Higher order molecular interactions such as polarization and charge transfer are

neglected in molecular mechanics force fields due to their point charge based approaches.

Quantum mechanical techniques intrinsically include such interactions. The

high computational cost of ab initio methods such as Hartree-Fock (HF) and Density

Functional Theory (DFT) restricts their use to small systems such as organic molecules,

protein active sites, and metal clusters.

Thanks to the work by Pople, Dewar and Stewart amongst others, the Roothaan-Hall

equations have been approximated and parameterized to give us a series of so-called

SemiEmpirical (SE) methods. The most commonly used SE methods are derived from

the MNDO (Modified Neglect of Differential Overlap) [72] method including AM1 (Austin

Model 1) [73], PM3 (Parametric Model 3) [74, 75], MNDO/d (MNDO with d orbitals) [76]

and PDDG/PM3 (Pairwise Distance Directed Gaussian modification of PM3) [77].

Recently, DFT methods have been approximated, creating the SCC-DFTB

(Self-Consistent-Charge Density-Functional Tight-Binding) method [78, 79]. The SCC-DFTB

approach has been compared to the traditional SE methods, AM1 and PM3, with

comparable errors in predicting heats of formation for a set of 622 neutral molecules;

however, errors were higher than those from the PDDG/PM3 method [80].

SE methods can be used to calculate the total electrostatic energy of a molecular

system, which is the sum of the electronic energy, $E_{el}$, and core-core repulsion, $E_{core-core}$:


$$E_{tot} = E_{el} + E_{core-core} \tag{2-11}$$









where $E_{el}$ and $E_{core-core}$ are described in equations 2-12 and 2-13. In these equations $H$ is

the one-electron matrix, $F$ is the Fock matrix, and $P$ represents the density matrix. $Z$ is

the nuclear charge on the atom, $R_{AB}$ is the atomic separation between A and B, and $N$ is

the total number of atoms.


$$E_{el} = \frac{1}{2} \sum_{\mu} \sum_{\nu} (H_{\mu\nu} + F_{\mu\nu}) P_{\mu\nu} \tag{2-12}$$

$$E_{core-core} = \sum_{A=1}^{N} \sum_{B>A} \frac{Z_A Z_B}{R_{AB}} \tag{2-13}$$
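
Once the matrices are available, the two energy expressions reduce to a pair of double sums. The sketch below assumes atomic units, square matrices stored as nested lists, and the simple point-charge form of the core-core term written above; actual semiempirical methods add method-specific corrections to the core-core repulsion. The function names and all numerical values are made up for illustration.

```python
import math


def electronic_energy(h, f, p):
    """E_el = 1/2 * sum over mu, nu of (H + F)[mu][nu] * P[mu][nu]."""
    n = len(p)
    total = 0.0
    for mu in range(n):
        for nu in range(n):
            total += (h[mu][nu] + f[mu][nu]) * p[mu][nu]
    return 0.5 * total


def core_core_repulsion(charges, coords):
    """Sum over unique atom pairs A < B of Z_A * Z_B / R_AB."""
    total = 0.0
    for a in range(len(charges)):
        for b in range(a + 1, len(charges)):
            total += charges[a] * charges[b] / math.dist(coords[a], coords[b])
    return total


# Toy two-orbital, two-atom system with invented matrix elements (atomic units).
h = [[-1.0, -0.5], [-0.5, -1.0]]
f = [[-0.5, -0.4], [-0.4, -0.5]]
p = [[1.0, 1.0], [1.0, 1.0]]
charges = [1.0, 1.0]
coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]

e_el = electronic_energy(h, f, p)
e_core = core_core_repulsion(charges, coords)
e_tot = e_el + e_core
print(e_el, e_core, e_tot)
```

The restriction B > A in the second sum ensures that each pair of cores is counted once.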
The use of QM in SBDD can be divided into two broad categories, receptor-based

and ligand-based methods (Fig. 2-5). Receptor-Based Drug Design (RBDD) methods

include scoring-, QM/MM-, and comparative binding energy (COMBINE)-type methods.

RBDD requires either an X-ray crystal or NMR structure of ligands in complex with

the relevant receptor. Ligand-based drug design techniques include various Quantitative

Structure-Activity Relationship (QSAR) methods, which rely only on knowledge of the

ligand structure. In general, QSAR can be conducted using two-dimensional (2D) or

three-dimensional (3D) structures; however, the user must utilize 3D structures when

using QM because of the need to have an all-atom description of the nuclei and associated

electrons [6].

2.5 Ligand Based Drug Design

One of the oldest tools used in rational drug design is QSAR (Quantitative Structure

Activity Relationship) [81]. QSAR models are derived for a set of compounds with

dependent variables (activity values e.g. Ki, IC50), and a set of calculated molecular

properties or independent variables called descriptors. Each compound in the data set is

assumed to be in its active conformation. Models are generated using statistical techniques

such as Multiple Linear Regression (MLR), Principal Component Regression (PCR) [13],

Partial Least Squares Regression (PLSR) [82], and Computer Neural Networks (CNNs)


















[Figure 2-5 schematic. Ligand-based branches: Field-based (e.g. QM-QSAR) and 3D-QSAR (e.g. AMPAC+CODESSA). Receptor-based branches: Scoring, COMBINE (e.g. SE-COMBINE), and QM/MM (e.g. DivCon/AMBER).]


Figure 2-5. Hierarchy of QM methods used in SBDD.


[83] to name a few. Ligand-based methods can be further divided into two categories,

3D-QSAR and field-based methods. Both will be touched on below.

2.5.1 3D-QSAR with QM descriptors

The descriptors used in 3D-QSAR are usually divided into three categories: 1)

Electronic, such as HOMO and LUMO energies, 2) Topological, for example connectivity

indices, and 3) Geometric such as moment of inertia. The models in all cases are often

created using multivariate statistical tools due to the large number and high degree of

collinearity of descriptors. An excellent review by Karelson, Lobanov, and Katritzky

provides details of QM based descriptors used in QSAR programs such as COmprehensive

DEscriptors for Structural and Statistical Analysis (CODESSA) [84]. These include those

that can be observed experimentally, such as dipole moments, and those that cannot, such

as partial atomic charges. Clark and co-workers have recently used AM1-based descriptors

to distinguish between drugs and non-drugs and to understand the relationship between

descriptors and their physical properties [85].

Most descriptors are calculated at the semiempirical level of theory using programs

such as AMPAC or MOPAC. However, with computer speed increasing steadily the use of









ab initio and DFT methods is becoming increasingly common. These methods allow the
descriptors to be calculated from first principles. Yang and co-workers examined various
DFT-based descriptors to generate models for a series of protoporphyrinogen oxidase
inhibitors. It was shown that the DFT-based model outperformed the PM3-based model

[86].
2.5.2 Field-based Methods

CoMFA (Comparative Molecular Field Analysis) [10] and CoMSIA (Comparative

Molecular Similarity Indices Analysis) [11, 12] are field-based or grid-based methods

where all the compounds in the data set are aligned on top of one another and steric and
electrostatic descriptors are calculated at each grid point using a probe atom. As a result

there are many more descriptors than molecules, therefore a Partial Least Squares (PLS)
data analysis is used to generate linear equations. A study by Weaver and co-workers

compares different field-based methods for QSAR, including CoMFA and CoMSIA, finding

that field-based methods provide a robust tool to aid medicinal chemists [87]. Absent
from the traditional MFA approaches are quantum mechanically derived descriptors of

electronic structure. QMQSAR is a relatively new technique where semiempirical QM

methods are used to develop quantum molecular field-based QSAR models [88]. Placing
the aligned training set ligands into a finely spaced grid produces quantum molecular

fields, where each ligand is characterized by a set of Probe Interaction Energy (PIE)
values. A PIE is defined as the "electrostatic potential energy obtained by placing a

positively charged carbon's 2s orbital at a given grid point gi and summing the attractive
and repulsive potentials experienced by that electron as it interacts with the field of the
ligand L":

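$$\mathrm{PIE}_i = \langle s_i | V(L) | s_i \rangle = \int \chi_{s_i}(1)\,\chi_{s_i}(1) \left[ \sum_{a=1}^{N} \left( \frac{z_a}{|r_1 - r_a|} - \sum_{\mu\nu \in a} P_{\mu\nu} \int \frac{\chi_\mu(2)\,\chi_\nu(2)}{|r_1 - r_2|}\, dr_2 \right) \right] dr_1 \tag{2-14}$$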









The nuclear charge $z_a$ is simply the number of valence electrons on atom $a$ and the

notation $\mu\nu \in a$ indicates the set of valence atomic orbitals centered on atom $a$. Density

matrix elements $P_{\mu\nu}$ are given by the following sum over the occupied MOs:

$$P_{\mu\nu} = 2 \sum_{k=1}^{N_{occ}} c_{\mu k} c_{\nu k} \tag{2-15}$$
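
The construction of the closed-shell density matrix from the occupied MO coefficients can be sketched as follows. The example assumes real coefficients stored as c[mu][k] (atomic orbital mu, molecular orbital k) and an orthonormal atomic orbital basis; the two-orbital coefficient matrix is a textbook-style toy and not data from this work.

```python
import math


def density_matrix(coeffs, n_occ):
    """Closed-shell density matrix: P[mu][nu] = 2 * sum over occupied k of c[mu][k] * c[nu][k]."""
    n_basis = len(coeffs)
    p = [[0.0] * n_basis for _ in range(n_basis)]
    for mu in range(n_basis):
        for nu in range(n_basis):
            p[mu][nu] = 2.0 * sum(coeffs[mu][k] * coeffs[nu][k] for k in range(n_occ))
    return p


# Toy system: two basis functions; column 0 is the bonding MO, column 1 the antibonding MO.
s = 1.0 / math.sqrt(2.0)
coeffs = [[s, s],
          [s, -s]]

p = density_matrix(coeffs, n_occ=1)        # one doubly occupied MO
n_electrons = sum(p[mu][mu] for mu in range(len(p)))
print(p, n_electrons)
```

In an orthonormal basis the trace of the density matrix recovers the number of electrons, which gives a quick sanity check on the occupied-orbital sum.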

When applied to data sets containing corticosteroids, endothelin antagonists, and

serotonin antagonists, linear regression models were produced with similar predictability

compared to various CoMFA models.

2.5.3 Spectroscopic 3-D QSAR

The Spectroscopic QSAR methods [89, 90] include EVA (Vibrational frequencies),

[91-98] EEVA (MO energies),[99-102] and CoSA (NMR chemical shifts)[103]. It is a

requirement of 3-D QSAR that all compounds which are being studied contain the same

number of descriptors. However, none of the above techniques obey this requirement. The

number of vibrational frequencies and MOs depends on the number of atoms, N, in

a molecule (3N-6 vibrational frequencies, or 3N-5 if linear), while the number of NMR chemical shifts depends

on the number of NMR-active nuclei. A solution to this problem

is to force the information onto a bound scale using a Gaussian smoothing technique,

where the upper and lower limits of this scale are consistent for all compounds in the data

set. A Gaussian kernel, $f(x)$, with a standard deviation of $\sigma$ is placed over each calculated

point $x_i$ (an EVA, EEVA, or NMR chemical shift value) as shown in Eq. 2-16. Summing the amplitudes

of the overlaid Gaussian functions at intervals $x$ along the defined range results in the

descriptors for each molecule, $f(x)$, as shown in Eq. 2-17. This process is illustrated in

figures 2-6(a) through 2-6(c).


$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(x - x_i)^2}{2\sigma^2} \right] \tag{2-16}$$

$$f(x) = \sum_{i=1}^{shifts} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(x - x_i)^2}{2\sigma^2} \right] \tag{2-17}$$
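
The projection onto a bound scale can be sketched in a few lines. The example below assumes normalized Gaussian kernels and uses the 0-200 ppm range, interval of 1, and sigma of 10 that label Fig. 2-6; the function name and the two lists of chemical shifts are hypothetical.

```python
import math


def bounded_descriptor(points, lower, upper, interval, sigma):
    """Sum normalized Gaussian kernels centered on each point and sample
    the sum at fixed intervals between lower and upper (inclusive)."""
    n_samples = int(round((upper - lower) / interval)) + 1
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    descriptor = []
    for step in range(n_samples):
        x = lower + step * interval
        amplitude = sum(norm * math.exp(-((x - p) ** 2) / (2.0 * sigma ** 2)) for p in points)
        descriptor.append(amplitude)
    return descriptor


# Hypothetical 13C shifts (ppm) for two molecules with different numbers of carbons.
shifts_a = [25.0, 60.0, 128.5]
shifts_b = [14.1, 22.7, 31.9, 72.4, 170.2]

desc_a = bounded_descriptor(shifts_a, 0.0, 200.0, 1.0, 10.0)
desc_b = bounded_descriptor(shifts_b, 0.0, 200.0, 1.0, 10.0)
print(len(desc_a), len(desc_b))
```

Both molecules end up with descriptor vectors of the same length even though they have different numbers of shifts, which is the property the statistical models require.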















[Figure 2-6 panels, each plotted against ppm from 0 to 200 with interval = 1 and sigma = 10: (a) Calculated NMR C13 Chemical Shifts; (b) NMR C13 Chemical Shifts with Gaussian Kernels; (c) NMR C13 Chemical Shifts with Gaussian Kernels plus the spectrum projected onto a bound scale (BNMRS).]


Figure 2-6. NMR QSAR. Calculated NMR spectra for a steroid molecule with Gaussian
kernels placed at each shift followed by a bound scale projected from the
spectrum.



These descriptors contain a wealth of structural information when we consider the

physical basis of the methods. IR spectroscopy provides information concerning the

presence of molecular functional groups and NMR chemical shifts are highly dependent on











substituent effects in a congeneric series of compounds. MO energies give the electronic

structure of the molecule such as the HOMO-LUMO energies that play an important role

in the binding process.

The choice of theory used to calculate these descriptors depends on the number of

compounds in the dataset and the accuracy that is required; all can be calculated using

SE or ab initio methods. The QSAR results also depend on the choice of $\sigma$ and $x$ in the

above equations.

These methods have provided predictive models for a number of data sets and have

an advantage over the field-based methods because they are alignment-free, in other words

there is no need to superimpose the structures in the dataset. Asikainen and co-workers

provided a comparison of these methods in a recent paper where they studied estrogenic

activity in a series of compounds [89].

2.5.4 Quantum QSAR and Molecular Quantum Similarity

The Carbó group has been involved in the development of the field of quantum QSAR

and molecular quantum similarity since the 1980s [104]. The Quantum Similarity Measure

(QSM) between any two molecules, A and B, can be calculated using the following:


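$$ Z_{AB} = \langle \rho_A | \Omega | \rho_B \rangle = \iint \rho_A(\mathbf{r}_1)\, \Omega(\mathbf{r}_1, \mathbf{r}_2)\, \rho_B(\mathbf{r}_2)\, d\mathbf{r}_1\, d\mathbf{r}_2 \tag{2-18} $$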


where $\Omega$ is some positive definite operator (e.g. kinetic energy or Coulomb) and $\rho$ is the

electron density. The QSMs can be transformed into indices ranging between 0 and 1

using:
$$ r_{AB} = \frac{Z_{AB}}{\sqrt{Z_{AA} Z_{BB}}} \tag{2-19} $$

yielding the so-called Carbó Similarity Index (CSI). Calculating an array of QSMs or CSIs

between all molecular pairs in some data set provides descriptors for Quantum QSAR

(QQSAR) [105].
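As an illustration of Eq. 2-19, the following sketch converts a matrix of QSMs into Carbó indices. The function name `carbo_index` and the numerical values of Z are hypothetical and chosen only to make the arithmetic easy to follow; evaluating the QSM integrals themselves is outside the scope of the example.

```python
import numpy as np

def carbo_index(Z):
    """Convert a symmetric matrix of QSMs, Z[A, B], into Carbo similarity indices."""
    Z = np.asarray(Z, dtype=float)
    self_sim = np.sqrt(np.diag(Z))            # sqrt(Z_AA) for every molecule
    return Z / np.outer(self_sim, self_sim)   # r_AB = Z_AB / sqrt(Z_AA * Z_BB)

# Hypothetical QSM values for three molecules
Z = np.array([[4.0, 2.0, 1.0],
              [2.0, 9.0, 1.5],
              [1.0, 1.5, 1.0]])
R = carbo_index(Z)
```

Each row of R (or of Z) can then serve as the descriptor vector of one molecule in a QQSAR model.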

A drawback of the CoMFA-based methods is the need to superimpose the molecules

in the training set. This is no easy task due to the many degrees of freedom (both rigid









and internal motions). However, the alignment of the molecular structures in a common

3D framework provides a convenient method of determining which regions of the molecules

impact activity and which regions can be developed to create new compounds with more

favorable properties. Recently, QSMs have been used with a Lamarckian genetic algorithm

called the Quantum Similarity Superposition Algorithm (QSSA) to superimpose the

classic CoMFA data set [106]. The QSSA is performed in such a way as to maximize the

molecular similarity and does not rely on atom typing as other empirical based methods

do.

Popelier and co-workers have coupled the Atoms-In-Molecules (AIM) theory of Bader

with quantum molecular similarity to produce Quantum Topological Molecular Similarity

(QTMS) [107]. It uses the so-called Bond Critical Points (BCPs) of predefined bonds

in a series of molecules as descriptors followed by multivariate statistical analysis. The

series of compounds must have a common core for this method to remain computationally

tractable. QTMS has been used to generate models to estimate the pKa values for a set of

aliphatic carboxylic acids, anilines, and phenols [108].

2.6 Receptor Based Drug Design

The classic "Pac-Man" representation of receptor-ligand binding is shown in Fig.

2-7 where the receptor is depicted on the left subdivided into residues and on the right is

a small molecule split into fragments. Proteins can be split using standard amino acids

definitions, while ligand structures can be decomposed using functional group definitions.

The binding free energies between receptors and ligands can be calculated using classical

and quantum mechanical methods. In most cases when QM is used in SBDD a single

snapshot of this complex is taken and the interaction energy is determined. Taking

ensemble averages is expensive and time consuming. The scheme in Fig. 2-8 is a matrix

or graphical comparison between classical and quantum mechanical methods in SBDD.

This scheme is divided into three parts, first on the left is a large box which represents

a receptor made up of smaller boxes or residues, I, such as amino acids or bases. The





























Figure 2-7. The Classic "Pac-Man" Representation of Receptor-Ligand Binding. The
receptor is depicted on the left subdivided into residues and on the right is a
small molecule split into fragments.






Figure 2-8. PWD Density Matrix Representation


dark blue box represents how all the other residues in the receptor polarize that residue.

Polarization is where the charges centered on each atom are allowed to relax in the field

of all other charges. The lighter blue box symbolizes the charge transfer that can occur

between residues in a receptor.

The smaller box in the middle of the figure symbolizes a ligand and the smaller boxes

it contains are molecular fragments, J. The pink and yellow boxes can be described in a

similar fashion to the boxes of the receptor.









The largest box on the right is the complex structure. Upon binding, both the residues, I,

and the fragments, J, are allowed to relax in the presence of each other. The I

residues are transformed from dark blue to mustard while the J fragments are changed

from pink to brown. This is polarization; however, now it is caused by complex formation.

Most classical potentials cannot model these effects; however, recently there have been

some attempts to incorporate polarization into classical methods such as ff02 and AMOEBA

[109]. Conversely, QM methods include these interactions implicitly. The I-K (blue to

grey) and J-L (yellow to magenta) interactions originate from the effect binding has on the

intramolecular interaction of the receptor or ligand. These interactions can include charge

transfer and polarization. The I-J interactions (red box) are the most important. Both

methods can calculate these and they are only present in the complex structure. Classical

potentials describe the Coulombic and van der Waals interactions or electrostatic and

dispersive effects between the moieties. The QM methods go a step further to include the

other higher order effects such as polarization and charge transfer. This is where the QM

methods begin to describe the physics of the system more completely; however, this does

not necessarily mean that they are more predictive!

2.7 Semiempirical Divide-And-Conquer Approach

Very few full quantum mechanical studies of whole proteins have been published [7,

9, 110] but with the increasing speed of computers and linear scaling Divide-and-Conquer

(D&C) techniques the ability to include the whole protein is now possible [111-113].

The D&C method takes advantage of the local character of chemical interactions that

cause the magnitude of density matrix elements to decrease exponentially with distance.

Through the use of cutoffs for the Fock and density matrices and D&C techniques, the

"nearsightedness" of chemical interactions can be exploited without loss of accuracy [114].

The D&C method divides the molecular system into overlapping subsystems where each

localized Roothaan-Hall equation can be solved separately:


$$ F^{\alpha} C^{\alpha} = C^{\alpha} E^{\alpha} \tag{2-20} $$









where $F^{\alpha}$, $C^{\alpha}$, and $E^{\alpha}$ are the subsystem Fock, coefficient, and orbital energy matrices.

The overlap matrix, S, in SE methods is set equal to the identity matrix due to the

NDDO (Neglect of Diatomic Differential Overlap) approximation:


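$$ (\mu_A \nu_B | \lambda_C \sigma_D) = (\mu_A \nu_A | \lambda_C \sigma_C)\, \delta_{AB}\, \delta_{CD} \tag{2-21} $$

where $\delta_{AB}$ is the Kronecker delta function:

$$ \delta_{AB} = \begin{cases} 1 & \text{if } A = B, \\ 0 & \text{otherwise.} \end{cases} \tag{2-22} $$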

The diagonalization of the global Fock matrix is the most expensive part of a standard

SE calculation compared to the two-center two-electron integral evaluation which is the

bottleneck of ab initio methods. However, subdividing the global Fock matrix in the D&C

method replaces global diagonalization with subsystem diagonalizations, the cost of which scales

linearly with the number of subsystems, $n_{sub}(N^{\alpha})^3$. The subsystem density matrices

are used to assemble the global density matrix and the total energy is calculated using

Equation 2-11. The subsetting scheme in D&C methods is the key to its efficiency.

Usually, each subsystem comprises a core region surrounded by one or more buffer regions.

In protein systems, it has been shown that treating each amino acid as a core with a 4.5 Å/

2.0 Å buffering scheme provides a good compromise between computational efficiency and accuracy. The

D&C method is not, however, the only linear scaling SE method; other methods include

density matrix minimization [115], and the localized molecular orbital method [116].
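The scaling argument behind the D&C method can be made concrete with a few lines of arithmetic. The sketch below only counts cubic diagonalization costs; the subsystem count and basis set sizes are hypothetical round numbers, not timings from this work.

```python
def global_cost(n_basis):
    """Cost of one global diagonalization, proportional to N^3."""
    return n_basis ** 3

def dc_cost(n_subsystems, basis_per_subsystem):
    """Cost of the D&C diagonalizations, proportional to n_sub * (N_alpha)^3."""
    return n_subsystems * basis_per_subsystem ** 3

# Hypothetical protein: 200 subsystems, each with 400 basis functions
# (core plus buffers), and 8000 basis functions in total.
ratio = global_cost(8000) / dc_cost(200, 400)

# Doubling the protein size doubles the D&C cost but multiplies the global cost by 8.
dc_growth = dc_cost(400, 400) / dc_cost(200, 400)
global_growth = global_cost(16000) / global_cost(8000)
```

With these assumed numbers the global diagonalization is 40 times more expensive than the sum of the subsystem diagonalizations, and the gap widens as the system grows because the subsystem size stays fixed.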

Recently, Raha and Merz reported a SE D&C based scoring function, QMSCORE,

[117, 118] which is capable of predicting the binding free energy of protein-ligand

complexes. QMSCORE is derived using current technologies to best describe the master

equation 2-7:


$$ \Delta G_{bind} = \Delta H_b^{g} + \Delta LJ_6 + \Delta S_{solv} + \Delta S_{conf} + \Delta\Delta G_{solv} \tag{2-23} $$









The enthalpic interactions in the gas phase, $\Delta H_b^{g}$, between the protein and the ligand were

determined using semiempirical Hamiltonians such as AM1 and PM3. The attractive part

of the Lennard-Jones potential, $\Delta LJ_6$, was used to represent the dispersive interactions

neglected by SE methods. The solvent entropy, $\Delta S_{solv}$, and conformational entropy,

$\Delta S_{conf}$, were accounted for by the solvent accessible surface area (SASA) and the number

of rotatable bonds. The solvation free energy due to complexation, $\Delta\Delta G_{solv}$, was

calculated using a Poisson-Boltzmann continuum approach. QMSCORE was applied

to 165 protein-ligand complexes including HIV protease, Serine protease, FKBP, and

DHFR. Although there was a substantial increase in computational cost, it showed better

performance than other scoring functions such as Autodock, DrugScore and LigScore.

2.8 Pairwise Energy Decomposition (PWD)

QM methods are frequently used to determine the electronic energy of molecular

systems. Electronic energies are quantities that characterize the whole system and do not

provide any information regarding the key interactions taking place. Unlike a MM force

field, QM does not easily lend itself to descriptions of energetics in a pairwise fashion.

However, work first done by Fischer and Kollmar using a modified CNDO (Complete

Neglect of Differential Overlap) method partitioned the energy into mono, EA, and

bicentric terms, EAB [119].

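$$ E_{TOT} = \sum_{A=1}^{N} E_A + \sum_{A=1}^{N} \sum_{B>A}^{N} E_{AB} \tag{2-24} $$

$$ E_{TOT} = \sum_{A} E_A + \sum_{A} \sum_{B>A} \left( E'_{AB} + E''_{AB} + E^{core}_{AB} \right) \tag{2-25} $$
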
Later, Dewar and Lo decomposed the MINDO/2 Hamiltonian to study the Cope

rearrangement [120], and then Olivella and Villarrasa separated the MNDO method

to investigate the basicity of azole-based compounds [121].

Recently, Raha et al. partitioned the AM1 and PM3 Hamiltonians to study the

interactions between human Carbonic Anhydrase II, and a series of fluorine-substituted









ligands [7]. Similar to the decomposition by Fischer and Kollmar, the total energy can

be calculated by summing the mono and bicentric terms as shown in Equation 2-25.

The bicentric term is comprised of a repulsive term, $E'_{AB}$, an exchange term, $E''_{AB}$, and a

core-core repulsion term, $E^{core}_{AB}$ (Eq. 2-26).

$$ E^{core}_{AB} = \frac{Z_A Z_B}{R_{AB}} \tag{2-26} $$

The presence of the $E_A$ term in Equation 2-25 results in this formalism not being fully

pairwise. This term has a large negative energy contribution to the total energy since it

contains the one-center terms as shown in Eq. 2-27.


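$$ E_A = \frac{1}{2} \sum_{\mu \in A} \sum_{\nu \in A} P_{\mu\nu} \left\{ 2H_{\mu\nu} + \sum_{\lambda \in A} \sum_{\sigma \in A} P_{\lambda\sigma} \left[ (\mu\nu|\lambda\sigma) - \frac{1}{2} (\mu\lambda|\nu\sigma) \right] \right\} \tag{2-27} $$

$E'_{AB}$, shown in Eq. 2-28, contains all the electron repulsion, and so it is a positive

contributor to the energy which comes from the diagonal block of the Fock matrix.


$$ E'_{AB} = \sum_{\mu,\nu \in A} \sum_{\lambda,\sigma \in B} P_{\mu\nu} P_{\lambda\sigma} (\mu_A \nu_A | \lambda_B \sigma_B) \tag{2-28} $$


$E''_{AB}$, defined in Eq. 2-29, contains the exchange between atoms and is a small negative

contributor to the total energy, which stems from the off-diagonal elements of the Fock,

one-electron, and density matrices. As originally described, it contains most of the binding

energy.

$$ E''_{AB} = \sum_{\mu \in A} \sum_{\nu \in B} P_{\mu\nu} \left[ 2H_{\mu\nu} - \frac{1}{2} \sum_{\lambda \in A} \sum_{\sigma \in B} P_{\lambda\sigma} (\mu_A \lambda_A | \nu_B \sigma_B) \right] \tag{2-29} $$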


In a biological environment it is often more convenient to partition the energy

function in terms of residues or fragments such as amino acids, bases, or functional groups

rather than atoms. In the HCA II study by Raha the ligands were divided as shown in

Fig. 2-9. The above scheme can be modified to reflect these requirements where the total

energy can be broken down into intra- and inter-residue terms in Eq. 2-30, where $E^{self}_{I}$ and

$E^{cross}_{IJ}$ are outlined in Eq. 2-31, with A, B ∈ I denoting that atoms A and B are members of









residue I.


Figure 2-9. Schematic Diagram of the human Carbonic Anhydrase II inhibitor
Fragmentation. The structure in blue is the sulfonamide moiety. The amide
group is colored green while the fluoro-substituted phenyl group is shown in
red.



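$$ E_{TOT} = \sum_{I} E^{self}_{I} + \sum_{I} \sum_{J>I} E^{cross}_{IJ} \tag{2-30} $$

$$ E^{self}_{I} = \sum_{A \in I} E_A + \sum_{A \in I} \sum_{B>A,\, B \in I} \left( E'_{AB} + E''_{AB} + E^{core}_{AB} \right), \qquad E^{cross}_{IJ} = \sum_{A \in I} \sum_{B \in J} \left( E'_{AB} + E''_{AB} + E^{core}_{AB} \right) \tag{2-31} $$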
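The regrouping of atomic terms into intra- and inter-residue terms can be sketched as follows. This is an illustrative example only: the function name `residue_energies`, the array layout, and the energy values are assumptions, and the bicentric matrix is taken to hold the already summed bicentric contribution for each atom pair.

```python
import numpy as np

def residue_energies(E_mono, E_pair, residue_of):
    """Group atomic terms by residue.

    E_mono[A]     monocentric energy of atom A
    E_pair[A, B]  symmetric matrix of total bicentric energies for atoms A and B
    residue_of[A] index of the residue that contains atom A
    """
    E_mono = np.asarray(E_mono, dtype=float)
    E_pair = np.asarray(E_pair, dtype=float)
    residue_of = np.asarray(residue_of)
    n_res = int(residue_of.max()) + 1
    intra = np.zeros(n_res)
    inter = np.zeros((n_res, n_res))
    n_atoms = len(E_mono)
    for A in range(n_atoms):
        intra[residue_of[A]] += E_mono[A]          # monocentric terms stay in the residue
        for B in range(A + 1, n_atoms):            # each atom pair is counted once
            I, J = residue_of[A], residue_of[B]
            if I == J:
                intra[I] += E_pair[A, B]
            else:
                inter[I, J] += E_pair[A, B]
                inter[J, I] += E_pair[A, B]
    return intra, inter

# Hypothetical four-atom system split into two residues (energies in eV)
E_mono = [-10.0, -12.0, -8.0, -9.0]
E_pair = np.zeros((4, 4))
for (A, B), value in {(0, 1): -2.0, (0, 2): -0.5, (0, 3): 0.1,
                      (1, 2): -0.3, (1, 3): 0.2, (2, 3): -1.5}.items():
    E_pair[A, B] = E_pair[B, A] = value
intra, inter = residue_energies(E_mono, E_pair, [0, 0, 1, 1])
total = intra.sum() + np.triu(inter, k=1).sum()
```

The total recovered from the residue terms equals the total from the atomic terms, so the regrouping changes only how the energy is reported, not its value.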


2.9 Quantum Mechanical Charge Models

The point charges used in classical methods are derived from quantum mechanics

using various approaches [122]. The Mulliken charges are the simplest (Eq. 2-32) where P

and S are the density and overlap matrices.


$$ q_i = Z_i - \sum_{\nu \in i} (PS)_{\nu\nu} \tag{2-32} $$
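A Mulliken population analysis amounts to a few matrix operations. The sketch below is a minimal illustration: the function name `mulliken_charges` and the orbital-to-atom map are assumptions, and the H2 example uses a hypothetical overlap value of 0.66 in a minimal basis. In an NDDO calculation S is the identity matrix, so the product PS reduces to P.

```python
import numpy as np

def mulliken_charges(P, S, atom_of_orbital, Z):
    """q_i = Z_i minus the sum of (PS) diagonal elements over orbitals on atom i."""
    PS = np.asarray(P, dtype=float) @ np.asarray(S, dtype=float)
    populations = np.zeros(len(Z))
    # Accumulate each orbital's gross population onto the atom that carries it
    np.add.at(populations, atom_of_orbital, np.diag(PS))
    return np.asarray(Z, dtype=float) - populations

# H2 in a minimal basis: one 1s function per atom, doubly occupied bonding orbital
s = 0.66                                   # hypothetical overlap integral
S = np.array([[1.0, s], [s, 1.0]])
P = np.full((2, 2), 1.0 / (1.0 + s))       # P = 2 c c^T with c = 1 / sqrt(2 (1 + s))
q = mulliken_charges(P, S, atom_of_orbital=[0, 1], Z=[1.0, 1.0])
```

By symmetry both hydrogen atoms carry zero charge, which gives a quick sanity check on the bookkeeping.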
The Mulliken charges are rarely used due to systematic errors and failure to reproduce

experimental properties. Cramer and Truhlar [123-125] have parameterized methods

called Charge Model 1 (CM1) and Charge Model 2 (CM2) which are based on Mulliken

charges but include corrections for errors due to sensitivity to one-electron basis sets and

levels of theory used. The CM1 functional form is shown in Eq. 2-33 where $f^{CM1}$ is a









parameterized function and $B_{AC}$ is the bond order between atoms A and C.

$$ q_A^{CM1} = q_A^{Mulliken} + \sum_{C \neq A} f^{CM1}(B_{AC}) \tag{2-33} $$

The atomic point charges used in the AMBER FF are derived using the Merz-Singh-Kollman

(MK) [126] and Restrained ElectroStatic Potential (RESP) [127-129] schemes.

The MK charges are generated from QM to reproduce the electrostatic potential at points

around the molecule. The fitting procedure begins by giving each atom a van der Waals

radius and then forming a grid encompassing the molecule. Charges are then generated

using grid points not within the van der Waals volume. The MK procedure can often

lead to asymmetrical or large charges being placed on atoms which leads to problems

simulating biomolecular interactions. The RESP scheme solves these problems.

2.10 Comparative Binding Energy Analysis (COMBINE)

COMBINE was first developed by Wade and Ortiz in 1995 where they modeled

the binding free energies of a series of ligands to a receptor using a MM potential and

PLS [8, 130]. The COMBINE method has been successfully applied to protein-ligand

[131-134], RNA-ligand [135], protein-DNA [136], and protein-peptide [137] complexes

where predictive QSAR models have been generated. COMBINE was also used in flexible

virtual screening application of Factor Xa inhibitors by Murcia [138].

The enthalpic interaction between the receptor and the ligand is approximated using

a MM function and solvation interaction with Poisson-Boltzmann or Generalized-Born

methods. The Hamiltonian, $\Delta U$, in MM methods for the binding of a ligand to a

receptor can be described by Eq. 2-34 where $u^{VDW}_{ij}$ and $u^{ELE}_{ij}$ are the van der Waals

and electrostatic interactions between atom i of the ligand and atom j of the receptor.

The $\Delta u^{B}$, $\Delta u^{A}$, and $\Delta u^{T}$ terms are the changes in bond lengths, angles and dihedrals and $\Delta u^{NB}$ are the new

intramolecular non-bonded interactions upon binding. The premise of the COMBINE

approach is that $\Delta U$ can be approximated by a weighted linear combination of the most

important energetic interactions, $u_i$, between the ligand and the receptor as shown in Eq.










2-35. It has been shown using experimental approaches such as thermodynamic double

mutant cycles and pH titration that only a small number of sites of the receptor play a

role in binding. On this basis, the approximation central to COMBINE can be considered

reasonable.

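$$ \Delta U = \sum_{i=1}^{n_l} \sum_{j=1}^{n_r} u^{VDW}_{ij} + \sum_{i=1}^{n_l} \sum_{j=1}^{n_r} u^{ELE}_{ij} + \sum_{i=1}^{n_l} \left( \Delta u^{B,L}_{i} + \Delta u^{A,L}_{i} + \Delta u^{T,L}_{i} \right) + \sum_{i<i'} \Delta u^{NB,L}_{ii'} + \sum_{j=1}^{n_r} \left( \Delta u^{B,R}_{j} + \Delta u^{A,R}_{j} + \Delta u^{T,R}_{j} \right) + \sum_{j<j'} \Delta u^{NB,R}_{jj'} \tag{2-34} $$

$$ \Delta U \approx \sum_{i=1}^{n} w_i u_i \tag{2-35} $$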
The challenge in the COMBINE approach is to obtain the n important interactions

and the coefficients, $w_i$. This is accomplished using variable selection techniques and

multivariate statistics or a genetic algorithm [139]. Variable selection methods such as

D-optimal selection and fractional factorial design have been used to reduce $\Delta U$ to the

so-called effective potential function, U'. The $\Delta G$ of binding can then be approximated

using Eq. 2-36 where the coefficients and the regression constant, C are evaluated using

multivariate statistical methods. The constant, C, contains information common to all

compounds in the series of ligands plus the terms that are neglected in the equation such

as entropy.

$$ \Delta G \approx \sum_{i=1}^{n} w_i u_i + C \tag{2-36} $$
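The regression step behind Eq. 2-36 can be sketched with synthetic data. Note the substitution: COMBINE uses PLS and variable selection, whereas this minimal example uses ordinary least squares from NumPy simply to show how the weights and the constant C are recovered once the important interaction terms have been chosen. The function name `fit_linear_model` and all numbers are hypothetical.

```python
import numpy as np

def fit_linear_model(U, dG):
    """Least-squares estimate of the weights w_i and constant C in dG ~ U @ w + C."""
    U = np.asarray(U, dtype=float)
    design = np.hstack([U, np.ones((U.shape[0], 1))])   # last column carries C
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(dG, dtype=float), rcond=None)
    return coeffs[:-1], coeffs[-1]

rng = np.random.default_rng(0)
U = rng.normal(size=(20, 3))              # synthetic: 20 ligands, 3 selected interaction terms
true_w = np.array([0.5, -1.2, 0.3])       # hypothetical weights
true_C = -4.0                             # hypothetical regression constant
dG = U @ true_w + true_C                  # noise-free synthetic binding free energies
w, C = fit_linear_model(U, dG)
```

With real data the number of interaction terms usually exceeds the number of ligands and the terms are strongly correlated, which is why PLS is preferred over ordinary least squares in practice.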

2.11 SemiEmpirical Comparative Binding Energy Analysis (SE-COMBINE)

The SemiEmpirical Comparative Binding Energy Analysis (SE-COMBINE) [9, 140]

approach was developed as a direct extension of the COMBINE and PWD approaches

previously described. The interaction energy in SE methods was decomposed resulting in

Eq. 2-37 with the descriptor table shown in Fig. 2-10.









The SE-COMBINE method was used to elucidate the most important interactions

between trypsin and a series of inhibitors. Protein-ligand interaction energies are

decomposed to find the most or least stabilizing interactions as well as provide a means

to identify regions of significant variation (thereby targeting areas that could benefit from

more optimization). The multivariate statistical tools, PCA and PLS, were used to mine

the interactions between the receptor residues and the ligand fragments to generate QSAR

models. The fragmentation scheme used in this study is shown in Fig. 2-11 where each

ligand contains the 3-amidino-phenylalanine moiety. The authors introduced so-called

IMMs (InterMolecular interaction Maps), an example of which is given in Fig. 2-12, which

enable the researcher to graphically view where a candidate drug could be modified or

optimized.

EINT = EJ (EA EB EAB + EAB + Ee) + A I,BcJ

EI EK
J EL
I (EA (AEA + EB

Ej (EA (AEA + LB


Mol Act IJ-E IJ-E IJ-E: J IK-E IK-E JL-E JL-E I-E I-E I-EI J-E J-E J-E"

2



Figure 2-10. SE-COMBINE Descriptor Table.


2.12 Graph Theory

Many problems in cheminformatics such as finding the shortest path from one

atom to another, ring and substructure searching are solved using graph theory and

recursive algorithms [40]. Graph theory is a well-established area and is commonly used in

computer networking [141]. A graph G consists of a set of n vertices, V, and a set of m

























Figure 2-11. Schematic Diagram of a Trypsin Inhibitor Fragmentation. The structure in
blue is the 3-amidino-phenylalanine moiety (APM). The TOS group is
colored green while the PIP group is shown in red.


edges, E, where an edge is an unordered pair of vertices. V = {v1, v2, v3, v4, ..., vn},

E = {e12, e23, e34, e37, ...}, and G = {V, E}. The order and size of a graph are the

number of vertices, n, and the number of edges, m, respectively. The degree of a vertex

v of G is the number of edges incident upon v. Connected graphs contain a route from

every vertex to every other. Multigraphs (multiple bond containing molecules) are graphs

which contain repeated edges between vertices while a simple graph does not contain any.

A directed graph, or digraph, is a graph with directions assigned to each edge. Complete

graphs are denoted by Kn and are graphs where an edge connects every pair of vertices. A

labeled graph is one where the vertices and/ or edges are given labels. A weighted graph is

a type of labeled graph where the labels are real numbers.

A walk in G is a sequence of vertices w = [v1, v2, ..., vk], k ≥ 1, such that [vj, vj+1] ∈ E

for j = 1, ..., k − 1. The walk is closed if k > 1 and vk = v1, and open if they are different.

A walk is called a path if there are no repeated vertices. A closed walk with no repeated

vertices other than its first and last one is called a cycle. The length l of a walk is the

number of edges it contains (open walks: l = s − 1, closed walks: l = s, where s is the

number of vertices visited). The terms path and chain describe an open walk and a walk























[Figure 2-12 image: map of descriptor values; x-axis: Descriptor; y-axis: compound number; legend from 0.0 to -1.0]


Figure 2-12. Model Lig3C Intermolecular Interaction Map (IMM) of the important EAB
descriptors. The key residues of trypsin that interact with the triple fragment
ligand (APM, TOS, and PIP, see Fig. 2-11) label the x-axis. The compounds
on the y-axis are ordered with respect to activity. The activity decreases from
top to bottom. The legend indicates the magnitude of the unscaled descriptor
in eV.









[Figure 2-13 panels: (a) Sample Graph; (b) Molecular Graph]

Figure 2-13. Graphs to Link the Terminology used in Graph Theory and Chemistry.


in which all vertices (and edges) are distinct, respectively. Cycles and paths of size n are

denoted by Cn and Pn, respectively. A block is a group of vertices such that all edges

between them are involved in one or more cycles. An open acyclic vertex is a vertex that

is not located between two blocks while a closed acyclic vertex is located between two

blocks.

The graph in Fig. 2-13(a) contains a cycle, R = {RV, RE}, where RV = {v7, v8, v9}

and RE = {e78, e89, e97}, of type C3. R is a subgraph of G where the vertices and edges of

R are subsets of those of G, or in other words R is isomorphic to a subgraph of G. Conversely, G is a

supergraph of R. Determining whether a graph G1 is isomorphic to a subgraph of G2 is

known as the subgraph isomorphism problem, which is NP-complete (Non-deterministic

Polynomial time). The term clique is used for a set of vertices where an edge exists

between each pair. A clique is a subgraph of G and itself is a complete graph. A k-clique

is a clique of order k. Clique detection or maximum common subgraph isomorphism

is a method to find the largest subgraph of G1 isomorphic to a subgraph of G2. The

subgraph isomorphism and maximum subgraph isomorphism problems are known in

cheminformatics as substructure searching and pharmacophore mapping. A molecule can

be represented as a graph where the atoms are the vertices and bonds are the edges as in

Fig. 2-13(b). This is a labeled or colored graph, in other words each vertex is labeled with

the element type and each edge is colored with the bond order. The similarity between

the two structures (graph and molecule) can be seen in Figures 2-13(a) and 2-13(b). A

dictionary of terms is compiled in Table 2-1.
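The correspondence between graph and molecule can be shown with a small labeled graph. The sketch below is illustrative only: the atom numbering, the dictionary layout, and the helper `build_graph` are assumptions, and the molecule is the heavy-atom skeleton of methylcyclopropane (a three-membered ring with one substituent).

```python
from collections import defaultdict

def build_graph(edges):
    """Return an adjacency map {vertex: set of neighbors} for an undirected graph."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

# Vertices are labeled with element types, edges with bond orders
atoms = {1: "C", 2: "C", 3: "C", 4: "C"}
bonds = {(1, 2): 1, (2, 3): 1, (1, 3): 1, (1, 4): 1}

graph = build_graph(bonds)                        # iterating a dict yields its keys
order = len(atoms)                                # graph order = number of atoms
size = len(bonds)                                 # graph size = number of bonds
degree = {v: len(graph[v]) for v in atoms}        # vertex degree = number of bonded atoms
leaves = [v for v in atoms if degree[v] == 1]     # leaf vertices = terminal atoms
```

Storing the labels in separate dictionaries keeps the connectivity (the graph proper) independent of the chemistry, so the same traversal routines can be reused for any labeling.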









Table 2-1. Correspondence between Graph Theory and


Graph Theory                             Chemistry
Connected Graph                          Molecule
Graph Order                              Number of Atoms
Graph Size                               Number of Bonds
Vertex Degree                            Number of Bonded Atoms
Leaf Vertex                              Terminal Atom
Closed Path / Cycle                      Ring
Cycle Type                               Ring Size
Chain                                    Chain
Block                                    Cluster of Rings
Subgraph Isomorphism Problem             Substructure Searching
Maximum Common Subgraph Isomorphism      Pharmacophore Mapping


A tree, T = (TV, TE), is a connected acyclic graph. Trees contain leaves, which are

vertices of degree 1, and non-leaf vertices. A root is a vertex where all edges point away

from it. A forest is a set of disjoint trees while a "k-ary tree is a rooted tree in which

every vertex has k children". Trees are often used in conformational searching and other

combinatorial problems.

A matching or edge independent set, M, of G is a subset of the edges, such that no

two edges in M share a vertex. There are three types of matching called maximum,

maximal, and perfect. A maximum matching is a matching of highest cardinality.

Maximal matching is a matching where no other edges can be added, while a perfect

matching contains all vertices of the graph. A matching is maximum if and only if it has

no augmenting path. An augmenting path is an alternating path which starts and ends

with free or unmatched vertices. An alternating path is a path whose

edges are alternately in M and not in M. For molecular graphs the maximum weighted

matching algorithm is a technique of assigning double and triple bonds and corresponds to

maximizing the number of double bonds in a pi system [142].
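The difference between maximal and perfect matchings can be checked directly on the benzene ring. The helper names below are hypothetical, and the sketch only tests given edge sets; it does not implement the maximum weighted matching algorithm of reference [142].

```python
def is_matching(edges):
    """True if no two edges share a vertex."""
    seen = set()
    for a, b in edges:
        if a in seen or b in seen:
            return False
        seen.update((a, b))
    return True

def is_maximal(matching, all_edges):
    """True if no further edge of the graph can be added to the matching."""
    covered = {v for edge in matching for v in edge}
    return is_matching(matching) and all(a in covered or b in covered
                                         for a, b in all_edges)

def is_perfect(matching, vertices):
    """True if every vertex of the graph is covered by the matching."""
    covered = {v for edge in matching for v in edge}
    return is_matching(matching) and covered == set(vertices)

# Benzene carbon skeleton as a six-membered cycle
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
kekule = [(0, 1), (2, 3), (4, 5)]     # one Kekule structure: three double bonds
partial = [(0, 1), (3, 4)]            # cannot be extended, yet leaves atoms 2 and 5 free
```

The `kekule` set is both maximal and perfect, whereas `partial` is maximal but not perfect, which is why a maximum (rather than merely maximal) matching is needed when assigning double bonds.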

There are various ways of traversing or searching a graph. One such technique is the

depth-first search. This is implemented as a recursive routine and tracks which vertex and

edge are encountered thus only visiting each once.
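A recursive depth-first search can be sketched as follows. This is a minimal illustration for simple graphs, not the routine implemented in this work: the function name `depth_first_search` and the adjacency map (the heavy-atom skeleton of methylcyclopropane) are assumptions. Edges that lead back to an already visited vertex are recorded as ring closures.

```python
def depth_first_search(adjacency, start):
    """Visit every reachable vertex once; return visit order and ring-closure edges."""
    visited, order, ring_closures = set(), [], []

    def visit(vertex, parent):
        visited.add(vertex)
        order.append(vertex)
        for neighbor in sorted(adjacency[vertex]):
            if neighbor == parent:
                continue                      # do not walk back along the edge just used
            if neighbor in visited:
                closure = tuple(sorted((vertex, neighbor)))
                if closure not in ring_closures:
                    ring_closures.append(closure)
            else:
                visit(neighbor, vertex)

    visit(start, None)
    return order, ring_closures

# Hypothetical adjacency map: ring 1-2-3 with substituent 4 on vertex 1
adjacency = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
order, ring_closures = depth_first_search(adjacency, start=1)
```

Because Python limits recursion depth, an explicit stack would be a safer choice for very large graphs; the recursive form is kept here because it mirrors the description in the text.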


Chemical Terminology.









[Figure 2-14 panels: (a) Molecular Graph; (b) Graph; (c) Maximum Matching; (d) Maximal Matching; (e) Kekule Structures of Benzene]

Figure 2-14. Graphs to Link the Terminology used in Graph Theory and Chemistry.


2.13 Statistical Methods

All RBDD and LBDD methods use statistical methods to correlate with or predict

binding free energies. Common statistical measures or tools include mean (Eq. 2-38),

centering (Eq. 2-39), sample variance (Eq. 2-40), standard deviation ($\sigma = \sqrt{\sigma^2}$), Z-score (Eq.

2-41), and covariance (Eq. 2-42) where Ei is the observable quantity for element i and N

is the total number of observables.
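$$ \langle E \rangle = \frac{1}{N} \sum_{i=1}^{N} E_i \tag{2-38} $$

$$ \tilde{E}_i = E_i - \langle E \rangle \tag{2-39} $$

$$ \sigma^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left( E_i - \langle E \rangle \right)^2 \tag{2-40} $$

$$ Z_i = \frac{E_i - \langle E \rangle}{\sigma} \tag{2-41} $$

$$ Cov(E, E') = \frac{1}{N-1} \sum_{i=1}^{N} \left( E_i - \langle E \rangle \right) \left( E'_i - \langle E' \rangle \right) \tag{2-42} $$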
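These measures translate directly into NumPy. The sketch below uses two small hypothetical data vectors and assumes the N − 1 normalization for both the sample variance and the covariance.

```python
import numpy as np

E = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # hypothetical observable
E2 = 2.0 * E                                # second hypothetical observable

mean = E.mean()                                             # mean
centered = E - mean                                         # centering
variance = np.sum(centered**2) / (len(E) - 1)               # sample variance
sigma = np.sqrt(variance)                                   # standard deviation
z = centered / sigma                                        # Z-score
cov = np.sum(centered * (E2 - E2.mean())) / (len(E) - 1)    # covariance
```

The Z-scored vector has zero mean and unit standard deviation, which is the property exploited when descriptors of very different magnitude are combined in one model.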

MLR is an extension of the ordinary least squares method where more than one

independent variable is used to derive the QSAR model, which takes the following form:


Y =BX + E (2-43)


where B is the matrix of regression coefficients and E are the residuals. The quality of the

fit can be assessed using Pearson's correlation coefficient, r, as shown in Eq. 2-44.

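$$ r = \frac{\sum_{i=1}^{N} (x_i - \langle x \rangle)(y_i - \langle y \rangle)}{\sqrt{\sum_{i=1}^{N} (x_i - \langle x \rangle)^2 \sum_{i=1}^{N} (y_i - \langle y \rangle)^2}} \tag{2-44} $$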


The square of Pearson's correlation coefficient, $r^2$ or $R^2$, is often reported and

describes the goodness of fit. Cross-validation or jack-knifing is a technique that checks

the quality of the fit. It removes some of the dependent variables from the data set and

derives a model with the remainder. It then predicts the values of the data that have been

left out. The PRESS value (Eq. 2-45) is the residual sum of squares of the data left out

and is used in the calculation of q2 or Q2 (Eq. 2-46), or cross-validated R2, which is the

measure of predictability. Many different forms of cross-validation can be used but the

most common is the 'Leave-One-Out' or LOO scheme, where each dependent variable is

left out and predicted in turn.
$$ PRESS = \sum_{i=1}^{N} \left( y_{pred,i} - y_i \right)^2 \tag{2-45} $$

$$ Q^2 = 1 - \frac{PRESS}{\sum_{i=1}^{N} \left( y_i - \langle y \rangle \right)^2} \tag{2-46} $$
Together with the correlation coefficient, R2, and the cross-validated correlation

coefficient, Q2, the standard deviation of error of calculations, SDEC, and the standard

deviation of error prediction, SDEP are used to assess the quality of the model. SDEP

can also be defined as the root mean squared error of the dependent variables in a LOO

scheme or external data set as shown in Eq. 2-47. Similarly, SDEC is calculated for those









variables used to build the model or training set.

$$ SDEP = \sqrt{\frac{PRESS}{N}} \tag{2-47} $$
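A leave-one-out loop for an MLR model can be sketched as follows. The example is a minimal illustration on synthetic data: the helper names, the data, and the noise level are hypothetical, and the cross-validated coefficient is assumed to take the common form in which PRESS is divided by the sum of squared deviations of the observed values from their mean.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit with an intercept in the last coefficient."""
    design = np.hstack([X, np.ones((X.shape[0], 1))])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

def predict_mlr(coeffs, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ coeffs

def leave_one_out(X, y):
    """Return PRESS, Q2 and SDEP from a leave-one-out cross-validation."""
    n = len(y)
    y_pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                  # drop observation i
        coeffs = fit_mlr(X[keep], y[keep])        # refit on the remainder
        y_pred[i] = predict_mlr(coeffs, X[i:i + 1])[0]
    press = np.sum((y_pred - y) ** 2)
    q2 = 1.0 - press / np.sum((y - y.mean()) ** 2)
    sdep = np.sqrt(press / n)
    return press, q2, sdep

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 2))                                          # synthetic descriptors
y = X @ np.array([1.0, -2.0]) + 0.5 + 0.1 * rng.normal(size=15)       # synthetic activities
press, q2, sdep = leave_one_out(X, y)
```

Because each prediction is made by a model that never saw the corresponding observation, Q2 is never larger than the fitted R2 for least-squares models and can become negative for models with no predictive ability.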

Unsigned or absolute error (Eq. 2-48) and signed error (Eq. 2-49), mean squared error

(Eq. 2-50) and Root Mean Squared Deviation (RMSD, Eq. 2-51) are also often calculated

to measure the quality of the models.

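$$ ae = \frac{1}{N} \sum_{i=1}^{N} \left| E_i^{pred} - E_i \right| \tag{2-48} $$

$$ se = \frac{1}{N} \sum_{i=1}^{N} \left( E_i^{pred} - E_i \right) \tag{2-49} $$

$$ MSE = \frac{1}{N} \sum_{i=1}^{N} \left( E_i^{pred} - E_i \right)^2 \tag{2-50} $$

$$ RMSD = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( E_i^{pred} - E_i \right)^2} \tag{2-51} $$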

PCA is a method to reduce the dimensionality of the descriptor space by generating

linear combinations of the original descriptors called Principal Components (PCs) that

best describe their variance. Usually the number of PCs is smaller than the number

of descriptors and in doing so reveals the "underlying factors or combinations of the

original variables that principally determine the structure of the data distribution"

[143]. Generally in PCA the X matrix is "autoscaled" (Eq. 2-41) or in other words each

descriptor is processed to have a mean of zero and a standard deviation of 1, ensuring

certain variables do not dominate because of their magnitude. The theory of PCA

stems from the eigenanalysis of the correlation matrix, C (Eq. 2-52) where X is the

descriptor matrix. The descriptor matrix has dimensions of n x k, where n is the number

of observations and k is the number of measured variables. C is a square and symmetric

matrix and so facilitates the generation of principal components or eigenvectors, P, which

are orthonormal to each other. A schematic diagram of the matrices and vectors involved

in PCA is shown in Figure 2-15.

$$ C = X^{T}X \tag{2-52} $$









$$ C p_i = \lambda_i p_i \tag{2-53} $$


The magnitude of each eigenvalue, $\lambda_i$, represents the variance that the corresponding PC explains of the X

matrix.

$$ PC_i = p_{i1}X_1 + p_{i2}X_2 + \cdots + p_{ik}X_k \tag{2-54} $$

PCA can also be derived in terms of the original descriptor matrix, X, as shown in Eq.

2-55 where $p_i$ are the loading vectors or eigenvectors and $t_i$ are the score vectors and E is

an error term after the descriptor matrix was reduced to q principal components.


$$ X = t_1 p_1^{T} + t_2 p_2^{T} + \cdots + t_q p_q^{T} + E \tag{2-55} $$


$$ t_i = X p_i \tag{2-56} $$

The score vectors, ti, determined from the X matrix and pi, are the new variables of

the reduced data set and describe how the different dependent variables relate to one

another while the loadings, pi, reveal which descriptors are responsible. It is common

to analyze the results of PCA using scatter plots of the scores and loadings. The

similarity/dissimilarity between dependent variables is investigated using score plots.

Usually, clustering of the data occurs with very similar data points grouping together and

dissimilar ones being further apart. The loadings plot is used in conjunction with the score

plot to determine the reasons why such clustering exists and decipher which of the original

X variables are causing it.
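The eigenanalysis route to PCA can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data: the function name `pca` and the data are hypothetical, and the matrix product is divided by n − 1 so that it is a true correlation matrix, which rescales the eigenvalues but leaves the eigenvectors unchanged.

```python
import numpy as np

def pca(X, n_components):
    """PCA by eigenanalysis of the correlation matrix of autoscaled descriptors."""
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscale: mean 0, std 1
    C = Xs.T @ Xs / (X.shape[0] - 1)                     # square, symmetric matrix
    eigenvalues, eigenvectors = np.linalg.eigh(C)        # returned in ascending order
    keep = np.argsort(eigenvalues)[::-1][:n_components]  # largest eigenvalues first
    loadings = eigenvectors[:, keep]                     # loading vectors p_i
    scores = Xs @ loadings                               # score vectors t_i = X p_i
    explained = eigenvalues[keep] / eigenvalues.sum()    # fraction of variance per PC
    return scores, loadings, explained

rng = np.random.default_rng(0)
base = rng.normal(size=(30, 2))
X = np.column_stack([base[:, 0],
                     base[:, 0] + 0.1 * rng.normal(size=30),   # nearly duplicates column 0
                     base[:, 1],
                     rng.normal(size=30)])
scores, loadings, explained = pca(X, n_components=2)
```

Plotting the first column of `scores` against the second gives the score plot, and plotting the rows of `loadings` gives the corresponding loadings plot.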

Partial Least Squares (PLS) is a technique similar to Principal Component Regression (PCR); however, it is derived in a

way that the X scores, $t_i$, explain the variation in X and correlate with Y simultaneously.

PLS regression (PLSR) transforms the X matrix into orthogonal components, so-called latent variables, and

then performs a regression step that predicts Y. NIPALS, SIMPLS, and the kernel method

are algorithms for calculating a PLS model; the basic algorithm was developed by Wold

et al. and is outlined below and the matrices and vectors are shown in Fig. 2-16 [14, 144].

The algorithm begins by setting the vector u to one of the columns of Y, which allows the















Figure 2-15. Principal Component Analysis (PCA) Schematic Diagram of the Matrices
and Vectors Involved.


calculation of the X-weights, w:

$$ w = \frac{X^{T}u}{u^{T}u} \tag{2-57} $$

The weights are normalized ($w^{T}w = 1$):

$$ w = \frac{w}{\sqrt{w^{T}w}} \tag{2-58} $$

From the normalized weights, the X-scores, t, are calculated:

$$ t = \frac{Xw}{w^{T}w} \tag{2-59} $$

From the calculated X-score the Y-weights, c, and Y-scores, u, are determined:

$$ c = \frac{Y^{T}t}{t^{T}t} \tag{2-60} $$

$$ u = \frac{Yc}{c^{T}c} \tag{2-61} $$

This method is an iterative process, which is tested on the change in t. The convergence

criterion, $\epsilon$, is normally set in the range of $10^{-6}$ to $10^{-8}$. If convergence has not been

reached the algorithm returns to the calculation of the X-weights. Note that if there is

only a single Y variable then the algorithm converges in a single step.

$$ \frac{\lVert t_{old} - t_{new} \rVert}{\lVert t_{new} \rVert} < \epsilon \tag{2-62} $$

Once convergence is reached the X- and Y- loadings, p and q, are calculated:

$$ p = \frac{X^{T}t}{t^{T}t} \tag{2-63} $$

$$ q = \frac{Y^{T}u}{u^{T}u} \tag{2-64} $$

To proceed to the next latent variable the X matrix (Y is optional) must be "deflated",

in other words, the current latent variable's information must be removed as shown in Eq.

2-65 and 2-66. The total number of latent variables to consider is generally determined

using cross-validation.

$$ X = X - t p^{T} \tag{2-65} $$

$$ Y = Y - t c^{T} \tag{2-66} $$

Similar to MLR (Y = BX + E), the full PLSR solution can be written in terms of a

regression coefficient matrix, B, as shown in Eq. 2-67.


$$ B = W(P^{T}W)^{-1}C^{T} \tag{2-67} $$
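The iterative algorithm outlined above can be sketched with NumPy. This is a minimal illustration, not the implementation used in this work: the function name `nipals_pls`, the mean-centering of X and Y, the iteration cap, and the synthetic data are assumptions. The code stores observations as rows, so predictions are formed as X times B.

```python
import numpy as np

def nipals_pls(X, Y, n_components, tol=1e-8, max_iter=500):
    """NIPALS PLS; returns B such that Y is approximated by (X - x_mean) @ B + y_mean."""
    X = np.array(X, dtype=float)
    Y = np.array(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    X, Y = X - x_mean, Y - y_mean                 # assumed preprocessing: mean-centering
    W, P, C = [], [], []
    for _ in range(n_components):
        u = Y[:, [0]]                             # start from one column of Y
        t_old = np.zeros((X.shape[0], 1))
        for _ in range(max_iter):
            w = X.T @ u / (u.T @ u)               # X-weights
            w /= np.sqrt(w.T @ w)                 # normalize so that w'w = 1
            t = X @ w                             # X-scores (w'w = 1)
            c = Y.T @ t / (t.T @ t)               # Y-weights
            u = Y @ c / (c.T @ c)                 # Y-scores
            if np.linalg.norm(t_old - t) / np.linalg.norm(t) < tol:
                break
            t_old = t
        p = X.T @ t / (t.T @ t)                   # X-loadings
        X = X - t @ p.T                           # deflate X
        Y = Y - t @ c.T                           # deflate Y
        W.append(w); P.append(p); C.append(c)
    W, P, C = (np.hstack(m) for m in (W, P, C))
    B = W @ np.linalg.inv(P.T @ W) @ C.T          # regression coefficient matrix
    return B, x_mean, y_mean

rng = np.random.default_rng(1)
X = rng.normal(size=(25, 3))                      # synthetic descriptors
B_true = np.array([[1.0], [-0.5], [2.0]])         # hypothetical coefficients
Y = X @ B_true + 3.0                              # noise-free synthetic response
B, x_mean, y_mean = nipals_pls(X, Y, n_components=3)
Y_hat = (X - x_mean) @ B + y_mean
```

When all three latent variables are retained the PLS solution coincides with the least-squares solution, so the hypothetical coefficients are recovered; in practice fewer latent variables are kept, with the number chosen by cross-validation.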


The kernel method is an alternative to the above algorithm. Similar to PCA, this uses eigenvalue-eigenvector equations to come to the same result. For example, the X-weights are determined by taking the first eigenvector of the variance-covariance matrix $X^{T}YY^{T}X$. Similar to the analysis of a PCA model, the interpretation of a PLSR model involves the score and loading plots; however, in PLSR the score and loading plots involve the X and Y scores and weights.
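The agreement between the kernel and iterative routes can be checked by hand for the special case of a single response column y. The derivation below is a sketch under that assumption; the symbol a is introduced here only for illustration.

\[ X^{T} y\, y^{T} X = a\,a^{T}, \qquad a = X^{T} y \]

\[ (a\,a^{T})\,a = (a^{T}a)\,a \quad\Rightarrow\quad w = \frac{a}{\sqrt{a^{T}a}} \]

Because $a\,a^{T}$ has rank one, a is its only eigenvector with a nonzero eigenvalue, so the kernel X-weights coincide with those obtained from Eq. 2-57 and Eq. 2-58 when u is set to y.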

Figure 2-16. Partial Least Squares (PLS) Schematic Diagram of the Matrices and Vectors Involved.

2.14 Metalloproteins

Metalloproteins are a key subset of proteins in the body which bind a transition metal. The metal ion acts as a Lewis acid (electron pair acceptor) towards amino acids or other molecules which are Lewis bases (possess one or more lone pairs) and are called ligands. The bonding of ligands to a metal ion is described as dative when the ligand donates one or more lone pairs to the metal. Metals can be coordinated to any number of ligands, with four, five and six being the most common in biological systems. Thus the most likely geometries include tetrahedral, square planar, trigonal bipyramidal, and octahedral.

The amino acids that most commonly bind to a metal ion in metalloproteins are shown in Fig. 2-17. The side chain of each amino acid is labeled with Greek letters and these are used when referring to which atom of an amino acid bonds to the metal, e.g. Zn-CYS@SG means that the gamma Sulfur of a cysteine residue is bound to a Zinc ion.

Iron, Copper, and Zinc are the most abundant transition metals in the human body. Metal ions in biological systems have both structural and functional roles. They are termed structural when no chemical reaction takes place at the metal site and the metal instead aids in the stabilization of the protein structure, whereas functional metalloproteins carry out chemical reactions.











Figure 2-17. Most Common Amino Acid Residues which Bond to Metal Ions. (a) Cysteine (CYS); (b) Methionine (MET); (c) Aspartic Acid (ASP); (d) Glutamic Acid (GLU); (e) Histidine (HIS).


Zinc proteins are both structural and catalytic. Zinc acts as a superacid and promotes the hydrolysis or cleavage of chemical bonds. For example, Human Carbonic Anhydrase II (HCA II) (Fig. 2-18(a)) catalyzes the conversion of CO2 into bicarbonate or vice versa. HCA II contains a tetrahedral zinc at its active site as shown in Fig. 2-18(b). The Zinc atom is bound to three histidine residues and a water molecule (pH < 7) or a hydroxide ion. Farnesyl Transferase (FTase) is a zinc metalloenzyme that removes the diphosphate group from the farnesyl diphosphate substrate and connects the resulting farnesyl moiety to the cysteine. The full protein structure and active site of FTase (PDB ID: 1QBQ) are shown in Figs. 2-18(c) and 2-18(d). Other Zn metalloproteins include carboxypeptidase, which cleaves the C-terminal residue from peptides, and alcohol dehydrogenase, which converts alcohol to acetaldehyde.

Metalloproteins that contain Copper are also both structural and functional. Copper

can change oxidation state and is often involved in electron-transfer reactions. Human

Antioxidant protein (HAH1) contains a tetrahedral Cu(I) bound by four cysteine residues

as shown in Fig. 2-19(b). HAH1 is involved in the transporting of Copper in the body and

is labeled a chaperone. Amicyanin is a tetrahedral Cu(II) containing protein which binds

two histidines, a methionine, and a cysteine residue as shown in Fig. 2-19(d). This protein

is called a blue copper protein due to its spectroscopic properties arising from cysteine to

Cu(II) charge-transfer [145-150].

Metalloproteins can also contain multiple metals in close proximity. For example, Aminopeptidase from Aeromonas proteolytica (AAP), shown in Fig. 2-20(a), is a di-zinc protein which catalytically cleaves the N-terminus of polypeptides. The active site of Aminopeptidase is shown in Fig. 2-20(b), where the zinc ions are bound to histidine and aspartic acid residues and are bridged by a water molecule. Urease from Bacillus pasteurii (Fig. 2-20(c)) is a di-nickel enzyme that catalyzes the hydrolysis of urea to ammonia and carbon dioxide. Its active site is shown in Fig. 2-20(d), where two nonequivalent Ni(II) atoms (3.5 Å separation) are each bound to two histidines and a bridging carbamylated lysine. An aspartate residue, two waters and a bridging water/hydroxide ion complete the coordination sphere. One Ni center can be described as square pyramidal and the other as octahedral [151-154]. Both Aminopeptidase and Urease are homo-nuclear proteins but hetero-nuclear metalloproteins also exist. Copper-Zinc Superoxide Dismutase, Cu,Zn-SOD, is one such protein; its active site is shown in Fig. 2-21.






















Figure 2-18. Zinc Metalloproteins. (a) Human Carbonic Anhydrase, HCA II (PDB ID: 1CA2); (b) 1CA2 Active Site; (c) Farnesyl Transferase, FTase (PDB ID: 1QBQ); (d) 1QBQ Active Site. The Zinc ion is shown in purple while the Oxygen atom of the water molecule in 1CA2 bound to Zinc is shown in red.



































Figure 2-19. Copper Metalloproteins. (a) Human Antioxidant Protein, HAH1 (PDB ID: 1FEE); (b) 1FEE Active Site; (c) Copper Amicyanin (PDB ID: 1AAC); (d) 1AAC Active Site. The Copper ion is shown in grey.



































Figure 2-20. Homo-Nuclear Metalloproteins. (a) Aminopeptidase (PDB ID: 1AMP); (b) 1AMP Active Site; (c) Urease (PDB ID: 2UBP); (d) 2UBP Active Site. The Zinc and Nickel ions are shown in purple and grey respectively, while Oxygen atoms of water molecules are shown in red.











































Figure 2-21. Hetero-Nuclear Metalloproteins. The Active Site of Copper-Zinc Superoxide
Dismutase, Cu,Zn-SOD, (PDB ID: 1CBJ) is shown. The Zinc and Copper
ions are shown in purple and grey respectively.









CHAPTER 3
MODELING TOOL KIT++

3.1 Introduction

In an ideal world where one could use the most accurate theories with infinite computer power and time it would be possible to design a new drug. However, in reality there is always a compromise between speed and accuracy. Figure 3-1 attempts to illustrate current research efforts, where reality is marked as an "X" and progress is being made, for example, in the levels of theory used in scoring functions for receptor-ligand interaction calculations and in conformational sampling techniques [6]. These advances have mirrored the increases in computing power over recent years.




Figure 3-1. Computational Drug Design (schematic; axis labels Time/Money and Theory, with the target marked "Drug"). In theory a drug can be designed in silico; however, in reality this has not materialized. Current research efforts have focused on pushing the boundaries of theories used in the SBDD areas with mixed results.


With the desire to use QM techniques in SBDD firmly established there comes a need to develop software where these methods can be used conveniently in the DD process. This has led to the development of a package of C++ libraries called Modeling ToolKit++ (MTK++) to interface with common QM programs in order to test the applicability of, and validate, QM methods in SBDD. MTK++ was designed from the ground up to be used in areas of in silico SBDD such as molecular alignment and receptor-ligand scoring. The use of SE methods in molecular alignment and scoring is further analyzed in Ch. 4. This toolkit was also designed with metalloproteins in mind, for which no such software was known to be available; this will be discussed in detail in Ch. 5. All too often molecular modeling software which is described as "open source" is obfuscated before release and so it becomes almost impossible to read or extend. To combat this MTK++ was developed as an in-house suite of libraries with a consistent Application Programming Interface (API) which will allow new and novel methods to be developed. This chapter describes the design and development of MTK++. The algorithms used are described in detail with numerous illustrated examples.

3.2 Overview

MTK++ is an object-oriented C++ package of molecular modeling libraries including Molecular Mechanics (MM), Genetic Algorithm (GA), file processing and conversion (Parsers), and statistical and molecular tools used in LBDD, SBDD and other computational chemistry fields. The Basic Linear Algebra Subprograms (BLAS), Linear Algebra PACKage (LAPACK), Boost, and xerces-c [155] libraries were used in the development. At the time of preparation of this thesis MTK++ contained over 30,000 lines of code. Thus, a complete description of the code cannot be given; however, all libraries and their main classes and algorithms are described.

3.2.1 Development

MTK++ is implemented as a C++ package of libraries. C++ was used instead of other programming languages such as FORTRAN and C because it is an object-oriented programming language that enables abstraction, encapsulation and inheritance. C++ provides the Standard Template Library, which has convenience classes such as vectors, maps, and lists, and external libraries such as BLAS and xerces-c are available for matrix-vector math and XML handling respectively. Also, C++ code can be compiled on nearly any operating system and allows modular programming, which makes changes easy. Furthermore, C++ is backwards compatible with C and the resulting C++ code is very efficient due to its duality as a high-level and low-level language. The development and debugging were done on a 1.33 GHz PowerPC G4 Apple computer running Mac OS X 10.4.8 with 512 MB SDRAM. The gcc (4.0.1) compiler was used. The code is cross-compiler and cross-platform compatible and was tested on both Mac OS X and Linux operating systems.

3.2.2 Library Hierarchy



Figure 3-2. Library Hierarchy as Implemented in MTK++. The Library where the tail of the arrow starts uses the library where the head of the arrow ends, e.g. The Parsers library uses the Molecule, Utils, Statistics, and GA Libraries.


Figure 3-2 shows the hierarchy of the MTK++ package. At the center of the package is a group of utility routines which are used in all other libraries. These include constant definitions, diagonalization functions, an indexing class for easy sorting of objects, and a class called vector3d for atomic coordinate storage and transformation.1 The Parsers library takes care of reading and writing of files and it requires the Molecule and GA Libraries. Also, the Molecule library uses the Graph library for ring perception and other recursive functions. The design of the individual libraries is discussed further in the sections below.

1 The vector3d class was originally developed by Andrew Wollacott [156, 157]. Extensive functionality was added.

3.2.3 Molecule Library

The core of the MTK++ package is the Molecule library and its most important classes are shown in Fig. 3-3. This library can handle multiple molecules at a time and these are stored in the collection class. The collection class also takes care of all the elements (this information only needs to be stored once, not for every molecule), and parameter and fragment information for MM calculations. The molecules themselves are of type molecule, and this class stores submolecule or residue information. This division is analogous to that of amino acids in a protein, nucleotides in DNA, or fragments in a small organic molecule. The submolecule class stores a list of atoms and the atom class stores pointers to objects such as its element and its coordinates, which are a vector of three double precision numbers (vector3d).
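The containment described above can be pictured with a small data-structure sketch. It is illustrative only and is not the MTK++ API: the type and member names below (Collection, addMolecule, atomCount and so on) are hypothetical, and only the nesting collection > molecule > submolecule > atom and the single shared copy of each element follow the text.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Vector3d { double x, y, z; };
struct Element { std::string symbol; int atomicNumber; };

struct Atom {
    std::string name;
    const Element* element;  // shared element record, owned by the collection
    Vector3d coord;
};
struct Submolecule { std::string name; int number; std::vector<Atom> atoms; };
struct Molecule { std::string name; std::vector<Submolecule> submolecules; };

class Collection {
public:
    // Returns the single stored record for a symbol, creating it on first use.
    const Element* element(const std::string& symbol, int atomicNumber) {
        std::unique_ptr<Element>& slot = elements_[symbol];
        if (!slot) slot.reset(new Element{symbol, atomicNumber});
        return slot.get();
    }
    // A deque keeps references to earlier molecules valid as more are added.
    Molecule& addMolecule(const std::string& name) {
        molecules_.push_back(Molecule{name, {}});
        return molecules_.back();
    }
    std::size_t atomCount() const {
        std::size_t n = 0;
        for (const Molecule& m : molecules_)
            for (const Submolecule& s : m.submolecules) n += s.atoms.size();
        return n;
    }
private:
    std::map<std::string, std::unique_ptr<Element>> elements_;
    std::deque<Molecule> molecules_;
};
```

Storing a pointer to the element in each atom, rather than a copy, mirrors the remark that element information only needs to be stored once.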

The parameters class stores information for MM calculations as structures, such as atom types, bond, angle, torsion, and improper parameters (force constants, equilibrium values) and non-bonded parameters (charges, Lennard-Jones values), as shown in Fig. 3-4. The stdLibrary class is the main object which deals with the storage and function of a molecular fragment library as shown in Fig. 3-5. stdLibrary stores a list of stdGroups and a stdGroup is a storage container for stdFrags. For example, a stdGroup could store the 20 amino acids of proteins, each a stdFrag, or a list of functional groups in drug design. The stdFrag class contains information about its atoms (stdAtom), bonds (stdBond), features (stdFeature), etc.

Figure 3-3. Core Class Hierarchy of the Molecule Library as Implemented in MTK++. Solid line boxes denote a class, while a dashed box signifies a structure. A class where the tail of the arrow starts uses or contains a class or structure where the head of the arrow ends, e.g. The elements class contains or uses the element structure.

Figure 3-4. Class Hierarchy of the Parameters Component of the Molecule Class as Implemented in MTK++. Solid line boxes denote a class, while a dashed box signifies a structure. A class where the tail of the arrow starts uses or contains a class or structure where the head of the arrow ends, e.g. The parameters class contains or uses the atomType structure.

Figure 3-5. Class Hierarchy of the Standard Library Component of the Molecule Class as Implemented in MTK++. Solid line boxes denote a class, while a dashed box signifies a structure. A class where the tail of the arrow starts uses or contains a class or structure where the head of the arrow ends, e.g. The stdFrag class contains or uses the stdAtom structure.

The functionality available to molecules such as proteins, DNA, and small organic molecules originates from the molecule class in the Molecule library as shown in Fig. 3-8. molecule stores lists of bonds (a vector of Bond objects in C++), angles, torsions, and impropers. The connectivity information is determined in the connections class. This class can perceive bonds using distance and other geometric information, and also determine bonds through user defined databases of molecular structures (this is discussed further below in section 3.2.8 and appendix C). For example, the connectivity of an alanine residue in a protein doesn't need to be perceived since it is known a priori if the names of the atoms are known. Disulfide bonds between Cysteine residues of proteins, as shown in Fig. 3-6, are automatically perceived using the parameters in Table 3-1 [156]. If the SG atoms of two Cysteines are within dCutoff of each other and S-SEnergy from Eq. 3-1 is less than eCutoff they are considered bonded. The protonation states of Histidine residues bound to a metal ion are also perceived using a bond distance cutoff of 2.3 Å. If the HIS@NE2 (epsilon nitrogen of Histidine) atom is within this cutoff the residue is set to HID type. If the HIS@ND1 atom is within this cutoff the residue is set to HIE type. If both HIS@NE2 and HIS@ND1 are bonded to a metal atom within 2.3 Å then the residue is set to HIN type, such as the bridging histidine residue in Copper-Zinc Superoxide Dismutase [158]. Bond order, hybridization and formal charge of atoms for small molecules are determined in the hybridize class, which is discussed in more detail below in section 3.3.
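The histidine rule in the paragraph above reduces to two distance tests. The following sketch assumes the metal-nitrogen distances have already been measured; the function name histidineType is hypothetical, and returning "HIS" when neither nitrogen is within the cutoff is an assumption made here, since the text does not say what happens in that case.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not the MTK++ API. Distances are in Angstrom.
std::string histidineType(double metalToND1, double metalToNE2, double cutoff = 2.3) {
    assert(metalToND1 >= 0.0 && metalToNE2 >= 0.0);
    const bool nd1Bound = metalToND1 <= cutoff;
    const bool ne2Bound = metalToNE2 <= cutoff;
    if (nd1Bound && ne2Bound) return "HIN";  // both nitrogens bound (bridging)
    if (ne2Bound) return "HID";              // NE2 bound to the metal
    if (nd1Bound) return "HIE";              // ND1 bound to the metal
    return "HIS";                            // assumption: leave the residue unchanged
}
```

For example, a histidine with NE2 at 2.1 Å and ND1 at 4.5 Å from a zinc ion would be assigned HID by this rule.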

Figure 3-6. Disulfide Bond in Proteins.


Table 3-1. Disulfide Bond Prediction Parameters.
Parameter Value
dCutoff 2.5
ssBondReq 2.038
ssBondKeq 166.0
CBSGSGReq 103.7
CBSGSGKeq 68.0
eCutoff 30.0










Figure 3-7. The Structural Types of the Histidine Residue. (a) HIN; (b) HID; (c) HIE; (d) HIP.

\[ \text{S-SEnergy} = E^{Bond}_{SG1-SG2} + E^{Angle}_{CB1-SG1-SG2} + E^{Angle}_{CB2-SG2-SG1} \tag{3-1} \]

where:

\[ E^{Bond}_{SG-SG} = \text{ssBondKeq}\,(\text{distance}_{SG-SG} - \text{ssBondReq})^{2} \]

\[ E^{Angle}_{CB-SG-SG} = \text{CBSGSGKeq}\,(\text{angle}_{CB-SG-SG} - \text{CBSGSGReq})^{2} \]
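A hedged sketch of the disulfide test built from Eq. 3-1 and Table 3-1 follows. The units are assumptions, since the table does not state them: distances are taken to be in Å, equilibrium angles in degrees, and the angle force constant is taken to act on deviations in radians, as is usual for AMBER-style parameters. The names DisulfideParams, ssEnergy and isDisulfide are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Values from Table 3-1; the units in the comments are assumptions.
struct DisulfideParams {
    double dCutoff = 2.5;      // SG-SG distance cutoff (Angstrom, assumed)
    double ssBondReq = 2.038;  // equilibrium SG-SG distance (Angstrom, assumed)
    double ssBondKeq = 166.0;  // bond force constant
    double cbsgsgReq = 103.7;  // equilibrium CB-SG-SG angle (degrees, assumed)
    double cbsgsgKeq = 68.0;   // angle force constant (per radian^2, assumed)
    double eCutoff = 30.0;     // energy cutoff
};

// Eq. 3-1: one bond term plus the two CB-SG-SG angle terms.
double ssEnergy(double dSS, double angle1Deg, double angle2Deg,
                const DisulfideParams& p = DisulfideParams()) {
    const double degToRad = std::acos(-1.0) / 180.0;
    const double db = dSS - p.ssBondReq;
    const double da1 = (angle1Deg - p.cbsgsgReq) * degToRad;
    const double da2 = (angle2Deg - p.cbsgsgReq) * degToRad;
    return p.ssBondKeq * db * db + p.cbsgsgKeq * (da1 * da1 + da2 * da2);
}

// Bonded when the SG atoms are within dCutoff and the energy is below eCutoff.
bool isDisulfide(double dSS, double angle1Deg, double angle2Deg,
                 const DisulfideParams& p = DisulfideParams()) {
    assert(dSS > 0.0);
    return dSS <= p.dCutoff && ssEnergy(dSS, angle1Deg, angle2Deg, p) < p.eCutoff;
}
```

Under these assumptions the energy cutoff is the tighter of the two tests near the distance limit: with ideal angles, an SG-SG separation of 2.5 Å passes the distance test but gives an energy of about 35, which exceeds eCutoff.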


Ring moieties are perceived within the rings class and each ring found is stored in a ring structure. The perception of rings is discussed further in section 3.4. The functionalize class determines which functional groups are present in a molecule using the database of fragments as defined in appendix C. The implementation details of the functionalize class are outlined in section 3.7.

The fingerprint class contains rudimentary functionality for molecular fingerprinting. A fingerprint is defined as information that describes a molecule in 1-D. The fingerprint in MTK++ is represented as a vector of integers with the following form: "atom info, bond type, # of rings, ring info". The number of atoms from Hydrogen through Iodine are stored in the first 52 positions, and another 52 positions store the number of each of the following bond types: B-H, C-H, N-H, O-H, S-H, B-C, B=C, B-O, B-N, B-O, B-F, B-S, B-Cl, B-Br, B-I, C-C, C=C, C≡C, N-N, N=N, C-N, C=N, C≡N, N-O, N=O, N-P, N-Se, N=Se, O-O, C-O, C=O, O-Si, O-S, O=S, O-Se, O=Se, C-F, S-S, C-S, C=S, S-N, C-Cl, P-P, P-C, P-O, P=O, P-S, P-Se, Se-Se, C-Se, C=Se, N-Se, where "-", "=", and "≡" denote a single, double and triple bond respectively. Finally the 105th position stores the number of rings in the molecule or fragment. The size, planarity, aromaticity, heterocyclic boolean, and the number of nitrogen, oxygen and sulfur atoms of each ring are also stored after the 105th position. Thus the length of the vector depends on the number of rings present in the molecule or fragment. Fingerprinting in MTK++ is primarily used in conjunction with the functionalize class. Fingerprints are used to screen out fragments that could not be a part of a molecule based on the elements, bond types, and rings present, thus speeding up the functionalization of molecules.
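The screening step can be illustrated with a count comparison over the fixed part of the vector. This is only a sketch: the rule that a fragment is rejected when any of its counts exceeds the corresponding count of the molecule is an assumption made here, the per-ring entries after the 105th position are ignored, and the name mayContain is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed part of the layout described in the text:
// 52 element counts, 52 bond-type counts and one ring count.
const std::size_t kFixedLength = 52 + 52 + 1;

// Returns false when the fragment cannot be part of the molecule because it
// needs more of some element, bond type or ring than the molecule has.
bool mayContain(const std::vector<int>& molecule, const std::vector<int>& fragment) {
    assert(molecule.size() >= kFixedLength && fragment.size() >= kFixedLength);
    for (std::size_t i = 0; i < kFixedLength; ++i)
        if (fragment[i] > molecule[i]) return false;
    return true;
}
```

A fragment that passes this test still has to be matched properly; the comparison only removes candidates cheaply before the more expensive functional group search.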

A pharmacophore is commonly defined as the three dimensional geometric arrangement of molecular features that are necessary for biological activity. Pharmacophores between two molecules are detected using a feature (H-bond Donor/Acceptor, Pi Center, Positive/Negative Center, Hydrophobicity) matching algorithm in the pharmacophore class. The features common to both molecules are stored in a clique structure. A full description of the algorithm implemented in MTK++ is outlined in section 3.8.

The protonate class carries out the addition of Hydrogen atoms to macromolecules (proProtonate), ligands (ligProtonate), and water (watProtonate) molecules. proProtonate uses information from user defined libraries to add Hydrogens while ligProtonate is used when no such library is available. Water molecules often surround structures derived from X-ray crystallographic data but no Hydrogen atom positions are provided. Hydrogens are added separately to water molecules after they are added to macromolecules and ligands. The algorithmic details of the three protonate classes are described in section 3.5.

Conformational searching of drug-like molecules is carried out in the conformer class using graph theory methods. Each conformer of a molecule is stored in a conformer structure. The internal workings of this class are described in section 3.6. An integral part of conformational searching is determining the amount of conformational space sampled. This requires being able to superimpose a conformer onto some reference structure and calculate the root-mean-squared deviation. The superimposition of two molecules is carried out in the superimpose class and is discussed below in section 3.9.

The selection class is used to parse strings that represent subsets of molecular data and is essential in providing an API for users of MTK++. The data structure in the Molecule library has an inherent hierarchical nature. Atom information is stored in submolecule; bonds, angles, torsions, impropers, and submolecules are stored in molecule; and finally all molecules are stored in collection. The atom class is at the bottom of the hierarchy, while collection is at the top. Thus to retrieve, for example, all atoms with a specific name in all molecules of the collection requires a certain syntax. The syntax used in the selection class resembles a UNIX file path, such as "/collection/molecule/submolecule/atom". For example, providing the string "/COL/MOL/ALA-10/.CA." would select the atom ".CA.", the alpha carbon, in alanine with residue number 10 (ALA-10) that is part of the molecule named MOL in the COL collection. The "/" on the left hand side of the string indicates that the selection starts from the top of the structural hierarchy. The following selection string does not begin with a slash: "ALA-10/.CA." and represents parsing the hierarchy from the bottom up; all alpha carbons of molecules in the collection with alanine at position 10 will be selected. This syntax can handle molecule/submolecule/atom names, numbers, or a combination (name-number), such as ALA-10.
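How such a string might be split into its levels is sketched below. The names Selection, parseSelection, NameNumber and splitNameNumber are hypothetical and the sketch covers only tokenization: it records whether the string starts at the top of the hierarchy, separates the levels at each "/", and splits a "name-number" level into its two parts.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Selection {
    bool fromTop;                     // true when the string begins with '/'
    std::vector<std::string> levels;  // components between the slashes
};

Selection parseSelection(const std::string& s) {
    Selection sel;
    sel.fromTop = !s.empty() && s[0] == '/';
    std::string token;
    for (std::size_t i = sel.fromTop ? 1 : 0; i < s.size(); ++i) {
        if (s[i] == '/') { sel.levels.push_back(token); token.clear(); }
        else token += s[i];
    }
    if (!token.empty()) sel.levels.push_back(token);
    return sel;
}

struct NameNumber { std::string name; int number; };  // number is -1 when absent

// Splits "ALA-10" into {"ALA", 10}; a level without a numeric suffix is
// returned unchanged with number = -1.
NameNumber splitNameNumber(const std::string& level) {
    const std::size_t dash = level.rfind('-');
    if (dash == std::string::npos || dash + 1 == level.size()) return {level, -1};
    for (std::size_t i = dash + 1; i < level.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(level[i]))) return {level, -1};
    return {level.substr(0, dash), std::stoi(level.substr(dash + 1))};
}
```

Parsing "/COL/MOL/ALA-10/.CA." this way gives four levels read from the top down, while "ALA-10/.CA." gives two levels that would be matched from the atom upwards.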

The atomTyper class assigns molecular mechanics atom types to the atoms of a molecule using user defined fragment libraries such as those in appendix C. The hydrophobic regions of molecules are determined using an atom additive approach as outlined by Wang and Zhou in 1998 [159].










Figure 3-8. Class Hierarchy of the Molecule Component of the Molecule Class as Implemented in MTK++. Solid line boxes denote a class, while a dashed box signifies a structure. A class where the tail of the arrow starts uses or contains a class or structure where the head of the arrow ends, e.g. The molecule class contains or uses the hybridize class.


3.2.4 Graph Library

The graph library contains classes as shown in Fig. 3-9 to handle molecular graphs as described in Chapter 2.12. This library is used to find rings and to determine whether graphs are isomorphic. It is also used to traverse the torsional tree for systematic conformational searching. Tree and graph traversal is carried out using the depth-first search algorithm. The graph class stores both vertices and edges, with the edge struct storing pointers to two vertex objects. Both vertices and edges store a boolean to describe whether each has been visited during a traversal and a numerical variable to describe its color or label. The vertex class also stores a list of its neighbors and which layer it is placed on.
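A compact sketch of a depth-first traversal over such a graph is given below. It is not the MTK++ graph class: the names Vertex, Graph and depthFirst are hypothetical, neighbors are stored as indices rather than pointers to keep the example short, and the layer is taken to be the depth at which a vertex is first reached. Counting the edges that lead back to an already visited ancestor gives the number of ring closures in a connected molecular graph.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex {
    std::vector<std::size_t> neighbors;  // indices into Graph::vertices
    bool visited = false;                // set during a traversal
    int color = 0;                       // free label, unused in this sketch
    int layer = -1;                      // depth at which the vertex was reached
};

struct Graph {
    std::vector<Vertex> vertices;
    void addEdge(std::size_t a, std::size_t b) {
        assert(a < vertices.size() && b < vertices.size());
        vertices[a].neighbors.push_back(b);
        vertices[b].neighbors.push_back(a);
    }
};

// Recursive depth-first search from vertex v. Returns the number of
// ring-closing (back) edges found in the part of the graph reached from v.
// Pass vertices.size() as `parent` for the starting vertex.
int depthFirst(Graph& g, std::size_t v, std::size_t parent, int layer) {
    g.vertices[v].visited = true;
    g.vertices[v].layer = layer;
    int ringClosures = 0;
    for (std::size_t n : g.vertices[v].neighbors) {
        if (n == parent) continue;  // do not walk straight back
        if (!g.vertices[n].visited)
            ringClosures += depthFirst(g, n, v, layer + 1);
        else if (g.vertices[n].layer < layer)
            ++ringClosures;         // edge back to an ancestor closes a ring
    }
    return ringClosures;
}
```

For a six-membered ring carrying one substituent the traversal visits all seven vertices and reports a single ring closure, while an open chain reports none.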










Figure 3-9. Class Hierarchy of the Graph Library as Implemented in MTK++. Solid line boxes denote a class, while a dashed box signifies a structure. A class where the tail of the arrow starts uses or contains a class or structure where the head of the arrow ends, e.g. The graph class contains or uses the edge structure.

3.2.5 MM Library

The MM library contains classes and functions to carry out Molecular Mechanics minimizations as shown in Fig. 3-10. Currently, the AMBER function is used as described in Chapter 2.3. The amber class contains the driver functions for the lower level classes ambBond, ambAngle, ambTorsion, and ambNonBonded that contain the AMBER energy/gradient functions. The mmPotential class is the controller for all MM functions which could be developed. It performs all the memory allocation/deallocation. MTK++ was designed to allow its features to be extended easily. For example, cross terms such as bond-stretch and non-harmonic terms such as the Morse potential for bonded atoms are now possible within the MM library. This will become essential when MTK++ is used to study, for example, Blue Copper proteins, which contain a Copper-Sulfur bond that cannot be represented using a harmonic potential.
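The difference between the two bond terms mentioned above can be seen in a short sketch. The harmonic form K(r - r_eq)^2 is the usual AMBER-style bond term and the Morse form is the textbook expression D(1 - exp(-a(r - r_eq)))^2; the function names are hypothetical and no parameter values from the thesis are implied.

```cpp
#include <cassert>
#include <cmath>

// Harmonic bond term: E = K (r - r_eq)^2.
double harmonicBond(double r, double k, double rEq) {
    const double d = r - rEq;
    return k * d * d;
}

// Morse bond term: E = D (1 - exp(-a (r - r_eq)))^2.
// D is the well depth and a controls the width of the well.
double morseBond(double r, double wellDepth, double a, double rEq) {
    assert(wellDepth > 0.0 && a > 0.0);
    const double x = 1.0 - std::exp(-a * (r - rEq));
    return wellDepth * x * x;
}
```

Choosing a = sqrt(K / D) gives both curves the same curvature at r_eq, so they agree for small displacements. On stretching, the harmonic energy grows without bound whereas the Morse energy levels off at D, which is why a Morse term can describe a bond that weakens and breaks.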

3.2.6 GA Library

Figure 3-10. Class Hierarchy of the MM library as Implemented in MTK++. Solid line boxes denote a class and a class where the tail of the arrow starts uses or contains a class where the head of the arrow ends, e.g. The amber class contains or uses the ambBond class. The orange arrow is used to represent a public inheritance relationship between classes, i.e. amber is of type mmPotential.

The GA library contains classes and functions to carry out a parameter optimization using a genetic algorithm as shown in Fig. 3-11. This was initially designed to carry out conformational searching of organic molecules. However, its design is application independent. A genetic algorithm is a heuristic method, in which the global minimum of parameter space is not guaranteed to be found. Other heuristic methods include the Monte Carlo technique.

The GA library was designed in such a way as to mimic human civilization or evolution. gaWorld is the main class in the library. gaWorld can contain multiple regions, or gaRegs. Each region contains a population (gaPop) of individuals (gaInd). Each individual contains some chromosomal (gaChr) make up that in turn is described by genes (gaGene). An individual can contain multiple chromosomes, which in turn can contain multiple genes. For each application of this GA the fitness of each individual must be evaluated. This energy function is provided by the user to the library. The "survival of the fittest" model is used for individuals to propagate or survive between generations. Individuals can survive from one iterative step to the next through a semi-random selection process biased by their fitness. Reproduction between individuals is carried out using operators such as cross-over, mutation, and averaging. The number of iterations carried out by the GA is user defined or determined through some convergence criteria.









Regions are treated as being independent; however, the library was implemented in a way as to allow "island-hopping" to be developed. This would allow the GA to run in parallel, and during the course of an optimization an individual would with some probability travel from one region to another. This would allow the genetic information to be more diverse and prevent early convergence.
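Three of the ingredients named in this section, fitness-biased selection, cross-over and averaging, are sketched below. The code is illustrative and is not the MTK++ GA library: the function names are hypothetical, a chromosome is reduced to a vector of real-valued genes, the selection is a simple roulette wheel that assumes larger non-negative fitness values are better, and the random number is passed in by the caller so that the behavior is reproducible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Chromosome = std::vector<double>;  // one real-valued gene per entry

// Fitness-proportional (roulette wheel) choice of an individual.
// `r` must be a uniform random number in [0, 1) supplied by the caller.
std::size_t rouletteSelect(const std::vector<double>& fitness, double r) {
    assert(!fitness.empty() && r >= 0.0 && r < 1.0);
    double total = 0.0;
    for (double f : fitness) { assert(f >= 0.0); total += f; }
    assert(total > 0.0);
    const double threshold = r * total;
    double running = 0.0;
    for (std::size_t i = 0; i < fitness.size(); ++i) {
        running += fitness[i];
        if (threshold < running) return i;
    }
    return fitness.size() - 1;  // guards against round-off
}

// Single-point cross-over: genes before `point` from a, the rest from b.
Chromosome crossOver(const Chromosome& a, const Chromosome& b, std::size_t point) {
    assert(a.size() == b.size() && point <= a.size());
    const std::ptrdiff_t cut = static_cast<std::ptrdiff_t>(point);
    Chromosome child(a.begin(), a.begin() + cut);
    child.insert(child.end(), b.begin() + cut, b.end());
    return child;
}

// Averaging operator: each child gene is the mean of the parent genes.
Chromosome average(const Chromosome& a, const Chromosome& b) {
    assert(a.size() == b.size());
    Chromosome child(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) child[i] = 0.5 * (a[i] + b[i]);
    return child;
}
```

When the user-supplied function is an energy to be minimized, it would first have to be mapped to a non-negative fitness before a roulette wheel of this kind could be applied.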

3.2.7 Statistics Library

The Statistics library contains classes to carry out statistical analysis as shown in Fig. 3-12. This library is built on the boost library, in which matrix and vector math is performed very efficiently. The sheet class contains and handles matrix objects. The matrix class is derived from the boost matrix and extends its features by allowing for matrix labeling. The baseStats class performs the basic statistical functions as described in Ch. 2.13. The ols class carries out Multiple Linear Regression to calculate Pearson's correlation coefficient. The pca class carries out Principal Component Analysis (PCA) and the pls class performs Partial Least Squares (PLS) modeling using the kernel algorithm, with leave-N-out cross validation implemented. The PLS algorithm produces a number of matrices during execution and so the sheet and matrix classes were essential.

3.2.8 Molecular Fragment Library

A fragment library was developed for applications including functional group recognition, molecular alignment and fragmentation of drug-like molecules for the SE-COMBINE approach. The library currently contains over 300 fragments. Fragment names, internal codes and 2-D structural pictures can be found in Appendix C. This is a highly extendable library with all the tools required to add fragments available within MTK++. The fragments are built using the methodology developed to construct residues for the AMBER suite of programs. This approach uses atom names and types from the Generalized AMBER FF (GAFF) [160] and HF/6-31G* Merz-Kollman/RESP charges as described in Ch. 2.9. The use of RESP easily allowed for symmetric atomic charges, such as those on the oxygen atoms in a carboxylate group, and for fragments to carry an integer charge.















Figure 3-11. Class Hierarchy of the GA Library as Implemented in MTK++. Solid line boxes denote a class and a class where the tail of the arrow starts uses or contains a class where the head of the arrow ends, e.g. The gaWorld class contains or uses the gaReg structure.











Figure 3-12. Class Hierarchy of the Statistics Library as Implemented in MTK++. Solid line boxes denote a class. A class where the tail of the arrow starts uses or contains a class or structure where the head of the arrow ends, e.g. The pls class contains or uses the baseStats class. The blue arrow is used to represent a public inheritance relationship between classes, i.e. pls is of type baseStats.

3.2.9 Parsers Library

The Parsers library contains classes to read and write molecular file types. XYZ, MOL, MOL2, PDB, SD, and ZMAT file formats are supported. All classes inherit baseParser as shown in Fig. 3-13. baseParser controls the error handling of all classes in a uniform way. The XML file parsers also inherit xmlConvertors and domErrorHandler from the xerces-c library, which deal with errors in the XML files. Input and output files of programs such as DivCon and Gaussian (both cartesian and internal coordinates) are handled. The element parser reads the elements.xml file stored in the MTK++ distribution and populates the elements object which the collection class stores. For each element in the periodic table the following information is stored: atomic number, mass, group, period, valence, full shell size, covalent radius, van der Waals radius, Pauling's electronegativity value, and which semiempirical Hamiltonians are available for a given atom. The stdLib parser handles the library XML files described in the previous section and populates the stdLibrary and stdGroup classes. The param parser handles the parameter files associated with the fragment library such as parm94 and GAFF from AMBER. The GA parser handles the files associated with the GA library of MTK++. The amber parser can export and import AMBER topology and coordinate files.



Figure 3-13. Class Hierarchy of the Parsers Library as Implemented in MTK++. Solid line boxes denote a class and a class where the tail of the arrow starts uses or contains a class where the head of the arrow ends, e.g. The element class contains or uses the xmlConvertors class. The blue arrow is used to represent a public inheritance relationship between classes, i.e. pdb is of type baseParser.


3.3 Hybridization, Bond Order and Formal Charge Perception

It is often the case in SBDD that the design process starts with an x-ray crystal structure of a target molecule in complex with some bound substrate. This poses the challenge of determining atomic hybridizations, formal charges and bond orders of the small molecule due to the fact that there are no Hydrogen atoms present. Numerous algorithms have been published [161-165], but the algorithm by Labute in 2005 [142] to perceive atom hybridizations, bond orders and formal charges of drug-like molecules was implemented in MTK++ as it was shown to be superior to the others. Other methods to perceive atom types and bond information include antechamber by Wang et al. [166].

The Labute algorithm takes ten steps to determine the atom hybridizations, formal charges, and bond orders. A ligand that binds to PPARγ (PDB: 1FM9), shown in Fig. 3-14(a), is used to illustrate the algorithm, where $x_1, \ldots, x_n$ denote the 3D coordinates of $n$ atoms with atomic numbers $Z_1, \ldots, Z_n$, $Q_i$ is the number of atoms bonded to atom $i$, and $r_{ij} = |x_i - x_j|$ is the distance between atoms $i$ and $j$.

Bonds are perceived by first producing a candidate list and then refining it using geometry. Covalent radii, $R_i$, from Meng [161], shown in Table 3-2, are used in Eq. 3-4 to determine the candidate bond list. Then each atom $i$ is assigned a "dimension", $d_i$, based on a principal component analysis of the Gram matrix, $D$, defined in Eq. 3-5, where $i$ is the current atom index, $k$ is the number of bonded atoms, and $\bar{q}$ is the geometric center given in Eq. 3-6.

Table 3-2. Meng Atomic Covalent Radii.
Atom Radii Atom Radii Atom Radii Atom Radii
H 0.23 P 1.05 Ni 1.5 Nb 1.48
He 1.5 S 1.02 Cu 1.52 Mo 1.47
Li 0.68 Cl 0.99 Zn 1.45 Tc 1.35
Be 0.35 Ar 1.51 Ga 1.22 Ru 1.4
B 0.83 K 1.33 Ge 1.17 Rh 1.45
C 0.68 Ca 0.99 As 1.21 Pd 1.5
N 0.68 Sc 1.44 Se 1.22 Ag 1.59
O 0.68 Ti 1.47 Br 1.21 Cd 1.69
F 0.64 V 1.33 Kr 1.5 In 1.63
Ne 1.5 Cr 1.35 Rb 1.47 Sn 1.46
Na 0.97 Mn 1.35 Sr 1.12 Sb 1.46
Mg 1.1 Fe 1.34 Y 1.78 Te 1.47
Al 1.35 Co 1.33 Zr 1.56 I 1.4
Si 1.2









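$$0.1 < r_{ij} < R_i + R_j + 0.4 \qquad (3\text{-}4)$$

$$D = \sum_{i=0}^{k} (q_i - \bar{q})(q_i - \bar{q})^T \qquad (3\text{-}5)$$

$$\bar{q} = \frac{1}{k+1} \sum_{i=0}^{k} q_i \qquad (3\text{-}6)$$
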
$d_i$ is set to $k$ if $k < 2$; otherwise, $d_i$ is the number of positive eigenvalues of $D$ whose square root is greater than 0.2. Thus $d_i$ is 0 for isolated atoms, 1 for terminal atoms and for linear atoms with at least 2 bonds, 2 for planar atoms (e.g., sp2 or square planar), and 3 otherwise (e.g., tetrahedral or sp3d). The $d_i$ values for 1FM9 are shown in Fig. 3-14(b).
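The candidate-bond test of Eq. 3-4 and the dimension assignment can be illustrated with a minimal Python sketch. It is not the MTK++ implementation: the function names, the small radius dictionary (values copied from Table 3-2), and the water geometry are assumptions made for illustration, and the eigenvalues of the symmetric 3x3 matrix $D$ are obtained with a standard closed-form trigonometric formula rather than a library call.

```python
import math

# Covalent radii (Angstrom) for a few elements, copied from Table 3-2.
RADII = {"H": 0.23, "C": 0.68, "N": 0.68, "O": 0.68}

def candidate_bonds(elements, coords):
    """Return index pairs (i, j) whose distance satisfies Eq. 3-4."""
    bonds = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            if 0.1 < r < RADII[elements[i]] + RADII[elements[j]] + 0.4:
                bonds.append((i, j))
    return bonds

def sym3_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix (closed-form trigonometric method)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det / 2.0))  # clamp against round-off
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e1, 3.0 * q - e1 - e3, e3]

def dimension(i, coords, bonds):
    """d_i: k if k < 2, otherwise the number of eigenvalues of D with square root > 0.2."""
    nbrs = [b if a == i else a for a, b in bonds if i in (a, b)]
    k = len(nbrs)
    if k < 2:
        return k
    pts = [coords[i]] + [coords[j] for j in nbrs]  # q_0 .. q_k
    center = [sum(p[c] for p in pts) / len(pts) for c in range(3)]
    gram = [[sum((p[r] - center[r]) * (p[c] - center[c]) for p in pts)
             for c in range(3)] for r in range(3)]
    return sum(1 for e in sym3_eigenvalues(gram) if e > 0.0 and math.sqrt(e) > 0.2)

# Hypothetical water geometry: O, H, H.
water_elements = ["O", "H", "H"]
water_coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
water_bonds = candidate_bonds(water_elements, water_coords)
```

For this bent geometry the oxygen atom is perceived as planar and each Hydrogen atom as terminal.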

Following the assignment of $d_i$, an upper bound, $B_i$, on the number of bonds allowed for each atom is determined from $d_i$ and $Z_i$ as shown in Table 3-3. For each atom only the $B_i$ shortest candidate bonds are retained. At this point all atom hybridizations and bond orders are set to zero (undefined).

Table 3-3. Labute Algorithm Upper Bound Bond Conditions.
B_i  Condition
0    d_i = 0
1    Z_i < 3 (H, He)
2    d_i = 1, Z_i > 2 (sp hybridized and linear)
3    d_i = 2, Z_i < 11 (sp2 hybridized for 2nd row elements)
4    d_i = 2, Z_i > 10 or d_i = 3, Z_i < 11 (square planar or sp3 hybridized)
7    otherwise

The next phase assigns obvious hybridizations based on $d$, $Z$, and $Q$. The rows of Table 3-4 are applied one at a time, each row being applied only to atoms with unassigned hybridization, resulting in Fig. 3-14(c). After this phase, the only atoms with unassigned hybridizations have $d < 3$, $Z \in$ {C, N, O, Si, P, S, Se}, $Q < 4$, and at least one bonded neighbor with an unassigned hybridization. At this stage all bond orders $b_{ij}$ for which atom $i$ or $j$ has non-zero hybridization are set to 1.
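The upper-bound rule of Table 3-3 and the pruning of candidate bonds can be sketched as below. This is an illustrative reading, not the MTK++ code: the function names are hypothetical, and it assumes the table rows are tested in the order listed.

```python
def upper_bound(d, z):
    """Upper bound B_i on the number of bonds, testing the rows of Table 3-3 in order."""
    if d == 0:
        return 0
    if z < 3:                      # H, He
        return 1
    if d == 1 and z > 2:           # sp hybridized and linear
        return 2
    if d == 2 and z < 11:          # sp2 hybridized, 2nd row elements
        return 3
    if (d == 2 and z > 10) or (d == 3 and z < 11):  # square planar or sp3
        return 4
    return 7

def keep_shortest(candidates, b_i):
    """Keep the b_i shortest candidate bonds of one atom.

    candidates is a list of (distance, neighbour_index) pairs.
    """
    return sorted(candidates)[:b_i]
```

For example, a planar carbon atom (d = 2, Z = 6) is allowed at most three bonds, so a fourth, longer candidate contact would be discarded.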

A dihedral test is used to identify bonds of order 1. For each bond $i$-$j$ the smallest out-of-plane dihedral, taken as a minimum over the atoms bonded to $i$ and $j$, is computed. If this dihedral is greater than 15° then $b_{ij}$ is set to 1. Results of this step are shown in Fig. 3-14(d).











Table 3-4. Labute Algorithm Atom Hybridization Assignment.
Hybridization  Condition
sp3            Z_i = 1, 2
sp3d           Q_i > 4, Z_i = (Group 5) and Q_i = 5, Z_i = Group 4,5,6,7,8
sp3d2          Q_i > 4, Z_i = (Group 6) and Q_i = 6, Z_i = Group 4,5,6,7,8
sp3d3          Q_i > 4, Z_i = (Group 7) and Q_i = 7, Z_i = Group 4,5,6,7,8
sp3d2          Q_i = 4, Z_i > 10, d_i = 2
sp3d2          Z_i = (Transition Metal)
sp3d2          Q_i > 4, Z_i > 10 and not Si, P, S, Se
sp3            Q_i > 4, Z_i > 10 and Si, P, S, Se
sp3            (Q_i = 4) and (Q_i = 3, d_i = 3)
sp3            Q_i > 2, Z_i = Group 6,7,8
sp3            Z_i not (C,N,O,Si,P,S,Se)
sp3            All atoms where none of its bonded atoms have zero hybridization

The lower-bound single bond lengths in Table 3-5 and the condition $|x_i - x_j| > L_{ij} - 0.05$, where $L_{ij}$ is the reference bond length, are used to identify single bonds. The bonds identified in this step are shown in Fig. 3-14(e).

Table 3-5. Labute Algorithm Lower Bound Single Bond Lengths.
bond   dist   bond   dist
C-C    1.54   C-N    1.47
C-O    1.43   C-Si   1.86
C-P    1.85   C-S    1.75
C-Se   1.97   N-N    1.45
N-O    1.43   N-Si   1.75
N-P    1.68   N-S    1.76
N-Se   1.85   O-O    1.47
O-Si   1.63   O-P    1.57
O-S    1.57   O-Se   1.97
Si-Si  2.36   Si-P   2.26
Si-S   2.15   Si-Se  2.42
P-P    2.26   P-S    2.07
P-Se   2.27   S-S    2.05
S-Se   2.19   Se-Se  2.34

After steps 5 and 6 the hybridizations of all uncharacterized atoms not involved in a bond of unknown order are set to sp3 as shown in Fig. 3-14(f).

A molecular graph is formed including only atoms (vertices) that have undefined hybridization and bonds (edges) that have unknown order. This graph is then divided into components or subgraphs. Each subgraph is analyzed independently and bond orders are assigned as shown in Fig. 3-14(g). Edge weights are assigned with the equation

$$w_{ij} = u_i + u_j + 2\delta(r_{ij} < L_{ij} - 0.11) + \delta(r_{ij} < L_{ij} - 0.25)$$

where $\delta(\cdot)$ is 1 when its condition holds and 0 otherwise, using the atom parameters, $u$, defined in Table 3-6 (3rd and 4th row elements are mapped to the corresponding 2nd row element with 0.1 subtracted; -20.0 is used for all other atoms). Results are shown in Fig. 3-14(h).

Table 3-6. Labute Algorithm Bond Weights.
atom Q=1 Q=2 Q=3
C-O 1.3 4.0 4.0
C-N -6.9 4.0 4.0
C 0.0 4.0 4.0
N-C-O -2.4 -0.8 -7.0
N-C-N -1.4 1.3 -3.0
N 1.2 1.2 0.0
O-C-O 4.2 -8.1 -20.0
O-C-O 4.2 -8.1 -20.0
O 0.2 -6.5 -20.0


A maximum weighted matching algorithm, as described in Chapter 2.12, is employed to find the best arrangement of double/triple bonds in each subgraph, resulting in the pattern shown in Fig. 3-14(i). Ionization states and formal charges are perceived from the connectivity and bond order. The formal charge of atom $i$, $f_i$, is calculated as $f_i = c_i - o_i + b_i$, where $c_i$ is the group of the atom in the periodic table, $o_i$ is the nominal octet (2 for Hydrogen, 6 for Boron, 8 for Carbon and all other sp3 atoms in groups 5, 6, 7, and 8), and $b_i$ is the sum of the atom's bond orders. The final stage of the algorithm determines the correct bonding and charge state for the following functional groups: nitro, aliphatic amines, carboxylic acids, sulfonic acids, phosphonic acids, amidines, guanidines, and sulfonamides.
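The formal-charge expression $f_i = c_i - o_i + b_i$ can be checked with a few familiar cases. The sketch below is illustrative only; the dictionaries cover a handful of elements and the function name is hypothetical.

```python
# Periodic-table group (old numbering, as used in the text) for a few elements.
GROUP = {"H": 1, "B": 3, "C": 4, "N": 5, "O": 6, "F": 7}
# Nominal octet: 2 for Hydrogen, 6 for Boron, 8 for the other elements listed here.
NOMINAL_OCTET = {"H": 2, "B": 6}

def formal_charge(element, bond_orders):
    """Formal charge f = c - o + b for one atom, given the orders of its bonds."""
    c = GROUP[element]
    o = NOMINAL_OCTET.get(element, 8)
    b = sum(bond_orders)
    return c - o + b
```

With this rule a four-coordinate nitrogen such as an ammonium N carries +1, a singly bonded carboxylate oxygen carries -1, and a carbonyl oxygen is neutral.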

3.4 Ring Perception

The algorithm used is in close agreement with that published by Fan, Panaye, Doucet, and Barbu in 1993 [167]. The functions contained in rings determine the smallest set of smallest rings (SSSR) from a molecule graph. The SSSR of a molecule is represented














Figure 3-14. Hybridization, Bond Order, and Formal Charge Perception Using the Labute Algorithm. Panels: (a) START, (b) Step 1, (c) Step 3, (d) Step 5, (e) Step 6, (f) Step 7, (g) Step 8a, (h) Step 8b, (i) Step 8c, (j) END.









as $S(m_1, m_2, \ldots)$ where $m_i$ are the ring sizes. Take for example the molecule shown in Fig. 3-15(a). All open acyclic nodes, highlighted in Fig. 3-15(b), are removed, resulting in the structure shown in Fig. 3-15(c). Then all closed acyclic nodes are removed, as highlighted in Figs. 3-15(d) and 3-15(e). The structure is then separated into blocks as shown in Fig. 3-15(f). The question then arises: how many rings are there in the current block, shown in Fig. 3-15(g)? Allowing the first node to be the root node, numerous ring systems can be found, including $R^1 = \{v_1, v_2, v_3, v_{15}, v_{13}, v_{14}\}$, $R^2 = \{v_1, v_2, v_3, v_{15}, v_{16}, v_{10}, v_{11}, v_{12}, v_{13}, v_{14}\}$, ..., $R^n$. The closed path found containing the root node is recursively searched until it cannot be reduced further; in other words, $R^1$ is found as shown in Fig. 3-16. Once an irreducible closed path is found, all of its nodes with two links are removed; here nodes 2, 1, and 14 are removed. The algorithm then picks another root node and the next ring is found, until all rings are found.

Once all rings are found in a molecule, an aromaticity test is applied. The algorithm used is in close agreement with that published by Roos-Kozel and Jorgensen in 1981 [168]. Rings are classified as aromatic (AR), antiaromatic (AA), or nonaromatic (NA). A ring is assigned to be nonaromatic if it contains no intra-ring double bonds, contains a quaternary atom, contains more than one saturated carbon, contains a monoradical, or contains a sulfoxide or sulfone. A ring system is aromatic if and only if it contains $4n+2$ ($n = 0, 1, 2, 3, 4, \ldots$) pi electrons (Hückel rule) and is planar (10° tolerance). The number of pi electrons of a ring is determined using the following rules: cationic carbon and boron contribute 0, saturated heteroatoms give 2, an anionic carbon has 2, and atoms on an intra-ring pi bond contribute 1. If a ring contains exocyclic pi bond(s) (Carbon double bonded to a heteroatom), then 1 pi electron is removed. Some rings correctly perceived by this algorithm are shown in Fig. 3-17. All are perceived to be aromatic except for cyclooctatetraene (COT). COT contains alternating single and double bonds but it is non-planar and is correctly assigned to be antiaromatic.
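The pi-electron counting rules and the Hückel test lend themselves to a short sketch. The atom-kind labels and function names below are hypothetical, the exocyclic pi bond correction is deliberately left out, and the planarity flag is assumed to come from a separate geometric test, so this is only an illustration of the counting step.

```python
# Pi electrons contributed by each kind of ring atom, following the rules in the text.
PI_CONTRIBUTION = {
    "cation": 0,            # cationic carbon or boron
    "saturated_hetero": 2,  # saturated heteroatom, e.g. S in thiophene
    "anion": 2,             # anionic carbon
    "pi_bond": 1,           # atom on an intra-ring pi bond
}

def count_pi_electrons(atom_kinds):
    """Sum the pi electrons of a ring from the kinds of its atoms."""
    return sum(PI_CONTRIBUTION[kind] for kind in atom_kinds)

def satisfies_huckel(n_pi):
    """True when n_pi = 4n + 2 for some n = 0, 1, 2, ..."""
    return n_pi >= 2 and (n_pi - 2) % 4 == 0

def is_aromatic(atom_kinds, planar):
    """Aromatic only if the ring is planar and holds 4n + 2 pi electrons."""
    return planar and satisfies_huckel(count_pi_electrons(atom_kinds))

benzene = ["pi_bond"] * 6
cyclopentadienyl_anion = ["pi_bond"] * 4 + ["anion"]
thiophene = ["pi_bond"] * 4 + ["saturated_hetero"]
cyclooctatetraene = ["pi_bond"] * 8
```

Benzene, the cyclopentadienyl anion, and thiophene each count six pi electrons, whereas cyclooctatetraene counts eight and therefore fails the $4n+2$ test regardless of planarity.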





























Figure 3-15. Ring Perception. Panels: (a) Step 1, (b) Step 2, (c) Step 3, (d) Step 4, (e) Step 5, (f) Step 6, (g) Step 7.

Figure 3-16. Ring Perception Step 8.









The ring centroid, plane, and normal are also calculated for use in pharmacophore matching and molecular alignment, which are discussed later in this chapter. The centroid, $c$, is calculated using Eq. 3-7, where $k$ is the size of the ring and $q_i$ are the atomic coordinates. The ring plane and normal are computed by carrying out the principal component analysis of the Gram matrix as previously described in Section 3.3. Matrix $D$ is evaluated, Eq. 3-8, and then diagonalized, with the first two eigenvectors defining the ring plane and the third being the ring normal.

$$c = \frac{1}{k} \sum_{i=1}^{k} q_i \qquad (3\text{-}7)$$

$$D = \sum_{i=1}^{k} (q_i - c)(q_i - c)^T \qquad (3\text{-}8)$$

3.5 Addition of Hydrogen Atoms to Molecules

The addition of Hydrogen atoms to proteins, DNA, or water molecules is carried out using a predefined library. Small molecules with known atom hybridization, ring information, and bond orders are dealt with using the following algorithm. First, Hydrogen atoms are added to polar (N, O, and S) atoms, followed by ring systems, then terminal atoms. All other unprotonated atoms are dealt with at the end. The number of Hydrogen atoms to add is defined by the current valence and the ideal full shell value of the atom to which the Hydrogen will be added.

The bond lengths used are defined in Table 3-7. The only distinction made for the atom to which a Hydrogen is added is its element type; in other words, the bond distance from an sp3 or an sp2 Carbon to Hydrogen is the same. The angle at which a Hydrogen atom is added is defined either by the hybridization or type of the connecting atom or by the type of bond between the connecting atom and the 1-3 atom, as shown in Table 3-8. The dihedral angle at which to add a Hydrogen atom is the most complex component of this algorithm. Suppose a proton, A, is to be added to atom B, which is bonded to atom C and is 1-3 bonded










Figure 3-17. Ring structures which are correctly assigned aromatic (AR), non-aromatic (NA) and anti-aromatic (AA). Panels: (a) Benzene (AR), (b) Anthracene (AR), (c) Cycloheptatriene (AR), (d) Cyclopentadienyl Anion (AR), (e) Pyridine (AR), (f) Thiophene (AR), (g) Pyrimidine (AR), (h) Purine (AR), (i) 2-thioxo-2,3-dihydropyrimidin-4-one (AR), (j) imidazo-pyridine-3-thione (AR), (k) Cyclooctatetraene (AA).

to atom D. First, a list is compiled of all torsional angles X-B-C-D already occupied, where X is any heavy atom bonded to atom B. The dihedral then used is defined by the hybridization of atom B and built using atoms B, C, and D. If B is sp3 hybridized then the Hydrogen atom is placed 120° from the other bonded atoms. Dihedral angles of 0° and 180° are used when B is sp2 hybridized, and 180° when B is sp hybridized. Aromatic rings are a special case of sp2 hybridized atoms where only a torsion of 180° is allowed (Table 3-9). The dihedral values of polar Hydrogens are optimized to maximize intra-molecular Hydrogen bonding using Eq. 3-9, where $\theta_{D-H-A}$ is the angle between the donor, Hydrogen, and acceptor atoms and $r_{H-A}$ is the Hydrogen-acceptor distance. A Hydrogen bond is considered only if $\theta_{D-A-AA}$ and $\theta_{D-H-A}$ are both greater than 90° and $r_{D-A}$ is less than 3.5 Å.

$$E_{HB} = \cos^2(\theta_{D-H-A})\, e^{-(r_{H-A} - 2.0)^2} \qquad (3\text{-}9)$$
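The Hydrogen bond score of Eq. 3-9 is simple enough to evaluate directly. The sketch below is only an illustration: the function name is hypothetical, the angle is taken in degrees and the distance in Å, and the geometric cutoffs mentioned in the text are not applied.

```python
import math

def hbond_score(theta_dha_deg, r_ha):
    """Eq. 3-9: cos^2 of the donor-H-acceptor angle times a Gaussian in the H-acceptor distance."""
    angular = math.cos(math.radians(theta_dha_deg)) ** 2
    radial = math.exp(-(r_ha - 2.0) ** 2)
    return angular * radial
```

The score peaks at 1.0 for a linear arrangement (180°) with a Hydrogen-acceptor distance of 2.0 Å and falls off as the angle bends toward 90° or the distance departs from 2.0 Å.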



Figure 3-18. Hydrogen Bond (diagram labels: D-H, A-AA).


Table 3-7. Hydrogen Bond Lengths.
Atom     Bond Length (Å)
C        1.09
N        1.008
O        0.95
S        1.008
Se       1.10
Default  1.05


Table 3-8. Hydrogen Bond Angles.
Atom / Bond     Angle (Degrees)
sp / triple     180.0
sp2 / double    120.0
sp3 / single    109.47
Aromatic Ring   (360 - ((ringSize - 2) × 180)/ringSize)/2
Default         109.47
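The aromatic-ring entry of Table 3-8 places the Hydrogen so that it bisects the exterior angle of a regular polygon. A small worked sketch (the function name is hypothetical):

```python
def aromatic_ring_h_angle(ring_size):
    """Angle (degrees) between a ring bond and the added Hydrogen for a regular ring."""
    interior = (ring_size - 2) * 180.0 / ring_size  # interior angle of a regular polygon
    return (360.0 - interior) / 2.0
```

A six-membered ring gives 120° and a five-membered ring gives 126°.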


3.6 Conformational Sampling

Conformational searching of drug-like molecules in MTK++ is carried out using a systematic approach. GAFF [160] atom types are assigned to the atoms of a molecule using ANTECHAMBER, and CM2 charges are generated using the DivCon program. The atomic hybridizations and bond orders defined in the hybridize class are used to mark which single or double bonds are rotatable. If either of the atoms in a bond is terminal then the bond is removed from the list of rotatable bonds. If both of the atoms are members of a ring then the bond is also removed, thus removing









Table 3-9. Hydrogen Bond Dihedrals.
atom Dihedral (Degrees)
sp 180.0
sp2 0.0, 180.0
sp3 120.0
Aromatic Ring 180.0


ring flexibility. The incorporation of ring flexibility is planned for later releases of the MTK++ code. Then, for each rotatable bond that remains, a torsion is sought. The total number of molecular conformations, $N_{conformers}$, is then defined by Eq. 3-10, where $i$ is a rotatable bond index, $R_i$ is the range of the associated torsion (0-360°), and $\delta_i$ is the rotation increment (e.g., 120° for sp3-sp3). The increments currently used are tabulated in Table 3-10. Once the number and location of each rotatable bond are determined, as shown in Fig. 3-19, a graph is formed as described in Fig. 3-20 [169]. Each rotatable bond is defined as a layer, with each unique torsional value contained in a vertex of this layer. Graph edges are then defined from each vertex of one layer to every vertex of the layer below. Once formed, the graph is traversed and the AMBER energy, $E_{MM}$, is calculated for each conformer. The lowest energy conformers are stored, based on user-provided criteria, for later use.

$$N_{conformers} = \prod_{i=1}^{n} \frac{R_i}{\delta_i} \qquad (3\text{-}10)$$



Table 3-10. Dihedral Angles Available Based on Bond Type.
Bond Type  Angles (Degrees)
sp3-sp3    60, 180, 300
sp2-sp2    0, 30, 150, 180, 210, 330
sp2-sp3    0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330
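The layered enumeration of torsion values can be sketched with a Cartesian product over the angle sets of Table 3-10. The names below are hypothetical and no energies are evaluated; the sketch only shows how the number of conformers grows with the number and type of rotatable bonds.

```python
from itertools import product
from math import prod

# Allowed torsion values (degrees) per bond type, copied from Table 3-10.
TORSION_ANGLES = {
    "sp3-sp3": (60, 180, 300),
    "sp2-sp2": (0, 30, 150, 180, 210, 330),
    "sp2-sp3": (0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330),
}

def count_conformers(bond_types):
    """Number of conformers: product of the number of torsion values per rotatable bond."""
    return prod(len(TORSION_ANGLES[t]) for t in bond_types)

def enumerate_conformers(bond_types):
    """Yield one tuple of torsion values per conformer, one layer per rotatable bond."""
    return product(*(TORSION_ANGLES[t] for t in bond_types))

# Three rotatable bonds: one sp3-sp3, one sp2-sp3, and one sp2-sp2 torsion.
example_bonds = ["sp3-sp3", "sp2-sp3", "sp2-sp2"]
```

For the three torsions in `example_bonds` this gives 3 × 12 × 6 = 216 conformers.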


Fig. 3-21(a) shows an organic molecule which binds to the Peroxisome Proliferator-Activated Receptor γ. This is a functional-group-rich molecule containing phenyl rings, a carboxylate, a heterocycle (2,4,5-oxazole), a ketone, an amine, and an ether moiety. This structure has 12 rotatable bonds as shown in Fig. 3-21(b). Using
















Figure 3-19. Rotatable Bond Types.

Figure 3-20. Systematic Conformational Searching. Torsion 1 (sp3-sp3) forms the first layer,
containing three values or vertices. It is followed by a layer for Torsion 2 (sp3-sp2)
containing 12 vertices, and finally the third layer with six vertices (0, 30, 150, 180,
210, 330). This graph would result in 216 conformers being formed.


the torsion resolution definitions in Table 3-10 would lead to 4,353,564,672 conformers! Even on modern computer hardware this number is too large. Taking a closer look at this structure, Fig. 3-21(c), the symmetry of the functional groups becomes apparent. For example, the carboxylate group, shown in green, is C2 symmetric since the negative charge is not solely placed on one of the oxygen atoms, and the phenyl group shown in yellow is also symmetric. Removing these symmetric torsions from the total number results in 544,195,584 conformers. Chemical knowledge of the torsional profile between the phenyl and oxazole groups enclosed in the red oval suggests even fewer available torsions due to conjugation, thus reducing the total number of discrete conformers to 181,398,528. MTK++ attempts to reduce this number even further by recognizing



















Figure 3-21. Conformer Generation. Panels: (a) PPAR-γ Agonist, (b) Number of Rotatable Bonds, (c) Symmetric Regions, (d) Reduced Number of Rotatable Bonds.


fragments with known torsion profiles. The tyrosine-like group enclosed in the blue oval is one such fragment and is stored in the "cores" library of the package. Removing these rotatable bonds results in 419,904 conformers. Fig. 3-21(d) highlights the rotatable bonds: green bonds are unrestricted, blue bonds are restricted by symmetry, and red bonds are frozen.

The systematic approach works extremely well for molecules with approximately 12 rotatable bonds or fewer. When the number of potential conformers exceeds two million the searching algorithm reverts to using the GA library of MTK++. During a GA search of conformational space the user is required to provide the maximum number of MM calculations which are allowed. Other searching tools such as MD and MD-LES are recommended for large peptidic molecules that bind certain proteins such as HIV Protease and Endothiapepsin.









3.7 Substructure Searching/ Functionalize

Functionalizing a molecule involves searching it for chemical substructures. Substructure searching is known as the subgraph isomorphism problem of graph theory and belongs to the class of NP-complete computational problems. Because of the NP-complete nature of substructure searching, a screen is usually carried out first to eliminate subgraphs that cannot be contained in the molecule. The fingerprint class in the molecule library carries out this screening process between fragments and a molecule.
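The idea of a screen can be shown with a deliberately simple fingerprint. The text does not specify what the MTK++ fingerprint class stores, so the element-count fingerprint and the function names below are assumptions chosen only to illustrate the principle: a fragment that needs more atoms of some element than the molecule has cannot be a substructure, so the expensive subgraph search can be skipped.

```python
from collections import Counter

def fingerprint(elements):
    """Hypothetical fingerprint: number of heavy atoms of each element."""
    return Counter(elements)

def may_contain(molecule_fp, fragment_fp):
    """Screen: False means the fragment cannot occur; True means a full search is still needed."""
    return all(molecule_fp[element] >= count for element, count in fragment_fp.items())

carboxylate = fingerprint(["C", "O", "O"])
acetic_acid = fingerprint(["C", "C", "O", "O"])
ethanol = fingerprint(["C", "C", "O"])
```

Here the carboxylate fragment passes the screen against acetic acid and is rejected against ethanol without any graph matching.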

The brute-force algorithm for subgraph isomorphism begins by generating the adjacency matrices A and B for the fragment and the molecule, containing $P_A$ and $P_B$ atoms respectively. An exhaustive search then involves generating the $P_B!/[P_A!(P_B - P_A)!]$ combinations of $P_A$ atoms and determining whether any combination matches a portion of the molecule. The algorithm used is in close agreement with that published by Ullmann [170], and Willett, Wilson, and Reddaway [171]. Ullmann first noticed that using a depth-first backtracking search dramatically increases efficiency, while Willett used a labeled graph and a non-binary connection table to increase algorithm speed.

The functionality of finding substructures in molecules was developed to carry out functional group alignment of drug-like molecules and to optimize fragment positions in drug-protein complexes during the lead optimization stage of the drug design process.

The algorithmic details of the functionalize code are as follows (this example was adapted from Molecular Modelling, Principles and Applications, 2nd Edition, by Andrew R. Leach [81]). Take for example the fragment and molecule shown in Figs. 3-22(a) and 3-22(b). The corresponding adjacency matrices, F and M, are shown in Eqs. 3-11 and 3-12. The Ullmann algorithm tries to find the match between the fragment and the molecule (Fig. 3-22(c)). Mathematically this match is represented as the matrix A, Eq. 3-13, which satisfies $F = A(AM)^T$ as shown in Eq. 3-14.











Figure 3-22. Ullmann Subgraph Isomorphism Illustration. Panels: (a) Fragment Structure, (b) Molecule Structure, (c) Alignment.


/0 1 0 0o
0100

1 0 1 0
F (3-11)
0 1 0 1
0101

0010

010000

101100

010000
M = (312)
010000

000100

000100

001000
010000
1 0 1 1 0 0

0 1 0 0 0 0





A = (3-13)
0 1 0 0 0 0





000100

000 00
00 1 0 0 0

0 1 0 0 0 0
A (3- 13)
0 00 1 0 0

0 0 0 0 0 1










0100

0 0 1 0 0 0\ 1 0 1 0 0 1 0 0


010000 0100 1010
0 0 0 1 0 0 0 1 0 1 0 1 0 1



001 0

This depth-first backtracking algorithm uses a general match matrix, M, that contains all the possible equivalences between atoms from A and B. The elements of this matrix, $m_{ij}$ ($1 \le i \le P_A$; $1 \le j \le P_B$), are such that:

$$m_{ij} = \begin{cases} 1 & \text{if the } i\text{th atom of } A \text{ can be mapped to the } j\text{th atom of } B, \\ 0 & \text{otherwise.} \end{cases} \qquad (3\text{-}15)$$

The Ullmann heuristic states that "if a fragment atom $a_i$ has a neighbor $x$, and a molecule atom $b_j$ can be mapped to $a_i$, then there must exist a neighbor of $b_j$, $y$, that can be mapped to $x$" and is mathematically written in Eq. 3-16.

$$m_{ij} = \forall x\,(1 \ldots P_A)\,[(a_{ix} = 1) \Rightarrow \exists y\,(1 \ldots P_B)\,(m_{xy} b_{jy} = 1)] \qquad (3\text{-}16)$$

If at any stage during the search there is an atom $i$ in A such that $m_{ij} = 0$ for all atoms $j$ in B, then a mismatch is identified as defined in Eq. 3-17 and the match is discarded.

$$\text{mismatch} = \exists i\,(1 \ldots P_A)\,[(m_{ij} = 0)\ \forall j\,(1 \ldots P_B)] \qquad (3\text{-}17)$$
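A depth-first backtracking match can be sketched in a few lines. This is a simplified illustration rather than the Ullmann/Willett code in MTK++: it uses 0-based indices, a hypothetical function name, a degree-based initial candidate list in place of the full refinement of Eq. 3-16, and it only requires that every fragment bond exists in the molecule. The matrices are a 4-atom chain fragment and a 6-atom molecule whose atom 1 is bonded to atoms 0, 2, and 3, and whose atom 3 is bonded to atoms 4 and 5.

```python
# Adjacency matrix of the fragment: a chain of four atoms.
F = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Adjacency matrix of the molecule: six atoms.
M = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
]

def subgraph_matches(frag, mol):
    """Yield tuples t in which fragment atom i is mapped to molecule atom t[i]."""
    n_f, n_m = len(frag), len(mol)
    # A molecule atom is a candidate only if it has at least as many neighbours.
    candidates = [[j for j in range(n_m) if sum(mol[j]) >= sum(frag[i])]
                  for i in range(n_f)]

    def extend(mapping):
        i = len(mapping)
        if i == n_f:
            yield tuple(mapping)
            return
        for j in candidates[i]:
            if j in mapping:
                continue
            # Every fragment bond to an already mapped atom must exist in the molecule.
            if all(mol[j][mapping[x]] for x in range(i) if frag[i][x]):
                mapping.append(j)
                yield from extend(mapping)
                mapping.pop()  # backtrack

    yield from extend([])

matches = list(subgraph_matches(F, M))
```

The mapping (2, 1, 3, 5), that is fragment atoms 1-4 onto molecule atoms 3, 2, 4, and 6 in 1-based numbering, is among the eight embeddings found; the others follow from the equivalent terminal atoms and from reversing the chain.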


The complete algorithm to perceive the functional groups in a molecule is described using pseudo code in Algorithm 3.1. The algorithm begins by reading user-created fragment libraries and the molecules which are to be studied. Fingerprints of each fragment and molecule are then created. For each molecule under consideration rings, atom hybridizations, bond orders, and formal charges are assigned using the algorithms previously described. If Hydrogens are not present on the molecule then they are added using the algorithm described in Section 3.5.

Algorithm 3.1: Functionalize Algorithm.
Data: Fragment Libraries and MDL Files
Result: Functional group assignment
begin
    Read Fragment Libraries;
    Read in molecules to functionalize;
    Generate fingerprints for all fragments;
    for i ← 1 to nMolecules do
        Determine Rings;
        Perceive Hybridizations, Bond Order, and Formal Charge;
        Add Hydrogens;
        Generate fingerprint;
        for j ← 1 to nFragments do
            bool bMatch1 = Compare molecule to simple fingerprint (Screening);
            if bMatch1 then
                bool bMatch2 = Match fragment to molecule using the Ullmann and
                               Willett algorithm of subgraph isomorphism;
                if bMatch2 then
                    assign fragment code to molecule;
                end
            end
        end
    end
end


Then, for each fragment stored in memory, its fingerprint is compared to that of the molecule. If there is a fingerprint match then the Ullmann/Willett algorithm is invoked. The subgraph isomorphism algorithm is outlined in Appendix A. If the subgraph isomorphism algorithm results in a match then the fragment code is assigned to the molecule.

3.8 Clique Detection/ Maximum Common Pharmacophore

As outlined in Ch. 2.12, a 3D molecular clique is defined as a group of pharmacophore points and the geometric distances between all points in that group. Fig. 3-23 illustrates the clique detection algorithm implemented in MTK++ [172]. Take for example two estrogen receptor ligands (PDB ID: 1ERR and 3ERT). After finding the pharmacophore points, such as Hydrogen bond acceptors/donors, positive/negative charge centers, hydrophobes, rings, and ring normals, as shown in Fig. 3-23(b), certain molecular features can be mapped to one another, as shown in Fig. 3-23(c). The mapping P, Eq. 3-18, results in a valid clique because the inter-point distances, Eq. 3-19, are compatible within some tolerance. However, adding the mapping $M = [F_1 \leftrightarrow F_2]$ does not result in a valid clique because the corresponding inter-point distances in the two molecules are not compatible. The clique detection algorithm thus requires a method of pruning a potentially large set of mappings, which is carried out by allowing each mapping to be a seed and growing cliques using heuristic criteria. Certain seed mappings will lead to equivalent cliques; however, a diverse set is often found. Cliques are then scored or ranked to determine the best overall matching using Eq. 3-20, where $D$ are the inter-point distances, $d^1_f$ is a distance between two features of molecule 1, and $d^2_f$ is the distance between the equivalent features in molecule 2. The function for the two mappings $A_1 \leftrightarrow A_2$ and $B_1 \leftrightarrow B_2$ reaches a value of 1.0 when $d^1_{AB} = d^2_{AB}$. The parameter $d_{max}$ controls how rapidly the match score drops off as the distances become less compatible.


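$$P = [A_1 \leftrightarrow A_2,\ B_1 \leftrightarrow B_2,\ C_1 \leftrightarrow C_2,\ D_1 \leftrightarrow D_2,\ E_1 \leftrightarrow E_2] \qquad (3\text{-}18)$$

$$D = [d^1_{AB} \approx d^2_{AB},\ d^1_{AC} \approx d^2_{AC},\ \ldots,\ d^1_{CD} \approx d^2_{CD},\ d^1_{DE} \approx d^2_{DE}] \qquad (3\text{-}19)$$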
D df~ f21 (3 20)
Score = exp d(3-20)
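The validity test for a clique, that every pair of mapped features must be separated by compatible distances in the two molecules, can be sketched as follows. The function name, the tolerance value, and the coordinates are hypothetical; the scoring function of Eq. 3-20 is not reproduced here.

```python
import math
from itertools import combinations

def is_valid_clique(mapping, points1, points2, tol=1.0):
    """Check that all inter-point distances agree within tol (Angstrom).

    mapping is a list of (feature_in_molecule_1, feature_in_molecule_2) labels;
    points1 and points2 map feature labels to 3D coordinates.
    """
    for (a1, a2), (b1, b2) in combinations(mapping, 2):
        d1 = math.dist(points1[a1], points1[b1])
        d2 = math.dist(points2[a2], points2[b2])
        if abs(d1 - d2) > tol:
            return False
    return True

# Hypothetical pharmacophore points for two molecules.
mol1 = {"A1": (0.0, 0.0, 0.0), "B1": (3.0, 0.0, 0.0), "F1": (0.0, 8.0, 0.0)}
mol2 = {"A2": (0.0, 0.0, 0.0), "B2": (3.2, 0.0, 0.0), "F2": (0.0, 4.0, 0.0)}
seed = [("A1", "A2"), ("B1", "B2")]
```

With these coordinates the seed mapping is a valid clique, while growing it with the F1/F2 pair fails because the A-F distances differ by 4 Å.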


3.9 Superimposition

Molecular superimposition is carried out using a rigid-body least-squares procedure from Kearsley [173] and Kabsch [174, 175]. The rotation matrix that minimizes the sum of the squared distances between atoms of two molecules, Eq. 3-21, is solved for using quaternions and eigen methods as described by Kearsley in 1989.


$$F = \sum_i |x'_i - x_i|^2 \qquad (3\text{-}21)$$


























Figure 3-23. Clique Detection Illustration. Panels: (a) Estrogen Ligands (PDB: 1ERR and 3ERT), (b) Chemical Features Highlighted, (c) Chemical Feature Mappings.









A requirement of this procedure is that atom i in molecule A corresponds to atom i in molecule B. For example, measuring the RMSD between the two benzoic acid conformers shown in Figs. 3-24(a) and 3-24(b) requires a particular correspondence in order to remove artificial differences attributed to automorphisms or self-symmetry. This is carried out by generating all matchings of non-Hydrogen atoms by type or element kind and assigning the lowest RMSD as the true value [176].
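The search over atom correspondences can be sketched by brute force. The example below is an illustration under simplifying assumptions: the function names are hypothetical, atoms are matched by element only, the coordinates are taken to be already superimposed (the least-squares fit is omitted), and the enumeration grows factorially with the number of atoms per element, so it is suitable only for very small cases.

```python
import math
from itertools import permutations, product

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def symmetry_rmsd(elements, coords_a, coords_b):
    """Lowest RMSD over all element-preserving atom correspondences."""
    groups = {}
    for index, element in enumerate(elements):
        groups.setdefault(element, []).append(index)
    keys = list(groups)
    best = math.inf
    for perms in product(*(permutations(groups[k]) for k in keys)):
        order = [None] * len(elements)
        for key, perm in zip(keys, perms):
            for src, dst in zip(groups[key], perm):
                order[src] = dst  # atom src of A is compared with atom dst of B
        best = min(best, rmsd(coords_a, [coords_b[j] for j in order]))
    return best

# Hypothetical carboxylate-like fragment with the two oxygen labels swapped.
elements = ["C", "O", "O"]
conf_a = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, -1.0, 0.0)]
conf_b = [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0), (1.0, 1.0, 0.0)]
```

The naive RMSD between `conf_a` and `conf_b` is about 1.63 Å even though the structures are identical; the symmetry-aware value is zero.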


Figure 3-24. Illustration of the requirement of atom correspondence for molecular superposition. Panels: (a) Conformer 1, (b) Conformer 2.


3.10 Conclusions

This chapter has outlined the design and development of a C++ package called MTK++. This package contains functions to handle molecular structures ranging from proteins to small molecules, which may be used to calculate molecular mechanics energies and gradients, to perceive atom hybridizations, and to evaluate bond orders, formal charges, rings, and functional groups. Utilities to add Hydrogen atoms to structures have been developed; this code was created to deal with metalloprotein systems where no other software could satisfactorily do so. MTK++ also has the capability to perform conformational searching of drug molecules using a systematic approach, for which the molecular mechanics code was a prerequisite. There is an obvious limitation to this approach: as the number of rotatable bonds increases, the method becomes intractable. Improvements such as tree pruning and a torsional type library could be implemented. An algorithm to perform clique detection of molecular features was also implemented to superimpose two molecular species onto one another for use in ligand- and receptor-based drug design. Additionally, the MTK++ package contains other general-purpose libraries for parameter optimization, graph utilities, and statistical methods.

MTK++ was developed with algorithms from the literature, but it is a firm foundation on which to develop new tools for drug design and metalloprotein modeling. The remaining chapters of this thesis would not have been possible without this software. Chapter 4 utilizes MTK++ conformational searching methods to flexibly align over 80 small molecules, which bind various receptors, onto one another. In Chapter 5 MTK++ was used to efficiently model metalloproteins, in particular Zn proteins. Force fields for tetrahedral Zn proteins were generated using MTK++, where previously such work would have been time consuming and prone to error if attempted by hand. These chapters further foster the growth of a bridge between the development and application of code applicable to biological problems.









CHAPTER 4
SEMIFLEXIBLE QUANTUM MECHANICAL ALIGNMENT
OF DRUG-LIKE MOLECULES

4.1 Introduction

The placement of drug molecules into the active sites of receptors remains a challenging problem in the drug design field [177]. Docking [4, 178] is the method of choice when there is a 3D structure of the receptor, while template forcing, or superimposition of a structure onto a known active molecule, is used when no such structure is available [179, 180]. More than 20 years have been spent developing the tools necessary to align molecules on top of one another. Most of these methods were conceived for use in ligand-based drug design (LBDD), where no receptor structure is available, such as for targets that are membrane proteins. Table 4-1 summarizes alignment approaches from 1986 to the present, where methods can be distinguished by their treatment of conformational flexibility, the optimization or superposition algorithm used, and the similarity metric between the two structures [181]. Three types of flexibility are encountered in these methods: rigid body alignment, semiflexible alignment, and flexible alignment. Semiflexible alignment describes a technique in which a conformational search of a ligand is performed independently of the alignment algorithm, while fully flexible alignment tools perform both of these tasks at the same time.

The SE-COMBINE approach introduced in Ch. 2.11 decomposed the interaction energies of a series of inhibitors that bind trypsin [9]. That implementation of SE-COMBINE contained a number of deficiencies, including the neglect of solvent and dispersive effects and of ligand conformational flexibility; from a modeling standpoint it also required the manual placement of inhibitors into the active site, and the structures were fragmented by hand. This chapter describes the Conformationally unlimited Template forced Interaction energy biased Pharmacophore (CuTieP) program, which was developed using MTK++ to flexibly align ligand structures into receptor active sites using a clique detection algorithm to produce trial alignments, which were ranked using a semiempirical (SE) score function. This hypothesis goes against the norm of docking and molecular alignment. Here we propose using an LBDD method to generate poses for receptor-based drug design (RBDD) scoring and 3D receptor QSAR. However, since the SE-COMBINE method can currently only be considered applicable for modeling a series of congeneric compounds and a receptor, this assumption can be considered valid.

Table 4-1. Compound Alignment Literature. This table was adapted from Melani et al. [182], with the alignment methods from 2003 to the present added.

Program Name       Similarity Criteria                            Optimization/Superposition    Mode
Sheridan[183]      distance geometry of pharmacophore             -                             flexible
SEAL[184]          electrostatic and steric                       RFO                           rigid
ASP[185]           MEP (Hodgkin function)                         simplex                       rigid
DISCO[186]         pharmacophore points                           combinatorial search          semiflexible
MSC[187]           physicochemical properties                     BFGS                          semiflexible
TORSEAL[188]       -                                              -                             flexible
COMPASS[189]       surface description                            neural nets                   semiflexible
GASP[190]          intermolecular matching Energy                 GA                            flexible
AAA[191]           distance map of pharmacophore points           combinatorial search          flexible
TFIT[192]          inter/intra molecular Energy                   MC and line search            flexible
PLM[193]           surface overlap volume                         SA                            semiflexible
Petitjean[194]     electronic properties                          gradient method               rigid
Grant[195]         vdW Volume (GFs)                               analytic derivatives          rigid
FLEXS[196]         interaction fields (GFs)                       combinatorial search          flexible
McMahon[197]       electrostatic potential (GFs)                  gradient method               rigid
Cossé-Barbi[198]   pattern in 3D Space                            stepwise approach             rigid
MIMIC[199]         steric and electrostatic fields (GFs)          SD or NR                      semiflexible
QUASIMODI[200]     electron density with GFs                      simplex                       rigid
Parretti[201]      steric and electrostatic fields (GFs)          MC                            rigid
Cocchi[202]        MEP, size and shape descriptors                simplex                       rigid
De Rosa[203]       Euclidean distance in Hi-PCA space             -                             rigid
Handschuh[204]     geometric fit                                  GA and quasi-Newton           flexible
3DFS[159, 205]     pharmacophore points                           GMA, Powell                   flexible
RigFit[206]        pharmacophore points (GFs)                     quasi-Newton                  semiflexible
SQ[207]            SQ type                                        simplex                       semiflexible
Klebe[208]         pharmacophore points (GFs)                     quasi-Newton                  semiflexible
MIPSIM[209]        MEP                                            gradient approach             semiflexible
Cosgrove[210]      local surface shape                            clique detection              rigid
MultiSEAL[211]     -                                              -                             multiple flexible
TGSA[212, 213]     Topo-Geometrical                               -                             flexible
Labute[214]        atom properties                                modified RIPS                 flexible
SLATE[215]         distance matrix for H-bonding and aromatics    SA                            flexible
FLASHFLOOD[216]    comma descriptors                              cluster method                flexible
AUTOFIT            pharmacophore points                           combinatorial search          flexible
QSSA[106]          ASA                                            GA and simplex                semiflexible
fFlash[217]        pharmacophore points                           clique-based                  flexible
FIGO[182]          field interaction and geometric overlap        simplex                       rigid
FLUFF[218]         vdW and electrostatic                          -                             -
BRUTUS[219, 220]   charge distribution and vdW, grid-based        -                             -
FLAME[221]         MCP                                            GA and BFGS                   flexible
GMA[222]           MCP                                            gradient-based torsion space  flexible

MEP: Molecular Electrostatic Potential, PCA: Principal Component Analysis,
vdW: van der Waals, GFs: Gaussian Functions were used, GA: Genetic Algorithm,
RFO: Rational Function Optimization, RIPS: Random Incremental Pulse Search,
BFGS: Broyden-Fletcher-Goldfarb-Shanno, SD: Steepest Descent,
NR: Newton-Raphson, MCP: Maximum Common Pharmacophore, MC: Monte Carlo,
ASA: Atomic Shell Approximation, SA: Surface Area


4.2 Implementation

The CuTieP approach can be divided into three key areas. The first is the generation

of a set of conformers for the query structure that is to be aligned onto a stationary

molecule called the target. Then each conformer is aligned onto the target structure and

finally the similarity between the two molecules is determined.

4.2.1 Ligand Conformational Searching

Conformational techniques to reproduce the bioactive conformation of small

molecules can be divided into two: deterministic or systematic and stochastic. The former

exhaustively enumerates all conformers by defining rotatable bonds and discrete torsional

angles as in the MIMUMBA program [223, 224]. The latter explores conformational


Mode

flexible

rigid









space using molecular dynamics, genetic algorithm [225, 226] or Monte Carlo techniques.

There are pro and cons for both categories; the systematic approach is certain to sample

all conformational space but the search space grows exponentially with the number of

rotatable bonds. And so for large flexible molecules the stochastic approaches are favored.

Various commercial packages for the conformational searching are available including

SPE, Catalyst, Macromodel, Omega, MOE, and Rubicon which were recently reviewed

by Agrafiotis et al. [227] where SPE and Catalyst were more effective in sampling

conformational space. The key point to note when performing conformational searching

is the requirement of finding the bioactive conformation, though most often this does

not correspond to the global energy minimun [223, 228-232]. And so for this study to

investigate the use of SE methods in molecular superposition a systematic approach was

chosen in order to ensure that the bioactive conformer was found within some tolerance.
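The exponential growth of a systematic search can be made concrete with a short sketch. The function names and the torsional step sizes below are illustrative assumptions, not the resolutions or routines used by MTK++; the sketch only shows that the number of conformers is the product of the number of discrete torsion values per rotatable bond.

```python
from itertools import product
from math import prod


def torsion_grid(step_degrees):
    """Discrete torsion values (degrees) sampled for one rotatable bond."""
    return list(range(0, 360, step_degrees))


def enumerate_conformers(steps_per_bond):
    """Lazily yield every torsion combination, one angle per rotatable bond."""
    return product(*(torsion_grid(step) for step in steps_per_bond))


def count_conformers(steps_per_bond):
    """Size of the systematic search space: product of the grid sizes."""
    return prod(len(torsion_grid(step)) for step in steps_per_bond)


# Three rotatable bonds at a hypothetical 120-degree resolution.
print(count_conformers([120, 120, 120]))  # 27
# Each extra bond multiplies the count: five bonds at 60 degrees.
print(count_conformers([60] * 5))         # 7776
```

Because the count multiplies with every additional bond, a finer resolution on even one torsion quickly dominates the cost, which is why an energy cutoff is typically applied to the stored conformers.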

4.2.2 Structural Alignment and Clique Detection

If two structures contain at least three pairs of reference points then they can be aligned onto one another by minimizing the sum of the squared distances between pairs, as described in Ch. 3.9. This rigid-body least-squares procedure from Kearsley [173] and Kabsch [174, 175] generates a rotation matrix using quaternions and eigen methods. A clique detection algorithm, described in Ch. 3.8, is employed to generate a set of correspondences between the two molecules in question; this has previously been shown to be an efficient technique for producing alignments [186, 221, 222]. Each set of reference points, or clique, produces a trial alignment using the Kearsley algorithm. The clique detection algorithm uses the score function shown in Eq. 4-1, where $d^{1}$ is the distance between two pharmacophore features in molecule 1 and $d^{2}$ is the distance between the two equivalent features in molecule 2. The parameter $\Delta d_{max}$ controls how rapidly the match score drops off as the distances become less compatible. The features used in this clique detection algorithm include hydrogen bond acceptor/donor atoms, positive/negative charge centers, hydrophobic groups, rings, and ring normals.


\mathrm{ClqScore} = \exp\left[ -\left( \frac{d^{1} - d^{2}}{\Delta d_{max}} \right)^{2} \right]    (4-1)
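A minimal sketch of such a distance-compatibility score is given below. The Gaussian form is an assumption made for illustration; only the role of the fall-off parameter (called delta_d_max here, with the value 0.5 used in Algorithm 4.1) is taken from the text. The function names are hypothetical and are not part of MTK++.

```python
from math import dist, exp


def clq_pair_score(d1, d2, delta_d_max=0.5):
    """Score in (0, 1]; 1.0 means the two feature distances are identical."""
    return exp(-((d1 - d2) / delta_d_max) ** 2)


def feature_pair_score(f1a, f1b, f2a, f2b, delta_d_max=0.5):
    """Compare the distance between two features in molecule 1 (f1a, f1b)
    with the distance between the equivalent features in molecule 2."""
    d1 = dist(f1a, f1b)
    d2 = dist(f2a, f2b)
    return clq_pair_score(d1, d2, delta_d_max)


# Identical distances score 1.0; a mismatch equal to delta_d_max scores exp(-1).
print(clq_pair_score(3.0, 3.0))
print(clq_pair_score(3.0, 3.5))
```

A smaller delta_d_max makes the score fall off faster, so fewer feature pairs are treated as compatible and fewer cliques are produced.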


4.2.3 Semiempirical Similarity Score

The trial alignments from the previous step are scored using a semiempirical function implemented in the QMALIGN program from Dixon and Merz [233]. The QMALIGN approach differs from all other quantum similarity and alignment programs: instead of aligning based on the density, $\rho(r)$, as described by Carbó using Eq. 4-2, this program aligns structures based on their wavefunctions, $\psi(r)$. The Carbó approach [106] matches the overall size and shape characteristics of molecules; however, there is no phase information in $\rho(r)$, and thus any overlap contributes positively to $Z_{AB}$ even though the orbitals may be orthogonal. Alignment based on molecular wavefunctions, or more precisely frontier orbitals, takes into account both phase and orbital information. In this application of QMALIGN only the score function (Eq. 4-3) is used, where $k$ and $k'$ are the mapped Molecular Orbitals (MOs) from each molecule and $\psi_{k}$ and $\epsilon_{k}$ are the MOs and their energies. The parameter $\Delta\epsilon_{max}$ is defined similarly to $\Delta d_{max}$ above. The similarity between two molecules, A and B, was then calculated as a Carbó index, SEScore, as shown in Eq. 4-4.


Z_{AB} = \int \rho_{A}(r) \, \rho_{B}(r) \, dr    (4-2)


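Z_{AB} = \sum_{k,k'} \exp\left[ -\left( \frac{\epsilon_{k} - \epsilon_{k'}}{\Delta\epsilon_{max}} \right)^{2} \right] \int \psi_{k}(r) \, \psi_{k'}(r) \, dr    (4-3)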
\mathrm{SEScore} = \frac{Z_{AB}}{\sqrt{Z_{AA} Z_{BB}}}    (4-4)
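The normalization of the Carbó index and the phase argument made above can be illustrated with a toy example. The sketch assumes functions sampled on a uniform grid and replaces the integral by a sum; the two-point "orbitals" are invented for illustration and have nothing to do with the AM1 wavefunctions used by QMALIGN.

```python
from math import sqrt


def overlap(f, g, dv=1.0):
    """Discrete stand-in for the overlap integral of f and g on a shared grid."""
    return dv * sum(x * y for x, y in zip(f, g))


def carbo_index(f, g, dv=1.0):
    """Normalized similarity Z_AB / sqrt(Z_AA * Z_BB)."""
    return overlap(f, g, dv) / sqrt(overlap(f, f, dv) * overlap(g, g, dv))


# Two toy orbitals with the same magnitude everywhere but opposite phase
# on the second grid point, so they are orthogonal.
psi_a = [1.0, 1.0]
psi_b = [1.0, -1.0]
rho_a = [x * x for x in psi_a]
rho_b = [x * x for x in psi_b]

print(carbo_index(rho_a, rho_b))  # 1.0: the densities cannot tell them apart
print(carbo_index(psi_a, psi_b))  # 0.0: the wavefunction overlap vanishes
```

The index is unchanged when either function is multiplied by a positive constant, which is what allows molecules of different size to be compared on a common scale.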
The complete CuTieP algorithm is outlined in pseudocode in Algorithm 4.1, where the user provides two molecules and a fragment library, outlined in Appendix C. Pharmacophore points are assigned to each molecule using the substructure searching tool within MTK++ and the fragment library provided. A torsion-based conformational search of the flexible molecule is carried out using the torsional resolutions outlined in Ch. 3.6, and the lowest energy conformers are stored based on an energy cutoff, dConf. Each of these conformers is then aligned onto the template structure using the clique detection algorithm described above. The user defines the maximum number of cliques per conformer as nMaxCliques, and the total number of trial alignments from all conformer/clique combinations as nTotalMCP. If a target structure is available, then an MM interaction energy is calculated between each trial alignment and the reference's receptor structure, with the best nMM alignments stored, thus eliminating all unreasonable structures. This step is followed by determining the SE similarity score of the flexible and template molecules using the AM1 [73] Hamiltonian, with a total of nSE alignments being saved for later use.

4.3 Results and Discussion

The goal of this study was to investigate how well the CuTieP alignment approach reproduces the crystallographic binding geometries of ligands in receptor active sites observed in the Protein Data Bank (PDB) [15].

4.3.1 Data Set

To evaluate the CuTieP alignment approach, 84 crystal structures of protein-ligand complexes were downloaded from the PDB, as outlined in Table 4-2. This set contains 12 unique receptors: Carboxypeptidase A, Glycogen Phosphorylase, Immunoglobin, Streptavidin, Dihydrofolate Reductase (DHFR), Thrombin, Trypsin, Estrogen Receptor (ER), Peroxisome Proliferator-Activated Receptor γ (PPARγ), Human Carbonic Anhydrase II (HCA II), Elastase, and Thermolysin. This data set resembles the one used to validate the FLEXS [234] program; however, the Concanavalin, Endothiapepsin, HIV-Protease, Fructose Bisphosphatase, and Human Rhinovirus receptors were omitted due to ligand size and flexibility. The ER, PPARγ, and HCA II ligands are used here but were not in the FLEXS data set. The ligands that bind each receptor are labeled using the lowercase form of the PDB ID of the corresponding complex structure. The data set was split into two: the ligands in the first portion were flexibly aligned, whereas the remaining ligands were rigidly aligned.














Algorithm 4.1: Flexible Alignment of Drug-like Molecules
Data: Fragment Libraries, Template and Flexible Files
Result: Flexible Molecule/s Aligned onto Template Molecule
begin
    dConf <- 20-30 kcal/mol;
    nMaxCliques <- 10;
    nTotalMCP <- 10000;
    nMM <- 1000;
    nSE <- 100;
    Δdmax <- 0.5;
    Read Fragment Libraries;
    temp <- Read Template Molecule;
    flex <- Read Flexible Molecule;
    findFunctionalGroups(temp);
    findFunctionalGroups(flex);
    assignPharmacophorePoints(temp);
    assignPharmacophorePoints(flex);
    totalConformers <- confSampler(flex);
    nConformers <- confAnalysis(flex, dConf);
    for i <- 1 to nConformers do
        nCliques_i <- MCP(Template, Conformer_i, nMaxCliques);
        for j <- 1 to nCliques_i do
            rotMat <- Superimpose(Pharmacophore);
            alignedConformer_ij <- rotMat * Conformer_i;
        end
    end
    store best nTotalMCP alignments;
    for i <- 1 to nTotalMCP do
        Calculate E_MM(receptor, alignedConformer_i);
    end
    store best nMM alignments;
    for i <- 1 to nMM do
        Calculate SE_align(temp, alignedConformer_i);
    end
    finalAlignments <- CuTieP();
end
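The three successive "store best" steps of Algorithm 4.1 amount to a filtering funnel, which can be sketched as follows. The dictionary keys, the toy values, and the ranking directions (higher clique and SE scores are better, lower MM interaction energy is better) are assumptions made for this illustration; in CuTieP the MM and SE quantities are only computed for the survivors of the preceding stage, whereas here all three are precomputed for brevity.

```python
from heapq import nlargest, nsmallest


def staged_filter(alignments, n_total_mcp, n_mm, n_se):
    """Keep the best trial alignments through three successive stages."""
    # Stage 1: best pharmacophore (clique) matches.
    by_clique = nlargest(n_total_mcp, alignments, key=lambda a: a["clq_score"])
    # Stage 2: discard poses that clash with the receptor (high MM energy).
    by_energy = nsmallest(n_mm, by_clique, key=lambda a: a["mm_energy"])
    # Stage 3: rank the survivors by semiempirical similarity.
    return nlargest(n_se, by_energy, key=lambda a: a["se_score"])


trial = [
    {"id": "a", "clq_score": 0.9, "mm_energy": -12.0, "se_score": 0.70},
    {"id": "b", "clq_score": 0.8, "mm_energy": 55.0, "se_score": 0.95},
    {"id": "c", "clq_score": 0.7, "mm_energy": -8.0, "se_score": 0.85},
    {"id": "d", "clq_score": 0.1, "mm_energy": -20.0, "se_score": 0.99},
]

best = staged_filter(trial, n_total_mcp=3, n_mm=2, n_se=1)
print([a["id"] for a in best])  # ['c']
```

Ordering the stages from cheapest to most expensive score means the costly semiempirical calculation is only spent on poses that have already passed the geometric and steric checks.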









Table 4-2. Protein-Ligand Data Set.

Receptor                                      PDB ID
Flexible Alignment
Carboxypeptidase A                            1CBX, 2CTC, 3CPA
Glycogen Phosphorylase                        1GPY, 3GPB, 4GPB, 5GPB
Immunoglobin                                  1DBB, 1DBJ, 1DBK, 1DBM, 2DBL
Streptavidin                                  1SRE, 1SRF, 1SRG, 1SRH, 1SRI, 1SRJ
Dihydrofolate Reductase                       1DHF, 4DFR
Trypsin                                       1PPH, 1TNH, 1TNI, 1TNJ, 1TNK, 1TNL, 3PTB
Estrogen Receptor                             1ERR, 3ERT
Peroxisome Proliferator-Activated Receptor γ  1FM6, 1FM9
Human Carbonic Anhydrase II                   1A42, 1BN1, 1BN3, 1BN4, 1BNM, 1BNN, 1BNQ, 1BNT,
                                              1BNU, 1BNV, 1BNW, 1CIL, 1CIM, 1CIN, 1CNX, 1EOU,
                                              1G1D, 1G52, 1G53, 1G54, 1I8Z, 1I90, 1I91, 1IF4,
                                              1IF5, 1IF6, 1IF7, 1IF8, 1IF9, 1KWQ, 1KWR, 1OKL,
                                              1OKN, 1OQ5, 1TTM, 1XPZ, 1XQ0, 1YDA, 1ZE8
Rigid Alignment
Thrombin                                      1DWC, 1DWD
Elastase                                      1ELA, 1ELB, 1ELC, 1ELD, 1ELE
Thermolysin                                   1TLP, 1TMN, 2TMN, 3TMN, 4TLN, 4TMN, 5TMN


Each complex structure was broken into two, a receptor and ligand, with all co-factors

and water molecules removed. The ligand's relative orientation in space was retained.

The Labute algorithm [142] within MTK++ was used to perceive atom hybridizations,

bond orders, and formal charges of the ligands. Hydrogen atoms were added to both the

receptor and ligand structures using the protonate functions also built into MTK++.

GAFF [160] atom types and C' \! [123] charges were assigned to the ligands using the

antechamber [166] and DivCon [112] programs. The locations of the rotatable bonds and

the functional groups of each ligand were determined using the tools within MTK++

as described in Ch. 3. Each functional group within the MTK++ package contains

pharmacophoric information required for the clique detection algorithm above to generate

trial alignments. The atom and bond types, charges, and feature information are stored in

XML format for use with the CuTieP program.









Determining the performance of an alignment approach requires the definition of a reference state. For each receptor, the complex structures were taken in turn and all other complexes were superimposed onto each one by minimizing the RMSD between the peptide backbone atoms. Thus for each receptor a set of ligands was aligned into its active site, which defines their reference state or ideal alignment.

Each pair of ligands in all receptor classes was superimposed onto one another, except for HCA II, where only 1A42 was used as a target. After each alignment the root mean squared deviation (RMSD) between the query structure and the reference state was calculated. Within the calculation of the RMSD an atom type correspondence between atoms of the aligned query and ideal structures was determined. This procedure prevented artificially high RMSD values due to automorphisms, as described in Ch. 3.9. The result is an N x N matrix of alignments, where N is the number of complexes for each receptor class. These matrices are not symmetric, as the results depend on which ligand is used as the target. The lowest RMSD value from the top ten alignments is reported in the tables and figures below. Also shown in the output tables are the number of conformers sampled, the number stored, and their RMSD from the bioactive conformation. The ClqScore and SEScore values are also plotted as so-called levelplots using the R program [235] to view the alignment results graphically.
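The effect of the atom type correspondence on the RMSD can be illustrated with a small sketch. It is a simplification and not the MTK++ procedure of Ch. 3.9: atoms are matched by brute force within each atom type and bond connectivity is ignored, so it is only practical for a handful of atoms per type. All names and coordinates are hypothetical.

```python
from itertools import permutations
from math import dist, sqrt


def plain_rmsd(pose, ref):
    """RMSD using the atom order as given."""
    return sqrt(sum(dist(p, r) ** 2 for p, r in zip(pose, ref)) / len(ref))


def typed_rmsd(types, pose, ref):
    """RMSD minimized over all reorderings of atoms that share an atom type."""
    groups = {}
    for index, atom_type in enumerate(types):
        groups.setdefault(atom_type, []).append(index)
    total = 0.0
    for indices in groups.values():
        total += min(
            sum(dist(pose[i], ref[j]) ** 2 for i, j in zip(indices, perm))
            for perm in permutations(indices)
        )
    return sqrt(total / len(types))


# A carboxylate-like fragment whose two equivalent oxygens are listed
# in the opposite order in the pose and in the reference.
types = ["C", "O", "O"]
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
pose = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

print(plain_rmsd(pose, ref))         # large, although the pose is perfect
print(typed_rmsd(types, pose, ref))  # 0.0
```

Without the correspondence step, a perfectly superimposed symmetric group would be reported as a poor alignment purely because of atom numbering.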

The overall performance of CuTieP in reproducing the binding geometries of ligands in complex with a receptor is shown in Table 4-3. An alignment is considered correct when the RMSD between the pose and the reference structure is below 1.5 Å [196]; an RMSD under 2.5 Å represents the correct orientation and conformer of the query; and an RMSD above 2.5 Å is considered misaligned. A total of 219 alignments were carried out, and in 48.9% of the cases a satisfactory result was achieved. 64.8% of the alignments were in the correct orientation; however, 35.2% would be regarded as misaligned. Each receptor is outlined in more detail below.
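The classification thresholds and the quoted percentages can be checked with a few lines of arithmetic. The category labels are paraphrased from the text, the counts are the totals of Table 4-3, and the function name is an illustrative assumption.

```python
def classify(rmsd):
    """Assign an alignment to one of the three categories used in the text."""
    if rmsd < 1.5:
        return "correct alignment"
    if rmsd < 2.5:
        return "correct orientation"
    return "misaligned"


# Cumulative totals from Table 4-3.
total = 219
below_1_5 = 107
below_2_5 = 142

print(round(100 * below_1_5 / total, 1))            # 48.9
print(round(100 * below_2_5 / total, 1))            # 64.8
print(round(100 * (total - below_2_5) / total, 1))  # 35.2
```

Note that the 2.5 Å column is cumulative, so it includes the alignments already counted below 1.5 Å.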









Table 4-3. Statistics of CuTieP Performance.

                                         N
Receptor                      < 1.5 Å   < 2.5 Å   < 3.5 Å   Total
Carboxypeptidase A            2         3         6         6
Glycogen Phosphorylase        7         8         11        12
Immunoglobin                  7         13        15        20
Streptavidin                  25        27        30        30
Dihydrofolate Reductase       0         1         1         2
Trypsin                       12        20        28        42
Estrogen Receptor             1         1         2         2
PPARγ                         0         0         2         2
Human Carbonic Anhydrase II   22        33        37        39
Thrombin                      2         2         2         2
Elastase                      8         8         11        20
Thermolysin                   21        26        31        42
Total                         107       142       174       219
%                             48.9      64.8      79.5


4.3.2 Carboxypeptidase A

Three Carboxypeptidase A ligand complexes were used in this study: L-benzyl-succinate (1cbx, Fig. 4-1(a)), L-phenyl-lactate (2ctc, Fig. 4-1(b)), and glycyl-L-tyrosine (3cpa, Fig. 4-1(c)). The ligand structures of 2ctc and 3cpa when 2CTC and 3CPA are aligned onto 1CBX are shown in Fig. 4-1(d); a key point to note is that the phenyl moieties do not overlap. The carboxylates (closer to the phenyl group) of all three structures are aligned onto one another and form a strong intermolecular interaction with an Arginine residue of the protein, while the tail group binds a Zinc atom.

The conformational analysis and pair alignments are outlined in Table 4-4. In all three cases conformers were generated which resemble the bioactive conformation (0.5 Å RMSD from the bioactive conformation) [223]. The conformational searching of 1cbx is further described in figures 4-2(a), 4-2(b), and 4-2(c). The plot of MM energy versus RMSD from the bioactive conformation indicates that the bound geometry is indeed not the global energy minimum using GAFF. The Euclidean Distance (ED) is the RMSD in torsional space between a conformer and the bioactive structure. The plot of ED versus MM energy shows the conformers are clustered between 100 and 250 ED, while the variation of the RMSD versus ED is shown in Fig. 4-2(c). The best RMSD of the top 10 poses between each query structure and its associated reference alignment is also outlined in Table 4-4. When 1cbx is used as the reference or query, poor alignments are generated using the SE scoring function, though when either 2ctc or 3cpa is used RMSDs below 1.5 Å are found. This is a disappointing result, as the levelplot diagram of the ClqScores for this receptor, shown in Fig. 4-3(a), shows that trial alignments under 1.0 Å were generated, but the SE score function did not score these the highest, as shown in Fig. 4-3(b). The most probable reason for the discrepancy is that the phenyl group is not the most important pharmacophore feature in the set. Since the SE scoring function is a frontier orbital approach, the greatest similarity between pairs of these molecules would be obtained where the phenyl groups overlap.
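The torsional-space distance used in the conformer analysis above can be sketched as follows. The exact definition used in MTK++ is not given here, so the root-mean-square form and the handling of angle periodicity are assumptions made for illustration, as are the function names.

```python
from math import sqrt


def angle_diff(a, b):
    """Smallest absolute difference between two torsion angles in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def torsion_rmsd(conformer, reference):
    """Root-mean-square torsion difference between two conformers,
    each given as a list of torsion angles for the same rotatable bonds."""
    squared = sum(angle_diff(a, b) ** 2 for a, b in zip(conformer, reference))
    return sqrt(squared / len(reference))


# 350 and 10 degrees are only 20 degrees apart once periodicity is respected.
print(angle_diff(350.0, 10.0))
print(torsion_rmsd([350.0, 60.0], [10.0, 60.0]))
```

A distance of this kind complements the Cartesian RMSD: two conformers can be close in torsional space yet differ noticeably in atomic positions when the differing torsion lies near the middle of the molecule.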

Table 4-4. Carboxypeptidase A Ligand Alignments.

                        Query
                        1cbx    2ctc    3cpa
Rotatable Bonds         5       3       6
Conformers Sampled      1944    108     '.22'11
Conformers Stored       126     70      6767
Conformers < 1 Å        11      30      807
Conformers < 2 Å        115     40      3201
Target  1cbx            0.00    2.21    2.74
        2ctc            2.38    0.00    1.01
        3cpa            2.05    1.24    0.00


4.3.3 Glycogen Phosphorylase

The ligands alpha-D-glucose-6-phosphate (1gpy, Fig. 4-4(a)), alpha-D-glucose-1-phosphate (3gpb, Fig. 4-4(b)), 2-fluoro-2-deoxy-alpha-D-glucose-1-phosphate (4gpb, Fig. 4-4(c)), and alpha-D-glucose-1-methylene-phosphate (5gpb, Fig. 4-4(d)), which bind to Glycogen Phosphorylase, were used in this study. The conformational analysis and SE pair alignment scores are outlined in Table 4-5. Meaningful alignments cannot be generated when 1gpy is used as a target or query, as this ligand binds in a different pocket, as shown in Fig. 4-4(e); however, these alignments were carried out to generate a sense of the predicted error in this approach. All other pair alignments produced excellent agreement, using both the ClqScore and SEScore, with the observed structural superimpositions from the PDB, as highlighted in figures 4-5(a) and 4-5(b).

Figure 4-1. Carboxypeptidase A Ligands. (a) 1cbx; (b) 2ctc; (c) 3cpa; (d) 2CTC and 3CPA aligned onto 1CBX.

4.3.4 Immunoglobin

The ligands progesterone (1dbb, Fig. 4-6(a)), aetiocholanolone (1dbj, Fig. 4-6(b)), 5-beta-androstane-3,17-dione (1dbk, Fig. 4-6(c)), progesterone-11-alpha-ol-hemisuccinate (1dbm, Fig. 4-6(d)), and 5-alpha-pregnane-3-beta-ol-hemisuccinate (2dbl, Fig. 4-6(e)), which bind to Immunoglobin, were used in this study. The ligands 1dbj and 1dbk are cholic acid derivatives while 1dbm, 2dbl, and 1dbb are steroidal molecules. The ideal alignments of 1dbj, 1dbk, 1dbm, and 2dbl for the 1dbb target structure are shown in Fig. 4-6(f). No ring flexibility was allowed in this study and so only the bioactive conformations of 1dbj and 1dbk were used, as indicated in Table 4-6. This receptor produced the largest difference between the top poses predicted by ClqScore and SEScore, as shown in Fig. 4-7(a) and 4-7(b) respectively. The alignments of 1dbm and 2dbl onto 1dbj and 1dbk produce large RMSD values using the SEScore even though the clique detection algorithm produces an alignment which visually appears more reasonable. In fact the top poses are aligned perpendicular to the plane of the target structures, which could be put down to the differences in the stereochemistry of the two subsets. The alignments of ligands within the same subset produce poses that are in good agreement with the reference states.

Figure 4-2. 1CBX Conformer Analysis. (a) MM Energy vs RMSD (Å); (b) MM Energy vs Euclidean Distance (Deg); (c) Euclidean Distance (Deg) vs RMSD.

Table 4-5. Glycogen Phosphorylase Ligand Alignments.

                        Query
                        1gpy    3gpb    4gpb    5gpb
Rotatable Bonds         3       3       3       3
Conformers Sampled      27      27      27      27
Conformers Stored       16      16      19      14
Conformers < 1 Å        10      12      14      10
Conformers < 2 Å        6       4       5       4
Target  1gpy            0.00    3.66    3.19    3.92
        3gpb            1.43    0.00    0.38    0.84
        4gpb            1.52    0.42    0.00    0.64
        5gpb            3.18    0.82    0.52    0.00

Figure 4-3. Carboxypeptidase A Alignment Results. (a) ClqScore; (b) SEScore. Reference or target structures are on the x-axis with the query structures on the y-axis. The best RMSD value of each pose alignment compared to the ideal alignment is given with an RMSD scale in Å.

4.3.5 Streptavidin

Streptavidin ligands including 2-((4'-hydroxyphenyl)-azo)benzoic acid (1sre, Fig. 4-8(a)), 2-((3'-tertbutyl-4'-hydroxyphenyl)azo)benzoic acid (1srf, Fig. 4-8(b)), 2-((3'-methyl-4'-hydroxyphenyl)azo)benzoic acid (1srg, Fig. 4-8(c)), 2-((3',5'-dimethoxy-4'-hydroxyphenyl)azo)benzoic acid (1srh, Fig. 4-8(d)), 2-((3',5'-dimethyl-4'-hydroxyphenyl)azo)benzoic acid (1sri, Fig. 4-8(e)), and 2-((4'-hydroxynaphthyl)-azo)benzoic acid (1srj, Fig. 4-8(f)) were used in the validation of CuTieP. All ligands contain the azo-benzoic acid core and vary at the iminophenol group, as shown in Fig. 4-8(g). All ideal alignments were predicted with good agreement using both the ClqScore and the SEScore except for the query 1srf and target 1sri, as shown in Table 4-7 and figures 4-9(a) and 4-9(b). This high RMSD can be attributed to the fact that there are two valid alignments available in an LBDD sense; the t-butyl group of 1srf may be placed onto either methyl moiety of 1sri.

Figure 4-4. Glycogen Phosphorylase Ligands. (a) 1gpy; (b) 3gpb; (c) 4gpb; (d) 5gpb; (e) 3GPB, 4GPB and 5GPB aligned onto 1GPY.

Figure 4-5. Glycogen Phosphorylase Alignment Results. (a) ClqScore; (b) SEScore. Reference or target structures are on the x-axis with the query structures on the y-axis. The best RMSD value of each pose alignment compared to the ideal alignment is given with an RMSD scale in Å.

Table 4-6. Immunoglobin Ligand Alignments

                        Query
                        1dbb    1dbj    1dbk    1dbm    2dbl
Rotatable Bonds         1       0       0       6       6
Conformers Sampled      12      1       1       93312   93312
Conformers Stored       9       1       1       5000    5000
Conformers < 1 Å        9       1       1       712     1361
Conformers < 2 Å        0       0       0       4288    3628
Target  1dbb            0.00    3.65    1.85    0.77    1.30
        1dbj            3.24    0.00    0.29    6.41    4.48
        1dbk            3.20    0.29    0.00    6.28    4.45
        1dbm            0.41    1.59    1.59    0.00    0.86
        2dbl            0.43    1.65    1.69    1.52    0.00

Figure 4-6. Immunoglobin Ligands. (a) 1dbb; (b) 1dbj; (c) 1dbk; (d) 1dbm; (e) 2dbl; (f) 1DBJ, 1DBK, 1DBM and 2DBL aligned onto 1DBB.

Figure 4-7. Immunoglobin Alignment Results. (a) ClqScore; (b) SEScore. Reference or target structures are on the x-axis with the query structures on the y-axis. The best RMSD value of each pose alignment compared to the ideal alignment is given with an RMSD scale in Å.

Table 4-7. Streptavidin Ligand Alignments

                        Query
                        1sre    1srf    1srg    1srh    1sri    1srj
Rotatable Bonds         3       4       3       5       3       3
Conformers Sampled      108     1296    108     155552  108     108
Conformers Stored       55      117     55      2149    55      28
Conformers < 1 Å        36      49      28      661     26      28
Conformers < 2 Å        19      36      27      771     26      0
Target  1sre            0.00    2.50    0.27    0.85    1.02    0.44
        1srf            0.66    0.00    0.48    0.57    1.01    0.50
        1srg            0.27    2.59    0.00    0.52    0.76    0.39
        1srh            0.46    2.75    0.36    0.00    0.72    0.86
        1sri            0.90    3.01    0.75    0.77    0.00    1.08
        1srj            0.46    2.30    0.40    1.07    1.11    0.00

Figure 4-8. Streptavidin Ligands. (a) 1sre; (b) 1srf; (c) 1srg; (d) 1srh; (e) 1sri; (f) 1srj; (g) 1SRF, 1SRG, 1SRH, 1SRI and 1SRJ aligned onto 1SRE.

Figure 4-9. Streptavidin Alignment Results. (a) ClqScore; (b) SEScore. Reference or target structures are on the x-axis with the query structures on the y-axis. The best RMSD value of each pose alignment compared to the ideal alignment is given with an RMSD scale in Å.

4.3.6 Dihydrofolate Reductase

The ligand molecules dihydrofolate (1dhf, Fig. 4-10(a)) and methotrexate (4dfr, Fig. 4-10(b)), which bind to Dihydrofolate Reductase, were used to validate the CuTieP approach, with the alignment of 4DFR onto 1DHF shown in Fig. 4-10(c). Both ligand structures are quite large and have many rotatable bonds, and thus a large number of conformers were sampled, as outlined in Table 4-8. Both the clique detection algorithm and the semiempirical scoring function produced poor results for this receptor compared to the results of the FLEXS program, where an average RMSD of 1.53 Å was predicted.

Table 4-8. Dihydrofolate Reductase Ligand Alignments. ClqScores are shown in parentheses.

                        Query
                        1dhf            4dfr
Rotatable Bonds         10              10
Conformers Sampled      I,_" i- I ".I.
Conformers Stored       5000            5000
Conformers < 1 Å        7               5
Conformers < 2 Å        796             550
Target  1dhf            0.00            3.28 (3.29)
        4dfr            2.43 (2.10)     0.00

Figure 4-10. Dihydrofolate Reductase Ligands. (a) 1dhf; (b) 4dfr; (c) 4DFR aligned onto 1DHF.

4.3.7 Trypsin

The following ligands of Trypsin were used in this study: 4-fluorobenzylamine (1tnh, Fig. 4-11(a)), 4-phenylbutylamine (1tni, Fig. 4-11(b)), 2-phenylethylamine (1tnj, Fig. 4-11(c)), 3-phenylpropylamine (1tnk, Fig. 4-11(d)), trans-2-phenylcyclopropylamine (1tnl, Fig. 4-11(e)), benzamidine (3ptb, Fig. 4-11(f)), and m-amidino-nalpha-tosylated piperidide (1pph, Fig. 4-11(g)). The primary difference between 1tnh, 1tni, 1tnj, 1tnk, and 1tnl lies in the distance between the primary amine and the phenyl group, while 3ptb is a substructure of the largest molecule in this set, 1pph. There are some notable differences between the best poses predicted by the two scoring functions. When 1tnh is used as the target structure, the SEScore predicts more reasonable poses than the clique detection algorithm; the alignment of 1tnh onto 3ptb is also predicted in closer agreement with the reference state. The alignment of 1pph onto all other targets results in poor predicted alignments due to its size.

4.3.8 Estrogen Receptor

Two Estrogen Receptor (ER) complexes were used, 1ERR and 3ERT, in which raloxifene (1err, Fig. 4-13(a)) and 4-hydroxytamoxifen (3ert, Fig. 4-13(b)) bind respectively. The alignment of the 3ERT complex onto the 1ERR structure is shown in Fig. 4-13(c). The ER set was used in this study because it is a key target for breast cancer drug discovery [31]. SEScore outperforms ClqScore for this receptor, as shown in Table 4-10. The alignment of 3ert onto 1err is predicted with an RMSD value of 1.12 Å using SEScore, but 1err onto 3ert can be considered misaligned.

Figure 4-11. Trypsin Inhibitors. (a) 1tnh; (b) 1tni; (c) 1tnj; (d) 1tnk; (e) 1tnl; (f) 3ptb; (g) 1pph; (h) 1TNH, 1TNI, 1TNJ, 1TNK, 1TNL, 3PTB aligned onto 1PPH.

Table 4-9. Trypsin Ligand Alignments

                        Query
                        1pph    1tnh    1tni    1tnj    1tnk    1tnl    3ptb
Rotatable Bonds         8       1       4       2       3       1       1
Conformers Sampled      559872  12      162     18      54      6       3
Conformers Stored       1000    12      90      17      36      6       4
Conformers < 1 Å        2       12      34      14      26      6       4
Conformers < 2 Å        62      0       56      3       10      0       0
Target  1pph            0.00    0.80    5.43    3.93    9.05    4.06    0.23
        1tnh            1.16    0.00    2.79    1.54    1.73    1.09    0.33
        1tni            6.75    3.50    0.00    2.41    2.79    2.46    3.38
        1tnj            6.82    2.26    2.84    0.00    1.97    0.62    0.94
        1tnk            7.14    3.39    2.90    2.64    0.00    0.30    1.56
        1tnl            7.36    0.62    3.00    1.40    1.17    0.00    4.20
        3ptb            3.81    0.70    2.98    2.13    1.32    4.70    0.00

Figure 4-12. Trypsin Alignment Results. (a) ClqScore; (b) SEScore. Reference or target structures are on the x-axis with the query structures on the y-axis. The best RMSD value of each pose alignment compared to the ideal alignment is given with an RMSD scale in Å.

Figure 4-13. Estrogen Receptor Ligands. (a) 1err; (b) 3ert; (c) 3ERT aligned onto 1ERR.

Table 4-10. Estrogen Receptor Ligand Alignments. ClqScores are shown in parentheses.

                        Query
                        1err            3ert
Rotatable Bonds         6               8
Conformers Sampled      11664           419904
Conformers Stored       567             1015
Conformers < 1 Å        16              132
Conformers < 2 Å        238             881
Target  1err            0.00            1.12 (2.15)
        3ert            2.82 (3.47)     0.00

4.3.9 Peroxisome Proliferator-Activated Receptor γ

The rosiglitazone (1fm6, Fig. 4-14(a)) and GI262570 (1fm9, Fig. 4-14(b)) ligands of Peroxisome Proliferator-Activated Receptor γ (PPARγ) were used in this study, with the ideal structures shown in Fig. 4-14(c). This receptor was included because it has been directly linked to type 2 diabetes, cardiovascular diseases, and obesity [236-240] and so poses as a key target for drug research. The ClqScore predicts the correct pose for 1fm6 aligned onto 1fm9, but the opposite alignment gave poor results. SEScore was on average better than the ClqScore, but the two alignments could be considered as misaligned.

Table 4-11. PPARγ Ligand Alignments.

                        Query
                        1fm6            1fm9
Rotatable Bonds         4               8
Conformers Sampled      324             419904
Conformers Stored       271             2000
Conformers < 1 Å        11              3
Conformers < 2 Å        101             88
Target  1fm6            0.00            2.69 (5.09)
        1fm9            2.59 (1.39)     0.00

Figure 4-14. Peroxisome Proliferator-Activated Receptor γ Agonists. (a) 1fm6; (b) 1fm9; (c) 1FM6 aligned onto 1FM9.

4.3.10 Human Carbonic Anhydrase II

The 40 Human Carbonic Anhydrase II (HCA II) ligands used in this study are shown in Table 4-12. HCA II is a Zinc metalloprotein, as described in Ch. 2.14, and all inhibitors bind the Zn ion through the sulfonamide group. The HCA family of proteins has been extensively studied using X-ray crystallography because its members pose as important targets for drug discovery, with drugs utilized as antiglaucoma, anticonvulsant, antiurolithic, antiepileptic, and anticancer agents [241]. This set was also chosen as it represents a data set which could be used within the SE-COMBINE approach to predict binding free energies and design new drugs. The ideal states for 39 ligands for the 1a42 inhibitor are shown in Fig. 4-15. In total 22 ligands were aligned in a satisfactory manner, 11 were aligned with the correct orientation, while only 6 ligands can be regarded as misaligned, whereas the ClqScore misaligned 12 structures, as shown in Table 4-13.











Table 4-12. 40 Human Carbonic Anhydrase II Inhibitors. Columns: PDB ID, Structure, Ref. PDB IDs: 1a42, 1bn1, 1bn3, 1bn4, 1bnm, 1bnn, 1bnq, 1bnt, 1bnu, 1bnv, 1bnw, 1cil, 1cim, 1cin, 1cnx, 1eou, 1g1d, 1g52, 1g53, 1g54, 1i8z, 1i90, 1i91, 1if4, 1if5, 1if6, 1if7, 1if8, 1if9, 1kwq, 1kwr, 1okl, 1okm, 1okn, 1oq5, 1ttm, 1xpz, 1xq0, 1yda, 1ze8. References: [25], [242]-[250].

Figure 4-15. HCA II Ligands Aligned onto the 1A42 Structure. (a) 1BN3, 1BNM, 1BNN, 1BNQ, 1BNT, 1BNU, 1BNV, 1I8Z, 1I90, 1I91, 1CIL, 1CIM, and 1CIN; (b) 1BN1, 1BN4, 1BNW, 1IF4, 1IF5, 1IF6, 1KWQ, 1KWR, 1OKL, and 1YDA; (c) 1G1D, 1G52, 1G53, 1G54, 1IF7, 1IF8, 1IF9, 1CNX, 1OKM, 1OKN, and 1ZE8; (d) 1EOU, 1OQ5, 1TTM, 1XPZ, and 1XQ0.

4.3.11 Thrombin

Two inhibitors of Thrombin, 1dwe (Fig. 4-16(a)) and 1dwd (Fig. 4-16(b)), were used, with 1DWE aligned onto 1DWD shown in Fig. 4-16(c). The ClqScore provides the same results as the SEScore, with an average RMSD of 1.12 Å; however, no conformational searching was performed.

4.3.12 Elastase

Five tripeptide ligands of Elastase were used in this study: 1ela (Fig. 4-17(a)), 1elb (Fig. 4-17(b)), 1elc (Fig. 4-17(c)), 1eld (Fig. 4-17(d)), and 1ele (Fig. 4-17(e)), where 1ela, 1eld, and 1ele occupy different pockets of the serine protease. The










Table 4-13. Human Carbonic Anhydrase II Results.

         Rotatable   Conformers    Conformers   Conformers
PDB ID   Bonds       Sampled       Stored       < 1.0 Å / < 2.0 Å   ClqScore   SE
1a42      7          8748          -            -                   -          -
1bn1      5          15552         3545         3208                4.23       1.70
1bn3      3          864           133          73                  0.94       0.50
1bn4      6          [illegible]   5000         4818                5.87       3.70
1bnm      4          5184          702          393                 0.89       0.53
1bnn      3          1726          458          270                 2.65       1.90
1bnq      6          2916          259          179                 0.10       0.11
1bnt      3          1728          481          150                 1.09       0.29
1bnu      3          432           60           52                  1.41       1.67
1bnv      4          5184          784          440                 2.93       1.09
1bnw      5          15552         3775         2978                1.63       2.84
1cil      3          108           25           6                   0.95       0.90
1cim      1          12            7            0                   0.58       0.58
1cin      2          36            7            1                   0.25       0.25
1cnx     11          944784        5000         4188                1.41       1.62
1eou      3          27            10           3                   1.29       1.29
1g1d      5          20736         4311         3860                1.57       2.04
1g52      5          20736         4470         3766                1.98       3.39
1g53      5          20736         3542         3188                1.46       1.57
1g54      5          20736         4238         3290                2.72       1.57
1i8z      5          15552         803          438                 3.44       1.10
1i90      5          972           79           44                  1.09       1.05
1i91      4          1296          64           47                  1.37       0.79
1if4      3          432           7            0                   0.27       0.24
1if5      1          12            7            0                   0.41       0.34
1if6      3          432           7            0                   0.52       0.39
1if7      7          [illegible]   5000         3279                1.55       1.45
1if8      7          [illegible]   5000         3243                1.61       2.04
1if9      7          93312         5000         3503                3.97       2.82
1kwq      3          432           8            6                   2.47       1.22
1kwr      1          12            7            0                   1.03       1.07
1okl      2          72            4            3                   1.21       1.25
1okm      7          23328         5000         4482                1.25       0.75
1okn      9          104976        5000         4696                1.91       2.27
1oq5      2          144           73           7                   4.45       1.77
1ttm      2          36            37           6                   1.57       1.23
1xpz      7          [illegible]   5000         2657                4.14       3.78
1xq0      7          [illegible]   5000         2019                3.79       2.82
1yda      2          72            37           20                  2.03       1.63
1ze8      4          1296          480          127                 2.77       1.69

Only one column of conformer counts survives under the "< 1.0 Å" and "< 2.0 Å" headings; it
is reproduced as found. 1a42 is the target structure and has no alignment values.



















(a) 1dwe   (b) 1dwd   (c) 1DWE Aligned onto 1DWD

Figure 4-16. Thrombin Inhibitors.


Table 4-14. Thrombin Ligand Alignments

              Query
Target    1dwe    1dwd
1dwe      0.00    1.09
1dwd      1.15    0.00
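
The RMSD values tabulated in this chapter can be illustrated with a minimal Python sketch. It assumes the two poses list the same atoms in the same order and ignores molecular symmetry; the function name `rmsd` is illustrative and is not part of MTK++. The last lines check the average of the two off-diagonal entries of Table 4-14.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    total = sum((p - q) ** 2
                for a, b in zip(coords_a, coords_b)
                for p, q in zip(a, b))
    return math.sqrt(total / len(coords_a))

# Off-diagonal entries of Table 4-14 (1dwd onto 1dwe, and 1dwe onto 1dwd), in Å.
thrombin_rmsds = [1.09, 1.15]
average_rmsd = sum(thrombin_rmsds) / len(thrombin_rmsds)
```

The average of the two tabulated values agrees with the 1.12 Å quoted for the thrombin pair.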









alignment of 1ELB, 1ELC, 1ELD, and 1ELE onto 1ELA is shown in Fig. 4-17(f). Only the
bioactive conformation of each ligand was used in this section of the validation of the
CuTieP approach. All pair alignments were evaluated; however, no reasonable superposition
of 1elb and 1elc onto 1ela, 1eld, or 1ele can be expected, and the results in Table 4-15
and Figs. 4-18(a) and 4-18(b) confirm this. The program FLEXS was also unable to align
these pairs successfully; its authors explain that the volume overlap is below
[illegible]% [234]. All other pairs were successfully aligned using both the ClqScore and
SEScore scores.

Table 4-15. Elastase Ligand Alignments.

                        Query
Target    1ela    1elb    1elc    1eld    1ele
1ela      0.00    3.17    7.02    0.48    0.28
1elb      3.02    0.00    1.38    3.86    3.29
1elc      5.03    0.98    0.00    5.62    4.18
1eld      0.50    3.94    5.87    0.00    0.48
1ele      0.26    3.21    5.74    0.47    0.00


4.3.13 Thermolysin

Pairs of seven inhibitors of thermolysin from the PDB were aligned onto one another in this
study. These included 1tlp (Fig. 4-19(a)), 1tmn (Fig. 4-19(b)), 2tmn (Fig. 4-19(c)), 3tmn
(Fig. 4-19(d)), 4tln (Fig. 4-19(e)), 4tmn (Fig. 4-19(f)), and 5tmn (Fig. 4-19(g)), which
vary considerably in size and charge. Fig. 4-19(h) shows the ligand structures when 1TMN,
2TMN, 3TMN, 4TLN, 4TMN, 5TLN, and 5TMN are aligned onto 1TLP. The smaller ligands 2tmn,
3tmn, and 4tln were allowed to change conformation, with the rest kept in their bioactive
conformations, as outlined in Table 4-16. When the smaller ligands are used as the target
structure, the best poses generated with both the clique detection algorithm and the
semiempirical score function are poor, which is to be expected.
























(a) 1ela   (b) 1elb   (c) 1elc   (d) 1eld   (e) 1ele
(f) 1ELB, 1ELC, 1ELD and 1ELE aligned onto 1ELA

Figure 4-17. Elastase Ligands.















(a) ClqScore   (b) SEScore

Figure 4-18. Elastase Alignment Results. Reference or target structures (1ELA to 1ELE) are
on the x-axis with the query structures on the y-axis. The best RMSD value of each pose
alignment compared to the ideal alignment is given, with an RMSD scale in Å.








Table 4-16. Thermolysin Ligand Alignments.

                                             Query
                      1tlp    1tmn    2tmn    3tmn          4tln    4tmn    5tmn
Rotatable Bonds       12      14      5       7             4       16      16
Conformers Sampled    1       1       972     [illegible]   216     1       1
Conformers Stored     1       1       68      5000          26      1       1
Conformers < 1 Å      1       1       12      25            4       1       1
Conformers < 2 Å      0       0       55      2533          22      0       0

Target
1tlp                  0.00    0.81    1.35    2.48          3.68    1.98    1.41
1tmn                  0.82    0.00    4.85    3.11          1.72    0.79    1.03
2tmn                  0.93    0.79    0.00    3.03          1.80    0.53    0.51
3tmn                  1.20    0.40    2.67    0.00          4.13    0.76    0.99
4tln                  5.02    6.26    2.89    6.06          0.00    9.43    6.84
4tmn                  1.37    0.50    1.42    6.40          3.17    0.00    0.48
5tmn                  1.07    0.86    1.60    10.1          3.68    0.44    0.00





















(a) 1tlp   (b) 1tmn   (c) 2tmn   (d) 3tmn   (e) 4tln   (f) 4tmn   (g) 5tmn
(h) 1TMN, 2TMN, 3TMN, 4TLN, 4TMN, 5TLN, and 5TMN Aligned onto 1TLP

Figure 4-19. Thermolysin Inhibitors.













(a) ClqScore   (b) SEScore

Figure 4-20. Thermolysin Alignment Results. Reference or target structures (1TLP to 5TMN)
are on the x-axis with the query structures on the y-axis. The best RMSD value of each
pose alignment compared to the ideal alignment is given, with an RMSD scale in Å.


4.4 Conclusions

This study presents the first large-scale validation of a semiflexible alignment approach
using a semiempirical scoring function. Over 80 complexes and 219 unique alignments were
considered, and the observed ligand binding geometry was predicted with 49% accuracy.
Though the percentage of successful alignments using this method is not as high as that of
FLEXS ([illegible]%), it provides an estimate of how physics-based techniques can perform
against their empirical counterparts.

Physics-based methods provide a more theoretically satisfying approach to molecular
alignment. The only parameters used in the approach are those that appear in the SE
Hamiltonian. No training set was used to fit parameters, so the method is not expected to
fail in the way that empirical approaches can when their fitted parameters do not transfer
to molecules outside the training set.

Speed is also an important property of molecular alignment algorithms. Empirical methods
can often flexibly align molecular structures in times ranging from seconds to minutes.
The current implementation of the CuTieP approach takes on the order of minutes to a few
hours, depending on the dConf, nMaxCLiques, nTotalMCP, and nMM parameters used. However,
this method was designed for use with the SE-COMBINE approach, where an SE interaction
energy evaluation would be much more expensive than the CuTieP alignment, so speed is not
as important as obtaining the correct pose. Numerous techniques may be employed to
increase the efficiency of the CuTieP method, including tree pruning during the
conformational search and the use of large computer clusters or distributed computing,
since the method is trivially parallelizable.









CHAPTER 5
METAL CLUSTER MOLECULAR MECHANICS PARAMETERIZATION

5.1 Introduction

There are currently 52550 structures in the Protein Data Bank (PDB) [15], and searching for
"metal" results in over 18,000 hits, with the breakdown shown in Table 5-1. Metal ions play
a vital role in protein function, structure, and stability, with zinc, copper, and iron
playing the largest role, as described in Ch. 2.14.

Table 5-1. Metal Ions in the Protein Data Bank (Accessed on April 5th 2007).
Metal   Hits     Metal   Hits     Metal   Hits
Na      2149     V         12     Pd         1
Mg      3467     Cr         7     Ag         9
K        632     Mn       984     Cd       361
Ca      3601     Fe      2022     Ir         6
                 Co       340     Pt        62
                 Ni       310     Au        28
                 Cu       589     Hg       323
                 Zn      3427
Total = 18330


It is desirable to model metalloprotein systems using MM models because one can carry out
simulations to address important structure/function and dynamics questions that are not
currently attainable using QM and QM/MM based methods due to unavailability of parameters
or system size.

There are a number of approaches to incorporating metal ions into FFs. The Bonded Model
defines bonds, angles, and torsions between the metal ion and its ligands, which are added
to the FF together with the van der Waals component of the non-bonded function. Hancock
[251] used this approach to study systems including copper and nickel. The Bonded Plus
Electrostatics Model defines bonds and angles between the metal ion and its ligands as
well as electrostatic potential (ESP) charges (Fig. 5-1(a)) [252]. This method attempts to
define the correct electrostatic representation of the metal active site, because
assigning a charge of plus two to a divalent metal ion, though formally correct, would not
describe reality. The partial atomic charges can be calculated using the RESP approach
[253] or the CMx models of Truhlar and Cramer [123]. The bond and angle force constants
are derived from experiment or calculated using ab initio or DFT methods, while the
torsion term has so far been neglected. The Non-Bonded Model does not define any extra
bonds and places an integer charge on the metal ion [254]. Electrostatic and Lennard-Jones
terms describe the interactions. Modifications to this model to include polarization and
charge transfer effects have been developed (Fig. 5-1(b)) [255]. The Cationic Dummy Atom
Model is related to the non-bonded method in that it places dummy atoms (cations) around
the metal ion to mimic valence electrons [256]. Electrostatic and Lennard-Jones terms
between the dummy atoms and ligating residues describe the metal-ion interactions
(Fig. 5-1(c)) [257, 258].

Other methods include those of Vedani et al. which is a compromise between the

bonded and non-bonded methods and is implemented in the YETI program [259], the

SIBFA of Gresh and co-workers [260, 261] and the Universal Force Field (UFF) of

Goddard and Rappe and co-workers [262-264]. These methods do not use a pairwise

additive potential or are not readily available in typical biomolecular modeling packages.


(a) Bonded Model (b) Non-Bonded Model (c) Cationic Dummy Atom Model

Figure 5-1. Three Approaches to Incorporate Metal Atoms into Molecular Mechanics
Force Fields. The bonded model defines bonds, angles, and dihedrals between
the metal and ligands, while the non-bonded model does not and uses
electrostatics and van der Waals to model the interactions. The cationic
dummy atom model is a derivative of the non-bonded model where cations are
placed near the metal center to mimic valence electrons around the metal.


Carrying out MM modeling or MD simulations of metal-containing proteins using the bonded
plus electrostatics model is a complicated procedure. Incorporating metals into protein
force fields is a convoluted process because of the plethora of QM Hamiltonians, basis
sets, and charge models to choose from. It has also generally been carried out by hand,
for specific metalloproteins, without extensive validation. Some of the published force
fields for zinc-, copper-, nickel-, iron-, and platinum-containing systems using the
bonded plus electrostatics model are listed in Table 5-2. Numerous other FFs containing
various metals have been published, including those for ruthenium(II)-polypyridyl [265],
cobalt corrinoids [266-269], Staphylococcal nuclease [270], alcohol dehydrogenase
[271-273], and metalloporphyrins [274-278].

Automated procedures for the parameterization of MM functions for inorganic coordination
chemistry have been developed in recent years by Norrby and co-workers [279, 280]. Their
work has focused on generating parameters using experimental structural data from
databases such as the Cambridge Structural Database (CSD) and quantum mechanical reference
data, using a version of the MM3 force field [71].

Table 5-2. Published Metalloprotein Force Fields Using the Bonded Plus Electrostatics
Model.
Metal          Protein                        References
Zinc           Human Carbonic Anhydrase II    [252, 281, 282]
               Beta-lactamase                 [283-290]
               Dinuclear Beta-lactamase       [291, 292]
               Farnesyl Transferase           [293]
Copper         Blue Copper Proteins           [145-150]
Nickel         Urea Amidohydrolase            [151-154]
Iron           Cytochrome P450                [294, 295]
Platinum       DNA/Cisplatin                  [296]
Copper, Zinc   Superoxide Dismutase           [158]


5.2 Implementation

The goal of this research was to provide a platform to rapidly build, prototype, and
validate MM models of metalloproteins using the bonded plus electrostatics model for the
AMBER suite of programs [16]. The bonded plus electrostatics model was chosen over the
other approaches because the resulting parameters can be readily added to FFs such as
those in AMBER [58] and CHARMM [61]. In addition, the functions used in these programs are
pairwise additive, meaning that there are no cross-terms, and are thus easier to
parameterize and less computationally expensive. The latter is a key point given that
fully solvated metalloproteins in MD simulations can contain many hundreds of thousands of
atoms. To this end, a computer program, MCPB (Metal Center Parameter Builder), was
developed to generate FF parameters for metalloproteins. MCPB was not built to supersede
the approaches developed by Norrby described above but instead to incorporate a realistic
bonded and electrostatic model of the metallocenter into the AMBER FF. The nature of these
parameters was investigated in a systematic manner with the objective of creating a
generalized metal FF within the bonded plus electrostatics framework. The MCPB program was
built using the MTK++ Application Program Interface (API) as described in Chapter 3. A
complete workflow of MCPB can be seen in Fig. 5-2.

The MCPB program carries out the following steps after a structure is downloaded from the
PDB. First, the program checks whether the structure contains a transition metal. If it
does not, the program terminates. Otherwise, MCPB attempts to determine the primary and
secondary ligands of the metal using rules described by Harding [297-302], which are
described in more detail later in this chapter. Once a metal site is found, MCPB creates
model structures of the metal's first coordination sphere on which ab initio calculations
can be performed to generate AMBER-like FF parameters. These models include one to
generate charges, qi, and another to determine bond, Kr, and angle, Kθ, force constants.
The AMBER function includes bond, angle, torsion, improper, van der Waals, and
electrostatic terms as described in Chapter 2.3; however, only bond, angle, and
electrostatic terms are parameterized, under the assumption made by Hoops et al. that
dihedral terms can be ignored. Lennard-Jones parameters are also not parameterized here
because most metals are buried and van der Waals interactions are not as important as the
electrostatics [280]. Lennard-Jones parameters for the most common metal ions in biology
were taken from the literature [303-310]. The methods of incorporating the bond, angle and
charge parameters










[Flow diagram nodes: No Metal Found; Models Setup; QM Calculations; Get q, Kr, and Kθ; Test FF.]

Figure 5-2. MCPB Flow Diagram. A biomolecular structure is downloaded from the PDB and
tested for the presence of a transition metal. If the structure contains a metal ion, the
MCPB program is used to build and test molecular mechanics force field parameters.

are outlined below. Once a FF is produced, it is tested using minimization techniques to
observe its stability. Further tests, such as comparing the vibrational frequencies from
the ab initio calculation with those from the resulting FF, could also be used [311].
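
The first step of the workflow, checking a structure for a transition metal, can be sketched in Python as follows. This is an illustration only and not the MCPB or MTK++ implementation: it assumes standard fixed-column PDB records with the element symbol in columns 77-78, and the names `TRANSITION_METALS` and `transition_metals_in_pdb` are hypothetical. Zinc is included in the set because the text treats it as a transition metal.

```python
# Element symbols (upper case, as written in PDB files) treated as transition metals here.
TRANSITION_METALS = {
    "SC", "TI", "V", "CR", "MN", "FE", "CO", "NI", "CU", "ZN",
    "Y", "ZR", "NB", "MO", "TC", "RU", "RH", "PD", "AG", "CD",
    "HF", "TA", "W", "RE", "OS", "IR", "PT", "AU", "HG",
}

def transition_metals_in_pdb(lines):
    """Return the set of transition metal symbols found in ATOM/HETATM records."""
    found = set()
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            # Columns 77-78 of a PDB coordinate record hold the element symbol.
            element = line[76:78].strip().upper()
            if element in TRANSITION_METALS:
                found.add(element)
    return found
```

An empty result corresponds to the "No Metal Found" branch of Fig. 5-2, in which the program terminates.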

5.2.1 Equilibrium Bond Lengths and Angles

Equilibrium values for bond lengths, req, and angles, θeq, can be determined through ab
initio calculations or taken directly from the crystal structure in the PDB. Both sources
have pros and cons. Ab initio calculations are generally carried out in the gas phase;
solvent effects can be incorporated with PCM, but at an added cost. Crystal structures may
contain spurious values and may not be representative of all structures with this bond
type. Both techniques of determining the equilibrium bond and angle values will be
investigated later in this chapter.

5.2.2 Force Constants

Force constants, Kr and Kθ, are calculated by first creating a model (model 1) of the
metal site, adding hydrogen atoms using the methods described in Ch. 3.5, and then
optimizing it in the gas phase. The residues bound to the metal are approximated (for
example, cysteine by a thiolate or histidine by a methyl-imidazole) to reduce the
computational cost of the minimization. However, all bonds and angles missing from the FF
were accounted for. Once a minimum is found, the second derivatives are determined. The
Cartesian Hessian matrix, [k], which is the second derivative of the energy with respect
to the coordinates, is shown in Eq. 5-1. The eigen-analysis of [k] provides the force
constants, λi, and the normal modes, νi, as shown in Eq. 5-2. The interatomic force
constant, kAB, between atoms A and B is required to determine the force on atom A caused
by displacing atom B, as shown in Eq. 5-3, which is required for a MM function.

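[k] = k_{ij} = \frac{\partial^2 E}{\partial x_i \, \partial x_j}    (5-1)

F_i = -[k]\,\nu_i\,\delta r = -\lambda_i \nu_i\,\delta r    (5-2)

\delta F_A = [k_{AB}]\,\delta r_B    (5-3)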
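
As a toy illustration of the Hessian and its eigen-analysis, the Python sketch below builds the second-derivative matrix of a one-dimensional diatomic joined by a harmonic bond using finite differences and then extracts its two eigenvalues. The energy function, the force constant of 100 and the equilibrium length of 2.3 are hypothetical values chosen for illustration; a real calculation would take the Hessian from the ab initio program.

```python
import math

def bond_energy(x, k=100.0, r0=2.3):
    """Harmonic bond energy of a 1-D diatomic with atom positions x = [x1, x2]."""
    return 0.5 * k * (x[1] - x[0] - r0) ** 2

def numerical_hessian(f, x, h=1e-3):
    """Second derivatives d2f/dxi dxj by central finite differences."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def f_shift(si, sj):
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            hess[i][j] = (f_shift(1, 1) - f_shift(1, -1)
                          - f_shift(-1, 1) + f_shift(-1, -1)) / (4.0 * h * h)
    return hess

def eigenvalues_sym2(m):
    """Closed-form eigenvalues of a symmetric 2x2 matrix, smallest first."""
    mean = 0.5 * (m[0][0] + m[1][1])
    radius = math.hypot(0.5 * (m[0][0] - m[1][1]), m[0][1])
    return mean - radius, mean + radius

hessian = numerical_hessian(bond_energy, [0.0, 2.3])
stiffness = eigenvalues_sym2(hessian)
```

The zero eigenvalue corresponds to rigid translation of both atoms and the non-zero one (2k) to the bond stretch, which is why force constants for individual bonds are read from the internal-coordinate representation rather than directly from the Cartesian eigenvalues.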


From the minimized structure of model 1, the metal-ligand bond and angle force constants
are evaluated. The force constants are converted from Cartesian into internal coordinates
by the Gaussian program [312] when the keyword iop(7/33=1) is provided. The MCPB program
then reads the resulting internal force constant matrix and assigns the values to the
appropriate bonds and angles, using a conversion factor of 627.5095 between Hartree and
kcal/mol and 2240.87 between Hartree/Bohr² and kcal/(mol·Å²) for bonds.
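
The two conversion factors are related through the Bohr radius, as the following Python sketch shows. The value 0.5291772 Å for one Bohr is an assumption added here (it is not stated in the text), and the function names are illustrative; applying the energy factor alone to angle force constants assumes they are expressed per radian squared.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # value quoted in the text
BOHR_TO_ANGSTROM = 0.5291772        # assumed Bohr radius in Å, not given in the text

def bond_force_constant_kcal(k_hartree_per_bohr2):
    """Convert a bond force constant from Hartree/Bohr^2 to kcal/(mol Å^2)."""
    return k_hartree_per_bohr2 * HARTREE_TO_KCAL_PER_MOL / BOHR_TO_ANGSTROM ** 2

def angle_force_constant_kcal(k_hartree_per_rad2):
    """Convert an angle force constant from Hartree/rad^2 to kcal/(mol rad^2)."""
    return k_hartree_per_rad2 * HARTREE_TO_KCAL_PER_MOL
```

With these constants, a bond force constant of 1 Hartree/Bohr² converts to about 2240.88 kcal/(mol·Å²), which matches the quoted factor of 2240.87 to within rounding of the Bohr radius.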

5.2.3 Point Charges

The atom-centered partial charges were derived using the Merz-Singh-Kollman (MK) [126] and
the Restrained ElectroStatic Potential (RESP) [127, 313, 314] schemes described in
Ch. 2.9, using a second model (model 2) of the metal center. This model included all atoms
of each bound residue, which were capped with acetyl (ACE) and N-methylamine (NME)
residues. If two ligating residues were less than five residues apart, they were tethered
with glycine residues and the chain was capped with ACE and NME. Hydrogen atoms were added
using the methods described in Ch. 3.5. This model was not allowed to relax, both to save
computational expense and to keep the crystallographic geometry. The van der Waals radii
for the metals used in the MK scheme were taken from the literature. The MK/RESP scheme
was favored over other charge model schemes because of its ability to adjust the charge of
the capping or linking residues to an integer value, thus allowing the formal charge of
the cluster to disperse over the metal and the bound ligands.

5.3 Zinc AMBER Force Field

With the ability to build and validate metal FFs established, the task of generating a
generalized FF was initiated. Zinc was chosen because a considerable number of proteins
contain that metal, as highlighted in Table 5-1, and because it is computationally well
behaved. Metalloproteins containing zinc include both structural and functional proteins,
as described in Ch. 2.14. In general Zn is four-coordinate, and sometimes five- or
six-coordinate when multiple ASP/GLU residues or water molecules bind. It was then
necessary to determine all Zn environments that exist in proteins. This was carried out
using a program called pdbSearcher to analyze all structures currently in the PDB.
pdbSearcher was developed using the API provided by MTK++ as described in Ch. 3. All X-ray
crystal structures with a resolution below 3.0 Å were extracted from a local mirror of the
PDB for further analysis. For each metal site, the primary and secondary shell ligands
were determined using Harding's bond cut-off values, shown in Table 5-3 [297-302]. These
values were determined from a series of papers describing metal coordination in the
Cambridge Structural Database (CSD). A donor atom is considered a primary ligand of a
metal if it lies within the target distance shown in Table 5-3 plus some tolerance (0.5 Å
was used). Donor atoms whose metal-donor distances lie between the target distance plus
this tolerance and the target distance plus a second tolerance (1.0 Å was used) were
defined as secondary ligands. For example, if a Zn atom is less than 2.53 Å from a
histidine ND1 or NE2 atom, then that histidine is considered a primary ligand. If the
distance is less than 3.03 Å, the ligand is labeled as secondary; otherwise it is unbound.
Once the primary and secondary shell ligands were determined, the geometry of each metal
center was evaluated. The coordination states allowed are octahedral (Fig. 5-3(a)),
trigonal bipyramidal (Fig. 5-3(b)), square pyramidal (Fig. 5-3(c)), tetrahedral
(Fig. 5-3(d)), square planar (Fig. 5-3(e)), and tetrahedral plus a non-bonded contact
(Fig. 5-3(f)). Fig. 5-3 shows that the coordination number alone is not enough to assign a
metal geometry. Thus the root mean square deviation (RMSD) of the observed
ligand-metal-ligand angles from those in a regular polyhedron was calculated. Equation 5-4
was used to distinguish between square planar and tetrahedral geometries, with the ideal
angles given in Table 5-4. Likewise, Equations 5-5 and 5-6 were used for five- and
six-coordinate metals, respectively. The atom indices in Table 5-4 correspond to those
atoms in Fig. 5-3. This indexing is useful to differentiate between axial/equatorial and
cis/trans ligands. The coordination state with the lowest RMSD was assigned to the metal
and its ligands.

Table 5-3. Metal-Donor Bond Target Lengths in Å. The following donor atoms of residues
are implied: HOH@O, ASP@OD1/OD2, GLU@OE1/OE2, HIS@ND1/NE2, CYS@SG, MET@SG, SER@O, THR@O,
TYR@O and the amino acid backbone carbonyl oxygen atom CRL. If a metal-donor distance is
within these target distances plus some tolerance (0.5 Å) it is considered a primary
interaction.
Metal   HOH    ASP/GLU   HIS    CYS/MET   SER/THR   TYR    CRL
Na      2.41   2.41      -      -         -         -      2.38
Mg      2.07   2.07      -      -         2.10      1.87   2.26
K       2.81   2.82      -      -         -         -      2.74
Ca      2.39   2.36      -      -         2.43      2.20   2.36
Mn      2.19   2.15      2.21   2.35      2.25      1.88   2.19
Fe      2.09   2.04      2.16   2.30      2.13      1.93   2.04
Co      2.09   2.05      2.14   2.25      2.10      1.90   2.08
Ni      2.09   2.05      2.14   2.25      2.10      1.90   2.08
Cu      2.13   1.99      2.02   2.15      2.00      1.90   2.04
Zn      2.09   1.99      2.03   2.31      2.14      1.95   2.07
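
The primary/secondary rule can be written down directly from the Zn row of Table 5-3. The Python sketch below is an illustration under that reading and is not the pdbSearcher code; the dictionary name `ZN_TARGET`, the function name `classify_zn_contact` and the use of residue names as keys are assumptions made for the example.

```python
# Zn target distances in Å, taken from the Zn row of Table 5-3.
ZN_TARGET = {
    "HOH": 2.09, "ASP": 1.99, "GLU": 1.99, "HIS": 2.03,
    "CYS": 2.31, "MET": 2.31, "SER": 2.14, "THR": 2.14,
    "TYR": 1.95, "CRL": 2.07,
}

def classify_zn_contact(residue, distance, primary_tol=0.5, secondary_tol=1.0):
    """Label a Zn-donor contact as 'primary', 'secondary' or 'unbound'."""
    target = ZN_TARGET[residue]
    if distance < target + primary_tol:
        return "primary"
    if distance < target + secondary_tol:
        return "secondary"
    return "unbound"
```

For histidine the two thresholds evaluate to 2.53 Å and 3.03 Å, the values used in the worked example in the text.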


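\delta_{tet/sqp} = \left[ \frac{1}{6} \sum_{i=1}^{6} (a_i - a_{ideal})^2 \right]^{1/2}    (5-4)

\delta_{tbp/trp} = \left[ \frac{1}{10} \sum_{i=1}^{10} (a_i - a_{ideal})^2 \right]^{1/2}    (5-5)

\delta_{oct} = \left[ \frac{1}{15} \sum_{i=1}^{15} (a_i - a_{ideal})^2 \right]^{1/2}    (5-6)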















(a) Octahedral   (b) Trigonal Bipyramid   (c) Square Pyramid
(d) Tetrahedral   (e) Square Planar   (f) Tetrahedral plus Non-Bonded Contact

Figure 5-3. Metal Ligand Geometries Perceived Using Harding's Rules.

5.3.1 Protein Data Bank Survey of Zinc Containing Proteins

The results of searching the PDB (accessed on the 5th of April 2007) for zinc
metalloproteins using the rules above are shown in Fig. 5-4. There are 524 cases of
trigonal bipyramidal (tbp), 706 cases of square pyramidal (trp), and 228 instances of
octahedral (oct) coordination. A further 615 metal centers are found as tetrahedral with a
non-bonded contact (tnb), and 1372 are ill-defined using the current definitions (unk). Of
the 6435 total observations, 2964 (46.1%) of the zinc atoms in protein structures are
found to be tetrahedral (tet), and thus the results and discussion will focus on them. The
most common Zn coordinating









Table 5-4. Ideal Angles Used to Calculate Root Mean Square Deviations for Tetrahedral,
Square Planar, Trigonal Bipyramidal, Square Pyramid and Octahedral Geometries. The
notation a12 describes the angle between atom 1, the metal and atom 2. The atom indices
correspond to Fig. 5-3. bm is the mean of the four angles between the apical bond and the
basal bonds in square pyramid geometries.
Type   Coordination           Angle (Deg)                              Atoms
ML4    Tetrahedral            109.5
       Square Planar          180.0                                    a12, a34
                              90.0                                     All others
ML5    Trigonal Bipyramidal   180.0                                    a12
                              120.0                                    a34, a45, a35
                              90.0                                     All others
       Square Pyramid         bm = (1/4) Σ(i=1..4) ai5                 a15, a25, a35, a45
                              (360.0 - 2bm)                            a12, a34
                              2 sin^-1(2^(-1/2) [sin(180.0 - bm)])     a13, a23, a14, a24
ML6    Octahedral             180.0                                    a12, a34, a56
                              90.0                                     All others
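
For the four-coordinate case, the comparison between tetrahedral and square planar ideals in Table 5-4 can be sketched in Python as below. This is a minimal illustration, not the pdbSearcher implementation. The thesis fixes the ligand indexing via Fig. 5-3; because that assignment is not available here, the sketch instead tries all three ways of choosing the two trans pairs, which is an assumption of this example.

```python
import math

TET_ANGLE = 109.5
PAIRS = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
# The three ways of choosing which two ligand pairs are trans in a square plane.
TRANS_CHOICES = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]

def angle_rmsd(observed, ideal):
    """Root mean square deviation over the six L-M-L angles (degrees)."""
    return math.sqrt(sum((observed[p] - ideal[p]) ** 2 for p in PAIRS) / len(PAIRS))

def classify_ml4(observed):
    """observed maps ligand index pairs (i, j) to L-M-L angles in degrees."""
    tet = angle_rmsd(observed, {p: TET_ANGLE for p in PAIRS})
    sqp = min(
        angle_rmsd(observed, {p: 180.0 if p in trans else 90.0 for p in PAIRS})
        for trans in TRANS_CHOICES
    )
    return ("tet", tet) if tet <= sqp else ("sqp", sqp)
```

Passing six angles close to 109.5° returns the label "tet" with a small deviation, while two angles near 180° and four near 90° return "sqp".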


ligands in tetrahedral environments are shown in Fig. 5-5. Here the one-letter amino acid
codes are used, with X describing an unknown ligand such as a non-standard amino acid or
drug molecule.

The most common tetrahedral Zn environment is CCCC (four cysteines bound), followed by
CCCH, CCHH, DHHH, HHHO, HHHX, CCHX, and CCHO, as shown in Fig. 5-5. These data led the
research to investigate the relationship between zinc coordination environment and
geometric parameters such as bond lengths and angles. The bonds between zinc and sulfur,
nitrogen, and oxygen from 10 unique primary shell environments are shown in Table 5-5.

The distributions of zinc-sulfur bond lengths in proteins that contain environments such
as CCCC, CCCH, CCHH, and CHHH are shown in Fig. 5-6. A box plot, as shown in Fig. 5-7(a),
may also be used to represent these data, as it shows the maximum and minimum values, the
lower and upper quartiles, and the median. Box plots of the variation of Zn-N bonds in the
series CCCC, CCCH, CCHH, and CHHH are shown in Fig. 5-7(b).
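
The quantities drawn in a box plot can be computed with the Python standard library, as in the sketch below. The bond lengths are hypothetical values made up for the example and are not taken from the PDB survey, and the "inclusive" quartile method is an assumption, since the quartile convention used for Table 5-5 is not stated.

```python
import statistics

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}

# Hypothetical Zn-S bond lengths in Å, for illustration only.
bond_lengths = [2.21, 2.28, 2.30, 2.33, 2.34, 2.36, 2.39, 2.45, 2.61]
summary = five_number_summary(bond_lengths)
mean_minus_median = statistics.mean(bond_lengths) - summary["median"]
```

A mean that lies well above the median, as in this made-up sample, signals a skewed distribution or outliers, which is the kind of diagnostic applied to the small CHHH sample in the discussion of Table 5-5.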














[Bar chart; x-axis: Coordination Type.]

Figure 5-4. Zinc Coordination Geometry Distribution from the PDB.



The peaks of the Zn-S bond length distributions in the CCCC and CCCH environments lie
between 2.3 and 2.4 Å, while for CCHH and CHHH systems they occur between 2.2 and 2.3 Å.
There are only 14 instances of CHHH, and its Zn-S and Zn-N bonds have large standard
deviations of 0.1364 and 0.1403 Å, respectively. Also, in the case of the Zn-S bonds the
median differs considerably from the mean (2.296 compared to 2.344 Å), thus suggesting
that these data are unreliable. The box plots also point to some outliers in the data; for
example, there are Zn-S bonds in the PDB below 1.5 Å, which upon visual inspection seem
crowded and poorly resolved.

The distributions of zinc-oxygen and zinc-nitrogen bonds are shown in Fig. 5-8. Zinc
bonded to glutamic and aspartic acid oxygen (GLU@OE1/OE2 or ASP@OD1/OD2) or histidine
nitrogen (HIS@ND1/NE2) all show similar behavior, with bond lengths around 2.1 Å being
most common. In spite of this similarity the standard deviations of ASP and









Table 5-5. Tetrahedral Zinc Primary Ligating Residues. Bond lengths are in Å. One-letter
amino acid codes are used; C: CYS, H: HIS, O: HOH, D: ASP. N is the number of bond
instances. Min and Max are the minimum and maximum bond lengths, respectively. The 1st
quartile, 3rd quartile, mean, median, and standard deviation are statistical parameters
describing the bond length distribution.

                                     1st                          3rd                Standard
Environment   N      Bond    Min     Quartile   Median   Mean     Quartile   Max     Deviation
CCCC          3284   Zn-S    1.424   2.294      2.338    2.338    2.389      2.805   0.1218
CCCH          1041   Zn-S    1.448   2.284      2.332    2.332    2.382      3.047   0.1089
CCHH          334    Zn-S    1.908   2.234      2.295    2.301    2.361      2.795   0.1289
CHHH          14     Zn-S    2.180   2.270      2.296    2.344    2.390      2.608   0.1364
CCCH          347    Zn-N    1.833   2.056      2.124    2.132    2.200      2.525   0.1157
CCHH          334    Zn-N    1.716   2.023      2.078    2.088    2.149      2.465   0.1188
CHHH          42     Zn-N    1.778   1.964      2.034    2.056    2.113      2.486   0.1403
HHHH          12     Zn-N    1.935   2.006      2.040    2.049    2.107      2.129   0.0627
HHHO          108    Zn-O    1.359   2.000      2.252    2.185    2.362      2.518   0.2218
HHOO          42     Zn-O    1.866   2.092      2.268    2.233    2.384      2.543   0.1816
HOOO          78     Zn-O    1.611   2.006      2.143    2.115    2.241      2.495   0.1914
OOOO          12     Zn-O    1.781   2.004      2.158    2.135    2.300      2.428   0.1917
HHHO          324    Zn-N    1.872   2.044      2.098    2.116    2.176      2.757   0.1140
HHOO          42     Zn-N    1.850   2.060      2.143    2.161    2.260      2.453   0.1455
HOOO          26     Zn-N    1.836   2.041      2.089    2.102    2.121      2.459   0.1295
HHHD          155    Zn-O    1.688   1.914      2.000    2.007    2.086      2.457   0.1425
HHDD          68     Zn-O    1.805   2.053      2.148    2.166    2.262      2.938   0.1840
HHHD          465    Zn-N    1.604   2.000      2.064    2.077    2.144      2.499   0.1275
HHDD          68     Zn-N    1.959   2.101      2.184    2.192    2.302      2.460   0.1280
ASP           460    Zn-O    1.688   1.958      2.044    2.077    2.165      2.988   0.1899
GLU           227    Zn-O    1.462   1.996      2.102    2.134    2.265      2.823   0.2276
HIS           825    Zn-ND   1.716   2.030      2.096    2.107    2.181      2.525   0.1256
HIS           1768   Zn-NE   1.604   2.031      2.093    2.108    2.177      2.757   0.1228

GLU bonds are greater than those of HIS. It is also worth noting that the majority of Zn
histidine bonding is through the epsilon nitrogen.

5.3.2 Tetrahedral Zn Environment Force Field Parameterization

Metal-ligand bonds are softer than those of organic molecules; however, there are obvious
trends in the data presented above. These findings encouraged the formation of a
generalized FF for Zn called the Zinc AMBER Force Field or ZAFF. A concept key to this
work is "plug-and-play", where a researcher can download a metalloprotein structure




















[Horizontal bar chart; y-axis: ligand combination (for example CCCC, CCCH, CCHH); x-axis:
Number, 0 to 800.]

Figure 5-5. The Most Common Tetrahedral Zinc Coordinating Ligand Combination
Distribution. Three-letter environments also contain a secondary ligand not shown.









from the PDB and run dynamics using predefined parameters, as illustrated in Fig. 5-9. To
this end, FFs for the 10 unique environments shown in Table 5-5 were built using MCPB. A
single structure from the PDB representative of each environment was chosen. FFs were
built using the B3LYP DFT method [315-317] with the 6-31G* basis set [318]. The resulting
FFs were stored in cluster XML files using the definitions of stdLibrary and stdGroup from
Ch. 3.2.3 for later use.

The 1A5T structure from the PDB was used as a representative structure of the Zn-CCCC
cluster. Two models of this cluster were built using MCPB, as shown in Fig. 5-10. The
calculated bond and angle force constants are shown in Tables 5-6 and 5-7. The average
Zn-S bond length is 2.426 Å, which is higher than the mean value from the survey of the
PDB (2.338 Å) but within one standard deviation (0.1218 Å) of it. The corresponding mean
bond force constant is 100.677 kcal/(mol·Å²). The mean S-Zn-S and C-S-Zn angles are
109.138° and 101.754°, with average force constants of 15.016 and 81.505 kcal/(mol·rad²),
respectively.

Table 5-6. Zn-CCCC Cluster Bond Lengths, r, in Å and Force Constants, Kr, in
kcal/(mol·Å²) (PDB ID: 1A5T).
Bond     r         Kr
ZN-S1    2.42511   100.709
ZN-S2    2.42442   101.586
ZN-S3    2.42459   101.013
ZN-S4    2.43049   99.4008
Mean     2.4260    100.677

The structures 1A73 and 2GIV from the PDB were used as characteristic structures of the
Zn-CCCH cluster. The delta nitrogen of histidine is bonded to the zinc atom in 1A73, while
the epsilon nitrogen is bound in 2GIV. Two models of each cluster were built using MCPB,
as shown in Fig. 5-11. The bond lengths and force constants are tabulated in Table 5-8,
while the angles and angle force constants are shown in Tables 5-9 and 5-10. The average
Zn-S bond lengths are 2.355 Å and 2.352 Å, which are in good agreement with the values
determined from the PDB survey. The Zn-S bond lengths are shorter in







Table 5-7. Zn-CCCC Cluster Angles, θ, in Degrees and Force Constants, Kθ, in
kcal/(mol·rad²) (PDB ID: 1A5T).
        S-Zn-S2   S-Zn-S3   S-Zn-S4   C-S-Zn
θ
S1      107.384   111.062   108.75    101.721
S2                109.664   110.75    102.136
S3                          109.22    101.952
S4                                    101.210
Mean    109.138                       101.754
Kθ
S1      13.9532   16.0208   13.0393   74.2777
S2                15.2898   16.7256   84.6981
S3                          15.0688   85.4549
S4                                    81.5913
Mean    15.016                        81.505

Zn-CCCH than they are in the Zn-CCCC cluster, and this corresponds to the change in force
constant from 100.677 to 144.003 and 142.575 kcal/(mol·Å²). The mean S-Zn-S, S-Zn-N, and
C-S-Zn angles for the 1A73/2GIV clusters are 115.161°/116.381°, 102.938°/101.270°, and
102.534°/101.793°, with average force constants of 13.695/10.377, 21.909/15.213, and
78.568/65.170 kcal/(mol·rad²), respectively.

Table 5-8. Zn-CCCH Cluster Bond Lengths, r, in Å and Force Constants, Kr, in
kcal/(mol·Å²) (PDB ID: 1A73 and 2GIV).
Bond         r         Kr
1A73
Zn-S1        2.38103   129.940
Zn-S2        2.33594   153.514
Zn-S3        2.34818   148.555
Zn-NB        2.17615   111.727
Zn-S Mean    2.355     144.003
2GIV
Zn-S1        2.35365   140.810
Zn-S2        2.37024   135.986
Zn-S3        2.33396   150.931
Zn-NB        2.14457   109.821
Zn-S Mean    2.352     142.575

The 1A1F structure from the PDB was used as a representative structure of the Zn-CCHH
cluster. Two models of this cluster were built using MCPB as shown in Fig.











Table 5-9. Zn-CCCH Cluster Angles, θ, in Degrees and Force Constants, Kθ, in
kcal/(mol·rad²) (PDB ID: 1A73).
         S-Zn-S2    S-Zn-S3    S-Zn-NB    C-S-Zn
θ
S1       112.941    117.548    98.457     101.452
S2                  114.996    110.215    103.370
S3                             100.144    102.782
Mean     115.161 (S-Zn-S)      102.938    102.534
         CR-NB-Zn   CC-NB-Zn
NB       118.832    133.957
Kθ
S1       14.8019    12.3254    31.4032    66.5568
S2                  13.9588    15.4264    82.7660
S3                             18.8974    86.3823
Mean     13.695 (S-Zn-S)       21.909     78.568
         CR-NB-Zn   CC-NB-Zn
NB       61.5645    62.4920

Table 5-10. Zn-CCCH Cluster Angles, θ, in Degrees and Force Constants, Kθ, in
kcal/(mol·rad²) (PDB ID: 2GIV).
         S-Zn-S2    S-Zn-S3    S-Zn-NB    C-S-Zn
θ
S1       122.16     114.051    99.8816    100.693
S2                  112.932    94.1401    102.432
S3                             109.790    102.254
Mean     116.381 (S-Zn-S)      101.270    101.793
         CR-NB-Zn   CV-NB-Zn
NB       124.745    128.494
Kθ
S1       9.20218    11.0341    15.3383    59.2390
S2                  10.8972    20.9385    72.5796
S3                             9.36319    63.6916
Mean     10.377 (S-Zn-S)       15.213     65.170
         CR-NB-Zn   CV-NB-Zn
NB       33.4104    31.3493









5-12, with the calculated bond and angle force constants shown in Tables 5-11 and 5-12.
The average Zn-S and Zn-N bond lengths are 2.305 Å and 2.088 Å, with corresponding
force constants of 181.478 and 147.126 kcal/(mol·Å²), respectively. The average value
of the Zn-S bond length from the PDB is 2.301 Å, while the mean Zn-N value is
2.088 Å; both are in excellent agreement with the calculated values. Both the Zn-S
and Zn-N bonds are shorter than in the previous clusters, and this is reflected in the
larger force constants determined. The mean S-Zn-N, C-S-Zn, and C-N-Zn angles
for the 1A1F cluster are 103.232°, 105.054°, and 126.586°, with average force constants of
12.488, 69.269, and 34.502 kcal/(mol·rad²), respectively.

Table 5-11. Zn-CCHH Cluster Bond Lengths, r, in Å and Force Constants, Kr, in
kcal/(mol·Å²) (PDB ID: 1A1F).
Bond         r         Kr
Zn-S1        2.28799   191.997
Zn-S2        2.32226   170.959
Zn-NZ        2.09197   143.964
Zn-NY        2.08529   150.288
Zn-S Mean    2.305     181.478
Zn-N Mean    2.088     147.126


The final Zn center considered in this study that contains a cysteine residue was
Zn-CHHH. The 1CK7 structure from the PDB was used to model the Zn-CHHH cluster.
Two models of this cluster were built using MCPB, as shown in Fig. 5-13, with the
calculated bond and angle force constants shown in Tables 5-13 and 5-14. The Zn-S bond
length was determined as 2.262 Å with a force constant of 186.196 kcal/(mol·Å²). The
mean Zn-N bond length is 2.046 Å with a force constant of 180.437 kcal/(mol·Å²). The
mean N-Zn-N and S-Zn-N angles are 105.835° and 112.950°, with force constants of
2.795 and 12.488 kcal/(mol·rad²), respectively.

There is a very small number of Zinc atoms bound to four histidine residues in the
PDB. To complete this computational study, however, the bond and angle force constants
were determined using 1PB0 as a starting geometry. The models created by MCPB are shown
in Fig. 5-14, with the resulting bond lengths and angles and accompanying force constants









Table 5-12. Zn-CCHH Cluster Angles, θ, in Degrees and Force Constants, Kθ, in
kcal/(mol·rad²) (PDB ID: 1A1F).
         S/N-Zn-S2   S/N-Zn-NY   S/N-Zn-NZ   C-S-Zn
θ
S1       135.025     101.200     108.303     104.986
S2                   102.721     100.705     105.122
NY                               106.293
         CR-N-Zn     CV-NB-Zn
NY       123.668     129.422
NZ       123.392     129.863
Kθ
S1       9.71416     10.1761     10.1241     63.3345
S2                   14.5105     15.1437     75.2045
NY                               8.41020
         CR-N-Zn     CV-NB-Zn
NY       36.1351     32.4650
NZ       36.2138     33.1944
Mean     θ           Kθ
S-Zn-N   103.232     12.488
C-S-Zn   105.054     69.269
C-N-Zn   126.586     34.502

Table 5-13. Zn-CHHH Cluster Bond Lengths, r, in Å and Force Constants, Kr, in
kcal/(mol·Å²) (PDB ID: 1CK7).
Bond         r         Kr
Zn-S1        2.26178   186.196
Zn-NX        2.05563   176.880
Zn-NY        2.04663   182.100
Zn-NZ        2.03803   182.333
Zn-N Mean    2.046     180.437


are shown in Tables 5-15 and 5-16. The average Zn-N bond distance is 2.010 Å with a
force constant of 217.616 kcal/(mol·Å²). The mean N-Zn-N angle of the Zn center is
109.481° with a force constant of 6.088 kcal/(mol·rad²).

There are clear trends in the calculated bond lengths and force constants
described above. The Zn-S bond lengths through the series CCCC, CCCH, CCHH, and
CHHH correlate with the calculated force constants with an R² value of 0.97, as seen in
Fig. 5-15(a). The Zn-N bond lengths and force constants correlate with an R² of 0.95, as









Table 5-14. Zn-CHHH Cluster Angles, θ, in Degrees and Force Constants, Kθ, in
kcal/(mol·rad²) (PDB ID: 1CK7).
         S/N-Zn-NX   S/N-Zn-NY     S/N-Zn-NZ   C-S-Zn
θ
S1       108.625     110.015       120.210     104.518
NX                   109.309       104.101
NY                                 104.096
         CR-N-Zn     CV-NB-Zn
NX       118.357     135.099
NY       133.352     120.137
NZ       127.083     125.963
Kθ
S1       4.05583     4.03272       1.89062     10.5385
NX                   [illegible]   2.68101
NY                                 3.32002
         CR-N-Zn     CV-NB-Zn
NX       23.1399     ND
NY       34.1629     37.8760
NZ       18.0772     9.73914
Mean     θ           Kθ
N-Zn-N   105.835     2.795
S-Zn-N   112.950     12.488

Table 5-15. Zn-HHHH Cluster Bond Lengths, r, in Å and Force Constants, Kr, in
kcal/(mol·Å²) (PDB ID: 1PBO).
Bond         r         Kr
Zn-NW        2.00656   221.593
Zn-NX        2.01037   217.622
Zn-NY        2.01428   213.845
Zn-NZ        2.01130   217.407
Zn-N Mean    2.010     217.616


shown in Fig. 5-15(b). It is worth noting that Zn-donor bond lengths differ between
coordination environments, so a single Zn-S or Zn-N equilibrium bond length and force
constant would not be adequate. The proposed solution to this problem is to store all Zn
bond types and assign the parameters automatically within the metal center perception
algorithm of MTK++.
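
The correlation between bond length and force constant can be illustrated with a short least-squares fit. The sketch below uses only the mean Zn-S values quoted in Tables 5-6, 5-8, 5-11, and 5-13; the set of points behind Fig. 5-15(a) is not listed in the text, so the resulting R² is an approximation of the reported value rather than a reproduction of it. All names are illustrative.

```python
def linear_fit_r2(xs, ys):
    """Return (slope, intercept, r_squared) of an ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, (sxy * sxy) / (sxx * syy)


# Mean Zn-S bond length (Angstrom) and force constant (kcal/(mol*Angstrom^2)).
ZN_S_MEANS = {
    "CCCC": (2.426, 100.677),
    "CCCH (1A73)": (2.355, 144.003),
    "CCCH (2GIV)": (2.352, 142.575),
    "CCHH": (2.305, 181.478),
    "CHHH": (2.262, 186.196),
}

lengths = [r for r, _ in ZN_S_MEANS.values()]
constants = [k for _, k in ZN_S_MEANS.values()]
slope, intercept, r2 = linear_fit_r2(lengths, constants)
print(f"slope = {slope:.1f}, intercept = {intercept:.1f}, R^2 = {r2:.3f}")
```

The negative slope expresses the trend described above: the shorter the Zn-S bond, the larger its force constant.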

The average S-Zn-S angle is smaller and its force constant larger in the CCCC
cluster, 109.138° and 15.016 kcal/(mol·rad²), compared to those of the









Table 5-16. Zn-HHHH Cluster Angles, θ, in Degrees and Force Constants, Kθ, in
kcal/(mol·rad²) (PDB ID: 1PBO).
          N-Zn-NX    N-Zn-NY    N-Zn-NZ    CR-N-Zn    CV-N-Zn
θ
NW        111.145    106.809    110.679    127.688    126.018
NX                   111.299    106.671    127.492    126.229
NY                              110.288    127.172    126.584
NZ                                         127.161    126.540
Kθ
NW        5.49705    6.93160    5.80074    33.8058    34.1690
NX                   5.64881    7.14382    32.1774    32.2668
NY                              5.50663    31.7145    32.0421
NZ                                         32.1549    32.3428
Mean      θ          Kθ
N-Zn-N    109.481    6.088
CR-N-Zn   127.378    32.463
CV-N-Zn   126.342    32.705


CCHH cluster, where values of 135.025° and 9.714 kcal/(mol·rad²) were determined.
The N-Zn-N angles of the CCHH, CHHH, and HHHH clusters lie between 105.835°
and 109.481°, with force constants between 2.795 and 8.4102 kcal/(mol·rad²). The
experimental force constant of the N-Zn-N angle was reported to be approximately 5.0
kcal/(mol·rad²), which is in good agreement with those calculated here. It has been
reported that this angle force constant is too weak to prevent the angle opening beyond
the ideal tetrahedral angle in MD simulations, and in the past arbitrary scaling factors
have been applied to prevent this from occurring [252, 293]. A general scaling factor has
not been developed here, as this study was designed to investigate the raw force constants
produced by QM packages, but one may be developed in the future. Thus the
FFs shown in this chapter can be described as zeroth order, with further corrections
necessary to carry out meaningful simulations.

The partial charges of the Zinc clusters CCCC, CCCH, CCHH, and CHHH were
determined by applying two different methods using the larger models described above. The
first method allows all atoms of the bound residue to change (ChgModA), while the second
technique restrains the backbone atoms (CA, H, HA, N, HN, C, O) to the values found
in the AMBER parm94 force field (ChgModB). The charges were determined by first
calculating the MK charges with Gaussian (a radius of 1.1 Å was used for Zinc) and then
using the RESP program to zero out the charges on the capping groups. This procedure was
carried out to disperse the charge over the entire cluster, thus removing the need to have
a large +2 formal charge on Zinc. The ChgModA charges are shown in Table 5-17 and the
ChgModB charges are presented in Table 5-18. The SG atom of the CYM (unprotonated
cysteine) residue in parm94 has a charge of -0.736; in the clusters it fluctuates between
-0.485 and -0.640 in ChgModA and between -0.473 and -0.669 in ChgModB. One of the
biggest differences between ChgModA and ChgModB is the charge on CB. CYM@CB
has a charge of -0.736, while in ChgModA its charge lies closer to zero. ChgModB, on
the other hand, places a charge of -0.4 on CB. It is unclear whether ChgModA is superior
to ChgModB or vice versa, though it could be advantageous to keep the charges of the
backbone atoms fixed to the parm94 values, as these were used in the fitting of the
torsional parameters of the FF. It is also difficult to determine whether this would matter,
as the movement of the residues bound to Zinc would be restricted. The ChgModA and
ChgModB charges for the Zinc-CCCH, -CCHH, -CHHH, and -HHHH clusters are outlined
in Tables 5-19 and 5-20.
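
The net charge over which the fitted partial charges of each cluster are dispersed follows from simple bookkeeping: Zn contributes +2, each unprotonated cysteine (CYM) contributes -1, and each neutral histidine contributes 0. The sketch below reproduces the values in the Charge column of Table 5-17; the one-letter codes and the function name are illustrative assumptions, and the entries for aspartate and water are included only for completeness.

```python
# Formal charge contributed by each ligand type (one-letter cluster codes).
LIGAND_CHARGE = {
    "C": -1,  # CYM, unprotonated cysteine
    "H": 0,   # neutral histidine (HID or HIE)
    "D": -1,  # aspartate
    "O": 0,   # water (a bound hydroxide would contribute -1 instead)
}
ZN_FORMAL_CHARGE = 2


def cluster_charge(ligands):
    """Net formal charge of a tetrahedral Zn cluster such as 'CCCH'."""
    return ZN_FORMAL_CHARGE + sum(LIGAND_CHARGE[code] for code in ligands)


if __name__ == "__main__":
    for name in ("CCCC", "CCCH", "CCHH", "CHHH", "HHHH"):
        print(name, cluster_charge(name))
```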

The variation of bond distances, angles, and partial charges of Zinc clusters containing

histidine residues and water molecules was determined. The 1CA2 structure was used

to represent the Zn-HHHO cluster which is a structure of human Carbonic Anhydrase II










Table 5-17. Cysteine Charges using ChgModA for the Zn-CCCC, -CCCH, -CCHH, and
-CHHH Clusters. Partial Charges are in electron units.

Residue                 N           CA          C           O
CYM                     -0.463000   0.035000    0.616000    -0.504000
CCCC                    -0.479963   0.003180    0.518101    -0.548044
CCCH (1A73) CY1/CY3     -0.474605   -0.108977   0.595840    -0.519264
CCCH (1A73) CY2         -0.414898   0.031295    0.278787    -0.450243
CCCH (2GIV)             -0.542307   0.017445    0.632188    -0.568329
CCHH CY1                -0.454687   0.005167    0.245131    [illegible]
CCHH CY2                -0.268043   -0.035885   0.462357    -0.532708
CHHH                    -0.447464   -0.025959   0.622632    -0.497571

Residue                 CB          SG          HN            HA
CYM                     -0.736000   -0.736000   0.252000      0.048000
CCCC                    0.112192    -0.620103   0.281750      0.057052
CCCH (1A73) CY1/CY3     0.019831    -0.640071   0.287703      [illegible]
CCCH (1A73) CY2         0.049561    -0.484948   [illegible]   0.052328
CCCH (2GIV)             -0.048990   -0.591652   0.331010      0.049510
CCHH CY1                -0.097120   -0.581202   0.297192      0.126035
CCHH CY2                0.074672    -0.537449   0.295118      0.088752
CHHH                    -0.056717   -0.512655   0.277771      0.099225

Residue                 HB2/3       ZN            Charge
CYM                     0.244000
CCCC                    0.002066    [illegible]   -2.0
CCCH (1A73) CY1/CY3     0.065870    0.593065      -1.0
CCCH (1A73) CY2         0.056778    0.593065      -1.0
CCCH (2GIV)             0.084419    0.359392      -1.0
CCHH CY1                0.084759    0.263107      0.0
CCHH CY2                0.031256    0.263107      0.0
CHHH                    0.044767    0.552747      1.0


Table 5-18. Cysteine Charges using ChgModB for the Zn-CCCC, -CCCH, -CCHH, and
-CHHH Clusters. Partial Charges are in electron units.

Residue                 CB            SG            HB2/3       ZN
CYM                     -0.736000     -0.736000     0.244000
CCCC                    -0.462072     -0.530008     0.169606    0.675474
CCCH (1A73) CY1/CY3     -0.372264     -0.533198     0.171501    0.501937
CCCH (1A73) CY2         [illegible]   -0.472872     0.175531    0.501937
CCCH (2GIV)             -0.435825     -0.555154     0.212869    0.45405
CCHH CY1                0.003587      [illegible]   0.073015    0.362845
CCHH CY2                -0.221037     -0.531646     0.134076    0.362845
CHHH                    0.159424      -0.552344     -0.017513   0.505550











Table 5-19. Histidine Charges using ChgModA for the Zn-CCCC, -CCCH, -CCHH, and
-CHHH Clusters. Partial Charges are in electron units.

Residue       N             CA          C             O
HIE           -0.415700     -0.058100   0.597300      -0.567900
HID           -0.415700     0.018800    0.597300      -0.567900
CCCH (HIE)    0.013815      -0.094517   0.609742      -0.511703
CCCH (HID)    -0.465489     0.012311    0.744779      -0.614433
CCHH (HID)    -0.411312     -0.035148   0.603874      -0.536756
CHHH (HID)    [illegible]   -0.089931   [illegible]   -0.554579
HHHH (HID)    -0.321934     -0.129789   0.661692      -0.523604

Residue       CB          CG            ND1         CD2
HIE           -0.007400   0.1868000     -0.54320    -0.220700
HID           -0.046200   -0.0266000    -0.38110    0.129200
CCCH (HIE)    0.009333    0.1388840     -0.200272   -0.231676
CCCH (HID)    0.035338    -0.1183230    -0.143343   -0.003696
CCHH (HID)    0.087860    -0.0472060    -0.121450   -0.070041
CHHH (HID)    -0.171947   0.0159560     -0.096528   0.003599
HHHH (HID)    -0.146428   [illegible]   -0.130169   -0.067660

Residue       CE1         NE2         H             HA
HIE           0.163500    -0.279500   0.271900      0.136000
HID           0.205700    -0.572700   0.271900      0.088100
CCCH (HIE)    -0.002223   -0.152285   -0.043986     0.015705
CCCH (HID)    0.051160    -0.085718   0.269102      0.080535
CCHH (HID)    -0.080618   -0.041354   [illegible]   0.108816
CHHH (HID)    -0.064531   -0.228608   0.285362      0.104929
HHHH (HID)    0.059695    -0.145091   0.242073      0.135166

Residue       HB2           HB3           HE1        HE2
HIE           0.036700      0.036700      0.143500   0.333900
HID           0.040200      0.040200      0.139200
CCCH (HIE)    [illegible]   [illegible]   0.144631   0.307178
CCCH (HID)    0.029513      0.029513      0.121003
CCHH (HID)    0.072346      0.072346      0.175582
CHHH (HID)    0.107686      0.107686      0.186129
HHHH (HID)    0.120901      0.120901      0.185187

Residue       HD1        HD2           ZN
HIE                      0.186200
HID           0.364900   0.114700
CCCH (HIE)               [illegible]   0.593065
CCCH (HID)    0.300711   0.125183      0.359392
CCHH (HID)    0.321771   0.161285      0.263107
CHHH (HID)    0.297609   0.099257      0.552747
HHHH (HID)    0.322378   0.137633      0.412289









(HCA II) with the MCPB models shown in Fig. 5-16. As mentioned in Ch. 2.14, HCA II
catalyzes the conversion of CO2 into bicarbonate. Therefore, to account for
both the water and hydroxyl states, two FFs were evaluated. It is no surprise that the
bond lengths and associated force constants of the two systems are different. The Zn-O
bond is longer in the case of water binding, while the Zn-N bonds are shorter than in the
hydroxide-bound state owing to the strength of the hydroxyl bond, as outlined in Table 5-21.
The accompanying force constants are also considerably different. The Zn-O bond force
constant changes from 120.287 to 394.674 kcal/(mol·Å²) upon removal of a proton, while
the mean Zn-N force constant decreases from 248.420 to 194.357 kcal/(mol·Å²). These
calculated equilibrium bond lengths and force constants are considerably different from
those published by Hoops et al.; however, the QM methods used to generate the numbers
also differ. The Zn-O bond lengths of the HHHO clusters in the PDB have a large standard
deviation of 0.222 Å with a mean value of 2.185 Å, confirming that both states exist. The
calculated angles and force constants for this cluster are shown in Table 5-22 and are in
good agreement with those published previously, except for the H-O-Zn angle force
constant, which was arbitrarily set to a higher value in the earlier work.

The 1VLI structure from the PDB was used to investigate the strength of bond and
angle force constants of the Zn-HHOO cluster. Again the MCPB program was used to
build the models (Fig. 5-17) required for parameterization, with the resulting equilibrium
bond lengths and angles and corresponding force constants shown in Tables 5-23 and 5-24.
The equilibrium bond length of Zn-O was calculated as 1.946 Å, which is approximately
0.4 Å shorter than the bond length for the Zn-HHHO cluster. This contradicts the trend
from the PDB survey. Plausible reasons for this discrepancy include the small number of
data points for the Zn-HHOO cluster in the PDB and the large standard deviation value
of 0.146 Å. The angle force constants calculated for this cluster are of similar magnitude
to those calculated for the Zn-HHHO cluster.










Table 5-20. Histidine Charges using ChgModB for the Zn-CCCC, -CCCH, -CCHH, and
-CHHH Clusters. Partial Charges are in electron units.

Residue       CB          CG          ND1           CD2
HIE           -0.007400   0.186800    -0.54320      -0.220700
HID           -0.046200   -0.026600   -0.38110      0.129200
CCCH (HIE)    -0.390226   0.172204    [illegible]   -0.240248
CCCH (HID)    0.052173    -0.062260   -0.152337     -0.093846
CCHH (HID)    0.021453    -0.074575   -0.103342     -0.083769
CHHH (HID)    0.097861    -0.131328   [illegible]   0.010476
HHHH (HID)    0.299667    -0.158251   -0.10536      -0.114661

Residue       CE1         NE2         HB2/3      HD1
HIE           0.163500    -0.279500   0.036700
HID           0.205700    -0.572700   0.040200   0.364900
CCCH (HIE)    0.025646    -0.1386     0.176704
CCCH (HID)    0.044669    -0.107399   0.001993   0.302699
CCHH (HID)    -0.090768   -0.047611   0.034279   0.317081
CHHH (HID)    -0.066727   -0.200851   0.048588   0.295541
HHHH (HID)    -0.067193   -0.078907   0.001086   0.310647

Residue       HE1        HE2        HD2        ZN
HIE           0.143500   0.333900   0.186200
HID           0.139200              0.114700
CCCH (HIE)    0.125452   0.300742   0.164376   0.501937
CCCH (HID)    0.125431              0.184058   0.45405
CCHH (HID)    0.171776              0.165012   0.362845
CHHH (HID)    0.184941              0.106056   0.50555
HHHH (HID)    0.181314              0.15876    0.317246

Table 5-21. Zn-HHHO Cluster Bond Lengths, r, in Å and Force Constants, Kr, in
kcal/(mol·Å²) (PDB ID: 1CA2).
State   Bond         r        Kr
H2O     Zn-NX        1.9783   250.691
        Zn-NY        1.9817   246.526
        Zn-NZ        1.9836   248.043
        Zn-OW        2.1122   120.287
        Zn-N Mean    1.981    248.420
HO-     Zn-NX        2.0293   190.815
        Zn-NY        2.0400   194.203
        Zn-NZ        2.0412   198.055
        Zn-OW        1.8596   394.674
        Zn-N Mean    2.036    194.357












Table 5-22. Zn-HHHO Cluster Angles, θ, in Degrees and Force Constants, Kθ, in
kcal/(mol·rad²) (PDB ID: 1CA2).
            N-Zn-NY       N-Zn-NZ    N-Zn-OW    CR-N-Zn
H2O, θ
NX          109.645       123.191    100.449    128.302
NY                        113.479    109.534    127.604
NZ                                   98.1932    125.145
            CV/CC-N-Zn    H-OW-Zn
NX          125.323
NY          126.100
NZ          127.730
HW                        123.942
H2O, Kθ
NX          8.00872       8.22809    5.61984    34.2350
NY                        7.62091    3.74798    34.5691
NZ                                   5.75939    40.5598
            CV/CC-N-Zn    H-OW-Zn
NX          34.0616
NY          [illegible]
NZ          44.0110
HW                        20.5484
HO-, θ
NX          106.050       107.502    124.853    126.720
NY                        116.017    102.656    114.165
NZ                                   100.380    113.620
            CV/CC-N-Zn    H-OW-Zn
NX          126.592
NY          139.179
NZ          138.859
HW                        116.648
HO-, Kθ
NX          9.25545       9.22715    7.39156    29.5398
NY                        6.99278    10.4632    38.9903
NZ                                   12.0718    42.5650
            CV/CC-N-Zn    H-OW-Zn
NX          30.0396
NY          33.8686
NZ          37.2270
HW                        38.9579




























[Figure 5-6 panels: (a) CCCC, (b) CCCH, (c) CCHH, (d) CHHH; x-axis: Bond Lengths]

Figure 5-6. Zn-S Bond Length Distributions in CCCC, CCCH, CCHH, and CHHH
Tetrahedral Environments.


Table 5-23. Zn-HHOO Cluster Bond Lengths, r, in Å and Force Constants, Kr, in
kcal/(mol·Å²) (PDB ID: 1VLI).
Bond         r        Kr
Zn-NX        1.9443   199.330
Zn-NY        1.9488   286.855
Zn-OX        2.0577   157.459
Zn-OY        2.0493   77.0940
Zn-N Mean    1.946    243.092
Zn-O Mean    2.053    117.276


[Figure 5-7 panels:
(a) Zn-S Bond Lengths through the series CCCC, CCCH, CCHH, and CHHH; x-axis: Zn-S Bond Distance (Å)
(b) Zn-N Bond Lengths through the series HHHH, HCCC, HHCC, and HHHC; x-axis: Zn-N Bond Distance (Å)]

Figure 5-7. Box Plots of Zn-S/N Bond Lengths in CCCC, CCCH, CCHH, CHHH, and
HHHH environments.




















Table 5-24. Zn-HHOO Cluster Angles, θ, in Degrees and Force Constants, Kθ, in
kcal/(mol·rad²) (PDB ID: 1VLI).
       N-Zn-NY    N-Zn-OX    O/N-Zn-OY
θ
NX     121.973    108.959    112.588
NY                107.009    105.529
OX                           98.0344
       CV-N-Zn    CR-N-Zn    HW-O-Zn
NX     122.128    131.376
NY     126.588    126.944
OX                           124.788
OY                           122.326
Kθ
NX     5.81366    5.71586    ND
NY                4.36309    4.47123
OX                           3.69369
       CV-N-Zn    CR-N-Zn    HW-O-Zn
NX     10.6697    16.9386
NY     33.5296    33.5677
OX                           26.4380
OY                           ND






















































[Figure 5-8 panels: (a) Asp@OD1/OD2, (b) Glu@OE1/OE2, (c) His@ND1, (d) His@NE2; x-axis: Bond Lengths]

Figure 5-8. Tetrahedral Zn-O(Asp/Glu) and Zn-N(His) Bond Length Distributions.


[Figure 5-9: flow diagram with the decision nodes "Metal?" and "Stored?"; if no
equivalent site is stored, the steps in Fig. 5-2 are carried out.]

Figure 5-9. ZAFF Flow Diagram. This illustration demonstrates that when a metalloprotein
structure is downloaded from the PDB and an equivalent metal site is stored,
the MTK++ package has the ability to assign parameters to carry out MD
simulations.


Figure 5-10. Zn-CCCC Cluster Models (PDB ID: 1A5T). Panels: (a) 1A5T Model 1,
(b) 1A5T Model 2.

Figure 5-11. Zn-CCCH Cluster Models (PDB ID: 1A73 and 2GIV). Panels: (a) 1A73
Model 1, (b) 1A73 Model 2, (c) 2GIV Model 1, (d) 2GIV Model 2.

Figure 5-12. Zn-CCHH Cluster Models (PDB ID: 1A1F). Panels: (a) 1A1F Model 1,
(b) 1A1F Model 2.

Figure 5-13. Zn-CHHH Cluster Models (PDB ID: 1CK7). Panels: (a) 1CK7 Model 1,
(b) 1CK7 Model 2.

Figure 5-14. Zn-HHHH Cluster Models (PDB ID: 1PBO). Panels: (a) 1PBO Model 1,
(b) 1PBO Model 2.

[Figure 5-15 panels:
(a) The Correlation between Zn-Cys S Bond Lengths and Calculated Force Constants
through the Series CCCC, CCCH, CCHH, and CHHH; x-axis: Zn-S Calculated Bond Distance (Å)
(b) The Correlation between Zn-His@N Bond Lengths and Calculated Force Constants
through the Series CCCH, CCHH, CHHH, and HHHH; x-axis: Zn-N Calculated Bond Distance (Å)]

Figure 5-15. The Correlation between (a) Zn-Cys S and (b) Zn-His@N Bond Lengths and
Calculated Force Constants through the Series CCCC, CCCH, CCHH,
CHHH, and HHHH.

Figure 5-16. Zn-HHHO Cluster Models (PDB ID: 1CA2). Panels: (a) 1CA2 Model 1,
(b) 1CA2 Model 2.

Figure 5-17. Zn-HHOO Cluster Models (PDB ID: 1VLI). Panels: (a) 1VLI Model 1,
(b) 1VLI Model 2.








The final tetrahedral environment containing histidine residues and water molecules
was the HOOO cluster. The 1L3F PDB structure was used, with the MCPB models shown in
Fig. 5-18. The average Zn-O and Zn-N bond lengths are 2.01 Å and 1.926 Å, which are
shorter than those in the Zn-HHHO and Zn-HHOO clusters, in agreement with the
experimental means from the PDB.


Figure 5-18. Zn-HOOO Cluster Models (PDB ID: 1L3F). Panels: (a) 1L3F Model 1,
(b) 1L3F Model 2.

Table 5-25. Zn-HOOO Cluster Bond Lengths, r, in Å and Force Constants, Kr, in
kcal/(mol·Å²) (PDB ID: 1L3F).
Bond         r        Kr
Zn-NX        1.9256   325.449
Zn-OW        2.0170   179.519
Zn-OX        2.0022   189.759
Zn-OY        2.0136   181.502
Zn-O Mean    2.0100   183.593


The partial charges of the histidine residues, water molecules, and Zinc ion for the
Zn-HHHO, -HHOO, and -HOOO clusters are outlined in Table 5-27.
Two clusters containing histidine and aspartate residues were considered in this
study. The 2USN and 1UOA structures were chosen as characteristic of the Zn-HHHD
and Zn-HHDD (Fig. 5-19) environments. The PDB survey showed that the bond lengths









Table 5-26. Zn-HOOO Cluster Angles, θ, in Degrees and Force Constants, Kθ, in
kcal/(mol·rad²) (PDB ID: 1L3F).
       N/O-Zn-OW   N-Zn-OX    O/N-Zn-OY
θ
NX     114.389     118.388    118.918
OW                 103.216    97.8629
OX                            100.927
       CR-N-Zn     CC-N-Zn    HW-O-Zn
NX     125.909     126.608
OW                            127.562
OX                            127.232
OY                            124.457
Kθ
NX     3.83477     8.16145    4.17334
OW                 4.28235    3.60452
OX                            4.24390
       CR-N-Zn     CC-N-Zn    HW-O-Zn
NX     44.5591     51.1512
OW                            23.8887
OX                            24.9508
OY                            25.1823


of Zn-O bonds in H/D systems changed from 2.007 Å to 2.166 Å going from HHHD to
HHDD, and this trend is also seen in the calculated values for these clusters, as shown in
Table 5-28.

5.4 Conclusions

This research describes the design, development, and implementation of two programs
called pdbSearcher and MCPB. The former carries out metalloprotein data mining of the
Protein Data Bank. Results focused on Zinc metalloproteins, as a large number of proteins
contain this element. The majority of Zn metalloproteins are tetrahedrally coordinated
to histidine, cysteine, aspartate, or glutamate residues, or to water molecules. The
distribution of bond lengths between Zn and the donor atoms of these residues was
investigated, with some short Zn-S bonds highlighted which may be due to errors during
refinement.












Table 5-27. Histidine and Water's Partial Charges using ChgModB for the Zn-HHHO,
-HHOO, and -HOOO Clusters. Partial Charges are in electron units.

Residue         CB         CG          ND1           CD2
HIE             -0.007400  0.186800    -0.54320      -0.220700
HID             -0.046200  -0.026600   -0.38110      0.129200
HHHO1 (HID1)    0.272769   -0.046469   -0.054651     -0.122505
HHHO1 (HID2)    0.579686   -0.290892   -0.115057     -0.073113
HHHO1 (HIE)     0.563536   -0.126874   -0.196737     -0.099394
HHHO2 (HID1)    0.07163    0.005255    -0.085911     -0.103044
HHHO2 (HID2)    0.379846   -0.232467   [illegible]   -0.024617
HHHO2 (HIE)     0.123745   -0.116227   0.001013      -0.073261
HHOO (HID)      0.564561   -0.334037   [illegible]   0.144851
HOOO (HIE)      0.817801   -0.151877   -0.360409     -0.256153

Residue         CE1         NE2           HB2/3       HD1
HIE             0.163500    -0.279500     0.036700
HID             0.205700    -0.572700     0.040200    0.364900
HHHO1 (HID1)    -0.153009   -0.160702     -0.015389   0.341095
HHHO1 (HID2)    -0.057251   -0.113546     -0.078839   0.323076
HHHO1 (HIE)     -0.093972   [illegible]   -0.130733
HHHO2 (HID1)    -0.081199   -0.257224     0.038414    0.315667
HHHO2 (HID2)    -0.005112   -0.210826     -0.036823   0.297259
HHHO2 (HIE)     -0.071842   -0.204022     0.019225
HHOO (HID)      0.023257    -0.445271     -0.063809   0.315238
HOOO (HIE)      -0.210361   0.125524      -0.140428

Residue         HE1        HE2        HD2        ZN
HIE             0.143500   0.333900   0.186200
HID             0.139200              0.114700
HHHO1 (HID1)    0.187876              0.122148   0.702584
HHHO1 (HID2)    0.210086              0.142360
HHHO1 (HIE)     0.169654   0.345567   0.167258
HHHO2 (HID1)    0.186714              0.149779   0.674911
HHHO2 (HID2)    0.167127              0.102959
HHHO2 (HIE)     0.176463   0.348448   0.153575
HHOO (HID)      0.176795              0.096861   1.02705
HOOO (HIE)      0.191318   0.314799   0.264829   0.968391

Residue         O           H
WAT             -0.834      0.417
-OH
HHHO (WAT)      -0.765313   0.468035
HHHO (HO-)      -1.003960   0.411829
HHOO (WAT)      -0.742546   0.400146
HOOO (WAT)      -0.595794   0.435269











Table 5-28. Zn-HHHD and Zn-HHDD Cluster Bond Lengths, r, in Å and Force Constants,
Kr, in kcal/(mol·Å²) (PDB ID: 2USN and 1UOA).
Bond         r        Kr
2USN
Zn-NX        2.0729   176.828
Zn-NY        2.0269   208.052
Zn-NZ        2.0276   206.717
Zn-O2        1.9865   282.503
Zn-N Mean    2.0420   197.199
1UOA
Zn-NY        2.1247   133.766
Zn-NZ        2.1404   128.104
Zn-OA        2.1660   171.778
Zn-OB        2.0517   209.806
Zn-N Mean    2.1320   130.935
Zn-O Mean    2.1080   190.792



Table 5-29. Zn-HHHD Cluster Angles, θ, in Degrees and Force Constants, Kθ, in
kcal/(mol·rad²) (PDB ID: 2USN).
       N-Zn-NY       N-Zn-NZ    N-Zn-O     CR-N-Zn
θ
NX     105.439       105.366    95.8076    115.956
NY                   120.039    113.247    123.274
NZ                              113.189    123.299
       CV/CC-N-Zn    C-O-Zn
NX     136.979
NY     130.128
NZ     130.170
O2                   100.499
Kθ
NX     17.8462       17.4805    11.7020    47.2274
NY                   7.20713    13.3154    50.0689
NZ                              13.3171    49.0374
       CV/CC-N-Zn    C-O-Zn
NX     41.7749
NY     45.0768
NZ     43.9566
O2                   197.677

























Figure 5-19. Zn-HHHD and Zn-HHDD Cluster Models (PDB ID: 2USN and 1UOA). Panels:
(a) 2USN Model 1, (b) 2USN Model 2, (c) 1UOA Model 1, (d) 1UOA Model 2.

The MCPB program was used to build, prototype, and validate AMBER-like force
fields for metalloproteins, using the bonded plus electrostatics model, that can be added
to the AMBER suite of programs. MCPB was used to investigate the environmental
effects on bond lengths, angles, and bond and angle force constants using 10 unique
metal coordinations. These included Zn bound to CCCC, CCCH, CCHH, CHHH, HHHH,
HHHO, HHOO, HOOO, HHHD, and HHDD clusters. A Zinc AMBER Force Field









Table 5-30. Zn-HHDD Cluster Angles, θ, in Degrees and Force Constants, Kθ, in
kcal/(mol·rad²) (PDB ID: 1UOA).
       N-Zn-NZ    N-Zn-OA    N-Zn-OB    CR-N-Zn
θ
NY     96.9897    101.475    100.576    121.551
NZ                87.8377    99.8952    117.541
OA                           155.527
       CC-N-Zn    O2-C-O     C-O-Zn
NY     131.075
NZ     132.416
OA                120.909    88.3673
OB                121.439    94.2323
Kθ
NY     20.5679    9.54097    10.8215    54.9830
NZ                16.5019    14.8041    47.8802
OA                           8.80214
       CC-N-Zn    O2-C-O     C-O-Zn
NY     44.1986
NZ     61.5788
OA                148.620    187.032
OB                183.487    382.989


(ZAFF) library was created to store these FF parameters in a convenient way, allowing
later use with metalloproteins other than those used in the parameterization.
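
The idea of storing environment-specific parameters and assigning them once a metal site has been perceived can be sketched as a simple lookup table. The sketch below is not the ZAFF file format or the MTK++ API; the dictionary layout, the residue-to-code mapping, and the function names are hypothetical, and only a handful of the mean bond parameters from Tables 5-6, 5-11, 5-13, and 5-15 are included.

```python
# Hypothetical mapping from residue names to one-letter ligand codes.
RESIDUE_CODE = {"CYS": "C", "CYM": "C", "HIS": "H", "HID": "H", "HIE": "H",
                "HOH": "O", "WAT": "O", "ASP": "D"}

# (environment, bond type) -> (r_eq in Angstrom, K_r in kcal/(mol*Angstrom^2)).
ZN_BOND_PARAMS = {
    ("CCCC", "Zn-S"): (2.426, 100.677),
    ("CCHH", "Zn-S"): (2.305, 181.478),
    ("CCHH", "Zn-N"): (2.088, 147.126),
    ("CHHH", "Zn-S"): (2.262, 186.196),
    ("CHHH", "Zn-N"): (2.046, 180.437),
    ("HHHH", "Zn-N"): (2.010, 217.616),
}

_CODE_ORDER = "CHOD"  # ordering used in cluster names such as CCHH or HHHD


def environment(residues):
    """Build a cluster label such as 'CCHH' from the coordinating residues."""
    codes = [RESIDUE_CODE[name] for name in residues]
    return "".join(sorted(codes, key=_CODE_ORDER.index))


def bond_parameters(residues, bond):
    """Return (r_eq, K_r) for a stored environment, or None if not stored."""
    return ZN_BOND_PARAMS.get((environment(residues), bond))


if __name__ == "__main__":
    site = ["HIS", "CYS", "HIS", "CYS"]
    print(environment(site), bond_parameters(site, "Zn-S"))
```

Returning None for an environment that has not been stored corresponds to the branch of Fig. 5-9 in which a new parameterization has to be carried out.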

This work can have many uses in the future. In particular, the equilibrium bond lengths
and angles can be used to aid the refinement of Zn metalloprotein X-ray crystal structures.
The MCPB program also allows for rapid development, limited by the cost of the ab initio
or DFT calculations, of FF parameters for metalloproteins, which could have many uses,
for example in drug design projects where the target structure contains a metal ion. This
program also provides a platform on which non-expert users can develop metalloprotein FF
parameters, which until now was not available.


















Table 5-31. Histidine and Aspartate Residue Charges using ChgModB for the Zn-HHHD
and -HHDD Clusters. Partial Charges are in electron units.

Residue        CB          CG          ND1           CD2
HIE            -0.007400   0.186800    -0.54320      -0.220700
HID            -0.046200   -0.026600   -0.38110      0.129200
HHHD (HID1)    0.042591    -0.024504   [illegible]   -0.137672
HHHD (HID2)    0.160366    -0.006538   -0.161147     -0.192906
HHHD (HIE)     0.374184    -0.005512   [illegible]   -0.245743
HHDD (HIE)     0.028284    0.076696    -0.306581     -0.169575

Residue        CE1           NE2         HB2/3       HD1
HIE            0.163500      -0.279500   0.036700
HID            0.205700      -0.572700   0.040200    0.364900
HHHD (HID1)    -0.147686     -0.122517   0.047948    0.313389
HHHD (HID2)    [illegible]   -0.063515   0.007072    0.343690
HHHD (HIE)     [illegible]   -0.038235   -0.074862
HHDD (HIE)     0.030139      -0.137031   0.039508

Residue        HE1        HE2        HD2        ZN
HIE            0.143500   0.333900   0.186200
HID            0.139200              0.114700
HHHD (HID1)    0.193188              0.193977   0.431980
HHHD (HID2)    0.179092              0.195708
HHHD (HIE)     0.179092   0.327123   0.195982
HHDD (HIE)     0.124130   0.309302   0.165779   0.919685

Residue        CB          CG         OD1/2       HB2/3
ASP            -0.030300   0.799400   -0.801400   -0.012200
HHHD           0.441335    0.373495   -0.493992   -0.112244
HHDD           0.202830    0.429585   -0.607229   0.050221









CHAPTER 6
CONCLUSIONS

This chapter provides a synopsis of the research presented in this dissertation, where
computational chemistry tools were successfully developed and applied in areas of drug
design and metalloprotein modeling.

Chapter two outlined the drug discovery process from the point of view of a
computational chemist. The most common methods used to predict the binding free
energy between receptors and their ligands were summarized, including ligand-based and
receptor-based techniques. Statistical and graph theory methods were illustrated and the
use of these tools in chemistry was discussed. A general introduction to metalloproteins
was also given.

The third chapter described the design and development of a computational chemistry
package called MTK++. This was designed as a general purpose molecular modeling suite
for use in drug design and metalloprotein chemistry studies. This work was essential for
the research in the entire thesis to be carried out. MTK++ also provides a platform for
further development of novel algorithms for the modeling of small molecules, proteins, and,
most importantly, metalloproteins.

Chapter four details the development and large scale validation of a semiflexible
alignment approach using a semiempirical scoring function called CuTieP for in silico drug
design. Results were comparable to those of empirical alignment approaches, where ligand
binding geometries within protein active sites were predicted with an accuracy of around
50%. CuTieP is a physics-based method which has the potential to be improved in terms of
speed and accuracy while avoiding the pitfalls of parameter transferability.

The final research chapter described the design, development, and implementation
of two programs called pdbSearcher and MCPB for the study of metalloproteins. These
programs were used to data mine the Protein Data Bank and develop molecular mechanics
force fields for Zinc metalloproteins. A total of 10 unique tetrahedral Zn force fields were
built and the properties of these parameters were elucidated. These parameters may
be used to address important structure, function, and dynamics questions that are not
currently attainable using QM and QM/MM based methods.









APPENDIX A
ALGORITHMS

A.1 Subgraph Isomorphism Algorithm

Algorithm A.1: Subgraph Isomorphism Algorithm
Data: molecule, fragment
Result: mapping if fragment found in molecule
begin
int Pa;
int Pb;
array A[Pa,Pa];
array B[Pb,Pb];
array M[Pa, Pb];
bool isomorphism = false;
Ullmann(l, M);
end



Algorithm A.2: Ullmann Function
Data: current atom index, match matrix
Result: mapping if fragment found in molecule
begin
    array M1 = M;
    bool mismatch;
    for all unique mappings of atom d do
        choose new unique mapping for query atom d;
        update M accordingly;
        refine(M, mismatch);
        if !mismatch then
            if d == Pa then
                isomorphism = true;
                store M;
            else
                Ullmann(d+1, M);
            end
        else
            M = M1;
        end
    end
end










Algorithm A.3: Refine Function
Data: match matrix, boolean mismatch
Result: mapping if fragment found in molecule
begin
    mismatch = false;
    bool change = true;
    while change and not mismatch do
        change = false;
        for i = 1 to Pa do
            for j = 1 to Pb do
                if M[i][j] then
                    assign mij;
                    change = change or not mij;
                end
            end
        end
        assign mismatch;
    end
end
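
Algorithms A.1 to A.3 can be illustrated with a minimal Python sketch. It is not the MTK++ implementation: atom and bond types are ignored, the graphs are plain 0/1 adjacency matrices, and the initial match matrix is built from a degree test only, all of which are assumptions made for illustration. The names A, B and M follow the pseudocode; `ullmann`, `refine` and `search` are hypothetical helper names.

```python
def refine(M, A, B):
    """Ullmann refinement of the match matrix M (modified in place).

    A candidate (i, j) is removed when some neighbour x of query atom i has
    no remaining candidate y that is bonded to target atom j.
    Returns False (mismatch) if a query atom is left without candidates.
    """
    pa, pb = len(A), len(B)
    changed = True
    while changed:
        changed = False
        for i in range(pa):
            for j in range(pb):
                if M[i][j] and any(
                    A[i][x] and not any(M[x][y] and B[j][y] for y in range(pb))
                    for x in range(pa)
                ):
                    M[i][j] = 0
                    changed = True
            if not any(M[i]):
                return False
    return True


def ullmann(A, B):
    """Yield every mapping (query atom index -> target atom index) of A into B."""
    pa, pb = len(A), len(B)
    deg_a = [sum(row) for row in A]
    deg_b = [sum(row) for row in B]
    # Initial candidates: a target atom needs at least as many bonds as the query atom.
    M0 = [[int(deg_b[j] >= deg_a[i]) for j in range(pb)] for i in range(pa)]

    def search(d, M, used):
        if d == pa:                       # all query atoms mapped
            yield [row.index(1) for row in M]
            return
        for j in range(pb):
            if M[d][j] and j not in used:
                M1 = [row[:] for row in M]             # copy, so M survives backtracking
                M1[d] = [int(c == j) for c in range(pb)]
                if refine(M1, A, B):
                    yield from search(d + 1, M1, used | {j})

    yield from search(0, M0, frozenset())


# Fragment: 3-atom chain; molecule: 4-atom chain.
fragment = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
molecule = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
mappings = sorted(ullmann(fragment, molecule))
```

Copying the match matrix before each trial assignment plays the role of `M1` in Algorithm A.2: a failed branch never has to undo its changes.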










A.2 Maximum Common Pharmacophore
Algorithm A.4: Find Maximum Common Pharmacophore (MCP)
Data: Two Molecules
Result: Maximum Common Pharmacophore Between the Two Molecules
begin
    Generate Feature Correspondence Matrix between the two molecules;
    Get Threshold Feature Score, TFS;
    bestClqScore <- 0; bestClqSize <- 0;
    for i in CorrespondenceMatrix do
        getPair <- 1; curClq <- i; curClqScore <- 0;
        while getPair do
            getPair <- 0; maxScore <- 0; pair <- 0;
            for j in CorrespondenceMatrix do
                testScore <- 0;
                for k in curClq do
                    jkScore <- exp(-((d_j - d_k)/d_m)^2);
                    if jkScore > TFS then
                        testScore <- testScore + jkScore;
                    else
                        break;
                    end
                end
                if testScore > maxScore then
                    pair <- j;
                    maxScore <- testScore;
                end
            end
            if pair then
                Add pair to curClq;
                curClqScore <- curClqScore + maxScore;
                getPair <- 1;
            end
        end
        store <- 0;
        if curClqScore > bestClqScore + 0.1 then
            store <- 1;
        end
        if curClqScore > bestClqScore - 0.1 then
            if curClqSize > bestClqSize then
                store <- 1;
            end
            if curClqSize == bestClqSize then
                if curClqDist > bestClqDist + 0.1 then
                    store <- 1;
                end
            end
        end
        if store then
            bestClqScore <- curClqScore; bestClqSize <- curClqSize;
            bestClqDist <- curClqDist; bestClq <- curClq;
        end
    end
    return bestClq
end









APPENDIX B
AMBER GRADIENTS

The uncompressed AMBER energy function has the following form:


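\[
\begin{aligned}
E_{total} ={}& \sum_{bonds} K_r (r_{ij} - r_{eq})^2
 + \sum_{angles} K_\theta (\theta_{ijk} - \theta_{eq})^2
 + \sum_{dihedrals} \sum_{n} \frac{V_n}{2} \left[ 1 + \cos(n\phi_{ijkl} - \gamma) \right] \\
&+ \sum_{i<j} \epsilon_{ij} \left[ \left( \frac{r^*_{ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{r^*_{ij}}{r_{ij}} \right)^{6} \right]
 + \frac{1}{VDW} \sum_{i<j}^{1\text{-}4} \epsilon_{ij} \left[ \left( \frac{r^*_{ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{r^*_{ij}}{r_{ij}} \right)^{6} \right] \\
&+ \sum_{i<j} \frac{q_i q_j}{\epsilon r_{ij}}
 + \frac{1}{ELE} \sum_{i<j}^{1\text{-}4} \frac{q_i q_j}{\epsilon r_{ij}}
\end{aligned}
\tag{B-1}
\]

where $\epsilon$ is the dielectric constant and VDW and ELE are the scaling factors for 1-4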

non-bonded pairs (set to values of 2 and 1.2 respectively). All other terms are defined in

section 2.3.

Papers from Blondel and Karplus [319], Swope and Ferguson [320], and Tuzun, Noid

and Sumpter [321] in addition to the molecular modeling book by Leach [81] were used as

reference in the derivation of the derivatives or gradients of the AMBER function.

B.1 Vector Math and Derivatives

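The distance, $r_{ij}$, between two atoms, i and j, can be defined as:

\[ \mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j \tag{B-2} \]
\[ |\mathbf{r}_{ij}| = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2} \tag{B-3} \]
\[ = \left( (x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2 \right)^{1/2} \tag{B-4} \]
\[ = \sqrt{\mathbf{r}_{ij} \cdot \mathbf{r}_{ij}} \tag{B-5} \]
\[ \hat{\mathbf{r}}_{ij} = \frac{\mathbf{r}_{ij}}{|\mathbf{r}_{ij}|} \tag{B-6} \]

where $\mathbf{r}_i$ is the position of atom i and $x_i$ is the x component of the position vector of atom i.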

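In Cartesian space the differential operator is defined as:

\[ \nabla_i = \hat{\mathbf{x}} \frac{\partial}{\partial x_i} + \hat{\mathbf{y}} \frac{\partial}{\partial y_i} + \hat{\mathbf{z}} \frac{\partial}{\partial z_i} \tag{B-7} \]

where $\hat{\mathbf{x}}$ is a unit vector parallel to the x axis of the reference coordinate system.

We want to calculate $\partial E/\partial x_i$, where $x_i$ is the x coordinate of atom i. It is best to calculate $\partial E/\partial A$, where A is the internal coordinate, and then apply the chain rule:

\[ \frac{\partial E}{\partial x_i} = \frac{\partial E}{\partial A} \frac{\partial A}{\partial x_i} \tag{B-8} \]

\[ \frac{d\theta}{d\cos\theta} = \left( \frac{d\cos\theta}{d\theta} \right)^{-1} \tag{B-9} \]
\[ = -\frac{1}{\sin\theta} \tag{B-10} \]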

B.2 AMBER First Derivatives

B.2.1 Bond




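\[ E_{bond} = K_r (r_{ij} - r_{eq})^2 \tag{B-11} \]

where $K_r$ is the bond force constant, $r_{ij}$ is the bond length, and $r_{eq}$ is the standard bond length.

\[ \nabla_i E = \frac{\partial E}{\partial r} \nabla_i r \tag{B-12} \]
\[ = 2 K_r (r_{ij} - r_{eq}) \frac{\mathbf{r}_{ij}}{|\mathbf{r}_{ij}|} \tag{B-13} \]

\[ \frac{\partial r}{\partial x_i} = \frac{1}{2} \left( (x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2 \right)^{-1/2} \, 2 (x_i - x_j) \tag{B-14} \]
\[ = \frac{x_i - x_j}{r} \tag{B-15} \]
\[ \nabla_i r = \frac{\mathbf{r}_{ij}}{|\mathbf{r}_{ij}|} \tag{B-16} \]
\[ \nabla_j r = -\nabla_i r \tag{B-17} \]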
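
Equations B-11 to B-17 can be checked numerically with a short Python sketch. The function name `bond_energy_gradient` and the numerical values are illustrative assumptions and are not taken from MTK++.

```python
import math


def bond_energy_gradient(ri, rj, k_r, r_eq):
    """Harmonic bond energy (B-11) and its gradients on atoms i and j (B-13, B-17)."""
    rij = [a - b for a, b in zip(ri, rj)]            # r_ij = r_i - r_j  (B-2)
    r = math.sqrt(sum(c * c for c in rij))           # |r_ij|            (B-3)
    energy = k_r * (r - r_eq) ** 2
    scale = 2.0 * k_r * (r - r_eq) / r               # dE/dr divided by |r_ij|
    grad_i = [scale * c for c in rij]                # (dE/dr) * r_ij / |r_ij|
    grad_j = [-g for g in grad_i]                    # grad_j r = -grad_i r
    return energy, grad_i, grad_j


# A bond stretched to 2.0 along x with r_eq = 1.0 and K_r = 1.0.
energy, grad_i, grad_j = bond_energy_gradient((2.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0, 1.0)
```

The force on each atom is the negative of the returned gradient, so the two atoms of a stretched bond are pulled toward each other.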









B.2.2 Angle


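\[ E_{angle} = K_\theta (\theta - \theta_{eq})^2 \tag{B-18} \]

where $K_\theta$ is the angle force constant, $\theta$ is the angle ($0 < \theta < \pi$), and $\theta_{eq}$ is the standard angle.

\[ \theta = \arccos \left( \frac{\mathbf{r}_{ij} \cdot \mathbf{r}_{kj}}{|\mathbf{r}_{ij}| |\mathbf{r}_{kj}|} \right) \tag{B-19} \]
\[ \cos\theta = \frac{\mathbf{r}_{ij} \cdot \mathbf{r}_{kj}}{|\mathbf{r}_{ij}| |\mathbf{r}_{kj}|} \tag{B-20} \]

\[ \nabla_i E = \frac{\partial E}{\partial \theta} \frac{\partial \theta}{\partial \cos\theta} \nabla_i \cos\theta \tag{B-21} \]
\[ = 2 K_\theta (\theta - \theta_{eq}) \left( -\frac{1}{\sin\theta} \right) \nabla_i \cos\theta \tag{B-22} \]
\[ = -\frac{2 K_\theta (\theta - \theta_{eq})}{\sin\theta} \nabla_i \cos\theta \tag{B-23} \]

How to determine $\nabla_i \cos\theta$?
Because $\cos\theta$ is a function of $\mathbf{r}_{ij}$ and $\mathbf{r}_{kj}$, both of which are functions of $\mathbf{r}_i$, $\mathbf{r}_j$, and $\mathbf{r}_k$, the chain rule is needed:

\[ \frac{\partial \cos\theta}{\partial x_i} = \hat{\mathbf{x}} \cdot \nabla_i \cos\theta \tag{B-24} \]
\[
\begin{aligned}
={}& \frac{\partial \cos\theta}{\partial x_{ij}} \frac{\partial x_{ij}}{\partial x_i}
 + \frac{\partial \cos\theta}{\partial y_{ij}} \frac{\partial y_{ij}}{\partial x_i}
 + \frac{\partial \cos\theta}{\partial z_{ij}} \frac{\partial z_{ij}}{\partial x_i} \\
&+ \frac{\partial \cos\theta}{\partial x_{kj}} \frac{\partial x_{kj}}{\partial x_i}
 + \frac{\partial \cos\theta}{\partial y_{kj}} \frac{\partial y_{kj}}{\partial x_i}
 + \frac{\partial \cos\theta}{\partial z_{kj}} \frac{\partial z_{kj}}{\partial x_i}
\end{aligned}
\tag{B-25}
\]
\[ = \frac{\partial \cos\theta}{\partial x_{ij}} \tag{B-26} \]

There is a total of 9 such expressions similar to Eq. B-26, which lead to the following:

\[ \nabla_i \cos\theta = \hat{\mathbf{x}} \frac{\partial \cos\theta}{\partial x_{ij}} + \hat{\mathbf{y}} \frac{\partial \cos\theta}{\partial y_{ij}} + \hat{\mathbf{z}} \frac{\partial \cos\theta}{\partial z_{ij}} \tag{B-27} \]
\[ \nabla_j \cos\theta = -\left[ \hat{\mathbf{x}} \left( \frac{\partial \cos\theta}{\partial x_{ij}} + \frac{\partial \cos\theta}{\partial x_{kj}} \right) + \hat{\mathbf{y}} \left( \frac{\partial \cos\theta}{\partial y_{ij}} + \frac{\partial \cos\theta}{\partial y_{kj}} \right) + \hat{\mathbf{z}} \left( \frac{\partial \cos\theta}{\partial z_{ij}} + \frac{\partial \cos\theta}{\partial z_{kj}} \right) \right] \tag{B-28} \]
\[ \nabla_k \cos\theta = \hat{\mathbf{x}} \frac{\partial \cos\theta}{\partial x_{kj}} + \hat{\mathbf{y}} \frac{\partial \cos\theta}{\partial y_{kj}} + \hat{\mathbf{z}} \frac{\partial \cos\theta}{\partial z_{kj}} \tag{B-29} \]

Finally, how do you determine $\partial \cos\theta / \partial x_{ij}$?

\[ \frac{\partial \cos\theta}{\partial x_{ij}} = \frac{ |\mathbf{r}_{ij}| |\mathbf{r}_{kj}| \dfrac{\partial (\mathbf{r}_{ij} \cdot \mathbf{r}_{kj})}{\partial x_{ij}} - (\mathbf{r}_{ij} \cdot \mathbf{r}_{kj}) \dfrac{\partial (|\mathbf{r}_{ij}| |\mathbf{r}_{kj}|)}{\partial x_{ij}} }{ |\mathbf{r}_{ij}|^2 |\mathbf{r}_{kj}|^2 } \tag{B-30} \]
\[ \frac{\partial \cos\theta}{\partial x_{ij}} = \frac{1}{|\mathbf{r}_{ij}|} \left[ \frac{x_{kj}}{|\mathbf{r}_{kj}|} - \cos\theta \, \frac{x_{ij}}{|\mathbf{r}_{ij}|} \right] \tag{B-31} \]
\[ \frac{\partial \cos\theta}{\partial x_{kj}} = \frac{1}{|\mathbf{r}_{kj}|} \left[ \frac{x_{ij}}{|\mathbf{r}_{ij}|} - \cos\theta \, \frac{x_{kj}}{|\mathbf{r}_{kj}|} \right] \tag{B-32} \]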

B.2.3 Dihedral

\[ E_{dihedral} = K_\phi \left[ 1 + \cos(n\phi - \gamma) \right] \tag{B-33} \]

where $K_\phi$ is the torsional constant, $\phi$ is the dihedral angle ($-\pi < \phi < \pi$), and $\gamma$ is the standard dihedral angle.

\[ \cos\phi = \frac{\mathbf{t} \cdot \mathbf{u}}{|\mathbf{t}| |\mathbf{u}|} \tag{B-34} \]

where:

\[ \mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j, \qquad \mathbf{r}_{kj} = \mathbf{r}_k - \mathbf{r}_j, \qquad \mathbf{r}_{lk} = \mathbf{r}_l - \mathbf{r}_k \]
\[ \mathbf{t} = \mathbf{r}_{ij} \times \mathbf{r}_{kj} \tag{B-35} \]
\[ \mathbf{u} = \mathbf{r}_{lk} \times \mathbf{r}_{kj} \tag{B-36} \]




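\[ \nabla_i E = \frac{\partial E}{\partial \phi} \frac{\partial \phi}{\partial \cos\phi} \nabla_i \cos\phi \tag{B-37} \]
\[ = n K_\phi \sin(n\phi - \gamma) \frac{1}{\sin\phi} \nabla_i \cos\phi \tag{B-38} \]

How to determine $\nabla_i \cos\phi$?

Because $\cos\phi$ is a function of $\mathbf{t}$ and $\mathbf{u}$, both of which are functions of $\mathbf{r}_{ij}$, $\mathbf{r}_{kj}$, and $\mathbf{r}_{lk}$, which are in turn functions of $\mathbf{r}_i$, $\mathbf{r}_j$, $\mathbf{r}_k$, and $\mathbf{r}_l$, the chain rule is needed:

\[ \frac{\partial \cos\phi}{\partial x_i} = \hat{\mathbf{x}} \cdot \nabla_i \cos\phi \tag{B-39} \]
\[
\begin{aligned}
={}& \frac{\partial \cos\phi}{\partial t_x} \frac{\partial t_x}{\partial x_i}
 + \frac{\partial \cos\phi}{\partial t_y} \frac{\partial t_y}{\partial x_i}
 + \frac{\partial \cos\phi}{\partial t_z} \frac{\partial t_z}{\partial x_i} \\
&+ \frac{\partial \cos\phi}{\partial u_x} \frac{\partial u_x}{\partial x_i}
 + \frac{\partial \cos\phi}{\partial u_y} \frac{\partial u_y}{\partial x_i}
 + \frac{\partial \cos\phi}{\partial u_z} \frac{\partial u_z}{\partial x_i}
\end{aligned}
\tag{B-40}
\]
\[ = \frac{\partial \cos\phi}{\partial t_y} (-z_{kj}) + \frac{\partial \cos\phi}{\partial t_z} (y_{kj}) \tag{B-41} \]

where:

\[ \mathbf{t} = (\mathbf{r}_i - \mathbf{r}_j) \times (\mathbf{r}_k - \mathbf{r}_j) \tag{B-42} \]
\[ = \hat{\mathbf{x}} (y_{ij} z_{kj} - y_{kj} z_{ij}) + \hat{\mathbf{y}} (z_{ij} x_{kj} - z_{kj} x_{ij}) + \hat{\mathbf{z}} (x_{ij} y_{kj} - x_{kj} y_{ij}) \tag{B-43} \]
\[ t_x = y_{ij} z_{kj} - y_{kj} z_{ij} \tag{B-44} \]
\[ \frac{\partial t_x}{\partial y_i} = \frac{\partial}{\partial y_i} (y_{ij} z_{kj} - y_{kj} z_{ij}) \tag{B-45} \]
\[ = z_{kj} \tag{B-46} \]

There is a total of 72 such expressions similar to Eq. B-46.
From the angle derivation, $\partial \cos\phi / \partial t_x$ is:

\[ \frac{\partial \cos\phi}{\partial t_x} = \frac{1}{|\mathbf{t}|} \left[ \frac{u_x}{|\mathbf{u}|} - \cos\phi \, \frac{t_x}{|\mathbf{t}|} \right] \tag{B-47} \]

However, there is a problem with this result due to the $1/\sin\phi$ in Eq. B-38. There are
singularities when $\phi = 0, \pi$. Therefore, $\partial \cos\phi / \partial t_x$ needs to be rewritten.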
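
The chain rule of Eqs. B-39 to B-47 can be checked with a small Python sketch. It evaluates only $\nabla_i \cos\phi$ in the direct form, before the singularity-free rewriting, using the fact that the three components of the type shown in Eq. B-41 are the components of $\mathbf{r}_{kj} \times \partial \cos\phi / \partial \mathbf{t}$. The helper names and the test geometry are assumptions for illustration.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def cos_phi(ri, rj, rk, rl):
    """cos(phi) = t.u / (|t||u|) with t = r_ij x r_kj and u = r_lk x r_kj (B-34 to B-36)."""
    rij, rkj, rlk = _sub(ri, rj), _sub(rk, rj), _sub(rl, rk)
    t, u = _cross(rij, rkj), _cross(rlk, rkj)
    return _dot(t, u) / math.sqrt(_dot(t, t) * _dot(u, u))


def grad_i_cos_phi(ri, rj, rk, rl):
    """Gradient of cos(phi) with respect to the position of atom i."""
    rij, rkj, rlk = _sub(ri, rj), _sub(rk, rj), _sub(rl, rk)
    t, u = _cross(rij, rkj), _cross(rlk, rkj)
    nt, nu = math.sqrt(_dot(t, t)), math.sqrt(_dot(u, u))
    c = _dot(t, u) / (nt * nu)
    # d cos(phi) / d t, component by component (B-47)
    g = [(u[a] / nu - c * t[a] / nt) / nt for a in range(3)]
    # x component: g_y * (-z_kj) + g_z * (y_kj), as in B-41; y and z follow by analogy
    return _cross(rkj, g)


atoms = ([1.0, 0.2, 0.1], [0.0, 0.0, 0.0], [0.1, 1.4, 0.0], [-0.5, 1.6, 1.0])
grad = grad_i_cos_phi(*atoms)
```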










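\[ \nabla_t \cos\phi = \frac{\partial \cos\phi}{\partial \mathbf{t}} \tag{B-48} \]
\[ = \frac{\partial}{\partial \mathbf{t}} \left( \frac{\mathbf{t} \cdot \mathbf{u}}{|\mathbf{t}| |\mathbf{u}|} \right) \tag{B-49} \]
\[ = \frac{1}{|\mathbf{t}|^2 |\mathbf{u}|^2} \left[ |\mathbf{t}| |\mathbf{u}| \, \mathbf{u} - (\mathbf{t} \cdot \mathbf{u}) |\mathbf{u}| \, \hat{\mathbf{t}} \right] \tag{B-50} \]
\[ = \frac{1}{|\mathbf{t}|} \left[ \hat{\mathbf{u}} - \cos\phi \, \hat{\mathbf{t}} \right] \tag{B-51} \]
\[ = \frac{1}{|\mathbf{t}|} \left[ \hat{\mathbf{u}} (\hat{\mathbf{t}} \cdot \hat{\mathbf{t}}) - \hat{\mathbf{t}} (\hat{\mathbf{t}} \cdot \hat{\mathbf{u}}) \right] \tag{B-52} \]
\[ = \frac{1}{|\mathbf{t}|} \left[ \hat{\mathbf{t}} \times (\hat{\mathbf{u}} \times \hat{\mathbf{t}}) \right] \tag{B-53} \]
\[ = \frac{1}{|\mathbf{t}|} \left[ \hat{\mathbf{t}} \times (-\sin\phi \, \hat{\mathbf{r}}_{kj}) \right] \tag{B-54} \]
\[ = -\frac{\sin\phi}{|\mathbf{t}|} \left[ \hat{\mathbf{t}} \times \hat{\mathbf{r}}_{kj} \right] \tag{B-55} \]
\[ \nabla_u \cos\phi = \frac{1}{|\mathbf{u}|} \left[ \hat{\mathbf{u}} \times (\sin\phi \, \hat{\mathbf{r}}_{kj}) \right] \tag{B-56} \]
\[ = \frac{\sin\phi}{|\mathbf{u}|} \left[ \hat{\mathbf{u}} \times \hat{\mathbf{r}}_{kj} \right] \tag{B-57} \]

Finally,

\[
\begin{aligned}
\nabla_i \cos\phi ={}& \hat{\mathbf{x}} \left[ \frac{\partial \cos\phi}{\partial t_y} (-z_{kj}) + \frac{\partial \cos\phi}{\partial t_z} (y_{kj}) \right]
 + \hat{\mathbf{y}} \left[ \frac{\partial \cos\phi}{\partial t_x} (z_{kj}) + \frac{\partial \cos\phi}{\partial t_z} (-x_{kj}) \right] \\
&+ \hat{\mathbf{z}} \left[ \frac{\partial \cos\phi}{\partial t_x} (-y_{kj}) + \frac{\partial \cos\phi}{\partial t_y} (x_{kj}) \right]
\end{aligned}
\tag{B-58}
\]
\[
\begin{aligned}
\nabla_j \cos\phi ={}& \hat{\mathbf{x}} \left[ \frac{\partial \cos\phi}{\partial t_y} (z_{kj} - z_{ij}) + \frac{\partial \cos\phi}{\partial t_z} (y_{ij} - y_{kj}) + \frac{\partial \cos\phi}{\partial u_y} (-z_{lk}) + \frac{\partial \cos\phi}{\partial u_z} (y_{lk}) \right] \\
&+ \hat{\mathbf{y}} \left[ \frac{\partial \cos\phi}{\partial t_x} (z_{ij} - z_{kj}) + \frac{\partial \cos\phi}{\partial t_z} (x_{kj} - x_{ij}) + \frac{\partial \cos\phi}{\partial u_x} (z_{lk}) + \frac{\partial \cos\phi}{\partial u_z} (-x_{lk}) \right] \\
&+ \hat{\mathbf{z}} \left[ \frac{\partial \cos\phi}{\partial t_x} (y_{kj} - y_{ij}) + \frac{\partial \cos\phi}{\partial t_y} (x_{ij} - x_{kj}) + \frac{\partial \cos\phi}{\partial u_x} (-y_{lk}) + \frac{\partial \cos\phi}{\partial u_y} (x_{lk}) \right]
\end{aligned}
\tag{B-59}
\]
\[
\begin{aligned}
\nabla_k \cos\phi ={}& \hat{\mathbf{x}} \left[ \frac{\partial \cos\phi}{\partial t_y} (z_{ij}) + \frac{\partial \cos\phi}{\partial t_z} (-y_{ij}) + \frac{\partial \cos\phi}{\partial u_y} (z_{lk} + z_{kj}) + \frac{\partial \cos\phi}{\partial u_z} (-y_{lk} - y_{kj}) \right] \\
&+ \hat{\mathbf{y}} \left[ \frac{\partial \cos\phi}{\partial t_x} (-z_{ij}) + \frac{\partial \cos\phi}{\partial t_z} (x_{ij}) + \frac{\partial \cos\phi}{\partial u_x} (-z_{lk} - z_{kj}) + \frac{\partial \cos\phi}{\partial u_z} (x_{lk} + x_{kj}) \right] \\
&+ \hat{\mathbf{z}} \left[ \frac{\partial \cos\phi}{\partial t_x} (y_{ij}) + \frac{\partial \cos\phi}{\partial t_y} (-x_{ij}) + \frac{\partial \cos\phi}{\partial u_x} (y_{lk} + y_{kj}) + \frac{\partial \cos\phi}{\partial u_y} (-x_{lk} - x_{kj}) \right]
\end{aligned}
\tag{B-60}
\]
\[
\begin{aligned}
\nabla_l \cos\phi ={}& \hat{\mathbf{x}} \left[ \frac{\partial \cos\phi}{\partial u_y} (-z_{kj}) + \frac{\partial \cos\phi}{\partial u_z} (y_{kj}) \right]
 + \hat{\mathbf{y}} \left[ \frac{\partial \cos\phi}{\partial u_x} (z_{kj}) + \frac{\partial \cos\phi}{\partial u_z} (-x_{kj}) \right] \\
&+ \hat{\mathbf{z}} \left[ \frac{\partial \cos\phi}{\partial u_x} (-y_{kj}) + \frac{\partial \cos\phi}{\partial u_y} (x_{kj}) \right]
\end{aligned}
\tag{B-61}
\]

B.2.4 Electrostatic

\[ E_{ele} = \frac{q_i q_j}{\epsilon r_{ij}} \tag{B-62} \]
\[ \frac{\partial E}{\partial r_{ij}} = -\frac{q_i q_j}{\epsilon r_{ij}^2} \tag{B-63} \]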









B.2.5 van der Waals


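\[ E_{vdw} = \epsilon_{ij} \left[ \left( \frac{r^*_{ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{r^*_{ij}}{r_{ij}} \right)^{6} \right] \tag{B-64} \]
\[ = \epsilon_{ij} \left[ r^{*12}_{ij} r_{ij}^{-12} - 2 r^{*6}_{ij} r_{ij}^{-6} \right] \tag{B-65} \]

\[ \frac{\partial E}{\partial r_{ij}} = \frac{\partial}{\partial r_{ij}} \, \epsilon_{ij} \left[ r^{*12}_{ij} r_{ij}^{-12} - 2 r^{*6}_{ij} r_{ij}^{-6} \right] \tag{B-66} \]
\[ = \epsilon_{ij} \left[ -12 \frac{r^{*12}_{ij}}{r_{ij}^{13}} + 12 \frac{r^{*6}_{ij}}{r_{ij}^{7}} \right] \tag{B-67} \]
\[ = -\frac{12 \epsilon_{ij}}{r_{ij}} \left[ \frac{r^{*12}_{ij}}{r_{ij}^{12}} - \frac{r^{*6}_{ij}}{r_{ij}^{6}} \right] \tag{B-68} \]
\[ = -12 \epsilon_{ij} \left[ D^2 - D \right] / r_{ij} \tag{B-69} \]
\[ D = \left( \frac{r^*_{ij}}{r_{ij}} \right)^{6} \tag{B-70} \]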
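
A minimal Python sketch of Eqs. B-64 to B-70 follows. It combines $\partial E / \partial r_{ij}$ from Eq. B-69 with $\nabla_i r = \mathbf{r}_{ij} / |\mathbf{r}_{ij}|$ from the bond derivation to obtain the Cartesian gradient. The function name and the parameter values are illustrative assumptions, and no 1-4 scaling is applied.

```python
import math


def vdw_energy_gradient(ri, rj, eps, r_star):
    """12-6 van der Waals energy (B-64) and its gradient on atom i."""
    rij = [a - b for a, b in zip(ri, rj)]
    r = math.sqrt(sum(c * c for c in rij))
    d = (r_star / r) ** 6                          # D               (B-70)
    energy = eps * (d * d - 2.0 * d)               # eps [D^2 - 2D]  (B-64)
    de_dr = -12.0 * eps * (d * d - d) / r          # dE/dr           (B-69)
    grad_i = [de_dr * c / r for c in rij]          # (dE/dr) * r_ij / |r_ij|
    return energy, grad_i


# At r = r*, the energy is -eps and the gradient vanishes.
e_min, g_min = vdw_energy_gradient((3.4, 0.0, 0.0), (0.0, 0.0, 0.0), 0.1, 3.4)
```

The gradient on atom j is the negative of the one returned for atom i, as for the bond term.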










APPENDIX C
FRAGMENT LIBRARY
C.1 Terminal Fragments

Table C-1. Terminal Fragments.

Terminal fragments have one connection point.
[Structure drawings not recoverable from the source.]

Name                3L     8L
Methyl              CH3    TF000C
Ethyl               ETH    TF000ETH
Propyl              PYL    TF000PYL
Isopropyl           IPL    TF000IPL
n-butyl             NBL    TF000NBL
sec-butyl           SBL    TF000SBL
Isobutyl            IBL    TF000IBL
tert-butyl          TBL    TF000TBL
Vinyl               AK1    TF000AK1
Alkynyl             TLK    TF000TLK
Primary amine       PAM    TF000PAM
NH3 (R-NH3+)        NH3    TF000NH3
NHMe                NHM    TF000NHM
NHEt                NHE    TF000NHE
NHPr                NHP    TF000NHP
NHi-Pr              NHI    TF000NHI
NMe2                NM2    TF000NM2
Primary aldimine    PAD    TF000PAD
Nitrile             NIT    TF000NIT

Continued on next page.










Table C-1.


Name

Isonitrile

Azide

Diazonium


(continued)

Structure

10 R-N=C

11 R-N=N=N

12 R-N-N
H H


13 R
1 R1 OH

2 R-O'H

3 R

4 R/O

5 R0-O



0
6 R


7 R1 H
0
8 R OH
0
9 RO
0
R OH
10 OH

11 R('OH
0
12 R F
0
13 R /C
0
14 R? Br
0
15 RI/
1 R-SH


3L

INI

AZD

DZM



AMD
POL

ARA

OME

OET

OPR


OIP


ALD


CAA


CAR



AHA

HPO

ACF


ACC

ACB

ACI
RSH
page.


8L

TFOOOINT

TFOOOAZD

TFOOODZM



TFOOOAMD
TFOOOPOL

TFOOOARA

TFOOOOME

TFOOOOET

TFOOOOPR


TFOOOOIP


TFOOOALD


TFOOOCAA


TFOOOCAR



TFOOOAHA

TFOOOHPO

TFOOOACF


TFOOOACC

TFOOOACB

TFOOOACI
TFOOORSH


Amidine
Primary alcohol

Alcohol

Methyl ether

Ethyl ether

Propyl ether


Isopropyl ether


Aldehyde


Carboxylic acid


Carboxylate


Alpha-hydroxy acid

Hydroperoxide

Acyl fluoride


Acyl Chloride

Acyl Bromide

Acyl Iodide
Thiol
Continued on next


C0'i

0

0

+1



+1
0

0

0

0

0


0


0


0


-1



0

0

0


0

0

0
0









Table C-l. (continued)


Structure

2 RsS

3 RS

4 R-S

5 R
s

6 R1 H

7 RS'F

8 R S'CI

9 R S'Br

10 RNS'I
1 R-F
2 R-Cl
3 R-Br
4 R--1
5 R--CF3
6 R-CCI3
0

1 R NH2
0
2 R N=N=N
0
R NO0H
3 H
0
4 R C
4 k
5 R-O-C-N
6 R-S-C-N
7 R-N=C=0
8 R-N=C=S
9 R-N=O


Name

Methyl thioether

Ethyl thioether

Propyl thioether

Isopropyl thioether


Thioaldehyde

Sulfenic acid fluoride

Sulfenic acid chloride

Sulfenic acid bromide

Sulfenic acid iodide
Fluoride
Chloride
Bromide
Iodide
Trifluromethyl
Trichloromethyl


Primary amide

Carboxylic acid azide


Hydroxamic acid


Acyl cyanide
Cyanate
Thiocyanate
Isocyanate
Isothiocyanate
Nitroso
Continued on next


3L

SME

SET

SPR


8L

TFOOOSET

TFOOOSET

TFOOOSPR


SIP TFOOOSIP


TAD

SAF

SAC

SAB

SAI
FL1
CL1
BR1
I01
3FM
3CM


TFOOOTAD

TFOOOSAF

TFOOOSAC

TFOOOSAB

TFOOOSAI
TFOOOFL1
TFOOOCL1
TFOOOBR1
TFOOIO1
TF0003FM
TF0003CM


PMD TFOOOPMD

CAZ TFOOOCAZ


HOA TFOOOHOA


ACY
CYN
TCY
ICY
ITC
NRS
page.


TFOOOACY
TFOOOCYN
TFOOOTCY
TFOOOICY
TFOOOITC
TFOOONRS


0

0

0

0


0

0

0

0

0
0
0
0
0
0
0


0

0


0


0
0
0
0
0
0
0










Table C-l. (continued)


Structure Name


8L C'!i


+P
R-N
10 0
11 R-O-N=O
0
R-O-N
12 b
0
R-S-OH
13 0
o

15 Rn S'OH

16 R'S'OH
OH
17 RR'BOH
S

18 R
s

19 R S
S

20 R? OH
S


0
21 Ri SH

R-S-F
22 0
o
R-S-CI
23 0
o
R-S-Br
24 0
o
SI
R-S-I
25 0
0
o

26 RS'F


27 RS'C/


Nitro
Nitrite


Nitrate


Sulfonic acid


Sulfinic acid

Sulfenic acid


Boronic acid


Thio-carboxylate


Dithio-carboxylate


Thio-carboxylic acid


Dithio-carboxylic acid


Sulfonic acid fluoride


Sulfonic acid chloride


Sulfonic acid bromide


Sulfonic acid iodide


Sulfinic acid fluoride


Sulfinic acid chloride


NRO
NRT


TFOOONRO
TFOOONRT


NRA TFOOONRA


SNA TFOOOSNA


SIA

SEA


BOA


TCA


DTC


TCO


TCS


TFOOOSIA

TFOOOSEA


TFOOOBOA


TFOOOTCA


TFOOODTC


TFOOOTCO


TFOOOTCS


SOF TFOOOSOF


SOC TFOOOSOC


SOB TFOOOSOB


SOI


SIF


TFOOOSOI


TFOOOSIF


SIC TFOOOSIC


Continued on next page.










Table C-1.


(continued)

Structure
o

28 R 'SBr
0

29 RS"I
o
R\ '"OH


0-P-OH

0
31 R OH
0
S -P-O-
32 R 0-
32 R O"


8L


TFOOOSIB


TFOOOSII


TFOOOPPA


TFOOOPP2


TFOOOPP3


Name 3L


Sulfinic acid bromide SIB


Sulfinic acid iodide SII


Phosphonic acid PPA


Phosphoric acid PP2


Phosphate PP3
Terminal Fragments.


('i:g


0


0


0


0


0












Table C-2.




1


2


3


4
5

1


2


3

4
5


6


7


8


9


1

2


C.2 Two Point Linker Fragments

Two Point Linker Fragments.

Two point linker fragments have two connection points
Structure Name 3L 8L
RI-CH2-R2 Ethyl LYH 2PLOOLYH
H R2
R1 H Trans-alkene AK2 2PLOOAK2
H H
R1 R2 Cis-Alkene CIS 2PLOOCIS
H H

R1 R2 Geminal-Alkene AK3 2PLOOAK3
R- R2 Alkyne AKY 2PLOOAKY
H
,N,
R', R2 Sec-amine SAM 2PLOOSAM

N'R2
R1 H Sec-aldimine SAD 2PLOOSAD

N'H
R1 R2 Prim-ketimine PKT 2PLOOPKT

R1,/ .NR2
N Azo AZO 2PLOOAZO
R1-N=C=N-R2 Carbo-diimide DII 2PLOODII

N-R2
R1 F Imidoyl fluoride IMF 2PLOOIMF

N-R2
R1 Cl Imidoyl chloride IMC 2PLOOIMC

N-R2
R1 Br Imidoyl bromide IMB 2PLOOIMB

N-R2
R1 I Imidoyl iodide III 2PLOOIII


R1 R2 Ketone KET 2PLOOKET

R lo' R2 Ethoxy ETO 2PLOOETO

Continued on next page.


0


0


0


0
0

0


0


0

0
0


0


0


0


0


0

0










Table C-2. (continued)


Structure
0
II

R1 R2


4 R,1 OH
HO OH

5 R1 R2



7 RfO'OIR2
0

8 R OR2
0 0

9 Ri OA R2

1 R S'R2
t R1'S, SR2
0 RI S


S
RI1 R2


0
RA N-R2
R2
1 H
H
N OH

2 R1 R2
0
R1j OH
3 R2NH

3 R2
0
R1 'N OH
4 k2




0
5 R2
0
RI'N C
S R2


Name


8L C'!i


Ketene


Sec-alcohol


Enediol


Ether


Peroxide


Carboxylic acid ester


Anhydride

Thio ether

Disulfide


Thio ketone


Sec-amide


Oxime


Alpha amino acid



Carbamic acid



Carbamic acid floride



Carbamic acid chloride


KTE 2PLOOKTE


SAL 2PLOOSAL


END 2PLOOEND

ETR 2PLOOETR

PXD 2PLOOPXD


CAE 2PLOOCAE


ANH 2PLOOANH

STR 2PLOOSTR

DIS 2PLOODIS


TKT 2PLOOTKT


AMI 2PLOOAMI


OXM 2PLOOOXM


AAA 2PLOOAAA



CBA 2PLOOCBA



CBF 2PLOOCBF



CBC 2PLOOCBC


Continued on next page.










Table C-2. (continued)


Structure Name


8L C'!i


0
R1 N Br
7 R2,
0
I
8 kR2
0
R1-S-R2
1 0
0

2 R1 R2
0

3 R S'OR2
0
RI-O-S-O-R2
4 0
S
SRIAOR2
[ R1 0


S
R1 S R2
S
R1 N)I F
R2
S
R1. N CI
R2
S
RI, N 'k Br
R2


S
R.I N I/
tk,
10 R2
S
R1.'N OH



HO-S-N
12 o R1


Carbamic acid bromide



Carbamic acid iodide


Sulfone


Sulfoxide


Sulfinic acid ester


Sulfuric acid ester


Thio carboxylic acid ester


Dithio carboxylic acid ester



Thio carbamic acid fluoride



Thio carbamic acid chloride



Thio carbamic acid bromide



Thio carbamic acid iodide


Thio carbamic acid


Sulfuric acid amide


CBB 2PLOOCBB



CBI 2PLOOCBI


S02 2PL0002


SLF 2PLOOSLF


SLE 2PLOOSLE


SFE 2PLOOSFE


TCE 2PLOOTCE


DCE 2PLOODCE



TBF 2PLOOTBF



TBC 2PLOOTBC



TBB 2PLOOTBB



TBI 2PLOOTBI



TBA 2PLOOTBA


SAA 2PLOOSAA


Continued on next page.










Table C-2. (continued)

Structure Name
o
R1-S-0R2
13 0 Sulfon

14 R("SOR2 sulfeni


15 Phosp
15 0- Phosp


ic acid ester SAE

c acid ester SEE


honic acid ester PP4
wo Point Linker Fragments.


8L Clg


2PLOOSAE 0

2PLOOSEE 0


TFOOOPP4 0










C.3 Three Point Linker Fragments

Table C-3. Three Point Linker Fragments.

Three point linker fragments have three connection points


Structure
H
^ R3
1 RI R2
H R2

2 R1 R3
R3

1 RN'R2
R2
I
0 Rl'N "R3


NR3
R1 R2


R2
"R3
1 R1 OH
OH
Rj"- R3

2 R2
R30 OH
9 Ri1 R2


N OR3
R1 R2


2 RN + R2
R2
,N ,oR3


0
R NR2
R3

NoR3
RI OkR2

0 0
RA N R2
R3


Name


Propyl


Alkene


Tertiary amine


Tertiary amine Et



Secondary ketimine


Tertiary alcohol



Enol


Hemiacetal



Oxime ether


N-oxide


Hydroxylamine



Tertiary amide



Imido ester


Imide
Continued on next


8L C('!


ROP 3PLOOROP


AK4 3PLOOAK4


TTA 3PLOOTTA


TT1 3PLOOTT1



SKT 3PLOOSKT


TAL 3PLOOTAL



ENL 3PLOOENL


HEM 3PLOOHEM



OXE 3PLOOOXE


NOX 3PLOONOX


HXA 3PLOOHXA



TAM 3PLOOTAM



IDE 3PLOOIDE


IME
page.


3PLOOIME










Table C-3. (continued)

Structure
0
R1'N OR3
7 R2
0
RI OH
,NH
8 R2'NH
S
AR2
R1 N'R2
1 R3
NoR3

2 R1 S
S
R1,'N OR3

3 R2
o RC
R1-O-S-N

4 0 R2

SR,
R,-S-N
5 0 R2
R( S'N'R3
6 R2
0
R S'N'R3
7 R2
R3

1 R PR R2
0
2 R( \R3
2 R2


Name


8L C'!i


Carbamic acid ester



Alpha amino acid



Thiotertiary amide


Imidothioester



Thiocarbamic acid ester


Sulfuric acid amide ester


Sulfonamide


Sulfenic acid amide


CAD 3PLOOCAD



ADB 3PLOOADB


TIA


ITE



ARB


AAE


SFA


SLA


Sulfinic acid amide SIM


Phosphine PHI


Phosphinoxide PHO
Three Point Linker Fragments.


3PLOOTIA


3PLOOITE



3PLOOARB


3PLOOAAE


3PLOOSFA


3PLOOSLA



3PLOOSIM


3PLOOPHI


3PLOOPHO










C.4 Four Point Linker Fragments


Table C-4. Four Point Linker Fragments.


Four point linker fragments have
Structure Name
R4
R1 R2 Butyl


four connection points
3L 8L


BUT 4PLOOBUT


R34
Rj Rs
R2

R3

N N'R4
R1 R2


R2
RN,,N' R3
2 R4
N'R4
R R3
R1 NR
3 R2
RR
%4
4 R + R2
OR4
RI1 R3

1 R2
HO OH
R1-+ R3
2 R2 R4
R30 OR4
3 RI R2
HO NH2
R1+)-R3
1 R2 R4
0 R4
R1 N R3


0
R1.'N NR4
I2 I
R2 R3


Alkene


AK5 4PLOOAK5


Hydrazone



Hydrazine


HZO 4PLOOHZO



HZI 4PLOOHZI


Amidine


Quaternary ammonium


Enol ether


1,2-diol

Acetal


1,2-amino alcohol



Carboxylic acid hydrazide


Urea


URE
Continued on next page.


AME 4PLOOAME


QTA 4PLOOQTA



ELE 4PLOOELE


12D 4PL0012D

ACE 4PLOOACE


12A 4PL0012A



CAH 4PLOOCAH


4PLOOURE


0('i:


0











Table C-4. (continued)


Structure Name


8L C'!i


OR4
RI'N N
R2 R3


R3S SR4
1 RI R2
S
R.I N NR4
I I
2 R2 R3

sIR4
RN NJN
I I
3 R2 R3
RI 0 R4
N-S-N
4 R2 R3
4 R2 0 R3


Isourea


Thio acetal



Thiourea


IUR 4PLOOIUR


CET 4PLOOCET



TIU 4PLOOTIU


Isothiourea


ITU 4PLOOITU


Sulfuric acid diamide DIA
Four Point Linker Fragments.


4PLOODIA











C.5 Five Point Linker Fragments


Table C-5. Five Point Linker Fragments.


Five point linker fragments have five connection points
Structure Name 3L 8L

N'R5
N,Rs
RI'N )N'R4
I I
1 R2 R3 Guanidine GUD 5PLOOGUD

N'R5R4
R1 N' R3
R2 A i A T7 DZ T nn A 7


R5s.NR4
RI R3

3 R2
R4

4 R1 R2
R4
R3S N.R,
5 RI R2
R4




N'N.R3

6 R1 R2
R4

OSRs
N
N' N'R3

7 R1 R2
0 Rs
RIN N.N R4

8 R2 R3
S Rs
RI"N NN' R4
n R2 R3


Enamine



Hemiaminal



Thiohemiaminal





Semicarbazone





Thiosemicarbazone



Semicarbazide


tXILj Uil IJUUtI/JL


ENM 5PLOOENM



HMI 5PLOOHMI



THI 5PLOOTHI





SCZ 5PLOOSCZ





TSZ 5PLOOTSZ



SCI 5PLOOSCI


Thiosemicarbazide TSI
Five Point Linker Fragments.


5PLOOTSI


('i,




0




0


.iiiim razonelUI











C.6 Three Membered Ring Fragments


Table C-6. Three Membered Ring Fragments.


Three Membered Ring Fragments


Structure Name


8L C('l


Cyclopropyl



1,2,2,3,3-aziridine


2,2,3,3-epoxide


2,2,3,3-thiirane


CPP 3MROOCPP



AZI 3MROOAZI


EPO 3MROOEPO


TII 3MROOTII


Three Membered Ring Fragments.


R3 R4
R2^ Rs
N
RI
R2 R3
Rl-lR4
0
R2 R3
R1L, -1R4
S










C.7 Four Membered Ring Fragments


Table C-7. Four Membered Ring Fragments.


Four Membered Ring Fragments
Structure Name 3L


Cyclobutane


qR2
2 R1
R5 R4
R R3
R7T R2
3 R8 RI
R5 R4
R6 R3

4 R 00

R R4 Rs
R NH
1 RI
R3 O
R2 O
R0 RI


R O
R2 NH
RI


R3 R4 OH



R32
4 RP

1 R'


1,1-cyclobutane



1,1,2,2,3,3,4,4-cyclobutane



2,2,3,3,4,4-cyclobutane- -one



2,2,3,3,4,4-azetidine



3,3,4,4-beta lactone



3,3,4,4-beta lactam



3,3,4,4-beta lactim


CBT 4MR00CBT


1BT 4MR001BT



2BT 4MR002BT



4BT 4MR004BT



4BX 4MR004BX



4BO 4MR004BO



4BA 4MR004BA



4BI 4MR004BI


2,2,3,3,4,4-oxetane OTE 4MR00OTE
Four Membered Ring Fragments.


('lI-g










C.8 Five Membered Ring Fragments


Table C-8. Five Membered Ring Fragments.


Structure


R2
R1


R3

2 R1
R4



3 R1



4 R2

R2-

5 0
R

1 6

HN R3

1 R2

R1-N R
9 R3


R1
R4

R3
R2


R5
HN R4

R3
4 R2
R5
R1-N -
N R3


R1-N


Five Membered Ring Fragments
Name 3L


1,2-cyclopentadiene



1,3-cyclopentadiene




1,4-cyclopentadiene



2,3-cyclopentadiene



2-cyclopentan-l-one



Cyclopentyl



2,3-pyrrole


1,3,4-pyrrole




1,2,3,4,5-pyrrole


2,3,4,5-pyrrole



1,3,5-pyrazole


1-pyrazoline
Continued on next


8L C('l


PT1 5MROOPT1



PT2 5MROOPT2




PT3 5MROOPT3



PT4 5MROOPT4



PT5 5MROOPT5



CPL 5MROOCPL



YR1 5MROOYR1



YR2 5MROOYR2




YR3 5MROOYR3


YR4 5MROOYR4



PRZ 5MROOPRZ


RA1
page.


5MROORA1










Table C-8. (continued)


Name


8L C'!ig


3-pyrazoline


1,3-pyrazoline


1-pyrazolidine


RA2 5MROORA2


RA3 5MROORA3


RI1 5MROORI1


0

R1-N R4
O
2 H
RIN R4

N
1 R2

R-NQf
1 0N

R-N-
R1 \NH

R_-N"I




3 0

RI-N
HNR3
4 0

RI-Nf O
NR3
5 0
R5
R4
R N'

Rs
> N
Rjl N -R3


H
N
N"'


1,4-pyrazolidine-3,5-dione


1,2,4-imidazole

3-imidazoline

1-imidazolidine


1,3-imidazolidine



1-imidazolidinone



1,3-imidazolidinone



1,3-imidazoline-2,4-dione



1,4,5-1,2,3-triazole



1,3,5-pyrrodiazole



1H-tetrazole
Continued on next


RI2 5MROORI2


IDZ 5MROOIDZ

IZ1 5MROOIZ1

IM1 5MROOIM1


IM2 5MROOIM2



IM3 5MROOIM3



IM4 5MROOIM4



IM5 5MROOIM5



AZ1 5MROOAZ1



PRR 5MROOPRR


TZ1
page.


5MROOTZ1


Structure

HN -
R3

R1-N
SR3


H-
H










Table C-8. (continued)


Structure
N-N


N
,N N
R-N I
1 N

1 R-NJ
0

2 R -N jI
o
2 R
o


3 0



1 R2


2 R3


O R3
3 R2


5 R R4


4 R
5 0o


R5
o R4


R2



N R3


Name


8L C',ly


Tetrazole


Pentazole


1-pyrrolidine



1-pyrrolidone




1-pyrrolidine-2,5-dione


2-furan


3-furan


2,3-furan


3,4-furan


5-R-gamma lactone




4,4,5,5-1,3-dioxolan-2-one




2,4,5-oxazole


3,5-iso


)xazole
Continued on next pa


TZ2 5MROOTZ2


PZ1 5MROOPZ1

RD1 5MROORD1



RD2 5MROORD2




RD3 5MROORD3



FN1 5MROOFN1


FN2 5MROOFN2



FN3 5MROOFN3


FN4 5MROOFN4


FN5 5MROOFN5




XL1 5MROOXL1




OZO 5MROOOZO



IOZ 5MROOIOZ
ige.










Table C-8. (continued)


Name


8L C'!i,


2-oxazoline


ZO1 5MROOZO1


0
2 2
2 R2
Rs

N' R3
1 0
R5b 5a R4b
0/0 R4a
N'R3
2 0
R-


1 R2


R4


R3


S R3
R2
R4

R3


R2
S
R2


R5

1 "


SR
S R2


2-1,3-oxazol-4-one




3,5-oxazolidinone




3,4,4,5,5-oxazolidinone




2,5-1,3,4-oxadiazole



3,4-1,2,5-oxadiazole


3-thiophene



2,3-thiophene



3,4-thiophene




2,5-thiophene



4,5-thiazole




2,4,5-thiazole
Continued on ne


Z02 5MROOZ0O2




AO1 5MROOAO1




A02 5MROOA0O2




DZ1 5MROODZ1



DZ2 5MROODZ2


3TP 5MR003TP



23T 5MR0023T



34T 5MR0034T




25T 5MR0025T



TZL 5MROOTZL




TIZ 5MROOTIZ


xt page.


Structure

o/"

R2










Table C-8. (continued)


Name


8L C'Ig


2-thiazoline


ZL1 5MROOZL1


s/,R4
,-NH
1 R2

NH
2 R2

Sy R4
N,.
N'R3
3 R2
R5

NH
4 0
Rs
So
1 N



2 N
NR3

3 R4
R5

Sy
4 R2


2,4-1,3-thiazolidine



2-1,3-thiazolidine



2,3,4-1,3-thiazolidine




5-thiazolidinedione



4,5-1,2,3-thiadiazole



5-1,2,3-thiadiazole



3,4-1,2,5-thiadiazole




2,5-1,3,4-thiadiazole
Five Membered Ring


IL1 5MROOIL1



IL2 5MROOIL2



IL3 5MROOIL3




TLD 5MROOTLD



DI1 5MROODI1



DI2 5MROODI2


DI3 5MROODI3


DI4
Fragments.


5MROODI4


Structure


'fN
R2










C.9 Six Membered Ring Fragments


Table C-9. Six Membered Ring Fragments.


Structure
R



R1
H R2

H H
2 H
R1


H ,
H H
H R2
3 H
R1
H H

H H
4 R2
R1


Rsr R3
R4



1 6
R



R1

3 2R2
N R

1

N
G IR


R


Six Membered Ring
Name



Phenyl




1,2-phenyl (ortho)




1,3-phenyl (meta)




1,4-phenyl (para)




1,2,3,4,5-phenyl



Cyclohexyl



1,1-cyclohexane


1,2-cyclohexene


2-pyridine


3-pyridine


4-pyridine
Continued


Fragments
3L


8L C'!i


BNZ 6MROOBNZ




OSB 6MROOOSB




MSB 6MROOMSB




PSB 6MROOPSB




4BZ 6MR004BZ



6CH 6MR006CH



11C 6MR0011C


12C 6MR0012C


2PY 6MR002PY


3PY 6MR003PY


4PY
on next page.


6MR004PY










Table C-9. (continued)


Structure Name


8L C('l


SR2





R4


R


1 0




Re N R2

0 Rs R3


R
N

N
H




R4


R


1 N
R1


N 0


2,5-pyridine




3,4-pyridine



1-piperidine


2-pyrazine



2,3,5,6-pyrazine




1-piperazine





1,4-piperazine



1-pyrimidine




1,5-pyrimidine-2,4-dione


PY2 6MROOPY2




PY3 6MROOPY3



1PP 6MR001PP


ZI1 6MROOZI1



ZI2 6MROOZI2




PPZ 6MROOPPZ





PZ2 6MROOPZ2



MI1 6MROOMI1




MI2 6MROOMI2


? R3
R4


R6 N R3





R4


3,4-pyridazine



3,6-pyridazine




3,4,5,6-pyridazine
Continued on next


ZA1 6MROOZA1



ZA2 6MROOZA2


ZA3
page.


6MROOZA3










Table C-9. (continued)


Structure Name


8L C('l


R6s ,N R2
N-0


N R2
N N

1 R4
R6 N R2
N N

2 R4
0


o0
1 5Rs R5b
0
RI.'N 'NH


2 Rsa R5b


1
OH

9 "OH
OH


R

VoO


R









0 0
5 R2


2,6-pyridazin-3-one



2,4-1,3,5-triazine



2,4,6-1,3,5-triazine


5,5-barbituric acid




1,5,5-barbituric acid


2-phenol


3,4-diphenol



2-diphenylether





1,4-benzoate






1,4-benzoate ester


ZA4 6MROOZA4



All 6MROOAll



A12 6MROOA12


Bl1 6MROOB11




B12 6MROOB12


2PO 6MR002PO


3PO 6MR003PO



DPE 6MROODPE





14B 6MR0014B






BE1 6MROOBE1


2,3,4,5,6-benzoate 1B4
Continued on next page.


6MR001B4










Table C-9. (continued)


Structure Name


8L C('l


R4
Rs5 R3

R6 R2
0 0
R7


O R2

1 U
R6 0 R2

R3
2 R4a R4b
R


1 0

R R3

2 Rio R1
R 0 .OH

HO'I )"'OH
3 OH
R 0 .,R

HO'' 'OH
4 OH
R O ,R

HO'' 'R
5 OH
R
CN

1 0



2 s


01
1 R


2,3,4,5,6-benzoate ester


2-4H-pyran



2,3,4,4,5,6-4H-pyran


4-oxane


2,2,3,3,4,4,5,5,6,6-oxane



alpha-D-glucose



alpha-*D-glucose



2-deoxy-alpha-*D-glucose



Morpholine



Thiomorpholine


6-R-d


elta-lactone
Continued on next pa


BE2 6MROOBE2


C1l 6MROOC11



C12 6MROOC12


Dl1 6MROOD11


D12 6MROOD12



ADG 6MROOADG



ASD 6MROOASD



2DA 6MR002DA



MOR 6MROOMOR



MR1 6MROOMR1



Ell 6MROOEll
ige.










Table C-9. (continued)


Structure Name


8L C('


0
RN


6-R-delta-lactam


E12 6MROOE12


R


3 5



4



5 N R2


2-pyridone



2-thiopyridone


2YO 6MR002YO



2TP 6MR002TP


2-iminopyridine 2IP
Six Membered Ring Fragments.


6MR002IP










C.10 Greater than Six Membered Ring Fragments

Table C-10. Greater than Six Membered Ring Fragments.

Greater than Six Membered Ring Fragments
     Name           3L     8L          Chg
1    Cycloheptyl    G61    GMR00G61    0
2    Cyclooctyl     G62    GMR00G62    0










C.11 Fused Ring Fragments


Table C-11. Fused Ring Fragments.


Fused Ring Fragments
Name


8L C'!i


1-naphthalene


R



R



R



Rs R4


R7 N R2
R4
R6R2








R6\) R3
R


N
R RI






R3



R4 R3
RR



-7 R8
R7 R?


NAP FROOONAP


ID1 FROOOID1



ID2 FROOOID2




QE1 FR000QE1




QE2 FR000QE2





QE3 FR000QE3



IQ1 FROOOIQ1





IQ2 FROOOIQ2



HT1 FROOOHT1





HT2 FROOOHT2


Structure


1-indan



1-inden-l-yl




2,4,5,7-quinoline



2,4,6-quinoline





2,3,4,5,6,7,8-quinoline



1,3-isoquinoline





1,3,4,5,6,7,8-isoquinoline



3-phthalazine





3,4,5,6,7,8-phthalazine
Continued on next page.










Table C-11. (continued)


Name


8L C('l


0
N'NR2

R4

R4




Rs R4
R6 & R3

R N
R 8

R4

R6 NN
Rj NR2
Rs R4


N IR2


RR N
RS
R R






N) N R3
H2N N N R2
RS









R2 N N
i yNR6

R4




R4


2,4-phthalazinone




3,4-cinnoline





3,4,5,6,7,8-cinnoline




4,6,7-quinazoline





2,4,5,6,7,8-quinazoline





2,3,5,6,7,8-quinoxaline




2,4,6,7-pteridine


6,7-pterin


2,4-pteridine




2,4,6-pteridine
Continued on next page.


HT3 FROOOHT4 0




CI1 FROOOCI1 0





CI2 FROOOCI2 0




IN1 FROOOIN1 0





IN2 FROOOIN2 0





NO1 FROOONO1 0




ER1 FROOER1 0


ER2 FR000ER2


ER3 FR000ER1




ER4 FR000ER1


Structure










Table C-11. (continued)


Name


8L C('y,


0


H N;R6
H2N N N

R3


H
R4


H
R4 R3

N N
H
R3
Rs R
H
R3




R4 R3

N-H

R7 R1

R6


R2 N N
H
R4 R3

R I IH

R7


0 II "R
R


6-pterin


3-indole


ER5 FROOOER1


LE1 FROOOLE1


4-indole


LE2 FROOOLE2


3,4-indole




3,5-indole




1,2,3,5-indole





1,3,4,5,6,7-isoindole




2,6,8-purine





3,4,5,6,7-indazole



1-benzotriazole


LE3 FROOOLE3




LE4 FROOOLE4




LE5 FROOOLE5





LE6 FROOOLE6




PU1 FROOOPU1





Zll FROOOZ11



Y11 FROOOY11


1-1,3-benzodiazepine
Continued on next page.


N11 FROONll


Structure


R

CG/










Table C-11. (continued)


Name


8L C('l


1-1,4-benzodiazepine


N12 FROOON12


N R


H 0
R4
R3


R4 R4
R6 R3

0 0
Ra




R6

N 0 R2

0




11
0



Rs ,

Rs
0
R5 0


(Y,0


5I01


R3

CZ 0


4-1,5-benzodiazepin-2-one



3,4,7-coumarin





3,4,5,6,7,8-coumarin




2,2,6,7,8,9-chroman



2,3-chromone



6-chromone






3,5,7,10,11-flavone



5-1,4-benzodioxane


2-benzofuran



3-benzofuran
Continued on next page.


N13 FROOON13



CU1 FROOOCU1




CU2 FROOOCU2




CHR FROOOCHR



CE1 FROOOCE1



CE2 FROOOCE2






CE3 FROOOCE3



BD1 FROOOBD1


BF1 FROOOBF1



BF2 FROOOBF2


Structure
R


CN











Table C-11. (continued)


Name


8L C('l


5-benzofuran





4-phthalide


4-1,3-benzodioxole


BF3 FROOOBF3





PH1 FROOOPH1


BZ1 FROOOBZ1


R4 R3

-R2
R7
R3


R6

R3

R2


R4 R3

R RI
R7 R1


NOQ
R-<1
0


2,3,4,5,6,7-benzothiophene



3,6-benzothiophene



2,3,6-benzothiophene





1,3,4,5,6,7-isobenzothiophene


Benzoxazole


BN1 FROOOBN1



BN2 FROOOBN2



3BT FR0003BT





BN4 FROOOBN4


BXZ FROOOBXZ


3,4,5,6,7-benzisoxazole



3,5-benzisoxazole





2,4,5,6,7-benzothiazole
Continued on next page.


BI1 FROOOBI1 0



BI2 FROOOBI2 0





BH1 FROOOBH1 0


Structure
Rs

o
0


R4


R3 R4

Rs

R7
R3
Rs
N,
'0 _


R4


R7










Table C-11. (continued)


Name


8L C('l


R2

S R' R


R( ON O
2 H



R 0
R


2



3 0 0




1 Rio
R 0



2 0
R9


3 R3 0 6
R


n 0


R
N
-


2,6-benzothiazole


2-benzothiazole


3-1,4-benzoxazine



6-1,4-benzoxazin-3(4H)-one



1-fluorenone



1-dibenzofuran



carbazol-9-yl




1,10-anthracene




1-dioxoanthracene



3,6,9-xanthen-9-yl



1-oxanthrene


1-acridine
Continued on next page.


BH2 FROOOBH2


BH3 FROOOBH3


BX1 FROOOBX1



BX2 FROOOBX2



1FO FR0001FO



X 1 FROOOX11



X12 FROOOX12




AN1 FROOOAN1




IDA FROOOIDA



XA1 FROOOXA1



XO1 FROOOXO1


CR1 FROOOCR1


Structure










Table C-11. (continued)


Name


8L C('l


R1 Rg R8
R2'- -N- \i- R7


R4 Rs

R2 N


R1 R



R4 R5


1,2,3,4,5,6,7,8,9-acridine


2-phenazine


CR2 FROOOCR2


EZ1 FROOOEZ1


1,2,3,4,5,6,7,8-phenazine
Fused Ring Fragments.


EZ2 FROOOEZ2


Structure









REFERENCES


[1] T. M. Speight and N. H. G. Holford, editors. Avery's Drug Treatment. Adis Press,
Auckland, New Zealand, 4th edition, 1997.

[2] N. A. Roberts, J. A. Martin, D. Kinchington, A. V. Broadhurst, J. C. Craig,
I. B. Duncan, S. A. Galpin, B. K. Handa, J. Kay, A. Krohn, R. W. Lambert,
J. H. Merrett, J. S. Mills, K. E. B. Parkes, S. Redshaw, A. J. Ritchie, D. L.
Taylor, G. J. Thomas, and P. J. Machin. Rational Design of Peptide-Based HIV
Proteinase-Inhibitors. Science, 248(4953):358-361, 1990.

[3] D. W. Cushman, M. A. Ondetti, E. M. Gordon, S. Natarajan, D. S. Karanewsky,
J. Krapcho, and E. W. Petrillo Jr. Rational design and biochemical utility of
specific inhibitors of angiotensin-converting enzyme. J. Cardiovasc. Pharmacol.,
10:S17-30, 1987.

[4] A. R. Leach, B. K. Shoichet, and C. E. Peishoff. Prediction of protein-ligand
interactions, docking and scoring: Successes and gaps. J. Med. Chem.,
49(20):5851-5855, 2006.

[5] P. Ferrara, H. Gohlke, D. J. Price, G. Klebe, and C. L. Brooks. Assessing scoring
functions for protein-ligand interactions. J. Med. Chem., 47(12):3032-3047, 2004.

[6] M. B. Peters, K. Raha, and K. M. Merz Jr. Quantum mechanics in structure-based
drug design. Curr. Opin. Drug Discovery Dev., 9(3):370-379, 2006.

[7] K. Raha, A. J. van der Vaart, K. E. Riley, M. B. Peters, L. M. Westerhoff, H. Kim,
and K. M. Merz Jr. Pairwise decomposition of residue interaction energies using
semiempirical quantum mechanical methods in studies of protein-ligand interaction.
J. Am. Chem. Soc., 127(18):6583-6594, 2005.

[8] A. R. Ortiz, M. T. Pisabarro, F. Gago, and R. C. Wade. Prediction of
Drug-Binding Affinities by Comparative Binding-Energy Analysis. J. Med. Chem.,
38(14):2681-2691, 1995.

[9] M. B. Peters and K. M. Merz Jr. Semiempirical comparative binding energy
analysis (SE-COMBINE) of a series of trypsin inhibitors. J. Chem. Theory Comput.,
2(2):383-399, 2006.

[10] R. D. Cramer III, D. E. Patterson, and J. D. Bunce. Comparative Molecular Field
Analysis (CoMFA). 1. Effect of Shape on Binding of Steroids to Carrier Proteins. J.
Am. Chem. Soc., 110:5959-5967, 1988.

[11] G. Klebe, U. Abraham, and T. Mietzner. Molecular Similarity Indexes in a
Comparative-Analysis (CoMSIA) of Drug Molecules to Correlate and Predict
their biological-activity. J. Med. Chem., 37(24):4130-4146, 1994.

[12] G. Klebe. Comparative molecular similarity indices analysis: CoMSIA. Persp. Drug
Disc. Design, 12:87-104, 1998.









[13] F. Estienne, Y. Vander Heyden, and D. L. Massart. Chemometrics and modeling.
Chimia, 55(1-2):70-80, 2001.

[14] S. Wold, M. Sjostrom, and L. Eriksson. PLS-regression: a basic tool of
chemometrics. Chemom. Intell. Lab. Syst., 58(2):109-130, 2001.

[15] F. C. Bernstein, T. F. Koetzle, G. J. B. Williams, E. F. Meyer, M. D. Brice, J. R.
Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi. Protein Data Bank -
Computer-Based Archival File for Macromolecular Structures. J. Mol. Biol.,
112(3):535-542, 1977.

[16] D. A. Case, T. A. Darden, T. E. Cheatham, III, C. L. Simmerling, J. Wang, R. E.
Duke, R. Luo, K. M. Merz Jr., D. A. Pearlman, M. M. Crowley, R. C. Walker,
W. W. Zhang, B. Wang, S. Hayik, A. Roitberg, G. Seabra, K. F. Wong, F. Paesani,
X. Wu, S. Brozell, V. Tsui, H. Gohlke, L. Yang, C. Tan, J. Mongan, V. Hornak,
G. Cui, P. Beroza, D. H. Mathews, C. Schafmeister, W. S. Ross, and P. A. Kollman.
AMBER 9, 2006.

[17] T. Fink, H. Bruggesser, and J. L. Reymond. Virtual exploration of the
small-molecule chemical universe below 160 daltons. Angew. Chem. Int. Ed.,
44(10):1504-1508, 2005.

[18] M. A. Koch, A. Schuffenhauer, M. Scheck, S. Wetzel, M. Casaulta, A. Odermatt,
P. Ertl, and H. Waldmann. Charting biologically relevant chemical space: A
structural classification of natural products (SCONP). Proc. Natl. Acad. Sci. U. S.
A., 102(48):17272-17277, 2005.

[19] D. G. Lloyd, G. Golfis, A. J. S. Knox, D. Fayne, M. J. Meegan, and T. I. Oprea.
Oncology exploration: charting cancer medicinal chemistry space. Drug Discov.
Today, 11(3-4):149-159, 2006.

[20] T. Fink and J. L. Reymond. Virtual exploration of the chemical universe up
to 11 atoms of C, N, O, F: Assembly of 26.4 million structures (110.9 million
stereoisomers) and analysis for new ring systems, stereochemistry, physicochemical
properties, compound classes, and drug discovery. J. Chem. Inf. Model.,
47(2):342-353, 2007.

[21] A. Schuffenhauer, P. Ertl, S. Roggo, S. Wetzel, M. A. Koch, and H. Waldmann.
The scaffold tree - visualization of the scaffold universe by hierarchical scaffold
classification. J. Chem. Inf. Model., 47(1):47-58, 2007.

[22] R. E. Babine and S. L. Bender. Molecular recognition of protein-ligand complexes:
Applications to drug design. Chem. Rev., 97(5):1359-1472, 1997.

[23] J. G. Robertson. Mechanistic basis of enzyme-targeted drugs. Biochemistry,
44(15):5561-5571, 2005.

[24] R. B. Silverman. The organic chemistry of enzyme-catalyzed reactions. Academic
Press, San Diego, 2002.

[25] P. A. Boriack-Sjodin, S. Zeitlin, H. H. Chen, L. Crenshaw, S. Gross,
A. Dantanarayana, P. Delgado, J. A. May, T. Dean, and D. W. Christianson.
Structural analysis of inhibitor binding to human carbonic anhydrase II. Protein
Sci., 7(12):2483-9, 1998.

[26] Ajay and M. A. Murcko. Computational methods to predict binding free energy in
ligand-receptor complexes. J. Med. Chem., 38(26):4953-4967, 1995.

[27] J. J. Irwin and B. K. Shoichet. ZINC - a free database of commercially available
compounds for virtual screening. J. Chem. Inf. Model., 45(1):177-82, 2005.

[28] J. J. Irwin, F. M. Raushel, and B. K. Shoichet. Virtual screening against
metalloenzymes for inhibitors and substrates. Biochemistry, 44(37):12316-28,
2005.

[29] R. E. Dolle, B. Le Bourdonnec, G. A. Morales, K. J. Moriarty, and J. M. Salvino.
Comprehensive survey of combinatorial library synthesis: 2005. J. Comb. Chem.,
8(5):597-635, 2006.

[30] A. J. S. Knox, M. J. Meegan, G. Carta, and D. G. Lloyd. Considerations in
compound database preparation - "hidden" impact on virtual screening results. J.
Chem. Inf. Model., 45(6):1908-1919, 2005.

[31] A. J. S. Knox, M. J. Meegan, and D. G. Lloyd. Estrogen receptors: Molecular
interactions, virtual screening and future prospects. Curr. Top. Med. Chem.,
6(3):217-243, 2006.

[32] J. C. Baber, A. S. William, Y. H. Gao, and M. Feher. The use of consensus scoring
in ligand-based virtual screening. J. Chem. Inf. Model., 46(1):277-288, 2006.

[33] W. P. Walters and M. A. Murcko. Prediction of 'drug-likeness'. Adv. Drug Delivery
Rev., 54(3):255-271, 2002.

[34] M. Feher and J. M. Schmidt. Property distributions: Differences between drugs,
natural products, and molecules from combinatorial chemistry. J. Chem. Inf.
Comput. Sci., 43(1):218-227, 2003.

[35] M. C. Hutter. Separating drugs from nondrugs: A statistical approach using atom
pair distributions. J. Chem. Inf. Model., 47(1):186-194, 2007.

[36] M. Snarey, N. K. Terrett, P. Willett, and D. J. Wilton. Comparison of algorithms
for dissimilarity-based compound selection. J. Mol. Graphics Modell., 15(6):372-385,
1997.

[37] S. L. Dixon and K. M. Merz Jr. One-dimensional molecular representations
and similarity calculations: Methodology and validation. J. Med. Chem.,
44(23):3795-3809, 2001.

[38] T. Ewing, J. C. Baber, and M. Feher. Novel 2D fingerprints for ligand-based virtual
screening. J. Chem. Inf. Model., 46(6):2423-2431, 2006.

[39] M. Stahl and H. Mauser. Database clustering with a combination of fingerprint and
maximum common substructure methods. J. Chem. Inf. Model., 45(3):542-548,
2005.

[40] P. Willett. Searching techniques for databases of two- and three-dimensional
chemical structures. J. Med. Chem., 48(13):4183-4199, 2005.

[41] I. Muegge, S. L. Heald, and D. Brittelli. Simple selection criteria for drug-like
chemical matter. J. Med. Chem., 44(12):1841-1846, 2001.

[42] C. A. Lipinski, F. Lombardo, B. W. Dominy, and P. J. Feeney. Experimental and
computational approaches to estimate solubility and permeability in drug discovery
and development settings. Adv. Drug Delivery Rev., 23(1-3):3-25, 1997.

[43] M. Hornig and A. Klamt. COSMOfrag: A novel tool for high-throughput ADME
property prediction and similarity screening based on quantum chemistry. J. Chem.
Inf. Model., 45(5):1169-1177, 2005.

[44] A. L. Cheng and K. M. Merz Jr. Prediction of aqueous solubility of a diverse set
of compounds using quantitative structure-property relationships. J. Med. Chem.,
46(17):3572-3580, 2003.

[45] A. Cheng and S. L. Dixon. In silico models for the prediction of dose-dependent
human hepatotoxicity. J. Comput. Aided Mol. Des., 17(12):811-23, 2003.

[46] R. G. Susnow and S. L. Dixon. Use of robust classification techniques for the
prediction of human cytochrome P450 2D6 inhibition. J. Chem. Inf. Comput. Sci.,
43(4):1308-1315, 2003.

[47] W. J. Egan, K. M. Merz Jr., and J. J. Baldwin. Prediction of drug absorption using
multivariate statistics. J. Med. Chem., 43(21):3867-3877, 2000.

[48] J. C. Baber and M. Feher. Predicting synthetic accessibility: Application in drug
discovery and development. Mini-Rev. Med. Chem., 4(6):681-692, 2004.

[49] M. Stahl, N. P. Todorov, T. James, H. Mauser, H. J. Boehm, and P. M. Dean. A
validation study on the practical use of automated de novo design. J. Comput.-Aided
Mol. Des., 16(7):459-478, 2002.

[50] P. M. Dean, D. G. Lloyd, and N. P. Todorov. De novo drug design: Integration
of structure-based and ligand-based methods. Curr. Opin. Drug Discovery Dev.,
7(3):347-353, 2004.

[51] H. Mauser and M. Stahl. Chemical fragment spaces for de novo design. J. Chem.
Inf. Model., 47(2):318-324, 2007.

[52] D. G. Lloyd, C. L. Buenemann, N. P. Todorov, D. T. Manallack, and P. M. Dean.
Scaffold hopping in de novo design. Ligand generation in the absence of receptor
information. J. Med. Chem., 47(3):493-496, 2004.

[53] H. Gohlke, M. Hendlich, and G. Klebe. Predicting binding modes, binding affinities
and 'hot spots' for protein-ligand complexes using a knowledge-based scoring
function. Perspect. Drug Discovery Des., 20(1):115-144, 2000.

[54] B. A. Grzybowski, A. V. Ishchenko, J. Shimada, and E. I. Shakhnovich. From
knowledge-based potentials to combinatorial lead design in silico. Acc. Chem. Res.,
35(5):261-269, 2002.

[55] B. A. Grzybowski, A. V. Ishchenko, C. Y. Kim, G. Topalov, R. Chapman,
D. W. Christianson, G. M. Whitesides, and E. I. Shakhnovich. Combinatorial
computational method gives new picomolar ligands for a known enzyme. Proc. Natl.
Acad. Sci. U. S. A., 99(3):1270-1273, 2002.

[56] M. Feher, E. Deretey, and S. Roy. BHB: A simple knowledge-based scoring function
to improve the efficiency of database screening. J. Chem. Inf. Comput. Sci.,
43(4):1316-1327, 2003.

[57] H. F. G. Velec, H. Gohlke, and G. Klebe. DrugScore(CSD) - knowledge-based
scoring function derived from small molecule crystal data with superior recognition
rate of near-native ligand poses and better affinity prediction. J. Med. Chem.,
48(20):6296-6303, 2005.

[58] W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz Jr., D. M. Ferguson,
D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman. A 2nd Generation
Force-Field for the Simulation of Proteins, Nucleic-Acids, and Organic-Molecules. J.
Am. Chem. Soc., 117(19):5179-5197, 1995.

[59] W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz Jr., D. M. Ferguson,
D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman. A second generation
force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am.
Chem. Soc., 118(9):2309-2309, 1996.

[60] D. A. Case, T. E. Cheatham, III, T. Darden, H. Gohlke, R. Luo, K. M. Merz Jr.,
A. Onufriev, C. Simmerling, B. Wang, and R. J. Woods. The AMBER biomolecular
simulation programs. J. Comput. Chem., 26(16):1668-1688, 2005.

[61] B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and
M. Karplus. CHARMM: a Program for Macromolecular Energy, Minimization, and
Dynamics Calculations. J. Comput. Chem., 4(2):187-217, 1983.

[62] T. A. Halgren. Merck molecular force field. 5. Extension of MMFF94 using
experimental data, additional computational data, and empirical rules. J. Comput.
Chem., 17(5-6):616-641, 1996.

[63] T. A. Halgren. Merck molecular force field. 3. Molecular geometries and vibrational
frequencies for MMFF94. J. Comput. Chem., 17(5-6):553-586, 1996.

[64] T. A. Halgren. Merck molecular force field. 2. MMFF94 van der Waals and
electrostatic parameters for intermolecular interactions. J. Comput. Chem.,
17(5-6):520-552, 1996.

[65] T. A. Halgren. Merck molecular force field. 1. Basis, form, scope, parameterization,
and performance of MMFF94. J. Comput. Chem., 17(5-6):490-519, 1996.

[66] T. A. Halgren. Representation of van der Waals (vdW) Interactions in Molecular
Mechanics Force-Fields: Potential Form, Combination Rules, and vdW parameters.
J. Am. Chem. Soc., 114(20):7827-7843, 1992.

[67] T. A. Halgren and R. B. Nachbar. Merck molecular force field. 4. Conformational
energies and geometries for MMFF94. J. Comput. Chem., 17(5-6):587-615, 1996.

[68] T. A. Halgren. MMFF VII. Characterization of MMFF94, MMFF94s,
and other widely available force fields for conformational energies and for
intermolecular-interaction energies and geometries. J. Comput. Chem.,
20(7):730-748, 1999.

[69] T. A. Halgren. MMFF VI. MMFF94S Option for Energy Minimization Studies. J.
Comput. Chem., 20(7):720-729, 1999.

[70] W. L. Jorgensen and J. Tirado-Rives. The OPLS Potential Functions for Proteins -
Energy Minimizations for Crystals of Cyclic-Peptides and Crambin. J. Am. Chem.
Soc., 110(6):1657-1666, 1988.

[71] N. L. Allinger, Y. H. Yuh, and J. H. Lii. Molecular Mechanics: the MM3 force-field
for Hydrocarbons. 1. J. Am. Chem. Soc., 111(23):8551-8566, 1989.

[72] M. J. S. Dewar and W. Thiel. Ground States of Molecules. 38. The MNDO Method.
Approximations and Parameters. J. Am. Chem. Soc., 99(15):4899-4907, 1977.

[73] M. J. S. Dewar, E. G. Zoebisch, E. F. Healy, and J. J. P. Stewart. AM1: A New
General Purpose Quantum Mechanical Molecular Model. J. Am. Chem. Soc.,
107:3902-3909, 1985.

[74] James J. P. Stewart. Optimization of Parameters for Semiempirical Methods I.
Method. J. Comp. Chem., 10:209-220, 1989.

[75] James J. P. Stewart. Optimization of Parameters for Semiempirical Methods II.
Applications. J. Comp. Chem., 10:221-264, 1989.

[76] W. Thiel and A. A. Voityuk. Extension of MNDO to d Orbitals: Parameters and
Results for the Second-Row Elements and for the Zinc Group. J. Phys. Chem.,
100:616-626, 1996.

[77] M. P. Repasky, J. Chandrasekhar, and W. L. Jorgensen. PDDG/PM3 and
PDDG/MNDO: Improved semiempirical methods. J. Comput. Chem.,
23(16):1601-1622, 2002.

[78] M. Elstner, D. Porezag, G. Jungnickel, J. Elsner, M. Haugk, T. Frauenheim,
S. Suhai, and G. Seifert. Self-consistent-charge density-functional tight-binding
method for simulations of complex materials properties. Phys. Rev. B,
58(11):7260-7268, 1998.

[79] M. Elstner. The SCC-DFTB method and its application to biological systems.
Theor. Chem. Acc., 116(1-3):316-325, 2006.

[80] K. W. Sattelmeyer, J. Tirado-Rives, and W. L. Jorgensen. Comparison of
SCC-DFTB and NDDO-based semiempirical molecular orbital methods for organic
molecules. J. Phys. Chem. A, 110(50):13551-13559, 2006.

[81] A. R. Leach. Molecular modelling: principles and applications. Prentice Hall,
Harlow, England; New York, 2nd edition, 2001.

[82] B. H. Mevik and H. R. Cederkvist. Mean squared error of prediction (MSEP)
estimates for principal component regression (PCR) and partial least squares
regression (PLSR). J. Chemom., 18(9):422-429, 2004.

[83] R. Guha and P. C. Jurs. Development of linear, ensemble, and nonlinear models
for the prediction and interpretation of the biological activity of a set of PDGFR
inhibitors. J. Chem. Inf. Comp. Sci., 44(6):2179-2189, 2004.

[84] M. Karelson, V. S. Lobanov, and A. R. Katritzky. Quantum-Chemical Descriptors in
QSAR/QSPR Studies. Chem. Rev., 96(3):1027-1044, 1996.

[85] M. Brüstle, B. Beck, T. Schindler, W. King, T. Mitchell, and T. Clark. Descriptors,
Physical Properties, and Drug-Likeness. J. Med. Chem., 45:3345-3355, 2002.

[86] J. Wan, L. Zhang, G. Yang, and C. Zhan. Quantitative Structure-Activity
Relationship for Cyclic Imide Derivatives of Protoporphyrinogen Oxidase Inhibitors:
A Study of Quantum Chemical Descriptors from Density Functional Theory. J.
Chem. Inf. Comput. Sci., 44:2099-2105, 2004.

[87] J. J. Sutherland, L. A. O'Brien, and D. F. Weaver. A Comparison of Methods
for Modeling Quantitative Structure-Activity Relationships. J. Med. Chem.,
47:5541-5554, 2004.

[88] S. Dixon, K. M. Merz Jr., G. Lauri, and J. C. Ianni. QMQSAR: Utilization of a
Semiempirical Probe Potential in a Field-Based QSAR Method. J. Comput. Chem.,
26:23-34, 2005.

[89] A. H. Asikainen, J. Ruuskanen, and K. Tuppurainen. Spectroscopic QSAR methods
and self-organizing molecular field analysis for relating molecular structure and
estrogenic activity. J. Chem. Inf. Comput. Sci., 43(6):1974-1981, 2003.

[90] A. H. Asikainen, J. Ruuskanen, and K. A. Tuppurainen. Alternative QSAR models
for selected estradiol and cytochrome P450 ligands: comparison between classical,
spectroscopic, CoMFA and GRID/GOLPE methods. SAR QSAR Environ. Res.,
16(6):555-565, 2005.

[91] D. B. Turner, P. Willett, A. M. Ferguson, and T. Heritage. Evaluation of a novel
infrared range vibration-based descriptor (EVA) for QSAR studies. 1. General
application. J. Comput.-Aided Mol. Des., 11(4):409-22, 1997.

[92] A. M. Ferguson, T. Heritage, P. Jonathon, S. E. Pack, L. Phillips, J. Rogan, and
P. J. Snaith. EVA: A new theoretically based molecular descriptor for use in
QSAR/QSPR analysis. J. Comput.-Aided Mol. Des., 11(2):143-152, 1997.

[93] C. M. R. Ginn, D. B. Turner, P. Willett, A. M. Ferguson, and T. W. Heritage.
Similarity searching in files of three-dimensional chemical structures: Evaluation of
the EVA descriptor and combination of rankings using data fusion. J. Chem. Inf.
Comput. Sci., 37(1):23-37, 1997.

[94] T. W. Heritage, A. M. Ferguson, D. B. Turner, and P. Willett. EVA: A novel
theoretical descriptor for QSAR studies. Perspect. Drug Discovery Des.,
9-11:381-398, 1998.

[95] D. B. Turner, P. Willett, A. M. Ferguson, and T. W. Heritage. Evaluation of a novel
molecular vibration-based descriptor (EVA) for QSAR studies: 2. Model validation
using a benchmark steroid dataset. J. Comput.-Aided Mol. Des., 13(3):271-296,
1999.

[96] D. B. Turner and P. Willett. The EVA spectral descriptor. Eur. J. Med. Chem.,
35(4):367-375, 2000.

[97] D. B. Turner and P. Willett. Evaluation of the EVA descriptor for QSAR studies:
3. The use of a genetic algorithm to search for models with enhanced predictive
properties (EVAGA). J. Comput.-Aided Mol. Des., 14(1):1-21, 2000.

[98] M. Ford, L. Phillips, and A. Stevens. Optimising the EVA descriptor for prediction
of biological activity. Org. Biomol. Chem., 2(22):3301-3311, 2004.

[99] K. Tuppurainen. Frontier orbital energies, hydrophobicity and steric factors as
physical QSAR descriptors of molecular mutagenicity. A review with a case study:
MX compounds. Chemosphere, 38(13):3015-3030, 1999.

[100] K. Tuppurainen. EEVA (electronic eigenvalue): A new QSAR/QSPR descriptor
for electronic substituent effects based on molecular orbital energies. SAR QSAR
Environ. Res., 10(1):39-46, 1999.

[101] K. Tuppurainen and J. Ruuskanen. Electronic eigenvalue (EEVA): a new
QSAR/QSPR descriptor for electronic substituent effects based on molecular orbital
energies. A QSAR approach to the Ah receptor binding affinity of polychlorinated
biphenyls (PCBs), dibenzo-p-dioxins (PCDDs) and dibenzofurans (PCDFs).
Chemosphere, 41(6):843-848, 2000.

[102] K. Tuppurainen, M. Viisas, R. Laatikainen, and M. Peräkylä. Evaluation of a
novel electronic eigenvalue (EEVA) molecular descriptor for QSAR/QSPR studies:
Validation using a benchmark steroid data set. J. Chem. Inf. Comput. Sci.,
42(3):607-613, 2002.

[103] R. Bursi, T. Dao, T. van Wijk, M. de Gooyer, E. Kellenbach, and P. Verwer.
Comparative spectra analysis (CoSA): spectra as three-dimensional molecular
descriptors for the prediction of biological activities. J. Chem. Inf. Comput. Sci.,
39(5):861-867, 1999.

[104] E. Besalú, X. Gironés, L. Amat, and R. Carbó-Dorca. Molecular Quantum similarity
and the fundamentals of QSAR. Acc. Chem. Res., 35:289-295, 2002.

[105] R. Carbó-Dorca and X. Gironés. Foundation of Quantum Similarity Measures and
Their Relationship to QSPR: Density Function Structure, Approximations, and
Application Examples. Int. J. Quantum Chem., 101:8-20, 2005.

[106] P. Bultinck, T. Kuppens, X. Gironés, and R. Carbó-Dorca. Quantum similarity
superposition algorithm (QSSA): A consistent scheme for molecular alignment and
molecular similarity based on quantum chemistry. J. Chem. Inf. Comput. Sci.,
43(4):1143-1150, 2003.

[107] S. E. O'Brien and P. L. A. Popelier. Quantum Molecular Similarity. 3. QTMS
Descriptors. J. Chem. Inf. Comput. Sci., 41:764-775, 2001.

[108] U. A. Chaudry and P. L. A. Popelier. Estimation of pKa Using Quantum
Topological Molecular Similarity Descriptors: Application to Carboxylic Acids,
Anilines and Phenols. J. Org. Chem., 69:233-241, 2004.

[109] P. Y. Ren and J. W. Ponder. Consistent treatment of inter- and intramolecular
polarization in molecular mechanics calculations. J. Comput. Chem.,
23(16):1497-1506, 2002.

[110] V. Gogonea, D. Suarez, A. van der Vaart, and K. M. Merz Jr. New developments in
applying quantum mechanics to proteins. Curr. Opin. Struct. Biol., 11(2):217-223,
2001.

[111] S. L. Dixon and K. M. Merz Jr. Semiempirical Molecular Orbital Calculations with
Linear System Size Scaling. J. Chem. Phys., 104:6643-6649, 1996.

[112] S. L. Dixon and K. M. Merz Jr. Fast, Accurate Semiempirical Molecular Orbital
Calculations for Macromolecules. J. Chem. Phys., 107:879-893, 1997.

[113] A. van der Vaart, V. Gogonea, S. L. Dixon, and K. M. Merz Jr. Linear Scaling
Molecular Orbital Calculations of Biological Systems Using the Semiempirical Divide
and Conquer Method. J. Comput. Chem., 21:1494-1504, 2000.

[114] W. Kohn. Density-Functional Theory for Systems of Very Many Atoms. Int. J.
Quantum Chem., 56(4):229-232, 1995.

[115] X. P. Li, R. W. Nunes, and D. Vanderbilt. Density-matrix electronic-structure
method with linear system-size scaling. Physical Review B, 47(16):10891-10894,
1993.

[116] J. J. P. Stewart. Application of localized molecular orbitals to the solution of
semiempirical self-consistent field equations. Int. J. Quantum Chem., 58(2):133-146,
1996.

[117] K. Raha and K. M. Merz Jr. A Quantum Mechanics Based Scoring Function: Study
of Zinc-ion Mediated ligand binding. J. Am. Chem. Soc., 126:1020-1021, 2004.

[118] K. Raha and K. M. Merz Jr. Large-Scale Validation of a Quantum Mechanics
Based Scoring Function: Predicting the Binding Affinity and the Binding Mode of a
Diverse Set of Protein-Ligand Complexes. J. Med. Chem., 48:4558-4575, 2005.

[119] H. Fischer and H. Kollmar. Energy Partitioning with the CNDO Method. Theoret.
Chim. Acta., 16:163, 1970.

[120] M. J. S. Dewar and D. H. Lo. Application of Energy Partitioning to the MINDO/2
Method and a Study of Cope Rearrangement. J. Am. Chem. Soc., 93:7201-7205,
1971.

[121] S. Olivella and J. Vilarrasa. Application of the Partitioning of Energy in the MNDO
Method to the Study of the Basicity of Imidazole, Pyrazole, Oxazole, and Isoxazole.
J. Heterocycl. Chem., 18:1189, 1981.

[122] F. Martin and H. Zipse. Charge distribution in the water molecule. A comparison of methods.
J. Comp. Chem., 26(1):97-105, 2005.

[123] J. B. Li, T. H. Zhu, C. J. Cramer, and D. G. Truhlar. New class IV charge model
for extracting accurate partial charges from wave functions. J. Phys. Chem. A,
102(10):1820-1831, 1998.

[124] J. B. Li, B. Williams, C. J. Cramer, and D. G. Truhlar. A class IV charge model for
molecular excited states. J. Chem. Phys., 110(2):724-733, 1999.

[125] J. B. Li, B. Williams, C. J. Cramer, and D. G. Truhlar. A class IV charge model for
molecular excited states. J. Chem. Phys., 111(12):5624-5624, 1999.

[126] U. C. Singh and P. A. Kollman. An approach to computing electrostatic charges for
molecules. J. Comput. Chem., 5(2):129-145, 1984.

[127] C. I. Bayly, P. Cieplak, W. D. Cornell, and P. A. Kollman. A Well-Behaved
Electrostatic Potential Based Method Using Charge Restraints for Deriving Atomic
Charges: the RESP Model. J. Phys. Chem., 97(40):10269-10280, 1993.

[128] W. D. Cornell, P. Cieplak, C. I. Bayly, and P. A. Kollman. Application of RESP
Charges to Calculate Conformational Energies, Hydrogen-Bond Energies, and
Free-Energies of Solvation. J. Am. Chem. Soc., 115(21):9620-9631, 1993.

[129] J. M. Wang, P. Cieplak, and P. A. Kollman. How well does a restrained electrostatic
potential (RESP) model perform in calculating conformational energies of organic
and biological molecules? J. Comput. Chem., 21(12):1049-1074, 2000.

[130] R. C. Wade, A. R. Ortiz, and F. Gago. Comparative binding energy analysis. Persp.
Drug Disc. Design, 9-11:19-34, 1998.

[131] A. R. Ortiz, M. Pastor, A. Palomer, G. Cruciani, F. Gago, and R. C. Wade.
Reliability of comparative molecular field analysis models: Effects of data scaling
and variable selection using a set of human synovial fluid phospholipase A(2)
inhibitors. J. Med. Chem., 40(7):1136-1148, 1997.

[132] C. Perez, M. Pastor, A. R. Ortiz, and F. Gago. Comparative binding energy analysis
of HIV-1 protease inhibitors: Incorporation of solvent effects and validation as a
powerful tool in receptor-based drug design. J. Med. Chem., 41(6):836-852, 1998.

[133] T. Wang and R. C. Wade. Comparative binding energy (COMBINE) analysis of
influenza neuraminidase-inhibitor complexes. J. Med. Chem., 44(6):961-971, 2001.

[134] J. Kmunicek, S. Luengo, F. Gago, A. R. Ortiz, R. C. Wade, and J. Damborsky.
Comparative binding energy analysis of the substrate specificity of haloalkane
dehalogenase from Xanthobacter autotrophicus GJ10. Biochemistry,
40(30):8905-8917, 2001.

[135] T. Wang, S. Tomic, R. R. Gabdoulline, and R. C. Wade. How optimal are the
binding energetics of barnase and barstar? Biophys. J., 87(3):1618-1630, 2004.

[136] S. Tomic, L. Nilsson, and R. C. Wade. Nuclear receptor-DNA binding specificity: A
COMBINE and Free-Wilson QSAR analysis. J. Med. Chem., 43(9):1780-1792, 2000.

[137] T. Wang and R. C. Wade. Comparative binding energy (COMBINE) analysis of
OppA-peptide complexes to relate structure to binding thermodynamics. J. Med.
Chem., 45(22):4828-4837, 2002.

[138] M. Murcia and A. R. Ortiz. Virtual screening with flexible docking and
COMBINE-based models. Application to a series of factor Xa inhibitors. J. Med.
Chem., 47(4):805-820, 2004.

[139] K. Hasegawa, T. Kimura, and K. Funatsu. GA strategy for variable selection in
QSAR studies: Enhancement of comparative molecular binding energy analysis by
GA-based PLS method. Quant. Struct.-Act. Relat., 18(3):262-272, 1999.

[140] M. B. Peters. A semiempirical comparative binding energy analysis study of a series
of trypsin inhibitors. Master's thesis, The Pennsylvania State University, 2005.

[141] R. Diestel. Graph Theory. Springer, Berlin, 2005.

[142] P. Labute. On the perception of molecules from 3D atomic coordinates. J. Chem.
Inf. Model., 45(2):215-221, 2005.

[143] T. R. Cundari, C. Sarbu, and H. F. Pop. Robust fuzzy principal component analysis
(FPCA). A comparative study concerning interaction of carbon-hydrogen bonds with
molybdenum-oxo bonds. J. Chem. Inf. Comp. Sci., 42(6):1363-1369, 2002.

[144] S. Wold, J. Trygg, A. Berglund, and H. Antti. Some recent developments in PLS
modeling. Chemom. Intell. Lab. Syst., 58(2):131-150, 2001.

[145] G. M. Ullmann, E. W. Knapp, and N. M. Kostic. Computational simulation
and analysis of dynamic association between plastocyanin and cytochrome f.
Consequences for the electron-transfer reaction. J. Am. Chem. Soc., 119(1):42-52,
1997.

[146] J. O. A. De Kerpel and U. Ryde. Protein strain in blue copper proteins studied by
free energy perturbations. Proteins: Struct. Funct. Genet., 36(2):157-174, 1999.

[147] M. H. M. Olsson and U. Ryde. The influence of axial ligands on the reduction
potential of blue copper proteins. J. Biol. Inorg. Chem., 4(5):654-663, 1999.

[148] R. Remenyi and P. Comba. A new general molecular mechanics force field for the
oxidized form of blue copper proteins. J. Inorg. Biochem., 86(1):397-397, 2001.

[149] P. Comba, A. Lledos, F. Maseras, and R. Remenyi. Hybrid quantum
mechanics/molecular mechanics studies of the active site of the blue copper proteins
amicyanin and rusticyanin. Inorg. Chim. Acta, 324(1-2):21-26, 2001.

[150] P. Comba and R. Remenyi. A new molecular mechanics force field for the oxidized
form of blue copper proteins. J. Comput. Chem., 23(7):697-705, 2002.

[151] D. Suarez, N. Diaz, and K. M. Merz Jr. Ureases: Quantum chemical calculations on
cluster models. J. Am. Chem. Soc., 125(50):15324-15337, 2003.

[152] G. Estiu and K. M. Merz Jr. Enzymatic catalysis of urea decomposition:
Elimination or hydrolysis? J. Am. Chem. Soc., 126(38):11832-11842, 2004.

[153] G. Estiu and K. M. Merz Jr. Catalyzed decomposition of urea. Molecular dynamics
simulations of the binding of urea to urease. Biochemistry, 45(14):4429-4443, 2006.

[154] G. Estiu, D. Suarez, and K. M. Merz Jr. Quantum mechanical and molecular
dynamics simulations of ureases and Zn beta-lactamases. J. Comput. Chem.,
27(12):1240-1262, 2006.

[155] The Apache Project. Xerces-C++ Parser. http://xml.apache.org/xerces-c/ (accessed
Oct 1, 2005).

[156] A. M. Wollacott. Computational studies of the '.'l.: .'1'::/', of semiempirical quantum
mechanical methods to study protein structure. PhD thesis, The Pennsylvania
State University, 2005.

[157] A. M. Wollacott and K. M. Merz Jr. Haptic applications for molecular structure
manipulation. J. Mol. Graphics Modell., 25(6):801-805, 2007.

[158] R. J. F. Branco, P. A. Fernandes, and M. J. Ramos. Molecular dynamics simulations
of the enzyme Cu, Zn superoxide dismutase. J. Phys. Chem. B, 110(33):16754-16762,
2006.

[159] T. Wang and J. J. Zhou. 3DFS: A new 3D flexible searching system for use in drug
design. J. Chem. Inf. Comput. Sci., 38(1):71-77, 1998.

[160] J. M. Wang, R. M. Wolf, J. W. Caldwell, P. A. Kollman, and D. A. Case.
Development and testing of a general amber force field. J. Comput. Chem.,
25(9):1157-1174, 2004.

[161] E. C. Meng and R. A. Lewis. Determination of Molecular Topology and Atomic
Hybridization States from Heavy-Atom Coordinates. J. Comput. Chem.,
12(7):891-898, 1991.

[162] F. H. Allen, O. Kennard, D. G. Watson, L. Brammer, A. G. Orpen, and R. Taylor.
Tables of Bond Lengths Determined by X-Ray and Neutron-Diffraction. 1. Bond
Lengths in Organic-Compounds. J. Chem. Soc., Perkin Trans. 2, (12):S1-S19, 1987.

[163] J. C. Baber and E. E. Hodgkin. Automatic Assignment of Chemical Connectivity to
Organic-Molecules in the Cambridge Structural Database. J. Chem. Inf. Comput.
Sci., 32(5):401-406, 1992.

[164] A. G. Orpen, L. Brammer, F. H. Allen, O. Kennard, D. G. Watson, and R. Taylor.
Tables of Bond Lengths Determined by X-Ray and Neutron-Diffraction. 2.
Organometallic Compounds and Co-Ordination Complexes of the D-Block and
F-Block Metals. J. Chem. Soc., Dalton Trans., (12):S1-S83, 1989.

[165] M. Hendlich, F. Rippmann, and G. Barnickel. BALI: Automatic assignment of bond
and atom types for protein ligands in the Brookhaven Protein Databank. J. Chem.
Inf. Comput. Sci., 37(4):774-778, 1997.

[166] J. M. Wang, W. Wang, P. A. Kollman, and D. A. Case. Automatic atom type
and bond type perception in molecular mechanical calculations. J. Mol. Graphics
Modell., 25(2):247-260, 2006.

[167] B. T. Fan, A. Panaye, J. P. Doucet, and A. Barbu. Ring Perception. A New
Algorithm for Directly Finding the Smallest Set of Smallest Rings from a Connection
Table. J. Chem. Inf. Comput. Sci., 33(5):657-662, 1993.

[168] B. L. Roos-Kozel and W. L. Jorgensen. Computer-Assisted Mechanistic Evaluation
of Organic-Reactions. 2. Perception of Rings, Aromaticity, and Tautomers. J. Chem.
Inf. Comput. Sci., 21(2):101-111, 1981.

[169] M. Lipton and W. C. Still. The multiple minimum problem in molecular modeling.
Tree searching internal coordinate conformational space. J. Comput. Chem.,
9(4):343-355, 1988.

[170] J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1):31-42, 1976.

[171] P. Willett, T. Wilson, and S. F. Reddaway. Atom-by-atom searching using massive
parallelism. Implementation of the Ullmann subgraph isomorphism algorithm on the
distributed array processor. J. Chem. Inf. Comput. Sci., 31(2):225-233, 1991.

[172] E. J. Barker, D. Buttar, D. A. Cosgrove, E. J. Gardiner, P. Kitts, P. Willett, and
V. J. Gillet. Scaffold hopping using clique detection applied to reduced graphs. J.
Chem. Inf. Model., 46(2):503-511, 2006.

[173] S. K. Kearsley. On the Orthogonal Transformation Used for Structural Comparisons.
Acta Crystallogr., Sect. A: Found. Crystallogr., 45:208-210, 1989.

[174] W. Kabsch. Solution for Best Rotation to Relate 2 Sets of Vectors. Acta Crystallogr.,
Sect. A: Found. Crystallogr., 32:922-923, 1976.

[175] W. Kabsch. Discussion of Solution for Best Rotation to Relate 2 Sets of Vectors.
Acta Crystallogr., Sect. A: Found. Crystallogr., 34:827-828, 1978.

[176] G. Carta, V. Onnis, A. J. S. Knox, D. Fayne, and D. G. Lloyd. Permuting input
for more effective sampling of 3D conformer space. J. Comput.-Aided Mol. Des.,
20(3):179-190, 2006.

[177] C. Lemmen, M. Zimmermann, and T. Lengauer. Multiple molecular superpositioning
as an effective tool for virtual database screening. Perspect. Drug Discovery Des.,
20(1):43-62, 2000.

[178] F. Daeyaert, M. de Jonge, J. Heeres, L. Koymans, P. Lewi, M. H. Vinkers, and
P. A. J. Janssen. A pharmacophore docking algorithm and its application to the
cross-docking of 18 HIV-NNRTI's in their binding pockets. Proteins: Struct. Funct.
Genet., 54(3):526-533, 2004.

[179] C. Lemmen and T. Lengauer. Computational methods for the structural alignment
of molecules. J. Comput.-Aided Mol. Des., 14(3):215-232, 2000.

[180] Q. Chen, R. E. Higgs, and M. Vieth. Geometric accuracy of three-dimensional
molecular overlays. J. Chem. Inf. Model., 46(5):1996-2002, 2006.

[181] S. K. Drayton, K. Edwards, N. Jewell, D. B. Turner, D. J. Wild, P. Willett, P. M.
Wright, and K. Simmons. Similarity searching in files of three-dimensional chemical
structures: Identification of bioactive molecules. Internet J. Chem., 1(37):CP3-U34,
1998.

[182] F. Melani, P. Gratteri, M. Adamo, and C. Bonaccini. Field interaction and
geometrical overlap: A new simplex and experimental design based computational
procedure for superposing small ligand molecules. J. Med. Chem., 46(8):1359-1371,
2003.

[183] R. P. Sheridan, R. Nilakantan, J. S. Dixon, and R. Venkataraghavan. The ensemble
approach to distance geometry: application to the nicotinic pharmacophore. J. Med.
Chem., 29(6):899-906, 1986.

[184] S. K. Kearsley and G. M. Smith. An alternative method for the alignment of
molecular structures: Maximizing electrostatic and steric overlap. Tetrahedron
Comput. Methodol., 3:615-633, 1990.

[185] A. C. Good, E. E. Hodgkin, and W. G. Richards. Utilization of gaussian functions
for the rapid evaluation of molecular similarity. J. Chem. Inf. Comput. Sci.,
32(3):188-191, 1992.

[186] Y. C. Martin, M. G. Bures, E. A. Danaher, J. Delazzer, I. Lico, and P. A. Pavlik. A
fast new approach to pharmacophore mapping and its application to dopaminergic
and benzodiazepine agonists. J. Comput.-Aided Mol. Des., 7(1):83-102, 1993.

[187] B. B. Masek, A. Merchant, and J. B. Matthew. Molecular Shape Comparison of
Angiotensin-II Receptor Antagonists. J. Med. Chem., 36(9):1230-1238, 1993.

[188] G. Klebe, T. Mietzner, and F. Weber. Different approaches toward an automatic
structural alignment of drug molecules applications to sterol mimics, thrombin and
thermolysin inhibitors. J. Comput.-Aided Mol. Des., 8(6):751-778, 1994.

[189] A. N. Jain, T. G. Dietterich, R. H. Lathrop, D. Chapman, R. E. Critchlow, B. E.
Bauer, T. A. Webster, and T. Lozanoperez. Compass a shape-based machine
learning tool for drug design. J. Comput.-Aided Mol. Des., 8(6):635-652, 1994.

[190] G. Jones, P. Willett, and R. C. Glen. A genetic algorithm for flexible molecular
overlay and pharmacophore elucidation. J. Comput.-Aided Mol. Des., 9(6):532-549,
1995.

[191] R. A. Dammkoehler, S. F. Karasek, E. F. B. Shands, and G. R. Marshall. Sampling
conformational hyperspace: Techniques for improving completeness. J. Comput.-
Aided Mol. Des., 9(6):491-499, 1995.

[192] C. Mcmartin and R. S. Bohacek. Flexible Matching of Test Ligands to a 3D
Pharmacophore Using a Molecular Superposition Force-Field Comparison of
Predicted and Experimental Conformations of Inhibitors of 3 Enzymes. J. Comput.-
Aided Mol. Des., 9(3):237-250, 1995.

[193] T. D. J. Perkins, J. E. J. Mills, and P. M. Dean. Molecular surface-volume and
property matching to superpose flexible dissimilar molecules. J. Comput.-Aided Mol.
Des., 9(6):479-490, 1995.

[194] M. Petitjean. Geometric molecular similarity from volume-based distance
minimization application to saxitoxin and tetrodotoxin. J. Comput. Chem.,
16(1):80-90, 1995.

[195] J. A. Grant, M. A. Gallardo, and B. T. Pickup. A fast method of molecular shape
comparison: A simple application of a gaussian description of molecular shape. J.
Comput. Chem., 17(14):1653-1666, 1996.

[196] C. Lemmen and T. Lengauer. Time-efficient flexible superposition of medium-sized
molecules. J. Comput.-Aided Mol. Des., 11(4):357-368, 1997.

[197] A. J. Mcmahon and P. M. King. Optimization of Carbo molecular similarity index
using gradient methods. J. Comput. Chem., 18(2):151-158, 1997.

[198] A. Cossé-Barbi and M. Raji. Discrete pattern recognition by fitting onto a
continuous function. J. Comput. Chem., 18(15):1875-1892, 1997.

[199] J. Mestres, D. C. Rohrer, and G. M. Maggiora. Mimic: A molecular-field matching
program. Exploiting applicability of molecular similarity approaches. J. Comput.
Chem., 18(7):934-954, 1997.

[200] J. W. M. Nissink, M. L. Verdonk, J. Kroon, T. Mietzner, and G. Klebe.
Superposition of molecules: Electron density fitting by application of fourier
transforms. J. Comput. Chem., 18(5):638-645, 1997.

[201] M. F. Parretti, R. T. Kroemer, J. H. Rothman, and W. G. Richards. Alignment
of molecules by the Monte Carlo optimization of molecular similarity indices. J.
Comput. Chem., 18(11):1344-1353, 1997.

[202] M. Cocchi and P. G. De Benedetti. Use of the supermolecule approach to derive
molecular similarity descriptors for QSAR analysis. J. Mol. Model., 4(3):113-131,
1998.

[203] M. C. De Rosa and A. Berglund. A new method for predicting the alignment of
flexible molecules and orienting them in a receptor cleft of known structure. J. Med.
Chem., 41(5):691-698, 1998.

[204] S. Handschuh, M. Wagener, and J. Gasteiger. Superposition of three-dimensional
chemical structures allowing for conformational flexibility by a hybrid method. J.
Chem. Inf. Comput. Sci., 38(2):220-232, 1998.

[205] T. Wang and J. J. Zhou. 3DFS: 3D flexible searching system for lead discovery new
version 1.2. J. Mol. Model., 5(11):231-251, 1999.

[206] C. Lemmen, C. Hiller, and T. Lengauer. RigFit: A new approach to superimposing
ligand molecules. J. Comput.-Aided Mol. Des., 12(5):491-502, 1998.

[207] M. D. Miller, R. P. Sheridan, and S. K. Kearsley. SQ: a program for rapidly
producing pharmacophorically relevent molecular superpositions. J. Med. Chem.,
42(9):1505-14, 1999.

[208] G. Klebe, T. Mietzner, and F. Weber. Methodological developments and strategies
for a fast flexible superposition of drug-size molecules. J. Comput.-Aided Mol. Des.,
13(1):35-49, 1999.

[209] M. de Caceres, J. Villa, J. J. Lozano, and F. Sanz. MIPSIM: similarity analysis of
molecular interaction potentials. Bioinformatics, 16(6):568-569, 2000.

[210] D. A. Cosgrove, D. M. Bayada, and A. P. Johnson. A novel method of aligning
molecules by local surface shape similarity. J. Comput.-Aided Mol. Des.,
14(6):573-591, 2000.

[211] M. Feher and J. M. Schmidt. Multiple flexible alignment with seal: A study
of molecules acting on the colchicine binding site. J. Chem. Inf. Comput. Sci.,
40(2):495-502, 2000.

[212] X. Girones, D. Robert, and R. Carbó-Dorca. TGSA: A molecular superposition
program based on topo-geometrical considerations. J. Comput. Chem., 22(2):255-263,
2001.

[213] X. Girones and R. Carbó-Dorca. TGSA-flex: Extending the capabilities of the
topo-geometrical superposition algorithm to handle flexible molecules. J. Comput.
Chem., 25(2):153-159, 2004.

[214] P. Labute, C. Williams, M. Feher, E. Sourial, and J. M. Schmidt. Flexible alignment
of small molecules. J. Med. Chem., 44(10):1483-1490, 2001.

[215] J. E. J. Mills, I. J. P. de Esch, T. D. J. Perkins, and P. M. Dean. Slate: A method
for the superposition of flexible ligands. J. Comput.-Aided Mol. Des., 15(1):81-96,
2001.

[216] M. C. Pitman, W. K. Huber, H. Horn, A. Kramer, J. E. Rice, and W. C. Swope.
FLASHFLOOD: A 3D field-based similarity search and alignment method for
flexible molecules. J. Comput.-Aided Mol. Des., 15(7):587-612, 2001.

[217] A. Kramer, H. W. Horn, and J. E. Rice. Fast 3D molecular superposition and
similarity search in databases of flexible molecules. J. Comput.-Aided Mol. Des.,
17(1):13-38, 2003.

[218] S. P. Korhonen, K. Tuppurainen, R. Laatikainen, and M. Peräkylä. Comparing
the performance of FLUFF-BALL to SEAL-CoMFA with a large diverse estrogen
data set: From relevant superpositions to solid predictions. J. Chem. Inf. Model.,
45(6):1874-1883, 2005.

[219] A. J. Tervo, T. Ronkko, T. H. Nyronen, and A. Poso. BRUTUS: Optimization of a
grid-based similarity function for rigid-body molecular superposition. I. alignment
and virtual screening applications. J. Med. Chem., 48(12):4076-4086, 2005.

[220] T. Ronkko, A. J. Tervo, J. Parkkinen, and A. Poso. BRUTUS: Optimization of a
grid-based similarity function for rigid-body molecular superposition. II. description
and characterization. J. Comput.-Aided Mol. Des., 20(4):227-236, 2006.

[221] S. J. Cho and Y. X. Sun. FLAME: A program to flexibly align molecules. J. Chem.
Inf. Model., 46(1):298-306, 2006.

[222] J. Marialke, R. Korner, S. Tietze, and J. Apostolakis. Graph-based molecular
alignment (GMA). J. Chem. Inf. Model., 47(2):591-601, 2007.

[223] G. Klebe and T. Mietzner. A fast and efficient method to generate biologically
relevant conformations. J. Comput.-Aided Mol. Des., 8(5):583-606, 1994.

[224] J. Sadowski and J. Boström. Mimumba revisited: Torsion angle rules for conformer
generation derived from x-ray structures. J. Chem. Inf. Model., 46(6):2305-2309,
2006.

[225] F. Daeyaert, M. de Jonge, J. Heeres, L. Koymans, P. Lewi, W. van den Broeck, and
M. Vinkers. Pareto optimal flexible alignment of molecules using a non-dominated
sorting genetic algorithm. Chemom. Intell. Lab. Syst., 77(1-2):232-237, 2005.

[226] A. Strizhev, E.J. Abrahamian, S. Choi, J.M. Leonard, P.R.N. Wolohan, and R.D.
Clark. The Effects of Biasing Torsional Mutations in a Conformational GA. J.
Chem. Inf. Model., 46(4):1862-1870, 2006.

[227] D.K. Agrafiotis, A.C. Gibbs, F. Zhu, S. Izrailev, and E. Martin. Conformational
sampling of bioactive molecules: A comparative study. J. Chem. Inf. Model.,
47(3):1067-1086, 2007.

[228] J. Boström, P. O. Norrby, and T. Liljefors. Conformational energy penalties of
protein-bound ligands. J. Comput.-Aided Mol. Des., 12(4):383-396, 1998.

[229] J. Boström. Reproducing the conformations of protein-bound ligands: A critical
evaluation of several popular conformational searching tools. J. Comput.-Aided Mol.
Des., 15(12):1137-1152, 2001.

[230] D. J. Diller and K. M. Merz Jr. Can we separate active from inactive conformations?
J. Comput.-Aided Mol. Des., 16(2):105-112, 2002.

[231] J. Boström, J. R. Greenwood, and J. Gottfries. Assessing the performance of
omega with respect to retrieving bioactive conformations. J. Mol. Graphics Modell.,
21(5):449-462, 2003.

[232] S. Putta, G. A. Landrum, and J. E. Penzotti. Conformation mining: An algorithm
for finding biologically relevant conformations. J. Med. Chem., 48(9):3313-3318, 2005.

[233] S. L. Dixon and K. M. Merz Jr. QMALIGN.

[234] C. Lemmen, T. Lengauer, and G. Klebe. FLEXS: A method for fast flexible ligand
superposition. J. Med. Chem., 41(23):4502-4520, 1998.

[235] R Development Core Team. R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria, 2005.

[236] T. M. Willson, P. J. Brown, D. D. Sternbach, and B. R. Henke. The PPARs: From
orphan receptors to drug discovery. J. Med. Chem., 43(4):527-550, 2000.

[237] J. C. Parker. Troglitazone: the discovery and development of a novel therapy for the
treatment of type 2 diabetes mellitus. Adv. Drug Deliv. Rev., 54(9):1173-97, 2002.

[238] P. J. Rybczynski, R. E. Zeck, J. Dudash, D. W. Combs, T. P. Burris, M. Yang,
M. C. Osborne, X. L. Chen, and K. T. Demarest. Benzoxazinones as PPAR gamma
agonists. 2. SAR of the amide substituent and in vivo results in a type 2 diabetes
model. J. Med. Chem., 47(1):196-209, 2004.

[239] C. Z. Liao, A. H. Xie, L. M. Shi, J. J. Zhou, and X. P. Lu. Eigenvalue analysis of
peroxisome proliferator-activated receptor gamma agonists. J. Chem. Inf. Comput.
Sci., 44(1):230-238, 2004.

[240] C. Z. Liao, A. H. Xie, J. J. Zhou, L. M. Shi, Z. B. Li, and X. P. Lu. 3D QSAR
studies on peroxisome proliferator-activated receptor gamma agonists using CoMFA
and CoMSIA. J. Mol. Model., 10(3):165-177, 2004.

[241] T. Tuccinardi, E. Nuti, G. Ortore, C. T. Supuran, A. Rossello, and A. Martinelli.
Analysis of human carbonic anhydrase II: Docking reliability and receptor-based
3D-QSAR study. J. Chem. Inf. Model., 47(2):515-525, 2007.

[242] C.-Y. Kim, D. A. Whittington, J. S. Chang, J. Liao, J. A. May, and D. W.
Christianson. Structural Aspects of Isozyme Selectivity in the Binding of Inhibitors
to Carbonic Anhydrases II and IV. J. Med. Chem., 45(4):888-893, 2002.

[243] B. A. Grzybowski, A. V. Ishchenko, C. Y. Kim, G. Topalov, R. Chapman,
D. W. Christianson, G. M. Whitesides, and E. I. Shakhnovich. Combinatorial
computational method gives new picomolar ligands for a known enzyme. Proc. Natl.
Acad. Sci. U. S. A., 99(3):1270-3, 2002.

[244] S. Grüneberg, M. T. Stubbs, and G. Klebe. Successful Virtual Screening for Novel
Inhibitors of Human Carbonic Anhydrase: Strategy and Experimental Confirmation.
J. Med. Chem., 45(17):3588-3602, 2002.

[245] G. M. Smith, R. S. Alexander, D. W. Christianson, B. M. McKeever, G. S.
Ponticello, J. P. Springer, W. C. Randall, J. J. Baldwin, and C. N. Habecker.
Positions of His-64 and a bound water in human carbonic anhydrase II upon binding
three structurally related inhibitors. Protein Sci., 3(1):118-25, 1994.

[246] A. Weber, A. Casini, A. Heine, D. Kuhn, C.T. Supuran, A. Scozzafava, and
G. Klebe. Unexpected Nanomolar Inhibition of Carbonic Anhydrase by
COX-2-Selective Celecoxib: New Pharmacological Opportunities Due to Related
Binding Site Recognition. J. Med. Chem., 47(3):550-557, 2004.

[247] R. Recacha, M. J. Costanzo, B. E. Maryanoff, and D. Chattopadhyay. Crystal
structure of human carbonic anhydrase II complexed with an anti-convulsant sugar
sulphamate. Biochem. J., 361(3):437-41, 2002.

[248] M. D. Lloyd, N. Thiyagarajan, Y. T. Ho, L. W. L. Woo, O. B. Sutcliffe, A. Purohit,
M. J. Reed, K. R. Acharya, and B. V. L. Potter. First Crystal Structures of
Human Carbonic Anhydrase II in Complex with Dual Aromatase-Steroid Sulfatase
Inhibitors. Biochemistry, 44(18):6858-6866, 2005.

[249] C.-Y. Kim, P. P. Chandra, A. Jain, and D. W. Christianson.
Fluoroaromatic-Fluoroaromatic Interactions between Inhibitors Bound in the Crystal
Lattice of Human Carbonic Anhydrase II. J. Am. Chem. Soc., 123(39):9620-9627,
2001.

[250] V. Menchise, G. DeSimone, V. Alterio, A. DiFiore, C. Pedone, A. Scozzafava, and
C. T. Supuran. Carbonic Anhydrase Inhibitors: Stacking with Phe131 Determines
Active Site Binding Region of Inhibitors As Exemplified by the X-ray Crystal
Structure of a Membrane-Impermeant Antitumor Sulfonamide Complexed with
Isozyme II. J. Med. Chem., 48(18):5721-5727, 2005.

[251] R. D. Hancock. Molecular Mechanics Calculations as a Tool in Coordination
Chemistry. Prog. Inorg. Chem., 37:187-291, 1989.

[252] S. C. Hoops, K. W. Anderson, and K. M. Merz Jr. Force-Field Design for
Metalloproteins. J. Am. Chem. Soc., 113(22):8262-8270, 1991.

[253] C. I. Bayly, P. Cieplak, W. Cornell, and P. A. Kollman. A well-behaved electrostatic
potential based method using charge restraints for deriving atomic charges: the RESP
model. J. Phys. Chem., 97(40):10269-10280, 1993.

[254] R. H. Stote and M. Karplus. Zinc binding in proteins and solution: a simple but
accurate nonbonded representation. Proteins, 23(1):12-31, 1995.

[255] D. V. Sakharov and C. Lim. Zn protein simulations including charge transfer and
local polarization effects. J. Am. Chem. Soc., 127(13):4921-4929, 2005.

[256] J. Aqvist and A. Warshel. Computer simulation of the initial proton transfer step in
human carbonic anhydrase I. J. Mol. Biol., 224(1):7-14, 1992.

[257] Y. P. Pang, K. Xu, J. E. Yazal, and F. G. Prendergast. Successful molecular
dynamics simulation of the zinc-bound farnesyltransferase using the cationic dummy
atom approach. Protein Sci., 9(10):1857-65, 2000.

[258] Y. P. Pang. Successful molecular dynamics simulation of two zinc complexes bridged
by a hydroxide in phosphotriesterase using the cationic dummy atom method.
Proteins, 45(3):183-9, 2001.

[259] A. Vedani and D. W. Huhta. A New Force-Field for Modeling Metalloproteins. J.
Am. Chem. Soc., 112(12):4759-4767, 1990.

[260] N. Gresh, J. P. Piquemal, and M. Krauss. Representation of Zn(II) complexes
in polarizable molecular mechanics. Further refinements of the electrostatic and
short-range contributions. Comparisons with parallel ab initio computations. J.
Comput. Chem., 26(11):1113-30, 2005.

[261] N. Gresh. Development, validation, and applications of anisotropic polarizable
molecular mechanics to study ligand and drug-receptor interactions. Curr. Pharm.
Des., 12(17):2121-58, 2006.

[262] A. K. Rappe, C. J. Casewit, K. S. Colwell, W. A. Goddard, and W. M. Skiff. UFF,
a Full Periodic-Table Force-Field for Molecular Mechanics and Molecular-Dynamics
Simulations. J. Am. Chem. Soc., 114(25):10024-10035, 1992.

[263] A. K. Rappe, K. S. Colwell, and C. J. Casewit. Application of a Universal
Force-Field to Metal-Complexes. Inorg. Chem., 32(16):3438-3450, 1993.

[264] J. M. Sirovatka, A. K. Rappe, and R. G. Finke. Molecular mechanics studies
of coenzyme B-12 complexes with constrained Co-N(axial-base) bond lengths:
introduction of the universal force field (UFF) to coenzyme B-12 chemistry and its
use to probe the plausibility of an axial-base-induced, ground-state corrin butterfly
conformational steric effect. Inorg. Chim. Acta, 300:545-555, 2000.

[265] P. Brandt, T. Norrby, E. Akermark, and P. O. Norrby. Molecular mechanics
(MM3*) parameters for ruthenium(II)-polypyridyl complexes. Inorg. Chem.,
37(16):4120-4127, 1998.

[266] H. M. Marques and K. L. Brown. A Molecular Mechanics Force-Field for the Cobalt
Corrinoids. J. Mol. Struct. (Theochem), 340:97-124, 1995.

[267] K. L. Brown, X. Zou, and H. M. Marques. NMR-restrained molecular modeling of
cobalt corrinoids: cyanocobalamin (vitamin B-12) and methylcobalt corrinoids. J.
Mol. Struct. (Theochem), 453:209-224, 1998.

[268] H. M. Marques and K. L. Brown. The structure of cobalt corrinoids based on
molecular mechanics and NOE-restrained molecular mechanics and dynamics
simulations. Coord. Chem. Rev., 192:127-153, 1999.

[269] H. M. Marques, B. Ngoma, T. J. Egan, and K. L. Brown. Parameters for the
AMBER force field for the molecular mechanics modeling of the cobalt corrinoids. J.
Mol. Struct., 561(1-3):71-91, 2001.

[270] J. Aqvist and A. Warshel. Free-Energy Relationships in Metalloenzyme-Catalyzed
Reactions Calculations of the Effects of Metal-Ion Substitutions in Staphylococcal
Nuclease. J. Am. Chem. Soc., 112(8):2860-2868, 1990.

[271] U. Ryde. Molecular-Dynamics Simulations of Alcohol-Dehydrogenase with a
4-Coordinate or 5-Coordinate Catalytic Zinc Ion. Proteins: Struct. Funct. Genet.,
21(1):40-56, 1995.

[272] U. Ryde. On the Role of Glu-68 in Alcohol-Dehydrogenase. Protein Sci.,
4(6):1124-1132, 1995.

[273] U. Ryde. Carboxylate binding modes in zinc proteins: A theoretical study. Biophys.
J., 77(5):2777-2787, 1999.

[274] R. D. Hancock, J. S. Weaving, and H. M. Marques. A Molecular Mechanics Model
of the Metalloporphyrins the Role of Steric Hindrance in Discrimination in Favor
of Dioxygen Relative to Carbon-Monoxide in Some Heme Models. J. Chem. Soc.,
Chem. Commun., (16):1176-1178, 1989.

[275] H. M. Marques and I. Cukrowski. Molecular mechanics modelling of porphyrins.
using artificial neural networks to develop metal parameters for four-coordinate
metalloporphyrins. Phys. Chem. Chem. Phys., 4(23):5878-5887, 2002.

[276] H. M. Marques and K. L. Brown. Molecular mechanics and molecular dynamics
simulations of porphyrins, metalloporphyrins, heme proteins and cobalt corrinoids.
Coord. Chem. Rev., 225(1-2):123-158, 2002.

[277] C. E. Skopec, J. M. Robinson, I. Cukrowski, and H. M. Marques. Using artificial
neural networks to develop molecular mechanics parameters for the modelling of
metalloporphyrins. III. five coordinate Zn(II) porphyrins and the metalloporphyrins
of the early 3d metals. J. Mol. Struct., 738(1-3):67-78, 2005.

[278] C. E. Skopec, I. Cukrowski, and H. M. Marques. Using artificial neural networks
to develop molecular mechanics parameters for the modelling of metalloporphyrins:
Part IV. Five-, six-coordinate metalloporphyrins of Mn, Co, Ni and Cu. J. Mol.
Struct., 783(1-3):21-33, 2006.

[279] P. O. Norrby and T. Liljefors. Automated molecular mechanics parameterization
with simultaneous utilization of experimental and quantum mechanical data. J.
Comput. Chem., 19(10):1146-1166, 1998.

[280] P. O. Norrby and P. Brandt. Deriving force field parameters for coordination
complexes. Coord. Chem. Rev., 212:79-109, 2001.

[281] K. M. Merz Jr. CO2 Binding to Human Carbonic Anhydrase-II. J. Am. Chem. Soc.,
113(2):406-411, 1991.

[282] K. M. Merz Jr., M. A. Murcko, and P. A. Kollman. Inhibition of
Carbonic-Anhydrase. J. Am. Chem. Soc., 113(12):4484-4490, 1991.

[283] N. Diaz, D. Suarez, and K. M. Merz Jr. Hydration of zinc ions: theoretical study
of [Zn(H2O)(4)](H2O)(8)(2+) and [Zn(H2O)(6)](H2O)(6)(2+). Chem. Phys. Lett.,
326(3-4):288-292, 2000.

[284] N. Diaz, D. Suarez, and K. M. Merz Jr. Zinc metallo-beta-lactamase from
Bacteroides fragilis: A quantum chemical study on model systems of the active site.
J. Am. Chem. Soc., 122(17):4197-4208, 2000.

[285] N. Diaz, D. Suarez, and K. M. Merz Jr. Molecular dynamics simulations
of the mononuclear zinc-beta-lactamase from Bacillus cereus complexed with
benzylpenicillin and a quantum chemical study of the reaction mechanism. J. Am.
Chem. Soc., 123(40):9867-9879, 2001.

[286] N. Diaz, D. Suarez, T. L. Sordo, and K. M. Merz Jr. A theoretical study of the
aminolysis reaction of lysine 199 of human serum albumin with benzylpenicillin:
Consequences for immunochemistry of penicillins. J. Am. Chem. Soc.,
123(31):7574-7583, 2001.

[287] N. Diaz, D. Suarez, T. L. Sordo, and K. M. Merz Jr. Acylation of class a
beta-lactamases by penicillins: A theoretical examination of the role of serine
130 and the beta-lactam carboxylate group. J. Phys. Chem. B, 105(45):11302-11313,
2001.

[288] D. Suarez and K. M. Merz Jr. Molecular dynamics simulations of the mononuclear
zinc-beta-lactamase from Bacillus cereus. J. Am. Chem. Soc., 123(16):3759-3770,
2001.

[289] N. Diaz, T. L. Sordo, K. M. Merz Jr., and D. Suarez. Insights into the acylation
mechanism of class A beta-lactamases from molecular dynamics simulations
of the TEM-1 enzyme complexed with benzylpenicillin. J. Am. Chem. Soc.,
125(3):672-684, 2003.

[290] N. Diaz, D. Suarez, K. M. Merz Jr., and T. L. Sordo. Molecular dynamics
simulations of the TEM-1 beta-lactamase complexed with cephalothin. J. Med.
Chem., 48(3):780-791, 2005.

[291] D. Suarez, E. N. Brothers, and K. M. Merz Jr. Insights into the structure and
dynamics of the dinuclear zinc beta-lactamase site from Bacteroides fragilis. Bio-
chemistry, 41(21):6615-6630, 2002.

[292] D. Suarez, N. Diaz, and K. M. Merz Jr. Molecular dynamics simulations of the
dinuclear zinc-beta-lactamase from Bacteroides fragilis complexed with imipenem. J.
Comput. Chem., 23(16):1587-1600, 2002.

[293] G. Cui, B. Wang, and K. M. Merz Jr. Computational studies of the
farnesyltransferase ternary complex Part I: Substrate binding. Biochemistry,
44(50):16513-16523, 2005.

[294] J. R. Collins, D. L. Camper, and G. H. Loew. Valproic Acid Metabolism by
Cytochrome-P450 a Theoretical-Study of Stereoelectronic Modulators of Product
Distribution. J. Am. Chem. Soc., 113(7):2736-2743, 1991.

[295] J. R. Collins, P. Du, and G. H. Loew. Molecular-Dynamics Simulations of the
Resting and Hydrogen Peroxide-Bound States of Cytochrome-C Peroxidase. Bio-
chemistry, 31(45):11166-11174, 1992.

[296] S. J. Yao, J. P. Plastaras, and L. G. Marzilli. A Molecular Mechanics Amber-Type
Force-Field for Modeling Platinum Complexes of Guanine Derivatives. Inorg. Chem.,
33(26):6061-6077, 1994.

[297] M. M. Harding. The geometry of metal-ligand interactions relevant to proteins. Acta
Crystallogr., Sect. D: Biol. Crystallogr., 55:1432-43, 1999.

[298] M. M. Harding. The geometry of metal-ligand interactions relevant to proteins.
II. angles at the metal atom, additional weak metal-donor interactions. Acta
Crystallogr., Sect. D: Biol. Crystallogr., 56:857-67, 2000.

[299] M. M. Harding. Geometry of metal-ligand interactions in proteins. Acta Crystallogr.,
Sect. D: Biol. Crystallogr., 57:401-11, 2001.

[300] M. M. Harding. Metal-ligand geometry relevant to proteins and in proteins: sodium
and potassium. Acta Crystallogr., Sect. D: Biol. Crystallogr., 58:872-4, 2002.

[301] M. M. Harding. The architecture of metal coordination groups in proteins. Acta
Crystallogr., Sect. D: Biol. Crystallogr., 60:849-59, 2004.

[302] M. M. Harding. Small revisions to predicted distances around metal sites in proteins.
Acta Crystallogr., Sect. D: Biol. Crystallogr., 62:678-82, 2006.

[303] J. Aqvist. Ion Water Interaction Potentials Derived from Free-Energy Perturbation
Simulations. J. Phys. Chem., 94(21):8021-8024, 1990.

[304] A. Bondi. van Der Waals Volumes + Radii. J. Phys. Chem., 68(3):441-451, 1964.

[305] S. S. Batsanov. van der Waals radii of elements. Inorg. Mater., 37(9):871-885, 2001.

[306] S. S. Batsanov. The determination of van der Waals radii from the structural
characteristics of metals. Russ. J. Phys. Chem., 74(7):1144-1147, 2000.

[307] D. Asthagiri, L. R. Pratt, M. E. Paulaitis, and S. B. Rempe. Hydration structure
and free energy of biomolecularly specific aqueous dications, including Zn2+ and
first transition row metals. J. Am. Chem. Soc., 126(4):1285-1289, 2004.

[308] C. S. Babu and C. Lim. Empirical force fields for biologically active divalent metal
cations in water. J. Phys. Chem. A, 110(2):691-699, 2006.

[309] C. S. Babu and C. Lim. A new interpretation of the effective born radius from
simulation and experiment. Chem. Phys. Lett., 310(1-2):225-228, 1999.

[310] C. S. Babu and C. Lim. Theory of ionic hydration: Insights from molecular
dynamics simulations and experiment. J. Phys. Chem. B, 103(37):7958-7968, 1999.

[311] A. C. Vaiana, A. Schulz, J. Wolfrum, M. Sauer, and J. C. Smith. Molecular
mechanics force field parameterization of the fluorescent probe rhodamine 6G using
automated frequency matching. J. Comput. Chem., 24(5):632-639, 2003.

[312] M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R.
Cheeseman, J. A. Montgomery Jr., T. Vreven, K. N. Kudin, J. C. Burant, J. M.
Millam, S. S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani,
N. Rega, G. A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda,
J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene,
X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo,
R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli,
J. W. Ochterski, P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J. J.
Dannenberg, V. G. Zakrzewski, S. Dapprich, A. D. Daniels, M. C. Strain, O. Farkas,
D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. V. Ortiz, Q. Cui,
A. G. Baboul, S. Clifford, J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko,
P. Piskorz, I. Komaromi, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y.
Peng, A. Nanayakkara, M. Challacombe, P. M. W. Gill, B. Johnson, W. Chen,
M. W. Wong, C. Gonzalez, and J. A. Pople. Gaussian 03, revision C.02. Gaussian,
Inc., Wallingford, CT, 2004.

[313] B. H. Besler, K. M. Merz Jr., and P. A. Kollman. Atomic Charges Derived from
Semiempirical Methods. J. Comput. Chem., 11(4):431-439, 1990.

[314] P. Cieplak, W. D. Cornell, C. Bayly, and P. A. Kollman. Application of the
Multimolecule and Multiconformational RESP Methodology to Biopolymers
Charge Derivation for DNA, RNA, and Proteins. J. Comput. Chem.,
16(11):1357-1377, 1995.

[315] A. D. Becke. Density-Functional Exchange-Energy Approximation with Correct
Asymptotic-Behavior. Phys. Rev. A, 38(6):3098-3100, 1988.

[316] C. T. Lee, W. T. Yang, and R. G. Parr. Development of the Colle-Salvetti
Correlation-Energy Formula into a Functional of the Electron-Density. Phys.
Rev. B, 37(2):785-789, 1988.

[317] A. D. Becke. Density-Functional Thermochemistry.3. the Role of Exact Exchange. J.
Chem. Phys., 98(7):5648-5652, 1993.

[318] P. E. M. Siegbahn and T. Borowski. Modeling enzymatic reactions involving
transition metals. Acc. Chem. Res., 39(10):729-738, 2006.

[319] A. Blondel and M. Karplus. New formulation for derivatives of torsion angles and
improper torsion angles in molecular mechanics: Elimination of singularities. J.
Comput. Chem., 17(9):1132-1141, 1996.

[320] W. C. Swope and D. M. Ferguson. Alternative expressions for energies and forces
due to angle bending and torsional energy. J. Comput. Chem., 13(5):585-594, 1992.

[321] R. E. Tuzun, D. W. Noid, and B. G. Sumpter. Computation of internal coordinates,
derivatives, and gradient expressions: Torsion and improper torsion. J. Comput.
Chem., 21(7):553-561, 2000.

BIOGRAPHICAL SKETCH

Martin Barry Peters was born on April 3rd, 1980 in Tipperary, Republic of Ireland
to Martin and Mary Peters. He attended primary and secondary school in New Inn and
Cashel, respectively. In June 2002 he received his B.A. Mod. degree in Computational
Chemistry from Trinity College, University of Dublin (TCD). While at Trinity he worked
under the supervision of Dr. Isabel Rozas, where he was introduced to computational
chemistry and to his future significant other, Jane Montague. Martin enrolled in the PhD
program at Penn State University (PSU) and worked with Prof. Kenneth M. Merz Jr.
on the application of semi-empirical quantum mechanics to structure-based drug design.
In August 2005, Martin received his second degree, an M.Sc. in chemistry, from PSU. In
September 2005 he moved to the University of Florida (UF) and joined the Department of
Chemistry and the Quantum Theory Project to continue his work with Prof. K. M. Merz
Jr. in the pursuit of a doctoral degree. In his final year as a graduate student he applied
successfully for a Government of Ireland postdoctoral fellowship in science, engineering
and technology (IRCSET). After graduating from UF, he joined Dr. David Lloyd's group
at TCD as an IRCSET postdoctoral fellow.

PAGE 1

THE APPLICATION OF SEMIEMPIRICAL METHODS IN DRUG DESIGN
By
MARTIN B. PETERS
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2007

PAGE 2

© 2007 Martin B. Peters

PAGE 3

For Jane

PAGE 4

ACKNOWLEDGMENTS

Words cannot describe my Jane. She is everything I could ask for. She has stood by me even when I left Ireland to pursue my dream of getting my PhD. Thank you, honey, for your love, support and the sacrifices you have made for us. I thank my mother for always giving me tremendous support and for her words of wisdom and encouragement. I would also like to thank my two brothers, Patrick and Francis, and my two sisters, Marian and Deirdre, for all their encouragement and support.

Kennie, thank you for giving me the opportunity to work with you; I have truly enjoyed the experience. I would like to express my gratitude to all Merz group members, especially Kaushik, Andrew, Ken, Kevin, and Duane, for their support and friendship. I would also like to acknowledge the effort of Mike Weaver, who helped by editing this dissertation.

PAGE 5

TABLE OF CONTENTS
                                                                        page
ACKNOWLEDGMENTS ..... 4
LIST OF TABLES ..... 8
LIST OF FIGURES ..... 11
LIST OF ABBREVIATIONS ..... 15
ABSTRACT ..... 19
CHAPTER
1 INTRODUCTION ..... 21
2 THEORY AND METHODS ..... 25
  2.1 Receptor-Ligand Binding Free Energy ..... 28
  2.2 Computational Drug Design ..... 30
  2.3 Molecular Mechanics ..... 32
  2.4 Quantum Mechanics ..... 33
  2.5 Ligand Based Drug Design ..... 34
    2.5.1 3D-QSAR with QM descriptors ..... 35
    2.5.2 Field-based Methods ..... 36
    2.5.3 Spectroscopic 3-D QSAR ..... 37
    2.5.4 Quantum QSAR and Molecular Quantum Similarity ..... 39
  2.6 Receptor Based Drug Design ..... 40
  2.7 Semiempirical Divide-And-Conquer Approach ..... 42
  2.8 Pairwise Energy Decomposition (PWD) ..... 44
  2.9 Quantum Mechanical Charge Models ..... 46
  2.10 Comparative Binding Energy Analysis (COMBINE) ..... 47
  2.11 SemiEmpirical Comparative Binding Energy Analysis (SE-COMBINE) ..... 48
  2.12 Graph Theory ..... 49
  2.13 Statistical Methods ..... 54
  2.14 Metalloproteins ..... 59
3 MODELING TOOLKIT++ ..... 67
  3.1 Introduction ..... 67
  3.2 Overview ..... 68
    3.2.1 Development ..... 68
    3.2.2 Library Hierarchy ..... 69
    3.2.3 Molecule Library ..... 70
    3.2.4 Graph Library ..... 77
    3.2.5 MM Library ..... 78
    3.2.6 GA Library ..... 78

PAGE 6

    3.2.7 Statistics Library ..... 80
    3.2.8 Molecular Fragment Library ..... 80
    3.2.9 Parsers Library ..... 82
  3.3 Hybridization, Bond Order and Formal Charge Perception ..... 83
  3.4 Ring Perception ..... 87
  3.5 Addition of Hydrogen Atoms to Molecules ..... 92
  3.6 Conformational Sampling ..... 94
  3.7 Substructure Searching/Functionalize ..... 98
  3.8 Clique Detection/Maximum Common Pharmacophore ..... 101
  3.9 Superimposition ..... 102
  3.10 Conclusions ..... 104

4 SEMIFLEXIBLE QUANTUM MECHANICAL ALIGNMENT OF DRUG-LIKE MOLECULES ..... 106
  4.1 Introduction ..... 106
  4.2 Implementation ..... 110
    4.2.1 Ligand Conformational Searching ..... 110
    4.2.2 Structural Alignment and Clique Detection ..... 111
    4.2.3 Semiempirical Similarity Score ..... 112
  4.3 Results and Discussion ..... 113
    4.3.1 Data Set ..... 113
    4.3.2 Carboxypeptidase A ..... 117
    4.3.3 Glycogen Phosphorylase ..... 118
    4.3.4 Immunoglobin ..... 119
    4.3.5 Streptavidin ..... 121
    4.3.6 Dihydrofolate Reductase ..... 123
    4.3.7 Trypsin ..... 125
    4.3.8 Estrogen Receptor ..... 128
    4.3.9 Peroxisome Proliferator-Activated Receptor γ ..... 131
    4.3.10 Human Carbonic Anhydrase II ..... 132
    4.3.11 Thrombin ..... 136
    4.3.12 Elastase ..... 136
    4.3.13 Thermolysin ..... 140
  4.4 Conclusions ..... 144

5 METAL CLUSTER MOLECULAR MECHANICS PARAMETERIZATION ..... 146
  5.1 Introduction ..... 146
  5.2 Implementation ..... 148
    5.2.1 Equilibrium Bond Lengths and Angles ..... 150
    5.2.2 Force Constants ..... 150
    5.2.3 Point Charges ..... 151
  5.3 Zinc AMBER Force Field ..... 152
    5.3.1 Protein Data Bank Survey of Zinc Containing Proteins ..... 154
    5.3.2 Tetrahedral Zn Environment Force Field Parameterization ..... 157

PAGE 7

  5.4 Conclusions ..... 183

6 CONCLUSIONS ..... 189

APPENDIX

A ALGORITHMS ..... 191
  A.1 Subgraph Isomorphism Algorithm ..... 191
  A.2 Maximum Common Pharmacophore ..... 193

B AMBER GRADIENTS ..... 194
  B.1 Vector Math and Derivatives ..... 194
  B.2 AMBER First Derivatives ..... 195
    B.2.1 Bond ..... 195
    B.2.2 Angle ..... 196
    B.2.3 Dihedral ..... 197
    B.2.4 Electrostatic ..... 201
    B.2.5 van der Waals ..... 202

C FRAGMENT LIBRARY ..... 203
  C.1 Terminal Fragments ..... 203
  C.2 Two Point Linker Fragments ..... 208
  C.3 Three Point Linker Fragments ..... 212
  C.4 Four Point Linker Fragments ..... 214
  C.5 Five Point Linker Fragments ..... 216
  C.6 Three Membered Ring Fragments ..... 217
  C.7 Four Membered Ring Fragments ..... 218
  C.8 Five Membered Ring Fragments ..... 219
  C.9 Six Membered Ring Fragments ..... 224
  C.10 Greater than Six Membered Ring Fragments ..... 229
  C.11 Fused Ring Fragments ..... 230

REFERENCES ..... 237
BIOGRAPHICAL SKETCH ..... 263

PAGE 8

LIST OF TABLES

Table  page

2-1 Correspondence between Graph Theory and Chemical Terminology ..... 53
3-1 Disulfide Bond Prediction Parameters ..... 73
3-2 Meng Atomic Covalent Radii ..... 84
3-3 Labute Algorithm Upper Bound Bond Conditions ..... 85
3-4 Labute Algorithm Atom Hybridization Assignment ..... 86
3-5 Labute Algorithm Lower Bound Single Bond Lengths ..... 86
3-6 Labute Algorithm Bond Weights ..... 87
3-7 Hydrogen Bond Lengths ..... 94
3-8 Hydrogen Bond Angles ..... 94
3-9 Hydrogen Bond Dihedrals ..... 95
3-10 Dihedral Angles Available based on Bond Type ..... 95
4-1 Compound Alignment Literature ..... 107
4-2 Protein-Ligand Data Set ..... 115
4-3 Statistics of CuTieP Performance ..... 117
4-4 Carboxypeptidase A Ligand Alignments ..... 118
4-5 Glycogen Phosphorylase Ligand Alignments ..... 120
4-6 Immunoglobin Ligand Alignments ..... 123
4-7 Streptavidin Ligand Alignments ..... 125
4-8 Dihydrofolate Reductase Ligand Alignments ..... 127
4-9 Trypsin Ligand Alignments ..... 130
4-10 Estrogen Receptor Ligand Alignments ..... 132
4-11 PPARγ Ligand Alignments ..... 132
4-12 40 Human Carbonic Anhydrase II Inhibitors ..... 134
4-13 Human Carbonic Anhydrase II Results ..... 138
4-14 Thrombin Ligand Alignments ..... 139

PAGE 9

4-15 Elastase Ligand Alignments ..... 140
4-16 Thermolysin Ligand Alignments ..... 142
5-1 Metal Ions in the Protein Data Bank ..... 146
5-2 Published Metalloprotein Force Fields Using the Bonded Plus Electrostatics Model ..... 148
5-3 Metal-Donor Bond Target Lengths ..... 153
5-4 Ideal Angles Used to Calculate Root Mean Square Deviations for Tetrahedral, Square Planar, Trigonal Bipyramidal, Square Pyramid and Octahedral Geometries ..... 155
5-5 Tetrahedral Zinc Primary Ligating Residues ..... 157
5-6 Zn-CCCC Cluster Bond Lengths and Force Constants ..... 159
5-7 Zn-CCCC Cluster Angles and Force Constants ..... 160
5-8 Zn-CCCH Cluster Bond Lengths and Force Constants ..... 160
5-9 Zn-CCCH Cluster Angles and Force Constants ..... 161
5-10 Zn-CCCH Cluster Angles and Force Constants ..... 161
5-11 Zn-CCHH Cluster Bond Lengths and Force Constants ..... 162
5-12 Zn-CCHH Cluster Angles and Force Constants ..... 163
5-13 Zn-CHHH Cluster Bond Lengths and Force Constants ..... 163
5-14 Zn-CHHH Cluster Angles and Force Constants ..... 164
5-15 Zn-HHHH Cluster Bond Lengths and Force Constants ..... 164
5-16 Zn-HHHH Cluster Angles and Force Constants ..... 165
5-17 Cysteine Charges using ChgModA for the Zn-CCCC, -CCCH, -CCHH, and -CHHH Clusters ..... 167
5-18 Cysteine Charges using ChgModB for the Zn-CCCC, -CCCH, -CCHH, and -CHHH Clusters ..... 167
5-19 Histidine Charges using ChgModA for the Zn-CCCC, -CCCH, -CCHH, and -CHHH Clusters ..... 168
5-20 Histidine Charges using ChgModB for the Zn-CCCC, -CCCH, -CCHH, and -CHHH Clusters ..... 170
5-21 Zn-HHHO Cluster Bond Lengths and Force Constants ..... 170

PAGE 10

5-22 Zn-HHHO Cluster Angles and Force Constants ..... 171
5-23 Zn-HHOO Cluster Bond Lengths and Force Constants ..... 172
5-24 Zn-HHOO Cluster Angles and Force Constants ..... 174
5-25 Zn-HOOO Cluster Bond Lengths and Force Constants ..... 182
5-26 Zn-HOOO Cluster Angles and Force Constants ..... 183
5-27 Histidine and Water's Partial Charges using ChgModB for the Zn-HHHO, -HHOO, and -HOOO Clusters ..... 184
5-28 Zn-HHHD and Zn-HHDD Cluster Bond Lengths and Force Constants ..... 185
5-29 Zn-HHHD Cluster Angles and Force Constants ..... 185
5-30 Zn-HHDD Cluster Angles and Force Constants ..... 187
5-31 Histidine and Aspartate Residue Charges using ChgModB for the Zn-HHHD and -HHDD Clusters ..... 188
C-1 Terminal Fragments ..... 203
C-2 Two Point Linker Fragments ..... 208
C-3 Three Point Linker Fragments ..... 212
C-4 Four Point Linker Fragments ..... 214
C-5 Five Point Linker Fragments ..... 216
C-6 Three Membered Ring Fragments ..... 217
C-7 Four Membered Ring Fragments ..... 218
C-8 Five Membered Ring Fragments ..... 219
C-9 Six Membered Ring Fragments ..... 224
C-10 Greater than Six Membered Ring Fragments ..... 229
C-11 Fused Ring Fragments ..... 230

PAGE 11

LIST OF FIGURES

Figure  page

2-1 Drug Development Process ..... 25
2-2 The Iterative Drug Design Process ..... 26
2-3 Thermodynamic Cycle of Receptor-Ligand Binding ..... 29
2-4 Computational Component of Drug Design ..... 31
2-5 Hierarchy of QM methods used in SBDD ..... 35
2-6 NMR QSAR ..... 38
2-7 The Classic "Pac-man" Representation of Receptor-Ligand Binding ..... 41
2-8 PWD Density Matrix Representation ..... 41
2-9 Schematic Diagram of the Human Carbonic Anhydrase II inhibitor Fragmentation ..... 46
2-10 SE-COMBINE Descriptor Table ..... 49
2-11 Schematic Diagram of a Trypsin Inhibitor Fragmentation ..... 50
2-12 SE-COMBINE Intermolecular Interaction Map (IMM) ..... 51
2-13 Graph Theory I ..... 52
2-14 Graph Theory II ..... 54
2-15 Principal Component Analysis (PCA) Schematic Diagram of the Matrices and Vectors Involved ..... 58
2-16 Partial Least Squares (PLS) Schematic Diagram of the Matrices and Vectors Involved ..... 60
2-17 Most Common Amino Acid Residues which Bond to Metal Ions ..... 61
2-18 Zinc Metalloproteins ..... 63
2-19 Copper Metalloproteins ..... 64
2-20 Homo-Nuclear Metalloproteins ..... 65
2-21 Hetero-Nuclear Metalloproteins ..... 66
3-1 Computational Drug Design ..... 67
3-2 Library Hierarchy as Implemented in MTK++ ..... 69

PAGE 12

3-3 Core Class hierarchy of the Molecule Library as implemented in MTK++ ..... 71
3-4 Class Hierarchy of the Parameters Component of the Molecule Class as Implemented in MTK++ ..... 72
3-5 Class Hierarchy of the Standard Library Component of the Molecule Class as Implemented in MTK++ ..... 72
3-6 Disulfide Bond in Proteins ..... 73
3-7 The Structural Types of the Histidine Residue ..... 74
3-8 Class Hierarchy of the Molecule Component of the Molecule Class as Implemented in MTK++ ..... 77
3-9 Class Hierarchy of the Graph Library as Implemented in MTK++ ..... 78
3-10 Class Hierarchy of the MM library as Implemented in MTK++ ..... 79
3-11 Class Hierarchy of the GA Library as Implemented in MTK++ ..... 81
3-12 Class Hierarchy of the Statistics Library as Implemented in MTK++ ..... 82
3-13 Class Hierarchy of the Parsers Library as Implemented in MTK++ ..... 83
3-14 Hybridization, Bond Order, and Formal Charge Perception Using the Labute Algorithm ..... 88
3-15 Ring Perception ..... 90
3-16 Ring Perception Contd. ..... 91
3-17 Aromatic, Non-aromatic, and Anti-aromatic Rings ..... 93
3-18 Hydrogen Bond ..... 94
3-19 Rotatable Bond Types ..... 96
3-20 Systematic Conformational Searching ..... 96
3-21 Conformer Generation ..... 97
3-22 Ullman Subgraph Isomorphism Illustration ..... 99
3-23 Clique Detection Illustration ..... 103
3-24 Molecular Superposition ..... 104
4-1 Carboxypeptidase A Ligands ..... 119
4-2 1CBX Conformer Analysis ..... 120
4-3 Carboxypeptidase A Alignment Results ..... 121

PAGE 13

4-4 Glycogen Phosphorylase Ligands ..... 122
4-5 Glycogen Phosphorylase Alignment Results ..... 123
4-6 Immunoglobin Ligands ..... 124
4-7 Immunoglobin Alignment Results ..... 125
4-8 Streptavidin Ligands ..... 126
4-9 Streptavidin Alignment Results ..... 127
4-10 Dihydrofolate Reductase Ligands ..... 128
4-11 Trypsin Inhibitors ..... 129
4-12 Trypsin Alignment Results ..... 130
4-13 Estrogen Receptor Ligands ..... 131
4-14 Peroxisome Proliferator-Activated Receptor γ Agonists ..... 133
4-15 HCAII Ligands ..... 137
4-16 Thrombin Inhibitors ..... 139
4-17 Elastase Ligands ..... 141
4-18 Elastase Alignment Results ..... 142
4-19 Thermolysin Inhibitors ..... 143
4-20 Thermolysin Alignment Results ..... 144
5-1 Approaches to Incorporate Metal Atoms into Molecular Mechanics Force Fields ..... 147
5-2 MCPB Flow Diagram ..... 150
5-3 Metal Ligand Geometries Perceived Using Harding's Rules ..... 154
5-4 Zinc Coordination Geometry Distribution from the PDB ..... 156
5-5 The Most Common Tetrahedral Zinc Coordinating ligands Combination Distribution ..... 158
5-6 Zn-S Bond Length Distributions in CCCC, CCCH, CCHH, and CHHH Tetrahedral Environments ..... 172
5-7 Box Plots of Zn-S/N Bond Lengths in CCCC, CCCH, CCHH, CHHH, and HHHH environments ..... 173
5-8 Tetrahedral Zn-O (Asp/Glu) and Zn-N (His) Bond Length Distributions ..... 175

PAGE 14

5-9 ZAFF Flow Diagram ..... 176
5-10 Zn-CCCC Cluster Models (PDB ID: 1A5T) ..... 176
5-11 Zn-CCCH Cluster Models (PDB ID: 1A73 and 2GIV) ..... 177
5-12 Zn-CCHH Cluster Models (PDB ID: 1A1F) ..... 178
5-13 Zn-CHHH Cluster Models (PDB ID: 1CK7) ..... 178
5-14 Zn-HHHH Cluster Models (PDB ID: 1PB0) ..... 179
5-15 Correlation between Zn-S and Zn-N Bond Lengths and Calculated Force Constants through the Series CCCC, CCCH, CCHH, CHHH, and HHHH ..... 180
5-16 Zn-HHHO Cluster Models (PDB ID: 1CA2) ..... 181
5-17 Zn-HHOO Cluster Models (PDB ID: 1VLI) ..... 181
5-18 Zn-HOOO Cluster Models (PDB ID: 1L3F) ..... 182
5-19 Zn-HHHD and Zn-HHDD Cluster Models (PDB ID: 2USN and 1U0A) ..... 186

PAGE 15

LIST OF ABBREVIATIONS

Abbreviation  page

PDB  Protein Data Bank ..... 21
DD  Drug Design ..... 25
NDA  New Drug Application ..... 25
IND  Investigational New Drug ..... 25
FDA  Food and Drug Administration ..... 25
ADME  Absorption, Distribution, Metabolism, and Excretion ..... 25
SBDD  Structure-Based Drug Design ..... 30
LBDD  Ligand-Based Drug Design ..... 30
MM  Molecular Mechanics ..... 32
QM  Quantum Mechanics ..... 33
HF  Hartree Fock ..... 33
DFT  Density Functional Theory ..... 33
SE  SemiEmpirical ..... 33
MNDO  Modified Neglect of Differential Overlap ..... 33
AM1  Austin Model 1 ..... 33
PM3  Parametric Model 3 ..... 33
PDDG/PM3  Pairwise Distance Directed Gaussian modification of PM3 ..... 33
SCC-DFTB  Self-Consistent-Charge Density-Functional Tight-Binding ..... 33
RBDD  Receptor-Based Drug Design ..... 34
QSAR  Quantitative Structure Activity Relationship ..... 34
MLR  Multiple Linear Regression ..... 34
PCR  Principal Component Regression ..... 34
PLSR  Partial Least Squares Regression ..... 34
CNNs  Computer Neural Networks ..... 34
HOMO  Highest Occupied Molecular Orbital ..... 35

PAGE 16

LUMO  Lowest Unoccupied Molecular Orbital ..... 35
CODESSA  COmprehensive DEscriptors for Structural and Statistical Analysis ..... 35
CoMFA  Comparative Molecular Field Analysis ..... 36
CoMSIA  Comparative Molecular Similarity Indices Analysis ..... 36
PLS  Partial Least Squares ..... 36
PIE  Probe Interaction Energy ..... 36
QSM  Quantum Similarity Measure ..... 39
CSI  Carbo Similarity Index ..... 39
QQSAR  Quantum QSAR ..... 39
QSSA  Quantum Similarity Superposition Algorithm ..... 39
QTMS  Quantum Topological Molecular Similarity ..... 39
BCPs  Bond Critical Points ..... 39
AIM  Atoms-In-Molecules ..... 39
DnC  Divide-and-Conquer ..... 42
NDDO  Neglect of Differential Diatomic Overlap ..... 43
SASA  Solvent Accessible Surface Area ..... 43
PWD  Pairwise Energy Decomposition ..... 44
CNDO  Complete Neglect of Differential Overlap ..... 44
CM1  Charge Model 1 ..... 46
CM2  Charge Model 2 ..... 46
RESP  Restrained ElectroStatic Potential ..... 46
MK  Merz-Singh-Kollman ..... 46
COMBINE  Comparative Binding Energy Analysis ..... 47
SE-COMBINE  SemiEmpirical-Comparative Binding Energy Analysis ..... 48
IMM  InterMolecular interaction Map ..... 48
LOO  Leave-One-Out ..... 55
PRESS  Predicted Residual Sum of Squares ..... 55

PAGE 17

SDEC  Standard Deviation of Error of Calculations ..... 55
SDEP  Standard Deviation of Error Prediction ..... 55
RMSD  Root Mean Squared Deviation ..... 56
PCA  Principal Component Analysis ..... 56
PC  Principal Component ..... 56
CYS  Cysteine ..... 59
MET  Methionine ..... 59
ASP  Aspartic Acid ..... 59
GLU  Glutamic Acid ..... 59
HIS  Histidine ..... 59
HCAII  Human Carbonic Anhydrase II ..... 60
MTK++  Modeling ToolKit++ ..... 67
API  Application Programming Interface ..... 67
GA  Genetic Algorithm ..... 68
BLAS  Basic Linear Algebra Subprograms ..... 68
LAPACK  Linear Algebra PACKage ..... 68
GAFF  Generalized AMBER Force Field ..... 80
MEP  Molecular Electrostatic Potential ..... 106
vdW  van der Waals ..... 106
GFs  Gaussian Functions ..... 106
GA  Genetic Algorithm ..... 106
RFO  Rational Function Optimization ..... 106
RIPS  Random Incremental Pulse Search ..... 106
BFGS  Broyden-Fletcher-Goldfarb-Shanno ..... 106
SD  Steepest Descent ..... 106
NR  Newton-Raphson ..... 106
MCP  Maximum Common Pharmacophore ..... 106

PAGE 18

ASA  Atomic Shell Approximation ..... 106
SA  Surface Area ..... 106
MO  Molecular Orbital ..... 106
DHFR  Dihydrofolate Reductase ..... 113
PPARγ  Peroxisome Proliferator-Activated Receptor γ ..... 113
ER  Estrogen Receptor ..... 113
ESP  ElectroStatic Potential ..... 146
UFF  Universal Force Field ..... 146
CCSD  Crystallographic Structural Database ..... 146
MCPB  Metal Center Parameter Builder ..... 148

PAGE 19

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

THE APPLICATION OF SEMIEMPIRICAL METHODS IN DRUG DESIGN

By

Martin B. Peters

August 2007

Chair: Kenneth M. Merz Jr.
Major: Chemistry

The application of quantum mechanical methods in de novo drug design is currently quite limited in both scope and utility. This thesis outlines where these methods are placed in this process and where they can be improved on.

Chapters one and two of this dissertation describe the drug development process and current methods used to calculate the free energy of receptor-ligand binding. Some of the computational tools used in drug design are discussed, such as scoring functions, molecular mechanics, quantum mechanics, semiempirical pair-wise energy decomposition, comparative binding energy analysis, the SE-COMBINE approach and popular 3D-QSAR approaches.

The remaining chapters of this work describe the development and application of a package of computational chemistry C++ libraries called the Modeling ToolKit++ (MTK++). This toolkit was used to develop a new technique to superimpose drug-like molecules onto one another using a quantum mechanical score function. Obtaining the correct alignment of two molecules to reproduce the pose within a protein active site is a challenging problem. This new method was validated on almost 90 protein-ligand complexes for which X-ray crystallographic data were available.

MTK++ was also used to develop a generalized tetrahedral Zinc force field for metalloprotein molecular dynamics simulations. It is desirable to model metalloprotein systems using MM models because one can carry out simulations to address important

PAGE 20

structure/function and dynamics questions that are not currently attainable using QM and QM/MM based methods. Until now, force fields for metalloproteins were built by hand through a convoluted process. The creation of a computer program to do this removes the human error factor. This program was used to build force fields for 10 Zinc tetrahedral active sites. This required the parameterization of bond and angle force constants and the calculation of partial charges. MTK++ was designed to automatically perceive metal centers and assign parameters necessary to carry out MM or MD calculations.

PAGE 21

CHAPTER 1
INTRODUCTION

Drug discovery has evolved from being serendipitous to a rather rational process of design. High-throughput screening, combinatorial chemistry, the human genome project, and computational methods have been developed to this end. Nonetheless, the cost of creating a drug has increased exponentially over the last 50 years [1] without the number of new drugs getting to the market increasing accordingly. The most plausible reason is a lack of fundamental understanding of molecular recognition, binding and ultimately drug delivery processes.

Computational medicinal chemistry spans a broad spectrum of disciplines including theoretical, computational, and structural chemistries. Theoretical chemistry involves the development of new and improved theories whereas computational chemistry entails the application of established theoretical tools to chemical problems. Structural chemistry techniques such as X-ray crystallography and NMR spectroscopy have played a significant role in facilitating our understanding of molecular recognition and interaction. Although computational medicinal chemistry cannot design new drugs on its own, it has been shown that it can play a role in predicting binding free energies and geometries of receptor-ligand complexes. Examples include the development of the HIV protease inhibitor, saquinavir, by "in silico" design as a transition-state analogue [2] and the rational design of an Angiotensin-Converting Enzyme (ACE) inhibitor called captopril [3].

The computational techniques employed to aid the drug design process include virtual screening, docking, and scoring with the results or "hits" utilized by medicinal chemists [4]. Computational methods vary in cost; screening can be carried out on large databases of compounds, while scoring and docking are generally carried out on a smaller number of structures. Screening attempts to predict physicochemical properties of molecules such as aqueous solubility and by doing so reduces the number of molecules with poor drug-like properties being synthesized. Docking is a technique of placing a drug candidate

PAGE 22

into the active site of a receptor. The docked pose of a ligand in the active site of a receptor can be scored using knowledge-, empirical-, or physics-based methods, with the latter being more expensive [5]. Computational methods also lend themselves to virtual combinatorial chemistry, which can be used to optimize the complementarities between a receptor and a ligand. Although this can also be done experimentally, the main reason for the computational approach is the reduction of cost and time.

The computational prediction of binding free energies is still not an exact science. However, utilizing current computer hardware and theoretical technologies, problems that were hitherto reputedly unfeasible are now tractable. There are two areas where increased computational power can be used for the accurate prediction of binding free energies: increased sampling of conformational space, and interaction energy calculations using complete Hamiltonians [6]. The use of both will be investigated in this thesis.

This dissertation describes the application of quantum mechanical methods in bio- and medicinal chemistry. The following chapters describe the development of computational chemistry modeling software, the flexible alignment of drug-like molecules, and the generation of a Zinc force field for metalloprotein simulations and drug design applications.

In Chapter 2 the industrial drug design process is outlined as an overview of why computational tools are used in drug design. The thermodynamic basis and current methods used to calculate the free energy of receptor-ligand binding are described. The equations of binding are derived in order to reflect the current understanding of binding, both experimentally and computationally. Some of the computational tools used in Structure Based Drug Design (SBDD) are discussed, including scoring functions [4], Molecular Mechanics (MM), Quantum Mechanics (QM), SemiEmpirical (SE) Pair-Wise energy Decomposition (PWD) [7], the Comparative Binding Energy Analysis (COMBINE) [8], and the SE-COMBINE [9] approaches. Popular 3D-QSAR approaches are also outlined including CoMFA (Comparative Molecular Field Analysis) [10] and CoMSIA

PAGE 23

(Comparative Molecular Similarity Indices Analysis) [11, 12] and two multivariate statistical tools: Principal Component Regression (PCR) [13] and Partial Least Squares (PLS) [14].

Chapter 3 outlines the design and development of the Modeling ToolKit++ (MTK++) package of C++ libraries for the use of QM methods in drug design. The algorithms such as atomic hybridization and formal charge determination, bond order and ring perception, substructure searching and clique detection are described in detail with numerous illustrations. The impetus for this work was to create a computational chemistry platform where QM methods could be conveniently incorporated in drug discovery applications. This work was fundamental to this thesis and all modeling in later chapters used this package.

The fourth Chapter describes a method to flexibly align druglike molecules onto one another using a semiempirical scoring function. The alignment of two bodies is a mathematical problem; however, the challenge is to reproduce the pose seen in X-ray crystallographic studies. Traditionally, molecular superposition has been carried out using empirical scoring functions due to their speed. The goal of this research was to investigate the applicability of semiempirical methods in molecular alignment and its ability to do so was validated against over 80 protein-ligand complexes from the Protein Data Bank (PDB) [15].

The fifth Chapter outlines the development of a molecular mechanics force field (FF) for tetrahedral Zinc metalloproteins suitable for the AMBER suite of programs [16]. Several issues regarding the modeling of metalloproteins were addressed. The first goal was to develop software to conveniently handle metalloprotein structures. The program MCPB (Metal Center Parameter Builder) was created to build and validate metalloprotein FFs for use in molecular simulations to study structure, function, and dynamics. Secondly, the automated perception of metal centers in proteins was undertaken and gave rise to the program called pdbSearcher. This software was used to survey the PDB for all Zinc

PAGE 24

containing metalloproteins. The most abundant primary shell combinations bound to Zn atoms were extracted and the FFs generated with the resulting parameters analyzed in detail.

Finally, Chapter 6 provides a brief summary of the work presented in this dissertation.

I hope this dissertation demonstrates the utility of current quantum mechanical approaches in the areas of drug design and metalloprotein modeling. The use of quantum mechanical methods in drug design can be viewed as the final frontier due to the fact that these methods describe molecular interactions from first principles [6]. Nevertheless, the use of quantum mechanics over classical approaches brings extra expense and so it is necessary to show that these methods can be superior to simpler models.

PAGE 25

CHAPTER 2
THEORY AND METHODS

Designing a drug (a molecule which affects biological processes without causing injury) requires numerous steps from its inception to its introduction into the market. This process takes approximately 10-15 years as shown in Fig. 2-1 and can cost in the order of a half billion dollars. This is due to both the vastness of chemical space [17-21] and the cost of research and testing [1].

[Figure 2-1 diagram, stages in order: Basic Research (compound discovery; ongoing; millions of compounds); Preclinical Development (safety testing; 3-4 years); IND preparation and submission (1 year; 1000 compounds); Clinical Development (formulation research, process development, Phases I, II, III, pharmacokinetics, toxicology; 6-8 years; 10 compounds); NDA preparation, submission and FDA review (2-3 years; 1 compound).]

Figure 2-1. Drug Development Process. Adapted from http://www.netsci.org/. An IND (Investigational New Drug) is prepared and submitted to the FDA (Food and Drug Administration) at the end of the preclinical phase of drug development. With good results from the clinical phase an NDA (New Drug Application) is submitted to the FDA for approval to release the drug to the general public.

The pre-clinical phase of Drug Design (DD) is carried out using an iterative process (first three columns of Fig. 2-1). It starts with some knowledge of a target, i.e. known natural substrate or a crystal/NMR structure of the target or receptor. The target is chosen based on some known chemical feature of a biological disease. The design cycle takes many steps such as computational design, ligand design, synthesis, biochemical

PAGE 26

evaluation, and crystallography converging to a drug candidate or lead as shown in Fig. 2-2 [22]. During each cycle of this process different computational tools are used with varying costs and accuracies, which will be discussed in more detail later in this chapter. At the end of the pre-clinical phase an IND (Investigational New Drug) is prepared to allow a company to test the drug in humans. After the IND is approved by the FDA (Food and Drug Administration) the clinical stage (4th column of Fig. 2-1) begins. Phase I of the clinical trials tests the toxicity, pharmacokinetics or ADME (Absorption, Distribution, Metabolism, and Excretion) properties, and dosage on approximately 50 healthy volunteers. Phase II evaluates the drug's effectiveness and side effects on volunteer patients (approximately 500) and it is at this stage where most adverse effects of the drug's use are observed. The final phase, Phase III, of clinical trials determines the effects of long term use on a large pool of volunteer patients. After phase III a company prepares an NDA (New Drug Application) and submits it to the FDA for approval to release the drug to the general public. The NDA contains results of all clinical studies and once approved by the FDA the drug can be marketed. After release the company carries out post-marketing surveillance of the drug's effectiveness in a so-called phase IV.

[Figure 2-2 diagram, cycle labels: Target Information; Crystallographic Analysis; Computational Ligand Design; Synthesis; Biochemical Testing; Drug Lead.]

Figure 2-2. The Iterative Drug Design Process. Adapted from Babine and Bender [22].

Drug targets or receptors include enzymes, ion channels, nuclear hormone receptors, and DNA, which interact with endogenous physiological substances such as hormones and neurotransmitters. There are currently over 1200 drugs approved by the FDA for therapeutic use in the United States, 25% of which target enzymes [23]. The majority of enzyme-targeted drugs are enzyme-substrate based and most act via non-covalent interactions. Drugs that mimic the effects of endogenous regulatory compounds are called agonists, while compounds that do not have 100% activity are termed partial agonists [24]. Drugs that bind to receptors but have no activity and prevent endogenous compounds from binding are termed antagonists or inhibitors. There are two main types of enzymatic inhibition, reversible and irreversible. Reversible inhibition occurs through competitive, noncompetitive and uncompetitive mechanisms. Diuretics used to control blood pressure and many anti-depressive agents, for example antagonists of dopamine receptors, are reversible competitive inhibitors. These drugs compete for the same binding site as the natural substrate, but the enzyme cannot process the inhibitor, thus preventing catalytic activity. Non-competitive or allosteric inhibitors bind to different regions of the enzyme and do not compete for the binding site. However, the process of binding the inhibitor can change the shape of the active site, thus preventing catalytic activity. Uncompetitive inhibition takes place when the inhibitor only binds the enzyme-substrate complex, consequently preventing catalysis. Irreversible inhibition occurs when the inhibitor covalently attaches to the enzyme active site, such as inhibitors of Carbonic Anhydrase [25]. Structural determination of receptors or complexes is often carried out by x-ray crystallography. It should be noted that the atomic positions from crystallography have an associated error, which is generally on the order of 1/6 of the resolution (0.4 Å uncertainty from a 2.4 Å resolution structure) [22].

A fundamental understanding of the interactions between receptors and ligands is necessary to the design of new drugs. These forces include ionic or electrostatic effects, ion-dipole and dipole-dipole interactions, charge transfer, van der Waals, and hydrophobic interactions. Molecules with high biological activity usually possess a shape that is complementary (hydrophobic, electrostatic, and polar contacts are paired upon binding) to that of the receptor's active site, as first proposed by Fischer ("lock-and-key" hypothesis).

2.1 Receptor-Ligand Binding Free Energy

In the simplest case, receptor-ligand binding corresponds to a single ligand molecule forming a 1:1 complex with a receptor that contains only a single binding site, as shown in Eq. 2-1. $R$ represents the receptor, $L$ the ligand and $RL$ the complex, where $k_1$ and $k_{-1}$ are the association and dissociation rate constants, respectively.

\[ R + L \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} RL \tag{2-1} \]

At equilibrium, association of a receptor and ligand occurs at the same rate as dissociation and the equilibrium constants, $K_a$ and $K_d$, can be defined as:

\[ K_a = \frac{[RL]}{[R][L]} = \frac{1}{K_d} \tag{2-2} \]

It is common practice to use $K_d$ because it has units of concentration. $K_d$ is the concentration of free ligand at which half of the receptor binding sites are occupied at equilibrium. Small values of $K_d$ correspond to a high affinity between the receptor and ligand.

To gain a fundamental understanding of receptor-ligand binding one must begin with a thermodynamic description. The Gibbs free energy is most often used in biochemistry as binding experiments are carried out under conditions of constant temperature, pressure, and number of particles. $\Delta G$, Eq. 2-3, is the free energy change for the reaction, $\Delta H$ and $\Delta S$ are the enthalpy and entropy changes respectively, and $T$ is the temperature.

\[ \Delta G = \Delta H - T \Delta S \tag{2-3} \]

The change in free energy can be expressed in terms of the equilibrium constant $K_d$ as follows:

\[ \Delta G = \Delta G^{\circ} - RT \ln K_d \tag{2-4} \]

where $\Delta G^{\circ}$ is the standard state (1 M, 1 bar) free energy change and $R$ is the gas constant. When complex association and dissociation reach equilibrium, $\Delta G = 0$, the expression takes the form:

\[ \Delta G^{\circ} = RT \ln K_d \tag{2-5} \]

Since free energy is a state function it can be calculated and compared with experimental values. The free energy of binding, $\Delta G^{\circ}_{bind}$, is calculated by determining the free energy of reactants, $(G^{\circ}_{R} + G^{\circ}_{L})$, and products, $G^{\circ}_{RL}$, separately. The superscript "$\circ$" is dropped from the remaining equations for simplicity; however, it is implied.

\[ \Delta G_{bind} = G_{RL} - (G_{R} + G_{L}) \tag{2-6} \]

\[
\begin{array}{ccccc}
R & + & L & \xrightarrow{\;\Delta G^{gas}_{bind}\;} & RL \\
\big\downarrow \Delta G^{R}_{solv} & & \big\downarrow \Delta G^{L}_{solv} & & \big\downarrow \Delta G^{RL}_{solv} \\
R & + & L & \xrightarrow{\;\Delta G^{solv}_{bind}\;} & RL
\end{array}
\]

Figure 2-3. Thermodynamic Cycle of Receptor-Ligand Binding

Using the thermodynamic cycle in Fig. 2-3 and Eq. 2-6, the free energy of binding in solution, $\Delta G^{solv}_{bind}$, can be fully decomposed as in Eq. 2-7 [26]. $\Delta G^{gas}_{bind}$ is the free energy of complexation in the gas phase. This term is dominated by the enthalpic contributions from steric and electrostatic interactions. $\Delta\Delta G_{solv}$ is the solvation free energy of complexation, which incorporates the desolvation of the ligand, $\Delta G^{L}_{solv}$, receptor, $\Delta G^{R}_{solv}$, and complex, $\Delta G^{RL}_{solv}$.

\[ \Delta G^{solv}_{bind} = \Delta G^{gas}_{bind} + \Delta\Delta G_{solv} \tag{2-7} \]

where $\Delta G^{gas}_{bind}$ and $\Delta\Delta G_{solv}$ are defined by equations 2-8 and 2-9.

\[ \Delta G^{gas}_{bind} = \Delta H^{gas}_{bind} - T \Delta S^{gas}_{bind} \tag{2-8} \]

\[ \Delta\Delta G_{solv} = \Delta G^{RL}_{solv} - \Delta G^{R}_{solv} - \Delta G^{L}_{solv} \tag{2-9} \]
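As a minimal numerical sketch of Eqs. 2-5, 2-7, and 2-9 (it is not taken from the software described in this work), the following Python functions convert a dissociation constant into a standard binding free energy and close the thermodynamic cycle of Fig. 2-3. The function names, the use of kcal/mol, and the default temperature of 298.15 K are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def dg_from_kd(kd_molar, temperature=298.15):
    """Standard binding free energy in kcal/mol from K_d in mol/L (Eq. 2-5)."""
    return R_KCAL * temperature * math.log(kd_molar)


def kd_from_dg(dg_kcal, temperature=298.15):
    """Inverse of dg_from_kd: K_d in mol/L from a binding free energy."""
    return math.exp(dg_kcal / (R_KCAL * temperature))


def dg_bind_solution(dg_gas_bind, dg_solv_rl, dg_solv_r, dg_solv_l):
    """Binding free energy in solution from the thermodynamic cycle."""
    ddg_solv = dg_solv_rl - dg_solv_r - dg_solv_l  # Eq. 2-9
    return dg_gas_bind + ddg_solv                  # Eq. 2-7


if __name__ == "__main__":
    for kd in (1e-6, 1e-9):  # micromolar and nanomolar binders
        print(f"K_d = {kd:.0e} M -> dG = {dg_from_kd(kd):.2f} kcal/mol")
```

With these assumptions a nanomolar ligand comes out near -12.3 kcal/mol, and each tenfold change in $K_d$ shifts the result by $RT \ln 10$, about 1.36 kcal/mol at 298 K.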

For tight binding ligands the interactions in the complex are significantly stronger than those of the receptor and ligand alone in solution. In addition, the favorable enthalpic interactions must compensate for the entropic loss of conformational degrees of freedom for both the receptor and ligand, plus the three rotational and three translational degrees of freedom. It should be noted that small variations in a complex's stability ($\Delta G$, in kcal/mol) correspond to large differences in affinity ($K_d$). For example, a difference of 5 kcal/mol corresponds to more than three orders of magnitude in observed affinity, because each factor of ten in $K_d$ is worth $RT \ln 10 \approx 1.4$ kcal/mol at 298 K.

2.2 Computational Drug Design

The computational components of the drug design process take place during the initial stages of each iterative cycle, as shown in Fig. 2-2, and the main reasons for their use are to reduce costs and to provide atomic level insight into receptor-ligand interaction. This area can be broken down into two parts: Structure-Based Drug Design and Ligand-Based Drug Design. The former requires structural knowledge of the receptor while the latter does not, and both will be discussed in detail below. The early iterations of the drug design process involve the searching or screening of databases of molecules such as ZINC [27, 28] and other combinatorial libraries [29] for compounds which may be active against the target [30-32], thus separating drugs from non-drugs [33-35]. Screening can involve similarity/dissimilarity searching [36] against a known active/inactive molecule. Compounds can be compared to each other in 1D [37], 2D [38] or 3D [39, 40], with the latter technique being the most expensive. Simple counting techniques [41] such as Lipinski's "rule-of-five" [42] are also used to filter out non-drug molecules. Screens are used to predict ADME properties [43] such as aqueous solubility [44], hepatotoxicity [45], P450 inhibition [46], and absorption [47]. Screens are also carried out to predict the synthetic accessibility of compounds, thus allowing for later functional group optimization [48]. Subsequent "hits" from a screen serve as lead compounds for medicinal chemists. De novo drug design [49-51] is another tool used to identify novel lead compounds. This technique "grows" molecules in the active site of a receptor, or of a pseudoreceptor built from an alignment of known active molecules [52].

[Figure 2-4: funnel diagram. Stage labels: Target, Database Screening, Docking, Scoring, Lead Optimization, Drug Candidate. Compound counts shown: millions, 1000s, 100s. Timings shown: milliseconds, seconds/hours, seconds/hours.]

Figure 2-4. Computational Component of Drug Design. Timings are per compound.

Lead, or "drug-like", compounds are expected to have good pharmacokinetics and be accessible to synthetic modification. The transition from a lead compound to a drug candidate involves optimizing structural and chemical complementarities with the receptor. Docking and scoring are tools to measure the complementarity between lead and receptor [4]. Docking is the process by which a ligand structure is placed in the active site of the receptor, while scoring predicts the binding free energy of complex formation. Lead optimization is often used to optimize the pharmacokinetics through functional group substitution. A schematic of the computational aspect of drug design is shown in Fig. 2-4. This is drawn as a funnel to highlight that the number of compounds decreases from top to bottom; however, most often the expense of the computational tools used increases.
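The simple counting filters mentioned above, such as Lipinski's rule-of-five, sit at the wide top of the funnel because they cost almost nothing per compound. The sketch below is one hedged way to express such a filter in Python. It assumes the four descriptors have already been computed elsewhere; the `Descriptors` class, the `max_violations` tolerance, and the two example entries are hypothetical and are not taken from this work.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties; how they are calculated is outside this sketch."""
    mol_weight: float      # g/mol
    log_p: float           # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d):
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = (
        d.mol_weight > 500.0,
        d.log_p > 5.0,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    )
    return sum(checks)


def passes_filter(d, max_violations=1):
    """Keep a compound if it breaks at most max_violations limits (assumed tolerance)."""
    return rule_of_five_violations(d) <= max_violations


if __name__ == "__main__":
    # Hypothetical library entries, for illustration only.
    library = {
        "small_example": Descriptors(180.2, 1.2, 1, 4),
        "large_example": Descriptors(720.0, 6.3, 6, 12),
    }
    hits = [name for name, d in library.items() if passes_filter(d)]
    print(hits)
```

Keeping the violation count separate from the pass/fail decision makes it easy to tighten or relax the screen without touching the individual limits.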

Various approaches have emerged to calculate or predict the binding free energy. These have met with varying degrees of success. They include physics-, empirical-, and knowledge-based scoring functions [5, 53-57], and various QSAR approaches [10, 11]. The results of empirical and knowledge-based scoring functions are highly dependent on parameterization, and the calculation of binding free energies of compounds unlike those in the training set can yield spurious results. Physics-based scoring functions try to model each component of Eq. 2-7 from first principles. Physics-based techniques and QSAR approaches are introduced in the following sections and their advantages and disadvantages in determining the free energy of binding are discussed.

2.3 Molecular Mechanics

Molecular Mechanics (MM) force fields such as AMBER [16, 58-60], CHARMM [61], MMFF [62-69], OPLS [70], and MM3 [71] can be used to calculate the enthalpic component of the binding free energy between the receptor and ligand.

The AMBER energy function, Eq. 2-10, contains bond, angle, dihedral, and non-bonded terms. The bond and angle terms are represented by harmonic expressions. The van der Waals term is a 6-12 potential, and the electrostatic term is expressed as a Coulombic interaction with atom-centered point charges.

\[ E_{total} = \sum_{bonds} K_r (r - r_{eq})^2 + \sum_{angles} K_\theta (\theta - \theta_{eq})^2 + \sum_{dihedrals} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right] + \sum_{i < j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\epsilon\, r_{ij}} \right] \tag{2-10} \]

The fourth term describes the steric interaction as a Lennard-Jones potential, where $r_{ij}$ is the distance between atoms $i$ and $j$. $A_{ij} = \epsilon_{ij} (r^{*}_{ij})^{12}$ and $B_{ij} = 2 \epsilon_{ij} (r^{*}_{ij})^{6}$ are parameters that define the shape of the potential, where $r^{*}_{ij} = r^{*}_{i} + r^{*}_{j}$ in Å, $r^{*}_{i}$ is the van der Waals radius for atom $i$, $\epsilon_{ij} = \sqrt{\epsilon_i \epsilon_j}$, and $\epsilon_i$ is the van der Waals well depth in kcal/mol. The $q$ are the atom-centered point charges and $\epsilon$ in the Coulomb term is the dielectric constant. A rigorous derivation of the gradients of the AMBER function is given in Appendix B.

2.4 Quantum Mechanics

Higher order molecular interactions such as polarization and charge transfer are neglected in molecular mechanics force fields due to their point charge based approaches. Quantum mechanical techniques intrinsically include such interactions. The high computational cost of ab initio methods such as Hartree-Fock (HF) and Density Functional Theory (DFT) restricts their use to small systems such as organic molecules, protein active sites, and metal clusters.

Thanks to the work of Pople, Dewar and Stewart, amongst others, the Roothaan-Hall equations have been approximated and parameterized to give a series of so-called SemiEmpirical (SE) methods. The most commonly used SE methods are derived from the MNDO (Modified Neglect of Differential Overlap) [72] method, including AM1 (Austin Model 1) [73], PM3 (Parametric Model 3) [74, 75], MNDO/d (MNDO with d orbitals) [76] and PDDG/PM3 (Pairwise Distance Directed Gaussian modification of PM3) [77]. Recently, DFT methods have been approximated, creating the SCC-DFTB (Self-Consistent-Charge Density-Functional Tight-Binding) method [78, 79]. The SCC-DFTB approach has been compared to the traditional SE methods, AM1 and PM3, with comparable errors in predicting heats of formation for a set of 622 neutral molecules; however, errors were higher than those from the PDDG/PM3 method [80].

SE methods can be used to calculate the total electrostatic energy of a molecular system, which is the sum of the electronic energy, $E_{el}$, and core-core repulsion, $E_{core-core}$.

\[ E_{tot} = E_{el} + E_{core-core} \tag{2-11} \]

where $E_{el}$ and $E_{core-core}$ are described in equations 2-12 and 2-13. In these equations $H$ is the one-electron matrix, $F$ is the Fock matrix, and $P$ represents the density matrix. $Z$ is the nuclear charge on the atom, $R_{AB}$ is the atomic separation between $A$ and $B$, and $N$ is the total number of atoms.

\[ E_{el} = \frac{1}{2} \sum_{\mu} \sum_{\nu} (H_{\mu\nu} + F_{\mu\nu}) P_{\mu\nu} \tag{2-12} \]

\[ E_{core-core} = \sum_{A=1}^{N} \sum_{B>A}^{N} \frac{Z_A Z_B}{R_{AB}} \tag{2-13} \]

The use of QM in SBDD can be divided into two broad categories, receptor-based and ligand-based methods (Fig. 2-5). Receptor-Based Drug Design (RBDD) methods include scoring-, QM/MM- and comparative binding energy (COMBINE)-type methods. RBDD requires either an X-ray crystal or NMR structure of ligands in complex with the relevant receptor. Ligand-based drug design techniques include various Quantitative Structure-Activity Relationship (QSAR) methods, which rely only on knowledge of the ligand structure. In general, QSAR can be conducted using two-dimensional (2D) or three-dimensional (3D) structures; however, the user must utilize 3D structures when using QM because of the need to have an all-atom description of the nuclei and associated electrons [6].

[Figure 2-5: tree diagram. QM + SBDD branches into Ligand-Based (3D-QSAR, e.g. AMPAC+CODESSA; Field-based, e.g. QM-QSAR) and Receptor-Based (Scoring, e.g. QMScore; QM/MM, e.g. DivCon/AMBER; COMBINE, e.g. SE-COMBINE).]

Figure 2-5. Hierarchy of QM methods used in SBDD.

2.5 Ligand Based Drug Design

One of the oldest tools used in rational drug design is QSAR (Quantitative Structure Activity Relationship) [81]. QSAR models are derived for a set of compounds with dependent variables (activity values, e.g. $K_i$, $IC_{50}$), and a set of calculated molecular properties or independent variables called descriptors. Each compound in the dataset is assumed to be in its active conformation. Models are generated using statistical techniques such as Multiple Linear Regression (MLR), Principal Component Regression (PCR) [13], Partial Least Squares Regression (PLSR) [82], and Computer Neural Networks (CNNs) [83], to name a few. Ligand-based methods can be further divided into two categories, 3D-QSAR and field-based methods. Both will be touched on below.

2.5.1 3D-QSAR with QM descriptors

The descriptors used in 3D-QSAR are usually divided into three categories: 1) Electronic, such as HOMO and LUMO energies, 2) Topological, for example connectivity indices, and 3) Geometric, such as moment of inertia. The models are often created using multivariate statistical tools due to the large number and high degree of collinearity of descriptors. An excellent review by Karelson, Lobanov, and Katritzky provides details of QM based descriptors used in QSAR programs such as COmprehensive DEscriptors for Structural and Statistical Analysis (CODESSA) [84]. These include those that can be observed experimentally, such as dipole moments, and those that cannot, such as partial atomic charges. Clark and co-workers have recently used AM1-based descriptors to distinguish between drugs and non-drugs and to understand the relationship between descriptors and their physical properties [85].

Most descriptors are calculated at the semiempirical level of theory using programs such as AMPAC or MOPAC. However, with computer speed increasing steadily, the use of ab initio and DFT methods is becoming increasingly common. These methods allow the descriptors to be calculated from first principles. Yang and co-workers examined various DFT-based descriptors to generate models for a series of protoporphyrinogen oxidase inhibitors. It was shown that the DFT-based model outperformed the PM3 based model [86].

2.5.2 Field-based Methods

CoMFA (Comparative Molecular Field Analysis) [10] and CoMSIA (Comparative Molecular Similarity Indices Analysis) [11, 12] are field-based or grid-based methods where all the compounds in the dataset are aligned on top of one another and steric and electrostatic descriptors are calculated at each grid point using a probe atom. As a result there are many more descriptors than molecules, and therefore a Partial Least Squares (PLS) data analysis is used to generate linear equations. A study by Weaver and co-workers compares different field-based methods for QSAR, including CoMFA and CoMSIA, finding that field-based methods provide a robust tool to aid medicinal chemists [87]. Absent from the traditional MFA approaches are quantum mechanically derived descriptors of electronic structure. QMQSAR is a relatively new technique where semiempirical QM methods are used to develop quantum molecular field-based QSAR models [88]. Placing the aligned training set ligands into a finely spaced grid produces quantum molecular fields, where each ligand is characterized by a set of Probe Interaction Energy (PIE) values. A PIE is defined as the "electrostatic potential energy obtained by placing a positively charged carbon's 2s orbital at a given grid point $g_i$ and summing the attractive and repulsive potentials experienced by that electron as it interacts with the field of the ligand L":

\[
\begin{aligned}
PIE &= \langle s_i s_i \,|\, V(L) \rangle \\
&= \int s_i(r_1)\, s_i(r_1) \left( \sum_{\alpha=1}^{N_{atoms}} \left[ -\frac{z_\alpha}{|r_1 - r_\alpha|} + \sum_{\mu \in \alpha} \sum_{\mu' \in \alpha} P_{\mu\mu'} \int \frac{\mu(r_2)\, \mu'(r_2)}{|r_1 - r_2|}\, dr_2 \right] \right) dr_1
\end{aligned}
\tag{2-14}
\]

The nuclear charge $z_\alpha$ is simply the number of valence electrons on atom $\alpha$ and the notation $\mu \in \alpha$ indicates the set of valence atomic orbitals centered on atom $\alpha$. Density matrix elements $P_{\mu\mu'}$ are given by the following sum over the occupied MOs:

\[ P_{\mu\mu'} = 2 \sum_{k=1}^{N_{occ}} c_{\mu k}\, c_{\mu' k} \tag{2-15} \]

When applied to data sets containing corticosteroids, endothelin antagonists, and serotonin antagonists, linear regression models were produced with similar predictability compared to various CoMFA models.

2.5.3 Spectroscopic 3-D QSAR

The Spectroscopic QSAR methods [89, 90] include EVA (vibrational frequencies) [91-98], EEVA (MO energies) [99-102], and CoSA (NMR chemical shifts) [103]. It is a requirement of 3-D QSAR that all compounds being studied contain the same number of descriptors. However, none of the above techniques obeys this requirement. The number of vibrational frequencies and the number of MOs depend on the number of atoms, $N$, in a molecule ($3N - 6$, or $3N - 5$ if linear), while the number of NMR chemical shifts depends on the number of atomic isotopes with NMR active nuclei, $N$. A solution to this problem is to force the information onto a bound scale using a Gaussian smoothing technique, where the upper and lower limits of this scale are consistent for all compounds in the data set. A Gaussian kernel, $f(x)$, with a standard deviation of $\sigma$ is placed over each calculated point (EVA, EEVA or NMR chemical shift) as shown in Eq. 2-16. Summing the amplitudes of the overlaid Gaussian functions at intervals $x$ along the defined range results in the descriptors for each molecule, $\hat{f}(x)$, as shown in Eq. 2-17. This process is illustrated in Figures 2-6(a) through 2-6(c).

\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - X_A)^2}{2\sigma^2} \right) \tag{2-16} \]

\[ \hat{f}(x) = \sum_{i}^{shifts} \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - X_i)^2}{2\sigma^2} \right) \tag{2-17} \]
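The projection onto a bound scale can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the code used in the cited studies: the function name is hypothetical, the defaults (range 0 to 200 ppm, interval 1, sigma 10) are taken from the settings shown in Figure 2-6, and the example shift values are invented.

```python
import math


def bounded_descriptors(points, sigma=10.0, lower=0.0, upper=200.0, step=1.0):
    """Project a variable-length list of values onto a fixed grid.

    A normalized Gaussian kernel (Eq. 2-16) is centered on each point and the
    kernel amplitudes are summed at every grid position (Eq. 2-17).
    """
    n_grid = int(round((upper - lower) / step)) + 1
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    descriptors = []
    for k in range(n_grid):
        x = lower + k * step
        total = sum(norm * math.exp(-((x - p) ** 2) / (2.0 * sigma ** 2))
                    for p in points)
        descriptors.append(total)
    return descriptors


if __name__ == "__main__":
    # Hypothetical 13C shifts (ppm) for two molecules of different size.
    mol_a = [20.5, 35.0, 128.4]
    mol_b = [14.1, 22.7, 31.9, 70.2, 171.0]
    vec_a = bounded_descriptors(mol_a)
    vec_b = bounded_descriptors(mol_b)
    print(len(vec_a), len(vec_b))
```

Both molecules yield descriptor vectors of the same length even though they contribute different numbers of shifts, which is the property the 3-D QSAR regression requires.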

[Figure 2-6: three panels plotted against chemical shift (ppm, 0 to 200), with interval = 1 and sigma = 10. (a) Calculated NMR 13C chemical shifts. (b) NMR 13C chemical shifts with Gaussian kernels. (c) NMR 13C chemical shifts with Gaussian kernels plus the spectrum projected onto a bound scale (BNMRS).]

Figure 2-6. NMR QSAR. Calculated NMR spectra for a steroid molecule with Gaussian kernels placed at each shift, followed by a bound scale projected from the spectrum.

These descriptors contain a wealth of structural information when we consider the physical basis of the methods. IR spectroscopy provides information concerning the presence of molecular functional groups, and NMR chemical shifts are highly dependent on substituent effects in a congeneric series of compounds. MO energies give the electronic structure of the molecule, such as the HOMO-LUMO energies that play an important role in the binding process.

The choice of theory used to calculate these descriptors depends on the number of compounds in the dataset and the accuracy that is required; all can be calculated using SE or ab initio methods. The QSAR results also depend on the choice of $\sigma$ and $x$ in the above equations.

These methods have provided predictive models for a number of datasets and have an advantage over the field-based methods because they are alignment-free; in other words, there is no need to superimpose the structures in the dataset. Asikainen and co-workers provided a comparison of these methods in a recent paper where they studied estrogenic activity in a series of compounds [89].

2.5.4 Quantum QSAR and Molecular Quantum Similarity

The Carbó group has been involved in the development of the field of quantum QSAR and molecular quantum similarity since the 1980s [104]. The Quantum Similarity Measure (QSM) between any two molecules, $A$ and $B$, can be calculated using the following:

\[ Z_{AB} = \langle \rho_A | \Omega | \rho_B \rangle = \iint \rho_A(r_1)\, \Omega(r_1, r_2)\, \rho_B(r_2)\, dr_1\, dr_2 \tag{2-18} \]

where $\Omega$ is some positive definite operator (e.g. kinetic energy or Coulomb) and $\rho$ is the electron density. The QSMs can be transformed into indices ranging between 0 and 1 using:

\[ r_{AB} = \frac{Z_{AB}}{\sqrt{Z_{AA} Z_{BB}}} \tag{2-19} \]

yielding the so-called Carbó Similarity Index (CSI). Calculating an array of QSMs or CSIs between all molecular pairs in some dataset provides descriptors for Quantum QSAR (QQSAR) [105].

A drawback of the CoMFA-based methods is the need to superimpose the molecules in the training set. This is no easy task due to the many degrees of freedom (both rigid and internal motions). However, the alignment of the molecular structures in a common 3D framework provides a convenient method of determining which regions of the molecules impact activity and which regions can be developed to create new compounds with more favorable properties. Recently, QSMs have been used with a Lamarckian genetic algorithm called the Quantum Similarity Superposition Algorithm (QSSA) to superimpose the classic CoMFA dataset [106]. The QSSA is performed in such a way as to maximize the molecular similarity and does not rely on atom typing as other empirical based methods do.

Popelier and co-workers have coupled the Atoms-In-Molecules (AIM) theory of Bader with quantum molecular similarity to produce Quantum Topological Molecular Similarity (QTMS) [107]. It uses the so-called Bond Critical Points (BCPs) of predefined bonds in a series of molecules as descriptors, followed by multivariate statistical analysis. The series of compounds must have a common core for this method to remain computationally tractable. QTMS has been used to generate models to estimate the $pK_a$ values for a set of aliphatic carboxylic acids, anilines, and phenols [108].

2.6 Receptor Based Drug Design

The classic "Pac-Man" representation of Receptor-Ligand binding is shown in Fig. 2-7, where the receptor is depicted on the left subdivided into residues and on the right is a small molecule split into fragments. Proteins can be split using standard amino acid definitions, while ligand structures can be decomposed using functional group definitions. The binding free energies between receptors and ligands can be calculated using classical and quantum mechanical methods. In most cases when QM is used in SBDD a single snapshot of this complex is taken and the interaction energy is determined. Taking ensemble averages is expensive and time consuming.

Figure 2-7. The Classic "Pac-man" Representation of Receptor-Ligand Binding. The receptor is depicted on the left subdivided into residues and on the right is a small molecule split into fragments.

Figure 2-8. PWD Density Matrix Representation

The scheme in Fig. 2-8 is a matrix or graphical comparison between classical and quantum mechanical methods in SBDD. This scheme is divided into three parts. First, on the left is a large box which represents a receptor made up of smaller boxes or residues, I, such as amino acids or bases. The dark blue box represents how all the other residues in the receptor polarize that residue. Polarization is where the charges centered on each atom are allowed to relax in the field of all other charges. The lighter blue box symbolizes the charge transfer that can occur between residues in a receptor.

The smaller box in the middle of the figure symbolizes a ligand and the smaller boxes it contains are molecular fragments, J. The pink and yellow boxes can be described in a similar fashion to the boxes of the receptor.

The largest box on the right is the complex structure. Upon binding, both the residues, I, and the fragments, J, are allowed to relax in the presence of each other. The I residues are transformed from dark blue to mustard while the J fragments are changed from pink to brown. This is polarization; however, now it is caused by complex formation. Most classical potentials cannot model these effects; however, recently there have been some attempts to incorporate polarization into classical methods such as ff02 and AMOEBA [109]. Conversely, QM methods include these interactions implicitly. The I-K (blue to grey) and J-L (yellow to magenta) interactions originate from the effect binding has on the intramolecular interaction of the receptor or ligand. These interactions can include charge transfer and polarization. The I-J interactions (red box) are the most important. Both methods can calculate these, and they are only present in the complex structure. Classical potentials describe the Coulombic and van der Waals interactions, or electrostatic and dispersive effects, between the moieties. The QM methods go a step further to include the other higher order effects such as polarization and charge transfer. This is where the QM methods begin to describe the physics of the system more completely; however, this does not necessarily suggest that they are more predictive!

2.7 Semiempirical Divide-And-Conquer Approach

Very few full quantum mechanical studies of whole proteins have been published [7, 9, 110], but with the increasing speed of computers and linear scaling Divide-and-Conquer (D&C) techniques it is now possible to include the whole protein [111-113]. The D&C method takes advantage of the local character of chemical interactions, which causes the magnitude of density matrix elements to decrease exponentially with distance. Through the use of cutoffs for the Fock and density matrices and D&C techniques, the "nearsightedness" of chemical interactions can be exploited without loss of accuracy [114]. The D&C method divides the molecular system into overlapping subsystems where each localized Roothaan-Hall equation can be solved separately:

\[ F^{\alpha} C^{\alpha} = C^{\alpha} E^{\alpha} \tag{2-20} \]

where $F^{\alpha}$, $C^{\alpha}$, and $E^{\alpha}$ are the subsystem Fock, coefficient, and orbital energy matrices. The overlap matrix, $S$, in SE methods is set equal to the identity matrix due to the NDDO (Neglect of Diatomic Differential Overlap) approximation:

\[ (\mu_A \nu_B \,|\, \lambda_C \sigma_D) = (\mu_A \nu_A \,|\, \lambda_C \sigma_C)\, \delta_{AB}\, \delta_{CD} \tag{2-21} \]

where $\delta_{AB}$ is the Kronecker delta function:

\[ \delta_{AB} = \begin{cases} 1 & \text{if } A = B \\ 0 & \text{otherwise.} \end{cases} \tag{2-22} \]

The diagonalization of the global Fock matrix is the most expensive part of a standard SE calculation, compared to the two-center two-electron integral evaluation which is the bottleneck of ab initio methods. However, subdividing the global Fock matrix in the D&C method replaces global diagonalization with subsystem diagonalizations, whose cost scales linearly with the number of subsystems, as $n_{sub} (N^{\alpha})^3$. The subsystem density matrices are used to assemble the global density matrix and the total energy is calculated using Equation 2-11. The subsetting scheme in D&C methods is the key to its efficiency. Usually, each subsystem comprises a core region surrounded by one or more buffer regions. In protein systems, it has been shown that treating each amino acid as a core with a 4.5 Å / 2.0 Å buffering scheme is a good compromise between computational efficiency and accuracy. The D&C method is not, however, the only linear scaling SE method; other methods include density matrix minimization [115] and the localized molecular orbital method [116].

Recently, Raha and Merz reported an SE D&C based scoring function, QMSCORE [117, 118], which is capable of predicting the binding free energy of protein-ligand complexes. QMSCORE is derived using current technologies to best describe the master equation 2-7:

\[ \Delta G_{bind} = \Delta H^{gas}_{bind} + LJ_6 + \Delta S_{solv} + \Delta S_{conf} + \Delta\Delta G_{solv} \tag{2-23} \]

The enthalpic interactions in the gas phase, $\Delta H^{gas}_{bind}$, between the protein and ligand were determined using semiempirical Hamiltonians such as AM1 and PM3. The attractive part of the Lennard-Jones potential, $LJ_6$, was used to represent the dispersive interactions neglected by SE methods. The solvent entropy, $\Delta S_{solv}$, and conformational entropy, $\Delta S_{conf}$, were accounted for by the solvent accessible surface area (SASA) and the number of rotatable bonds. The solvation free energy due to complexation, $\Delta\Delta G_{solv}$, was calculated using a Poisson-Boltzmann continuum approach. QMSCORE was applied to 165 protein-ligand complexes including HIV protease, serine protease, FKBP, and DHFR. Although there was a substantial increase in computational cost, it showed better performance than other scoring functions such as Autodock, DrugScore and LigScore.

2.8 Pairwise Energy Decomposition (PWD)

QM methods are frequently used to determine the electronic energy of molecular systems. Electronic energies are quantities that characterize the whole system and do not provide any information regarding the key interactions taking place. Unlike an MM force field, QM does not easily lend itself to descriptions of energetics in a pairwise fashion. However, work first done by Fischer and Kollmar using a modified CNDO (Complete Neglect of Differential Overlap) method partitioned the energy into monocentric, $E_A$, and bicentric, $E_{AB}$, terms [119].

\[ E_{TOT} = \sum_{A=1}^{N} E_A + \sum_{A=1}^{N} \sum_{B>A}^{N} E_{AB} \tag{2-24} \]

\[ \phantom{E_{TOT}} = \sum_{A} E_A + \cdots \]

ligands [7]. Similar to the decomposition by Fischer and Kollmar, the total energy can be calculated by summing the mono and bicentric terms as shown in Equation 2-25. The bicentric term is comprised of a repulsive term, $E'_{AB}$, an exchange term, $E_{AB}$, and a core-core repulsion term, $E^{core}_{AB}$ (Eq. 2-26).

\[ E^{core}_{AB} = \sum_{A} \sum_{B} \cdots \]

residue I

[Figure 2-9: chemical structure diagram; atom labels shown: S, O, O, NH2, O, HN, F.]

Figure 2-9. Schematic Diagram of the human Carbonic Anhydrase II inhibitor Fragmentation. The structure in blue is the sulfonamide moiety. The amide group is colored green while the fluoro-substituted phenyl group is shown in red.

\[ E_{TOT} = \sum_{I} E^{res}_{I} + \sum_{J} \cdots \]

parameterized function and $B_{AC}$ is the bond order between atoms $A$ and $C$.

\[ q^{CM1}_{i} = q^{Mulliken}_{i} + \sum_{A \neq C} f^{CM1}(B_{AC}) \tag{2-33} \]

The atomic point charges used in the AMBER FF are derived using the Merz-Singh-Kollman (MK) [126] and Restrained ElectroStatic Potential (RESP) [127-129] schemes. The MK charges are generated from QM to reproduce the electrostatic potential at points around the molecule. The fitting procedure begins by giving each atom a van der Waals radius and then forming a grid encompassing the molecule. Charges are then generated using grid points not within the van der Waals volume. The MK procedure can often lead to asymmetrical or large charges being placed on atoms, which leads to problems simulating biomolecular interactions. The RESP scheme solves these problems.

2.10 Comparative Binding Energy Analysis (COMBINE)

COMBINE was first developed by Wade and Ortiz in 1995, where they modeled the binding free energies of a series of ligands to a receptor using an MM potential and PLS [8, 130]. The COMBINE method has been successfully applied to protein-ligand [131-134], RNA-ligand [135], protein-DNA [136], and protein-peptide [137] complexes where predictive QSAR models have been generated. COMBINE was also used in a flexible virtual screening application of Factor Xa inhibitors by Murcia [138].

The enthalpic interaction between the receptor and the ligand is approximated using an MM function, and the solvation interaction with Poisson-Boltzmann or Generalized-Born methods. The Hamiltonian, $\Delta U$, in MM methods for the binding of a ligand to a receptor can be described by Eq. 2-35, where $u^{VDW}_{ij}$ and $u^{ELE}_{ij}$ are the van der Waals and electrostatic interactions between atom $i$ of the ligand and atom $j$ of the receptor. The $\Delta u^{B,A,T}_{i}$ are the changes in bond lengths, angles and dihedrals and the $\Delta u^{NB}_{ii'}$ are the new intramolecular non-bonded interactions upon binding. The premise of the COMBINE approach is that $\Delta U$ can be approximated by a weighted linear combination of the most important energetic interactions, $u_i$, between the ligand and the receptor as shown in Eq. 2-35. It has been shown using experimental approaches such as thermodynamic double mutant cycles and pH titration that only a small number of sites of the receptor play a role in binding. On this basis, the approximation central to COMBINE can be considered reasonable.

\[
\begin{aligned}
\Delta U = {} & \sum_{i=1}^{n_l} \sum_{j=1}^{n_r} u^{VDW}_{ij} + \sum_{i=1}^{n_l} \sum_{j=1}^{n_r} u^{ELE}_{ij} \; + \\
& \sum_{i=1}^{n_l} \left( \Delta u^{B,L}_{i} + \Delta u^{A,L}_{i} + \Delta u^{T,L}_{i} \right) + \cdots
\end{aligned}
\tag{2-34}
\]

The SE-COMBINE method was used to elucidate the most important interactions between trypsin and a series of inhibitors. Protein-ligand interaction energies are decomposed to find the most or least stabilizing interactions, as well as to provide a means to identify regions of significant variation (thereby targeting areas that could benefit from more optimization). The multivariate statistical tools, PCA and PLS, were used to mine the interactions between the receptor residues and the ligand fragments to generate QSAR models. The fragmentation scheme used in this study is shown in Fig. 2-11, where each ligand contains the 3-amidino-phenylalanine moiety. The authors introduced so-called IMMs (InterMolecular interaction Maps), an example of which is given in Fig. 2-12, which enable the researcher to graphically view where a candidate drug could be modified or optimized.

\[
\begin{aligned}
E_{INT} = {} & \sum_{I} \sum_{J} \sum_{A \in I} \sum_{B \in J} \left( E_{AB} + E'_{AB} + E^{core}_{AB} \right) \\
& + \sum_{I} \sum_{K < I} \sum_{A \in I} \sum_{B \in K} \left( E_{AB} + E'_{AB} + E^{core}_{AB} \right) \\
& + \sum_{J} \sum_{L < J} \sum_{A \in J} \sum_{B \in L} \left( E_{AB} + E'_{AB} + E^{core}_{AB} \right) \\
& + \sum_{I} \sum_{A} E_A + \cdots
\end{aligned}
\]

H 2 N NH 2 HN S O N H O O R 1 R 2 Figure2-11.SchematicDiagramofaTrypsinInhibitorFragm entation.Thestructurein blueisthe3-amidino-phenylalaninemoiety(APM).TheTOSg roupis coloredgreenwhilethePIPgroupisshowninred. edges, E ,whereanedgeisanunorderedpairofvertices. V = f v 1 ;v 2 ;v 3 ;v 4 ;v 5 ;:::;v n g E = f e 12 ;e 23 ;e 34 ;e 37 ;:::;e m g ,and G = f V;E g .Theorderandsizeofagraphisthe numberofvertices, n ,andthenumberofedges, m ,respectively.Thedegreeofavertex v of G isthenumberofedgesincidentupon v .Connectedgraphscontainaroutefrom everyvertextoeveryother.Multigraphs(multiplebondcon tainingmolecules)aregraphs whichcontainrepeatededgesbetweenverticeswhileasimpl egraphdoesnotcontainany. Adirectedgraph,ordigraph,isagraphwithdirectionsassi gnedtoeachedge.Complete graphsaredenotedby K n andaregraphswhereanedgeconnectseverypairofvertices. A labeledgraphisonewheretheverticesand/oredgesaregive nlabels.Aweightedgraphis atypeoflabeledgraphwherethelabelsarerealnumbers. Awalkin G isasequenceofvertices w =[ v 1 ;v 2 ;:::v k ] ;k 1,suchthat[ v j ;v j +1 ] 2 E for j =1 ;:::;k 1.Thewalkisclosedif k> 1and v k = v 1 ,andopeniftheyaredierent. Awalkiscalledapathiftherearenorepeatedvertices.Aclo sedwalkwithnorepeated verticesotherthanitsrstandlastoneiscalledacycle.Th elength l ofawalkisthe numberofedgesitcontains(openwalks: l = s 1,closedwalks: l = s ,where s isthe numberofverticesvisited).Thetermspathandchaindescri beanopenwalkandawalk 50

PAGE 51

[Figure 2-12 image: x-axis, Descriptor (trypsin residue and ligand fragment pairs, PHE41-PIP3 through WAT235-APM2); y-axis, Compound; color legend from -1.0 to 0.0.]
Figure 2-12. Model Lig3C Intermolecular Interaction Map (IMM) of the important $E_{AB}$ descriptors. The key residues of trypsin that interact with the triple fragment ligand (APM, TOS, and PIP; see Fig. 2-11) label the x-axis. The compounds on the y-axis are ordered with respect to activity. The activity decreases from top to bottom. The legend indicates the magnitude of the unscaled descriptor in eV.

PAGE 52

[Figure 2-13 panels: (a) Sample Graph, with vertices v1 to v9 and edges such as e12 and e78; (b) Molecular Graph.]
Figure 2-13. Graphs to Link the Terminology used in Graph Theory and Chemistry.

in which all vertices (and edges) are distinct, respectively. Cycles and paths of size $n$ are denoted by $C_n$ and $P_n$, respectively. A block is a group of vertices such that all edges between them are involved in one or more cycles. An open acyclic vertex is a vertex that is not located between two blocks while a closed acyclic vertex is located between two blocks.

The graph in Fig. 2-13(a) contains a cycle, $R = \{R_V, R_E\}$ where $R_V = \{v_7, v_8, v_9\}$ and $R_E = \{e_{78}, e_{89}, e_{97}\}$, of type $C_3$. $R$ is a subgraph of $G$ where the vertices and edges of $R$ are subsets of those of $G$, or in other words $R$ is isomorphic to a subgraph of $G$. Conversely, $G$ is a supergraph of $R$. Determining whether the graph $G_1$ is isomorphic to a subgraph of $G_2$ is known as the subgraph isomorphism problem, which is NP-complete (Non-deterministic Polynomial time). The term clique is used for a set of vertices where an edge exists between each pair. A clique is a subgraph of $G$ and is itself a complete graph. A $k$-clique is a clique of order $k$. Clique detection or maximum common subgraph isomorphism is a method to find the largest subgraph of $G_1$ isomorphic to a subgraph of $G_2$. The subgraph isomorphism and maximum common subgraph isomorphism problems are known in cheminformatics as substructure searching and pharmacophore mapping. A molecule can be represented as a graph where the atoms are the vertices and bonds are the edges as in Fig. 2-13(b). This is a labeled or colored graph, in other words each vertex is labeled with the element type and each edge is colored with the bond order. The similarity between the two structures (graph and molecule) can be seen in Figures 2-13(a) and 2-13(b). A dictionary of terms is compiled in Table 2-1.
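The graph vocabulary above can be made concrete with a short Python sketch that stores a molecule as a labeled graph and reports its order, size, vertex degrees, and whether a set of vertices forms a clique. The `MolGraph` class, its method names, and the example molecule (the heavy atoms of methylcyclopropane) are illustrative assumptions and are not taken from MTK++.

```python
from itertools import combinations


class MolGraph:
    """Labeled, undirected simple graph: vertices are atoms, edges are bonds."""

    def __init__(self):
        self.labels = {}      # vertex id -> element symbol (vertex label)
        self.edges = {}       # frozenset({u, v}) -> bond order (edge color)

    def add_atom(self, v, element):
        self.labels[v] = element

    def add_bond(self, u, v, order=1):
        self.edges[frozenset((u, v))] = order

    def order(self):
        """Graph order: number of vertices (atoms)."""
        return len(self.labels)

    def size(self):
        """Graph size: number of edges (bonds)."""
        return len(self.edges)

    def degree(self, v):
        """Vertex degree: number of edges incident upon v."""
        return sum(1 for e in self.edges if v in e)

    def is_clique(self, vertices):
        """True if an edge exists between every pair of the given vertices."""
        return all(frozenset(p) in self.edges for p in combinations(vertices, 2))


# Hypothetical example: heavy atoms of methylcyclopropane (a C3 ring plus one leaf).
g = MolGraph()
for v in (1, 2, 3, 4):
    g.add_atom(v, "C")
for u, v in ((1, 2), (2, 3), (3, 1), (1, 4)):
    g.add_bond(u, v)

print(g.order(), g.size())        # number of atoms, number of bonds
print(g.degree(1), g.degree(4))   # ring atom bearing the methyl group, terminal atom
print(g.is_clique([1, 2, 3]))     # the three-membered ring is a 3-clique
```

In chemical terms, vertex 4 is a leaf vertex (a terminal atom), and the ring of vertices 1 to 3 is a cycle of type $C_3$.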

PAGE 53

Table 2-1. Correspondence between Graph Theory and Chemical Terminology.

Graph Theory                            Chemistry
Connected Graph                         Molecule
Graph Order                             Number of Atoms
Graph Size                              Number of Bonds
Vertex Degree                           Number of Bonded Atoms
Leaf Vertex                             Terminal Atom
Closed Path / Cycle                     Ring
Cycle Type                              Ring Size
Chain                                   Chain
Block                                   Cluster of Rings
Subgraph Isomorphism Problem            Substructure Searching
Maximum Common Subgraph Isomorphism     Pharmacophore Mapping

A tree, $T = (T_V, T_E)$, is a connected acyclic graph. Trees contain leaves, which are vertices of degree 1, and non-leaf vertices. A root is a vertex where all edges point away from it. A forest is a set of disjoint trees while a "k-ary tree is a rooted tree in which every vertex has $k$ children". Trees are often used in conformational searching and other combinatorial problems.

A matching or edge independent set, $M$, of $G$ is a subset of the edges such that no two edges in $M$ share a vertex. There are three types of matching called maximum, maximal, and perfect. A maximum matching is a matching of highest cardinality. A maximal matching is a matching to which no other edges can be added, while a perfect matching contains all vertices of the graph. A matching is maximum if and only if it has no augmenting path. An augmenting path is an alternating path which starts and ends with free or unmatched vertices. An alternating path is a path whose edges are alternately in $M$ and not in $M$. For molecular graphs the maximum weighted matching algorithm is a technique of assigning double and triple bonds and corresponds to maximizing the number of double bonds in a pi system [142].

There are various ways of traversing or searching a graph. One such technique is the depth-first search. This is implemented as a recursive routine and tracks which vertices and edges are encountered, thus only visiting each once.
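The difference between a maximal and a maximum matching, as defined above, can be seen on a very small graph. The following Python sketch is only an illustration: the function names and the four-vertex chain are assumptions, the greedy scan is one simple way to obtain a maximal matching, and the brute-force search stands in for a proper augmenting-path algorithm.

```python
from itertools import combinations


def is_matching(edges):
    """A set of edges is a matching if no two edges share a vertex."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def greedy_maximal_matching(edges):
    """Scan edges in the given order, keeping each edge that still fits.

    The result is maximal (no further edge can be added) but it is
    not guaranteed to be maximum (of highest cardinality).
    """
    matching = []
    for e in edges:
        if is_matching(matching + [e]):
            matching.append(e)
    return matching


def maximum_matching(edges):
    """Brute-force maximum matching; only sensible for very small graphs."""
    for k in range(len(edges), 0, -1):
        for subset in combinations(edges, k):
            if is_matching(subset):
                return list(subset)
    return []


# Hypothetical four-vertex chain v1-v2-v3-v4, with the middle edge listed first.
chain = [(2, 3), (1, 2), (3, 4)]
maximal = greedy_maximal_matching(chain)
maximum = maximum_matching(chain)
print(maximal)  # a maximal matching containing only the middle edge
print(maximum)  # a maximum (here also perfect) matching with two edges
```

Once the middle edge is chosen, neither outer edge can be added, so the greedy result is maximal without being maximum. The path v1, v2, v3, v4 is then an augmenting path: it starts and ends at unmatched vertices and alternates between edges outside and inside the matching.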

PAGE 54

[Figure 2-14 panels: (a) Molecular Graph; (b) Graph; (c) Maximum Matching; (d) Maximal Matching; (e) Kekule Structures of Benzene. Panels (b) to (d) show vertices v1 to v10.]
Figure 2-14. Graphs to Link the Terminology used in Graph Theory and Chemistry.

2.13 Statistical Methods

All RBDD and LBDD methods use statistical methods to correlate with or predict binding free energies. Common statistical measures or tools include mean (Eq. 2-38), centering (Eq. 2-39), sample variance (Eq. 2-40), standard deviation ($\sqrt{\sigma^2}$), Z-score (Eq. 2-41), and covariance (Eq. 2-42) where $E_i$ is the observable quantity for element $i$ and $N$ is the total number of observables.

\[ \langle E \rangle = \frac{1}{N} \sum_{i=1}^{N} E_i \tag{2-38} \]
\[ E_i = E_i - \langle E \rangle \tag{2-39} \]
\[ \sigma^2 = \frac{1}{N-1} \sum_{i} \left( E_i - \langle E \rangle \right)^2 \tag{2-40} \]
\[ Z_{score} = \frac{E_i - \langle E \rangle}{\sigma} \tag{2-41} \]

PAGE 55

\[ Cov = \frac{1}{N-1} \sum_{i} \left( E_i - \langle E \rangle \right)\left( E'_i - \langle E' \rangle \right) \tag{2-42} \]

MLR is an extension of the ordinary least squares method where more than one independent variable is used to derive the QSAR model, which takes the following form:

\[ Y = BX + E \tag{2-43} \]

where $B$ is the matrix of regression coefficients and $E$ are the residuals. The quality of the fit can be assessed using Pearson's correlation coefficient $r$, as shown in Eq. 2-44.

\[ r = \frac{\sum_{i=1}^{N} (x_i - \langle x \rangle)(y_i - \langle y \rangle)}{\sqrt{\left[\sum_{i=1}^{N} (x_i - \langle x \rangle)^2\right]\left[\sum_{i=1}^{N} (y_i - \langle y \rangle)^2\right]}} \tag{2-44} \]

The square of Pearson's correlation coefficient, $r^2$ or $R^2$, is often reported and describes the goodness of fit. Cross-validation or jack-knifing is a technique that checks the quality of the fit. It removes some of the dependent variables from the dataset and derives a model with the remainder. It then predicts the values of the data that have been left out. The PRESS value (Eq. 2-45) is the residual sum of squares of the data left out and is used in the calculation of $q^2$ or $Q^2$ (Eq. 2-46), or cross-validated $R^2$, which is the measure of predictability. Many different forms of cross-validation can be used but the most common is the 'Leave-One-Out' or LOO scheme, where each dependent variable is left out and predicted in turn.

\[ PRESS = \sum_{i=1}^{N} (y_{pred,i} - y_i)^2 \tag{2-45} \]
\[ Q^2 = 1 - \frac{PRESS}{\sum_{i=1}^{N} (y_{pred,i} - \langle y_i \rangle)^2} \tag{2-46} \]

Together with the correlation coefficient, $R^2$, and the cross-validated correlation coefficient, $Q^2$, the standard deviation of error of calculations, $SDEC$, and the standard deviation of error prediction, $SDEP$, are used to assess the quality of the model. $SDEP$ can also be defined as the root mean squared error of the dependent variables in a LOO scheme or external dataset as shown in Eq. 2-47. Similarly, SDEC is calculated for those

PAGE 56

variables used to build the model or training set.

\[ SDEP = \sqrt{\frac{PRESS}{N}} \tag{2-47} \]

Unsigned or absolute error (Eq. 2-48), signed error (Eq. 2-49), mean squared error (Eq. 2-50) and Root Mean Squared Deviation (RMSD, Eq. 2-51) are also often calculated to measure the quality of the models.

\[ ae = \frac{\sum_{i} \mathrm{abs}(E_i - E'_i)}{N} \tag{2-48} \]
\[ se = \frac{\sum_{i} (E_i - E'_i)}{N} \tag{2-49} \]
\[ MSE = \frac{\sum_{i} (E_i - E'_i)^2}{N} \tag{2-50} \]
\[ RMSD = \sqrt{\frac{\sum_{i} (E_i - E'_i)^2}{N}} \tag{2-51} \]

PCA is a method to reduce the dimensionality of the descriptor space by generating linear combinations of the original descriptors called Principal Components (PCs) that best describe their variance. Usually the number of PCs is smaller than the number of descriptors, and in doing so PCA reveals the "underlying factors or combinations of the original variables that principally determine the structure of the data distribution" [143]. Generally in PCA the $X$ matrix is "autoscaled" (Eq. 2-41), or in other words each descriptor is processed to have a mean of zero and a standard deviation of 1, ensuring certain variables do not dominate because of their magnitude. The theory of PCA stems from the eigenanalysis of the correlation matrix, $C$ (Eq. 2-52), where $X$ is the descriptor matrix. The descriptor matrix has dimensions of $n \times k$, where $n$ is the number of observations and $k$ is the number of measured variables. $C$ is a square and symmetric matrix and so facilitates the generation of principal components or eigenvectors, $P$, which are orthonormal to each other. A schematic diagram of the matrices and vectors involved in PCA is shown in Figure 2-15.

\[ C = X^T X \tag{2-52} \]
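As a small illustration of the preprocessing described above, the following Python sketch autoscales a descriptor matrix column by column and forms the matrix of Eq. 2-52 from the scaled data. The function names and the example data are hypothetical, and the division by $n - 1$ is an assumption made here so that the product of the autoscaled matrices has ones on its diagonal; Eq. 2-52 itself is written without a normalization factor.

```python
import math


def autoscale(X):
    """Center each column to zero mean and scale it to unit sample standard deviation.

    Assumes that no column is constant (a constant column has zero variance).
    """
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
        for j in range(k)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(k)] for row in X]


def correlation_matrix(X):
    """Return C = Z^T Z / (n - 1), where Z is the autoscaled descriptor matrix."""
    Z = autoscale(X)
    n, k = len(Z), len(Z[0])
    return [
        [sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1) for b in range(k)]
        for a in range(k)
    ]


# Hypothetical descriptor matrix: 4 observations (rows) by 3 descriptors (columns).
X = [
    [1.0, 10.0, 4.0],
    [2.0, 20.0, 3.0],
    [3.0, 30.0, 2.0],
    [4.0, 40.0, 1.0],
]
C = correlation_matrix(X)
for row in C:
    print([round(c, 3) for c in row])
```

The second descriptor is ten times the first, yet after autoscaling the two are treated identically, which is the point of the scaling step: magnitude alone no longer lets a variable dominate.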

PAGE 57

\[ CP = \lambda_i P \tag{2-53} \]

The magnitude of each eigenvalue, $\lambda_i$, represents the variance of the $X$ matrix that the PC explains.

\[ PC_i = P_{i1} X_1 + P_{i2} X_2 + \cdots + P_{ik} X_k \tag{2-54} \]

PCA can also be derived in terms of the original descriptor matrix, $X$, as shown in Eq. 2-55, where $p_i$ are the loading vectors or eigenvectors, $t_i$ are the score vectors, and $E$ is an error term after the descriptor matrix was reduced to $q$ principal components.

\[ X = t_1 p_1^T + t_2 p_2^T + \cdots + t_q p_q^T + E \tag{2-55} \]
\[ t_i = X p_i \tag{2-56} \]

The score vectors, $t_i$, determined from the $X$ matrix and $p_i$, are the new variables of the reduced dataset and describe how the different dependent variables relate to one another, while the loadings, $p_i$, reveal which descriptors are responsible. It is common to analyze the results of PCA using scatter plots of the scores and loadings. The similarity/dissimilarity between dependent variables is investigated using score plots. Usually, clustering of the data occurs with very similar data points grouping together and dissimilar ones being further apart. The loadings plot is used in conjunction with the score plot to determine the reasons why such clustering exists and decipher which of the original $X$ variables are causing it.

Partial Least Squares (PLS) is a technique similar to PCR; however, it is derived in a way that the $X$ scores, $t_i$, explain the variation in $X$ and correlate with $Y$ simultaneously. PLSR transforms the $X$ matrix into orthogonal components, so-called latent variables, and then performs a regression step that predicts $Y$. NIPALS, SIMPLS, and the kernel method are algorithms for calculating a PLS model; the basic algorithm was developed by Wold et al. and is outlined below, and the matrices and vectors are shown in Fig. 2-16 [14, 144]. The algorithm begins by setting the vector $u$ to one of the columns of $Y$, which allows the

PAGE 58

[Figure 2-15 image: descriptor matrix X (N compounds by K descriptors) with score matrix T and loading matrix P'.]
Figure 2-15. Principal Component Analysis (PCA) Schematic Diagram of the Matrices and Vectors Involved.

calculation of the $X$-weights, $w$:

\[ w = \frac{X^T u}{u^T u} \tag{2-57} \]

The weights are normalized ($w^T w = 1$):

\[ w = \frac{w}{\lVert w \rVert} \tag{2-58} \]

From the normalized weights, the $X$-scores, $t$, are calculated:

\[ t = \frac{X w}{w^T w} \tag{2-59} \]

From the calculated $X$-score the $Y$-weights, $c$, and $Y$-scores, $u$, are determined:

\[ c^T = \frac{t^T Y}{t^T t} \tag{2-60} \]
\[ u = \frac{Y c}{c^T c} \tag{2-61} \]

This method is an iterative process, which is tested on the change in $t$. The convergence criterion is normally set in the range of $10^{-6}$ to $10^{-8}$. If convergence has not been reached the algorithm returns to the calculation of the $X$-weights. Note that if there is

PAGE 59

only a single $Y$ variable then the algorithm converges in a single step.

\[ \frac{\lVert t_{old} - t_{new} \rVert}{\lVert t_{new} \rVert} < \epsilon \tag{2-62} \]

Once convergence is reached the $X$- and $Y$-loadings, $p$ and $q$, are calculated:

\[ p = \frac{X^T t}{t^T t} \tag{2-63} \]
\[ q = \frac{Y^T u}{u^T u} \tag{2-64} \]

To proceed to the next latent variable the $X$ matrix ($Y$ is optional) must be "deflated", in other words, the current latent variable's information must be removed as shown in Eqs. 2-65 and 2-66. The total number of latent variables to consider is generally determined using cross-validation.

\[ X = X - t p^T \tag{2-65} \]
\[ Y = Y - t c^T \tag{2-66} \]

Similar to MLR ($Y = BX + E$), the full PLSR solution can be written where $B$ is the regression coefficient matrix as Eq. 2-67.

\[ B = W (P^T W)^{-1} C^T \tag{2-67} \]

The kernel method is an alternative to the above algorithm. Similar to PCA, this uses eigenvalue-eigenvector equations to come to the same result. For example, the $X$-weights are determined by taking the first eigenvector of the variance-covariance matrix $X^T Y Y^T X$. Similar to the analysis of a PCA model, the interpretation of a PLSR model involves the score and loading plots; however, in PLSR the score and loading plots engage the $X$ and $Y$ scores and weights.

2.14 Metalloproteins

Metalloproteins are a key subset of proteins in the body which bind a transition metal. The metal ion acts as a Lewis acid (electron pair acceptor) towards amino acids

PAGE 60

[Figure 2-16 image: matrices X and Y (N compounds by K descriptors) with the PLS matrices T, P', W', U and C'.]
Figure 2-16. Partial Least Squares (PLS) Schematic Diagram of the Matrices and Vectors Involved.

or other molecules which are Lewis bases (possess one or more lone pairs) and are called ligands. The bonding of ligands to a metal ion is described as dative when the ligand donates one or more lone pairs to the metal. Metals can be coordinated to any number of ligands with four, five and six being the most common in biosystems. Thus the most likely geometries include tetrahedral, square planar, trigonal pyramidal, and octahedral. The amino acids that most commonly bind to a metal ion in metalloproteins are shown in Fig. 2-17. The side chain of each amino acid is labeled with Greek letters and these are used when referring to which atom of an amino acid bonds to the metal, e.g. Zn-CYS@SG means that the gamma Sulfur of a cysteine residue is bound to a Zinc ion.

Iron, Copper, and Zinc are the most abundant transition metals in the human body. Metal ions in biological systems have both structural and functional roles. They are termed structural when no chemical reaction takes place at the metal site but they aid in the stabilization of the protein structure, whereas functional metalloproteins carry out chemical reactions.

PAGE 61

[Figure 2-17 panels: (a) Cysteine (CYS); (b) Methionine (MET); (c) Aspartic Acid (ASP); (d) Glutamic Acid (GLU); (e) Histidine (HIS). Side-chain atoms are labeled with Greek letters.]
Figure 2-17. Most Common Amino Acid Residues which Bond to Metal Ions.

Zinc proteins are both structural and catalytic. Zinc acts as a superacid and promotes the hydrolysis or cleavage of chemical bonds. For example, Human Carbonic Anhydrase II (HCAII) (Fig. 2-18(a)) catalyses the conversion of CO2 into bicarbonate or vice versa. HCAII contains a tetrahedral zinc at its active site as shown in Fig. 2-18(b). The Zinc atom is bound to three histidine residues and a water molecule (pH < 7) or a hydroxyl ion. Farnesyl Transferase (FTase) is a zinc metalloenzyme that removes the diphosphate group from the farnesyl diphosphate substrate and connects the resulting farnesyl moiety

PAGE 62

to the cysteine. The full protein structure and active site of 1QBQ are shown in Figures 2-18(c) and 2-18(d). Other Zn metalloproteins include carboxypeptidase, which cleaves the terminal carboxy group from peptides, and alcohol dehydrogenase, which converts alcohol to acetaldehyde.

Metalloproteins that contain Copper are also both structural and functional. Copper can change oxidation state and is often involved in electron-transfer reactions. Human Antioxidant protein (HAH1) contains a tetrahedral Cu(I) bound by four cysteine residues as shown in Fig. 2-19(b). HAH1 is involved in the transporting of Copper in the body and is labeled a chaperone. Amicyanin is a tetrahedral Cu(II) containing protein which binds two histidines, a methionine, and a cysteine residue as shown in Fig. 2-19(d). This protein is called a blue copper protein due to its spectroscopic properties arising from cysteine to Cu(II) charge-transfer [145-150].

Metalloproteins can also contain multiple metals in close proximity. For example, Aminopeptidase is a di-zinc protein from Aeromonas proteolytica (AAP), as shown in Fig. 2-20(a), which catalytically cleaves the N-terminus of polypeptides. The active site of Aminopeptidase is shown in Fig. 2-20(b), where the zinc ions are bound to histidine and aspartic acid residues and are bridged by a water molecule. Urease from Bacillus pasteurii (Fig. 2-20(c)) is a di-nickel enzyme that catalyzes the hydrolysis of urea to ammonia and carbon dioxide. Its active site is shown in Fig. 2-20(d), where two nonequivalent Ni(II) atoms (3.5 Å separation) are bound to two histidines each and a bridging carbamylated lysine. An aspartate residue, two waters and a bridging water/hydroxyl ion complete the coordination sphere. The geometry of both Ni centers can be described as square pyramidal and octahedral [151-154]. Both Aminopeptidase and Urease are homo-nuclear proteins but hetero-nuclear metalloproteins also exist. Copper-Zinc Superoxide Dismutase, Cu,Zn-SOD, is one such protein; its active site is shown in Fig. 2-21.

PAGE 63

[Figure 2-18 panels: (a) Human Carbonic Anhydrase, HCAII (PDB ID: 1CA2); (b) 1CA2 Active Site; (c) Farnesyl Transferase, FTase (PDB ID: 1QBQ); (d) 1QBQ Active Site.]
Figure 2-18. Zinc Metalloproteins. The Zinc ion is shown in purple while the Oxygen atom of the water molecule in 1CA2 bound to Zinc is shown in red.

PAGE 64

[Figure 2-19 panels: (a) Human Antioxidant Protein, HAH1 (PDB ID: 1FEE); (b) 1FEE Active Site; (c) Copper Amicyanin (PDB ID: 1AAC); (d) 1AAC Active Site.]
Figure 2-19. Copper Metalloproteins. The Copper ion is shown in grey.

PAGE 65

[Figure 2-20 panels: (a) Aminopeptidase (PDB ID: 1AMP); (b) 1AMP Active Site; (c) Urease (PDB ID: 2UBP); (d) 2UBP Active Site.]
Figure 2-20. Homo-Nuclear Metalloproteins. The Zinc and Nickel ions are shown in purple and grey respectively, while Oxygen atoms of water molecules are shown in red.

PAGE 66

Figure 2-21. Hetero-Nuclear Metalloproteins. The Active Site of Copper-Zinc Superoxide Dismutase, Cu,Zn-SOD (PDB ID: 1CBJ), is shown. The Zinc and Copper ions are shown in purple and grey respectively.

PAGE 67

CHAPTER 3
MODELING TOOLKIT++

3.1 Introduction

In an ideal world where one could use the most accurate theories with infinite computer power and time it would be possible to design a new drug. However, in reality there is always a compromise between speed and accuracy. Figure 3-1 attempts to illustrate current research efforts, where reality is marked as an "X" and progress is being made, for example, in levels of theory used in scoring functions for receptor-ligand interaction calculations and in conformational sampling techniques [6]. These advances have mirrored the increases in computing power over recent years.

[Figure 3-1 image: axes labeled Theory and Time/Money, with reality marked "X" and the target labeled Drug.]
Figure 3-1. Computational Drug Design. In theory a drug can be designed in silico; however, in reality this has not materialized. Current research efforts have focused on pushing the boundaries of theories used in the SBDD areas with mixed results.

With the desire to use QM techniques in SBDD firmly established there comes a need to develop software where these methods can be used conveniently in the DD process. This has led to the development of a package of C++ libraries called Modeling ToolKit++ (MTK++) to interface with common QM programs to test the applicability and validate QM methods in SBDD. MTK++ was designed from the ground up to be used in areas

PAGE 68

of in silico SBDD such as molecular alignment and receptor-ligand scoring. The use of SE methods in molecular alignment and scoring is further analyzed in Ch. 4. This toolkit was also designed with metalloproteins in mind, an area where no such software was known to be available; this will be discussed in detail in Ch. 5. All too often molecular modeling software described as "open source" is obfuscated before release and so it becomes almost impossible to read or extend. To combat this, MTK++ was developed as an in-house suite of libraries with a consistent Application Programming Interface (API) which will allow new and novel methods to be developed. This chapter describes the design and development of MTK++. The algorithms used are described in detail with numerous illustrated examples.

3.2 Overview

MTK++ is an object oriented C++ package of molecular modeling libraries including Molecular Mechanics (MM), Genetic Algorithm (GA), file processing and conversion (Parsers), and statistical and molecular tools used in LBDD, SBDD and other computational chemistry fields. The Basic Linear Algebra Subprograms (BLAS), Linear Algebra PACKage (LAPACK), Boost, and xerces-c [155] libraries were used in the development. At the time of preparation of this thesis MTK++ contained over 30,000 lines of code. Thus, a complete description of the code cannot be given; however, all libraries and their major classes and algorithms are described.

3.2.1 Development

MTK++ is implemented as a C++ package of libraries. C++ was used instead of other programming languages such as FORTRAN and C because it is an object-oriented programming language that enables abstraction, encapsulation and inheritance. C++ contains the Standard Template Library, which has convenience classes such as vectors, maps, lists, etc., and external libraries such as BLAS and xerces-c are available for matrix-vector math and XML handling respectively. Also C++ code can be compiled on nearly any operating system and allows modular programming, which makes changes easy. Furthermore

PAGE 69

C++ is backwards compatible with C and the resulting C++ code is very efficient due to its duality as a high-level and low-level language. The development and debugging was done on a 1.33 GHz PowerPC G4 Apple computer running Mac OS X 10.4.8 with 512 MB SDRAM. The gcc (4.0.1) compiler was used. The code is cross-compiler and cross-platform compatible and was tested on both Mac OS X and Linux operating systems.

3.2.2 Library Hierarchy

[Figure 3-2 image: libraries shown are GA, Molecule, Graph, Parsers, Utils, Statistics, MM and Minimizers.]
Figure 3-2. Library Hierarchy as Implemented in MTK++. The library where the tail of the arrow starts uses the library where the head of the arrow ends, e.g. the Parsers library uses the Molecule, Utils, Statistics, and GA Libraries.

Figure 3-2 shows the hierarchy of the MTK++ package. At the center of the package of libraries is a group of utility routines which are used in all other packages. These include constants definition, diagonalization functions, an indexing class for easy sorting of objects, and a class called vector3d for atomic coordinate storage and transformation.[1] The Parsers library takes care of reading and writing of files and it requires the Molecule

[1] The vector3d class was originally developed by Andrew Wollacott [156, 157]. Extensive functionality was added.

PAGE 70

and GA Libraries. Also the Molecule library uses the Graph library for ring perception and other recursive functions. The design of the individual libraries is discussed further in the sections below.

3.2.3 Molecule Library

The core of the MTK++ package is the Molecule library and its most important classes are shown in Fig. 3-3. This library can handle multiple molecules at a time and these are stored in the collection class. The collection class also takes care of all the elements (this information only needs to be stored once, not for every molecule), and of parameter and fragment information for MM calculations. The molecules themselves are of type "molecule" and this class stores submolecule or residue information. This division is analogous to that of amino acids in proteins, nucleotides in DNA, or fragments in small organic molecules. The submolecule class stores a list of atoms and the atom class stores pointers to objects such as its element and its coordinates, which are a vector of three double precision numbers (vector3d).

The parameters class stores information for MM calculations as structures, such as atom types, bond, angle, torsion, improper (force constants, equilibrium values) and non-bonded (charges, Lennard-Jones values) parameters as shown in Fig. 3-4. The stdLibrary class is the main object which deals with the storage and function of a molecular fragment library as shown in Fig. 3-5. stdLibrary stores a list of stdGroups and a stdGroup is a storage container for stdFrags. For example a stdGroup could store the 20 amino acids of proteins, each a stdFrag, or a list of functional groups in drug design. The stdFrag class contains information about its atoms (stdAtom), bonds (stdBond), features (stdFeature), etc.

The functionality available to molecules such as proteins, DNA, and small organics originates from the molecule class in the Molecule library as shown in Fig. 3-8. molecule stores lists of bonds (a vector of Bond objects in C++), angles, torsions, and impropers. The connectivity information is determined in the connections class. This class can

PAGE 71

[Figure 3-3 image: classes and structures shown are collection, metalCenter, parameters, stdLibrary, elements, molecule, disulfide, submolecule, atom, vector3d and element.]
Figure 3-3. Core Class Hierarchy of the Molecule Library as Implemented in MTK++. A solid-line box denotes a class, while a dashed box signifies a structure. A class at the tail of an arrow uses or contains the class or structure at the head of the arrow, e.g. the elements class contains or uses the element structure.

PAGE 72

[Figure 3-4 image: parameters, atomType, bondParam, angParam, torParam, impParam, LJ612Param and eqAtoms.]
Figure 3-4. Class Hierarchy of the Parameters Component of the Molecule Class as Implemented in MTK++. A solid-line box denotes a class, while a dashed box signifies a structure. A class at the tail of an arrow uses or contains the class or structure at the head of the arrow, e.g. the parameters class contains or uses the atomType structure.

[Figure 3-5 image: stdLibrary, stdGroup, stdFrag, stdAtom, stdBond, stdLoop, stdImproper, stdAlias, stdFeature and stdRing.]
Figure 3-5. Class Hierarchy of the Standard Library Component of the Molecule Class as Implemented in MTK++. A solid-line box denotes a class, while a dashed box signifies a structure. A class at the tail of an arrow uses or contains the class or structure at the head of the arrow, e.g. the stdFrag class contains or uses the stdAtom structure.

PAGE 73

perceive bonds using distance and other geometric information, and also determine bonds through user defined databases of molecular structures (this is discussed further below in section 3.2.8 and appendix C). For example the connectivity of an alanine residue in a protein doesn't need to be perceived since it is known a priori if the names of the atoms are known. Disulfide bonds between Cysteine residues of proteins, as shown in Fig. 3-6, are automatically perceived using the parameters in Table 3-1 [156]. If the Cysteines' SG atoms are within dCutoff of each other and the S-S Energy from Eq. 3-1 is less than eCutoff they are considered bonded. The protonation states of Histidine residues bound to a metal ion are also perceived using a bond distance cutoff of 2.3 Å. If the HIS@NE2 (epsilon nitrogen of Histidine) atom is within this cutoff the residue is set to HID type. If the HIS@ND1 is within this cutoff the residue is set to HIE type. If both HIS@NE2 and HIS@ND1 are bonded to a metal atom within 2.3 Å then the residue is set to HIN type, such as the bridging histidine residue in Copper-Zinc Superoxide Dismutase [158]. Bond order, hybridization and formal charge of atoms for small molecules are determined in the hybridize class, which is discussed in more detail below in section 3.3.

[Figure 3-6 image: two CYS residues joined through atoms CB1-SG1-SG2-CB2.]
Figure 3-6. Disulfide Bond in Proteins.

Table 3-1. Disulfide Bond Prediction Parameters.

Parameter     Value
dCutoff       2.5
ssBondReq     2.038
ssBondKeq     166.0
CBSGSGReq     103.7
CBSGSGKeq     68.0
eCutoff       30.0
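The disulfide test described above can be sketched in a few lines of Python. This is an illustration only and not the MTK++ implementation: the function names and the example coordinates are hypothetical, and because the text does not state units, the sketch assumes Angstroms, degrees for the equilibrium angle, and AMBER-style force constants with the angle deviation converted to radians.

```python
import math

# Parameters from Table 3-1. Units are not stated in the text; this sketch
# ASSUMES Angstroms, degrees, and AMBER-style force constants
# (kcal/mol/A^2 for the bond, kcal/mol/rad^2 for the angle).
PARAMS = {
    "dCutoff": 2.5, "ssBondReq": 2.038, "ssBondKeq": 166.0,
    "CBSGSGReq": 103.7, "CBSGSGKeq": 68.0, "eCutoff": 30.0,
}


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the apex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    cos_t = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def ss_energy(cb1, sg1, sg2, cb2, p=PARAMS):
    """Eq. 3-1: one harmonic bond term plus two harmonic angle terms."""
    e_bond = p["ssBondKeq"] * (math.dist(sg1, sg2) - p["ssBondReq"]) ** 2
    e_angle = 0.0
    for theta in (angle_deg(cb1, sg1, sg2), angle_deg(cb2, sg2, sg1)):
        e_angle += p["CBSGSGKeq"] * math.radians(theta - p["CBSGSGReq"]) ** 2
    return e_bond + e_angle


def is_disulfide(cb1, sg1, sg2, cb2, p=PARAMS):
    """Bonded if the SG atoms are within dCutoff and the energy is below eCutoff."""
    if math.dist(sg1, sg2) > p["dCutoff"]:
        return False
    return ss_energy(cb1, sg1, sg2, cb2, p) < p["eCutoff"]


# Hypothetical coordinates close to the equilibrium geometry.
sg1, sg2 = (0.0, 0.0, 0.0), (2.04, 0.0, 0.0)
cb1, cb2 = (-0.440, 1.766, 0.0), (2.480, 0.0, 1.766)
print(is_disulfide(cb1, sg1, sg2, cb2))              # near-ideal S-S geometry
print(is_disulfide(cb1, sg1, (3.5, 0.0, 0.0), cb2))  # SG atoms too far apart
```

The distance check is applied first so that the energy is only evaluated for sulfur pairs that are plausibly bonded; the energy cutoff then rejects pairs that are close but badly strained.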

PAGE 74

[Figure 3-7 panels: (a) HIN; (b) HID; (c) HIE; (d) HIP.]
Figure 3-7. The Structural Types of the Histidine Residue.

\[ S\text{-}S\ Energy = E^{Bond}_{SG_1-SG_2} + E^{Angle}_{CB_1-SG_1-SG_2} + E^{Angle}_{CB_2-SG_2-SG_1} \tag{3-1} \]

where:

\[ E^{Bond}_{SG-SG} = ssBondKeq \, (distance_{SG-SG} - ssBondReq)^2 \tag{3-2} \]
\[ E^{Angle}_{CB-SG-SG} = CBSGSGKeq \, (angle_{CB-SG-SG} - CBSGSGReq)^2 \tag{3-3} \]

Ring moieties are perceived within the rings class and each ring found is stored in a ring structure. The perception of rings is discussed further in section 3.4. The functionalize class determines which functional groups are present in a molecule using the database of fragments as defined in appendix C. The implementation details of the functionalize class are outlined in section 3.7.

The fingerprint class contains rudimentary functionality for molecular fingerprinting. A fingerprint is defined as information that describes a molecule in 1-D. The fingerprint in MTK++ is represented as a vector of integers with the following form: "atom info, bond type, # of rings ring info". The number of atoms from Hydrogen through Iodine are stored in the first 52 positions, another 52 positions store the number of each of the

PAGE 75

following bond types B-H, C-H, N-H, O-H, S-H, B-C, B=C, B-O, B-N, B-O, B-F, B-S, B-Cl, B-Br, B-I, C-C, C=C, C%C, N-N, N=N, C-N, C=N, C%N, N-O, N=O, N-P, N-Se, N=Se, O-O, C-O, C=O, O-Si, O-S, O=S, O-Se, O=Se, C-F, S-S, C-S, C=S, S-N, C-Cl, P-P, P-C, P-O, P=O, P-S, P-Se, Se-Se, C-Se, C=Se, N-Se, where "-", "=", "%" denote a single, double and triple bond respectively. Finally the 105th position stores the number of rings in the molecule or fragment. The size, planarity, aromaticity, heterocyclic boolean, and the number of nitrogen, oxygen and sulfur atoms of each ring are also stored after the 105th position. Thus the length of the vector depends on the number of rings present in the molecule or fragment. Fingerprinting in MTK++ is primarily used in conjunction with the functionalize class. Fingerprints are used to screen out fragments that could not be a part of a molecule based on the elements, bond types, and rings present, thus speeding up the functionalization of molecules.

A pharmacophore is commonly defined as the three dimensional geometric arrangement of molecular features that are necessary for biological activity. Pharmacophores between two molecules are detected using a feature (H-bond Donor/Acceptor, Pi Center, Positive/Negative Center, Hydrophobicity) matching algorithm in the pharmacophore class. The features common to both molecules are stored in a clique structure. A full description of the algorithm implemented in MTK++ is given in section 3.8.

The protonate class carries out the addition of Hydrogen atoms to macromolecules (proProtonate), ligands (ligProtonate), and water molecules (watProtonate). proProtonate uses information from user defined libraries to add Hydrogens while ligProtonate is used when no such library is available. Water molecules often surround structures derived from X-ray crystallographic data but no Hydrogen atom positions are provided. Hydrogens are added separately to water molecules after they are added to macromolecules and ligands. The algorithmic details of the three protonate classes are described in section 3.5.

Conformational searching of drug-like molecules is carried out in the conformer class using graph theory methods. Each conformer of a molecule is stored in a conformer

PAGE 76

structure. The internal workings of this class are described in section 3.6. An integral part of conformational searching is determining the amount of conformational space sampled. This requires being able to superimpose a conformer onto some reference structure and calculate the root-mean-squared deviation. The superimposition of two molecules is carried out in the superimpose class and is discussed below in section 3.9.

The selection class is used to parse strings that represent subsets of molecular data and is essential in providing an API for users of MTK++. The data structure in the Molecule library has an inherently hierarchical nature. Atom information is stored in submolecule; bonds, angles, torsions, impropers, and submolecules are stored in molecule; and finally all molecules are stored in collection. The atom class is at the bottom of the hierarchy, while collection is at the top. Thus, retrieving for example all atoms which have a specific name in all molecules of the collection requires a certain syntax. The syntax used in the selection class resembles that of a path in a UNIX operating system, such as "/collection/molecule/submolecule/atom". For example, providing the following string: "/COL/MOL/ALA-10/.CA." would select the atom ".CA.", the alpha carbon, in the alanine with residue number 10 (ALA-10) that is part of the molecule named MOL in the COL collection. The "/" on the left hand side of the string indicates that the selection starts from the top of the structural hierarchy. The following selection string does not begin with a slash: "ALA-10/.CA." and represents parsing the hierarchy from the bottom up; all alpha carbons of molecules in the collection with alanine at position 10 will be selected. This syntax can handle molecule/submolecule/atom names, numbers, or a combination (name-number), such as ALA-10.

The atomTyper class assigns molecular mechanics atom types to the atoms of a molecule using user defined fragment libraries such as those in appendix C. The hydrophobic regions of molecules are determined using an atom additive approach as outlined by Wang and Zhou in 1998 [159].
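To illustrate the selection syntax described above, the following Python sketch splits a selection string into the four hierarchy levels. It is a guess at the behavior described in the text, not the MTK++ selection class: the function names, the returned dictionary layout, and the convention that an unspecified level is `None` (meaning "match all") are assumptions.

```python
def split_name_number(field):
    """Interpret a field as a name, a number, or a name-number pair such as ALA-10."""
    name, sep, number = field.rpartition("-")
    if sep and number.isdigit():
        return {"name": name, "number": int(number)}
    if field.isdigit():
        return {"name": None, "number": int(field)}
    return {"name": field, "number": None}


def parse_selection(selection):
    """Split a selection string into its hierarchy levels.

    A leading "/" anchors the selection at the top of the hierarchy
    (collection first); without it the fields are matched from the
    bottom of the hierarchy (atom last) upward.
    """
    levels = ["collection", "molecule", "submolecule", "atom"]
    from_top = selection.startswith("/")
    fields = [f for f in selection.split("/") if f]
    if len(fields) > len(levels):
        raise ValueError("too many levels in selection: " + selection)
    names = levels[:len(fields)] if from_top else levels[len(levels) - len(fields):]
    parsed = dict.fromkeys(levels)      # unspecified levels stay None (match all)
    for level, field in zip(names, fields):
        parsed[level] = split_name_number(field)
    return parsed


# The two example strings from the text.
print(parse_selection("/COL/MOL/ALA-10/.CA."))
print(parse_selection("ALA-10/.CA."))
```

In the second call the collection and molecule levels are left unspecified, which corresponds to selecting the alpha carbon of ALA-10 in every molecule of the collection.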

PAGE 77

[Figure 3-8 image: classes and structures shown are Bond, Angle, Torsion, Improper, fingerprint, hybridize, connections, selection, rings, ring, molecule, superimpose, atomTyper, functionalize, protonate (pro, lig, wat), conformers, conformer, pharmacophore, clique and stdLibrary.]
Figure 3-8. Class Hierarchy of the Molecule Component of the Molecule Class as Implemented in MTK++. A solid-line box denotes a class, while a dashed box signifies a structure. A class at the tail of an arrow uses or contains the class or structure at the head of the arrow, e.g. the molecule class contains or uses the hybridize class.

3.2.4 Graph Library

The graph library contains classes, as shown in Fig. 3-9, to handle molecular graphs as described in Chapter 2.12. This library is used to find rings and to determine whether graphs are isomorphic. It is also used to traverse the torsional tree for systematic conformational searching. Tree and graph traversal is carried out using the depth-first search algorithm. The graph class stores both vertices and edges, with the edge struct storing pointers to two vertex objects. Both vertices and edges store a boolean to describe whether each has been visited during a traversal and a numerical variable to describe its color or label. The vertex class also stores a list of its neighbors and which layer it is placed on.
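The vertex and graph bookkeeping described above, together with a recursive depth-first search, might look as follows in Python. This is a simplified sketch under stated assumptions: the class and method names are invented for illustration, edges are stored as plain pairs of vertex objects, and only the vertices (not the edges) carry a visited flag.

```python
class Vertex:
    def __init__(self, name, color=0):
        self.name = name
        self.color = color        # numerical label, e.g. an atomic number
        self.visited = False      # set once the traversal reaches this vertex
        self.neighbors = []


class Graph:
    def __init__(self):
        self.vertices = {}
        self.edges = []           # each edge references its two Vertex objects

    def add_vertex(self, name, color=0):
        self.vertices[name] = Vertex(name, color)

    def add_edge(self, a, b):
        u, v = self.vertices[a], self.vertices[b]
        self.edges.append((u, v))
        u.neighbors.append(v)
        v.neighbors.append(u)

    def dfs(self, start):
        """Recursive depth-first search; returns vertex names in visiting order."""
        for v in self.vertices.values():
            v.visited = False
        order = []
        self._visit(self.vertices[start], order)
        return order

    def _visit(self, vertex, order):
        vertex.visited = True
        order.append(vertex.name)
        for nbr in vertex.neighbors:
            if not nbr.visited:   # the flag guarantees each vertex is visited once
                self._visit(nbr, order)


# Hypothetical heavy-atom graph: a six-membered ring (1-6) with a substituent (7).
g = Graph()
for name in range(1, 8):
    g.add_vertex(name, color=6)
for a, b in ((1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 7)):
    g.add_edge(a, b)
print(g.dfs(1))
```

When the search reaches vertex 6 it finds that neighbor 1 has already been visited; encountering a visited neighbor other than the vertex just left is the signal that a ring has been closed, which is how a traversal of this kind can support ring perception.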

PAGE 78

[Figure 3-9 image: vertex, graph and edge.]
Figure 3-9. Class Hierarchy of the Graph Library as Implemented in MTK++. A solid-line box denotes a class, while a dashed box signifies a structure. A class at the tail of an arrow uses or contains the class or structure at the head of the arrow, e.g. the graph class contains or uses the edge structure.

3.2.5 MM Library

The MM library contains classes and functions to carry out Molecular Mechanics minimizations as shown in Fig. 3-10. Currently, the AMBER function is used as described in Chapter 2.3. The amber class contains the driver functions for the lower level classes ambBond, ambAngle, ambTorsion, and ambNonBonded that contain the AMBER energy/gradient functions. The mmPotential class is the controller for all MM functions which could be developed. It performs all the memory allocation/deallocation. MTK++ was designed to allow its features to be extended easily. For example, cross terms such as bond-stretch and non-harmonic terms such as the Morse potential for bonded atoms are now possible within the MM library. This will become essential when MTK++ is used to study, for example, Blue Copper proteins, which contain a Copper-Sulfur bond that cannot be represented using a harmonic potential.

3.2.6 GA Library

The GA library contains classes and functions to carry out a parameter optimization using a genetic algorithm as shown in Fig. 3-11. This was initially designed to carry out conformational searching of organic molecules. However, its design is application independent. A genetic algorithm is a heuristic method whereby reaching the global

PAGE 79

[Figure 3-10 class diagram; classes shown: mmPotential, amber, ambBond, ambAngle, ambTorsion, ambNonBonded]

Figure 3-10. Class Hierarchy of the MM Library as Implemented in MTK++. Solid line boxes denote a class, and a class where the tail of the arrow starts uses or contains the class where the head of the arrow ends, e.g., the amber class contains or uses the ambBond class. The orange arrow is used to represent a public inheritance relationship between classes, i.e., amber is of type mmPotential.

minimum of parameter space is not guaranteed. Other heuristic methods include the Monte Carlo technique.

The GA library was designed in such a way as to mimic human civilization or evolution. gaWorld is the main class in the library. A gaWorld can contain multiple regions (gaReg). Each region contains a population (gaPop) of individuals (gaInd). Each individual contains some chromosomal (gaChr) makeup that in turn is described by genes (gaGene). An individual can contain multiple chromosomes, each of which can contain multiple genes. For each application of this GA the fitness of each individual must be evaluated; this energy function is provided by the user to the library. The "survival of the fittest" model is used for individuals to propagate or survive between generations. Individuals can survive from one iterative step to the next through a semi-random selection process biased by their fitness. Reproduction between individuals is carried out using operators such as cross-over, mutation, and averaging. The number of iterations carried out by the GA is either user defined or determined by some convergence criterion.
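The selection and reproduction scheme described above can be sketched in a few lines of Python. This is a flat illustration under stated assumptions, not the gaWorld/gaReg/gaPop hierarchy of MTK++: the function run_ga, its parameters, the use of tournament selection as the fitness-biased semi-random choice, and the toy sphere fitness function are all hypothetical. Each individual is a single list of real-valued genes, and lower fitness is treated as better, as for an energy function.

```python
import random


def run_ga(fitness, n_genes, pop_size=20, generations=40, mutation_rate=0.1,
           bounds=(-5.0, 5.0), seed=0):
    """Minimize `fitness` over lists of `n_genes` floats; returns (best, history)."""
    rng = random.Random(seed)
    low, high = bounds
    population = [[rng.uniform(low, high) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation

    def pick(scored):
        # Tournament of two: a semi-random choice biased toward fitter individuals.
        (fa, a), (fb, b) = rng.sample(scored, 2)
        return a if fa <= fb else b

    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda pair: pair[0])
        history.append(scored[0][0])
        children = [list(scored[0][1])]  # the fittest individual always survives
        while len(children) < pop_size:
            p1, p2 = pick(scored), pick(scored)
            if n_genes > 1 and rng.random() < 0.5:
                cut = rng.randrange(1, n_genes)            # cross-over operator
                child = p1[:cut] + p2[cut:]
            else:
                child = [(x + y) / 2.0 for x, y in zip(p1, p2)]  # averaging operator
            # mutation operator: occasional Gaussian perturbation of a gene
            child = [g + rng.gauss(0.0, 0.5) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return min(population, key=fitness), history


def sphere(genes):
    """Toy user-supplied energy function with its minimum at the origin."""
    return sum(g * g for g in genes)


best, history = run_ga(sphere, n_genes=3)
```

Because the fittest individual is copied unchanged into each new generation, the best fitness recorded in history can never get worse, although, as with any heuristic search, reaching the global minimum is not guaranteed.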

PAGE 80

Regions are treated as being independent; however, the library was implemented in a way that allows "island-hopping" to be developed. This would allow the GA to run in parallel, and during the course of an optimization an individual would, with some probability, travel from one region to another. This would allow the genetic information to be more diverse and prevent early convergence.

3.2.7 Statistics Library

The Statistics library contains classes to carry out statistical analysis, as shown in Fig. 3-12. This library is built on the boost library, in which matrix and vector math is performed very efficiently. The sheet class contains and handles matrix objects. The matrix class is derived from the boost matrix and extends its features by allowing for matrix labeling. The baseStats class performs the basic statistical functions as described in Ch. 2.13. The ols class carries out Multiple Linear Regression to calculate Pearson's correlation coefficient. The pca class carries out Principal Component Analysis (PCA), and the pls class performs Partial Least Squares (PLS) modeling using the kernel algorithm, with leave-N-out cross validation being implemented. The PLS algorithm produces a number of matrices during execution, and so the sheet and matrix classes were essential.

3.2.8 Molecular Fragment Library

A fragment library was developed for applications including functional group recognition, molecular alignment, and fragmentation of drug-like molecules for the SE-COMBINE approach. The library currently contains over 300 fragments. Fragment names, internal codes and 2-D structural pictures can be found in Appendix C. This is a highly extendable library, with all the tools required to add fragments available within MTK++. The fragments are built using the methodology developed to construct residues for the AMBER suite of programs. This approach uses atom names and types from the Generalized AMBER FF (GAFF) [160] and HF/6-31G* Merz-Kollman/RESP charges as described in Ch. 2.9. The use of RESP easily allowed for symmetric atomic charges, such as those on the oxygen atoms in a carboxylate group, and for fragments to carry an integer charge.

PAGE 81

[Figure 3-11 class diagram; classes shown: gaWorld, gaReg, gaPop, gaInd, gaChr, gaGene, gaOutput, gaOperators, gaSelection, gaGaussian, gaCrossOver, gaMutate, gaAverage]

Figure 3-11. Class Hierarchy of the GA Library as Implemented in MTK++. Solid line boxes denote a class, and a class where the tail of the arrow starts uses or contains the class where the head of the arrow ends, e.g., the gaWorld class contains or uses the gaReg structure.

PAGE 82

[Figure 3-12 class diagram; classes shown: baseStats, pls, pca, ols, table, boost, sheet]

Figure 3-12. Class Hierarchy of the Statistics Library as Implemented in MTK++. Solid line boxes denote a class. A class where the tail of the arrow starts uses or contains the class or structure where the head of the arrow ends, e.g., the pls class contains or uses the baseStats class. The blue arrow is used to represent a public inheritance relationship between classes, i.e., pls is of type baseStats.

3.2.9 Parsers Library

The Parsers library contains classes to read and write molecular file types. XYZ, MOL, MOL2, PDB, SD, and ZMAT file formats are supported. All classes inherit baseParser, as shown in Fig. 3-13. baseParser controls the error handling of all classes in a uniform way. The xml file parsers also inherit xmlConvertors and domErrorHandler from the xerces-c library, which deal with errors in the xml files. Input and output files of programs such as DivCon and Gaussian (both cartesian and internal coordinates) are handled. The element parser reads the elements.xml file stored in the MTK++ distribution and populates the elements object, which the collection class stores. For each element in the periodic table the following information is stored: atomic number, mass,

PAGE 83

group, period, valence, full shell size, covalent radius, van der Waals radius, Pauling's electronegativity value, and which semiempirical Hamiltonians are available to a given atom. The stdLib parser handles the library xml files described in the previous section and populates the stdLibrary and stdGroup classes. The param parser handles the parameter files associated with the fragment library, such as parm94 and GAFF from AMBER. The GA parser handles the files associated with the GA library of MTK++. The amber parser can export and import AMBER topology and coordinate files.

[Figure 3-13 class diagram; classes shown: DivCon, gaussian, sd, xyz, zmat, baseParser, pdb, amber, mol2, stats, element, param, mol, stdLib, ga, xmlConvertors/domErrorHandler]

Figure 3-13. Class Hierarchy of the Parsers Library as Implemented in MTK++. Solid line boxes denote a class, and a class where the tail of the arrow starts uses or contains the class where the head of the arrow ends, e.g., the element class contains or uses the xmlConvertors class. The blue arrow is used to represent a public inheritance relationship between classes, i.e., pdb is of type baseParser.

3.3 Hybridization, Bond Order and Formal Charge Perception

It is often the case in SBDD that the design process starts with an x-ray crystal structure of a target molecule in complex with some bound substrate. This poses the challenge of determining atomic hybridizations, formal charges and bond orders of the

PAGE 84

small molecule due to the fact that there are no Hydrogen atoms present. Numerous algorithms have been published [161-165], but the algorithm by Labute in 2005 [142] to perceive atom hybridizations, bond orders and formal charges of drug-like molecules was implemented in MTK++ as it was shown to be superior to the others. Other methods to perceive atom types and bond information include antechamber by Wang et al. [166].

The Labute algorithm takes ten steps to determine the atom hybridizations, formal charges, and bond orders. A ligand that binds to PPARγ (PDB: 1FM9), as shown in Fig. 3-14(a), is used to illustrate the algorithm, where $x_1, \ldots, x_n$ denote the 3D coordinates of $n$ atoms with atomic numbers $Z_1, \ldots, Z_n$, the number of bonded atoms for each atom is $Q_i$, and $r_{ij} = |x_i - x_j|$ is the distance between two atoms.

Bonds are perceived by first producing a candidate list and then refining it using geometry. Covalent radii, $R_i$, from Meng [161], shown in Table 3-2, are used in Eq. 3-4 to determine the candidate bond list. Then for each atom, $i$, a "dimension", $d_i$, is assigned based on a principal component analysis of the Gram Matrix, $D$, defined in Eq. 3-5, where $i$ is the current atom index, $k$ is the number of bonded atoms and $q$ is the geometric center as shown in Eq. 3-6.

Table 3-2. Meng Atomic Covalent Radii.

Atom  Radii | Atom  Radii | Atom  Radii | Atom  Radii
H     0.23  | P     1.05  | Ni    1.5   | Nb    1.48
He    1.5   | S     1.02  | Cu    1.52  | Mo    1.47
Li    0.68  | Cl    0.99  | Zn    1.45  | Tc    1.35
Be    0.35  | Ar    1.51  | Ga    1.22  | Ru    1.4
B     0.83  | K     1.33  | Ge    1.17  | Rh    1.45
C     0.68  | Ca    0.99  | As    1.21  | Pd    1.5
N     0.68  | Sc    1.44  | Se    1.22  | Ag    1.59
O     0.68  | Ti    1.47  | Br    1.21  | Cd    1.69
F     0.64  | V     1.33  | Kr    1.5   | In    1.63
Ne    1.5   | Cr    1.35  | Rb    1.47  | Sn    1.46
Na    0.97  | Mn    1.35  | Sr    1.12  | Sb    1.46
Mg    1.1   | Fe    1.34  | Y     1.78  | Te    1.47
Al    1.35  | Co    1.33  | Zr    1.56  | I     1.4
Si    1.2   |
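As an illustration of how a candidate bond list can be produced from covalent radii, the following Python sketch applies a distance test to every atom pair. The exact form of Eq. 3-4 is not reproduced here, so the criterion used, 0.1 < r_ij < R_i + R_j + tolerance with a tolerance of 0.4, is an assumption, as are the names MENG_RADII and candidate_bonds. Only a few radii from Table 3-2 are included, and the formaldehyde-like coordinates are made up for the example.

```python
import math
from itertools import combinations

# A few covalent radii copied from Table 3-2.
MENG_RADII = {"H": 0.23, "C": 0.68, "N": 0.68, "O": 0.68, "S": 1.02, "Zn": 1.45}


def candidate_bonds(atoms, tolerance=0.4, r_min=0.1):
    """atoms: list of (element, (x, y, z)). Returns index pairs of candidate bonds.

    Assumed criterion (not taken from the text):
        r_min < r_ij < R_i + R_j + tolerance
    """
    bonds = []
    for (i, (elem_i, xyz_i)), (j, (elem_j, xyz_j)) in combinations(enumerate(atoms), 2):
        r_ij = math.dist(xyz_i, xyz_j)
        if r_min < r_ij < MENG_RADII[elem_i] + MENG_RADII[elem_j] + tolerance:
            bonds.append((i, j))
    return bonds


# Heavy atoms and hydrogens of a formaldehyde-like geometry (illustrative values).
atoms = [
    ("C", (0.00, 0.00, 0.0)),
    ("O", (1.20, 0.00, 0.0)),
    ("H", (-0.55, 0.94, 0.0)),
    ("H", (-0.55, -0.94, 0.0)),
]
bonds = candidate_bonds(atoms)
```

A candidate list built this way is deliberately generous; the geometric refinement described in the text is what removes spurious contacts.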

PAGE 85

0.1

2   (sp hybridized and linear)
3   d_i = 2, Z_i < 11 (sp2 hybridized for 2nd row elements)
4   d_i = 2, Z_i > 10 or d_i = 3, Z_i < 11 (square planar or sp3 hybridized)
7   otherwise

row of Table 3-4 is carried out one at a time, with each row only being applied to atoms with unassigned hybridization, resulting in Fig. 3-14(c). Only atoms with unassigned hybridizations have d < 3, Z = (C, N, O, Si, P, S, Se), Q < 4 and at least one bonded neighbor with an unassigned hybridization. At this stage all bond orders, $b_{ij}$, in which atom i or j has non-zero hybridization are set to 1.

A dihedral test is used to identify bonds of order 1. The smallest out-of-plane dihedral is computed using: $\min_{j,k} \{ |P_{ijkl}|, |P_{ijkl}|, |P_{ijkl}| \}$. If this dihedral is greater than 15° then $b_{ij}$ is set to 1. Results of this step are shown in Fig. 3-14(d).

PAGE 86

Table 3-4. Labute Algorithm Atom Hybridization Assignment

hybridization | Condition
sp3    | Z_i = 1, 2
sp3d   | Q_i > 4, Z_i = (Group 5) and Q_i = 5, Z_i = Group 4, 5, 6, 7, 8
sp3d2  | Q_i > 4, Z_i = (Group 6) and Q_i = 6, Z_i = Group 4, 5, 6, 7, 8
sp3d3  | Q_i > 4, Z_i = (Group 7) and Q_i = 7, Z_i = Group 4, 5, 6, 7, 8
sp3d2  | Q_i = 4, Z_i > 10, d_i = 2
sp3d2  | Z_i = (Transition Metal)
sp3d2  | Q_i > 4, Z_i > 10 and not Si, P, S, Se
sp3    | Q_i > 4, Z_i > 10 and Si, P, S, Se
sp3    | (Q_i = 4) and (Q_i = 3, d_i = 3)
sp3    | Q_i > 2, Z_i = Group 6, 7, 8
sp3    | Z_i not (C, N, O, Si, P, S, Se)
sp3    | All atoms where none of its bonded atoms have zero hybridization

Table 3-5 of lower bound single bond lengths and the test $|x_i - x_j| > L_{ij} - 0.05$, where $L_{ij}$ is the reference bond length, are used to identify single bonds. The bonds identified using this step are shown in Fig. 3-14(e).

Table 3-5. Labute Algorithm Lower Bound Single Bond Lengths

bond   dist | bond   dist
C-C    1.54 | C-N    1.47
C-O    1.43 | C-Si   1.86
C-P    1.85 | C-S    1.75
C-Se   1.97 | N-N    1.45
N-O    1.43 | N-Si   1.75
N-P    1.68 | N-S    1.76
N-Se   1.85 | O-O    1.47
O-Si   1.63 | O-P    1.57
O-S    1.57 | O-Se   1.97
Si-Si  2.36 | Si-P   2.26
Si-S   2.15 | Si-Se  2.42
P-P    2.26 | P-S    2.07
P-Se   2.27 | S-S    2.05
S-Se   2.19 | Se-Se  2.34

After steps 5 and 6 the hybridizations of all uncharacterized atoms not involved in a bond of unknown order are set to sp3, as shown in Fig. 3-14(f).

A molecular graph is formed including only atoms (vertices) that have undefined hybridization and bonds (edges) that have unknown order. This graph is then divided into

PAGE 87

components or subgraphs. Each subgraph is analyzed independently and bond orders are assigned as shown in Fig. 3-14(g). Edge weights are assigned with the following equation: w_ij = u_i + u_j + 2 ( r_ij
PAGE 88

[Figure 3-14 structure diagrams; panels: (a) START, (b) Step 1, (c) Step 3, (d) Step 5, (e) Step 6, (f) Step 7, (g) Step 8a, (h) Step 8b, (i) Step 8c, (j) END]

Figure 3-14. Hybridization, Bond Order, and Formal Charge Perception Using the Labute Algorithm.

PAGE 89

as $S(m_1, m_2, \ldots)$ where $m_i$ are the ring sizes. Take for example the molecule shown in Fig. 3-15(a). All open acyclic nodes, highlighted in Fig. 3-15(b), are removed, resulting in the structure shown in Fig. 3-15(c). Then all closed acyclic nodes are removed, as highlighted in Figs. 3-15(d) and 3-15(e). The structure is then separated into blocks as shown in Fig. 3-15(f). The question then arises: how many rings are there in the current block, shown in Fig. 3-15(g)? Allowing the first node to be the root node, numerous ring systems can be found, including $R^1_V = \{v_1, v_2, v_3, v_{15}, v_{13}, v_{14}\}$, $R^2_V = \{v_1, v_2, v_3, v_{15}, v_{16}, v_{10}, v_{11}, v_{12}, v_{13}, v_{14}\}$, ..., $R^n_V$. The closed path found containing the root node is recursively searched until it cannot be reduced further; in other words, $R^1_V$ is found, as shown in Fig. 3-16. Once an irreducible closed path is found, all nodes with two links are removed. Nodes 2, 1, and 14 are then removed. The algorithm then picks another root node and the next ring is found, until all rings are found.

Once all rings are found in a molecule, an aromaticity test is applied. The algorithm used is in close agreement with that published by Roos-Kozel and Jorgensen in 1981 [168]. Rings are classified as aromatic (AR), antiaromatic (AA), or nonaromatic (NA). A ring is assigned to be nonaromatic if it contains no intra-ring double bonds, contains a quaternary atom, contains more than one saturated carbon, contains a monoradical, or contains a sulfoxide or sulfone. A ring system is aromatic if and only if it contains 4n+2 (n = 0, 1, 2, 3, 4, ...) pi electrons (Hückel rule) and is planar (10° tolerance). The number of pi electrons of a ring is determined using the following rules: cationic carbon and boron contribute 0, saturated heteroatoms give 2, an anionic carbon has 2, and atoms on an intra-ring pi bond contribute 1. If a ring contains exocyclic pi bond(s) (Carbon double bonded to a heteroatom), then 1 pi electron is removed. Some rings correctly perceived by this algorithm are shown in Fig. 3-17. All are perceived to be aromatic except for cyclooctatetraene (COT). COT contains alternating single and double bonds but it is non-planar and is correctly assigned to be antiaromatic.
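The pi electron counting rules and the Hückel test lend themselves to a compact illustration. The Python sketch below is hypothetical (the names PI_CONTRIBUTION, count_pi_electrons, satisfies_huckel and is_aromatic are not from MTK++), it takes ring planarity as a precomputed flag, and it leaves out the exocyclic pi bond correction and the nonaromatic pre-screen for brevity.

```python
# Pi electrons contributed by each kind of ring atom, following the rules in the text.
PI_CONTRIBUTION = {
    "cationic_carbon": 0,
    "boron": 0,
    "saturated_heteroatom": 2,
    "anionic_carbon": 2,
    "pi_bond_atom": 1,   # atom on an intra-ring pi bond
}


def count_pi_electrons(ring_atoms):
    """ring_atoms: list of keys of PI_CONTRIBUTION, one per ring atom."""
    return sum(PI_CONTRIBUTION[kind] for kind in ring_atoms)


def satisfies_huckel(n_pi):
    """True when n_pi equals 4n + 2 for some n = 0, 1, 2, ..."""
    return n_pi >= 2 and (n_pi - 2) % 4 == 0


def is_aromatic(ring_atoms, planar):
    """Aromatic only if the ring is planar and holds 4n + 2 pi electrons."""
    return planar and satisfies_huckel(count_pi_electrons(ring_atoms))


benzene = ["pi_bond_atom"] * 6
thiophene = ["pi_bond_atom"] * 4 + ["saturated_heteroatom"]
cyclopentadienyl_anion = ["pi_bond_atom"] * 4 + ["anionic_carbon"]
cyclooctatetraene = ["pi_bond_atom"] * 8
```

Benzene, thiophene and the cyclopentadienyl anion each count six pi electrons in this scheme, while cyclooctatetraene counts eight and therefore fails the 4n+2 test regardless of planarity.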

PAGE 90

[Figure 3-15 structure diagrams; panels: (a) Step 1, (b) Step 2, (c) Step 3, (d) Step 4, (e) Step 5, (f) Step 6, (g) Step 7 with nodes numbered 1 to 16]

Figure 3-15. Ring Perception.

PAGE 91

[Figure 3-16 structure diagrams; five panels, each with nodes numbered 1 to 16]

Figure 3-16. Ring Perception Step 8.

PAGE 92

The ring centroid, plane and normal are also calculated for use in pharmacophore matching and molecular alignment, which will be discussed later in this chapter. The centroid is calculated using Eq. 3-7, where $k$ is the size of the ring and $q_i$ are the atomic coordinates. The ring plane and normal are computed by carrying out the principal component analysis of the Gram matrix as previously described in section 3.3. Matrix $D$ is evaluated, Eq. 3-8, and then diagonalized, with the first two eigenvectors defining the ring plane and the third being the ring normal.

c = \frac{1}{k} \sum_{i=1}^{k} q_i    (3-7)

D = \sum_{i=1}^{k} (q_i - c)(q_i - c)^T    (3-8)

3.5 Addition of Hydrogen Atoms to Molecules

The addition of Hydrogen atoms to proteins, DNA or water molecules is carried out using a predefined library. Small molecules with known atom hybridization, ring information and bond orders are dealt with using the following algorithm. First, Hydrogen atoms are added to polar (N, O, and S) atoms, followed by ring systems, then terminal atoms. All other unprotonated atoms are dealt with at the end. The number of Hydrogen atoms to add is defined by the current valence and the ideal full shell value of the atom to which the Hydrogen will be added.

The bond lengths used are defined in Table 3-7. The only distinction of the type of atom to which a Hydrogen is added is the element type; in other words, the bond distance for a Carbon sp3 or sp2 to Hydrogen is the same. The angle at which a Hydrogen atom is added is defined either by the hybridization or type of the connecting atom or by the type of bond between the connecting atom and the 1-3 atom, as shown in Table 3-8. The dihedral angle at which to add a Hydrogen atom is the most complex component of this algorithm. Suppose you want to add a proton, A, onto atom B which is bonded to atom C and is 1-3 bonded

PAGE 93

[Figure 3-17 structure diagrams; panels: (a) Benzene (AR), (b) Anthracene (AR), (c) Cycloheptatriene (AR), (d) Cyclopentadienyl Anion (AR), (e) Pyridine (AR), (f) Thiophene (AR), (g) Pyrimidine (AR), (h) Purine (AR), (i) 2-thioxo-2,3-dihydropyrimidin-4-one (AR), (j) imidazo-pyridine-3-thione (AR), (k) Cyclooctatetraene (AA)]

Figure 3-17. Ring structures which are correctly assigned aromatic (AR), non-aromatic (NA) and anti-aromatic (AA).

to atom D. First a list is compiled of all torsional angles XBCD already occupied, where X is any heavy atom bonded to atom B. The dihedral then used is defined by the hybridization of atom B and built using atoms BCD. If B is sp3 hybridized then a Hydrogen atom is placed 120° from other bonded atoms. Dihedral angles of 0° and 180° are used when B is sp2, and 180° for sp hybridized. Aromatic rings are a special case of sp2 hybridized atoms where only a torsion of 180° is allowed. The dihedral values of polar Hydrogens are optimized to maximize intra-molecular Hydrogen bonding using Eq. 3-9, where $\theta_{D-H-A}$ is the angle between the donor, Hydrogen and acceptor atoms and $r_{H-A}$ is the Hydrogen-acceptor distance. If $\theta_{D-A-AA}$ or $\theta_{D-H-A}$ is greater than 90°, or $r_{D-A}$ is less

PAGE 94

than 3.5 Å then no Hydrogen bond is considered.

E_{HB} = \cos^2(\theta_{D-H-A}) \, e^{-(r_{H-A} - 2.0)^2}    (3-9)

[Figure 3-18 diagram; atom labels: D, H, A, AA]

Figure 3-18. Hydrogen Bond.

Table 3-7. Hydrogen Bond Lengths.

atom    | Bond Length (Å)
C       | 1.09
N       | 1.008
O       | 0.95
S       | 1.008
Se      | 1.10
Default | 1.05

Table 3-8. Hydrogen Bond Angles.

atom/Bond     | Angle (Degrees)
sp/triple     | 180.0
sp2/double    | 120.0
sp3/single    | 109.47
Aromatic Ring | (360 - ((ringSize - 2) * 180) / ringSize) / 2
Default       | 109.47

3.6 Conformational Sampling

Conformational searching of drug-like molecules in MTK++ is carried out using a systematic approach. GAFF [160] atom types are assigned to the atoms in a particular molecule using ANTECHAMBER, and CM2 charges are generated using the DivCon program. The atomic hybridizations and bond orders defined in the hybridize class are used to mark which single or double bonds are rotatable. If either of the atoms in a bond is described as terminal then the bond is removed from the list of rotatable bonds. If both of the atoms are members of a ring then the bond is also removed, thus removing

PAGE 95

Table 3-9. Hydrogen Bond Dihedrals.

atom          | Dihedral (Degrees)
sp            | 180.0
sp2           | 0.0, 180.0
sp3           | 120.0
Aromatic Ring | 180.0

ring flexibility. The incorporation of ring flexibility is planned in later releases of the MTK++ code. Then for each rotatable bond that remains, a torsion is sought. The total number of molecular conformations, $N_{conformers}$, is then defined by Eq. 3-10, where $i$ is a rotatable bond index, $R$ is the range of the associated torsion (0-360°), and $\Delta$ is the rotation increment (120° for sp3-sp3). The increments currently used are tabulated in Table 3-10. Once the number and location, as shown in Fig. 3-19, of each rotatable bond is determined, a graph is formed as described in Fig. 3-20 [169]. Each rotatable bond is defined as a layer, with each unique torsional value contained in a vertex upon this layer. Graph edges are then defined between each vertex of one layer and every vertex one layer below. Once formed, the graph is traversed and the AMBER energy, $E_{MM}$, is calculated for each conformer. The lowest energy conformers are stored, based on some user provided criteria, for later use.

N_{conformers} = \prod_{i=1}^{n} \frac{R_i}{\Delta_i}    (3-10)

Table 3-10. Dihedral Angles Available based on Bond Type.

Bond Type | Angles
sp3-sp3   | 60, 180, 300
sp2-sp2   | 0, 30, 150, 180, 210, 330
sp2-sp3   | 0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330

In Fig. 3-21(a) we have an organic molecule which binds to the Peroxisome Proliferator-Activated Receptor γ. This is a functional group rich molecule containing phenyl rings, a carboxylate, a heterocycle (2,4,5-oxazole), a ketone, an amine, and an ether moiety. This structure has 12 rotatable bonds as shown in Fig. 3-21(b). Using

PAGE 96

[Figure 3-19 structure diagram with sp3 and sp2 atom labels]

Figure 3-19. Rotatable Bond Types.

[Figure 3-20 layered graph; layers shown: Torsion 1 (sp3-sp3), Torsion 2 (sp3-sp2), Torsion 3 (sp2-sp2)]

Figure 3-20. Systematic Conformational Searching. Torsion 1 forms the first layer, containing three values or vertices, followed by a layer containing 12 vertices and finally the third layer with six vertices. This graph would result in 216 conformers being formed.

the torsion resolution definitions in Table 3-10 would lead to 4,353,564,672 conformers! Even on modern computer hardware this number is too large. Taking a closer look at this structure, Fig. 3-21(c), the symmetry of the functional groups becomes apparent. For example, the carboxylate group, shown in green, is C2 symmetric since the negative charge is not solely placed on one of the oxygen atoms, and the phenyl group shown in yellow is also symmetric. Removing these symmetric torsions from the total number results in 544,195,584 conformers. Chemical knowledge of the torsional profile between the phenyl and oxazole groups enclosed in the red oval suggests even fewer available torsions due to conjugation, thus reducing the total number of discrete conformers to 181,398,528. MTK++ attempts to reduce this number even further by recognizing

PAGE 97

[Figure 3-21 structure diagrams; panels: (a) PPARγ Agonist, (b) Number of Rotatable Bonds, (c) Symmetric Regions, (d) Reduced number of Rotatable Bonds]

Figure 3-21. Conformer Generation.

"privileged fragments" with known torsion profiles. The tyrosine-like group enclosed in the blue oval is one such fragment and is stored in the "cores" library of the package. Removing these rotatable bonds results in 419,904 conformers. Fig. 3-21(d) highlights the rotatable bonds: green bonds are unrestricted, blue bonds are restricted by symmetry, and red bonds are frozen.

The systematic approach works extremely well for molecules with approximately 12 rotatable bonds or fewer. When the number of potential conformers exceeds two million, the searching algorithm reverts to using the GA library of MTK++. During a GA search of conformational space the user is required to provide the maximum number of MM calculations which are allowed. Other searching tools such as MD and MD-LES are recommended for large peptidic molecules that bind certain proteins such as HIV Protease and Endothiapepsin.
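The conformer count of Eq. 3-10 and the layered enumeration of Fig. 3-20 can be reproduced with a short Python sketch. It is illustrative only: the names TORSION_VALUES, count_conformers, enumerate_conformers and lowest_energy are hypothetical, itertools.product stands in for the depth-first traversal of the layered graph (both visit every combination of one torsion value per layer), and toy_energy is a made-up scoring function used in place of the AMBER energy.

```python
import math
from itertools import product

# Discrete torsion values per bond type, copied from Table 3-10.
TORSION_VALUES = {
    "sp3-sp3": (60, 180, 300),
    "sp2-sp2": (0, 30, 150, 180, 210, 330),
    "sp2-sp3": tuple(range(0, 360, 30)),
}


def count_conformers(bond_types):
    """Product over rotatable bonds of the number of allowed torsion values."""
    return math.prod(len(TORSION_VALUES[t]) for t in bond_types)


def enumerate_conformers(bond_types):
    """Yield one tuple of torsion angles per conformer (one value per layer)."""
    return product(*(TORSION_VALUES[t] for t in bond_types))


def lowest_energy(bond_types, energy, keep=5):
    """Score every conformer with `energy` and keep the `keep` lowest."""
    return sorted(enumerate_conformers(bond_types), key=energy)[:keep]


def toy_energy(torsions):
    # Hypothetical stand-in for E_MM: a threefold cosine term per torsion.
    return sum(1.0 + math.cos(math.radians(3 * angle)) for angle in torsions)


# The three layers of Fig. 3-20.
layers = ["sp3-sp3", "sp2-sp3", "sp2-sp2"]
n_conformers = count_conformers(layers)
best = lowest_energy(layers, toy_energy, keep=5)
```

Because the count grows as a product over rotatable bonds, each frozen or symmetry-restricted torsion divides the total by the number of values it would otherwise have contributed, which is why the reductions described in the text are so effective.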

PAGE 98

3.7 Substructure Searching/Functionalize

To functionalize a molecule involves searching it for chemical substructures. Substructure searching is known as the subgraph isomorphism problem of graph theory and belongs to the class of NP-complete computational problems. Due to the NP-complete nature of substructure searching, a screen is usually carried out to eliminate subgraphs that cannot be contained in the molecule. The fingerprint class in the molecule library carries out this screening process between fragments and a molecule.

The brute-force algorithm for subgraph isomorphism begins by generating the adjacency matrices A and B for the fragment and the molecule, containing $P_A$ and $P_B$ atoms respectively. Then an exhaustive search involves generating $P_B! / [P_A! (P_B - P_A)!]$ combinations of $P_A$ atoms and determining whether any combinations are matches to a portion of the molecule. The algorithm used is in close agreement with that published by Ullmann [170], and Willett, Wilson, and Reddaway [171]. Ullmann first noticed that using a depth-first backtracking search dramatically increases efficiency, while Willett used a labeled graph and a non-binary connection table to increase algorithm speed.

The functionality of finding substructures in molecules was developed to carry out functional group alignment of drug-like molecules and to optimize fragment positions in drug-protein complexes during the lead optimization stage of the drug design process.

The algorithmic details of the functionalize code are as follows (this example was adapted from Molecular Modelling, Principles and Applications 2nd Edition by Andrew R. Leach [81]). Take for example the fragment and molecule shown in Figs. 3-22(a) and 3-22(b). The corresponding adjacency matrices are shown in Eqs. 3-11 and 3-12. The Ullmann algorithm tries to find the match between the fragment and the molecule (Fig. 3-22(c)). Mathematically this is represented as the matrix $A$, Eq. 3-13, which satisfies $A(AM)^T = F$, as shown in Eq. 3-14.

PAGE 99

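[Figure 3-22 structure diagrams; panels: (a) Fragment Structure (atoms 1 to 4), (b) Molecule Structure (atoms 1 to 6), (c) Alignment]

Figure 3-22. Ullmann Subgraph Isomorphism Illustration.

F = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}    (3-11)

M = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}    (3-12)

A = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}    (3-13)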
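The worked example can be checked numerically with a small Python sketch. It uses plain nested lists, takes atom 4 of the molecule to be bonded to atoms 2, 5 and 6 (the connectivity that the product $A(AM)^T = F$ requires), and finds matches by simple depth-first backtracking without the Ullmann refinement step. The helper names matmul, transpose, is_match and find_matches are hypothetical, and indices are zero-based, so the mapping of fragment atoms 1, 2, 3, 4 onto molecule atoms 3, 2, 4, 6 appears as (2, 1, 3, 5).

```python
# Fragment and molecule adjacency matrices of the worked example (zero-based indices).
F = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
M = [[0, 1, 0, 0, 0, 0],
     [1, 0, 1, 1, 0, 0],
     [0, 1, 0, 0, 0, 0],
     [0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0, 0]]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(row) for row in zip(*X)]


def is_match(F, M, mapping):
    """Build the match matrix A for `mapping` and test A (A M)^T == F."""
    A = [[1 if mapping[i] == j else 0 for j in range(len(M))] for i in range(len(F))]
    return matmul(A, transpose(matmul(A, M))) == F


def find_matches(F, M):
    """Depth-first backtracking over fragment atoms; returns all valid mappings."""
    matches = []

    def extend(mapping):
        i = len(mapping)
        if i == len(F):
            matches.append(tuple(mapping))
            return
        for j in range(len(M)):
            if j in mapping:
                continue
            # bonds to already-mapped fragment atoms must agree with the molecule
            if all(F[i][k] == M[j][mapping[k]] for k in range(i)):
                extend(mapping + [j])

    extend([])
    return matches


matches = find_matches(F, M)
```

Rejecting a partial mapping as soon as one bond disagrees is what makes backtracking so much cheaper than testing every complete combination of atoms.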

PAGE 100

A(AM)^T = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} = F    (3-14)

This depth-first backtracking algorithm uses a General match matrix, $M$, that contains all the possible equivalences between atoms from A and B. The elements of this matrix, $m_{ij}$ ($1 \le i \le P_A$; $1 \le j \le P_B$), are such that:

m_{ij} = \begin{cases} 1 & \text{if the } i\text{th atom of } A \text{ can be mapped to the } j\text{th atom of } B \\ 0 & \text{otherwise} \end{cases}    (3-15)

The Ullmann heuristic states that "if a fragment atom $a_i$ has a neighbor $x$, and a molecule atom $b_j$ can be mapped to $a_i$, then there must exist a neighbor of $b_j$, $y$, that can be mapped to $x$" and is mathematically written in Eq. 3-16.

m_{ij} = \forall x (1 \ldots P_A) [ (a_{ix} = 1) \Rightarrow \exists y (1 \ldots P_B) (m_{xy} b_{jy} = 1) ]    (3-16)

If at any stage during the search there is an atom $i$ in $A$ such that $m_{ij} = 0$ for all atoms in $B$, then a mismatch is identified as defined in Eq. 3-17 and the match is discarded.

mismatch = \exists i (1 \ldots P_A) [ m_{ij} = 0 \; \forall j (1 \ldots P_B) ]    (3-17)

The complete algorithm to perceive the functional groups in a molecule is described using pseudocode in Algorithm 3.1. The algorithm begins by reading user created fragment libraries and molecules which are to be studied. Fingerprints of each fragment and molecule are then created. For each molecule under consideration rings, atom hybridizations, bond orders and formal charges are assigned using the algorithm previously

PAGE 101

described. If Hydrogens are not present on the molecule then they are added using the algorithm described in Section 3.5.

Algorithm 3.1: Functionalize Algorithm.
Data: Fragment Libraries and MDL Files
Result: Functional group assignment
begin
    Read Fragment Libraries;
    Read in molecules to functionalize;
    Generate fingerprints for all fragments;
    for i in nMolecules do
        Determine Rings;
        Perceive Hybridizations, Bond Order, and Formal Charge;
        Add Hydrogens;
        Generate fingerprint;
        for j in nFragments do
            bool bMatch1 = Compare molecule to simple fingerprint (Screening);
            if bMatch1 then
                bool bMatch2 = Match fragment to molecule using the Ullmann and Willett algorithm of subgraph isomorphism;
                if bMatch2 then
                    assign fragment code to molecule;
                end
            end
        end
    end
end

Then for each fragment stored in memory its fingerprint is compared to that of the molecule. If there is a fingerprint match then the Ullmann/Willett algorithm is invoked. The subgraph isomorphism algorithm is outlined in Appendix A. If the subgraph isomorphism algorithm results in a match then the fragment code is assigned to the molecule.

3.8 Clique Detection/Maximum Common Pharmacophore

As outlined in Ch. 2.12, a 3D molecular clique is defined as a group of pharmacophore points and the geometric distances between all points in that group. Fig. 3-23 illustrates the clique detection algorithm implemented in MTK++ [172]. Take for example two estrogen receptor ligands (PDB ID: 1ERR and 3ERT) and finding the pharmacophore

PAGE 102

points such as Hydrogen bond acceptor/donor, positive/negative charge centers, hydrophobes, rings, and ring normals, as shown in Fig. 3-23(b); certain molecular features can be mapped to one another, as shown in Fig. 3-23(c). The mapping, Eq. 3-18, results in a valid clique because the interpoint distances, Eq. 3-19, are compatible within some tolerance. However, adding the mapping $M = [F_1 \leftrightarrow F_2]$ does not result in a valid clique as $d_{1CF} \neq d_{2CF}$. The clique detection algorithm thus requires a method of pruning a potentially large set of mappings, which is carried out by allowing each to be a seed and growing cliques using heuristic criteria. Obviously certain seed mappings will lead to equivalent cliques; however, a diverse set is often found. Cliques are then scored or ranked to determine the best overall matching using Eq. 3-20, where $D$ are the inter-point distances, $d_{1i}$ is a distance between two features of molecule 1 and $d_{2i}$ is the distance between the equivalent features in molecule 2. The function for the two mappings $A_1 \leftrightarrow A_2$ and $B_1 \leftrightarrow B_2$ reaches a value of 1.0 when $d_{1AB} = d_{2AB}$. The parameter $d_{max}$ controls how rapidly the match score drops off as the distances become less compatible.

P = [A_1 \leftrightarrow A_2, B_1 \leftrightarrow B_2, C_1 \leftrightarrow C_2, D_1 \leftrightarrow D_2, E_1 \leftrightarrow E_2]    (3-18)

D = d_{1AB} - d_{2AB}, \; d_{1BC} - d_{2BC}, \; d_{1CD} - d_{2CD}, \; d_{1DE} - d_{2DE}, \ldots    (3-19)

Score = \sum_{i}^{D} \exp\left[ -\left( \frac{d_{1i} - d_{2i}}{d_{max}} \right)^2 \right]    (3-20)

3.9 Superimposition

Molecular superimposition is carried out using a rigid body least squares procedure from Kearsley [173] and Kabsch [174, 175]. The rotation matrix to minimize the sum of the squared distances between atoms of two molecules, Eq. 3-21, is solved using quaternions and eigen methods as described by Kearsley in 1989.

F = \sum_i |x_i - x'_i|^2    (3-21)
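The clique score of Eq. 3-20 is simple enough to evaluate directly, and a small Python sketch makes its behavior concrete. The function name clique_score, the default d_max of 1.0 and the example coordinates are all assumptions made for illustration; mapped features are paired by position, so point k of the first list is taken to be mapped onto point k of the second.

```python
import math
from itertools import combinations


def clique_score(points1, points2, d_max=1.0):
    """Sum over all feature pairs of exp(-((d1 - d2) / d_max)**2), as in Eq. 3-20.

    points1[k] and points2[k] are the 3D positions of the k-th mapped feature
    in molecule 1 and molecule 2 respectively.
    """
    score = 0.0
    for a, b in combinations(range(len(points1)), 2):
        d1 = math.dist(points1[a], points1[b])   # inter-point distance in molecule 1
        d2 = math.dist(points2[a], points2[b])   # equivalent distance in molecule 2
        score += math.exp(-((d1 - d2) / d_max) ** 2)
    return score


# Three mapped features; the second molecule is the first translated rigidly.
mol1 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
mol2 = [(1.0, 1.0, 1.0), (4.0, 1.0, 1.0), (1.0, 5.0, 1.0)]
perfect = clique_score(mol1, mol2)

# Moving one feature makes two of the three distances incompatible.
mol2_distorted = [(1.0, 1.0, 1.0), (4.0, 1.0, 1.0), (1.0, 8.0, 1.0)]
distorted = clique_score(mol1, mol2_distorted)
```

Because only inter-point distances enter the score, it is invariant to rigid translation and rotation of either molecule, so cliques can be ranked before any superimposition is carried out.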

PAGE 103

[Figure 3-23 structure diagrams; panels: (a) Estrogen Ligands (PDB: 1ERR and 3ERT), (b) Chemical Features Highlighted, (c) Chemical Feature Mappings, with features labeled A1 to F1 and A2 to F2]

Figure 3-23. Clique Detection Illustration.

PAGE 104

A requirement of this procedure is that atom $i$ in molecule $A$ corresponds to atom $i$ in molecule $B$. For example, measuring the rmsd between two benzoic acid conformers, as shown in Figs. 3-24(a) and 3-24(b), would require a certain correspondence to remove artificial differences attributed to automorphisms or self-symmetry. This is carried out by generating all matchings of non-Hydrogen atoms by type or element kind and assigning the lowest rmsd as the true value [176].

[Figure 3-24 structure diagrams; panels: (a) Conformer 1, (b) Conformer 2, each with heavy atoms numbered 1 to 9]

Figure 3-24. Illustration of the requirement of atom correspondence for molecular superposition.

3.10 Conclusions

This chapter has outlined the design and development of a C++ package called MTK++. This package contains functions to handle molecular structures ranging from proteins to small molecules, which may be utilized to calculate molecular mechanics energies and gradients, to perceive atom hybridizations, and to evaluate bond orders, formal charges, rings and functional groups. Utilities to add Hydrogen atoms to structures have been developed; this code was created to deal with metalloprotein systems where no other software could satisfactorily do so. MTK++ also has the capability to perform conformational searching of drug molecules using a systematic approach, for which the molecular mechanics code was a prerequisite. There is an obvious limitation to this approach: as the number of rotatable bonds increases, the method becomes intractable. Tricks to improve this, such as tree pruning and creating a torsional type library, could be implemented. Also an algorithm to perform clique detection of molecular

PAGE 105

features was implemented to superimpose two molecular species onto one another for use in ligand and receptor based drug design. Additionally, the MTK++ package contains other general purpose libraries for parameter optimization, graph utilities, and statistical methods.

MTK++ was developed with algorithms from the literature, but it is a firm foundation on which to develop new and novel tools for drug design and metalloprotein modeling. The remaining chapters of this thesis would not have been possible without this software. Chapter 4 utilizes MTK++ conformational searching methods to flexibly align over 80 small molecules, which bind various receptors, onto one another. In Chapter 5, MTK++ was used to efficiently model metalloproteins, in particular Zn proteins. Force fields for tetrahedral Zn proteins were generated using MTK++, where previously such work would have been time consuming and prone to error if attempted by hand. These chapters further foster the growth of a bridge between the development and application of code applicable to biological problems.

PAGE 106

CHAPTER 4
SEMIFLEXIBLE QUANTUM MECHANICAL ALIGNMENT OF DRUG-LIKE MOLECULES

4.1 Introduction

The placement of drug molecules into the active sites of receptors remains a challenging problem in the drug design field [177]. Docking [4, 178] is the method of choice when there is a 3D structure of the receptor, while template forcing or superimposition of a structure onto a known active molecule is used when no such structure is available [179, 180]. Over 20 years have been spent developing the tools necessary to align molecules on top of one another. Most of these methods were conceived for use in ligand-based drug design (LBDD) where no receptor structure is available, such as targets that are membrane proteins. Table 4-1 summarizes all alignment approaches from 1986 to the present, where methods can be distinguished by treatment of conformational flexibility, optimization or superposition algorithm used, and the similarity metric between the two structures [181]. There are three types of flexibility encountered in these methods: the first is rigid body alignment, the second is described as semiflexible, and finally flexible alignment. Semiflexible alignment describes a technique of performing a conformational search of a ligand independent of the alignment algorithm, while fully flexible alignment tools perform both these tasks at the same time.

The SE-COMBINE approach introduced in Ch. 2.11 decomposed the interaction energies of a series of inhibitors that bind trypsin [9]. That implementation of SE-COMBINE contained a number of deficiencies, including the neglect of solvent and dispersive effects and of ligand conformational flexibility; from a modeling standpoint it also required the manual placement of inhibitors into the active site, and the structures were fragmented by hand. This chapter describes the Conformationally unlimited Template forced Interaction energy biased Pharmacophore (CuTieP) program, which was developed using MTK++ to flexibly align ligand structures into receptor active sites using a clique detection algorithm to produce trial alignments which were ranked using a semiempirical

PAGE 107

(SE) score function. This hypothesis goes against the norm of Docking and molecular alignment. Here we propose using a LBDD method to generate poses for receptor based drug design (RBDD) scoring and 3D receptor QSAR. However, since the SE-COMBINE method currently can only be considered applicable for modeling a series of congeneric compounds and a receptor, this assumption can be considered valid.

Table 4-1. Compound Alignment Literature. This table was adapted from Melani et al. [182] where the alignment methods from 2003 to the present were added.

Program Name | Similarity Criteria | Optimization/Superposition | Mode
Sheridan [183] | distance geometry of pharmacophore | | flexible
SEAL [184] | electrostatic and steric | RFO | rigid
ASP [185] | MEP (Hodgkin function) | simplex | rigid
DISCO [186] | pharmacophore points | combinatorial search | semiflexible
MSC [187] | physicochemical properties | BFGS | semiflexible
TORSEAL [188] | | | flexible
COMPASS [189] | surface description | neural nets | semiflexible
GASP [190] | intermolecular Energy | matching GA | flexible
AAA [191] | distance map of pharmacophore points | combinatorial search | flexible
TFIT [192] | inter/intramolecular Energy | MC and line search | flexible

Continued on next page.

PAGE 108

Table4-1.(continued) ProgramSimilarityOptimization/NameCriteriaSuperpositionMode PLM[ 193 ]surfaceoverlapvolumeSAsemirexible Petitjean[ 194 ]electronicpropertiesgradientmethodrigid Grant[ 195 ]vdWVolumeanalyticrigid (GFs)derivatives FLEXS[ 196 ]interactioncombinatorialrexible elds(GFs)search McMahon[ 197 ]electrostaticgradientrigid potential(GFs)method Cosse-Barbi[ 198 ]patternin3DSpacestepwiseapproachrigid MIMIC[ 199 ]stericandelectrostaticSDorNRsemirexible elds(GFs) QUASIMODI[ 200 ]electrondensitysimplexrigid withGFs Parretti[ 201 ]stericandelectrostaticMCrigid elds(GFs) Cocchi[ 202 ]MEP,sizeandsimplexrigid andshapedescriptors DeRosa[ 203 ]Euclideandistanceinrigid Hi-PCAspace Handschuh[ 204 ]geometrictGAandrexible Quasi-Newton 3DFS[ 159 205 ]pharmacophoreGMA,Powellrexible Continuedonnextpage. 108

PAGE 109

Table4-1.(continued) ProgramSimilarityOptimization/NameCriteriaSuperpositionMode points RigFit[ 206 ]pharmacophoreQuasi-Newtonsemirexible points(GFs) SQ[ 207 ]SQtypesimplexsemirexible Klebe[ 208 ]pharmacophoreQuasi-Newtonsemirexible points(GFs) MIPSIM[ 209 ]MEPgradientsemirexible approach Cosgrove[ 210 ]localsurfaceshapecliquedetectionrigid MutliSEAL[ 211 ] multiplerexible TGSA[ 212 213 ]Topo-Geometricalrexible Labute[ 214 ]atompropertiesmodiedRIPSrexible SLATE[ 215 ]distancematrixforSArexible H-bondingandaromatics FLASHFLOOD[ 216 ]commadescriptorsclustermethodrexible AUTOFITpharmacophorecombinatorialrexible pointssearch QSSA[ 106 ]ASAGAandsimplexsemirexible fFlash[ 217 ]pharmacophorepointsclique-basedrexible FIGO[ 182 ]eldinteractionandsimplexrigid geometricoverlap Continuedonnextpage. 109

PAGE 110

Table4-1.(continued) ProgramSimilarityOptimization/NameCriteriaSuperpositionMode FLUFF[ 218 ]vdWandelectrostaticrexible BRUTUS[ 219 220 ]chargedistributionrigid andvdW,grid-based FLAME[ 221 ]MCPGAandBFGSrexible GMA[ 222 ]MCPgradient-basedrexible torsionspace MEP:MolecularElectrostaticPotential,PCA:PrincipalCo mponentAnalysis, vdW:vanderWaals,GFs:GaussianFunctionswereused,GA:Ge neticAlgorithm, RFO:RationalFunctionOptimization,RIPS:RandomIncreme ntalPulseSearch, BFGS:Broyden-Fletcher-Goldfarb-Shanno,SD:SteepestDe scent, NR:Newton-Raphson,MCP:MaximumCommonPharmacophore,MC :MonteCarlo, ASA:AtomicShellApproximation,SA:SurfaceArea 4.2Implementation TheCuTiePapproachcanbedividedintothreekeyareas.The rstisthegeneration ofasetofconformersforthequerystructurethatistobeali gnedontoastationary moleculecalledthetarget.Theneachconformerisalignedo ntothetargetstructureand nallythesimilaritybetweenthetwomoleculesisdetermin ed. 4.2.1LigandConformationalSearching Conformationaltechniquestoreproducethebioactiveconf ormationofsmall moleculescanbedividedintotwo:deterministicorsystema ticandstochastic.Theformer exhaustivelyenumeratesallconformersbydeningrotatab lebondsanddiscretetorsional anglesasintheMIMUMBAprogram[ 223 224 ].Thelatterexploresconformational 110


space using molecular dynamics, genetic algorithm [225, 226], or Monte Carlo techniques. There are pros and cons for both categories: the systematic approach is certain to sample all of conformational space, but the search space grows exponentially with the number of rotatable bonds, so for large flexible molecules the stochastic approaches are favored. Various commercial packages for conformational searching are available, including SPE, Catalyst, Macromodel, Omega, MOE, and Rubicon, which were recently reviewed by Agrafiotis et al. [227], where SPE and Catalyst were more effective in sampling conformational space. The key requirement of a conformational search is finding the bioactive conformation, which most often does not correspond to the global energy minimum [223, 228-232]. For this study of SE methods in molecular superposition, a systematic approach was therefore chosen in order to ensure that the bioactive conformer was found within some tolerance.

4.2.2 Structural Alignment and Clique Detection

If two structures contain at least three pairs of reference points then they can be aligned onto one another by minimizing the sum of the squared distances between pairs, as described in Ch. 3.9. This rigid body least squares procedure from Kearsley [173] and Kabsch [174, 175] generates a rotation matrix using quaternions and eigen methods. A clique detection algorithm described in Ch. 3.8 is employed to generate a set of correspondences between the two molecules in question, which has previously been shown to be an efficient technique for producing alignments [186, 221, 222]. Each set of reference points, or clique, produces a trial alignment using the Kearsley algorithm. The clique detection algorithm uses the score function shown in Eq. 4-1, where $d_{1i}$ is the distance between two pharmacophore features in molecule 1 and $d_{2i}$ is the distance between the two equivalent features in molecule 2. The parameter $d_{max}$ controls how rapidly the match score drops off as the distances become less compatible. The features used in this clique detection algorithm include hydrogen bond acceptor/donor atoms, positive/negative


charge centers, hydrophobic groups, rings, and ring normals.

$$\mathrm{ClqScore} = \sum_{i}^{D} \exp\left[-\left(\frac{d_{1i} - d_{2i}}{d_{max}}\right)^{2}\right] \tag{4-1}$$

4.2.3 Semiempirical Similarity Score

The trial alignments from the previous step are scored using a semiempirical function implemented in the QMALIGN program from Dixon and Merz [233]. The QMALIGN approach differs from all other quantum similarity and alignment programs: instead of aligning based on the density, $\rho(r)$, as described by Carbo using Eq. 4-2, this program aligns structures based on their wavefunctions, $\psi(r)$. The Carbo approach [106] matches the overall size and shape characteristics of molecules; however, there is no phase information in $\rho(r)$, thus any overlap contributes positively to $Z_{AB}$ even though orbitals may be orthogonal. Alignment based on molecular wavefunctions, or more precisely frontier orbitals, takes into account both phase and orbital information. In this application of QMALIGN only the score function (Eq. 4-3) is used, where $k$ and $k'$ are the mapped Molecular Orbitals (MOs) from each molecule and $\psi_k$ and $\epsilon_k$ are the MOs and energies. The parameter $\epsilon_{max}$ is defined similarly to $d_{max}$ above. The similarity between two molecules, $A$ and $B$, was then calculated as a Carbo index, SEScore, as shown in Eq. 4-4.

$$Z_{AB} = \int \rho_{A}(r)\,\rho_{B}(r)\,dr \tag{4-2}$$

$$Z_{AB} = \sum_{k,k'} \exp\left(-\frac{\left|\epsilon_{Ak} - \epsilon_{Bk'}\right|}{\epsilon_{max}}\right) \int \psi_{Ak}(r)\,\psi_{Bk'}(r)\,dr \tag{4-3}$$

$$\mathrm{SEScore} = \frac{Z_{AB}}{\sqrt{Z_{AA}\,Z_{BB}}} \tag{4-4}$$

The complete CuTieP algorithm is outlined in pseudocode in Al. 4.1, where the user provides two molecules and a fragment library outlined in Appendix C. Pharmacophore points are assigned to each molecule using the substructure searching tool within MTK++ and the fragment library provided. A torsional based conformational search of the flexible molecule using the torsional resolutions outlined in Ch. 3.6 is carried out where the lowest


energy conformers are stored based on some energy cutoff, dConf. Then each of these conformers is aligned onto the template structure using the clique detection algorithm described above. The user defines the maximum number of cliques per conformer as nMaxCLiques, and the total number of trial alignments from all conformer/clique combinations as nTotalMCP. If a target structure is available then a MM interaction energy is calculated between the trial alignments and the reference's receptor structure with nMM stored, thus eliminating all unreasonable structures. This step is followed by determining the SE similarity score of the flexible and template molecules using the AM1 [73] Hamiltonian, with a total of nSE alignments being saved for later use.

4.3 Results and Discussion

The goal of this study was to investigate how well the CuTieP alignment approach reproduces the crystallographic binding geometries of ligands in receptor active sites observed in the Protein Data Bank (PDB) [15].

4.3.1 Data Set

To evaluate the CuTieP alignment approach, 84 crystal structures of protein-ligand complexes were downloaded from the PDB as outlined in Table 4-2. This set contains 12 unique receptors including Carboxypeptidase A, Glycogen Phosphorylase, Immunoglobin, Streptavidin, Dihydrofolate Reductase (DHFR), Thrombin, Trypsin, Estrogen Receptor (ER), Peroxisome Proliferator-Activated Receptor γ (PPARγ), Human Carbonic Anhydrase II (HCAII), Elastase, and Thermolysin. This data set resembles the one used to validate the FLEXS [234] program; however, Concanavalin, Endothiapepsin, HIV-Protease, Fructose Bisphosphatase, and Human Rhinovirus receptors were omitted due to ligand size and flexibility. The ER, PPARγ, and HCAII ligands are used here but were not in the FLEXS data set. The various ligands which bind certain receptors are labeled using the lower case form of the PDB-ID corresponding to the complex structure. The data set was split into two, where the ligands in the first portion were flexibly aligned whereas the remaining ligands were rigidly aligned.
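The successive cutoffs of Section 4.2.3 (the conformer energy cutoff dConf, followed by the nMM and nSE limits) can be pictured as a simple filtering funnel. The following Python sketch is illustrative only and is not the MTK++ implementation: the function names, the sample data, and the assumption that dConf is measured relative to the lowest-energy conformer are hypothetical.

```python
def energy_window(conformers, d_conf):
    """Keep (label, energy) pairs within d_conf kcal/mol of the lowest energy.

    Assumption: the cutoff is relative to the lowest-energy conformer found.
    """
    if not conformers:
        return []
    e_min = min(energy for _, energy in conformers)
    return [(label, energy) for label, energy in conformers
            if energy - e_min <= d_conf]


def keep_best(items, key, n, highest_first=False):
    """Return the n best items ranked by key (lowest first unless stated)."""
    return sorted(items, key=key, reverse=highest_first)[:n]


# Hypothetical conformers: (label, MM energy in kcal/mol).
conformers = [("c1", 82.0), ("c2", 95.5), ("c3", 130.0), ("c4", 88.1)]
stored = energy_window(conformers, d_conf=20.0)

# Hypothetical trial alignments: (label, MM interaction energy, SE score).
trials = [("a1", -12.0, 0.61), ("a2", 5.0, 0.90),
          ("a3", -30.5, 0.42), ("a4", -18.2, 0.77)]

# Stage 1: keep the nMM alignments with the lowest MM interaction energy.
after_mm = keep_best(trials, key=lambda t: t[1], n=3)
# Stage 2: keep the nSE alignments with the highest SE similarity score.
after_se = keep_best(after_mm, key=lambda t: t[2], n=2, highest_first=True)
```

In this toy example the alignment with the highest SE score ("a2") never reaches the SE stage because its MM interaction energy is unfavorable, which is the intended effect of filtering on the receptor before the more expensive semiempirical scoring.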


Algorithm 4.1: Flexible Alignment of Drug-like Molecules
Data: Fragment Libraries, Template and Flexible Files
Result: Flexible Molecule/s Aligned onto Template Molecule
begin
    dConf ← 20-30 kcal/mol;
    nMaxCLiques ← 10;
    nTotalMCP ← 10000;
    nMM ← 1000;
    nSE ← 100;
    d_max ← 0.5;
    Read Fragment Libraries;
    temp ← Read Template Molecule;
    flex ← Read Flexible Molecule;
    findFunctionalGroups(temp);
    findFunctionalGroups(flex);
    assignPharmacophorePoints(temp);
    assignPharmacophorePoints(flex);
    totalConformers ← confSampler(flex);
    nConformers ← confAnalysis(flex, dConf);
    for i in nConformers do
        nCliques_i ← MCP(Template, Conformer_i, nMaxCliques);
        for j in nCliques_i do
            rotMat ← Superimpose(Pharmacophore);
            alignedConformer_ij ← rotMat × Conformer_i;
        end
    end
    store best nTotalMCP alignments;
    for i in nTotalMCP do
        Calculate E^MM_INT,i(receptor, alignedConformer_i);
    end
    store best nMM alignments;
    for i in nMM do
        Calculate SE^qmalign_i(temp, alignedConformer_i);
    end
    finalAlignments ← CuTieP();
end
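The two scores that drive Algorithm 4.1 are compact enough to sketch directly. The Python fragment below is a minimal illustration, not the MTK++ or QMALIGN code: it evaluates the clique match score of Eq. 4-1 for two sets of feature-pair distances and the Carbo-style normalization of Eq. 4-4 for overlap values that are assumed to be already computed. The function names, the example coordinates, and the use of angstroms for d_max are assumptions made for the example.

```python
import math
from itertools import combinations


def pair_distances(points):
    """Distances between every pair of 3D feature points, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def clq_score(dists_1, dists_2, d_max=0.5):
    """Eq. 4-1: one Gaussian-like term per pair of matched feature distances."""
    if len(dists_1) != len(dists_2):
        raise ValueError("both molecules need the same number of distances")
    return sum(math.exp(-((d1 - d2) / d_max) ** 2)
               for d1, d2 in zip(dists_1, dists_2))


def carbo_index(z_ab, z_aa, z_bb):
    """Eq. 4-4: cross overlap normalized by the two self-overlaps."""
    return z_ab / math.sqrt(z_aa * z_bb)


# Hypothetical three-point pharmacophores (coordinates in angstroms).
mol_1 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
mol_2 = [(0.0, 0.0, 0.0), (3.2, 0.0, 0.0), (0.0, 4.0, 0.0)]
score = clq_score(pair_distances(mol_1), pair_distances(mol_2))
```

Each perfectly matched distance contributes exactly 1 to the clique score, so the score is bounded above by the number of distance pairs compared; the index of Eq. 4-4 equals 1 when a molecule is compared with itself.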


Table 4-2. Protein-Ligand Data Set.

Receptor | PDB-ID
Flexible Alignment |
Carboxypeptidase A | 1CBX, 2CTC, 3CPA
Glycogen Phosphorylase | 1GPY, 3GPB, 4GPB, 5GPB
Immunoglobin | 1DBB, 1DBJ, 1DBK, 1DBM, 2DBL
Streptavidin | 1SRE, 1SRF, 1SRG, 1SRH, 1SRI, 1SRJ
Dihydrofolate Reductase | 1DHF, 4DFR
Trypsin | 1PPH, 1TNH, 1TNI, 1TNJ, 1TNK, 1TNL, 3PTB
Estrogen Receptor | 1ERR, 3ERT
Peroxisome Proliferator-Activated Receptor γ | 1FM6, 1FM9
Human Carbonic Anhydrase II | 1A42, 1BN1, 1BN3, 1BN4, 1BNM, 1BNN, 1BNQ, 1BNT, 1BNU, 1BNV, 1BNW, 1CIL, 1CIM, 1CIN, 1CNX, 1EOU, 1G1D, 1G52, 1G53, 1G54, 1I8Z, 1I90, 1I91, 1IF4, 1IF5, 1IF6, 1IF7, 1IF8, 1IF9, 1KWQ, 1KWR, 1OKL, 1OKN, 1OQ5, 1TTM, 1XPZ, 1XQ0, 1YDA, 1ZE8
Rigid Alignment |
Thrombin | 1DWC, 1DWD
Elastase | 1ELA, 1ELB, 1ELC, 1ELD, 1ELE
Thermolysin | 1TLP, 1TMN, 2TMN, 3TMN, 4TLN, 4TMN, 5TMN

Each complex structure was broken into two, a receptor and a ligand, with all co-factors and water molecules removed. The ligand's relative orientation in space was retained. The Labute algorithm [142] within MTK++ was used to perceive atom hybridizations, bond orders, and formal charges of the ligands. Hydrogen atoms were added to both the receptor and ligand structures using the protonate functions also built into MTK++. GAFF [160] atom types and CM2 [123] charges were assigned to the ligands using the antechamber [166] and DivCon [112] programs. The locations of the rotatable bonds and the functional groups of each ligand were determined using the tools within MTK++ as described in Ch. 3. Each functional group within the MTK++ package contains pharmacophoric information required for the clique detection algorithm above to generate trial alignments. The atom and bond types, charges, and feature information are stored in xml format for use with the CuTieP program.


Determining the performance of an alignment approach requires the definition of a reference state. For each receptor, the structures therein were taken in turn and all other complexes were superimposed onto each by minimizing the RMSD between the peptide backbone atoms. Thus for each receptor a set of ligands was aligned into its active site, which describes their reference state or ideal alignment.

Each pair of ligands in all receptor classes was superimposed onto one another, except for HCAII where only 1A42 was used as a target. After each alignment the root mean squared deviation (RMSD) between the query structure and the reference state was calculated. Within the calculation of the RMSD an atom type correspondence between atoms of the aligned query and ideal structures was determined. This procedure prevented artificially high RMSD values due to automorphisms, as described in Ch. 3.9, and results in an N x N matrix of alignments, where N is the number of complexes for each receptor class. These matrices are not symmetric, as the results depend on which ligand is used as the target. The lowest RMSD value from the top ten alignments is reported in the tables and figures below. Also shown in the output tables are the number of conformers sampled, the number stored, and the RMSD from the bioactive conformation. The ClqScore and SEScore values are also plotted as so-called level plots using the R program [235] to graphically view the alignment results.

The overall performance of CuTieP in reproducing binding geometries of ligands in complex with a receptor is shown in Table 4-3. An alignment is considered correct when the RMSD value between the pose and the reference structure is below 1.5 Å [196], an RMSD under 2.5 Å represents the correct orientation and conformer of the query, and RMSD values above 2.5 Å are considered to be misaligned. A total of 219 alignments were carried out and in 48.9% of the cases a satisfactory result was achieved. 64.8% of the alignments were in the correct orientation; however, 35.2% would be regarded as misaligned. Each receptor is outlined in more detail below.
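The RMSD criteria above amount to a small classification rule. The following Python sketch is illustrative rather than part of CuTieP: the function names and sample RMSD values are hypothetical, and an RMSD of exactly 2.5 Å is assumed to count as misaligned because the text does not say how the boundary is treated. The last line recomputes the quoted percentages from the totals in Table 4-3 (107 alignments below 1.5 Å and 142 below 2.5 Å out of 219).

```python
def classify(rmsd):
    """Map an RMSD in angstroms to the three categories used in the text."""
    if rmsd < 1.5:
        return "correct"
    if rmsd < 2.5:
        return "correct orientation"
    return "misaligned"  # assumption: the 2.5 boundary itself is misaligned


def percentages(n_below_1_5, n_below_2_5, n_total):
    """Percentages for cumulative counts; the last value is the remainder."""
    def pct(n):
        return round(100.0 * n / n_total, 1)
    return pct(n_below_1_5), pct(n_below_2_5), pct(n_total - n_below_2_5)


# Hypothetical RMSD values, one per category.
labels = [classify(r) for r in (0.42, 1.52, 3.18)]

# Totals taken from Table 4-3.
summary = percentages(107, 142, 219)
```

Note that the 2.5 Å count is cumulative: it includes the alignments already counted as correct, which is why the "correct orientation" and "misaligned" percentages sum to 100%.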


Table 4-3. Statistics of CuTieP Performance.

Receptor | N < 1.5 Å | N < 2.5 Å | N < 3.5 Å | Total
Carboxypeptidase A | 2 | 3 | 6 | 6
Glycogen Phosphorylase | 7 | 8 | 11 | 12
Immunoglobin | 7 | 13 | 15 | 20
Streptavidin | 25 | 27 | 30 | 30
Dihydrofolate Reductase | 0 | 1 | 1 | 2
Trypsin | 12 | 20 | 28 | 42
Estrogen Receptor | 1 | 1 | 2 | 2
PPARγ | 0 | 0 | 2 | 2
Human Carbonic Anhydrase II | 22 | 33 | 37 | 39
Thrombin | 2 | 2 | 2 | 2
Elastase | 8 | 8 | 11 | 20
Thermolysin | 21 | 26 | 31 | 42
Total | 107 | 142 | 174 | 219
% | 48.9 | 64.8 | 79.5 |

4.3.2 Carboxypeptidase A

Three Carboxypeptidase A ligand complexes were used in this study: L-benzyl-succinate (1cbx, Fig. 4-1(a)), L-phenyl-lactate (2ctc, Fig. 4-1(b)), and glycyl-L-tyrosine (3cpa, Fig. 4-1(c)). The ligand structures of 2ctc and 3cpa when 2CTC and 3CPA are aligned onto 1CBX are shown in Fig. 4-1(d); a key point to note is that the phenyl moieties do not overlap. The carboxylates (closer to the phenyl group) of all three structures are aligned onto one another and form a strong intermolecular interaction with an Arginine residue of the protein, while the tail group binds a Zinc atom.

The conformation analysis and pair alignments are outlined in Table 4-4. In all three cases conformers were generated which resemble the bioactive conformation (0.5 Å RMSD from the bioactive conformation) [223]. The conformational searching of 1cbx is further described in Figures 4-2(a), 4-2(b), and 4-2(c). The plot of MM energy versus RMSD from the bioactive conformation indicates that the bound geometry is indeed not the global energy minimum using GAFF. The Euclidean Distance (ED) is the RMSD in torsional space between conformers and the bioactive structure. The plot of ED versus MM energy shows the conformers are clustered between 10° and 25° ED, while the variation of the


RMSD versus ED is shown in Fig. 4-2(c). The best RMSDs of the top 10 poses between the query structures and their associated reference alignments are also outlined in Table 4-4. When 1cbx is used as the reference or query, poor alignments are generated with the SE scoring function, though when either 2ctc or 3cpa is used RMSDs below 1.5 Å are found. This is a disappointing result: the level plot of the ClqScores for this receptor, shown in Fig. 4-3(a), indicates that trial alignments under 1.0 Å were generated, but the SE score function did not score these the highest, as shown in Fig. 4-3(b). The most probable reason for the discrepancy is that the phenyl group is not the most important pharmacophore feature in the set. Since the SE scoring function is a frontier orbital approach, the greatest similarity between pairs of these molecules would be obtained where the phenyl groups overlap.

Table 4-4. Carboxypeptidase A Ligand Alignments.

Query | 1cbx | 2ctc | 3cpa
Rotatable Bonds | 5 | 3 | 6
Conformers Sampled | 1944 | 108 | 62208
Conformers Stored | 126 | 70 | 6767
Conformers < 1 Å | 11 | 30 | 807
Conformers < 2 Å | 115 | 40 | 3201
Target 1cbx | 0.00 | 2.21 | 2.74
Target 2ctc | 2.38 | 0.00 | 1.01
Target 3cpa | 2.05 | 1.24 | 0.00

4.3.3 Glycogen Phosphorylase

The ligands alpha-D-glucose-6-phosphate (1gpy, Fig. 4-4(a)), alpha-D-glucose-1-phosphate (3gpb, Fig. 4-4(b)), 2-fluoro-2-deoxy-alpha-D-glucose-1-phosphate (4gpb, Fig. 4-4(c)), and alpha-D-glucose-1-methylene-phosphate (5gpb, Fig. 4-4(d)), which bind to Glycogen Phosphorylase, were used in this study. The conformational analysis and SE pair alignment scores are outlined in Table 4-5. Meaningful alignments cannot be generated when 1gpy is used as a target or query, as this ligand binds in a different pocket, as shown in Fig. 4-4(e); however, these alignments were carried out to generate a sense of the


predicted error in this approach. All other pair alignments produced excellent agreement using both the ClqScore and SEScore with the observed structural superimpositions from the PDB, as highlighted in Figures 4-5(a) and 4-5(b).

Figure 4-1. Carboxypeptidase A Ligands: (a) 1cbx; (b) 2ctc; (c) 3cpa; (d) 2CTC and 3CPA aligned onto 1CBX.

4.3.4 Immunoglobin

The ligands progesterone (1dbb, Fig. 4-6(a)), aetiocholanolone (1dbj, Fig. 4-6(b)), 5-beta-androstane-3,17-dione (1dbk, Fig. 4-6(c)), progesterone-11-alpha-ol-hemisuccinate (1dbm, Fig. 4-6(d)), and 5-alpha-pregnane-3-beta-ol-hemisuccinate (2dbl, Fig. 4-6(e)), which bind to Immunoglobin, were used in this study. The ligands 1dbj and 1dbk are cholic acid derivatives while 1dbm, 2dbl, and 1dbb are steroidal molecules. The ideal


Figure 4-2. 1CBX Conformer Analysis: (a) MM Energy (kcal/mol) vs RMSD (Å); (b) MM Energy (kcal/mol) vs Euclidean Distance (Deg); (c) Euclidean Distance (Deg) vs RMSD (Å).

Table 4-5. Glycogen Phosphorylase Ligand Alignments.

Query | 1gpy | 3gpb | 4gpb | 5gpb
Rotatable Bonds | 3 | 3 | 3 | 3
Conformers Sampled | 27 | 27 | 27 | 27
Conformers Stored | 16 | 16 | 19 | 14
Conformers < 1 Å | 10 | 12 | 14 | 10
Conformers < 2 Å | 6 | 4 | 5 | 4
Target 1gpy | 0.00 | 3.66 | 3.19 | 3.92
Target 3gpb | 1.43 | 0.00 | 0.38 | 0.84
Target 4gpb | 1.52 | 0.42 | 0.00 | 0.64
Target 5gpb | 3.18 | 0.82 | 0.52 | 0.00


alignments of 1dbj, 1dbk, 1dbm, and 2dbl for the 1dbb target structure are shown in Fig. 4-6(f). No ring flexibility was allowed in this study and so only the bioactive conformations of 1dbj and 1dbk were used, as indicated in Table 4-6. This receptor produced the largest difference between the top poses predicted by ClqScore and SEScore, as shown in Figs. 4-7(a) and 4-7(b) respectively. The alignment of 1dbm and 2dbl onto 1dbj and 1dbk produces large RMSD values using the SEScore even though the clique detection algorithm produces an alignment which visually appears more reasonable. In fact the top poses are aligned perpendicular to the plane of the target structures, which could be put down to the differences in the stereochemistry of the two subsets. The alignments of ligands within the same subset produce poses that are in good agreement with the reference states.

Figure 4-3. Carboxypeptidase A Alignment Results: (a) ClqScore; (b) SEScore. Reference or target structures are on the x-axis with the query structures on the y-axis. The best RMSD value of each pose alignment compared to the ideal alignment is given with an RMSD scale in Å.

4.3.5 Streptavidin

Streptavidin ligands including 2-((4'-hydroxyphenyl)-azo)benzoic acid (1sre, Fig. 4-8(a)), 2-((3'-tert-butyl-4'-hydroxyphenyl)azo)benzoic acid (1srf, Fig. 4-8(b)), 2-((3'-methyl-4'-hydroxyphenyl)azo)benzoic acid (1srg, Fig. 4-8(c)), 2-((3',5'-dimethoxy-4'-hydroxyphenyl)azo)benzoic acid (1srh, Fig. 4-8(d)), 2-((3',5'-dimethyl-4'-hydroxyphenyl)azo)


Figure 4-4. Glycogen Phosphorylase Ligands: (a) 1gpy; (b) 3gpb; (c) 4gpb; (d) 5gpb; (e) 3GPB, 4GPB and 5GPB aligned onto 1GPY.

benzoic acid (1sri, Fig. 4-8(e)), and 2-((4'-hydroxynaphthyl)-azo)benzoic acid (1srj, Fig. 4-8(f)) were used in the validation of CuTieP. All ligands contain the azo-benzoic acid core and vary at the iminophenol group, as shown in Fig. 4-8(g). All ideal alignments were predicted with good agreement using both the ClqScore and the SEScore except for the query 1srf and target 1sri, as shown in Table 4-7 and Figures 4-9(a) and 4-9(b). This high RMSD can be attributed to the fact that there are two valid alignments available in a


LBDD sense; the t-butyl group of 1srf may be placed onto either methyl moiety of 1sri.

Figure 4-5. Glycogen Phosphorylase Alignment Results: (a) ClqScore; (b) SEScore. Reference or target structures are on the x-axis with the query structures on the y-axis. The best RMSD value of each pose alignment compared to the ideal alignment is given with an RMSD scale in Å.

Table 4-6. Immunoglobin Ligand Alignments.

Query | 1dbb | 1dbj | 1dbk | 1dbm | 2dbl
Rotatable Bonds | 1 | 0 | 0 | 6 | 6
Conformers Sampled | 12 | 1 | 1 | 93312 | 93312
Conformers Stored | 9 | 1 | 1 | 5000 | 5000
Conformers < 1 Å | 9 | 1 | 1 | 712 | 1361
Conformers < 2 Å | 0 | 0 | 0 | 4288 | 3628
Target 1dbb | 0.00 | 3.65 | 1.85 | 0.77 | 1.30
Target 1dbj | 3.24 | 0.00 | 0.29 | 6.41 | 4.48
Target 1dbk | 3.20 | 0.29 | 0.00 | 6.28 | 4.45
Target 1dbm | 0.41 | 1.59 | 1.59 | 0.00 | 0.86
Target 2dbl | 0.43 | 1.65 | 1.69 | 1.52 | 0.00

4.3.6 Dihydrofolate Reductase

The ligand molecules dihydrofolate (1dhf, Fig. 4-10(a)) and methotrexate (4dfr, Fig. 4-10(b)), which bind to Dihydrofolate Reductase, were used to validate the CuTieP approach, with the alignment of 4DFR onto 1DHF shown in Fig. 4-10(c). Both ligand structures are quite large and have many rotatable bonds, and thus a large number of


Figure 4-6. Immunoglobin Ligands: (a) 1dbb; (b) 1dbj; (c) 1dbk; (d) 1dbm; (e) 2dbl; (f) 1DBJ, 1DBK, 1DBM and 2DBL aligned onto 1DBB.


conformers were sampled as outlined in Table 4-8. Both the clique detection algorithm and the semiempirical scoring function produced poor results for this receptor compared to the results of the FLEXS program, where an average RMSD of 1.53 Å was predicted.

Figure 4-7. Immunoglobin Alignment Results: (a) ClqScore; (b) SEScore. Reference or target structures are on the x-axis with the query structures on the y-axis. The best RMSD value of each pose alignment compared to the ideal alignment is given with an RMSD scale in Å.

Table 4-7. Streptavidin Ligand Alignments.

Query | 1sre | 1srf | 1srg | 1srh | 1sri | 1srj
Rotatable Bonds | 3 | 4 | 3 | 5 | 3 | 3
Conformers Sampled | 108 | 1296 | 108 | 155552 | 108 | 108
Conformers Stored | 55 | 117 | 55 | 2149 | 55 | 28
Conformers < 1 Å | 36 | 49 | 28 | 661 | 26 | 28
Conformers < 2 Å | 19 | 36 | 27 | 771 | 26 | 0
Target 1sre | 0.00 | 2.50 | 0.27 | 0.85 | 1.02 | 0.44
Target 1srf | 0.66 | 0.00 | 0.48 | 0.57 | 1.01 | 0.50
Target 1srg | 0.27 | 2.59 | 0.00 | 0.52 | 0.76 | 0.39
Target 1srh | 0.46 | 2.75 | 0.36 | 0.00 | 0.72 | 0.86
Target 1sri | 0.90 | 3.01 | 0.75 | 0.77 | 0.00 | 1.08
Target 1srj | 0.46 | 2.30 | 0.40 | 1.07 | 1.11 | 0.00

4.3.7 Trypsin

The following ligands of Trypsin: 4-fluorobenzylamine (1tnh, Fig. 4-11(a)), 4-phenylbutylamine (1tni, Fig. 4-11(b)), 2-phenylethylamine (1tnj, Fig. 4-11(c)),


Figure 4-8. Streptavidin Ligands: (a) 1sre; (b) 1srf; (c) 1srg; (d) 1srh; (e) 1sri; (f) 1srj; (g) 1SRF, 1SRG, 1SRH, 1SRI and 1SRJ aligned onto 1SRE.


Figure 4-9. Streptavidin Alignment Results: (a) ClqScore; (b) SEScore. Reference or target structures are on the x-axis with the query structures on the y-axis. The best RMSD value of each pose alignment compared to the ideal alignment is given with an RMSD scale in Å.

Table 4-8. Dihydrofolate Reductase Ligand Alignments. ClqScores are shown in parentheses.

Query | 1dhf | 4dfr
Rotatable Bonds | 10 | 10
Conformers Sampled | 629856 | 629856
Conformers Stored | 5000 | 5000
Conformers < 1 Å | 7 | 5
Conformers < 2 Å | 796 | 550
Target 1dhf | 0.00 | 3.28 (3.29)
Target 4dfr | 2.43 (2.10) | 0.00

3-phenylpropylamine (1tnk, Fig. 4-11(d)), trans-2-phenylcyclopropylamine (1tnl, Fig. 4-11(e)), and benzamidine (3ptb, Fig. 4-11(f)), and m-amidino-N-alpha-tosylated piperidide (1pph, Fig. 4-11(g)) were used in this study. The primary difference between 1tnh, 1tni, 1tnj, 1tnk, and 1tnl lies in the distance between the primary amine and the phenyl group, while 3ptb is a substructure of the largest molecule in this set, 1pph. There are some notable differences between the best poses predicted by the two scoring functions. When 1tnh is used as the target structure the SEScore predicts more reasonable poses than the clique detection algorithm; the alignment of 1tnh onto 3ptb is also predicted in closer


agreement with the reference state. The alignment of 1pph onto all other targets results in poor predicted alignments due to its size.

Figure 4-10. Dihydrofolate Reductase Ligands: (a) 1dhf; (b) 4dfr; (c) 4DFR aligned onto 1DHF.

4.3.8 Estrogen Receptor

Two Estrogen Receptor (ER) complexes were used, 1ERR and 3ERT, where raloxifene (1err, Fig. 4-13(a)) and 4-hydroxytamoxifen (3ert, Fig. 4-13(b)) bind respectively. The alignment of the 3ERT complex onto the 1ERR structure is shown in Fig. 4-13(c). The ER set was used in this study because it is a key target for breast cancer


Figure 4-11. Trypsin Inhibitors: (a) 1tnh; (b) 1tni; (c) 1tnj; (d) 1tnk; (e) 1tnl; (f) 3ptb; (g) 1pph; (h) 1TNH, 1TNI, 1TNJ, 1TNK, 1TNL, 3PTB aligned onto 1PPH.


Table 4-9. Trypsin Ligand Alignments.

Query | 1pph | 1tnh | 1tni | 1tnj | 1tnk | 1tnl | 3ptb
Rotatable Bonds | 8 | 1 | 4 | 2 | 3 | 1 | 1
Conformers Sampled | 559872 | 12 | 162 | 18 | 54 | 6 | 3
Conformers Stored | 1000 | 12 | 90 | 17 | 36 | 6 | 4
Conformers < 1 Å | 2 | 12 | 34 | 14 | 26 | 6 | 4
Conformers < 2 Å | 62 | 0 | 56 | 3 | 10 | 0 | 0
Target 1pph | 0.00 | 0.80 | 5.43 | 3.93 | 9.05 | 4.06 | 0.23
Target 1tnh | 1.16 | 0.00 | 2.79 | 1.54 | 1.73 | 1.09 | 0.33
Target 1tni | 6.75 | 3.50 | 0.00 | 2.41 | 2.79 | 2.46 | 3.38
Target 1tnj | 6.82 | 2.26 | 2.84 | 0.00 | 1.97 | 0.62 | 0.94
Target 1tnk | 7.14 | 3.39 | 2.90 | 2.64 | 0.00 | 0.30 | 1.56
Target 1tnl | 7.36 | 0.62 | 3.00 | 1.40 | 1.17 | 0.00 | 4.20
Target 3ptb | 3.81 | 0.70 | 2.98 | 2.13 | 1.32 | 4.70 | 0.00

Figure 4-12. Trypsin Alignment Results: (a) ClqScore; (b) SEScore. Reference or target structures are on the x-axis with the query structures on the y-axis. The best RMSD value of each pose alignment compared to the ideal alignment is given with an RMSD scale in Å.


drug discovery [31]. SEScore outperforms ClqScore for this receptor, as shown in Table 4-10. The alignment of 3ert onto 1err is predicted with an RMSD value of 1.12 Å using SEScore, but the alignment of 1err onto 3ert can be considered misaligned.

Figure 4-13. Estrogen Receptor Ligands: (a) 1err; (b) 3ert; (c) 3ERT aligned onto 1ERR.

4.3.9 Peroxisome Proliferator-Activated Receptor γ

The rosiglitazone (1fm6, Fig. 4-14(a)) and GI262570 (1fm9, Fig. 4-14(b)) ligands of Peroxisome Proliferator-Activated Receptor γ (PPARγ) were used in this study, with the ideal structures shown in Fig. 4-14(c). This receptor was included because it has been directly linked to type 2 diabetes, cardiovascular diseases, and obesity [236-240] and so


is a key target for drug research. The ClqScore predicts the correct pose for 1fm6 aligned onto 1fm9, but the opposite alignment gave poor results. SEScore was on average better than the ClqScore, but both alignments could be considered misaligned.

Table 4-10. Estrogen Receptor Ligand Alignments. ClqScores are shown in parentheses.

Query | 1err | 3ert
Rotatable Bonds | 6 | 8
Conformers Sampled | 11664 | 419904
Conformers Stored | 567 | 1015
Conformers < 1 Å | 16132
Conformers < 2 Å | 238 | 881
Target 1err | 0.00 | 1.12 (2.15)
Target 3ert | 2.82 (3.47) | 0.00

Table 4-11. PPARγ Ligand Alignments.

Query | 1fm6 | 1fm9
Rotatable Bonds | 4 | 8
Conformers Sampled | 324 | 419904
Conformers Stored | 2712000
Conformers < 1 Å | 113
Conformers < 2 Å | 10188
Target 1fm6 | 0.00 | 2.69 (5.09)
Target 1fm9 | 2.59 (1.39) | 0.00

4.3.10 Human Carbonic Anhydrase II

The 40 Human Carbonic Anhydrase II (HCAII) ligands used in this study are shown in Table 4-12. HCAII is a Zinc metalloprotein, as described in Ch. 2.14, and all inhibitors bind the Zn ion through the sulfonamide group. The HCA family of proteins has been extensively studied using X-ray crystallography because its members are important targets for drug discovery, with drugs utilized as antiglaucoma, anticonvulsant, antiurolithic, antiepileptic, and anticancer agents [241]. This set was also chosen as it represents a data set which could be used within the SE-COMBINE approach to predict binding free energies and design new drugs. The ideal states for 39 ligands for the 1a42 inhibitor are shown in Fig. 4-15. In total 22 ligands were aligned in a satisfactory manner, 11 were


aligned with the correct orientation, and only 6 ligands can be regarded as misaligned, whereas the ClqScore misaligned 12 structures, as shown in Table 4-13.

Figure 4-14. Peroxisome Proliferator-Activated Receptor γ Agonists: (a) 1fm6; (b) 1fm9; (c) 1FM6 aligned onto 1FM9.


Table 4-12. 40 Human Carbonic Anhydrase II Inhibitors (structure drawings omitted).

PDB ID | Ref
1a42 | [25]
1bn1 | [25]
1bn3 | [25]
1bn4 | [25]
1bnm | [25]
1bnn | [25]
1bnq | [25]
1bnt | [25]
1bnu | [25]
1bnv | [25]
1bnw | [25]
1cil | [245]
1cim | [245]
1cin | [245]
1cnx | [244]
1eou | [247]
1g1d | [249]
1g52 | [249]
1g53 | [249]
1g54 | [249]
1i8z | [242]
1i90 | [242]
1i91 | [242]
1if4 | -
1if5 | -
1if6 | -
1if7 | [243]
1if8 | [243]
1if9 | -
1kwq | [244]
1kwr | [244]
1okl | [244]
1okm | -
1okn | -
1oq5 | [246]
1ttm | [248]
1xpz | [248]
1xq0 | [248]
1yda | [244]
1ze8 | [250]

4.3.11 Thrombin

Two inhibitors of Thrombin, 1dwe (Fig. 4-16(a)) and 1dwd (Fig. 4-16(b)), were used, with 1DWE aligned onto 1DWD shown in Fig. 4-16(c). The ClqScore provides the same results as the SEScore with an average RMSD of 1.12 Å; however, no conformational searching was performed.

4.3.12 Elastase

Five tripeptide ligands of Elastase were used in this study, including 1ela (Fig. 4-17(a)), 1elb (Fig. 4-17(b)), 1elc (Fig. 4-17(c)), 1eld (Fig. 4-17(d)), and 1ele (Fig. 4-17(e)), where 1ela, 1eld, and 1ele occupy different pockets of the serine protease. The


Figure 4-15. HCAII Ligands Aligned onto the 1A42 Structure: (a) 1BN3, 1BNM, 1BNN, 1BNQ, 1BNT, 1BNU, 1BNV, 1I8Z, 1I90, 1I91, 1CIL, 1CIM, and 1CIN; (b) 1BN1, 1BN4, 1BNW, 1IF4, 1IF5, 1IF6, 1KWQ, 1KWR, 1OKL, and 1YDA; (c) 1G1D, 1G52, 1G53, 1G54, 1IF7, 1IF8, 1IF9, 1CNX, 1OKM, 1OKN, and 1ZE8; (d) 1EOU, 1OQ5, 1TTM, 1XPZ, and 1XQ0.


Table4-13.HumanCarbonicAnhydraseIIResults. RotatableConformers BondsSampledStored < 1 : 0 A < 2 : 0 AClqScoreSE 1a42787481bn151555235456832084.231.701bn3386413360730.940.501bn46186624500017948185.873.701bnm451847023093930.890.531bnn317264581882702.651.901bnq62916259801790.100.111bnt317284813311501.090.291bnu3432608521.411.671bnv451847843444402.931.091bnw515552377519429781.632.841cil3108251960.950.901cim1127700.580.581cin2367610.250.251cnx11944784500011941881.411.621eou32710731.291.291g1d520736431142738601.572.041g52520736447042637661.983.391g53520736354232331881.461.571g54520736423860632902.721.571i8z5155528031034383.441.101i9059727935441.091.051i91412966417471.370.791if434327700.270.241if51127700.410.341if634327700.520.391if77186624500010732791.551.451if87186624500022132431.612.041if97933125000235033.972.821kwq34328262.471.221kwr1127701.031.071okl2724131.211.251okm723328500051844821.250.751okn9104976500019646961.912.271oq52144736674.451.771ttm236373161.571.231xpz71866245000926574.143.781xq071866245000320193.792.821yda2723717202.031.631ze8412964802081272.771.69 138


Figure 4-16. Thrombin Inhibitors: (a) 1dwe; (b) 1dwd; (c) 1DWE aligned onto 1DWD.

Table 4-14. Thrombin Ligand Alignments.

Query | 1dwc | 1dwd
Target 1dwc | 0.00 | 1.09
Target 1dwd | 1.15 | 0.00


alignment of 1ELB, 1ELC, 1ELD, and 1ELE onto 1ELA is shown in Fig. 4-17(f). Only the bioactive conformation of each ligand was used in this section of the validation of the CuTieP approach. All pair alignments were evaluated; however, no reasonable superposition of 1elb and 1elc onto 1ela, 1eld, or 1ele can be expected, and the results in Table 4-15 and Figures 4-18(a) and 4-18(b) confirmed this. The program FLEXS was also unable to successfully align these pairs, which its authors attribute to a volume overlap below 60% [234]. All other pairs were successfully aligned using both the ClqScore and SEScore.

Table 4-15. Elastase Ligand Alignments.

Query | 1ela | 1elb | 1elc | 1eld | 1ele
Target 1ela | 0.00 | 3.17 | 7.02 | 0.48 | 0.28
Target 1elb | 3.02 | 0.00 | 1.38 | 3.86 | 3.29
Target 1elc | 5.03 | 0.98 | 0.00 | 5.62 | 4.18
Target 1eld | 0.50 | 3.94 | 5.87 | 0.00 | 0.48
Target 1ele | 0.26 | 3.21 | 5.74 | 0.47 | 0.00

4.3.13 Thermolysin

Pairs of seven inhibitors of Thermolysin from the PDB were aligned onto one another in this study. These included 1tlp (Fig. 4-19(a)), 1tmn (Fig. 4-19(b)), 2tmn (Fig. 4-19(c)), 3tmn (Fig. 4-19(d)), 4tln (Fig. 4-19(e)), 4tmn (Fig. 4-19(f)), and 5tmn (Fig. 4-19(g)), which vary considerably in size and charge. Fig. 4-19(h) shows the ligand structures when 1TMN, 2TMN, 3TMN, 4TLN, 4TMN, 5TLN, and 5TMN are aligned onto 1TLP. The smaller ligands 2tmn, 3tmn, and 4tln were allowed to change conformation, with the rest kept in their bioactive conformation, as outlined in Table 4-16. When the smaller ligands are used as the target structure the best poses generated with both the clique detection algorithm and the semiempirical score function are poor, but this is to be expected.

PAGE 141

Figure 4-17. Elastase Ligands. (a) 1ela; (b) 1elb; (c) 1elc; (d) 1eld; (e) 1ele; (f) 1ELB, 1ELC, 1ELD and 1ELE aligned onto 1ELA. [Chemical structure drawings not reproduced.]

PAGE 142

Figure 4-18. Elastase Alignment Results. (a) ClqScore; (b) SEScore. Reference or target structures are on the x-axis with the query structures on the y-axis. The best RMSD value of each pose alignment compared to the ideal alignment is given with an RMSD scale in Å (0 to 9). [Plots not reproduced.]

Table 4-16. Thermolysin Ligand Alignments.

Query                 1tlp    1tmn    2tmn    3tmn    4tln    4tmn    5tmn
Rotatable Bonds         12      14       5       7       4      16      16
Conformers Sampled       1       1     972  186624     216       1       1
Conformers Stored        1       1      68    5000      26       1       1
Conformers < 1 Å      111225411
Conformers < 2 Å         0       0      55    2533      22       0       0

Target
1tlp                  0.00    0.81    1.35    2.48    3.68    1.98    1.41
1tmn                  0.82    0.00    4.85    3.11    1.72    0.79    1.03
2tmn                  0.93    0.79    0.00    3.03    1.80    0.53    0.51
3tmn                  1.20    0.40    2.67    0.00    4.13    0.76    0.99
4tln                  5.02    6.26    2.89    6.06    0.00    9.43    6.84
4tmn                  1.37    0.50    1.42    6.40    3.17    0.00    0.48
5tmn                  1.07    0.86    1.60    10.1    3.68    0.44    0.00
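The RMSD values tabulated above compare each generated pose with the ideal (crystallographic) alignment. A minimal sketch of such a comparison follows. It assumes both poses list the same atoms in the same order and that no further fitting is applied. The function name `pose_rmsd` is hypothetical and is not part of MTK++ or CuTieP.

```python
import math


def pose_rmsd(pose, reference):
    """Root-mean-square deviation (in the units of the input, e.g. Å)
    between two poses given as equal-length lists of (x, y, z) tuples.

    Assumption: atoms are already matched one-to-one by list position,
    and the poses are compared in place (no superposition step).
    """
    if not pose or len(pose) != len(reference):
        raise ValueError("poses must be non-empty and have the same length")
    squared_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(pose, reference):
        squared_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_sum / len(pose))
```

A pose that coincides with the reference gives 0.0, which is why the diagonal of each alignment table is 0.00.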

PAGE 143

Figure 4-19. Thermolysin Inhibitors. (a) 1TLP; (b) 1tmn; (c) 2tmn; (d) 3tmn; (e) 4tln; (f) 4tmn; (g) 5tmn; (h) 1TMN, 2TMN, 3TMN, 4TLN, 4TMN, 5TLN, 5TMN aligned onto 1TLP. [Chemical structure drawings not reproduced.]

PAGE 144

Figure 4-20. Thermolysin Alignment Results. (a) ClqScore; (b) SEScore. Reference or target structures are on the x-axis with the query structures on the y-axis. The best RMSD value of each pose alignment compared to the ideal alignment is given with an RMSD scale in Å (0 to 9). [Plots not reproduced.]

4.4 Conclusions

This study presents the first large scale validation of a semiflexible alignment approach using a semiempirical scoring function. Over 80 complexes and 219 unique alignments were considered, and the observed ligand binding geometry was predicted with 49% accuracy. Though the percentage of successful alignments using this method is not as high as that of FLEXS (60%), it provides an estimate of how physics-based techniques perform against their empirical counterparts.

Physics-based methods provide a more theoretically satisfying approach to molecular alignment. The only parameters used in the approach are those which appear in the SE Hamiltonian. No training set was used to fit a set of parameters, and so it is predicted that this method would not fail where empirical approaches do because of limited transferability.

Speed is also an important property of molecular alignment algorithms. Empirical methods can often flexibly align molecular structures in times ranging from seconds to minutes. The current implementation of the CuTieP approach takes on the order of minutes to a few hours depending on the dConf, nMaxCLiques, nTotalMCP, and

PAGE 145

nMM parameters used. However, this method was designed for use with the SE-COMBINE approach, where an SE interaction energy evaluation would be much more expensive than the CuTieP alignment, so speed is less important than obtaining the correct pose. Numerous techniques may be employed to increase the efficiency of the CuTieP method, including tree pruning during the conformational search or using the power of large computer clusters or distributed computing, since the method is trivially parallelizable.

PAGE 146

CHAPTER 5
METAL CLUSTER MOLECULAR MECHANICS PARAMETERIZATION

5.1 Introduction

There are currently 52550 structures in the Protein Data Bank (PDB) [15], and searching for "metal" results in over 18,000 hits with the breakdown shown in Table 5-1. Metal ions play a vital role in protein function, structure, and stability, with zinc, copper, and iron playing the biggest roles, as described in Ch. 2.14.

Table 5-1. Metal Ions in the Protein Data Bank (Accessed on April 5th 2007).

Metal   Hits    Metal   Hits    Metal   Hits
Na      2149    V         12    Pd         1
Mg      3467    Cr         7    Ag         9
K        632    Mn       984    Cd       361
Ca      3601    Fe      2022    Ir         6
                Co       340    Pt        62
                Ni       310    Au        28
                Cu       589    Hg       323
                Zn      3427
Total = 18330

It is desirable to model metalloprotein systems using MM models because one can carry out simulations to address important structure/function and dynamics questions that are not currently attainable using QM and QM/MM based methods due to unavailability of parameters or system size.

There are a number of approaches to incorporating metal ions into FFs. The Bonded Model defines bonds, angles, and torsions between the metal ion and its ligands, which are added to the FF along with the van der Waals component of the nonbonded function. Hancock [251] used this approach to study systems including Copper and Nickel. The Bonded plus Electrostatics Model defines bonds and angles between the metal ion and its ligands as well as electrostatic potential (ESP) charges (Fig. 5-1(a)) [252]. This method attempts to define a correct electrostatic representation of the metal active site, because assigning a charge of plus two to a divalent metal ion, though formally correct, would not describe reality. The partial atomic charges can be calculated using the RESP

PAGE 147

approach [253] or the CMX models of Truhlar and Cramer [123]. The bond and angle force constants are derived from experiment or calculated using ab initio or DFT methods, while the torsion term has so far been neglected. The Non-Bonded Model does not define any extra bonds and places an integer charge on the metal ion [254]. Electrostatic and Lennard-Jones terms describe the interactions. Modifications to this model to include polarization and charge transfer effects have been developed (Fig. 5-1(b)) [255]. The Cationic Dummy Atom Model is related to the non-bonded method; it places dummy atoms (cations) around the metal ion to mimic valence electrons [256]. Electrostatic and Lennard-Jones terms between the dummy atoms and ligating residues describe the metal-ion interactions (Fig. 5-1(c)) [257, 258].

Other methods include that of Vedani et al., which is a compromise between the bonded and non-bonded methods and is implemented in the YETI program [259], the SIBFA of Gresh and co-workers [260, 261], and the Universal Force Field (UFF) of Goddard and Rappe and co-workers [262–264]. These methods do not use a pairwise additive potential or are not readily available in typical biomolecular modeling packages.

Figure 5-1. Three Approaches to Incorporate Metal Atoms into Molecular Mechanics Force Fields. (a) Bonded Model; (b) Non-Bonded Model; (c) Cationic Dummy Atom Model. Each panel shows a metal M with ligands R1 to R4. The bonded model defines bonds, angles, and dihedrals between the metal and ligands, while the non-bonded model does not and uses electrostatics and van der Waals terms to model the interactions. The cationic dummy atom model is a derivative of the non-bonded model where cations are placed near the metal center to mimic valence electrons around the metal. [Diagrams not reproduced.]

Carrying out MM modeling or MD simulations of metal containing proteins is a complicated procedure using the bonded plus electrostatics model. Incorporating metals into protein force fields is a convoluted process due to the plethora of QM Hamiltonians,

PAGE 148

basis sets and charge models to choose from. It has also generally been carried out by hand, without extensive validation, for specific metalloproteins. Some of the published force fields for Zinc, Copper, Nickel, Iron, and Platinum containing systems using the bonded plus electrostatics model are listed in Table 5-2. Numerous other FFs containing various metals have been published, including ruthenium(II)-polypyridyl [265], cobalt corrinoids [266–269], Staphylococcal Nuclease [270], alcohol dehydrogenase [271–273], and metalloporphyrins [274–278].

Automated procedures for the parameterization of MM functions for inorganic coordination chemistry have been developed over the last number of years by Norrby and co-workers [279, 280]. Their attempts have focused on generating parameters using experimental structural data from databases such as the Cambridge Crystallographic Structural Database (CCSD) and quantum mechanical reference data using a version of the MM3 force field [71].

Table 5-2. Published Metalloprotein Force Fields Using the Bonded Plus Electrostatics Model.

Metal           Protein                         References
Zinc            Human Carbonic Anhydrase II     [252, 281, 282]
                Beta-lactamase                  [283–290]
                Dinuclear Beta-lactamase        [291, 292]
                Farnesyl Transferase            [293]
Copper          Blue Copper Proteins            [145–150]
Nickel          Urea Amidohydrolase             [151–154]
Iron            Cytochrome P450                 [294, 295]
Platinum        DNA/Cisplatin                   [296]
Copper, Zinc    Superoxide Dismutase            [158]

5.2 Implementation

The goal of this research was to provide a platform to rapidly build, prototype, and validate MM models of metalloproteins using the bonded plus electrostatics model for the AMBER suite of programs [16]. The bonded plus electrostatics model was chosen over the other approaches because the resulting parameters can readily be added to FFs such as those in AMBER [58] and CHARMM [61]. Also the functions used

PAGE 149

in these programs are pairwise additive, meaning there are no cross-terms, and are thus easier to parameterize and less computationally expensive. The latter is a key point, considering that fully solvated metalloproteins in MD simulations can have many hundreds of thousands of atoms. A computer program, MCPB (Metal Center Parameter Builder), was developed to generate FF parameters for metalloproteins to this end. MCPB was not built to supersede the approaches developed by Norrby described above, but instead to incorporate a realistic bonded and electrostatic model of the metal center into the AMBER FF. The nature of these parameters was investigated in a systematic manner with the objective of creating a generalized metal FF within the bonded plus electrostatics framework. The MCPB program was built using the MTK++ Application Program Interface (API) as described in chapter 3. A complete workflow of MCPB can be seen in Fig. 5-2.

The MCPB program carries out the following steps after a structure is downloaded from the PDB. First the program checks whether the structure contains a transition metal. If the structure does not contain a metal then the program terminates. Otherwise MCPB attempts to determine the primary and secondary ligands of the metal using rules described by Harding [297–302], which will be described in more detail later in this chapter. Once a metal site is found, MCPB creates model structures of the metal's first coordination sphere on which ab initio calculations can be performed to generate AMBER-like FF parameters. These models include one to generate charges, q_i, and another to determine bond, K_r, and angle, K_θ, force constants. The AMBER function includes bond, angle, torsion, improper, van der Waals and electrostatic terms as described in chapter 2.3; however, only bond, angle and electrostatic terms are parameterized, under the assumption made by Loops et al. that dihedral terms can be ignored. Lennard-Jones parameters are also not parameterized here because most metals are buried and van der Waals interactions are not as important as the electrostatics [280]. Lennard-Jones parameters for the most common metal ions in biology were taken from the literature [303–310]. The methods of incorporating the bond, angle and charge parameters

PAGE 150

Figure 5-2. MCPB Flow Diagram where a biomolecular structure is downloaded from the PDB and tested whether it contains a transition metal. If the structure contains a metal ion the MCPB program is used to build and test molecular mechanics force field parameters. [Flow diagram not reproduced; its boxes read, in order: PDB, Models Setup, QM Calculations, Get q_i, K_r, and K_θ, Test FF, End, with "OK?" decision points between steps and a "No Metal Found" exit.]

are outlined below. Once a FF is produced it is tested using minimization techniques to observe its stability. Further checks, such as comparing the vibrational frequencies from the ab initio calculation with those from the resulting FF, could also be used [311].

5.2.1 Equilibrium Bond Lengths and Angles

Equilibrium values for bonds, r_eq, and angles, θ_eq, can be determined through ab initio calculations or taken directly from the crystal structure in the PDB. There are pros and cons to using values from either method. Ab initio calculations are generally carried out in the gas phase; solvent effects can be incorporated with PCM, but at an added cost. Crystal structures may contain spurious values and may not be representative of all structures with this bond type. Both techniques of determining the equilibrium bond and angle values will be investigated later in this chapter.

5.2.2 Force Constants

Force constants, K_r and K_θ, are calculated by first creating a model (model 1) of the metal site, adding Hydrogen atoms using the methods described in Ch. 3.5 and then optimizing it in the gas phase. The residues bound to the metal are approximated, for example, cysteine by a thiolate or histidine by a methylimidazole, to reduce the computational cost of the minimization. However, all bonds and angles missing from the

PAGE 151

FF were accounted for. Once a minimum is found the second derivatives are determined. The Cartesian Hessian matrix, which is the second derivative of the energy with respect to the coordinates, is shown in Eq. 5–1. The eigen-analysis of k provides the force constants, \lambda_i, and the normal modes, \hat{\nu}_i, as shown in Eq. 5–2. The interatomic force constant, K_AB, between atoms A and B, which is required for a MM function, determines the force on atom A produced by displacing atom B, as shown in Eq. 5–3.

    [k] = k_{ij} = \frac{\partial^2 E}{\partial x_i \, \partial x_j}    (5–1)

    \delta F_i = -[k] \, \hat{\nu}_i \, \delta r = -\lambda_i \, \hat{\nu}_i \, \delta r    (5–2)

    \delta F_A = -[k_{AB}] \, \delta r_B    (5–3)

From the minimized structure of model 1 the metal-ligand bond and angle force constants are evaluated. The force constants are converted from Cartesian into internal coordinates using the Gaussian program [312] by providing the keyword iop(7/33=1). The MCPB program then reads the resulting internal force constant matrix and assigns the values to the appropriate bonds and angles, using a conversion factor of 627.5095 between Hartree and kcal/mol and, for bonds, 2240.87 between Hartree/Bohr^2 and kcal/(mol Å^2).

5.2.3 Point Charges

The atom centered partial charges were derived using the Merz-Singh-Kollman (MK) [126] and the Restrained ElectroStatic Potential (RESP) [127, 313, 314] schemes described in Ch. 2.9 using a second model (model 2) of the metal center. This model included all atoms of a bound residue, which were capped with acetyl (ACE) and N-methylamine (NME) residues. If two ligating residues were less than five residues apart then they were tethered with glycine residues and the chain capped with ACE and NME. Hydrogen atoms were added using the methods described in Ch. 3.5. This model was not allowed to relax, to save computational expense and to keep the crystallographic geometry. The van der Waals radii for the metals used in the MK scheme were taken from the literature. The

PAGE 152

MK/RESP scheme was favored over other charge model schemes because of its ability to adjust the charge of the capped or linking residues to an integer value, thus allowing the formal charge of the cluster to disperse over the metal and the bound ligands.

5.3 Zinc AMBER Force Field

With the ability to build and validate metal FFs established, the task of generating a generalized FF was initiated. Zinc was chosen because a considerable number of proteins contain that metal, as highlighted in Table 5-1, and because it is computationally well behaved. Metalloproteins containing zinc are both structural and functional proteins as described in Ch. 2.14. In general Zn is four coordinate, and sometimes five or six coordinate when multiple ASP/GLU residues or water molecules bind. It was then necessary to determine all Zn environments which exist in proteins. This was carried out using a program called pdbSearcher to analyze all structures currently in the PDB. pdbSearcher was developed using the API provided by MTK++ as described in Ch. 3. All X-ray crystal structures with a resolution below 3.0 Å were extracted from a local mirror of the PDB for further analysis. For each metal site the primary and secondary shell ligands were determined using Harding's bond cut-off values as shown in Table 5-3 [297–302]. These values were determined from a series of papers describing metal coordination in the CCSD. A donor atom is considered primary coordinated to a metal if it is within the target distance shown in Table 5-3 plus some tolerance (0.5 Å was used). Donors with metal-donor distances lying between the target distance plus the tolerance and the target distance plus a second tolerance (1.0 Å was used) were defined as secondary ligands. For example, if a Zn atom is less than 2.53 Å from a Histidine ND1 or NE2 atom then the histidine is considered a primary ligand. If it is less than 3.03 Å away then that ligand is labeled as secondary; otherwise it is unbound. Once the numbers of primary and secondary shell ligands were determined, the geometries of the metal centers were evaluated. The coordination states allowed include octahedral, Fig. 5-3(a), Trigonal Bipyramid, Fig. 5-3(b), Square Pyramid, Fig. 5-3(c), tetrahedral, Fig. 5-3(d), square planar, Fig. 5-3(e)

PAGE 153

and tetrahedral plus a non-bonded contact, Fig. 5-3(f). From Fig. 5-3 we can see that the coordination number alone is not enough to assign a metal geometry. Thus the root mean square deviation (RMSD) of the geometry angles from those in a regular polyhedron was calculated. Equation 5–4 was used to distinguish between square planar and tetrahedral geometries with the ideal angles listed in Table 5-4. Likewise, equations 5–5 and 5–6 were used for five and six coordinate metals respectively. The atom indices in Table 5-4 correspond to those atoms in Fig. 5-3. This indexing is useful to differentiate between axial/equatorial and cis/trans ligands. The coordination state with the lowest rms was assigned to the metal and its ligands.

Table 5-3. Metal-Donor Bond Target Lengths in Å. The following donor atoms of residues are implied: HOH@O, ASP@OD1/OD2, GLU@OE1/OE2, HIS@ND1/NE2, CYS@SG, MET@SG, SER@O, THR@O, TYR@O and the amino acid backbone carbonyl oxygen atom CRL. If a metal-donor distance is within these target distances plus some tolerance (0.5 Å) it is considered a primary interaction.

Metal   HOH     ASP/GLU   HIS     CYS/MET   SER/THR   TYR     CRL
Na      2.41    2.41      -       -         -         -       2.38
Mg      2.07    2.07      -       -         2.10      1.87    2.26
K       2.81    2.82      -       -         -         -       2.74
Ca      2.39    2.36      -       -         2.43      2.20    2.36
Mn      2.19    2.15      2.21    2.35      2.25      1.88    2.19
Fe      2.09    2.04      2.16    2.30      2.13      1.93    2.04
Co      2.09    2.05      2.14    2.25      2.10      1.90    2.08
Ni      2.09    2.05      2.14    2.25      2.10      1.90    2.08
Cu      2.13    1.99      2.02    2.15      2.00      1.90    2.04
Zn      2.09    1.99      2.03    2.31      2.14      1.95    2.07

    \mathrm{rms}_{tet/sqp} = \left[ \frac{1}{6} \sum_{i=1}^{6} (a_i - a_{ideal})^2 \right]^{1/2}    (5–4)

    \mathrm{rms}_{tbp/ttp} = \left[ \frac{1}{10} \sum_{i=1}^{10} (a_i - a_{ideal})^2 \right]^{1/2}    (5–5)

    \mathrm{rms}_{oct} = \left[ \frac{1}{15} \sum_{i=1}^{15} (a_i - a_{ideal})^2 \right]^{1/2}    (5–6)
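The two perception steps described above, shell assignment from a target distance plus tolerances and geometry assignment by the lowest angular rms, can be sketched as follows. This is an illustrative sketch only, not the pdbSearcher or MTK++ implementation. The function names are hypothetical. The Zn target lengths are the Zn row of Table 5-3, and the ideal angles are those the text assigns to tetrahedral (109.5° for all six angles) and square planar (180° for a12 and a34, 90° otherwise) sites. The sketch also assumes the four ligands have already been indexed so that atoms 1/2 and 3/4 are the trans pairs.

```python
import math

# Zn-donor target lengths in Å (Zn row of Table 5-3).
ZN_TARGET = {"HOH": 2.09, "ASP": 1.99, "GLU": 1.99, "HIS": 2.03,
             "CYS": 2.31, "MET": 2.31, "SER": 2.14, "THR": 2.14,
             "TYR": 1.95, "CRL": 2.07}


def classify_donor(distance, target, primary_tol=0.5, secondary_tol=1.0):
    """Label a donor as primary, secondary or unbound from its distance (Å)."""
    if distance < target + primary_tol:
        return "primary"
    if distance < target + secondary_tol:
        return "secondary"
    return "unbound"


def rms_deviation(angles, ideal):
    """Root mean square deviation between observed and ideal angles (degrees)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(angles, ideal)) / len(angles))


# Angle order used below: a12, a13, a14, a23, a24, a34.
IDEAL_ML4 = {
    "tet": [109.5] * 6,
    "sqp": [180.0, 90.0, 90.0, 90.0, 90.0, 180.0],
}


def assign_four_coordinate(angles):
    """Return the ML4 geometry label with the lowest rms, plus all scores."""
    scores = {name: rms_deviation(angles, ideal) for name, ideal in IDEAL_ML4.items()}
    return min(scores, key=scores.get), scores
```

For example, `classify_donor(2.10, ZN_TARGET["HIS"])` labels a histidine nitrogen 2.10 Å from Zn as primary, and a set of six angles close to 109.5° is assigned `"tet"`. The five and six coordinate cases would follow the same pattern with 10 and 15 angles.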

PAGE 154

Figure 5-3. Metal Ligand Geometries Perceived Using Harding's Rules. (a) Octahedral; (b) Trigonal Bipyramid; (c) Square Pyramid; (d) Tetrahedral; (e) Square Planar; (f) Tetrahedral plus Non-Bonded Contact. [Diagrams with numbered ligand positions not reproduced.]

5.3.1 Protein Data Bank Survey of Zinc Containing Proteins

The results of searching the PDB (accessed on the 5th of April 2007) for zinc metalloproteins using the rules above are shown in Fig. 5-4. There are 524 cases of trigonal bipyramidal (tbp), 706 cases of square pyramid (trp), and 228 instances of octahedral (oct) geometry. 615 metal centers are found as tetrahedral with a non-bonded contact (tnb) and 1372 are ill-defined using the current definitions (unk). 2964 out of 6435 total observations, or 46.1% of zinc atoms in protein structures, are found to be tetrahedral (tet), and thus the results and discussion will focus on them. The most common Zn coordinating

PAGE 155

Table 5-4. Ideal Angles Used to Calculate Root Mean Square Deviations for Tetrahedral, Square Planar, Trigonal Bipyramidal, Square Pyramid and Octahedral Geometries. The notation a_12 describes the angle between atom 1, the metal and atom 2. The atom indices correspond to Fig. 5-3. b_m is the mean of the four angles between the apical bond and the basal bonds in square pyramid geometries.

Type   Coordination           Angle (Deg)                                          Atoms
ML4    Tetrahedral            109.5
       Square Planar          180.0                                                a_12, a_34
                              90.0                                                 All others
ML5    Trigonal Bipyramidal   180.0                                                a_12
                              120.0                                                a_34, a_45, a_35
                              90.0                                                 All others
       Square Pyramid         b_m = \frac{1}{4} \sum_{i=1}^{4} a_{i5}              a_15, a_25, a_35, a_45
                              (360.0 - 2 b_m)                                      a_12, a_34
                              2 \sin^{-1} \{ 2^{-1/2} [ \sin(180.0 - b_m) ] \}     a_13, a_23, a_14, a_24
ML6    Octahedral             180.0                                                a_12, a_34, a_56
                              90.0                                                 All others

ligands in tetrahedral environments are shown in Fig. 5-5. Here the 1 letter amino acid codes are used, with X describing an unknown ligand such as a non-standard amino acid or drug molecule.

The most common tetrahedral Zn environment is CCCC, or four cysteines bound, followed by CCCH, CCHH, DHHH, HHHO, HHHX, CCHX, and CCHO as shown in Fig. 5-5. This data led the research in a direction to investigate the relationship between Zinc coordination environment and geometric parameters such as bond lengths and angles. The bonds between Zinc and Sulfur, Nitrogen and Oxygen from 10 unique primary shell environments are shown in Table 5-5.

The distributions of Zinc-Sulfur bonds in proteins that contain environments such as CCCC, CCCH, CCHH, and CHHH are shown in Fig. 5-6. A Boxplot as shown in Fig. 5-7(a) may also be used to represent this data, as it shows the max and min values, the lower and upper quartiles and the median. The Boxplots of the variation of Zn-N bonds in the series CCCC, CCCH, CCHH, and CHHH are shown in Fig. 5-7(b).
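The quantities a Boxplot displays (minimum, lower quartile, median, upper quartile, maximum) can be computed directly from a list of bond lengths. The sketch below is a minimal illustration using only the Python standard library. The function name `bond_length_summary` is hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since the text does not state which convention was used for the survey statistics.

```python
import statistics


def bond_length_summary(lengths):
    """Five-number summary plus mean and sample standard deviation.

    Assumption: quartiles use the 'inclusive' interpolation method;
    other conventions give slightly different quartile values.
    Requires at least two values.
    """
    ordered = sorted(lengths)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "mean": statistics.mean(ordered),
        "q3": q3,
        "max": ordered[-1],
        "stdev": statistics.stdev(ordered),
    }
```

A large gap between `median` and `mean` in such a summary is one sign of a skewed or outlier-laden distribution.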

PAGE 156

Figure 5-4. Zinc Coordination Geometry Distribution from the PDB. [Bar chart not reproduced; x-axis: Coordination Type (oct, sqp, tbp, tet, tnb, trp, unk); y-axis: Number, 0 to 3500.]

The peaks of the Zn–S bond length distributions in the CCCC and CCCH environments lie between 2.3 and 2.4 Å, while for CCHH and CHHH systems they occur between 2.2 and 2.3 Å. There are only 14 instances of CHHH, and the Zn-S and Zn-N bonds have large standard deviations of 0.1364 and 0.1403 respectively. Also, in the case of the Zn-S bonds the median differs considerably from the mean (2.296 compared to 2.344), which suggests that this data is unreliable. The Boxplots also point to some outliers in the data; for example, there are Zn-S bonds in the PDB below 1.5 Å which upon visual inspection seem crowded and poorly resolved.

The distributions of Zinc-Oxygen and Zinc-Nitrogen bonds are shown in Fig. 5-8. Zinc bonded to Glutamic and Aspartic acid oxygens (GLU@OE1/OE2 or ASP@OD1/OD2) or histidine nitrogens (HIS@ND1/NE2) all show similar behavior, with bond lengths around 2.1 Å being most common. In spite of this similarity the standard deviations of ASP and

PAGE 157

Table 5-5. Tetrahedral Zinc Primary Ligating Residues. Bond lengths are in Å. 1 letter amino acid codes are used; C: CYS, H: HIS, O: HOH, D: ASP. N is the number of Bond instances. Min and Max are the minimum and maximum bond lengths respectively. The 1st Quartile, 3rd Quartile, mean, median, and standard deviation are statistical parameters to describe the bond length distribution.

         N     Bond    Min     1st        Median   Mean    3rd        Max     Standard
                               Quartile                    Quartile           Deviation
CCCC     3284  Zn-S    1.424   2.294      2.338    2.338   2.389      2.805   0.1218
CCCH     1041  Zn-S    1.448   2.284      2.332    2.332   2.382      3.047   0.1089
CCHH     334   Zn-S    1.908   2.234      2.295    2.301   2.361      2.795   0.1289
CHHH     14    Zn-S    2.180   2.270      2.296    2.344   2.390      2.608   0.1364
CCCH     347   Zn-N    1.833   2.056      2.124    2.132   2.200      2.525   0.1157
CCHH     334   Zn-N    1.716   2.023      2.078    2.088   2.149      2.465   0.1188
CHHH     42    Zn-N    1.778   1.964      2.034    2.056   2.113      2.486   0.1403
HHHH     12    Zn-N    1.935   2.006      2.040    2.049   2.107      2.129   0.0627
HHHO     108   Zn-O    1.359   2.000      2.252    2.185   2.362      2.518   0.2218
HHOO     42    Zn-O    1.866   2.092      2.268    2.233   2.384      2.543   0.1816
HOOO     78    Zn-O    1.611   2.006      2.143    2.115   2.241      2.495   0.1914
OOOO     12    Zn-O    1.781   2.004      2.158    2.135   2.300      2.428   0.1917
HHHO     324   Zn-N    1.872   2.044      2.098    2.116   2.176      2.757   0.1140
HHOO     42    Zn-N    1.850   2.060      2.143    2.161   2.260      2.453   0.1455
HOOO     26    Zn-N    1.836   2.041      2.089    2.102   2.121      2.459   0.1295
HHHD     155   Zn-O    1.688   1.914      2.000    2.007   2.086      2.457   0.1425
HHDD     68    Zn-O    1.805   2.053      2.148    2.166   2.262      2.938   0.1840
HHHD     465   Zn-N    1.604   2.000      2.064    2.077   2.144      2.499   0.1275
HHDD     68    Zn-N    1.959   2.101      2.184    2.192   2.302      2.460   0.1280
ASP      460   Zn-O    1.688   1.958      2.044    2.077   2.165      2.988   0.1899
GLU      227   Zn-O    1.462   1.996      2.102    2.134   2.265      2.823   0.2276
HIS      825   Zn-ND   1.716   2.030      2.096    2.107   2.181      2.525   0.1256
HIS      1768  Zn-NE   1.604   2.031      2.093    2.108   2.177      2.757   0.1228

GLU bonds are greater than those of HIS. It is also worth noting that the majority of Zn histidine bonding is through the epsilon Nitrogen.

5.3.2 Tetrahedral Zn Environment Force Field Parameterization

Metal-ligand bonds are softer than those of organic molecules; however, there are obvious trends in the data presented above. These findings encouraged the formation of a generalized FF for Zn called the Zinc AMBER Force Field or ZAFF. A concept key to this work is "plug-and-play", where a researcher can download a metalloprotein structure

PAGE 158

Figure 5-5. The Most Common Tetrahedral Zinc Coordinating Ligand Combinations Distribution. Three lettered environments also contain a secondary ligand not shown. [Bar chart not reproduced; axes: Primary Shell Ligands versus Number, 0 to 800. Environments plotted: CCC, CCCC, CCCH, CCCO, CCCX, CCDH, CCEH, CCH, CCHH, CCHO, CCHX, CEHO, CHHH, CHHO, CHHX, DDEX, DDHH, DHH, DHHH, DHHO, DHHX, DHO, DHOO, DOO, EEH, EEHH, EHH, EHOO, HHH, HHHO, HHHX, HHOO, HHOX, HHX, HHXX, HOO, HOOO, HSXX.]
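The environment labels used in this survey can be generated from the residue names of the primary shell ligands. The sketch below is an assumed illustration of that labelling, not code from pdbSearcher. The function name `environment_code` is hypothetical. The mapping follows the codes given in the text (C: CYS, H: HIS, O: HOH, D: ASP, X: unknown ligand) plus the standard one-letter codes E for GLU and S for SER. Sorting the letters alphabetically reproduces the label order seen in Fig. 5-5, which is an inference from the labels themselves.

```python
# One-letter codes for primary shell ligands; anything else becomes "X".
ONE_LETTER = {"CYS": "C", "HIS": "H", "HOH": "O",
              "ASP": "D", "GLU": "E", "SER": "S"}


def environment_code(residue_names):
    """Build a label such as 'CCHH' from primary ligand residue names."""
    letters = (ONE_LETTER.get(name.upper(), "X") for name in residue_names)
    return "".join(sorted(letters))
```

For example, two cysteines and two histidines give `"CCHH"` regardless of input order, and an unrecognized ligand name contributes `"X"`.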

PAGE 159

from the PDB and run dynamics using predefined parameters as illustrated in Fig. 5-9. To this end FFs for the 10 unique environments shown in Table 5-5 were built using MCPB. A single structure from the PDB representative of each environment was chosen. FFs were built using the B3LYP DFT method [315–317] with the 6-31G basis set [318]. The resulting FFs were stored in cluster xml files using the definitions of stdLibrary and stdGroup from Ch. 3.2.3 for later use.

The 1A5T structure from the PDB was used as a representative structure of the Zn-CCCC cluster. Two models of this cluster were built using MCPB as shown in Fig. 5-10. The calculated bond and angle force constants are shown in Tables 5-6 and 5-7. The average Zn–S bond length is 2.426 Å, which is higher than the mean value from the survey of the PDB but within one standard deviation. The corresponding mean bond force constant is 100.677 kcal/(mol Å^2). The mean S–Zn–S and C–S–Zn angles are 109.138° and 101.754° with average force constants of 15.016 and 81.505 kcal/(mol rad^2) respectively.

Table 5-6. Zn-CCCC Cluster Bond Lengths, r, in Å and Force Constants, K_r, in kcal/(mol Å^2) (PDB ID: 1A5T).

Bond      r          K_r
ZN-S1     2.42511    100.709
ZN-S2     2.42442    101.586
ZN-S3     2.42459    101.013
ZN-S4     2.43049    99.4008
Mean      2.4260     100.677

The structures 1A73 and 2GIV from the PDB were used as characteristic structures of the Zn-CCCH cluster. The delta Nitrogen of Histidine is bonded to the Zinc atom in 1A73 while the epsilon Nitrogen is bound in 2GIV. Two models of each cluster were built using MCPB as shown in Fig. 5-11. The bond lengths and force constants are tabulated in Table 5-8, while the angles and angle force constants are shown in Tables 5-9 and 5-10. The average Zn–S bond lengths are 2.355 Å and 2.352 Å, which are in good agreement with the values determined from the PDB survey. The Zn–S bond lengths are shorter in

PAGE 160

Table 5-7. Zn-CCCC Cluster Angles, θ, in Degrees and Force Constants, K_θ, in kcal/(mol rad^2) (PDB ID: 1A5T).

θ       S-Zn-S2    S-Zn-S3    S-Zn-S4    C-S-Zn
S1      107.384    111.062    108.75     101.721
S2                 109.664    110.75     102.136
S3                            109.22     101.952
S4                                       101.210
Mean    109.138                          101.754

K_θ     S-Zn-S2    S-Zn-S3    S-Zn-S4    C-S-Zn
S1      13.9532    16.0208    13.0393    74.2777
S2                 15.2898    16.7256    84.6981
S3                            15.0688    85.4549
S4                                       81.5913
Mean    15.016                           81.505

Zn-CCCH than they are in the Zn-CCCC cluster, and this corresponds to the change in force constant from 100.677 to 144.003 and 142.575 kcal/(mol Å^2). The mean S–Zn–S, S–Zn–N, and C–S–Zn angles for the 1A73 and 2GIV clusters are 115.161°/116.381°, 102.938°/101.270° and 102.534°/101.793°, with average force constants of 13.695/10.377, 21.909/15.213, and 78.568/65.170 kcal/(mol rad^2) respectively.

Table 5-8. Zn-CCCH Cluster Bond Lengths, r, in Å and Force Constants, K_r, in kcal/(mol Å^2) (PDB ID: 1A73 and 2GIV).

        Bond         r          K_r
1A73    Zn-S1        2.38103    129.940
        Zn-S2        2.33594    153.514
        Zn-S3        2.34818    148.555
        Zn-NB        2.17615    111.727
        Zn-S Mean    2.355      144.003
2GIV    Zn-S1        2.35365    140.810
        Zn-S2        2.37024    135.986
        Zn-S3        2.33396    150.931
        Zn-NB        2.14457    109.821
        Zn-S Mean    2.352      142.575

The 1A1F structure from the PDB was used as a representative structure of the Zn-CCHH cluster. Two models of this cluster were built using MCPB as shown in Fig.

PAGE 161

Table 5-9. Zn-CCCH Cluster Angles, θ, in Degrees and Force Constants, K_θ, in kcal/(mol rad^2) (PDB ID: 1A73).

θ       S-Zn-S2    S-Zn-S3    S-Zn-NB    C-S-Zn
S1      112.941    117.548    98.457     101.452
S2                 114.996    110.215    103.370
S3                            100.144    102.782
Mean    115.161               102.938    102.534

        CR-NB-Zn   CC-NB-Zn
NB      118.832    133.957

K_θ     S-Zn-S2    S-Zn-S3    S-Zn-NB    C-S-Zn
S1      14.8019    12.3254    31.4032    66.5568
S2                 13.9588    15.4264    82.7660
S3                            18.8974    86.3823
Mean    13.695                21.909     78.568

        CR-NB-Zn   CC-NB-Zn
NB      61.5645    62.4920

Table 5-10. Zn-CCCH Cluster Angles, θ, in Degrees and Force Constants, K_θ, in kcal/(mol rad^2) (PDB ID: 2GIV).

θ       S-Zn-S2    S-Zn-S3    S-Zn-NB    C-S-Zn
S1      122.16     114.051    99.8816    100.693
S2                 112.932    94.1401    102.432
S3                            109.790    102.254
Mean    116.381               101.270    101.793

        CR-NB-Zn   CV-NB-Zn
NB      124.745    128.494

K_θ     S-Zn-S2    S-Zn-S3    S-Zn-NB    C-S-Zn
S1      9.20218    11.0341    15.3383    59.2390
S2                 10.8972    20.9385    72.5796
S3                            9.36319    63.6916
Mean    10.377                15.213     65.170

        CR-NB-Zn   CV-NB-Zn
NB      33.4104    31.3493
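The "Mean" rows in these tables group the individual angle terms by type (S-Zn-S, S-Zn-N, C-S-Zn). As a small worked check, the sketch below averages the 1A73 angle force constants transcribed from Table 5-9. The grouping into three lists and the variable names are assumptions made for illustration, and rounding to three decimals is chosen only to match the precision of the tabulated means.

```python
from statistics import mean

# Angle force constants for the 1A73 cluster, kcal/(mol rad^2), from Table 5-9,
# grouped by angle type (the grouping is an assumption about how the means were formed).
k_theta_1a73 = {
    "S-Zn-S": [14.8019, 12.3254, 13.9588],
    "S-Zn-N": [31.4032, 15.4264, 18.8974],
    "C-S-Zn": [66.5568, 82.7660, 86.3823],
}

# Average each group and round to the three decimals used in the table.
mean_k_theta = {angle: round(mean(values), 3) for angle, values in k_theta_1a73.items()}
```

Under this grouping the computed averages agree with the tabulated means of 13.695, 21.909 and 78.568 kcal/(mol rad^2).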

PAGE 162

5-12 with the calculated bond and angle force constants shown in Tables 5-11 and 5-12. The average Zn–S and Zn–N bond lengths are 2.305 Å and 2.088 Å with corresponding force constants of 181.478 and 147.126 kcal/(mol Å^2) respectively. The average value of the Zn–S bond length from the PDB is 2.301 Å while the mean Zn–N value is 2.088 Å, both in excellent agreement with the calculated values. Both the Zn–S and Zn–N bonds are shorter than in the previous clusters, and this is reflected in the stronger force constants determined. The mean S–Zn–N, C–S–Zn, and C–N–Zn angles for the 1A1F cluster are 103.232°, 105.054°, and 126.586° with average force constants of 12.488, 69.269, and 34.502 kcal/(mol rad^2) respectively.

Table 5-11. Zn-CCHH Cluster Bond Lengths, r, in Å and Force Constants, K_r, in kcal/(mol Å^2) (PDB ID: 1A1F).

Bond         r          K_r
Zn-S1        2.28799    191.997
Zn-S2        2.32226    170.959
Zn-NZ        2.09197    143.964
Zn-NY        2.08529    150.288
Zn-S Mean    2.305      181.478
Zn-N Mean    2.088      147.126

The final Zn center containing a cysteine residue considered in this study was Zn-CHHH. The 1CK7 structure from the PDB was used to model the Zn-CHHH cluster. Two models of this cluster were built using MCPB as shown in Fig. 5-13 with the calculated bond and angle force constants shown in Tables 5-13 and 5-14. The Zn–S bond length was determined as 2.262 Å with a force constant of 186.196 kcal/(mol Å^2). The mean Zn–N bond length is 2.046 Å with a force constant of 180.437 kcal/(mol Å^2). The mean N–Zn–N and S–Zn–N angles are 105.835° and 112.950° with force constants of 2.795 and 12.488 kcal/(mol rad^2) respectively.

There are very few Zinc atoms bound to four histidine residues in the PDB. To complete this computational study, however, the bond and angle force constants were determined using 1PB0 as a starting geometry. The models created by MCPB are shown in Fig. 5-14 with the resulting bond lengths and angles and accompanying force constants

PAGE 163

Table 5-12. Zn-CCHH Cluster Angles, θ, in Degrees and Force Constants, K_θ, in kcal/(mol rad^2) (PDB ID: 1A1F).

θ       S/N-Zn-S2   S/N-Zn-NY   S/N-Zn-NZ   C-S-Zn
S1      135.025     101.200     108.303     104.986
S2                  102.721     100.705     105.122
NY                              106.293

        CR-N-Zn     CV-NB-Zn
NY      123.668     129.422
NZ      123.392     129.863

K_θ     S/N-Zn-S2   S/N-Zn-NY   S/N-Zn-NZ   C-S-Zn
S1      9.71416     10.1761     10.1241     63.3345
S2                  14.5105     15.1437     75.2045
NY                              8.41020

        CR-N-Zn     CV-NB-Zn
NY      36.1351     32.4650
NZ      36.2138     33.1944

Mean        θ          K_θ
S-Zn-N      103.232    12.488
C-S-Zn      105.054    69.269
C-N-Zn      126.586    34.502

Table 5-13. Zn-CHHH Cluster Bond Lengths, r, in Å and Force Constants, K_r, in kcal/(mol Å^2) (PDB ID: 1CK7).

Bond         r          K_r
Zn-S1        2.26178    186.196
Zn-NX        2.05563    176.880
Zn-NY        2.04663    182.100
Zn-NZ        2.03803    182.333
Zn-N Mean    2.046      180.437

are shown in Tables 5-15 and 5-16. The average Zn–N bond distance is 2.010 Å with a force constant of 217.616 kcal/(mol Å^2). The mean angle at the Zn center is 109.481° with a force constant of 6.088 kcal/(mol rad^2).

There are clear trends in the calculated bond lengths and force constants described above. The bond lengths of Zn-S through the series CCCC, CCCH, CCHH, and CHHH correlate with the calculated force constants with an R^2 value of 0.97 as seen in Fig. 5-15(a). The Zn–N bond lengths and force constants correlate with an R^2 of 0.95 as

PAGE 164

Table 5-14. Zn-CHHH Cluster Angles, θ, in Degrees and Force Constants, K_θ, in kcal/(mol rad^2) (PDB ID: 1CK7).

θ       S/N-Zn-NX   S/N-Zn-NY   S/N-Zn-NZ   C-S-Zn
S1      108.625     110.015     120.210     104.518
NX                  109.309     104.101
NY                              104.096

        CR-N-Zn     CV-NB-Zn
NX      118.357     135.099
NY      133.352     120.137
NZ      127.083     125.963

K_θ     S/N-Zn-NX   S/N-Zn-NY   S/N-Zn-NZ   C-S-Zn
S1      4.05583     4.03272     1.89062     10.5385
NX                  2.38629     2.68101
NY                              3.32002

        CR-N-Zn     CV-NB-Zn
NX      23.1399     ND
NY      34.1629     37.8760
NZ      18.0772     9.73914

Mean        θ          K_θ
N-Zn-N      105.835    2.795
S-Zn-N      112.950    12.488

Table 5-15. Zn-HHHH Cluster Bond Lengths, r, in Å and Force Constants, K_r, in kcal/(mol Å^2) (PDB ID: 1PB0).

Bond         r          K_r
Zn-NW        2.00656    221.593
Zn-NX        2.01037    217.622
Zn-NY        2.01428    213.845
Zn-NZ        2.01130    217.407
Zn-N Mean    2.010      217.616

shown in Fig. 5-15(b). It is worth noting here that Zn donor bond lengths differ between environments. Thus a single Zn–S or Zn–N equilibrium bond length and force constant would not work. The proposed solution to this problem is to store all Zn bond types and assign the parameters in an automatic manner within the metal center perception algorithm of MTK++.

The average S–Zn–S angle is smaller and its force constant stronger, 109.138°/(15.016) kcal/(mol rad^2), in the CCCC cluster compared to those of the

PAGE 165

Table 5-16. Zn-HHHH Cluster Angles, θ, in Degrees and Force Constants, K_θ, in kcal/(mol rad^2) (PDB ID: 1PB0).

θ       N-Zn-NX    N-Zn-NY    N-Zn-NZ    CR-N-Zn    CV-N-Zn
NW      111.145    106.809    110.679    127.688    126.018
NX                 111.299    106.671    127.492    126.229
NY                            110.288    127.172    126.584
NZ                                       127.161    126.540

K_θ     N-Zn-NX    N-Zn-NY    N-Zn-NZ    CR-N-Zn    CV-N-Zn
NW      5.49705    6.93160    5.80074    33.8058    34.1690
NX                 5.64881    7.14382    32.1774    32.2668
NY                            5.50663    31.7145    32.0421
NZ                                       32.1549    32.3428

Mean        θ          K_θ
N-Zn-N      109.481    6.088
CR-N-Zn     127.378    32.463
CV-N-Zn     126.342    32.705

CCHH cluster, where values of 135.025° and 9.714 kcal/(mol rad^2) were determined. The N–Zn–N angles of the CCHH, CHHH, and HHHH clusters lie between 105.835° and 109.481° with force constants between 2.795 and 8.4102 kcal/(mol rad^2). The experimental force constant of the N–Zn–N angle was reported to be approximately 5.0 kcal/(mol rad^2), which is in good agreement with those calculated here. It has been reported that this angle force constant is too weak to prevent the angle opening beyond the ideal tetrahedral angle in MD simulations, and in the past arbitrary scaling factors have been applied to prevent this from occurring [252, 293]. A general scaling factor has not been developed here, as this study was designed to investigate the raw force constants

PAGE 166

produced by QM packages, but it may be developed in the future. Thus the FFs shown in this chapter can be described as zeroth order, with further corrections necessary to carry out meaningful simulations.

The partial charges of the Zinc clusters CCCC, CCCH, CCHH, and CHHH were determined applying two different methods using the larger models described above. The first method allows all atoms of the bound residue to change (ChgModA) while the second technique restrains the backbone atoms (CA, H, HA, N, HN, C, O) to those values found in the AMBER parm94 force field (ChgModB). The charges were determined by first calculating the MK charges from Gaussian (a 1.1 Å radius for Zinc was used) and then using the RESP program to zero out the charges on the capping groups. This procedure was carried out to disperse the charge over the entire cluster, thus removing the need to have a large +2 formal charge on Zinc. The ChgModA charges are shown in Table 5-17 and the ChgModB charges are presented in Table 5-18. The SG atom in the CYM (unprotonated cysteine) residue in parm94 has a charge of −0.736, and its charge fluctuates between −0.485 and −0.640 in ChgModA and between −0.473 and −0.669 in ChgModB. One of the biggest differences between ChgModA and ChgModB is the charge on CB. CYM@CB has a −0.736 charge, while in ChgModA its charge lies closer to zero. ChgModB on the other hand places −0.4 on CB. It is unclear whether ChgModA is superior to ChgModB or vice versa. It could be advantageous to keep the charges of the backbone atoms fixed to the parm94 values, as these have been used in the fitting of the torsional parameters of the FF. It is also difficult to determine if this would matter, as the movement of the residues bound to Zinc would be restricted. The ChgModA and ChgModB charges for the Zinc-CCCH, CCHH, CHHH, and HHHH clusters are outlined in Tables 5-19 and 5-20.

The variation of bond distances, angles, and partial charges of Zinc clusters containing histidine residues and water molecules was determined. The 1CA2 structure was used to represent the Zn-HHHO cluster which is a structure of human Carbonic Anhydrase II


Table 5-17. Cysteine Charges using ChgModA for the Zn-CCCC, -CCCH, -CCHH, and -CHHH Clusters. Partial Charges are in electron units.

Residue               N           CA          C           O
CYM                   -0.463000    0.035000    0.616000   -0.504000
CCCC                  -0.479963    0.003180    0.518101   -0.548044
CCCH(1A73) CY1/CY3    -0.474605   -0.108977    0.595840   -0.519264
CCCH(1A73) CY2        -0.414898    0.031295    0.278787   -0.450243
CCCH(2GIV)            -0.542307    0.017445    0.632188   -0.568329
CCHH CY1              -0.454687    0.005167    0.245131   -0.289983
CCHH CY2              -0.268043   -0.035885    0.462357   -0.532708
CHHH                  -0.447464   -0.025959    0.622632   -0.497571

Residue               CB          SG          HN          HA
CYM                   -0.736000   -0.736000    0.252000    0.048000
CCCC                   0.112192   -0.620103    0.281750    0.057052
CCCH(1A73) CY1/CY3     0.019831   -0.640071    0.287703    0.088639
CCCH(1A73) CY2         0.049561   -0.484948    0.266828    0.052328
CCCH(2GIV)            -0.048990   -0.591652    0.331010    0.049510
CCHH CY1              -0.097120   -0.581202    0.297192    0.126035
CCHH CY2               0.074672   -0.537449    0.295118    0.088752
CHHH                  -0.056717   -0.512655    0.277771    0.099225

Residue               HB2/3       ZN          Charge
CYM                    0.244000
CCCC                   0.002066    0.686817   -2.0
CCCH(1A73) CY1/CY3     0.065870    0.593065   -1.0
CCCH(1A73) CY2         0.056778    0.593065   -1.0
CCCH(2GIV)             0.084419    0.359392   -1.0
CCHH CY1               0.084759    0.263107    0.0
CCHH CY2               0.031256    0.263107    0.0
CHHH                   0.044767    0.552747    1.0

Table 5-18. Cysteine Charges using ChgModB for the Zn-CCCC, -CCCH, -CCHH, and -CHHH Clusters. Partial Charges are in electron units.

Residue               CB          SG          HB2/3       ZN
CYM                   -0.736000   -0.736000    0.244000
CCCC                  -0.462072   -0.530008    0.169606    0.675474
CCCH(1A73) CY1/CY3    -0.372264   -0.533198    0.171501    0.501937
CCCH(1A73) CY2        -0.324838   -0.472872    0.175531    0.501937
CCCH(2GIV)            -0.435825   -0.555154    0.212869    0.45405
CCHH CY1               0.003587   -0.668561    0.073015    0.362845
CCHH CY2              -0.221037   -0.531646    0.134076    0.362845
CHHH                   0.159424   -0.552344   -0.017513    0.505550


Table 5-19. Histidine Charges using ChgModA for the Zn-CCCH, -CCHH, -CHHH, and -HHHH Clusters. Partial Charges are in electron units.

Residue      N           CA          C           O
HIE          -0.415700   -0.058100    0.597300   -0.567900
HID          -0.415700    0.018800    0.597300   -0.567900
CCCH(HIE)     0.013815   -0.094517    0.609742   -0.511703
CCCH(HID)    -0.465489    0.012311    0.744779   -0.614433
CCHH(HID)    -0.411312   -0.035148    0.603874   -0.536756
CHHH(HID)    -0.385664   -0.089931    0.683060   -0.554579
HHHH(HID)    -0.321934   -0.129789    0.661692   -0.523604

Residue      CB          CG           ND1         CD2
HIE          -0.007400    0.1868000   -0.54320    -0.220700
HID          -0.046200   -0.0266000   -0.38110     0.129200
CCCH(HIE)    -0.009333    0.1388840   -0.200272   -0.231676
CCCH(HID)    -0.035338   -0.1183230   -0.143343   -0.003696
CCHH(HID)    -0.087860   -0.0472060   -0.121450   -0.070041
CHHH(HID)    -0.171947    0.0159560   -0.096528    0.003599
HHHH(HID)    -0.146428   -0.0046330   -0.130169   -0.067660

Residue      CE1         NE2         H           HA
HIE           0.163500   -0.279500    0.271900    0.136000
HID           0.205700   -0.572700    0.271900    0.088100
CCCH(HIE)    -0.002223   -0.152285   -0.043986    0.015705
CCCH(HID)     0.051160   -0.085718    0.269102    0.080535
CCHH(HID)    -0.080618   -0.041354    0.269483    0.108816
CHHH(HID)    -0.064531   -0.228608    0.285362    0.104929
HHHH(HID)    -0.059695   -0.145091    0.242073    0.135166

Residue      HB2         HB3         HE1         HE2
HIE           0.036700    0.036700    0.143500    0.333900
HID           0.040200    0.040200    0.139200
CCCH(HIE)     0.028328    0.028328    0.144631    0.307178
CCCH(HID)     0.029513    0.029513    0.121003
CCHH(HID)     0.072346    0.072346    0.175582
CHHH(HID)     0.107686    0.107686    0.186129
HHHH(HID)     0.120901    0.120901    0.185187

Residue      HD1         HD2         ZN          Charge
HIE                       0.186200                0.0
HID           0.364900    0.114700                0.0
CCCH(HIE)                 0.162384    0.593065   -1.0
CCCH(HID)     0.300711    0.125183    0.359392   -1.0
CCHH(HID)     0.321771    0.161285    0.263107    0.0
CHHH(HID)     0.297609    0.099257    0.552747    1.0
HHHH(HID)     0.322378    0.137633    0.412289    2.0


(HCAII) with the MCPB models shown in Fig. 5-16. As mentioned in Ch. 2.14, HCAII is a catalytic center for the conversion of CO2 into bicarbonate. Therefore, to account for both the water and hydroxyl states, two FFs were evaluated. It is of no surprise that the bond lengths and associated force constants of the two systems are different. The Zn-O bond is longer in the case of water binding while the Zn-N bonds are shorter due to the strength of the hydroxyl bond, as outlined in Table 5-21. The accompanying force constants are also considerably different. The Zn-O bond force constant changes from 120.287 to 394.674 kcal/(mol Å²) upon removal of a proton while the Zn-N force constant becomes weaker, from 248.420 to 194.357 kcal/(mol Å²). These calculated equilibrium bond lengths and force constants are considerably different from those published by Hoops et al.; however, the QM methods used to generate the numbers also differ. The Zn-O bond lengths of the HHHO clusters in the PDB have a large standard deviation of 0.222 Å with a mean value of 2.185 Å, confirming that both states exist. The calculated angles and force constants for this cluster are shown in Table 5-22; they are in good agreement with those published previously, except for the H-O-Zn angle force constant that was arbitrarily set to a higher value.

The 1VLI structure from the PDB was used to investigate the strength of bond and angle force constants of the Zn-HHOO cluster. Again the MCPB program was used to build the models (Fig. 5-17) required for parameterization, with the resulting equilibrium bond lengths and angles and corresponding force constants shown in Tables 5-23 and 5-24. The equilibrium bond length of Zn-O was calculated as 1.946 Å which is 0.4 Å shorter than the bond length for the Zn-HHHO cluster. This contradicts the trend from the PDB survey. Plausible reasons for this discrepancy include the small number of data points for the Zn-HHOO cluster in the PDB and the large standard deviation value of 0.146 Å. The angle force constants calculated for this cluster are of similar magnitude to those calculated for the Zn-HHHO cluster.


Table 5-20. Histidine Charges using ChgModB for the Zn-CCCH, -CCHH, -CHHH, and -HHHH Clusters. Partial Charges are in electron units.

Residue      CB          CG          ND1         CD2
HIE          -0.007400    0.186800   -0.54320    -0.220700
HID          -0.046200   -0.026600   -0.38110     0.129200
CCCH(HIE)    -0.390226    0.172204   -0.218625   -0.240248
CCCH(HID)     0.052173   -0.062260   -0.152337   -0.093846
CCHH(HID)     0.021453   -0.074575   -0.103342   -0.083769
CHHH(HID)     0.097861   -0.131328   -0.072847    0.010476
HHHH(HID)     0.299667   -0.158251   -0.10536    -0.114661

Residue      CE1         NE2         HB2/3       HD1
HIE           0.163500   -0.279500    0.036700
HID           0.205700   -0.572700    0.040200    0.364900
CCCH(HIE)     0.025646   -0.1386      0.176704
CCCH(HID)     0.044669   -0.107399    0.001993    0.302699
CCHH(HID)    -0.090768   -0.047611    0.034279    0.317081
CHHH(HID)    -0.066727   -0.200851    0.048588    0.295541
HHHH(HID)    -0.067193   -0.078907    0.001086    0.310647

Residue      HE1         HE2         HD2         ZN
HIE           0.143500    0.333900    0.186200
HID           0.139200                0.114700
CCCH(HIE)     0.125452    0.300742    0.164376    0.501937
CCCH(HID)     0.125431                0.184058    0.45405
CCHH(HID)     0.171776                0.165012    0.362845
CHHH(HID)     0.184941                0.106056    0.50555
HHHH(HID)     0.181314                0.15876     0.317246

Table 5-21. Zn-HHHO Cluster Bond Lengths, r, in Å and Force Constants, K_r, in kcal/(mol Å²) (PDB ID: 1CA2).

State   Bond         r        K_r
H2O     Zn-NX        1.9783   250.691
        Zn-NY        1.9817   246.526
        Zn-NZ        1.9836   248.043
        Zn-OW        2.1122   120.287
        Zn-N Mean    1.981    248.420
HO-     Zn-NX        2.0293   190.815
        Zn-NY        2.0400   194.203
        Zn-NZ        2.0412   198.055
        Zn-OW        1.8596   394.674
        Zn-N Mean    2.036    194.357


Table 5-22. Zn-HHHO Cluster Angles, θ, in Degrees and Force Constants, K_θ, in kcal/(mol rad²) (PDB ID: 1CA2).

H2O, θ   N-Zn-NY   N-Zn-NZ   N-Zn-OW   CR-N-Zn   CV/CC-N-Zn   H-OW-Zn
NX       109.645   123.191   100.449   128.302   125.323
NY                 113.479   109.534   127.604   126.100
NZ                           98.1932   125.145   127.730
HW                                                            123.942

H2O, K_θ N-Zn-NY   N-Zn-NZ   N-Zn-OW   CR-N-Zn   CV/CC-N-Zn   H-OW-Zn
NX       8.00872   8.22809   5.61984   34.2350   34.0616
NY                 7.62091   3.74798   34.5691   35.2802
NZ                           5.75939   40.5598   44.0110
HW                                                            20.5484

HO-, θ   N-Zn-NY   N-Zn-NZ   N-Zn-OW   CR-N-Zn   CV/CC-N-Zn   H-OW-Zn
NX       106.050   107.502   124.853   126.720   126.592
NY                 116.017   102.656   114.165   139.179
NZ                           100.380   113.620   138.859
HW                                                            116.648

HO-, K_θ N-Zn-NY   N-Zn-NZ   N-Zn-OW   CR-N-Zn   CV/CC-N-Zn   H-OW-Zn
NX       9.25545   9.22715   7.39156   29.5398   30.0396
NY                 6.99278   10.4632   38.9903   33.8686
NZ                           12.0718   42.5650   37.2270
HW                                                            38.9579


Figure 5-6. Zn-S Bond Length Distributions in CCCC, CCCH, CCHH, and CHHH Tetrahedral Environments. [Histograms of frequency versus bond length. Panels: (a) CCCC, (b) CCCH, (c) CCHH, (d) CHHH.]

Table 5-23. Zn-HHOO Cluster Bond Lengths, r, in Å and Force Constants, K_r, in kcal/(mol Å²) (PDB ID: 1VLI).

Bond         r        K_r
Zn-NX        1.9443   199.330
Zn-NY        1.9488   286.855
Zn-OX        2.0577   157.459
Zn-OY        2.0493   77.0940
Zn-N Mean    1.946    243.092
Zn-O Mean    2.053    117.276


Figure 5-7. Box Plots of Zn-S/N Bond Lengths in CCCC, CCCH, CCHH, CHHH, and HHHH environments. [Box plots; vertical axis: bond distance (Å). Panels: (a) Zn-S Bond Lengths through the series CCCC, CCCH, CCHH, and CHHH; (b) Zn-N Bond Lengths through the series HHHH, HCCC, HHCC, and HHHC.]


Table 5-24. Zn-HHOO Cluster Angles, θ, in Degrees and Force Constants, K_θ, in kcal/(mol rad²) (PDB ID: 1VLI). ND: not determined.

θ       N-Zn-NY   N-Zn-OX   O/N-Zn-OY   CV-N-Zn   CR-N-Zn   HW-O-Zn
NX      121.973   108.959   112.588     122.128   131.376
NY                107.009   105.529     126.588   126.944
OX                          98.0344                         124.788
OY                                                          122.326

K_θ     N-Zn-NY   N-Zn-OX   O/N-Zn-OY   CV-N-Zn   CR-N-Zn   HW-O-Zn
NX      5.81366   5.71586   ND          10.6697   16.9386
NY                4.36309   4.47123     33.5296   33.5677
OX                          3.69369                         26.4380
OY                                                          ND


Figure 5-8. Tetrahedral Zn-O (Asp/Glu) and Zn-N (His) Bond Length Distributions. [Histograms of frequency versus bond length. Panels: (a) Asp@OD1/OD2, (b) Glu@OE1/OE2, (c) His@ND1, (d) His@NE2.]


Figure 5-9. ZAFF Flow Diagram. This illustration demonstrates that when a metalloprotein structure is downloaded from the PDB and an equivalent metal site is stored, the MTK++ package has the ability to assign parameters to carry out MD simulations. [Flow diagram nodes: PDB; Contains Metal? (Yes/No); Cluster Stored? (Yes/No); Carry out steps in Fig. 5-2; End.]

Figure 5-10. Zn-CCCC Cluster Models (PDB ID: 1A5T). Panels: (a) 1A5T Model 1, (b) 1A5T Model 2.
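The decision logic of the flow diagram in Fig. 5-9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the residue-to-letter mapping, the `STORED_CLUSTERS` dictionary, and the function names are hypothetical and are not the MTK++ or ZAFF interface. The branch targets are also assumed, since the diagram's arrows are not available here. The stored numbers are the Zn-N bond values from Tables 5-21 (water-bound state), 5-23, and 5-25.

```python
# Hypothetical sketch of a ZAFF-style lookup; none of these names come from MTK++.
RESIDUE_TO_LETTER = {
    "CYS": "C", "CYM": "C",
    "HIS": "H", "HID": "H", "HIE": "H",
    "ASP": "D",
    "HOH": "O", "WAT": "O",
}
LETTER_ORDER = "CHOD"  # assumed ordering, chosen to reproduce names such as CCCH, HHHO, HHDD

# Zn-N bond parameters: (r_eq in Angstrom, K_r in kcal/(mol Angstrom^2)).
STORED_CLUSTERS = {
    "HHHO": {"Zn-N": (1.981, 248.420)},   # Table 5-21, H2O state, mean
    "HHOO": {"Zn-N": (1.946, 243.092)},   # Table 5-23, mean
    "HOOO": {"Zn-N": (1.9256, 325.449)},  # Table 5-25, Zn-NX
}


def cluster_key(residue_names):
    """Translate the ligating residue names into a cluster label such as 'HHHO'."""
    letters = [RESIDUE_TO_LETTER[name] for name in residue_names]
    return "".join(sorted(letters, key=LETTER_ORDER.index))


def lookup_parameters(residue_names):
    """Return stored parameters, or None when no metal site or no stored cluster exists.

    A None result for a real metal site corresponds to the assumed branch of the
    diagram where the full parameterization (the steps in Fig. 5-2) has to be run.
    """
    if not residue_names:  # "Contains Metal?" -> No
        return None
    return STORED_CLUSTERS.get(cluster_key(residue_names))  # "Cluster Stored?"


site = ["HID", "HOH", "HIE", "HID"]
print(cluster_key(site), lookup_parameters(site))
```

A dictionary keyed on the sorted ligand letters keeps the lookup independent of the order in which the ligating residues appear in the PDB file.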


Figure 5-11. Zn-CCCH Cluster Models (PDB ID: 1A73 and 2GIV). Panels: (a) 1A73 Model 1, (b) 1A73 Model 2, (c) 2GIV Model 1, (d) 2GIV Model 2.


Figure 5-12. Zn-CCHH Cluster Models (PDB ID: 1A1F). Panels: (a) 1A1F Model 1, (b) 1A1F Model 2.

Figure 5-13. Zn-CHHH Cluster Models (PDB ID: 1CK7). Panels: (a) 1CK7 Model 1, (b) 1CK7 Model 2.


Figure 5-14. Zn-HHHH Cluster Models (PDB ID: 1PB0). Panels: (a) 1PB0 Model 1, (b) 1PB0 Model 2.


Figure 5-15. The Correlation between (a) Zn-Cys@S and (b) Zn-His@N Bond Lengths and Calculated Force Constants through the Series CCCC, CCCH, CCHH, CHHH, and HHHH. [Scatter plots; horizontal axis: calculated bond distance (Å); vertical axis: force constant (kcal/mol Å²). Panels: (a) Zn-Cys@S through the series CCCC, CCCH, CCHH, and CHHH; (b) Zn-His@N through the series CCCH, CCHH, CHHH, and HHHH.]


Figure 5-16. Zn-HHHO Cluster Models (PDB ID: 1CA2). Panels: (a) 1CA2 Model 1, (b) 1CA2 Model 2.

Figure 5-17. Zn-HHOO Cluster Models (PDB ID: 1VLI). Panels: (a) 1VLI Model 1, (b) 1VLI Model 2.


The final tetrahedral environment containing histidine residues and water molecules was the HOOO cluster. The 1L3F PDB structure was used with MCPB models shown in Fig. 5-18. The average Zn-O and Zn-N bond lengths are 2.01 Å and 1.926 Å which are shorter distances than those in the Zn-HHHO and Zn-HHOO clusters, agreeing with the experimental means from the PDB.

Figure 5-18. Zn-HOOO Cluster Models (PDB ID: 1L3F). Panels: (a) 1L3F Model 1, (b) 1L3F Model 2.

Table 5-25. Zn-HOOO Cluster Bond Lengths, r, in Å and Force Constants, K_r, in kcal/(mol Å²) (PDB ID: 1L3F).

Bond         r        K_r
Zn-NX        1.9256   325.449
Zn-OW        2.0170   179.519
Zn-OX        2.0022   189.759
Zn-OY        2.0136   181.502
Zn-O Mean    2.0100   183.593

The histidine residues, water molecules and zinc ion partial charges for the Zn-HHHO, -HHOO, and -HOOO clusters are outlined in Table 5-27.

Two clusters containing histidine and aspartate residues were considered in this study. The 2USN and 1U0A were chosen as characteristic structures of the Zn-HHHD and Zn-HHDD (Fig. 5-19) environments. The PDB survey showed that the bond lengths


Table 5-26. Zn-HOOO Cluster Angles, θ, in Degrees and Force Constants, K_θ, in kcal/(mol rad²) (PDB ID: 1L3F).

θ       N/O-Zn-OW   N-Zn-OX   O/N-Zn-OY   CR-N-Zn   CC-N-Zn   HW-O-Zn
NX      114.389     118.388   118.918     125.909   126.608
OW                  103.216   97.8629                         127.562
OX                            100.927                         127.232
OY                                                            124.457

K_θ     N/O-Zn-OW   N-Zn-OX   O/N-Zn-OY   CR-N-Zn   CC-N-Zn   HW-O-Zn
NX      3.83477     8.16145   4.17334     44.5591   51.1512
OW                  4.28235   3.60452                         23.8887
OX                            4.24390                         24.9508
OY                                                            25.1823

of Zn-O bonds in H/D systems changed from 2.007 Å to 2.166 Å going from HHHD to HHDD and this trend is also seen in the calculated values of these clusters as shown in Table 5-28.

5.4 Conclusions

This research describes the design, development, and implementation of two programs called pdbSearcher and MCPB. The former carries out metalloprotein data mining of the Protein Data Bank. Results focused on Zinc metalloproteins as a large number of proteins contain this element. The majority of Zn metalloproteins are tetrahedrally coordinated to histidine, cysteine, aspartate, glutamate residues, or water molecules. The distribution of bond lengths between Zn and the donor atoms of these residues was investigated, with some short Zn-S bonds highlighted which may be due to errors during crystal structure refinement.


Table 5-27. Histidine and Water's Partial Charges using ChgModB for the Zn-HHHO, -HHOO, and -HOOO Clusters. Partial Charges are in electron units.

Residue        CB          CG          ND1         CD2
HIE            -0.007400    0.186800   -0.54320    -0.220700
HID            -0.046200   -0.026600   -0.38110     0.129200
HHHO1(HID1)     0.272769   -0.046469   -0.054651   -0.122505
HHHO1(HID2)     0.579686   -0.290892   -0.115057   -0.073113
HHHO1(HIE)      0.563536   -0.126874   -0.196737   -0.099394
HHHO2(HID1)     0.07163     0.005255   -0.085911   -0.103044
HHHO2(HID2)     0.379846   -0.232467   -0.086636   -0.024617
HHHO2(HIE)      0.123745   -0.116227    0.001013   -0.073261
HHOO(HID)       0.564561   -0.334037   -0.048653    0.144851
HOOO(HIE)       0.817801   -0.151877   -0.360409   -0.256153

Residue        CE1         NE2         HB2/3       HD1
HIE             0.163500   -0.279500    0.036700
HID             0.205700   -0.572700    0.040200    0.364900
HHHO1(HID1)    -0.153009   -0.160702   -0.015389    0.341095
HHHO1(HID2)    -0.057251   -0.113546   -0.078839    0.323076
HHHO1(HIE)     -0.093972   -0.092858   -0.130733
HHHO2(HID1)    -0.081199   -0.257224    0.038414    0.315667
HHHO2(HID2)    -0.005112   -0.210826   -0.036823    0.297259
HHHO2(HIE)     -0.071842   -0.204022    0.019225
HHOO(HID)       0.023257   -0.445271   -0.063809    0.315238
HOOO(HIE)      -0.210361    0.125524   -0.140428

Residue        HE1         HE2         HD2         ZN
HIE             0.143500    0.333900    0.186200
HID             0.139200                0.114700
HHHO1(HID1)     0.187876                0.122148    0.702584
HHHO1(HID2)     0.210086                0.142360
HHHO1(HIE)      0.169654    0.345567    0.167258
HHHO2(HID1)     0.186714                0.149779    0.674911
HHHO2(HID2)     0.167127                0.102959
HHHO2(HIE)      0.176463    0.348448    0.153575
HHOO(HID)       0.176795                0.096861    1.02705
HOOO(HID)       0.191318    0.314799    0.264829    0.968391

Residue        O           H
WAT            -0.834       0.417
-OH
HHHO(WAT)      -0.765313    0.468035
HHHO(HO-)      -1.003960    0.411829
HHOO(WAT)      -0.742546    0.400146
HOOO(WAT)      -0.595794    0.435269


Table 5-28. Zn-HHHD and Zn-HHDD Cluster Bond Lengths, r, in Å and Force Constants, K_r, in kcal/(mol Å²) (PDB ID: 2USN and 1U0A).

PDB     Bond         r        K_r
2USN    Zn-NX        2.0729   176.828
        Zn-NY        2.0269   208.052
        Zn-NZ        2.0276   206.717
        Zn-O2        1.9865   282.503
        Zn-N Mean    2.0420   197.199
1U0A    Zn-NY        2.1247   133.766
        Zn-NZ        2.1404   128.104
        Zn-OA        2.1660   171.778
        Zn-OB        2.0517   209.806
        Zn-N Mean    2.1320   130.935
        Zn-O Mean    2.1080   190.792

Table 5-29. Zn-HHHD Cluster Angles, θ, in Degrees and Force Constants, K_θ, in kcal/(mol rad²) (PDB ID: 2USN).

θ       N-Zn-NY   N-Zn-NZ   N-Zn-O    CR-N-Zn   CV/CC-N-Zn   C-O-Zn
NX      105.439   105.366   95.8076   115.956   136.979
NY                120.039   113.247   123.274   130.128
NZ                          113.189   123.299   130.170
O2                                                           100.499

K_θ     N-Zn-NY   N-Zn-NZ   N-Zn-O    CR-N-Zn   CV/CC-N-Zn   C-O-Zn
NX      17.8462   17.4805   11.7020   47.2274   41.7749
NY                7.20713   13.3154   50.0689   45.0768
NZ                          13.3171   49.0374   43.9566
O2                                                           197.677


Figure 5-19. Zn-HHHD and Zn-HHDD Cluster Models (PDB ID: 2USN and 1U0A). Panels: (a) 2USN Model 1, (b) 2USN Model 2, (c) 1U0A Model 1, (d) 1U0A Model 2.

The MCPB program was used to build, prototype, and validate AMBER-like force fields using the bonded plus electrostatics model for metalloproteins that can be added to the AMBER suite of programs. MCPB was used to investigate the environmental effects on bond lengths, angles, plus bond and angle force constants using 10 unique metal coordinations. These included Zn bound to CCCC, CCCH, CCHH, CHHH, HHHH, HHHO, HHOO, HOOO, HHHD, and HHDD clusters. A Zinc AMBER Force Field


Table 5-30. Zn-HHDD Cluster Angles, θ, in Degrees and Force Constants, K_θ, in kcal/(mol rad²) (PDB ID: 1U0A).

θ       N-Zn-NZ   N-Zn-OA   N-Zn-OB   CR-N-Zn   CC-N-Zn   O2-C-O    C-O-Zn
NY      96.9897   101.475   100.576   121.551   131.075
NZ                87.8377   99.8952   117.541   132.416
OA                          155.527                       120.909   88.3673
OB                                                        121.439   94.2323

K_θ     N-Zn-NZ   N-Zn-OA   N-Zn-OB   CR-N-Zn   CC-N-Zn   O2-C-O    C-O-Zn
NY      20.5679   9.54097   10.8215   54.9830   44.1986
NZ                16.5019   14.8041   47.8802   61.5788
OA                          8.80214                       148.620   187.032
OB                                                        183.487   382.989

(ZAFF) library was created to store these FF parameters in a convenient way, so as to allow later use with metalloproteins other than those used in the parameterization.

This work can have many uses in the future. Mainly, the equilibrium bond lengths and angles can be used to aid the refinement of Zn metalloprotein X-ray crystal structures. Also, the MCPB program allows for rapid development, limited by the cost of the ab initio or DFT calculations, of FF parameters for metalloproteins, which could have many uses in drug design projects, for example where the target structure contains a metal ion. This program also provides a platform where non-expert users can develop metalloprotein FF parameters, which until now was not available.


Table 5-31. Histidine and Aspartate Residue Charges using ChgModB for the Zn-HHHD and -HHDD Clusters. Partial Charges are in electron units.

Residue       CB          CG          ND1         CD2
HIE           -0.007400    0.186800   -0.54320    -0.220700
HID           -0.046200   -0.026600   -0.38110     0.129200
HHHD(HID1)     0.042591   -0.024504   -0.072385   -0.137672
HHHD(HID2)     0.160366   -0.006538   -0.161147   -0.192906
HHHD(HIE)      0.374184   -0.005512   -0.148399   -0.245743
HHDD(HIE)      0.028284    0.076696   -0.306581   -0.169575

Residue       CE1         NE2         HB2/3       HD1
HIE            0.163500   -0.279500    0.036700
HID            0.205700   -0.572700    0.040200    0.364900
HHHD(HID1)    -0.147686   -0.122517    0.047948    0.313389
HHHD(HID2)    -0.066439   -0.063515    0.007072    0.343690
HHHD(HIE)     -0.066439   -0.038235   -0.074862
HHDD(HIE)      0.030139   -0.137031    0.039508

Residue       HE1         HE2         HD2         ZN
HIE            0.143500    0.333900    0.186200
HID            0.139200                0.114700
HHHD(HID1)     0.193188                0.193977    0.431980
HHHD(HID2)     0.179092                0.195708
HHHD(HIE)      0.179092    0.327123    0.195982
HHDD(HIE)      0.124130    0.309302    0.165779    0.919685

Residue       CB          CG          OD1/2       HB2/3
ASP           -0.030300    0.799400   -0.801400   -0.012200
HHHD           0.441335    0.373495   -0.493992   -0.112244
HHDD           0.202830    0.429585   -0.607229    0.050221


CHAPTER 6
CONCLUSIONS

This chapter provides a synopsis of the research presented in this dissertation where computational chemistry tools were successfully developed and applied in areas of drug design and metalloprotein modeling.

Chapter two outlined the drug discovery process from the point of view of a computational chemist. The most common methods used to predict the binding free energy between receptors and their ligands were summarized, including ligand-based and receptor-based techniques. Statistical and graph theory methods were illustrated and the use of these tools in chemistry was discussed. A general introduction to metalloproteins was also given.

The third chapter described the design and development of a computational chemistry package called MTK++. This was designed as a general purpose molecular modeling suite for use in drug design and metalloprotein chemistry studies. This work was essential for the research in the entire thesis to be carried out. Also, MTK++ provides a platform for further development of novel algorithms in modeling of small molecules, proteins, and most importantly metalloproteins.

Chapter four details the development and large scale validation of a semiflexible alignment approach using a semiempirical scoring function called CuTieP for in silico drug design. Results were comparable to those of empirical alignment approaches where ligand binding geometries within protein active sites were predicted with an accuracy of around 50%. CuTieP is a physics-based method which has potential to be improved on in terms of speed and accuracy while avoiding the pitfalls of parameter transferability.

The final research chapter described the design, development, and implementation of two programs called pdbSearcher and MCPB for the study of metalloproteins. These programs were used to data mine the Protein Data Bank and develop molecular mechanics force fields for Zinc metalloproteins. A total of 10 unique tetrahedral Zn force fields were


built and the properties of these parameters were elucidated. These parameters may be used to address important structure, function and dynamics questions that are not currently attainable using QM and QM/MM based methods.


APPENDIX A
ALGORITHMS

A.1 Subgraph Isomorphism Algorithm

Algorithm A.1: Subgraph Isomorphism Algorithm
Data: molecule, fragment
Result: mapping if fragment found in molecule
begin
    int Pa; int Pb;
    array A[Pa,Pa]; array B[Pb,Pb]; array M[Pa,Pb];
    bool isomorphism = false;
    Ullmann(1, M);
end

Algorithm A.2: Ullmann Function
Data: current atom index, match matrix
Result: mapping if fragment found in molecule
begin
    array M1 = M;
    bool mismatch;
    for all unique mappings of atom d do
        choose new unique mapping for query atom d;
        update M accordingly;
        refine(M, mismatch);
        if !mismatch then
            if d == Pa then
                isomorphism = true;
                store M;
            else
                Ullmann(d+1, M);
            end
        else
            M = M1;
        end
    end
end


Algorithm A.3: Refine Function
Data: match matrix, boolean mismatch
Result: mapping if fragment found in molecule
begin
    mismatch = false;
    bool change = false;
    while not change or mismatch do
        for i ∈ Pa do
            for j ∈ Pb do
                if M[i][j] then
                    assign mij;
                    change = change or not mij;
                end
            end
        end
        assign mismatch;
    end
end
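Algorithms A.1 to A.3 can be sketched as runnable Python. This is a minimal illustration, not the MTK++ implementation. It assumes unlabeled graphs given as 0/1 adjacency matrices (`A` for the fragment, `B` for the molecule), uses a simple degree test to initialise the match matrix, and searches for non-induced matches, meaning extra bonds in the molecule are allowed. The function names `refine` and `subgraph_isomorphisms` are chosen here for illustration.

```python
def refine(M, A, B):
    """Prune candidate pairs (i, j) until nothing changes.

    Returns False (a mismatch) when some query atom has no candidate left.
    """
    pa, pb = len(A), len(B)
    changed = True
    while changed:
        changed = False
        for i in range(pa):
            for j in range(pb):
                if not M[i][j]:
                    continue
                # Every neighbour x of query atom i needs a candidate y bonded to target atom j.
                for x in range(pa):
                    if A[i][x] and not any(B[j][y] and M[x][y] for y in range(pb)):
                        M[i][j] = 0
                        changed = True
                        break
            if not any(M[i]):
                return False
    return True


def subgraph_isomorphisms(A, B):
    """Return every mapping (list: query atom index -> target atom index) of A into B."""
    pa, pb = len(A), len(B)
    deg_a = [sum(row) for row in A]
    deg_b = [sum(row) for row in B]
    # Initial match matrix: atom j can host atom i only if it has at least as many bonds.
    M0 = [[1 if deg_b[j] >= deg_a[i] else 0 for j in range(pb)] for i in range(pa)]
    found = []

    def ullmann(d, M):
        for j in range(pb):
            if not M[d][j]:
                continue
            trial = [row[:] for row in M]          # copy, so backtracking restores M
            trial[d] = [1 if y == j else 0 for y in range(pb)]
            for i in range(d + 1, pa):
                trial[i][j] = 0                    # target atom j is now taken
            if refine(trial, A, B):
                if d == pa - 1:
                    found.append([row.index(1) for row in trial])
                else:
                    ullmann(d + 1, trial)

    if pa and refine(M0, A, B):
        ullmann(0, M0)
    return found


# A three-atom chain (fragment) searched for in a four-atom chain (molecule).
fragment = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
molecule = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
mappings = subgraph_isomorphisms(fragment, molecule)
print(mappings)
```

Copying the match matrix before each trial assignment plays the role of `M = M1` in Algorithm A.2. A real chemistry application would also compare element and bond types when building the initial matrix.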


A.2 Maximum Common Pharmacophore

Algorithm A.4: Find Maximum Common Pharmacophore (MCP)
Data: Two Molecules
Result: Maximum Common Pharmacophore Between the Two Molecules
begin
    Generate Feature Correspondence Matrix between the two molecules;
    Get Threshold Feature Score, TFS;
    bestClqScore = 0;
    bestClqSize = 0;
    for i ∈ CorrespondenceMatrix do
        getPair ← 1;
        curClq ← i;
        curClqScore ← 0;
        while getPair do
            getPair ← 0;
            maxScore ← 0;
            pair ← 0;
            for j ∈ CorrespondenceMatrix do
                testScore ← 0;
                for k ∈ curClq do
                    jkScore = exp(-((d_j - d_k)/d_m)^2);
                    if jkScore > TFS then
                        testScore += jkScore
                    else
                        break;
                    end
                end
                if testScore > maxScore then
                    Pair ← j;
                    maxScore ← testScore;
                end
            end
            if Pair then
                Add Pair to curClq;
                curClqScore += maxScore;
                getPair ← 1;
            end
        end
        store ← 0;
        if curClqScore > bestClqScore + 0.1 then
            store = 1;
        end
        if curClqScore > bestClqScore - 0.1 then
            if curClqSize > bestSize then
                store = 1;
            end
            if curClqSize == bestSize then
                if curClqDist > bestDist + 0.1 then
                    store = 1;
                end
            end
        end
        if store then
            bestClqScore = curClqScore;
            bestClqSize = curClqSize;
            bestClqDist = curClqDist;
            bestClq ← curClq;
        end
    end
    return bestClq
end
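The greedy clique growth at the core of Algorithm A.4 can be sketched as follows. Several points are assumptions made only for this illustration. A correspondence is taken to be a pair (feature index in molecule A, feature index in molecule B), with every feature allowed to pair with every other. The quantities d_j and d_k are read as the matching inter-feature distances in the two molecules. A candidate is rejected outright if it scores at or below the threshold against any clique member. The final tie-breaking on clique size and distance is reduced to "highest score wins". All function names are hypothetical.

```python
import math


def pair_score(d_a, d_b, d_m):
    """Gaussian-like agreement between two distances, exp(-((d_a - d_b)/d_m)^2)."""
    return math.exp(-((d_a - d_b) / d_m) ** 2)


def grow_clique(seed, pairs, dist_a, dist_b, tfs, d_m):
    """Greedily extend a clique of correspondences starting from `seed`."""
    clique, clique_score = [seed], 0.0
    while True:
        best_pair, best_score = None, 0.0
        for cand in pairs:
            # Assumption: each feature may be used only once in a clique.
            if any(cand[0] == c[0] or cand[1] == c[1] for c in clique):
                continue
            test_score = 0.0
            for c in clique:
                s = pair_score(dist_a[cand[0]][c[0]], dist_b[cand[1]][c[1]], d_m)
                if s <= tfs:
                    break                      # incompatible with a clique member
                test_score += s
            else:
                if test_score > best_score:
                    best_pair, best_score = cand, test_score
        if best_pair is None:
            return clique, clique_score
        clique.append(best_pair)
        clique_score += best_score


def maximum_common_pharmacophore(dist_a, dist_b, tfs=0.5, d_m=1.0):
    """Try every correspondence as a seed and keep the highest-scoring clique."""
    pairs = [(a, b) for a in range(len(dist_a)) for b in range(len(dist_b))]
    best, best_score = [], -1.0
    for seed in pairs:
        clique, score = grow_clique(seed, pairs, dist_a, dist_b, tfs, d_m)
        if score > best_score:
            best, best_score = clique, score
    return best, best_score


def distance_matrix(positions):
    """Pairwise distances for features placed on a line (toy data only)."""
    return [[abs(p - q) for q in positions] for p in positions]


dist_a = distance_matrix([0.0, 3.0, 7.0])
dist_b = distance_matrix([0.0, 3.0, 7.2, 20.0])
best_clique, best_clique_score = maximum_common_pharmacophore(dist_a, dist_b)
print(sorted(best_clique), round(best_clique_score, 4))
```

In this toy case the three features of the first molecule line up with the first three features of the second, and the distant fourth feature is left out.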


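APPENDIX B
AMBER GRADIENTS

The uncompressed AMBER energy function has the following form:

\[ E_{total} = \sum_{bonds} K_r (r_{ij} - r_{eq})^2 + \sum_{angles} K_\theta (\theta_{ijk} - \theta_{eq})^2 + \sum_{dihedrals} \sum_n V_n \left[ 1 + \cos(n\phi_{ijkl} - \gamma) \right] + \sum_{i \ldots} \ldots \]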

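where \hat{x} is a unit vector parallel to the x axis of the reference coordinate system.

We want to calculate \partial E/\partial x_i, where x_i is the x coordinate of atom i. It is best to calculate \partial E/\partial\theta using the chain rule, where \theta is the internal coordinate.

\[ \frac{\partial E}{\partial x_i} = \frac{\partial E}{\partial\theta} \frac{\partial\theta}{\partial x_i} \tag{B-8} \]
\[ \frac{d\theta}{d\cos\theta} = \left( \frac{d\cos\theta}{d\theta} \right)^{-1} \tag{B-9} \]
\[ = -\frac{1}{\sin\theta} \tag{B-10} \]

B.2 AMBER First Derivatives

B.2.1 Bond

\[ E_{bond} = K_r (r_{ij} - r_{eq})^2 \tag{B-11} \]

where K_r is the bond force constant, r_{ij} is the bond length, and r_{eq} is the standard bond length.

\[ \nabla_i E = \frac{\partial E}{\partial r} \nabla_i r \tag{B-12} \]
\[ = 2 K_r (r_{ij} - r_{eq}) \frac{\mathbf{r}_{ij}}{|\mathbf{r}_{ij}|} \tag{B-13} \]
\[ \frac{\partial r}{\partial x_i} = \frac{1}{2} \left( (x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2 \right)^{-1/2} \, 2 (x_i - x_j) \tag{B-14} \]
\[ = \frac{1}{r} (x_i - x_j) \tag{B-15} \]
\[ \nabla_i r = \frac{\mathbf{r}_{ij}}{|\mathbf{r}_{ij}|} \tag{B-16} \]
\[ \nabla_j r = -\nabla_i r \tag{B-17} \]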
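The bond term and its gradient (Eqs. B-11, B-13, and B-17) can be checked numerically with a short Python sketch. The function names are chosen here for illustration and are not part of MTK++ or AMBER. The parameters are the Zn-OW values of the water-bound Zn-HHHO cluster from Table 5-21, and the coordinates are arbitrary test values.

```python
import math


def bond_energy_and_gradient(r_i, r_j, k_r, r_eq):
    """Harmonic bond energy (Eq. B-11) and gradients on atoms i and j (Eqs. B-13, B-17)."""
    r_ij = [a - b for a, b in zip(r_i, r_j)]        # r_ij = r_i - r_j
    r = math.sqrt(sum(c * c for c in r_ij))         # bond length |r_ij|
    energy = k_r * (r - r_eq) ** 2
    de_dr = 2.0 * k_r * (r - r_eq)                  # dE/dr
    grad_i = [de_dr * c / r for c in r_ij]          # dE/dr * r_ij / |r_ij|
    grad_j = [-g for g in grad_i]                   # grad_j r = -grad_i r
    return energy, grad_i, grad_j


def finite_difference_gradient(func, coords, h=1.0e-6):
    """Central-difference gradient of a scalar function of a flat coordinate list."""
    grad = []
    for n in range(len(coords)):
        plus, minus = list(coords), list(coords)
        plus[n] += h
        minus[n] -= h
        grad.append((func(plus) - func(minus)) / (2.0 * h))
    return grad


# Zn-OW bond of the water-bound Zn-HHHO cluster (Table 5-21); test coordinates in Angstrom.
K_R, R_EQ = 120.287, 2.1122
zn, ow = [0.0, 0.0, 0.0], [1.9, 0.8, -0.5]
bond_energy, g_zn, g_ow = bond_energy_and_gradient(zn, ow, K_R, R_EQ)
numeric_bond_gradient = finite_difference_gradient(
    lambda x: bond_energy_and_gradient(x[:3], x[3:], K_R, R_EQ)[0], zn + ow)
print(bond_energy, g_zn, g_ow)
```

Comparing analytic gradients against central differences in this way is a common sanity check before a derivative routine is used in minimization or dynamics.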


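B.2.2 Angle

\[ E_{angle} = K_\theta (\theta - \theta_{eq})^2 \tag{B-18} \]

where K_\theta is the angle force constant, \theta is the angle (0 \le \theta \le \pi), and \theta_{eq} is the standard angle.

\[ \theta = \arccos \left( \frac{\mathbf{r}_{ij} \cdot \mathbf{r}_{kj}}{|\mathbf{r}_{ij}| |\mathbf{r}_{kj}|} \right) \tag{B-19} \]
\[ \cos\theta = \frac{\mathbf{r}_{ij} \cdot \mathbf{r}_{kj}}{|\mathbf{r}_{ij}| |\mathbf{r}_{kj}|} \tag{B-20} \]
\[ \nabla_i E = \frac{\partial E}{\partial\theta} \frac{\partial\theta}{\partial\cos\theta} \nabla_i \cos\theta \tag{B-21} \]
\[ = 2 K_\theta (\theta - \theta_{eq}) \left( -\frac{1}{\sin\theta} \right) \nabla_i \cos\theta \tag{B-22} \]

How to determine \nabla_i \cos\theta?

Considering that \cos\theta is a function of \mathbf{r}_{ij} and \mathbf{r}_{kj}, both of which are functions of \mathbf{r}_i, \mathbf{r}_j and \mathbf{r}_k, you need to use the chain rule:

\[ \frac{\partial\cos\theta}{\partial x_i} = \hat{x} \cdot \nabla_i \cos\theta \tag{B-24} \]
\[ = \frac{\partial\cos\theta}{\partial x_{ij}} \frac{\partial x_{ij}}{\partial x_i} + \frac{\partial\cos\theta}{\partial y_{ij}} \frac{\partial y_{ij}}{\partial x_i} + \frac{\partial\cos\theta}{\partial z_{ij}} \frac{\partial z_{ij}}{\partial x_i} + \frac{\partial\cos\theta}{\partial x_{kj}} \frac{\partial x_{kj}}{\partial x_i} + \frac{\partial\cos\theta}{\partial y_{kj}} \frac{\partial y_{kj}}{\partial x_i} + \frac{\partial\cos\theta}{\partial z_{kj}} \frac{\partial z_{kj}}{\partial x_i} \tag{B-25} \]
\[ = \frac{\partial\cos\theta}{\partial x_{ij}} \tag{B-26} \]

There are a total of 9 such expressions similar to Eq. B-26, which lead to the following: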


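\[ \nabla_i \cos\theta = \hat{x} \frac{\partial\cos\theta}{\partial x_{ij}} + \hat{y} \frac{\partial\cos\theta}{\partial y_{ij}} + \hat{z} \frac{\partial\cos\theta}{\partial z_{ij}} \tag{B-27} \]
\[ \nabla_j \cos\theta = -\hat{x} \left( \frac{\partial\cos\theta}{\partial x_{ij}} + \frac{\partial\cos\theta}{\partial x_{kj}} \right) - \hat{y} \left( \frac{\partial\cos\theta}{\partial y_{ij}} + \frac{\partial\cos\theta}{\partial y_{kj}} \right) - \hat{z} \left( \frac{\partial\cos\theta}{\partial z_{ij}} + \frac{\partial\cos\theta}{\partial z_{kj}} \right) \tag{B-28} \]
\[ \nabla_k \cos\theta = \hat{x} \frac{\partial\cos\theta}{\partial x_{kj}} + \hat{y} \frac{\partial\cos\theta}{\partial y_{kj}} + \hat{z} \frac{\partial\cos\theta}{\partial z_{kj}} \tag{B-29} \]

The minus signs in Eq. B-28 arise because \mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j and \mathbf{r}_{kj} = \mathbf{r}_k - \mathbf{r}_j both decrease when atom j is displaced.

Finally, how do you determine d\cos\theta/dx_{ij}? In the steps below, a vector on the right-hand side stands for its x component.

\[ \cos\theta = \frac{\mathbf{r}_{ij} \cdot \mathbf{r}_{kj}}{|\mathbf{r}_{ij}| |\mathbf{r}_{kj}|} \tag{B-30} \]
\[ \frac{\partial\cos\theta}{\partial x_{ij}} = \frac{ |\mathbf{r}_{ij}| |\mathbf{r}_{kj}| \frac{d(\mathbf{r}_{ij} \cdot \mathbf{r}_{kj})}{dx_{ij}} - (\mathbf{r}_{ij} \cdot \mathbf{r}_{kj}) \frac{d(|\mathbf{r}_{ij}| |\mathbf{r}_{kj}|)}{dx_{ij}} }{ |\mathbf{r}_{ij}|^2 |\mathbf{r}_{kj}|^2 } \]
\[ = \frac{ |\mathbf{r}_{ij}| |\mathbf{r}_{kj}| \, \mathbf{r}_{kj} - (\mathbf{r}_{ij} \cdot \mathbf{r}_{kj}) \frac{\mathbf{r}_{ij}}{|\mathbf{r}_{ij}|} |\mathbf{r}_{kj}| }{ |\mathbf{r}_{ij}|^2 |\mathbf{r}_{kj}|^2 } \]
\[ = \frac{\mathbf{r}_{kj}}{|\mathbf{r}_{ij}| |\mathbf{r}_{kj}|} - \frac{ (\mathbf{r}_{ij} \cdot \mathbf{r}_{kj}) \frac{\mathbf{r}_{ij}}{|\mathbf{r}_{ij}|} }{ |\mathbf{r}_{ij}|^2 |\mathbf{r}_{kj}| } \]
\[ = \frac{1}{|\mathbf{r}_{ij}|} \left[ \frac{\mathbf{r}_{kj}}{|\mathbf{r}_{kj}|} - \frac{\mathbf{r}_{ij} \cdot \mathbf{r}_{kj}}{|\mathbf{r}_{ij}| |\mathbf{r}_{kj}|} \frac{\mathbf{r}_{ij}}{|\mathbf{r}_{ij}|} \right] \]
\[ = \frac{1}{|\mathbf{r}_{ij}|} \left[ \hat{r}_{kj} - \cos\theta \, \hat{r}_{ij} \right] \tag{B-31} \]
\[ \frac{\partial\cos\theta}{\partial x_{kj}} = \frac{1}{|\mathbf{r}_{kj}|} \left[ \hat{r}_{ij} - \cos\theta \, \hat{r}_{kj} \right] \tag{B-32} \]

B.2.3 Dihedral

\[ E_{dihedral} = K_\phi \left[ 1 + \cos(n\phi - \gamma) \right] \tag{B-33} \]

where K_\phi is the torsional constant, \phi is the dihedral angle (0 \le \phi \le \pi), and \gamma is the standard dihedral angle.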
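Before continuing with the dihedral term, the angle-bending gradient of Section B.2.2 (Eqs. B-22, B-31, and B-32) can be sketched in Python and compared with central differences. The function names are illustrative only. The force constant and equilibrium angle are the N-Zn-N means from Table 5-16, and the coordinates are arbitrary test values with Zn placed at the origin.

```python
import math


def angle_energy_and_gradient(r_i, r_j, r_k, k_theta, theta_eq):
    """Harmonic angle energy (Eq. B-18) and gradients on atoms i, j, k; j is the central atom."""
    r_ij = [a - b for a, b in zip(r_i, r_j)]
    r_kj = [a - b for a, b in zip(r_k, r_j)]
    n_ij = math.sqrt(sum(c * c for c in r_ij))
    n_kj = math.sqrt(sum(c * c for c in r_kj))
    u_ij = [c / n_ij for c in r_ij]                                 # unit vector along r_ij
    u_kj = [c / n_kj for c in r_kj]                                 # unit vector along r_kj
    cos_t = sum(a * b for a, b in zip(u_ij, u_kj))
    theta = math.acos(max(-1.0, min(1.0, cos_t)))
    energy = k_theta * (theta - theta_eq) ** 2
    # dE/dtheta * dtheta/dcos(theta), Eq. B-22; singular for linear angles (sin(theta) = 0).
    de_dcos = 2.0 * k_theta * (theta - theta_eq) * (-1.0 / math.sin(theta))
    dcos_i = [(b - cos_t * a) / n_ij for a, b in zip(u_ij, u_kj)]   # Eq. B-31
    dcos_k = [(a - cos_t * b) / n_kj for a, b in zip(u_ij, u_kj)]   # Eq. B-32
    grad_i = [de_dcos * c for c in dcos_i]
    grad_k = [de_dcos * c for c in dcos_k]
    grad_j = [-(a + b) for a, b in zip(grad_i, grad_k)]             # Eq. B-28
    return energy, grad_i, grad_j, grad_k


def finite_difference_gradient(func, coords, h=1.0e-6):
    """Central-difference gradient of a scalar function of a flat coordinate list."""
    grad = []
    for n in range(len(coords)):
        plus, minus = list(coords), list(coords)
        plus[n] += h
        minus[n] -= h
        grad.append((func(plus) - func(minus)) / (2.0 * h))
    return grad


# Mean N-Zn-N parameters of the Zn-HHHH cluster (Table 5-16); angle converted to radians.
K_THETA, THETA_EQ = 6.088, math.radians(109.481)
n1, zn, n2 = [2.0, 0.1, 0.0], [0.0, 0.0, 0.0], [-0.5, 1.9, 0.3]
angle_energy, g_n1, g_zn_angle, g_n2 = angle_energy_and_gradient(n1, zn, n2, K_THETA, THETA_EQ)
numeric_angle_gradient = finite_difference_gradient(
    lambda x: angle_energy_and_gradient(x[0:3], x[3:6], x[6:9], K_THETA, THETA_EQ)[0],
    n1 + zn + n2)
print(angle_energy, g_n1, g_zn_angle, g_n2)
```

The three gradients sum to zero, as expected for an energy that depends only on relative positions.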


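\[ \cos\phi = \frac{\mathbf{t} \cdot \mathbf{u}}{|\mathbf{t}| |\mathbf{u}|} \tag{B-34} \]

where:

\[ \mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j, \qquad \mathbf{r}_{kj} = \mathbf{r}_k - \mathbf{r}_j, \qquad \mathbf{r}_{lk} = \mathbf{r}_l - \mathbf{r}_k \]
\[ \mathbf{t} = \mathbf{r}_{ij} \times \mathbf{r}_{kj} \tag{B-35} \]
\[ \mathbf{u} = \mathbf{r}_{lk} \times \mathbf{r}_{kj} \tag{B-36} \]
\[ \nabla_i E = \frac{\partial E}{\partial\phi} \frac{\partial\phi}{\partial\cos\phi} \nabla_i \cos\phi \tag{B-37} \]
\[ = -n K_\phi \sin(n\phi - \gamma) \left( -\frac{1}{\sin\phi} \right) \nabla_i \cos\phi \tag{B-38} \]

How to determine \nabla_i \cos\phi?

Considering that \cos\phi_{ijkl} is a function of \mathbf{t} and \mathbf{u}, both of which are functions of \mathbf{r}_{ij}, \mathbf{r}_{kj}, and \mathbf{r}_{lk}, which are in turn functions of \mathbf{r}_i, \mathbf{r}_j, \mathbf{r}_k, and \mathbf{r}_l, you need to use the chain rule:

\[ \frac{\partial\cos\phi}{\partial x_i} = \hat{x} \cdot \nabla_i \cos\phi \tag{B-39} \]
\[ = \frac{\partial\cos\phi}{\partial t_x} \frac{\partial t_x}{\partial x_i} + \frac{\partial\cos\phi}{\partial t_y} \frac{\partial t_y}{\partial x_i} + \frac{\partial\cos\phi}{\partial t_z} \frac{\partial t_z}{\partial x_i} + \frac{\partial\cos\phi}{\partial u_x} \frac{\partial u_x}{\partial x_i} + \frac{\partial\cos\phi}{\partial u_y} \frac{\partial u_y}{\partial x_i} + \frac{\partial\cos\phi}{\partial u_z} \frac{\partial u_z}{\partial x_i} \tag{B-40} \]
\[ = \frac{\partial\cos\phi}{\partial t_y} (-z_{kj}) + \frac{\partial\cos\phi}{\partial t_z} (y_{kj}) \tag{B-41} \]

where: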


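\[ \mathbf{t} = (\mathbf{r}_i - \mathbf{r}_j) \times (\mathbf{r}_k - \mathbf{r}_j) \tag{B-42} \]
\[ = \left( (y_{ij} z_{kj} - y_{kj} z_{ij}), \; (z_{ij} x_{kj} - z_{kj} x_{ij}), \; (x_{ij} y_{kj} - x_{kj} y_{ij}) \right) \tag{B-43} \]
\[ t_x = (y_{ij} z_{kj} - y_{kj} z_{ij}) \tag{B-44} \]
\[ \frac{\partial t_x}{\partial x_i} = \frac{\partial (y_{ij} z_{kj} - y_{kj} z_{ij})}{\partial x_i} = 0 \tag{B-45} \]
\[ \frac{\partial t_y}{\partial x_i} = -z_{kj} \tag{B-46} \]

There are a total of 72 such expressions similar to Eq. B-46.

Determine \partial\cos\phi/\partial t_x (result from the angle derivation):

\[ \frac{\partial\cos\phi}{\partial t_x} = \frac{1}{|\mathbf{t}|} \left[ \frac{u_x}{|\mathbf{u}|} - \cos\phi \, \frac{t_x}{|\mathbf{t}|} \right] \tag{B-47} \]

However, there is a problem with this result due to the 1/\sin\phi in Eq. B-38. There are singularities when \phi = 0, \pi. Therefore, \partial\cos\phi/\partial t_x needs to be rewritten.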


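\[ \nabla_t \cos\phi = \nabla_t \left( \frac{\mathbf{t} \cdot \mathbf{u}}{|\mathbf{t}| |\mathbf{u}|} \right) \tag{B-48} \]
\[ = \frac{1}{|\mathbf{t}| |\mathbf{u}|} \nabla_t (\mathbf{t} \cdot \mathbf{u}) + \frac{\mathbf{t} \cdot \mathbf{u}}{|\mathbf{u}|} \nabla_t \left( \frac{1}{|\mathbf{t}|} \right) \tag{B-49} \]
\[ = \frac{\mathbf{u}}{|\mathbf{t}| |\mathbf{u}|} - \frac{\mathbf{t} \cdot \mathbf{u}}{|\mathbf{t}|^2 |\mathbf{u}|} \hat{t} \tag{B-50} \]
\[ = \frac{1}{|\mathbf{t}|} \left[ \hat{u} - (\hat{t} \cdot \hat{u}) \hat{t} \right] \tag{B-51} \]
\[ = \frac{1}{|\mathbf{t}|} \left[ (\hat{t} \cdot \hat{t}) \hat{u} - (\hat{t} \cdot \hat{u}) \hat{t} \right] \tag{B-52} \]
\[ = \frac{1}{|\mathbf{t}|} \left[ \hat{t} \times (\hat{u} \times \hat{t}) \right] \tag{B-53} \]
\[ = \frac{1}{|\mathbf{t}|} \left[ \hat{t} \times (-\sin\phi \, \hat{r}_{kj}) \right] \tag{B-54} \]

Multiplying by \partial\phi/\partial\cos\phi = -1/\sin\phi (Eq. B-10) cancels the \sin\phi factor and removes the singularity:

\[ \nabla_t \phi = \frac{1}{|\mathbf{t}|} \left[ \hat{t} \times \hat{r}_{kj} \right] \tag{B-55} \]

Similarly, for \mathbf{u}:

\[ \nabla_u \cos\phi = \frac{1}{|\mathbf{u}|} \left[ \hat{u} \times (\sin\phi \, \hat{r}_{kj}) \right] \tag{B-56} \]
\[ \nabla_u \phi = -\frac{1}{|\mathbf{u}|} \left[ \hat{u} \times \hat{r}_{kj} \right] \tag{B-57} \]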
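The first part of this derivation can be checked numerically. The sketch below compares the projected form of Eq. B-51 with the double cross product of Eq. B-53 and with a central-difference derivative of cos φ with respect to the components of t. It deliberately stops before Eq. B-54, so it makes no assumption about the sign convention of φ. The helper names and the four test positions are illustrative only.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]


def cos_phi(t, u):
    """Eq. B-34: cosine of the dihedral angle from the two plane normals t and u."""
    return dot(unit(t), unit(u))


def grad_t_cos_phi(t, u):
    """Eq. B-51: (1/|t|) [u_hat - (t_hat . u_hat) t_hat]."""
    t_hat, u_hat = unit(t), unit(u)
    c = dot(t_hat, u_hat)
    n_t = math.sqrt(dot(t, t))
    return [(uh - c * th) / n_t for th, uh in zip(t_hat, u_hat)]


def grad_t_cos_phi_cross(t, u):
    """Eq. B-53: (1/|t|) [t_hat x (u_hat x t_hat)]."""
    t_hat, u_hat = unit(t), unit(u)
    n_t = math.sqrt(dot(t, t))
    return [x / n_t for x in cross(t_hat, cross(u_hat, t_hat))]


# Four arbitrary atom positions i, j, k, l.
r_i, r_j, r_k, r_l = [1.0, 0.2, 0.0], [0.0, 0.0, 0.0], [0.0, 1.5, 0.0], [-0.7, 1.8, 0.9]
r_ij = [a - b for a, b in zip(r_i, r_j)]
r_kj = [a - b for a, b in zip(r_k, r_j)]
r_lk = [a - b for a, b in zip(r_l, r_k)]
t_vec = cross(r_ij, r_kj)   # Eq. B-35
u_vec = cross(r_lk, r_kj)   # Eq. B-36

analytic = grad_t_cos_phi(t_vec, u_vec)
analytic_cross = grad_t_cos_phi_cross(t_vec, u_vec)

h = 1.0e-6
numeric_t_gradient = []
for n in range(3):
    plus, minus = list(t_vec), list(t_vec)
    plus[n] += h
    minus[n] -= h
    numeric_t_gradient.append((cos_phi(plus, u_vec) - cos_phi(minus, u_vec)) / (2.0 * h))
print(analytic, analytic_cross, numeric_t_gradient)
```

The gradient is perpendicular to t, which reflects the fact that rescaling t leaves cos φ unchanged.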


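Finally,

\[ \nabla_i \phi = \hat{x} \left[ \frac{\partial\phi}{\partial t_y} (-z_{kj}) + \frac{\partial\phi}{\partial t_z} (y_{kj}) \right] + \hat{y} \left[ \frac{\partial\phi}{\partial t_x} (z_{kj}) + \frac{\partial\phi}{\partial t_z} (-x_{kj}) \right] + \hat{z} \left[ \frac{\partial\phi}{\partial t_x} (-y_{kj}) + \frac{\partial\phi}{\partial t_y} (x_{kj}) \right] \tag{B-58} \]

\[ \nabla_j \phi = \hat{x} \left[ \frac{\partial\phi}{\partial t_y} (z_{kj} - z_{ij}) + \frac{\partial\phi}{\partial t_z} (y_{ij} - y_{kj}) + \frac{\partial\phi}{\partial u_y} (-z_{lk}) + \frac{\partial\phi}{\partial u_z} (y_{lk}) \right] \]
\[ \quad + \hat{y} \left[ \frac{\partial\phi}{\partial t_x} (z_{ij} - z_{kj}) + \frac{\partial\phi}{\partial t_z} (x_{kj} - x_{ij}) + \frac{\partial\phi}{\partial u_x} (z_{lk}) + \frac{\partial\phi}{\partial u_z} (-x_{lk}) \right] \]
\[ \quad + \hat{z} \left[ \frac{\partial\phi}{\partial t_x} (y_{kj} - y_{ij}) + \frac{\partial\phi}{\partial t_y} (x_{ij} - x_{kj}) + \frac{\partial\phi}{\partial u_x} (-y_{lk}) + \frac{\partial\phi}{\partial u_y} (x_{lk}) \right] \tag{B-59} \]

\[ \nabla_k \phi = \hat{x} \left[ \frac{\partial\phi}{\partial t_y} (z_{ij}) + \frac{\partial\phi}{\partial t_z} (-y_{ij}) + \frac{\partial\phi}{\partial u_y} (z_{lk} + z_{kj}) + \frac{\partial\phi}{\partial u_z} (-y_{kj} - y_{lk}) \right] \]
\[ \quad + \hat{y} \left[ \frac{\partial\phi}{\partial t_x} (-z_{ij}) + \frac{\partial\phi}{\partial t_z} (x_{ij}) + \frac{\partial\phi}{\partial u_x} (-z_{kj} - z_{lk}) + \frac{\partial\phi}{\partial u_z} (x_{lk} + x_{kj}) \right] \]
\[ \quad + \hat{z} \left[ \frac{\partial\phi}{\partial t_x} (y_{ij}) + \frac{\partial\phi}{\partial t_y} (-x_{ij}) + \frac{\partial\phi}{\partial u_x} (y_{lk} + y_{kj}) + \frac{\partial\phi}{\partial u_y} (-x_{kj} - x_{lk}) \right] \tag{B-60} \]

\[ \nabla_l \phi = \hat{x} \left[ \frac{\partial\phi}{\partial u_y} (-z_{kj}) + \frac{\partial\phi}{\partial u_z} (y_{kj}) \right] + \hat{y} \left[ \frac{\partial\phi}{\partial u_x} (z_{kj}) + \frac{\partial\phi}{\partial u_z} (-x_{kj}) \right] + \hat{z} \left[ \frac{\partial\phi}{\partial u_x} (-y_{kj}) + \frac{\partial\phi}{\partial u_y} (x_{kj}) \right] \tag{B-61} \]

B.2.4 Electrostatic

\[ E_{ele} = \frac{q_i q_j}{\varepsilon r_{ij}} \tag{B-62} \]
\[ \frac{\partial E}{\partial r_{ij}} = -\frac{q_i q_j}{\varepsilon r_{ij}^2} \tag{B-63} \]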


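B.2.5 van der Waals

\[ E_{vdw} = \varepsilon_{ij} \left[ \left( \frac{r^{*}_{ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{r^{*}_{ij}}{r_{ij}} \right)^{6} \right] \tag{B-64} \]
\[ = \varepsilon_{ij} \left[ r^{*12}_{ij} \frac{1}{r^{12}_{ij}} - 2 r^{*6}_{ij} \frac{1}{r^{6}_{ij}} \right] \tag{B-65} \]
\[ \frac{\partial E}{\partial r_{ij}} = \varepsilon_{ij} \left[ -12 \frac{r^{*12}_{ij}}{r^{13}_{ij}} + 12 \frac{r^{*6}_{ij}}{r^{7}_{ij}} \right] \tag{B-66} \]
\[ = \varepsilon_{ij} \left[ -\frac{12}{r_{ij}} \frac{r^{*12}_{ij}}{r^{12}_{ij}} + \frac{12}{r_{ij}} \frac{r^{*6}_{ij}}{r^{6}_{ij}} \right] \tag{B-67} \]
\[ = -\frac{12 \varepsilon_{ij}}{r_{ij}} \left[ \frac{r^{*12}_{ij}}{r^{12}_{ij}} - \frac{r^{*6}_{ij}}{r^{6}_{ij}} \right] \tag{B-68} \]
\[ = -12 \varepsilon_{ij} \left( D^2 - D \right) / r_{ij} \tag{B-69} \]
\[ D = \frac{r^{*6}_{ij}}{r^{6}_{ij}} \tag{B-70} \]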
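The two non-bonded derivatives (Eqs. B-63 and B-69) can be sketched and checked in Python as follows. The function names and the numerical parameters are illustrative values, not parameters from this work. As in Eq. B-62, no unit conversion factor is applied to the Coulomb term.

```python
def vdw_energy_and_derivative(r, eps, r_star):
    """12-6 energy (Eq. B-64) and dE/dr (Eq. B-69), using D = (r*/r)^6 from Eq. B-70."""
    d = (r_star / r) ** 6
    energy = eps * (d * d - 2.0 * d)
    de_dr = -12.0 * eps * (d * d - d) / r
    return energy, de_dr


def coulomb_energy_and_derivative(r, q_i, q_j, dielectric=1.0):
    """Coulomb energy (Eq. B-62) and dE/dr (Eq. B-63)."""
    energy = q_i * q_j / (dielectric * r)
    de_dr = -q_i * q_j / (dielectric * r * r)
    return energy, de_dr


def central_difference(func, x, h=1.0e-6):
    """Numerical derivative of a scalar function of one variable."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


# Illustrative parameters only.
EPS, R_STAR = 0.15, 3.4
e_min, de_min = vdw_energy_and_derivative(R_STAR, EPS, R_STAR)   # at the minimum
e_vdw, de_vdw = vdw_energy_and_derivative(3.1, EPS, R_STAR)
de_vdw_numeric = central_difference(
    lambda r: vdw_energy_and_derivative(r, EPS, R_STAR)[0], 3.1)

e_ele, de_ele = coulomb_energy_and_derivative(2.5, 0.4, -0.8)
de_ele_numeric = central_difference(
    lambda r: coulomb_energy_and_derivative(r, 0.4, -0.8)[0], 2.5)
print(e_min, de_min, de_vdw, de_vdw_numeric, de_ele, de_ele_numeric)
```

Writing the derivative in terms of D means the sixth power is computed once and reused for both the energy and the force, which is the practical benefit of the form in Eq. B-69.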


APPENDIX C
FRAGMENT LIBRARY

C.1 Terminal Fragments

Table C-1. Terminal Fragments. Terminal Fragments have one connection point. [Structure column: drawings not reproduced.]

No.  Name               3L    8L         Chg
1    Methyl             CH3   TF000CH3   0
2    Ethyl              ETH   TF000ETH   0
3    Propyl             PYL   TF000PYL   0
4    Isopropyl          IPL   TF000IPL   0
5    n-butyl            NBL   TF000NBL   0
6    sec-butyl          SBL   TF000SBL   0
7    Isobutyl           IBL   TF000IBL   0
8    tert-butyl         TBL   TF000TBL   0
9    Vinyl              AK1   TF000AK1   0
10   Alkynl             TLK   TF000TLK   0

1    Primary amine      PAM   TF000PAM   0
2    NH3                NH3   TF000NH3   +1
3    NHMe               NHM   TF000NHM   0
4    NHEt               NHE   TF000NHE   0
5    NHPr               NHP   TF000NHP   0
6    NHi-Pr             NHI   TF000NHI   0
7    NMe2               NM2   TF000NM2   0
8    Primary aldimine   PAD   TF000PAD   0
9    Nitrile            NIT   TF000NIT   0

Continued on next page.


Table C-1. (continued)

No.  Name                 3L    8L         Chg
10   Isonitrile           INI   TF000INT   0
11   Azide                AZD   TF000AZD   0
12   Diazonium            DZM   TF000DZM   +1
13   Amidine              AMD   TF000AMD   +1

1    Primary alcohol      POL   TF000POL   0
2    Alcohol              ARA   TF000ARA   0
3    Methyl ether         OME   TF000OME   0
4    Ethyl ether          OET   TF000OET   0
5    Propyl ether         OPR   TF000OPR   0
6    Isopropyl ether      OIP   TF000OIP   0
7    Aldehyde             ALD   TF000ALD   0
8    Carboxylic acid      CAA   TF000CAA   0
9    Carboxylate          CAR   TF000CAR   -1
10   Alpha hydroxy acid   AHA   TF000AHA   0
11   Hydroperoxide        HPO   TF000HPO   0
12   Acyl fluoride        ACF   TF000ACF   0
13   Acyl Chloride        ACC   TF000ACC   0
14   Acyl Bromide         ACB   TF000ACB   0
15   Acyl Iodide          ACI   TF000ACI   0

1    Thiol                RSH   TF000RSH   0

Continued on next page.


Table C-1. (continued)

No.  Name                     3L    8L         Chg
2    Methyl thioether         SME   TF000SET   0
3    Ethyl thioether          SET   TF000SET   0
4    Propyl thioether         SPR   TF000SPR   0
5    Isopropyl thioether      SIP   TF000SIP   0
6    Thioaldehyde             TAD   TF000TAD   0
7    Sulfenic acid fluoride   SAF   TF000SAF   0
8    Sulfenic acid chloride   SAC   TF000SAC   0
9    Sulfenic acid bromide    SAB   TF000SAB   0
10   Sulfenic acid iodide     SAI   TF000SAI   0

1    Fluoride                 FL1   TF000FL1   0
2    Chloride                 CL1   TF000CL1   0
3    Bromide                  BR1   TF000BR1   0
4    Iodide                   IO1   TF000IO1   0
5    Trifluoromethyl          3FM   TF0003FM   0
6    Trichloromethyl          3CM   TF0003CM   0

1    Primary amide            PMD   TF000PMD   0
2    Carboxylic acid azide    CAZ   TF000CAZ   0
3    Hydroxamic acid          HOA   TF000HOA   0
4    Acyl cyanide             ACY   TF000ACY   0
5    Cyanate                  CYN   TF000CYN   0
6    Thiocyanate              TCY   TF000TCY   0
7    Isocyanate               ICY   TF000ICY   0
8    Isothiocyanate           ITC   TF000ITC   0
9    Nitroso                  NRS   TF000NRS   0

Continued on next page.


Table C-1. (continued)

No.  Name                     3L    8L         Chg
10   Nitro                    NRO   TF000NRO   0
11   Nitrite                  NRT   TF000NRT   0
12   Nitrate                  NRA   TF000NRA   0
13   Sulfonic acid            SNA   TF000SNA   0
15   Sulfinic acid            SIA   TF000SIA   0
16   Sulfenic acid            SEA   TF000SEA   0
17   Boronic acid             BOA   TF000BOA   0
18   Thio-carboxylate         TCA   TF000TCA   -1
19   Dithio-carboxylate       DTC   TF000DTC   -1
20   Thio-carboxylic acid     TCO   TF000TCO   0
21   Dithio-carboxylic acid   TCS   TF000TCS   0
22   Sulfonic acid fluoride   SOF   TF000SOF   0
23   Sulfonic acid chloride   SOC   TF000SOC   0
24   Sulfonic acid bromide    SOB   TF000SOB   0
25   Sulfonic acid iodide     SOI   TF000SOI   0
26   Sulfinic acid fluoride   SIF   TF000SIF   0
27   Sulfinic acid chloride   SIC   TF000SIC   0

Continued on next page.


Table C-1. (continued)

No.  Name                    3L    8L         Chg
28   Sulfinic acid bromide   SIB   TF000SIB   0
29   Sulfinic acid iodide    SII   TF000SII   0
30   Phosphonic acid         PPA   TF000PPA   0
31   Phosphoric acid         PP2   TF000PP2   0
32   Phosphate               PP3   TF000PP3   0

Terminal Fragments.


C.2 Two Point Linker Fragments

Table C-2. Two Point Linker Fragments. Two point linker fragments have two connection points. [Structure column: drawings not reproduced.]

No.  Name               3L    8L         Chg
1    Ethyl              LYH   2PL00LYH   0
2    Trans-alkene       AK2   2PL00AK2   0
3    Cis-Alkene         CIS   2PL00CIS   0
4    Geminal-Alkene     AK3   2PL00AK3   0
5    Alkyne             AKY   2PL00AKY   0

1    Sec-amine          SAM   2PL00SAM   0
2    Sec-aldimine       SAD   2PL00SAD   0
3    Prim-ketimine      PKT   2PL00PKT   0
4    Azo                AZO   2PL00AZO   0
5    Carbo-diimide      DII   2PL00DII   0
6    Imidoyl fluoride   IMF   2PL00IMF   0
7    Imidoyl chloride   IMC   2PL00IMC   0
8    Imidoyl bromide    IMB   2PL00IMB   0
9    Imidoyl iodide     III   2PL00III   0

1    Ketone             KET   2PL00KET   0
2    Ethoxy             ETO   2PL00ETO   0

Continued on next page.


Table C-2. (continued)

No.  Name                     3L    8L         Chg
3    Ketene                   KTE   2PL00KTE   0
4    Sec-alcohol              SAL   2PL00SAL   0
5    Enediol                  END   2PL00END   0
6    Ether                    ETR   2PL00ETR   0
7    Peroxide                 PXD   2PL00PXD   0
8    Carboxylic acid ester    CAE   2PL00CAE   0
9    Anhydride                ANH   2PL00ANH   0

1    Thioether                STR   2PL00STR   0
2    Disulfide                DIS   2PL00DIS   0
3    Thioketone               TKT   2PL00TKT   0

1    Sec-amide                AMI   2PL00AMI   0
2    Oxime                    OXM   2PL00OXM   0
3    Alpha amino acid         AAA   2PL00AAA   0
4    Carbamic acid            CBA   2PL00CBA   0
5    Carbamic acid fluoride   CBF   2PL00CBF   0
6    Carbamic acid chloride   CBC   2PL00CBC   0

Continued on next page.


Table C-2. (continued)

No.  Name                           3L    8L         Chg
7    Carbamic acid bromide          CBB   2PL00CBB   0
8    Carbamic acid iodide           CBI   2PL00CBI   0

1    Sulfone                        SO2   2PL00SO2   0
2    Sulfoxide                      SLF   2PL00SLF   0
3    Sulfinic acid ester            SLE   2PL00SLE   0
4    Sulfuric acid ester            SFE   2PL00SFE   0
5    Thiocarboxylic acid ester      TCE   2PL00TCE   0
6    Dithiocarboxylic acid ester    DCE   2PL00DCE   0
7    Thiocarbamic acid fluoride     TBF   2PL00TBF   0
8    Thiocarbamic acid chloride     TBC   2PL00TBC   0
9    Thiocarbamic acid bromide      TBB   2PL00TBB   0
10   Thiocarbamic acid iodide       TBI   2PL00TBI   0
11   Thiocarbamic acid              TBA   2PL00TBA   0
12   Sulfuric acid amide            SAA   2PL00SAA   0

Continued on next page.


Table C-2. (continued)

No.  Name                     3L    8L         Chg
13   Sulfonic acid ester      SAE   2PL00SAE   0
14   Sulfenic acid ester      SEE   2PL00SEE   0
15   Phosphonic acid ester    PP4   TF000PP4   0

Two Point Linker Fragments.


C.3 Three Point Linker Fragments

Table C-3. Three Point Linker Fragments. Three point linker fragments have three connection points.
(Structure drawings not shown.)

No. Name                            3L   8L        Chg
1   Propyl                          ROP  3PL00ROP  0
2   Alkene                          AK4  3PL00AK4  0

1   Tertiary amine                  TTA  3PL00TTA  0
2   Tertiary amine Et               TT1  3PL00TT1  0
3   Secondary ketimine              SKT  3PL00SKT  0

1   Tertiary alcohol                TAL  3PL00TAL  0
2   Enol                            ENL  3PL00ENL  0
3   Hemiacetal                      HEM  3PL00HEM  0

1   Oxime ether                     OXE  3PL00OXE  0
2   N-oxide                         NOX  3PL00NOX  0
3   Hydroxylamine                   HXA  3PL00HXA  0
4   Tertiary amide                  TAM  3PL00TAM  0
5   Imidoester                      IDE  3PL00IDE  0
6   Imide                           IME  3PL00IME  0
7   Carbamic acid ester             CAD  3PL00CAD  0
8   Alpha amino acid                ADB  3PL00ADB  0

1   Thio tertiary amide             TIA  3PL00TIA  0
2   Imidothioester                  ITE  3PL00ITE  0
3   Thiocarbamic acid ester         ARB  3PL00ARB  0
4   Sulfuric acid amide ester       AAE  3PL00AAE  0
5   Sulfonamide                     SFA  3PL00SFA  0
6   Sulfenic acid amide             SLA  3PL00SLA  0
7   Sulfinic acid amide             SIM  3PL00SIM  0

1   Phosphine                       PHI  3PL00PHI  0
2   Phosphinoxide                   PHO  3PL00PHO  0


C.4 Four Point Linker Fragments

Table C-4. Four Point Linker Fragments. Four point linker fragments have four connection points.
(Structure drawings not shown.)

No. Name                            3L   8L        Chg
1   Butyl                           BUT  4PL00BUT  0
2   Alkene                          AK5  4PL00AK5  0

1   Hydrazone                       HZO  4PL00HZO  0
2   Hydrazine                       HZI  4PL00HZI  0
3   Amidine                         AME  4PL00AME  0
4   Quaternary ammonium             QTA  4PL00QTA  +1

1   Enol ether                      ELE  4PL00ELE  0
2   1,2-diol                        12D  4PL0012D  0
3   Acetal                          ACE  4PL00ACE  0

1   1,2-aminoalcohol                12A  4PL0012A  0
2   Carboxylic acid hydrazide       CAH  4PL00CAH  0
3   Urea                            URE  4PL00URE  0
4   Isourea                         IUR  4PL00IUR  0

1   Thioacetal                      CET  4PL00CET  0
2   Thiourea                        TIU  4PL00TIU  0
3   Isothiourea                     ITU  4PL00ITU  0
4   Sulfuric acid diamide           DIA  4PL00DIA  0


C.5 Five Point Linker Fragments

Table C-5. Five Point Linker Fragments. Five point linker fragments have five connection points.
(Structure drawings not shown.)

No. Name                            3L   8L        Chg
1   Guanidine                       GUD  5PL00GUD  0
2   Amidrazone                      ADZ  5PL00ADZ  0
3   Enamine                         ENM  5PL00ENM  0
4   Hemiaminal                      HMI  5PL00HMI  0
5   Thiohemiaminal                  THI  5PL00THI  0
6   Semicarbazone                   SCZ  5PL00SCZ  0
7   Thiosemicarbazone               TSZ  5PL00TSZ  0
8   Semicarbazide                   SCI  5PL00SCI  0
9   Thiosemicarbazide               TSI  5PL00TSI  0


C.6 Three Membered Ring Fragments

Table C-6. Three Membered Ring Fragments.
(Structure drawings not shown.)

No. Name                            3L   8L        Chg
1   Cyclopropyl                     CPP  3MR00CPP  0
2   1,2,2,3,3-aziridine             AZI  3MR00AZI  0
3   2,2,3,3-epoxide                 EPO  3MR00EPO  0
4   2,2,3,3-thiirane                TII  3MR00TII  0


C.7 Four Membered Ring Fragments

Table C-7. Four Membered Ring Fragments.
(Structure drawings not shown.)

No. Name                            3L   8L        Chg
1   Cyclobutane                     CBT  4MR00CBT  0
2   1,1-cyclobutane                 1BT  4MR001BT  0
3   1,1,2,2,3,3,4,4-cyclobutane     2BT  4MR002BT  0
4   2,2,3,3,4,4-cyclobutane-1-one   4BT  4MR004BT  0

1   2,2,3,3,4,4-azetidine           4BX  4MR004BX  0
2   3,3,4,4-beta lactone            4BO  4MR004BO  0
3   3,3,4,4-beta lactam             4BA  4MR004BA  0
4   3,3,4,4-beta lactim             4BI  4MR004BI  0

1   2,2,3,3,4,4-oxetane             OTE  4MR00OTE  0


C.8 Five Membered Ring Fragments

Table C-8. Five Membered Ring Fragments.
(Structure drawings not shown.)

No. Name                            3L   8L        Chg
1   1,2-cyclopentadiene             PT1  5MR00PT1  0
2   1,3-cyclopentadiene             PT2  5MR00PT2  0
3   1,4-cyclopentadiene             PT3  5MR00PT3  0
4   2,3-cyclopentadiene             PT4  5MR00PT4  0
5   2-cyclopentan-1-one             PT5  5MR00PT5  0

1   Cyclopentyl                     CPL  5MR00CPL  0

1   2,3-pyrrole                     YR1  5MR00YR1  0
2   1,3,4-pyrrole                   YR2  5MR00YR2  0
3   1,2,3,4,5-pyrrole               YR3  5MR00YR3  0
4   2,3,4,5-pyrrole                 YR4  5MR00YR4  0

1   1,3,5-pyrazole                  PRZ  5MR00PRZ  0

1   1-pyrazoline                    RA1  5MR00RA1  0
2   3-pyrazoline                    RA2  5MR00RA2  0
3   1,3-pyrazoline                  RA3  5MR00RA3  0

1   1-pyrazolidine                  RI1  5MR00RI1  0
2   1,4-pyrazolidine-3,5-dione      RI2  5MR00RI2  0

1   1,2,4-imidazole                 IDZ  5MR00IDZ  0

1   3-imidazoline                   IZ1  5MR00IZ1  0

1   1-imidazolidine                 IM1  5MR00IM1  0
2   1,3-imidazolidine               IM2  5MR00IM2  0
3   1-imidazolidinone               IM3  5MR00IM3  0
4   1,3-imidazolidinone             IM4  5MR00IM4  0
5   1,3-imidazoline-2,4-dione       IM5  5MR00IM5  0

1   1,4,5-1,2,3-triazole            AZ1  5MR00AZ1  0
2   1,3,5-pyrrodiazole              PRR  5MR00PRR  0

1   1H-tetrazole                    TZ1  5MR00TZ1  0
2   Tetrazole                       TZ2  5MR00TZ2  0

1   Pentazole                       PZ1  5MR00PZ1  0

1   1-pyrrolidine                   RD1  5MR00RD1  0
2   1-pyrrolidone                   RD2  5MR00RD2  0
3   1-pyrrolidine-2,5-dione         RD3  5MR00RD3  0

1   2-furan                         FN1  5MR00FN1  0
2   3-furan                         FN2  5MR00FN2  0
3   2,3-furan                       FN3  5MR00FN3  0
4   3,4-furan                       FN4  5MR00FN4  0
5   5-R-gamma lactone               FN5  5MR00FN5  0

1   4,4,5,5-1,3-dioxolan-2-one      XL1  5MR00XL1  0

1   2,4,5-oxazole                   OZO  5MR00OZO  0

1   3,5-isoxazole                   IOZ  5MR00IOZ  0

1   2-oxazoline                     ZO1  5MR00ZO1  0
2   2-1,3-oxazol-4-one              ZO2  5MR00ZO2  0

1   3,5-oxazolidinone               AO1  5MR00AO1  0
2   3,4,4,5,5-oxazolidinone         AO2  5MR00AO2  0

1   2,5-1,3,4-oxadiazole            DZ1  5MR00DZ1  0
2   3,4-1,2,5-oxadiazole            DZ2  5MR00DZ2  0

1   3-thiophene                     3TP  5MR003TP  0
2   2,3-thiophene                   23T  5MR0023T  0
3   3,4-thiophene                   34T  5MR0034T  0
4   2,5-thiophene                   25T  5MR0025T  0

1   4,5-thiazole                    TZL  5MR00TZL  0
2   2,4,5-thiazole                  TIZ  5MR00TIZ  0

1   2-thiazoline                    ZL1  5MR00ZL1  0

1   2,4-1,3-thiazolidine            IL1  5MR00IL1  0
2   2-1,3-thiazolidine              IL2  5MR00IL2  0
3   2,3,4-1,3-thiazolidine          IL3  5MR00IL3  0
4   5-thiazolidinedione             TLD  5MR00TLD  0

1   4,5-1,2,3-thiadiazole           DI1  5MR00DI1  0
2   5-1,2,3-thiadiazole             DI2  5MR00DI2  0
3   3,4-1,2,5-thiadiazole           DI3  5MR00DI3  0
4   2,5-1,3,4-thiadiazole           DI4  5MR00DI4  0


C.9 Six Membered Ring Fragments

Table C-9. Six Membered Ring Fragments.
(Structure drawings not shown.)

No. Name                            3L   8L        Chg
1   Phenyl                          BNZ  6MR00BNZ  0
2   1,2-phenyl (ortho)              OSB  6MR00OSB  0
3   1,3-phenyl (meta)               MSB  6MR00MSB  0
4   1,4-phenyl (para)               PSB  6MR00PSB  0
5   1,2,3,4,5-phenyl                4BZ  6MR004BZ  0

1   Cyclohexyl                      6CH  6MR006CH  0
2   1,1-cyclohexane                 11C  6MR0011C  0
3   1,2-cyclohexene                 12C  6MR0012C  0

1   2-pyridine                      2PY  6MR002PY  0
2   3-pyridine                      3PY  6MR003PY  0
3   4-pyridine                      4PY  6MR004PY  0
4   2,5-pyridine                    PY2  6MR00PY2  0
5   3,4-pyridine                    PY3  6MR00PY3  0

1   1-piperidine                    1PP  6MR001PP  0

1   2-pyrazine                      ZI1  6MR00ZI1  0
2   2,3,5,6-pyrazine                ZI2  6MR00ZI2  0

1   1-piperazine                    PPZ  6MR00PPZ  0
2   1,4-piperazine                  PZ2  6MR00PZ2  0

1   1-pyrimidine                    MI1  6MR00MI1  0
2   1,5-pyrimidine-2,4-dione        MI2  6MR00MI2  0

1   3,4-pyridazine                  ZA1  6MR00ZA1  0
2   3,6-pyridazine                  ZA2  6MR00ZA2  0
3   3,4,5,6-pyridazine              ZA3  6MR00ZA3  0
4   2,6-pyridazin-3-one             ZA4  6MR00ZA4  0

1   2,4-1,3,5-triazine              A11  6MR00A11  0
2   2,4,6-1,3,5-triazine            A12  6MR00A12  0

1   5,5-barbituric acid             B11  6MR00B11  0
2   1,5,5-barbituric acid           B12  6MR00B12  0

1   2-phenol                        2PO  6MR002PO  0
2   3,4-diphenol                    3PO  6MR003PO  0
3   2-diphenyl ether                DPE  6MR00DPE  0
4   1,4-benzoate                    14B  6MR0014B  0
5   1,4-benzoate ester              BE1  6MR00BE1  0
6   2,3,4,5,6-benzoate              1B4  6MR001B4  0
7   2,3,4,5,6-benzoate ester        BE2  6MR00BE2  0

1   2-4H-pyran                      C11  6MR00C11  0
2   2,3,4,4,5,6-4H-pyran            C12  6MR00C12  0

1   4-oxane                         D11  6MR00D11  0
2   2,2,3,3,4,4,5,5,6,6-oxane       D12  6MR00D12  0
3   alpha-D-glucose                 ADG  6MR00ADG  0
4   alpha-*D-glucose                ASD  6MR00ASD  0
5   2-deoxy-alpha-*D-glucose        2DA  6MR002DA  0

1   Morpholine                      MOR  6MR00MOR  0
2   Thiomorpholine                  MR1  6MR00MR1  0

1   6-R-delta-lactone               E11  6MR00E11  0
2   6-R-delta-lactam                E12  6MR00E12  0
3   2-pyridone                      2YO  6MR002YO  0
4   2-thiopyridone                  2TP  6MR002TP  0
5   2-iminopyridine                 2IP  6MR002IP  0


C.10 Greater than Six Membered Ring Fragments

Table C-10. Greater than Six Membered Ring Fragments.
(Structure drawings not shown.)

No. Name                            3L   8L        Chg
1   Cycloheptyl                     G61  GMR00G61  0
2   Cyclooctyl                      G62  GMR00G62  0


C.11 Fused Ring Fragments

Table C-11. Fused Ring Fragments.
(Structure drawings not shown.)

No. Name                            3L   8L        Chg
1   1-naphthalene                   NAP  FR000NAP  0

1   1-indan                         ID1  FR000ID1  0
2   1-inden-1-yl                    ID2  FR000ID2  0

1   2,4,5,7-quinoline               QE1  FR000QE1  0
2   2,4,6-quinoline                 QE2  FR000QE2  0
3   2,3,4,5,6,7,8-quinoline         QE3  FR000QE3  0

1   1,3-isoquinoline                IQ1  FR000IQ1  0
2   1,3,4,5,6,7,8-isoquinoline      IQ2  FR000IQ2  0

1   3-phthalazine                   HT1  FR000HT1  0
2   3,4,5,6,7,8-phthalazine         HT2  FR000HT2  0
3   2,4-phthalazinone               HT3  FR000HT4  0

1   3,4-cinnoline                   CI1  FR000CI1  0
2   3,4,5,6,7,8-cinnoline           CI2  FR000CI2  0

1   4,6,7-quinazoline               IN1  FR000IN1  0
2   2,4,5,6,7,8-quinazoline         IN2  FR000IN2  0

1   2,3,5,6,7,8-quinoxaline         NO1  FR000NO1  0

1   2,4,6,7-pteridine               ER1  FR000ER1  0
2   6,7-pterin                      ER2  FR000ER2  0
3   2,4-pteridine                   ER3  FR000ER1  0
4   2,4,6-pteridine                 ER4  FR000ER1  0
5   6-pterin                        ER5  FR000ER1  0

1   3-indole                        LE1  FR000LE1  0
2   4-indole                        LE2  FR000LE2  0
3   3,4-indole                      LE3  FR000LE3  0
4   3,5-indole                      LE4  FR000LE4  0
5   1,2,3,5-indole                  LE5  FR000LE5  0

1   1,3,4,5,6,7-isoindole           LE6  FR000LE6  0

1   2,6,8-purine                    PU1  FR000PU1  0

1   3,4,5,6,7-indazole              Z11  FR000Z11  0

1   1-benzotriazole                 Y11  FR000Y11  0

1   1-1,3-benzodiazepine            N11  FR000N11  0
2   1-1,4-benzodiazepine            N12  FR000N12  0
3   4-1,5-benzodiazepin-2-one       N13  FR000N13  0

1   3,4,7-coumarin                  CU1  FR000CU1  0
2   3,4,5,6,7,8-coumarin            CU2  FR000CU2  0

1   2,2,6,7,8,9-chroman             CHR  FR000CHR  0

1   2,3-chromone                    CE1  FR000CE1  0
2   6-chromone                      CE2  FR000CE2  0
3   3,5,7,10,11-flavone             CE3  FR000CE3  0

1   5-1,4-benzodioxane              BD1  FR000BD1  0

1   2-benzofuran                    BF1  FR000BF1  0
2   3-benzofuran                    BF2  FR000BF2  0
3   5-benzofuran                    BF3  FR000BF3  0

1   4-phthalide                     PH1  FR000PH1  0

1   4-1,3-benzodioxole              BZ1  FR000BZ1  0

1   2,3,4,5,6,7-benzothiophene      BN1  FR000BN1  0
2   3,6-benzothiophene              BN2  FR000BN2  0
3   2,3,6-benzothiophene            3BT  FR0003BT  0
4   1,3,4,5,6,7-isobenzothiophene   BN4  FR000BN4  0

1   Benzoxazole                     BXZ  FR000BXZ  0

1   3,4,5,6,7-benzisoxazole         BI1  FR000BI1  0
2   3,5-benzisoxazole               BI2  FR000BI2  0

1   2,4,5,6,7-benzothiazole         BH1  FR000BH1  0
2   2,6-benzothiazole               BH2  FR000BH2  0
3   2-benzothiazole                 BH3  FR000BH3  0

1   3-1,4-benzoxazine               BX1  FR000BX1  0
2   6-1,4-benzoxazin-3(4H)-one      BX2  FR000BX2  0

1   1-fluorenone                    1FO  FR0001FO  0
2   1-dibenzofuran                  X11  FR000X11  0
3   carbazol-9-yl                   X12  FR000X12  0

1   1,10-anthracene                 AN1  FR000AN1  0
2   1-dioxoanthracene               IDA  FR000IDA  0
3   3,6,9-xanthen-9-yl              XA1  FR000XA1  0
4   1-oxanthrene                    XO1  FR000XO1  0

1   1-acridine                      CR1  FR000CR1  0
2   1,2,3,4,5,6,7,8,9-acridine      CR2  FR000CR2  0

1   2-phenazine                     EZ1  FR000EZ1  0
2   1,2,3,4,5,6,7,8-phenazine       EZ2  FR000EZ2  0


[126]U.C.SinghandP.A.Kollman.Anapproachtocomputinge lectrostaticchargesfor molecules. J.Comput.Chem. ,5(2):129{145,1984. 246

PAGE 247

[127]C.I.Bayly,P.Cieplak,W.D.Cornell,andP.A.Kollman .AWell-Behaved ElectrostaticPotentialBasedMethodUsingChargeRestrai ntsforDerivingAtomic Charges-theRESPModel. J.Phys.Chem. ,97(40):10269{10280,1993. [128]W.D.Cornell,P.Cieplak,C.I.Bayly,andP.A.Kollman .ApplicationofRESP ChargestoCalculateConformationalEnergies,Hydrogen-B ondEnergies,and Free-EnergiesofSolvation. J.Am.Chem.Soc. ,115(21):9620{9631,1993. [129]J.M.Wang,P.Cieplak,andP.A.Kollman.Howwelldoesa restrainedelectrostatic potential(RESP)modelperformincalculatingconformatio nalenergiesoforganic andbiologicalmolecules? J.Comput.Chem. ,21(12):1049{1074,2000. [130]R.C.Wade,A.R.Ortiz,andF.Gago.Comparativebindin genergyanalysis. Persp. DrugDisc.Design ,9-11:19{34,1998. [131]A.R.Ortiz,M.Pastor,A.Palomer,G.Cruciani,F.Gago ,andR.C.Wade. Reliabilityofcomparativemoleculareldanalysismodels :Eectsofdatascaling andvariableselectionusingasetofhumansynovialruidpho spholipaseA(2) inhibitors. J.Med.Chem. ,40(7):1136{1148,1997. [132]C.Perez,M.Pastor,A.R.Ortiz,andF.Gago.Comparati vebindingenergyanalysis ofHIV-1proteaseinhibitors:Incorporationofsolventee ctsandvalidationasa powerfultoolinreceptor-baseddrugdesign. J.Med.Chem. ,41(6):836{852,1998. [133]T.WangandR.C.Wade.Comparativebindingenergy(COM BINE)analysisof inruenzaneuraminidase-inhibitorcomplexes. J.Med.Chem. ,44(6):961{971,2001. [134]J.Kmunicek,S.Luengo,F.Gago,A.R.Ortiz,R.C.Wade, andJ.Damborsky. Comparativebindingenergyanalysisofthesubstratespeci cityofhaloalkane dehalogenasefromxanthobacterautotrophicusGJ10. Biochemistry 40(30):8905{8917,2001. [135]T.Wang,S.Tomic,R.R.Gabdoulline,andR.C.Wade.How optimalarethe bindingenergeticsofbarnaseandbarstar? Biophys.J. ,87(3):1618{1630,2004. [136]S.Tomic,L.Nilsson,andR.C.Wade.NuclearreceptorDNAbindingspecicity:A COMBINEandfree-wilsonQSARanalysis. J.Med.Chem. ,43(9):1780{1792,2000. [137]T.WangandR.C.Wade.Comparativebindingenergy(COM BINE)analysisof OppA-peptidecomplexestorelatestructuretobindingther modynamics. J.Med. Chem. 
,45(22):4828{4837,2002. [138]M.MurciaandA.R.Ortiz.Virtualscreeningwithrexib ledockingand COMBINE-basedmodels.applicationtoaseriesoffactorXai nhibitors. J.Med. Chem. ,47(4):805{820,2004. [139]K.Hasegawa,T.Kimura,andK.Funatsu.GAstrategyfor variableselectionin QSARstudies:Enhancementofcomparativemolecularbindin genergyanalysisby GA-basedPLSmethod. Quant.Struct.-Act.Relat. ,18(3):262{272,1999. 247

PAGE 248

[140]M.B.Peters.Asemiempiricalcomparativebindingene rgyanalysisstudyofaseries oftrypsininhibitors.Master'sthesis,ThePennsylvaniaS tateUniversity,2005. [141]R.Diestel. Graphtheory .Springer,Berlin,2005. [142]P.Labute.Ontheperceptionofmoleculesfrom3Datomi ccoordinates. J.Chem. Inf.Model. ,45(2):215{221,2005. [143]T.R.Cundari,C.Sarbu,andH.F.Pop.Robustfuzzyprin cipalcomponentanalysis (FPCA).acomparativestudyconcerninginteractionofcarb on-hydrogenbondswith molybdenum-oxobonds. J.Chem.Inf.Comp.Sci. ,42(6):1363{1369,2002. [144]S.Wold,J.Trygg,A.Berglund,andH.Antti.Somerecen tdevelopmentsinpls modeling. Chemom.Intell.Lab.Syst. ,58(2):131{150,2001. [145]G.M.Ullmann,E.W.Knapp,andN.M.Kostic.Computatio nalsimulation andanalysisofdynamicassociationbetweenplastocyanina ndcytochromef. consequencesfortheelectron-transferreaction. J.Am.Chem.Soc. ,119(1):42{52, 1997. [146]J.O.A.DeKerpelandU.Ryde.Proteinstraininbluecop perproteinsstudiedby freeenergyperturbations. Proteins:Struct.Funct.Genet. ,36(2):157{174,1999. [147]M.H.M.OlssonandU.Ryde.Theinruenceofaxialligand sonthereduction potentialofbluecopperproteins. J.Biol.Inorg.Chem. ,4(5):654{663,1999. [148]R.RemenyiandP.Comba.Anewgeneralmolecularmechan icsforceeldforthe oxidizedformfobluecoppperproteins. J.Inorg.Biochem. ,86(1):397{397,2001. [149]P.Comba,A.Lledos,F.Maseras,andR.Remenyi.Hybrid quantum mechanics/molecularmechanicsstudiesoftheactivesiteo fthebluecopperproteins amicyaninandrusticyanin. Inorg.Chim.Acta ,324(1-2):21{26,2001. [150]P.CombaandR.Remenyi.Anewmolecularmechanicsforc eeldfortheoxidized formofbluecopperproteins. J.Comput.Chem. ,23(7):697{705,2002. [151]D.Suarez,N.Diaz,andK.M.MerzJr.Ureases:Quantumc hemicalcalculationson clustermodels. J.Am.Chem.Soc. ,125(50):15324{15337,2003. [152]G.EstiuandK.M.MerzJr.Enzymaticcatalysisofuread ecomposition: Eliminationorhydrolysis? J.Am.Chem.Soc. ,126(38):11832{11842,2004. [153]G.EstiuandK.M.MerzJr.Catalyzeddecompositionofu rea.Moleculardynamics simulationsofthebindingofureatourease. 
Biochemistry ,45(14):4429{4443,2006. [154]G.Estiu,D.Suarez,andK.M.MerzJr.Quantummechanic alandmolecular dynamicssimulationsofureasesandZnbeta-lactamases. J.Comput.Chem. 27(12):1240{1262,2006. 248

PAGE 249

[155]TheApacheProject. Xerces-C++Parser .http://xml.apache.org/xerces-c/(accessed Oct1,2005). [156]A.M.Wollacott. Computationalstudiesoftheapplicabilityofsemiempiric alquantummechanicalmethodstostudyproteinstructure .PhDthesis,ThePennsylvania StateUniversity,2005. [157]A.M.WollacottandK.M.MerzJr.Hapticapplicationsf ormolecularstructure manipulation. J.Mol.GraphicsModell. ,25(6):801{805,2007. [158]R.J.F.Branco,P.A.Fernandes,andM.J.Ramos.Molecu lardynamicssimulations oftheenzymecu,znsuperoxidedismutase. J.Phys.Chem.B ,110(33):16754{16762, 2006. [159]T.WangandJ.J.Zhou.3DFS:Anew3Drexiblesearchings ystemforuseindrug design. J.Chem.Inf.Comput.Sci. ,38(1):71{77,1998. [160]J.M.Wang,R.M.Wolf,J.W.Caldwell,P.A.Kollman,and D.A.Case. Developmentandtestingofageneralamberforceeld. J.Comput.Chem. 25(9):1157{1174,2004. [161]E.C.MengandR.A.Lewis.DeterminationofMolecularT opologyandAtomic HybridizationStatesfromHeavy-AtomCoordinates. J.Comput.Chem. 12(7):891{898,1991. [162]F.H.Allen,O.Kennard,D.G.Watson,L.Brammer,A.G.O rpen,andR.Taylor. TablesofBondLengthsDeterminedbyX-RayandNeutron-Dir action.1.Bond LengthsinOrganic-Compounds. J.Chem.Soc.,PerkinTrans.2 ,(12):S1{S19,1987. [163]J.C.BaberandE.E.Hodgkin.AutomaticAssignmentofC hemicalConnectivityto Organic-MoleculesintheCambridgeStructuralDatabase. J.Chem.Inf.Comput. Sci. ,32(5):401{406,1992. [164]A.G.Orpen,L.Brammer,F.H.Allen,O.Kennard,D.G.Wa tson,andR.Taylor. TablesofBondLengthsDeterminedbyX-RayandNeutron-Dir action.2. OrganometallicCompoundsandCo-OrdinationComplexesoft heD-Blockand F-BlockMetals. J.Chem.Soc.,DaltonTrans. ,(12):S1{S83,1989. [165]M.Hendlich,F.Rippmann,andG.Barnickel.BALI:Auto maticassignmentofbond andatomtypesforproteinligandsintheBrookhavenProtein Databank. J.Chem. Inf.Comput.Sci. ,37(4):774{778,1997. [166]J.M.Wang,W.Wang,P.A.Kollman,andD.A.Case.Automa ticatomtype andbondtypeperceptioninmolecularmechanicalcalculati ons. J.Mol.Graphics Modell. ,25(2):247{260,2006. 
[167]B.T.Fan,A.Panaye,J.P.Doucet,andA.Barbu.RingPer ception-ANew AlgorithmforDirectlyFindingtheSmallestSetofSmallest RingsfromaConnection Table. J.Chem.Inf.Comput.Sci. ,33(5):657{662,1993. 249

PAGE 250

[168]B.L.Roos-kozelandW.L.Jorgensen.Computer-Assist edMechanisticEvaluation ofOrganic-Reactions.2.PerceptionofRings,Aromaticity ,andTautomers. J.Chem. Inf.Comput.Sci. ,21(2):101{111,1981. [169]M.LiptonandW.C.Still.Themultipleminimumproblem inmolecularmodeling -treesearchinginternalcoordinateconformationalspace J.Comput.Chem. 9(4):343{355,1988. [170]J.R.Ullmann.Analgorithmforsubgraphisomorphism. J.ACM ,23(1):31{42,1976. [171]WilsonT.Willett,P.andS.F.Reddaway.Atom-by-atom searchingusingmassive parallelism.implementationoftheullmannsubgraphisomo rphismalgorithmonthe distributedarrayprocessor. J.Chem.Inf.Model. ,31(2):225{233,1991. [172]E.J.Barker,D.Buttar,D.A.Cosgrove,E.J.Gardiner, P.Kitts,P.Willett,and V.J.Gillet.Scaoldhoppingusingcliquedetectionapplie dtoreducedgraphs. J. Chem.Inf.Model. ,46(2):503{511,2006. [173]S.K.Kearsley.OntheOrthogonalTransformationUsed forStructuralComparisons. ActaCrystallogr.,Sect.A:Found.Crystallogr. ,45:208{210,1989. [174]W.Kabsch.SolutionforBestRotationtoRelate2Setso fVectors. ActaCrystallogr.,Sect.A:Found.Crystallogr. ,32:922{923,1976. [175]W.Kabsch.DiscussionofSolutionforBestRotationto Relate2SetsofVectors. ActaCrystallogr.,Sect.A:Found.Crystallogr. ,34:827{828,1978. [176]G.Carta,V.Onnis,A.J.S.Knox,D.Fayne,andD.G.Lloy d.Permutinginput formoreeectivesamplingof3Dconformerspace. J.Comput.-AidedMol.Des. 20(3):179{190,2006. [177]C.Lemmen,M.Zimmermann,andT.Lengauer.Multiplemo lecularsuperpositioning asaneectivetoolforvirtualdatabasescreening. Perspect.DrugDiscoveryDes. 20(1):43{62,2000. [178]F.Daeyaert,M.deJonge,J.Heeres,L.Koymans,P.Lewi ,M.H.Vinkers,and P.A.J.Janssen.Apharmacophoredockingalgorithmanditsa pplicationtothe cross-dockingof18HIV-NNRTI'sintheirbindingpockets. Proteins:Struct.Funct. Genet. ,54(3):526{533,2004. [179]C.LemmenandT.Lengauer.Computationalmethodsfort hestructuralalignment ofmolecules. J.Comput.-AidedMol.Des. ,14(3):215{232,2000. [180]Q.Chen,R.E.Higgs,andM.Vieth.Geometricaccuracyo fthree-dimensional molecularoverlays. 
J.Chem.Inf.Model. ,46(5):1996{2002,2006. [181]S.K.Drayton,K.Edwards,N.Jewell,D.B.Turner,D.J. Wild,P.Willett,P.M. Wright,andK.Simmons.Similaritysearchinginlesofthre e-dimensionalchemical 250

PAGE 251

structures:Identicationofbioactivemolecules. InternetJ.Chem. ,1(37):CP3{U34, 1998. [182]F.Melani,P.Gratteri,M.Adamo,andC.Bonaccini.Fie ldinteractionand geometricaloverlap:Anewsimplexandexperimentaldesign basedcomputational procedureforsuperposingsmallligandmolecules. J.Med.Chem. ,46(8):1359{1371, 2003. [183]R.P.Sheridan,R.Nilakantan,J.S.Dixon,andR.Venka taraghavan.Theensemble approachtodistancegeometry-applicationtothenicotini cpharmacophore. J.Med. Chem. ,29(6):899{906,1986. [184]S.K.KearsleyandG.M.Smith.Analternativemethodfo rthealignmentof molecularstructures:Maximizingelectrostaticandsteri coverlap. Tetrahedron Comput.Methodol. ,3:615{633,1990. [185]A.C.Good,E.E.Hodgkin,andW.G.Richards.Utilizati onofgaussianfunctions fortherapidevaluationofmolecularsimilarity. J.Chem.Inf.Comput.Sci. 32(3):188{191,1992. [186]Y.C.Martin,M.G.Bures,E.A.Danaher,J.Delazzer,I. Lico,andP.A.Pavlik.A fastnewapproachtopharmacophoremappinganditsapplicat iontodopaminergic andbenzodiazepineagonists. J.Comput.-AidedMol.Des. ,7(1):83{102,1993. [187]B.B.Masek,A.Merchant,andJ.B.Matthew.MolecularS hapeComparisonof Angiotensin-IIReceptorAntagonists. J.Med.Chem. ,36(9):1230{1238,1993. [188]G.Klebe,T.Mietzner,andF.Weber.Dierentapproach estowardanautomatic structuralalignmentofdrugmolecules-applicationstost erolmimics,thrombinand thermolysininhibitors. J.Comput.-AidedMol.Des. ,8(6):751{778,1994. [189]A.N.Jain,T.G.Dietterich,R.H.Lathrop,D.Chapman, R.E.Critchlow,B.E. Bauer,T.A.Webster,andT.Lozanoperez.Compass-ashape-b asedmachine learningtoolfordrugdesign. J.Comput.-AidedMol.Des. ,8(6):635{652,1994. [190]G.Jones,P.Willett,andR.C.Glen.Ageneticalgorith mforrexiblemolecular overlayandpharmacophoreelucidation. J.Comput.-AidedMol.Des. ,9(6):532{549, 1995. [191]R.A.Dammkoehler,S.F.Karasek,E.F.B.Shands,andG. R.Marshall.Sampling conformationalhyperspace:Techniquesforimprovingcomp leteness. J.Comput.AidedMol.Des. ,9(6):491{499,1995. 
[192]C.McmartinandR.S.Bohacek.FlexibleMatchingofTes tLigandstoa3D PharmacophoreUsingaMolecularSuperpositionForce-Fiel d-Comparisonof PredictedandExperimentalConformationsofInhibitorsof 3Enzymes. J.Comput.AidedMol.Des. ,9(3):237{250,1995. 251

PAGE 252

[193]T.D.J.Perkins,J.E.J.Mills,andP.M.Dean.Molecula rsurface-volumeand propertymatchingtosuperposerexibledissimilarmolecul es. J.Comput.-AidedMol. Des. ,9(6):479{490,1995. [194]M.Petitjean.Geometricmolecularsimilarityfromvo lume-baseddistance minimization-applicationtosaxitoxinandtetrodotoxin. J.Comput.Chem. 16(1):80{90,1995. [195]J.A.Grant,M.A.Gallardo,andB.T.Pickup.Afastmeth odofmolecularshape comparison:Asimpleapplicationofagaussiandescription ofmolecularshape. J. Comput.Chem. ,17(14):1653{1666,1996. [196]C.LemmenandT.Lengauer.Time-ecientrexiblesuper positionofmedium-sized molecules. J.Comput.-AidedMol.Des. ,11(4):357{368,1997. [197]A.J.McmahonandP.M.King.OptimizationofCarbomole cularsimilarityindex usinggradientmethods. J.Comput.Chem. ,18(2):151{158,1997. [198]A.Cosse-BarbiandM.Raji.Discretepatternrecogni tionbyttingontoa continuousfunction. J.Comput.Chem. ,18(15):1875{1892,1997. [199]J.Mestres,D.C.Rohrer,andG.M.Maggiora.Mimic:Amo lecular-eldmatching program.exploitingapplicabilityofmolecularsimilarit yapproaches. J.Comput. Chem. ,18(7):934{954,1997. [200]J.W.M.Nissink,M.L.Verdonk,J.Kroon,T.Mietzner,a ndG.Klebe. Superpositionofmolecules:Electrondensityttingbyapp licationoffourier transforms. J.Comput.Chem. ,18(5):638{645,1997. [201]M.F.Parretti,R.T.Kroemer,J.H.Rothman,andW.G.Ri chards.Alignment ofmoleculesbytheMonteCarlooptimizationofmolecularsi milarityindices. J. Comput.Chem. ,18(11):1344{1353,1997. [202]M.CocchiandP.G.DeBenedetti.Useofthesupermolecu leapproachtoderive molecularsimilaritydescriptorsforQSARanalysis. J.Mol.Model. ,4(3):113{131, 1998. [203]M.C.DeRosaandA.Berglund.Anewmethodforpredictin gthealignmentof rexiblemoleculesandorientingtheminareceptorcleftofk nownstructure. J.Med. Chem. ,41(5):691{698,1998. [204]S.Handschuh,M.Wagener,andJ.Gasteiger.Superposi tionofthree-dimensional chemicalstructuresallowingforconformationalrexibili tybyahybridmethod. J. Chem.Inf.Comput.Sci. ,38(2):220{232,1998. 
[205]T.WangandJ.J.Zhou.3DFS:3Drexiblesearchingsyste mforleaddiscovery-new version1.2. JournalofMolecularModeling ,5(11):231{251,1999. 252

PAGE 253

[206]C.Lemmen,C.Hiller,andT.Lengauer.RigFit:Anewapp roachtosuperimposing ligandmolecules. J.Comput.-AidedMol.Des. ,12(5):491{502,1998. [207]M.D.Miller,R.P.Sheridan,andS.K.Kearsley.SQ:apr ogramforrapidly producingpharmacophoricallyreleventmolecularsuperpo sitions. J.Med.Chem. 42(9):1505{14,1999. [208]G.Klebe,T.Mietzner,andF.Weber.Methodologicalde velopmentsandstrategies forafastrexiblesuperpositionofdrug-sizemolecules. J.Comput.-AidedMol.Des. 13(1):35{49,1999. [209]M.deCaceres,J.Villa,J.J.Lozano,andF.Sanz.MIPSI M:similarityanalysisof molecularinteractionpotentials. Bioinformatics ,16(6):568{569,2000. [210]D.A.Cosgrove,D.M.Bayada,andA.P.Johnson.Anovelm ethodofaligning moleculesbylocalsurfaceshapesimilarity. J.Comput.-AidedMol.Des. 14(6):573{591,2000. [211]M.FeherandJ.M.Schmidt.Multiplerexiblealignment withseal:Astudy ofmoleculesactingonthecolchicinebindingsite. J.Chem.Inf.Comput.Sci. 40(2):495{502,2000. [212]X.Girones,D.Robert,andR.Carbo-Dorca.TGSA:Amol ecularsuperposition programbasedontopo-geometricalconsiderations. J.Comp.Chem. ,22(2):255{263, 2001. [213]X.GironesandR.Carbo-Dorca.TGSA-rex:Extendingt hecapabilitiesofthe topo-geometricalsuperpositionalgorithmtohandlerexib lemolecules. J.Comp. Chem. ,25(2):153{159,2004. [214]P.Labute,C.Williams,M.Feher,E.Sourial,andJ.M.S chmidt.Flexiblealignment ofsmallmolecules. J.Med.Chem. ,44(10):1483{1490,2001. [215]J.E.J.Mills,I.J.P.deEsch,T.D.J.Perkins,andP.M. Dean.Slate:Amethod forthesuperpositionofrexibleligands. J.Comput.-AidedMol.Des. ,15(1):81{96, 2001. [216]M.C.Pitman,W.K.Huber,H.Horn,A.Kramer,J.E.Rice, andW.C.Swope. FLASHFLOOD:A3Deld-basedsimilaritysearchandalignmen tmethodfor rexiblemolecules. J.Comput.-AidedMol.Des. ,15(7):587{612,2001. [217]A.Kramer,H.W.Horn,andJ.E.Rice.Fast3Dmolecular superpositionand similaritysearchindatabasesofrexiblemolecules. J.Comput.-AidedMol.Des. 17(1):13{38,2003. [218]S.P.Korhonen,K.Tuppurainen,R.Laatikainen,andM. 
Perakyla.Comparing theperformanceofFLUFF-BALLtoSEAL-CoMFAwithalargediv erseestrogen dataset:Fromrelevantsuperpositionstosolidprediction s. J.Chem.Inf.Model. 45(6):1874{1883,2005. 253

PAGE 254

[219]A.J.Tervo,T.Ronkko,T.H.Nyronen,andA.Poso.BRUTU S:Optimizationofa grid-basedsimilarityfunctionforrigid-bodymoleculars uperposition.I.alignment andvirtualscreeningapplications. J.Med.Chem. ,48(12):4076{4086,2005. [220]T.Ronkko,A.J.Tervo,J.Parkkinen,andA.Poso.BRUTU S:Optimizationofa grid-basedsimilarityfunctionforrigid-bodymoleculars uperposition.II.description andcharacterization. J.Comput.-AidedMol.Des. ,20(4):227{236,2006. [221]S.J.ChoandY.X.Sun.FLAME:Aprogramtorexiblyalign molecules. J.Chem. Inf.Model. ,46(1):298{306,2006. [222]J.Marialke,R.Korner,S.Tietze,andJ.Apostolakis. Graph-basedmolecular alignment(GMA). J.Chem.Inf.Model. ,47(2):591{601,2007. [223]G.KlebeandT.Mietzner.Afastandecientmethodtoge neratebiologically relevantconformations. J.Comput.-AidedMol.Des. ,8(5):583{606,1994. [224]J.SadowskiandJ.Bostrom.Mimumbarevisited:Torsi onanglerulesforconformer generationderivedfromx-raystructures. J.Chem.Inf.Model. ,46(6):2305{2309, 2006. [225]F.Daeyaert,M.deJonge,J.Heeres,L.Koymans,P.Lewi ,W.vandenBroeck,and M.Vinkers.Paretooptimalrexiblealignmentofmoleculesu singanon-dominated sortinggeneticalgorithm. Chemom.Intell.Lab.Syst. ,77(1-2):232{237,2005. [226]A.Strizhev,E.J.Abrahamian,S.Choi,J.M.Leonard,P .R.N.Wolohan,andR.D. Clark.TheEectsofBiasingTorsionalMutationsinaConfor mationalGA. J. Chem.Inf.Model. ,46(4):1862{1870,2006. [227]D.K.Agraotis,A.C.Gibbs,F.Zhu,S.Izrailev,andE. Martin.Conformational samplingofbioactivemolecules:Acomparativestudy. J.Chem.Inf.Model. 47(3):1067{1086,2007. [228]J.Bostrom,P.O.Norrby,andT.Liljefors.Conformat ionalenergypenaltiesof protein-boundligands. J.Comput.-AidedMol.Des. ,12(4):383{396,1998. [229]J.Bostrom.Reproducingtheconformationsofprotei n-boundligands:Acritical evaluationofseveralpopularconformationalsearchingto ols. J.Comput.-AidedMol. Des. ,15(12):1137{1152,2001. [230]D.J.DillerandK.M.MerzJr.Canweseparateactivefro minactiveconformations? J.Comput.-AidedMol.Des. ,16(2):105{112,2002. 
[231]J.Bostrom,J.R.Greenwood,andJ.Gottfries.Assess ingtheperformanceof omegawithrespecttoretrievingbioactiveconformations. J.Mol.GraphicsModell. 21(5):449{462,2003. [232]S.Putta,G.A.Landrum,andJ.E.Penzotti.Conformati onmining:Analgorithm forndingbiologicallyrelevant. J.Med.Chem. ,48(9):3313{3318,2005. 254

PAGE 255

[233]S.L.DixonandK.M.MerzJr.QMALIGN.[234]C.Lemmen,T.Lengauer,andG.Klebe.FLEXS:Amethodfo rfastrexibleligand superposition. J.Med.Chem. ,41(23):4502{4520,1998. [235]RDevelopmentCoreTeam. R:Alanguageandenvironmentforstatisticalcomputing .RFoundationforStatisticalComputing,Vienna,Austria, 2005. [236]T.M.Willson,P.J.Brown,D.D.Sternbach,andB.R.Hen ke.ThePPARs:From orphanreceptorstodrugdiscovery. J.Med.Chem. ,43(4):527{550,2000. [237]J.C.Parker.Troglitazone:thediscoveryanddevelop mentofanoveltherapyforthe treatmentoftype2diabetesmellitus. Adv.DrugDeliv.Rev. ,54(9):1173{97,2002. [238]P.J.Rybczynski,R.E.Zeck,J.Dudash,D.W.Combs,T.P .Burris,M.Yang, M.C.Osborne,X.L.Chen,andK.T.Demarest.Benzoxazinones asPPARgamma agonists.2.SARoftheamidesubstituentandinvivoresults inatype2diabetes model. J.Med.Chem. ,47(1):196{209,2004. [239]C.Z.Liao,A.H.Xie,L.M.Shi,J.J.Zhou,andX.P.Lu.Ei genvalueanalysisof peroxisomeproliferator-activatedreceptorgammaagonis ts. J.Chem.Inf.Comput. Sci. ,44(1):230{238,2004. [240]C.Z.Liao,A.H.Xie,J.J.Zhou,L.M.Shi,Z.B.Li,andX. P.Lu.3DQSAR studiesonperoxisomeproliferator-activatedreceptorga mmaagonistsusingCoMFA andCoMSIA. J.Mol.Model. ,10(3):165{177,2004. [241]T.Tuccinardi,E.Nuti,G.Ortore,C.T.Supuran,A.Ros sello,andA.Martinelli. AnalysisofhumancarbonicanhydraseII:Dockingreliabili tyandreceptor-based 3D-QSARstudy. J.Chem.Inf.Model. ,47(2):515{525,2007. [242]C.-Y.Kim,D.A.Whittington,J.S.Chang,J.Liao,J.A. May,andD.W. Christianson.StructuralAspectsofIsozymeSelectivityi ntheBindingofInhibitors toCarbonicAnhydrasesIIandIV. J.Med.Chem. ,45(4):888{893,2002. [243]B.A.Grzybowski,A.V.Ishchenko,C.Y.Kim,G.Topalov ,R.Chapman, D.W.Christianson,G.M.Whitesides,andE.I.Shakhnovich. Combinatorial computationalmethodgivesnewpicomolarligandsforaknow nenzyme. Proc.Natl. Acad.Sci.U.S.A. ,99(3):1270{3,2002. [244]S.Gruneberg,M.T.Stubbs,andG.Klebe.SuccessfulV irtualScreeningforNovel InhibitorsofHumanCarbonicAnhydrase:StrategyandExper imentalConrmation. J.Med.Chem. ,45(17):3588{3602,2002. 
[245]G.M.Smith,R.S.Alexander,D.W.Christianson,B.M.M cKeever,G.S. Ponticello,J.P.Springer,W.C.Randall,J.J.Baldwin,and C.N.Habecker. PositionsofHis-64andaboundwaterinhumancarbonicanhyd raseIIuponbinding threestructurallyrelatedinhibitors. ProteinSci. ,3(1):118{25,1994. 255

PAGE 256

[246]A.Weber,A.Casini,A.Heine,D.Kuhn,C.T.Supuran,A. Scozzafava,and G.Klebe.UnexpectedNanomolarInhibitionofCarbonicAnhy draseby COX-2-SelectiveCelecoxib:NewPharmacologicalOpportun itiesDuetoRelated BindingSiteRecognition. J.Med.Chem. ,47(3):550{557,2004. [247]R.Recacha,M.J.Costanzo,B.E.Maryano,andD.Chatt opadhyay.Crystal structureofhumancarbonicanhydraseIIcomplexedwithana nti-convulsantsugar sulphamate. Biochem.J. ,361(3):437{41,2002. [248]M.D.Lloyd,N.Thiyagarajan,Y.T.Ho,L.W.L.Woo,O.B. Sutclie,A.Purohit, M.J.Reed,K.R.Acharya,andB.V.L.Potter.FirstCrystalSt ructuresof HumanCarbonicAnhydraseIIinComplexwithDualAromataseSteroidSulfatase Inhibitors. Biochemistry ,44(18):6858{6866,2005. [249]C.-Y.Kim,P.P.Chandra,A.Jain,andD.W.Christianso n. Fluoroaromatic-FluoroaromaticInteractionsbetweenInh ibitorsBoundintheCrystal LatticeofHumanCarbonicAnhydraseII. J.Am.Chem.Soc. ,123(39):9620{9627, 2001. [250]V.Menchise,G.DeSimone,V.Alterio,A.DiFiore,C.Pe done,A.Scozzafava,and C.T.Supuran.CarbonicAnhydraseInhibitors:Stackingwit hPhe131Determines ActiveSiteBindingRegionofInhibitorsAsExempliedbyth eX-rayCrystal StructureofaMembrane-ImpermeantAntitumorSulfonamide Complexedwith IsozymeII. J.Med.Chem. ,48(18):5721{5727,2005. [251]R.D.Hancock.MolecularMechanicsCalculationsasaT oolinCoordination Chemistry. Prog.Inorg.Chem. ,37:187{291,1989. [252]S.C.Hoops,K.W.Anderson,andK.M.MerzJr.Force-Fie ldDesignfor Metalloproteins. J.Am.Chem.Soc. ,113(22):8262{8270,1991. [253]CieplakP.CornellW.Bayly,C.I.andP.A.Kollman.Awe ll-behavedelectrostatic potentialbasedmethodusingchargerestraintsforderivin gatomiccharges:theresp model. J.Phys.Chem. ,97(40):10269{10280,1993. [254]R.H.StoteandM.Karplus.Zincbindinginproteinsand solution:asimplebut accuratenonbondedrepresentation. Proteins ,23(1):12{31,1995. [255]D.V.SakharovandC.Lim.Znproteinsimulationsinclu dingchargetransferand localpolarizationeects. J.Am.Chem.Soc. ,127(13):4921{4929,2005. 
[256]J.AqvistandA.Warshel.Computersimulationofthein itialprotontransferstepin humancarbonicanhydrasei. J.Mol.Biol. ,224(1):7{14,1992. [257]Y.P.Pang,K.Xu,J.E.Yazal,andF.G.Prendergas.Succ essfulmolecular dynamicssimulationofthezinc-boundfarnesyltransferas eusingthecationicdummy atomapproach. ProteinSci. ,9(10):1857{65,2000. 256

PAGE 257

[258]Y.P.Pang.Successfulmoleculardynamicssimulation oftwozinccomplexesbridged byahydroxideinphosphotriesteraseusingthecationicdum myatommethod. Proteins ,45(3):183{9,2001. [259]A.VedaniandD.W.Huhta.ANewForce-FieldforModelin gMetalloproteins. J. Am.Chem.Soc. ,112(12):4759{4767,1990. [260]N.Gresh,J.P.Piquemal,andM.Krauss.Representatio nofZn(II)complexes inpolarizablemolecularmechanics.Furtherrenementsof theelectrostaticand short-rangecontributions.Comparisonswithparallelabi nitiocomputations. J. Comput.Chem. ,26(11):1113{30,2005. [261]N.Gresh.Development,validation,andapplications ofanisotropicpolarizable molecularmechanicstostudyligandanddrug-receptorinte ractions. Curr.Pharm. Des. ,12(17):2121{58,2006. [262]A.K.Rappe,C.J.Casewit,K.S.Colwell,W.A.Goddard, andW.M.Ski.UFF, aFullPeriodic-TableForce-FieldforMolecularMechanics andMolecular-Dynamics Simulations. J.Am.Chem.Soc. ,114(25):10024{10035,1992. [263]A.K.Rappe,K.S.Colwell,andC.J.Casewit.Applicati onofaUniversal Force-FieldtoMetal-Complexes. Inorg.Chem. ,32(16):3438{3450,1993. [264]J.M.Sirovatka,A.K.Rappe,andR.G.Finke.Molecular mechanicsstudies ofcoenzymeB-12complexeswithconstrainedCo-N(axial-ba se)bondlengths: introductionoftheuniversalforceeld(UFF)tocoenzymeB -12chemistryandits usetoprobetheplausibilityofanaxial-base-induced,gro und-statecorrinbutterry conformationalstericeect. Inorg.Chim.Acta ,300:545{555,2000. [265]P.Brandt,T.Norrby,E.Akermark,andP.O.Norrby.Mol ecularmechanics (MM3*)parametersforruthenium(ii)-polypyridylcomplex es. Inorg.Chem. 37(16):4120{4127,1998. [266]H.M.MarquesandK.L.Brown.AMolecularMechanicsFor ce-FieldfortheCobalt Corrinoids. J.Mol.Struct.(Theochem) ,340:97{124,1995. [267]K.L.Brown,X.Zou,andH.M.Marques.NMR-restrainedm olecularmodelingof cobaltcorrinoids:cyanocobalamin(vitaminB-12)andmeth ylcobaltcorrinoids. J. Mol.Struct.(Theochem) ,453:209{224,1998. 
[268]H.M.MarquesandK.L.Brown.Thestructureofcobaltco rrinoidsbasedon molecularmechanicsandNOE-restrainedmolecularmechani csanddynamics simulations. Coord.Chem.Rev. ,192:127{153,1999. [269]H.M.Marques,B.Ngoma,T.J.Egan,andK.L.Brown.Para metersforthe AMBERforceeldforthemolecularmechanicsmodelingofthe cobaltcorrinoids. J. Mol.Struct. ,561(1-3):71{91,2001. 257

PAGE 258

[270]J.AqvistandA.Warshel.Free-EnergyRelationshipsi nMetalloenzyme-Catalyzed Reactions-CalculationsoftheEectsofMetal-IonSubstit utionsinStaphylococcal Nuclease. J.Am.Chem.Soc. ,112(8):2860{2868,1990. [271]U.Ryde.Molecular-DynamicsSimulationsofAlcoholDehydrogenasewitha 4-Coordinateor5-CoordinateCatalyticZincIon. Proteins:Struct.Funct.Genet. 21(1):40{56,1995. [272]U.Ryde.OntheRoleofGlu-68inAlcohol-Dehydrogenas e. ProteinSci. 4(6):1124{1132,1995. [273]U.Ryde.Carboxylatebindingmodesinzincproteins:A theoreticalstudy. Biophys. J. ,77(5):2777{2787,1999. [274]R.D.Hancock,J.S.Weaving,andH.M.Marques.AMolecu larMechanicsModel oftheMetalloporphyrins-theRoleofStericHindranceinDi scriminationinFavor ofDioxygenRelativetoCarbon-MonoxideinSomeHemeModels J.Chem.Soc., Chem.Commun. ,(16):1176{1178,1989. [275]H.M.MarquesandI.Cukrowski.Molecularmechanicsmo dellingofporphyrins. usingarticialneuralnetworkstodevelopmetalparameter sforfour-coordinate metalloporphyrins. Phys.Chem.Chem.Phys. ,4(23):5878{5887,2002. [276]H.M.MarquesandK.L.Brown.Molecularmechanicsandm oleculardynamics simulationsofporphyrins,metalloporphyrins,hemeprote insandcobaltcorrinoids. Coord.Chem.Rev. ,225(1-2):123{158,2002. [277]C.E.Skopec,J.M.Robinson,I.Cukrowski,andH.M.Mar ques.Usingarticial neuralnetworkstodevelopmolecularmechanicsparameters forthemodellingof metalloporphyrins.III.vecoordinateZn(II)porphyrins andthemetalloprophyrins oftheearly3dmetals. J.Mol.Struct. ,738(1-3):67{78,2005. [278]C.E.Skopec,I.Cukrowski,andH.M.Marques.Usingart icialneuralnetworks todevelopmolecularmechanicsparametersforthemodellin gofmetalloporphyrins: PartIV.Five-,six-coordinatemetalloporphyrinsofMn,Co ,NiandCu. J.Mol. Struct. ,783(1-3):21{33,2006. [279]P.O.NorrbyandT.Liljefors.Automatedmolecularmec hanicsparameterization withsimultaneousutilizationofexperimentalandquantum mechanicaldata. J. Comput.Chem. ,19(10):1146{1166,1998. [280]P.O.NorrbyandP.Brandt.Derivingforceeldparamet ersforcoordination complexes. Coord.Chem.Rev. 
,212:79{109,2001. [281]K.M.MerzJr.CO2BindingtoHumanCarbonicAnhydraseII. J.Am.Chem.Soc. 113(2):406{411,1991. [282]K.M.MerzJr.,M.A.Murcko,andP.A.Kollman.Inhibiti onof Carbonic-Anhydrase. J.Am.Chem.Soc. ,113(12):4484{4490,1991. 258

PAGE 259

[283]N.Diaz,D.Suarez,andK.M.MerzJr.Hydrationofzinci ons:theoreticalstudy of[Zn(H2O)(4)](H2O)(8)(2+)and[Zn(H2O)(6)](H2O)(6)(2 +). Chem.Phys.Lett. 326(3-4):288{292,2000. [284]N.Diaz,D.Suarez,andK.M.M.MerzJr.Zincmetallo-be ta-lactamasefrom Bacteroidesfragilis:Aquantumchemicalstudyonmodelsys temsoftheactivesite. J.Am.Chem.Soc. ,122(17):4197{4208,2000. [285]N.Diaz,D.Suarez,andK.M.MerzJr.Moleculardynamic ssimulations ofthemononuclearzinc-beta-lactamasefrombacilluscere uscomplexedwith benzylpenicillinandaquantumchemicalstudyofthereacti onmechanism. J.Am. Chem.Soc. ,123(40):9867{9879,2001. [286]N.Diaz,D.Suarez,T.L.Sordo,andK.M.MerzJr.Atheor eticalstudyofthe aminolysisreactionoflysine199ofhumanserumalbuminwit hbenzylpenicillin: Consequencesforimmunochemistryofpenicillins. J.Am.Chem.Soc. 123(31):7574{7583,2001. [287]N.Diaz,D.Suarez,T.L.Sordo,andK.M.MerzJr.Acylat ionofclassa beta-lactamasesbypenicillins:Atheoreticalexaminatio noftheroleofserine 130andthebeta-lactamcarboxylategroup. J.Phys.Chem.B ,105(45):11302{11313, 2001. [288]D.SuarezandK.M.MerzJr.Moleculardynamicssimulat ionsofthemononuclear zinc-beta-lactamasefromBacilluscereus. J.Am.Chem.Soc. ,123(16):3759{3770, 2001. [289]N.Diaz,T.L.Sordo,K.M.MerzJr.,andD.Suarez.Insig htsintotheacylation mechanismofclassAbeta-lactamasesfrommoleculardynami cssimulations oftheTEM-1enzymecomplexedwithbenzylpenicillin. J.Am.Chem.Soc. 125(3):672{684,2003. [290]N.Diaz,D.Suarez,K.M.MerzJr.,andT.L.Sordo.Molec ulardynamics simulationsoftheTEM-1,beta-lactamasecomplexedwithce phalothin. J.Med. Chem. ,48(3):780{791,2005. [291]D.Suarez,E.N.Brothers,andK.M.MerzJr.Insightsin tothestructureand dynamicsofthedinuclearzincbeta-lactamasesitefromBac teroidesfragilis. Biochemistry ,41(21):6615{6630,2002. [292]D.Suarez,N.Diaz,andK.M.MerzJr.Moleculardynamic ssimulationsofthe dinuclearzinc-beta-lactamasefrombacteroidesfragilis complexedwithimipenem. J. Comput.Chem. ,23(16):1587{1600,2002. 
[293]G.Cui,B.Wang,andK.M.MerzJr.Computationalstudie softhe farnesyltransferaseternarycomplex-PartI:Substratebi nding. Biochemistry 44(50):16513{16523,2005. 259

PAGE 260

BIOGRAPHICAL SKETCH

Martin Barry Peters was born on April 3rd, 1980 in Tipperary, Republic of Ireland to Martin and Mary Peters. He attended primary and secondary school in New Inn and Cashel respectively. In June 2002 he received his B.A. Mod. degree in Computational Chemistry from Trinity College, University of Dublin (TCD). While at Trinity he worked under the supervision of Dr. Isabel Rozas, where he was introduced to computational chemistry and to his future significant other, Jane Montague. Martin enrolled in the PhD program at Penn State University (PSU) and worked with Prof. Kenneth M. Merz Jr. on the application of semi-empirical quantum mechanics to structure-based drug design. In August 2005, Martin received his second degree, M.Sc. in chemistry from PSU. In September 2005 he moved to the University of Florida (UF) and joined the Department of Chemistry and the Quantum Theory Project to continue his work with Prof. K. M. Merz Jr. in the pursuit of a doctoral degree. In his final year as a graduate student he applied for a Government of Ireland postdoctoral fellowship in science, engineering and technology (IRCSET), which was successful. After graduating from UF, he joined Dr. David Lloyd at TCD as an IRCSET postdoctoral fellow in his group.