Citation
Conformational Analysis of Drug-Like Molecules and Its Applications to Drug Discovery

Material Information

Title:
Conformational Analysis of Drug-Like Molecules and Its Applications to Drug Discovery
Creator:
Fu, Zheng
Publisher:
University of Florida
Publication Date:
Language:
English

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Chemistry
Committee Chair:
Merz, Kenneth Malcolm
Committee Members:
Horenstein, Nicole A
Roitberg, Adrian E
Bruner, Steven Douglas
Sanders, Beverly A
Winner, Lawrence H

Subjects

Subjects / Keywords:
conformation
donepezil
drug
oseltamivir
parallel

Notes

General Note:
Conformational analysis is concerned with exploring the energetically favorable conformations of molecules and describing the energy differences among conformational isomers (conformers) due to the adoption of conformational profiles (bond length, bond angles and dihedral angles). This is typically done by mapping the conformational space of molecules with molecular mechanics (MM), molecular dynamics (MD), quantum mechanics (QM) calculations and analysis of experimentally determined structural data (NMR and X-ray). A primary contribution of this dissertation is developing a systematic search method to identify local and global minimum on molecular conformational energy surface using MM conformational sampling / ab initio QM geometry optimization followed by high accuracy MP2/CBS single point calculation. Meanwhile a QM/MM X-ray refinement approach was implemented to fix the coordinate errors existed in PDB ligand-enzyme crystal structures so as to determine the “bioactive” conformation of drug-like molecules. These two conformational analysis methodologies were applied in several drug compounds as case studies including retinoic acid (all-trans and 9-cis), donepezil and oseltamivir. In addition geometry optimization of oseltamivir-neuraminidase was also performed by using a novel QM/MM energy minimization approach with our upgraded PUPIL 2.0 package. Finally all energetically preferable conformations of both free and bound states of drug-like molecules were stored and categorized into a database for future usage in ligand and structure based drug design.

Record Information

Source Institution:
UFRGP
Rights Management:
All applicable rights reserved by the source institution and holding location.
Embargo Date:
12/31/2013

Downloads

This item has the following downloads:


Full Text

PAGE 1

1 CON FORMATIONAL ANALYSIS OF DRUG LIKE MOLECULES AND ITS APPLICATION S TO DRUG DISCOVERY By ZHENG FU A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2012

PAGE 2

2 2012 Zheng Fu

PAGE 3

3 To my dear est grandfather, Prof. Fu, LiangKui, and my entire family

PAGE 4

4 ACKNOWLEDGMENTS I would first like to acknowledge Pr ofessor Kenneth M. Merz Jr., my teacher and mentor who was really supportive and helped a lot during my entire Ph.D career I also want to thank my colleagues and alumni in Merz Group, including Dr. Xiao He, Dr. Xue Li Dr. Ben Robert and Mr. Yipu Miao, who served as professional inspirations to me. Specifically I want to thank Dr. Mike Weaver for his time and help on editing and proofreading this dissertation.

PAGE 5

5 TABLE OF CONTENTS page ACKNOWLEDGMENTS ................................ ................................ ................................ .. 4 LIST OF TABLES ................................ ................................ ................................ ............ 7 LIST OF FIGURES ................................ ................................ ................................ .......... 8 LIST OF ABBREVIATIONS ................................ ................................ ........................... 10 ABSTRACT 1 INTRODUCTION ................................ ................................ ................................ .... 13 2 THEORY AND METHODS ................................ ................................ ..................... 20 2.1 Hartree Fock Approximation ................................ ................................ ............. 20 2.1.1 Hartree Fock Equatio n ................................ ................................ ............ 20 2.1.2 Roothann Hall Matrix Equations ................................ .............................. 24 2.2 Second Order Mller Plesset Perturbation Theory ................................ ........... 28 2.3 Density Functional Theory ................................ ................................ ................ 31 2.4 X ray Refinement ................................ ................................ .............................. 37 2.4.1 Maximum Likelihood ................................ ................................ ................ 37 2.4.2 Cross Validation ................................ ................................ ...................... 40 2.4.3 Energy minimization ................................ ................................ ................ 41 3 RETINOIC ACID ................................ ................................ ................................ ..... 44 3.1 Background ................................ ................................ ................................ ....... 44 3.2 Computational Approaches ................................ ................................ ............... 46 3.2.1 Generating Retin oic Acid Conformation Ensembles ................................ 46 3.2.2 Single Point Energy Calculation ................................ .............................. 47 3.2.3 Alignment and Clustering of Retinoic Acid Confor mers ........................... 47 3.2.4 QM/MM Refinement of Protein RA X ray Crystal Structures ................... 48 3.2.5 Non Parametric Statistical Inference ................................ ....................... 49 3.3 Results and Discussion ................................ ................................ ..................... 50 3.3.1 Conformational Preferences of All trans and 9 cis RAs in Gas Phase .... 50 3.3.2 Estimating RA Conformational Energies with M06 and M06 2X Functionals ................................ ................................ ................................ .... 61 3.3.3 Conformational Preferences of Protein Bound All trans and 9 cis RAs ... 64 3.4 Conclusions ................................ ................................ ................................ ...... 72 4 DONEPEZIL ................................ ................................ ................................ ........... 76 4.1 Background ................................ ................................ ................................ ....... 76 4.2 Computational Approaches ................................ ................................ ............... 78

PAGE 6

6 4.2.1 Exploring Donepezil Conformational Energy Surface .............................. 78 4.2.2 Single Point Energy Calculation ................................ .............................. 79 4.2.2 Parallel QM/MM X ray Crystallography Refinement ................................ 79 4.3 Results and Discussion ................................ ................................ ..................... 81 4.3.1 Exploring Donepezil Conformational Energy Surface .............................. 81 4.3.2 Parallel QM/MM Refinement of Donepezil Tc AChE Complex Crystal Struc ture ................................ ................................ ................................ ........ 86 4.3.3 Strain Energy ................................ ................................ ........................... 92 4.4 Conclusion ................................ ................................ ................................ ........ 95 5 OSELTAMIVIR ................................ ................................ ................................ ........ 96 5.1 Background ................................ ................................ ................................ ....... 96 5.2 Methods ................................ ................................ ................................ ............ 97 5.2.1 Pure QM/MM energy minimizatio n ................................ .......................... 97 5.2.2 QM/MM X ray structure refinement ................................ ......................... 98 5.3 Results ................................ ................................ ................................ .............. 98 5.3.1 Exploring the conformational energy surface of free state oseltamivir .... 98 5.3.2 Conformational profiles of bound state oseltamivir ................................ 102 5.3.3 Pure QM/MM energy minimization of oseltamivir neuraminidase complex ................................ ................................ ................................ ....... 109 5.4 Conclusion ................................ ................................ ................................ ...... 114 6 FUTURE WORK ................................ ................................ ................................ ... 115 APPENDIX A R CODE AND OUTPUT OF NON PARAMETRIC STATISTICAL INFERENCE ON RETINOIC ACID CONFORMATIONAL ENERGY ................................ .......... 120 B R CODE AND OUTPUT OF NON PARAMETRIC STATISTICAL INFERENCE ON DONEPEZIL CONFORMATIONAL ENERGY ................................ ................ 122 C PARAMETER SETTINGS OF DONEPEZIL TC ACHE MODEL RE REFINEMENT USING CLASSICAL FORCE FIELD METHODS .......................... 125 D R CODE FOR NEAREST SHRUNKEN CENTROIDS CLASSIFIER WITH EXTERNAL K FOLD CROSS VALIDATION AND MULTIPLE PARTITIONS ...... 128 LIST OF REFERENCES ................................ ................................ ............................. 129 BIOGRAPHICAL SKETCH ................................ ................................ .......................... 136

PAGE 7

7 LIST OF TABLES Table page 3 1 Back bo ne torsion angles with respect to the PDB deposited RA conformers and their QM/MM re refined geometries. ................................ ............................ 67 3 2 Calculated strain energies using the MP2/CBS method for RA conformers deposited in PDB and th eir correspond ing QM/MM re refined structures. .......... 69 3 3 Comparing real space R factors/real space cor relation coefficients of different refinement methods at given weighting factors. ................................ .... 72 4 1 Co nformational features of six investigated conformations and their optimized ge ometries ................................ ................................ ................................ ......... 88 4 2 Distances of cation Tc AChE recognition and association. ................................ ................................ ............... 92 4 3 Strain ener gies of AFITT fitting result, CSD crystal structure, P DB deposited conformer and its re refined geometries ................................ ............................ 93 5 1 Conf ormat ional features of oseltamivir bound conformations and their optimized geo metries ................................ ................................ ...................... 104 5 2 Strain ene rgies of oseltamivir bound conformations optimized by typical conjugated gradient method along with ab initio /DFT QM Hamiltonians. ......... 112 5 3 Strain energies of oseltamivir bound conformations optimized by PRCG algorithm along with ab initio /DFT QM Hamiltonians. ................................ ....... 112 5 4 Strain ene rgies of oseltamivir bound conformations optimized by TNCG algorithm along with ab initio /DFT QM Hamiltonians. ................................ ....... 113 6 1 S elected commercial CA II inhibitors from DrugBank. ................................ ....... 116 6 2 2 by 2 confusion table describing the accuracy rate of label prediction for each class. ................................ ................................ ................................ ........ 119

PAGE 8

8 LIST OF FIGURES Figure page 1 1 Thermodynamic cycle of the ligand binding process. ................................ ......... 14 2 1 W orkflow of SCF algorithm ................................ ................................ ................ 27 3 1 Relative energy of Omega ensembles optimized by HF/6 31G* ........................ 51 3 2 Relative energies of 66 ATRA low energy conformers as a function of differ ent dihedral angles along the polyene chain ................................ .............. 53 3 3 Schematic representation of the eight groups of 66 ATRA low energy conformer s. ................................ ................................ ................................ ......... 54 3 4 Clustering Tanimoto scores computed from the shape comparison among the 66 ATRA low energy conformers. ................................ ................................ 57 3 5 Relative energies of 48 9 c is RA low energy conformers as a function of differ ent dihedral angles along the polyene chain ................................ .............. 58 3 6 Eight groups of energy preferred conformers characterized for 9 cis RA arranged in the or der of increasing average relative conformational energy. ..... 59 3 7 Clustering Tanimoto scores computed from the shape comparison among the 48 9 cis RA low energy conformers. ................................ ............................ 60 3 8 Relative energies computed for ATRA and 9 cis RA using DFT and MP2/CBS methods in gas phase. ................................ ................................ ....... 62 3 9 Overlay of the PDB deposited RA conformers and corresponding QM/MM re refined s tructures. ................................ ................................ ............................... 65 3 10 Exploring the conformational spaces of ATRA and 9 cis RA with Omega plus HF/6 31G* in the PCM solvent model. ................................ ................................ 68 4 1 Structure diagram of donepezil with atoms numbering. ................................ ...... 76 4 2 Pseudo code of parallel algorithm for computing 2 electron repulsive integrals in QUICK pac kage ................................ ................................ .............. 81 4 3 Identif ied donepezil low energy conformers with Omega / ab initio QM approach and estimated their relative energies by DFT methods in va cuo ....... 82 4 4 Schematic representation of the 18 donepezil low energy conformers identified by Omega and HF/6 31G* in gas phase. ................................ ............ 83

PAGE 9

9 4 5 Crystal structure of donepezil in its binding pocket and m odified donepezil bound conformation. ................................ ................................ ........................... 85 4 6 Superimposition of donepezil X ray structures plus re modeled bound conformations w ith their nearest local m inima and global minimum. ................. 87 4 7 The active site gorge of Tc AChE and ove rlays of PDB deposited donepezil coordinates with its remodelled bound conformations ................................ ....... 91 5 1 Structure diagram of oseltamivir with atom numbering. ................................ ...... 96 5 2 Schematic representation of the 12 oseltamivir low energy conformations. ....... 99 5 3 Estimated relative energies of oseltamivir low enegy conformers by DFT and MP2 level of theories in vacu o. ................................ ................................ ......... 100 5 4 St ereo view superimposition of PDB deposited oseltamivir conformation with its remodelled bound conformation s ................................ ................................ 103 5 5 Superimposition of four oseltamivir bound conformations with thei r nearest local minima and global minimum ................................ ................................ .... 105 5 6 Stereo views of N8 subtype neuraminidase active site with oseltamivir bound conformations. ................................ ................................ ................................ .. 107 5 7 Overlays of pure QM/MM minimization identified oseltamivir conformers ....... 110 5 8 Stereo view of three lowest strain oseltamivir conformers obtained by diffe rent CG algorithms and their corresponding local and global minimum. .... 111 6 1 Pseudo code of identifying optimal threshold value for NSC classifier with external k fold cross validation and multiple partitions. ................................ .... 118 6 2 Finding optimal threshold value using 5 folds external cross validation with 1000 loops multiple partitions. ................................ ................................ .......... 119

PAGE 10

10 LIST OF ABBREVIATION S 9 cis RA 9 cis Retinoic Acid ATRA All trans Retinoic Acid CB S Complete Basis Set CG Conjugate Gradient CoA Co Activator CoR Co Repressor DFT Density Functional Theory HF Hartree Fock LCAO Linear Combination of Atomic Orbitals NMR Nuclear Magnetic Resonance NSC Nearest Shrunken Centroids MD Molecular Dynamics MM Molecular Mechanics MO Mo lecular Orbital MP2 Second Order Mller Plesset Perturbation PDB Protein Data Bank PRCG Polak Ribiere C onjugate G radient QM Quantum Mechanics QM/MM Quantum Mechanics/Molecular Mechanics RAR Retinoic Acid Receptor RXR Retinoid X Receptor TNCG T runca ted N ewton L inear C onjugate Gradient

PAGE 11

11 Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy CONFORMATIONAL ANALYSIS OF DRUG LIK E MOLECULES AND ITS APPLICATIONS TO DRUG DISCOVERY By Zheng Fu December 2012 Chair: K enneth M. Merz, Jr. Major: Chemistry Conformational analysis is concerned with exploring the energetically favorable conformations of molecules and describing the energy differences among conformational isomers (conformers) due to the adoption of conforma tional profiles (bond length, bond angles and dihedral angles). This is typically done by mapping the conformational space of molecules with molecular mechanics (MM), molecular dynamics (MD), quantum mechanics (QM) calculations and analysis of experimental ly determined structural data (NMR and X ray). A primary contribution of this dissertation is developing a systematic search method to identify local and global minimum on molecular conformational energy surface using MM conformational sampling / ab initio QM geometry optimization followed by high accuracy MP2/CBS single point calculation. Meanwhile a parallel QM/MM X ray refinement approach was implemented to fix the coordinate errors existed in PDB ligand enzyme crystal structures so as to further determi like molecules. These two conformational analysis methodologies were applied in several drug compounds as case studies including retinoic acid (all trans and 9 cis ), donepezil and oseltamivir. In addition pure QM/MM geometry optimization of oseltamivir neuraminidase X ray

PAGE 12

12 structure was also performed by using our upgraded PUPIL 2.0 package. All t hese results demonstrate that our conformational analysis approach can provide reliable drug target structural information f or future use in ligand and structure based drug discovery applications.

PAGE 13

13 CHAPTER 1 INTRODUCTION The primary objective of conformational analysis of drug like molecules is to answer a key question in drug discovery: what is the energy cost that a ligand must pay in order to transform from the free state to bound conformation. This question is important because such an energy penalty, named internal strain of the ligand (a.k.a. internal ligand strain), significantly contributes to the total free energy of binding. Figure 1 1 illustrates the thermodynamic cycle of the entire ligand binding process. From this figure the total free energy of binding in aqueous environment can be expressed as (1 1) where is the binding fre e energy in the gas phase, is the total change in solvation energy ( ), and are the internal strain (conformational energy o f the free state minus that of the bound state) of the protein and ligand, respectively. Note that Figure 1 1 only depicts the computational steps for estimating ligand binding free energy and cannot represent the real physical events occurring in protein ligand recognition and association. The first and second terms in equation (1 1) ( and ) exclude the energy changes caused by the conformational rearrangement of protein and ligand (unlike the ordinary definition of these terms). From Figure 1 1 and

PAGE 14

14 equation (1 1) we can see that the internal l igand strain ( is indeed an influential factor in estimating binding free energy. Figure 1 1. Thermodynamic cycle of the ligand binding process. In a recent work aimed at determining the numerical error in binding energy calculati on, Feherand and Williams reported that computing the binding free energy (using MM GBSA) with and without internal ligand strain term could give rise to as much as 17 Kcal/mol binding energy variance in some protein ligand complex cases 1 Moreover, many docking and scoring approaches have an imprecise internal ligand

PAGE 15

15 strain component in their scoring functions, undermining the accurate calculation of binding free energies. An earlier analysis of the conformational energy penalty for drug like molecules shows that a typical uncertainty in the calculation of the internal ligand strain is about 5~10 Kcal/mol, which increases the uncertainty in scores generated by standard docking programs 2 On this basis, successful ranking of drug like molecule libraries in a virtual screen might not be anticipated. Due to the importance of the internal strain of ligand, it is worthy to develop some novel computational chemistry methodologies in order to estimate this ligand energy penalty accurately and efficiently, which in turn may make contributions that improve the reliability of present structure based drug discovery approaches. As discussed above, one major component of ca lculating internal ligand strain is to determine the free state conformations of the ligand in solution (initial state) and estimating their conformational energies. Currently we have an insufficient number of experimentally determined structural data avai lable for unbound drug like molecules in aqueous environment. Yet in principle the free ligand in solution should have energy stable conformations that are pre organized for binding, which would reduce both internal strain and entropic penalties. We can id entify the low energy conformers of a given ligand to represent its conformational space using high quality energy functions (MP2/CBS method ) with a reliable solvent mod el ( polarizable continuum model, PCM ). Bostrom et al. analyzed 33 protein ligand complexes in vacuo as well as in solution using GB/SA with MM3 and an AMBER force field. In their work 70% of their protein ligand complexes had conformational strain less tha n 3 Kcal/mol when the experimental conformation was

PAGE 16

16 compared to the global minima conformation in aqueous solution. They conclude that 3 Kcal/mol can be set as a good threshold of the conformational strain and that the high strain of the remaining 30% of s tructures was attributed to the low resolution of some of the crystal structures and limitations of computational methods 3 To overcome the issues c reated by traditional conformational mapping approaches, we herein develop an Omega/QM geometry optimization protocol to locate the local and global minima on the gas phase and aqueous phase potential energy surfaces of drug like molecules. Their relative conformational energies were estimated utilizing the M06 suite of density functionals and MP2 methods combined with the 6 311+G(d,p), aug cc pVDZ and aug cc pVTZ basis sets, as well as complete basis set MP2 extrapolations using the latter two basis sets. Each of these computational methodologies and their theoretical frameworks are summarized in Chapter 2 and we further apply this protocol to retinoic acid (all trans and 9 cis ), donepezil and oseltamivir. The details of their free state conformational ener gy surfaces are elucidated in Chapters 3, 4 and 5. Another key point in estimating the internal strain of the ligand is the quality of its bound conformation determined by macromolecular X ray crystallography. The uncertainty of the atomic coordinates (fro m the experimental resolution or the force field used in the refinement) of the ligand bound conformation deposited in the PDB largely precludes a rigorous determination of drug bioactive conformation 4 Therefore finding a proper way to relax the drug bound conformation with constraints is required. The first study addressed this was published in 2004 by Perola and Charifson 5 A flat bottom potential was selected as the constraint to optimize 150 drug like molecules in their bound conformations. They revealed that about 60% of the ligands have conformational

PAGE 17

17 strain lower than 5 Kcal/mol, yet regardless of which force field was used, at least 10% of the cases had the strain more than 9 Kcal/mol with some up to 20 Kcal/mol. Butler et al. re analyzed 99 of the 150 protein l igand complexes in Perola and Charifson's set using DFT and MP2 level of theories to compute the energies of bound and free ligand conformations with the PCM solvent model. In addition, a harmonic potential was introduced instead of the flat bottom potenti al in the re refinement of the X ray crystallographic structure. As a result two thirds of the drug like ligands had strained cases (more than 2 Kcal/mol) was that these l igands were not fit well into the electron density data 6 So far all previous conformational analysis s tudies have used energy minimization with constraints rather than re refined the structures against the structure factors. More than that, they have only used force fields for the re refinement process, nevertheless these may lack accurate molecular parame ters for the drug like molecules found in protein ligand complexes. Hence in this dissertation we used the energetically restrained refinement formalism (EREF) combined with QM/MM calculations for the re refinement of protein ligand X ray structures of int erest. In brief the purpose of introducing constraints (like using either a flat bottom potential or a harmonic potential) in structure refinement is to reduce the adjustable parameters, whereas the EREF formalism essentially increases the number of observ ations by offering structural information in the form of a molecular potential 7 whic h in our protocol is the QM/MM method. Furthermore, routine X ray structure refinement methods commonly exclude the electrostatics term from the target energy function, which we will not do in the

PAGE 18

18 QM/MM X ray refinement. Since ignoring electrostatics is pr one to introduce errors into refinement process considering that many drug like molecules are charged under physiological conditions in human body. In Chapter 3 six retinoic acid (RA) protein complexes are re refined by means of our QM/MM X ray refinement approach and consequently several coordinate errors were fixed that previously existed in the PDB deposited RA bound conformations. Since highly strained bound conformation will make a positive contribution ( to the total binding free energy and thus oppose binding, h ere we use strain energy penalty (a.k.a. strain energy) as a quality indicator of ligand bound conformation extracted from protein ligand X ray structure, which is calcula ted by comparing the energy of the bound conformation of a ligand with the corresponding local and global minima on its gas or solution phase potential energy surface (local and global strain energy) For the retinoic acid cases, a 91% average reduction of the local strain energy in the gas phase, as well as 92% in PCM solvent, is observed using the QM/MM refined structures versus the PDB deposited RA conformations. Furthermore, we recently upgraded our in house ab initio code QUICK and make it possible to support Massage Passing Interface (MPI) with the distributed SCF algorithm in an effort to accelerate the QM/MM calculation via parallelization. The applications of this parallel scheme on donepezil acetylcholinesterase and oseltamivir neuraminidase comple xes are discussed in Chapters 4 and 5. In addition we performed a series of pure QM/MM optimization on oseltamivir neuraminidase crystal structure with various MM energy minimization protocols and different QM Hamiltonians. This work is presented in Chapte r 5 and its purpose is to identify the optimal molecular potential restraints of the EREF formalism for future X ray refinement software

PAGE 19

19 development. Overall, the EREF formalism combined with an ab initio QM/MM Hamiltonian provides a more realistic model f or the refinement of drug like molecules against experimental crystallographic data. The final chapter of this dissertation briefly presents an application of our conformational analysis method aimed at predicting the bioactivities of carbonic anhydrase II (CAII) inhibitors. 242 low energy conformations of 12 commercial drugs against CAII were identified by our Omega/ ab initio QM geometry optimization approach and compared with 40 CAII drug candidates in terms of their 3D similarity. Consequently we obtaine d a 3D similarity metric matrix and send it to a well tuned nearest shrunken centroids classifier as the training set. An overall 78% classification accuracy rate was observed ultimately and it indicates that our conformational analysis approach can be ext ended to a broader usage in drug discovery with further development.

PAGE 20

20 CHAPTER 2 THEORY AND METHODS 2.1 Hartree Fock Approximation 2.1.1 Hartree Fock Equation One of the most important questions in conformational analysis is how to calculate the conformat ional energies of small compounds. Since all drug like molecules are many body systems containing nuclei and electrons, one way to address this energy is to solve the nonrelativistic time independent Schrdinger equation (2 1) where is the Hamiltonian operator of a molecular system with nuclei and electrons, is the system wave function and is the system energy. Given that nuclei are much heavier than electrons, we may use Born Oppenheimer app roximation to separate the heavy nuclei movements of a molecule from the much faster motions of the electrons. Then the molecular Hamiltonian can be written as the sum of the electronic Hamiltonian and constants that represent the repulsion between the nuc lei. Equation 2 1 is then reduced to a Schrdinger equation with only the electronic Hamiltonian operator (2 2) The electronic Hamiltonian includes operators corresponding to the kinetic and potential energies of electrons (2 3) We define the one electronic operator (2 4)

PAGE 21

21 and two electronic operator (2 5) hence the electronic Hamiltonian is generally given in the form (2 6) In addition to the electronic Hamiltonian, another key factor of solving equation (2 2) is the expression of an N electron system wave function H ere we introduce the Hartree Fock approximation, which assumes each electron has an independent spin orbital (product of spatial orbital and spin function or respectively) and is subjected to an average potential generated by othe r electrons. So the N electron molecular system wave function corresponding to the electronic Hamiltonian can be described as a single, anti symmetry Slater determinant (2 7) In this thesis we use the short hand notation of Slater determinant for convenience that only shows the diagonal elements of the determinant, (2 8) We substitute in equation (2 2) to and obtain (2 9) Now the initial question was reduced to choose a set of spin orbitals so as to minimize the function al for the energy expectation value according to the variational principle

PAGE 22

22 (2 10) To perform this variational optimization that is subjected to a constra int using Lagrange multipliers, we assume the spin orbitals are orthonormal (2 11) where is the Kronecker delta and eventually we obtain the Hartree Fock equation for a single electron spin orbital (2 12) where is the Fock operator and is a set of Lagrange multipliers corresponding to the spin orbital energies. The Fock operator is composed of single electron Hamiltonian (2 13) and a sum of the Coulomb and exchange operators. The Coulomb operator, acts on single molecule spin orbital as (2 14) The exchange operator, acts on single molecule spin orbital as (2 15) Therefore the Hartree Fock equation (2 12) can be rewritten as an eigenvalue equation (2 16)

PAGE 23

23 where the minus sign comes from the antisymmetric nature of the Slater determinant. Since most drug like compounds have a closed shell N electron configu ration (each molecular orbital is occupied by a pair of electrons with opposite spins), we insert a restricted set of spin orbitals (the same spatial function for both spin function and ) into equation (2 16) and obtain the closed shell Hartree Fock equation for spatial orbitals (2 17) The closed shell Fock operator is thereby defined as (2 18) where the closed shel l Coulomb and exchange operators have the forms (2 19) (2 20) For closed shell determi nant, we use the orthonormal feature of spin orbitals (2 21) (2 22) (2 23) and derive the closed shell Hartree Fock energy

PAGE 24

24 (2 24) where is the Coulomb integral defined by (2 25) and is the exchange integral defined by (2 26) 2.1.2 Roothann Hall Matrix Equations From th e previous discussion we know that the procedure for deriving the Hartree Fock equation is a variational minimization with Lagrange multipliers, yet it is almost impossible to solve the Hartree Fock equation of a many electron molecular system by using num reformulating the eigenvalue problem by introducing a set of known spatial basis functions, the complex differential equation can be converted to a set of linear equations and solved by using much simpl er linear algebra techniques 8 Here we assume the unknown molecular orbital is a linear combination of the basis functions (number of functions equals n) (2 27)

PAGE 25

25 and if they were composed of atomic orbitals this meth MO). Expanding the Hartree Fock equation (2 12) with LCAO MO gives (2 28) Multiplyin g (1) on the left and integrating (2 29) This integrated Hartree Fock equation can be rewritten as (2 30) where denotes the element in overlap matrix S (2 31) and denotes the element in Fock matrix F (2 32) We can insert the definition of into equation (2 32) and obtain the finial expression of Fock matrix (2 33) where is the element of the core Hamiltonian matrix defined as (2 34)

PAGE 26

26 and desribes the two electron Coulomb and exchange interactions that merely relays on the element of den sity matrix ( ) and two electron integrals (2 35) (2 36) Meanwhile equation (2 30) also ca n be expressed as a single matrix equation called the Roothaan Hall equation (2 37) where C is an n n square matrix of the LCAO MO coefficients (2 38) and is a diagonal matrix of the molecular orbital energies (2 39) Equation (2 37) is almost an eigenvalue equation yet it may not be solved by using the secular equation due to the overlap matrix S If S is not singular and the eigenvalues of S are all non negative, we are always able to find a transformation matrix X = S 1/2 to make XS X = 1 ( S is the conjugate transpose of S and note that S is a Hermiti an matrix). Furthermore, the matrix X transforms the non orthogonal basis functions set { to an orthonormal set of functions { via C = X 1 C so that (2 40) Multiplying the conjugate transpose of X on the left gives

PAGE 27

27 (2 41) We then define a new matrix F = X FX and finally obtain an eigenvalue equation (2 42) Thus we can solve the equation (2 42) using an iterative procedure called the self consistent field method (SCF) and the entire workflow is shown in Figure 2 1. Fi gure 2 1. Workflow of SCF algorithm

PAGE 28

28 2.2 Second Order Mller Plesset Perturbation Theory The Hartree Fock method is based on the assumption that the wave function of an N electron molecular system can be approximately represented by a single Slater determi nant. This approach, however, underestimates the electron correlation energy and allows electrons to draw nearer one another than their truly correlated movement. One way to overcome this serious shortcoming of the Hartree Fock approximation is by introduc ing a few Slater determinants instead of only using the ground state one to describe the entire molecular wave function. In 1934, Christian Mller and Milton S. Plesset introduced additional determinants on solving the eigenvalue problem 9 (2 43) via a Rayleigh Schrdinger perturbation scheme. This so called Mller Plesset (MP) perturbation theory proposes that the electronic Hamiltonian can be written as (2 44) where is the unperturbed reference Hamiltoni an and is the electronic correlation perturbation with an ordering parameter In MP theory the unperturbed operator is just the Hartree Fork operator (2 45) where is called the Hartree Fock one electron potential operator and the perturbation operator was derived directly by taking the difference between and the full electronic Hamiltonian

PAGE 29

29 (2 46) Now we expand the wave function as well as the energy in a Taylor series in (2 47) (2 48) Inserting equation (2 47) and (2 48) into (2 43) gives (2 49) (2 50) (2 51) It is easy to estimate the values of and since is simply the single slater determinant for the ground state of molecular system (HF wave function). However we need to know the form of so as to compute (second order Mller Plesset perturbation energy or MP2 electron correlation energy) 10 theorem and the constraint that the higher order contributions to are orthogonal to the only possible is the doubly excited determinants making non zero contributions to the MP2 electron correlation energy, which represents a wave function reflecting the molecular electronic transition from occupied spin orbital a and b

PAGE 30

30 to uno ccupied (virtual) orbital r and s in This wave function thus is placed in equation (2 45) to yield (2 52) where (2 53) (2 54) (2 55) and are occupied orbital energies and and are virtual orbital energies. Although second order Mller Plesset perturbation theory is a post Hartree Fock approach, its methodology of solving the many body Schrdinger equation st ill depends on expressing the many electron wave function in terms of the product of molecular orbitals, which are further represented as a linear combination of atomic orbitals (LCAO MO). These atomic orbitals, or basis functions, theoretically should be a complete set with infinite numbers, yet in practice we can only employ an incomplete set of basis functions with a finite number of terms in the LCAO expansion. This truncation of complete basis sets (CBS) is the essential source of error in quantum chem istry calculations. In principle, performing a linear extrapolation towards the CBS limit can reduce such type of error and might be used as the best estimation of the result obtained using an infinitely large (complete) basis set. Previous work in our lab employed a two point linear extrapolation strategy to estimate the electron correlation energy and obtained pretty good results 11 T herefore i n this thesis we adopt the similar

PAGE 31

31 scheme with Dunning basis sets aug cc pVDZ and aug cc pVTZ, which are designed to converge systematically to the complete basis set (CBS) limit. Two parameter exponential functions of the cardinal number X (X=2 for aug cc pVDZ and X=3 for aug cc pVTZ) were applied to determine the CBS limit for the Hartree Fock energies (2 56) X (energy difference between MP2 energy and HF en ergy) was extrapolated using a two parameter polynomial formula (2 57) Therefore the MP2/CBS energy can be expressed as the sum of CBS limit values of the HF and electron correlation energies (2 58) 2.3 Density Functional Theory So far the methods introduced were wave function based approaches either depending on the variational principle or perturbation theory. In this section we pursue a totally different route, density functio nal theory (DFT), to calculate the many electron system energy. Unlike Hartree Fock and post Hartree Fock methods, whose primary goals were seeking improved wave functions, DFT concentrates on the electron density instead. For an N electron system, its ele ctron density is defined as following (2 59) The physical meaning of the electron density is the probability amplitude of finding any electron near the position r in space. T herefore we have (2 60)

PAGE 32

32 (2 61) Moreover the electron density can be determined using only 3 spatial coordinates instead of the 3N spatial coordinates of wave functions ( 4N variables if we consider spin orientation), yet it in cludes sufficient information to exactly compute the ground state energy of many electron system. This fact will be developed in the following paragraphs along with three important theorems that offer the theoretical basis of DFT and a practical way to imp lement DFT in real calculation. Theorem 1. ( First Hohenberg Kohn theorem The external potential is (to within a constant) a unique functional of ; since, in turn fixes we see that the full many particle ground state is a unique functional of 12 Thus the election density determines the Hamiltonian operator and all the properties of the many electron system. Based on this theorem, the total energy of the system can be written as (2 62) (2 63) where is the nuclei electron interaction energy, is the electron kinetic energy and is the electron electron interaction energy. The last one can be further expressed as (2 64)

PAGE 33

33 where is the classical Coulomb interaction between electrons and is the non classical contribution to the electron electron interactions. Therefore it is possible to estimate the many electron system energy by minimizing as a functional of electron density with the variational principle, wh ich is demonstrated in the second Hohenberg Kohn theorem 12 Theorem 2. ( Second Hohenberg Kohn theorem ) For any trial density that is subjected to and in a given external potential the total energy of the system is the upper bound of the true ground state energy In other words, the ground state energy can be calculated variationally and defined by the ground state electron density (2 65) At this stage the major issue is that we do not know the explicit form of and otherwise we can easily obtain the ground state energy of a many electron system using equations (2 62), (2 63) and (2 64) in a gi ven external potential To solve this problem, in 1965 Kohn and Sham invented a simple yet innovative approach to describe and by introducing a non interacting system that is able to produce the same electron density as the interacting system. The detail of their idea is stated in the Kohn Sham theorem. Theorem 3. ( Kohn Sham theorem ) Given any many electron system, there exists a mapping from the external potential of the interacting system towa rd another potential for the non interacting system 13

PAGE 34

34 Therefore, using to represent the orbital of the non interacting system, we can define its kinetic energy and electron density (normalized to N ) based on the Kohn Sham theorem (2 66) (2 67) Obviously is not equal to the real kinetic energy of the interacting system and this problem was circumvented by introducing a separated form of (2 68) (2 69) is the so called exchange correlation energy and is often treated as a e it includes everything unknown in DFT. After substituting equation (2 66), (2 68) and (2 69) into (2 62), we have the expression of total energy as following (2 70)

PAGE 35

35 Taking derivatives o f equation (2 70) of orbital and including the orthonormal constraint of each orbital with undetermined Lagrange multiplier, the results are the Kohn Sham equations (2 71) (2 72) These equations are also called self consistent field Kohn Sham equations since it can be solved iteratively using LCAO as an ansatz. Note that the Kohn Sham orbital is not the molecular orbital of the system and it does not have physical signific ance except the highest occupied Kohn Sham orbital is the negative of the ionization energy of the system. From equation (2 72) we can see that the explicit form of the exchange correlation potential, is still unknown, which is defined as th e functional derivative of with respect to (2 73) Therefore the art of DFT is to find out the best approximation of exchange correlation potential Currently about 200 approximations of this potential were published and they are categorized according to their chemistry accuracies and computational density functional approximations 14 The lowest rung of this lad der, Local Density

PAGE 36

36 Approximation (LDA), is the earliest and simplest approximation of exchange correlation potential It proposes that the many electron system is a homogeneous electron gas thus the exchange correlation energy can be written in the following form (2 74) where is the exchange correlation energy density functional with respect to the electron density of an uniform electron gas. LDA can offer accurate enough energy calculation in solid state physics since nearly all metal systems can be reasonably estim ated by uniform electron gas. Yet in the chemistry world, especially for drug like molecules, the homogeneity premise of electron density might not be well established. This prompted the introduction of the gradient of the charge density into the exchange correlation potential functional approximation and gave rise to the generalized gradient approximation (GGA) (2 75) Here the exchange correlation energy is a functional of both the electron density and its gradient. The next logical step beyond GGA is the use of not only the and its first gradient, but t o implement the second gradient of electron density along with the spin kinetic energy density and yield meta GGA approach (2 76) In this thesis we employ M06 suite of functionals to perform the conformational energy calculation of drug like molecules. M06 suite of functionals is a set of four meta hybrid GGA density functionals, M06, M06L M06 2X and M06 HF, which are constructed with

PAGE 37

37 local functionals and non local functionals. The hybrid exchange correlation energy of M06 suite of functionals is expressed as following (2 77) where is the non local Hartree Fock exchange energy, is the local DFT exchange energy, is the local DFT correlation energy and i s the percentage of HF exchange in the hybrid functional. For M06L ; M06, ; M06 2X and M06 HF, respectively 15 2.4 X ray Refinement Macromolecular X ray crystallography is a potent tool for providing valuable 3D structural information of protein ligand recognition and interaction. However in PDB database, there are many medium to low resolution protein ligand complex crystal structures with uncertainties in the ligand coordinates and ambi guities in protein ligand associations. In order to improve the quality of those crystal structures and use them in structure based drug discovery, in the past decade our research group developed several protocols of X ray refinement in both reciprocal spa ce and real space. Limited by the length of thesis, in this section we only introduce the theoretical framework underlying our reciprocal space X ray refinement methodology, which is applied for all three cases of drug like molecules in later chapters. 2.4 .1 Maximum Likelihood The ultimate goal of X ray crystallography refinement is to find out the minimum value of an energy function 16 18 (2 78)

PAGE 38

38 where is a function based on the parameters of an atomic model, is the geometric energy function of all atomic positions and is the X ray target f unction related to the difference between the observed and calculated X ray lattice data. is the weighting fa ctor that balance the impact of and on the total energy. The optimal weighting factor can be computed by (2 79) after an unrestrained ( = 0 in equation (2 78)) molecular dynamics simulation 19 Note that is a pseudo energy term without un it hence the unit of weighting factor should be Kcal/mol. The conventional form of is defined as the sum of squared residuals between the observed ( ) and calculated ( ) structure factor amplitudes (2 80 ) where are the indices of the reciprocal lattice points of the crystal and is a relative scale factor. Yet this least squares refinement methodology was based on an assumption that the deviation betwe en and obeys standard normal distribution and is independent of the parameters of the atomic model, which is not true especially considering the poor observation to parameter ratio in macromolecular crystallography experiment. Hence i n 1996 Pannu and Read developed a maximum likelihood analysis for protein structure determination 19 They proposed that the joint distribution (likelihood function ) of the entire X ray diffraction observations is the product of the conditional probability of each independent observation ( ) on the calculated amplitude ( )

PAGE 39

39 (2 81 ) For computing convenience, the li kelihood function was logarithm transformed and multiplied by 1 (2 82 ) Then they introduced an assumption that the distribution of the measurement error (difference between true structure factor amplitude and observed structure factor amplitude ) is a normal distribution with standard deviation Therefore the conditional probability of can be expressed as the convolution of the conditional probability of on the calculated amplitu de ( ) and the probability of the measurement error (2 83 ) Pannu and Read selected a normal distribution centered on to describe and for centric structure fa ctor it was defined as (2 84 ) (2 85 ) where is the expected intensity factor, and are distribution parameters for and derived by Wilson in 1949 with central limit theorem, respectively. Therefore substituting equation (2 83) into (2 82) and computing the first central moment of distribution we obtain the expected value of

PAGE 40

40 (2 86 ) where is the Kummer (confluent hypergeometric) function (2 87 ) Note that can not be directly calculated from experimental observation and must be estimated through and Kummer function. The second central moment of distribution is (2 88 ) Using these two c entral moments to construct a Gaussian approximation of and putting it into negat ive log likelihood function (2 81 ), we have (2 89 ) Assuming is almost a constant within each refinement cycle, the above equation can be simplified to (2 9 0 ) Even though this equation is derived on the basis of the centric case, it is proper also for the acentric structure factor and will be used as the form of in our QM/MM x ray refinement protocol. 2.4.2 Cross Validation For many years the quality of the crystallographic model was evaluated by t he R factor, which is defined in the following equation

PAGE 41

41 (2 91 ) In principle the R factor is expected to decrease during the refinement process, yet it can be made arbitrarily small by overfitting the X ray experimental diffraction data. To overcome this problem, Br nger in 1992 proposed a cross validation scheme to access the accuracy of refined atomic model 20 He suggested splitting the diffraction data into two groups: a working set (including 90% of the experimental observations) and a complementary test set (includ ing the rest 10%). The macromolecular structure determination program thus only uses the working set to carry out the normal crystallographic refinement procedure and computes the R factor with the test set plus the estimated parameters from the atomic mod el built on the working set. In other words, the form of maximum likelihood target function ought to be rewritten as (2 92 ) And the cross validated R factor (often it is called R free factor or R free value) is defined (2 93 ) In this thesis both R factor and R free factor are employed as an indicator o f drug target complex structure quality. 2.4.3 Energy minimization As we discussed at the beginning of this section, the goal of X ray refinement was not merely to search for the best agreement between observed and calculated structure factor amplitude, b ut to minimize the term with the chemical restrain The classical used in macromolecular X ray refinement programs like the

PAGE 42

42 Crystallography and NMR System (CNS) program is a sum of the terms describing various types of interactions (2 94 ) where the terms in order of appearance repr esent the contributions to the total energy from bond lengths, bond angles, dihedral torsion angles, chiral centers, planarity of aromatic rings and van der Waals forces. The empirical parameters of stereo chemical information were derived from an analysis on fragments of proteins and polynucleotides in Cambridge Structural Database (CSD) by Engh and Huber 21 One issue of this form of is that we have insufficient information to afford a suitable parameter set for various drug like molecules. As a result the refined structure of small compounds found in protein ligand complex es are often less reliable than their surrounding a mino acids. Hence in our study the is simply replaced with a quantum mechanics (QM) / molecular mechanics (MM) hybrid energy. We can use Hartree Fock, MP2 or DFT level of theories as the QM Hamiltonian and the details of QM energy calculation w ere already elucidated in section 2.1 and 2.2. For MM energy, we used a similar formulism as equation (2 93) yet implemented the electrostatic interactions and removed the chiral centers and planarity of aromatic rings terms

PAGE 43

43 (2 95 ) Finally the QM/MM interactions are evaluated from the QM electrostatic potentia l and the MM atomic charge.

PAGE 44

44 CHAPTER 3 RETINOIC ACID 1 3.1 Background All trans retinoic acid (ATRA) and 9 cis retinoic acid ( 9 cis RA) are derivatives of Vitamin A. They are involved in many crucial biological processes by modulating the communication be tween their target retinoic acid receptors (RARs) and retinoid X receptors (RXRs), in the intracellular environment. In the ligand free (apo) state, RAR and RXR are prone to form a RAR RXR heterodimer complex that allows dual ligand input. When ATRA binds to RAR as an agonist, it alters the RAR conformation via an allosteric effect and removes the co repressor (CoR) bound on the RAR surface. Then in the presence of 9 cis RA a co activator (CoA) interaction area is exposed at the RXR subunit that facilitates the recruitment of CoA to the RAR RXR heterodimer cooperatively. As a result one or more transcription/epigenetic machineries are activated and bind to the target gene promoter regions and the RAR RXR heterodimer itself is also able to bind to cognate cis acting DNA regulatory elements (termed RA response eleme nts) via its DNA binding domain 22 As members of the nuclear receptor (NR) family, RAR and RXR mediate many signaling pathways that control cellular proliferation differentiation and the early stages of carcinogenesis. Hence, both ATRA and 9 cis RA have served as drug molecules in cancer therapy and prevention, predating the determination of the RAR and RXR crystal structures 23 Clinical trials have confirmed that ATRA can induce com plete remission and overall survival in most acute promye locytic leukemia (APL) patients 24 26 In APL 1 Reprinted with permission from Fu, Z.; Li, X.; Merz, K. M., Jr. J Chem Theory Comput 2012, 8, 1436 Copyright 2012 American Chemical Society.

PAGE 45

45 ces the PML fusion protein 27 This hybrid protein not only tightly binds and recruits the nucleosome remodeling and deacetylase (NuRD) enzyme that facilitates polycomb repressive complex 2 (PRC2) binding t o block retinoic acid signalin g 28 but changes both the structures and intracellular localizations of RXR and other nuclear antigens as well 27 The chaotic and abnormal nuclear organization in APL cells ultimately leads to a few specific chromosomal translocations and triggers acute promyelocytic leukemia by arresting myelopoiesis at the promyelocyte stage. Nevertheless, in vitro RA treatment PML with RXR and other essential nuclear proteins that in turn eliminates the chromosomal translocations 27 Meanwhile a therapeutic dose of ATRA restores the normal expression level of tumor suppressor genes for APL patients via transcriptional co repressor dissociation from the promoter 2 28 In addition, numerous animal experiments demonstrate that ATRA and 9 cis R A are efficacious in abrogating tumor pro gression in models of breast 29 31 hepatic 32 and pancreatic carcinogenesis 33 Although bo th ATRA and 9 cis RA are potent anticancer agents, their side effects limit the widespread use of these drugs in human cancer treatment, wh ich includes teratogenicity 33 35 sever e headaches and hepatotoxicity 36 Therefore a comprehensive conformational analysis of retinoic acid will enable the rational design of more selective retinoids (ATRA and its analogs) and rexinoids ( 9 cis RA and its analogs) in order to attenuate adv erse side effects. To the best of our knowledge, this analysis has not been performed to date at a high level of sophistication. Beppu and Katitani used the CNDO/2 method to explore the adiabatic p otential surface (APS) of ATRA 37 According to their

PAGE 46

46 C6 C7 C8) is around 170 with CNDO/2 geometry opti mization (for ATRA atom numbering please see Figure 3 2A). Furthermore, they calculated the APS of several ATRA cyclohexenyl ring, with particular attention given to puckered forms. Six years late r Aalten and co workers generated an exhaustive ensemble of ATRA conformers to calculate their relative conformational energies via MD simulation. They noted that ATRA bound to cRBP (cellular retinol binding protein, type I) is highly strained re lative to free ATRA conformers 38 Although these studies are helpful in elucidating the conformational space of retinoic acid, they neither considered 9 cis RA nor did they analy ze the conformational profiles of retinoic acids bound to the RAR RXR heterodimer. Hence, we carried out a thorough conformational analysis of both ATRA as well as 9 cis RA in their free and bound states with highly accurate MP2/CBS calculations. QM/MM X r ay refinement was also performed on RA within the RAR/RXR active site. For more detail regarding the QM/MM refined RA protein complexes ple ase refer to Li and co workers 39 3 .2 Computational Approaches 3 .2.1 Generating Retinoic Acid Conformati on E nsembles Omega is a software package developed by Openeye Scientific Software 40 for yielding multi conformation data base. It systematically generates conformers using a divide and conquer scheme with an internal fragments library as well as a knowledge based list of torsion angles. Former study revealed that Omega can effectively reproduce drug bioactive conformations 41 and in this thesis two sets of retinoic acid geometries (one for all trans and ano ther for 9 cis ) were generated with Omega version

PAGE 47

47 2.4.3. rather than the default MM FF94s_NoEstat (without Coulomb terms) force field since in our study the deprotonated form of r etinoic acid was chosen for all calculations. The maximum allowed internal energy gap from the lowest energy conformer was set to 200 Kcal/mol Omega does not take duplicate confor mers into account and output all generated conformers. Consequently 948 ATRA conformers and 752 9 cis conformers were generated by OMEGA. Further geometry optimization was done at the HF/6 31G* level using the Omega generated geometries as input. The HF/6 31G* level of theory was chosen for optimization to match the model chemistry and basis set used in our QM/MM X ray refinement efforts. All ab initio geometry optimizations were done using the Gaussian 09 suite of programs 42 3 .2.2 Single Point Energy C alculation Single point calculations were carried out at the HF/6 31G*, M06/6 311+G**, M06/aug cc pVDZ, M06 2X /6 311+G**, M06 2X /aug cc pVDZ and M06 2X /aug cc pVTZ levels with Gaussian09 42 The MP2/CBS method was applied to comp are ab initio and DFT single point energy calculations. The CBS limit was determined by extrapolating MP2/aug cc pVXZ for X=D and T, for each retinoic acid conformer using a two parameter exponential function. For more details regarding the extrapolation s cheme, please see Chapter 2. 3 .2.3 Alignment and Clustering of Retinoic Acid C onformers ROCS (Rapid overlay of chemical structures) is a software package designed to search large 3D structure databases using a shape based superimposition method. It offers several metrics to measure the similarity between objects and herein we used the

PAGE 48

48 ShapeTanimoto score to estimate the 3D structure similarity between two retinoic ac id conformers. The ShapeTanimoto score is defined as (3 1) Where and are self volume overlaps of each conformer, respectively, and is the overlap volume between the two conformations a and b. The value of the ShapeTanimoto score ranges from zero to one, w here one indicates the two molecules are identical and zero means their volume (shape) do not overlap. The alignment of different retinoic acid conformers was accomplished using ROCS version 3.1.0 40 and all parameters were set to their default values. Moreover, t he ShapeTanimoto scores of the HF/6 31G* optimized Omega generated conformers were clustered with the centroid li nkage method. Uncentered correlation was chosen as the similarity metric. Cluster 3.0 (using C clustering library version 1.5) was employed for the clustering calculation 43 and the clustering results were plotted by using Java Treeview version 1.1.5r2 44 3 .2.4 QM/MM Refinement of Protein RA X ray Crystal S tructures The deposited PDB coord inates of 3C W K 45 1CBS 46 2VE3 47 2LBD 48 ( 1XDK 49 or the QM/MM refinement procedure. In the QM/MM X ray crystal structure refinement our goal is to find out the minimum of the energy function 50 61 : (3 2 ) Where E is a pseudo energy function where E QM/MM is the energy restraint obtained from the QM/MM calculation and E X ray is a pseudo energy penalty function that represents the difference between the observed and calculated X ray structure factors. w a is the weighting factor that balance the relative importance of E QM/MM and E X ray on

PAGE 49

49 the total energy. Since w a is an arbitrary parameter, the only objective standard for choosing its value is to use the one that can minimize the R free score. In these calculations the QM region covers the retinoic acid and residues and/or water molecules cytochrome P450 CYP120A1 (PDB entry 2VE3), the heme molecules were excluded from the QM region. The HF/6 31G* level of theory was selected as the QM Hamiltonian for the QM region. The SANDER program in Amber 10was the interface that merges the X ray gradients with the gradients from the QM/MM calculation using Amber 10 62 plus our in house ab initio code QUICK. The E X ray value was obtained from the CNS (Crystallography and NMR System) package 63 3 .2.5 Non Parametric Statistical I nference In order to quantitatively compare the accuracy of DFT with MP2/CBS method in terms of estimated RA conformational energies, we regarded th e RA low energy conformers as blocks and these quantum chemistry methods as different treatments and carried out distribution free two side all treatments multiple comparisons based on Friedman rank sums (Wilcoxon Nemenyi McDonald Thompson Test ) We assume d the effects of different RA conformations (blo cks) and computational methods (treatments) are additive, i.e., (3 3 ) ij i represents the effect of i th block (RA j represents the effec ts of the j th treatment (level of theory with given basis

PAGE 50

50 set), e ijt denote some random error terms that are independent and have same continuous distribution with median 0.c ij = 1 for all i and j (no replication within block). Therefore the h ypotheses of interest are H 0 u v u v for all pairs of We first calculated the sum of within block ranks for each treatment, R 1 R 2 k 0 and u v if | R u R v | u v calculated the corresponding r using R package 2.14.1 64 3.3 Results and Discussion 3.3.1 Conformational Preferences of All trans and 9 cis RAs in Gas P hase Geometry optimizations at the HF/6 31G* level of theory were initiated from 948 OMEGA generated ATRA structures (described in method sec tion), which resulted in the identification of 66 unique low energy conformers (Figure 3 1A). Similarly, we generated the 9 cis RA OMEGA geometries and optimized them at the HF/6 31G* level and identified 48 unique low energy structures (Figure 3 1B). To e xplore RA conformations in the context of their target binding sites, we selected six different retinoic acid conformers (five ATRA and one 9 cis RA) from five PDB entries (see Computational Approaches section) and performed single point energy calculation s, as well as geometry optimizations, at the HF/6 31G* level of theory in the gas phase. Note that along the X axis the RA protein complexes are listed in descending order of their resolution (from high resolution to low resolution) MP2/CBS calculations w ere used as point energies of the PDB deposited retinoic acid conformers and their respective optimized structures. Figure 3 1C shows that the maximum conformational energy interval (using the 1CBS A200 ATRA conformer as reference) is more than 120 Kcal/mol, yet after ab initi o QM optimization, only three low

PAGE 51

51 energy structures were identified (two ATRA and one 9 cis RA, Figure 3 1D). This result reveals that there are significant coordinate errors in bound retinoic acid conformers giving rise to large ligand strain energies. Al l trans retinoic acid can adopt different C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 O1) on its polyene tail (Figure 3 2A). Figure 3 1. Relative energy of Omega en sembles optimized by HF/6 31G* for A) all trans B) 9 cis conformers, C) PDB deposited RA conformers and D) their optimized geometries using HF/6 31G*

PAGE 52

52 C h Ci Cj) to stand for the torsion angle between the g h i and h i j plane. The positive sign means clockwise rotation of the h i bond looking from atom h to atom i, whereas the minus sign indicates a counterclockwise rotation. Figure 3 2B shows that the 66 C6 C7 63 or +63. This torsion angle reflects the orientation of the ionone ring relative to the polyene chain and its X ray crystallography experimental value 65 is +59. Un like the C6 C7 bond, the rotation of the C8 C9, C10 C11 and C12 C13 bonds impart C8 C9 C10 C11 C12 C13 180. Fig ure 3 2B through 3 discretely distributed yet in Figure 3 2F we observe substantial scattering from 180 to +180 with irregular relative energies. This indicates that the C14 C15 bond has more rotational freedom than its counterparts in the polyene tail, which was noted by Klucik and co workers from a different perspective 66 They developed a novel methodology (TACS) to search the ligand bound conformation and employed retinoic acid as a probe to evaluate its predictive power. Their result shows that the C13 C14 C15 O1 torsion makes the maximum contribution to the RMSD (around 16) when compared with the putative bioactive conformations in RA protein complexes. Another interesting point is C14 C15 O1) torsion angle at the global minimum is not equal to 0 or 180. on the potential energy surface (relative energy less than 0.7 Kcal/mol see group one in F igure 3 3 One explanation of this observation is the steric hindrance of the C20 methyl group.

PAGE 53

53 Figure 3 2. A) Structure of ATRA B) F) Relative energies of 66 ATRA low energy conformers as a function of different dihedral angles (in degrees) along the polyene chain done at MP2/aug cc pVDZ level in the gas phase.

PAGE 54

54 Moreover, carbon is more electronegative than hydrogen; hence C20 might have negative charge (Mulliken atomic c harge = 0.8 at the ATRA global minimum). Note that we use the anionic form of retinoic acid in all calculations, so there might be an electrostatic repulsion between the C20 atom and the carboxyl group, which causes the preferred C14 C15 orientation where the oxygen atom staggers the C 20 methyl group. C14 C15 O1) torsion angle that is not equal to zero or 180 degrees. Figure 3 3. Schematic representation of the eight groups of 66 ATRA low energy conformers. Groups are arranged in the order of increasing average relative energy obtained via MP2/aug cc pVDZ calculations in the gas phase. extreme variation), the 66 ATRA low energy con

PAGE 55

55 energetically distinct conformational groups. Figure 3 3 shows the typical backbone structure of each group with their corresponding av erage relative energy (reference is the Omega/ ab initio QM method identified global minimum). The standard deviation within each group is presumably ascribed to the rotation of C14 C15 bond (via the 3 we can also infer that the relative energy ordering of different ATRA conformer groups depends on the adoption of the torsion angles associated with the polyene tail. There are several cases to consider, first when the four double bonds (C7=C8, C9=C10, C11=C12 and C13=C14) are coplanar, which energies and are near global minimum on the conformational energy surface (see group one in Figure 3 3 ). C5=C6 double bond is not counted in this ener gy structure 3 ), the relative en ergies increase by 3 ~ 4 Kcal/mol Another set has two pairs of conjugated double bonds and within each pair the double bonds are coplanar yet the entire set of four double bonds do not lie in the r 180 degrees: see group four in Figure 3 3 ). In this scenario the energy gap between the ATRA low energy conformers and the most stable conformation of the set is around 4 ~ 5 Kcal/mol Next, 180, see group five, six and seven in Figure 3 3 ), we obtain A TRA structures that are about 7~ 8 Kcal/mol higher in energy than the lowest energy conformer we identified. The final situation has none of the four conjugated double b onds in the same plane

PAGE 56

56 energies (see group eight in Figure 3 3 ). In addition to subdividing the 66 ATRA low energy conformers in terms of their polyene chain torsion angles, 3D similarity comparisons by ROCS between each pair were performed to offer another criterion for categorizing these structures. The 4,356 Tanimoto scores were clustered into four smaller sets by virtue of the centroid linkage algorithm with estimating the uncentered correlation as the similarity metric. This clustering result is visualized in Figure 3 4 The deeper the red color is, the larger the Tanimoto score is (higher 3D similarity between two conformers). The trees on the margins of the heatmap repre sent the multilevel hierarchy, where clusters at one level are grouped as clusters at the next level. Even though our clustering method is not one size fits all and there are biases on the clusters it constructs, we can still observe that most of the ATRA low energy conformers in groups one, two and three fall into Cluster II, and those in group four into Cluster I. Clusters III and IV mainly cover the structures in groups five, six, seven and eight. This clu stering result demonstrates our conformational an the conformational preference of free ATRA in the gas phase.

PAGE 57

57 Figure 3 4. Clustering Tanimoto scores computed from the shape comparison among the 66 ATRA low energy conformers.

PAGE 58

58 Figure 3 5. A) Structure 9 cis RA B) F) Relative energies of 48 9 cis RA low energy conformers as a function of different dihedral angles (in degrees) along the polyene chain done at MP2/aug cc pVDZ level in the gas phase.

PAGE 59

59 We also carried out a similar conforma tional analysis procedure on 9 cis RA and the results are presented in Figure 3 5 and Figure 3 C6 C7 C12 C13 C14) from 9 cis RA low energy conformers are 38 or C8 C9 C10 C11 C12) are C8 C9 C10) it prefers C10 C11 C12) adopts v alues of 50 or 180 (Figure 3 5 C,D and E). As expected, the rotational freedom of the C14 C15 sigma bond is much is close to, but not equal to zero degrees (Figure 3 5 F). Figur e 3 6 Eight groups of energy preferred conformers characterized for 9 cis RA arranged in the order of increasing average relative conformational energy.

PAGE 60

60 Figure 3 7. Clustering Tanimoto scores computed from the shape comparison among the 48 9 cis RA low e nergy conformers.

PAGE 61

61 Similarly we can classify the 48 9 cis RA minimum energy conformations into 9 cis RA conformations with all conjugated double bonds (excluding C5=C6) in the same plane have the lowest relative energies and include the global minimu m (group one in Figure 3 6). The less favorable orientation of the 9 cis RA polyene tail has three coplanar double bonds and has an energy penalty of 3.5 to 3.8 Kcal/mol (groups two and three in Figure 3 6). 9 cis k from the global minimum by nearly four Kcal/mol (group four in Figure 3 6 ). Conformations in groups four to seven have relatively similar energy structure relationships to those found for ATRA and are destabilized b y 6.9 to 9.4 Kcal/mol Lastly geometries in group eight exhibit an average energy value more than 10 Kcal/mol above the global minimum, which contributes to their highly distorted backbone structure. Figure 3 7 displays the clustering of the Tanimoto scor es among the 48 9 cis RA minimum energy conformations. Cluster I includes the entire four members of group one and Cluster II includes the majority of group two conformers. Geometries belonging to group three or four are gathered into Cluster III and the r emaining 9 cis RA low energy conformations are found in Cluster IV and V. 3.3.2 Estimating RA Conformational E nergies with M06 and M06 2X F unctionals Our earlier work showed that the M06 class of density functionals gave relative energies that reliably mat c hed MP2/CBS calculations for ibuprofen done at HF/6 31G* optimized geometries 50 To further examine the performance of the M06 functional class, we performed a series of M06 and M06 2X single poi nt energy calculations on the global and local minima of ATRA and 9 cis RA. Since we are not particularly interested

PAGE 62

62 in highly distorted retinoic acid structures, only conformers placed into groups one, two, three and four were selected as the starting geo metries for both ATRA and 9 cis RA, respectively (see Figure 3 4 and 3 6 for ba ckbone conformations). Figure 3 8 Relative energies computed using A) M06 and MP2 for ATRA; B) M06 2X and MP2 for ATRA; C) M06 and MP2 for 9 cis RA; D) M06 2X and MP2 for 9 cis RA with different basis sets in the gas phase.

PAGE 63

63 Figure 3 8A reveals the relative energy values calculated for ATRA low energy conformers at functional, the relative energy provided different levels of theory: HF/6 31+G(d), M06/6 311+G(d,p), M06/aug cc pVDZ, MP2/6 311+G(d,p), MP2/aug cc pVDZ, MP2/aug cc pVTZ and MP2/CBS. The name of RA conform ers (REA_01 to REA_16) in X ax is are labeled in the order of their increasing MP2/CBS energies. HF/6 31+G(d) underestimated the relative energies of the conformer s in ATRA groups two, three and four (see REA05 to REA16 In Figure 3 8A) by about two Kcal/mol, notwithstanding its performance around the global minimum was acceptable (REA01 to REA04, Figure 3 8A). For single point calculations with the M06 by the 6 311+ G(d,p) basis set overestimated the MP2/CBS results by about 0.6 Kcal/mol M06/aug cc pVDZ performs better than M06/6 311+G(d,p) yet its relative energies are at least 0.3 Kcal/mol higher than the MP2/CBS values (Figure 3 8 A). Figure 3 8 C shows a similar ou tcome for the M06 functional for the 9 cis RA minimum energy conformers. Note that both Figure 6A and 6C use REA01 to REA16 along the X axis labels yet they represent entirely different sets of ATRA and 9 cis RA conformations. Interestingly, comparison of the M06 2X and MP2 methods combined with the 6 311+G(d,p), aug cc pVDZ and aug cc pVTZ basis sets (see Figure 3 8 B) shows excellent agreement. All three M06 2X relative energy curves essentially coincide with the MP2/CBS results (Figure 3 8 B). The same tr end is repeated for the sixteen 9 cis RA low energy conformations (see Figure 3 8 D). The M06/aug cc pVTZ calculation data was not shown due to SCF convergence issues (even we set the SCF convergence criterion to 10 4 au the energies still cannot converge)

PAGE 64

64 Based on the excellent match between the M06 2X and MP2/CBS curves, we were also interested in quantitatively determining whether any of the DFT methods employed ( M06 2X /6 311+G(d,p), M06 2X /aug cc pVDZ and M06 2X /aug cc pVTZ) differed from MP2/CBS in terms of the calculated RA conformational energies. We addressed this question by using Wilcoxon Nemenyi McDonald Thompson test, since the distribution of RA conformatio nal energies was unknown (see section 3.2.5 for more details). From the Appendix A we can conclude that M06 2X/aug cc pVDZ, M06 2X /aug cc pVTZ and MP2/CBS are equivalent with respect to the calculated conformational energies for both ATRA and 9 cis RA (failure to reject the null hypothesis at p value > 0.05 for each pair comparison ). Whereas we cannot find strong statistical evidence to claim that M06 2X /6 311+G(d,p) is not different from MP2/CBS for either ATRA or 9 cis RA (reject the null hypothesis in favor of alternative hypothes i s at p value < 0.05). Overall the single point energ y results indicate that the M06 2X functional is suitable for predicting the relative conformational energies of retinoic acid in the gas phase when larger basis sets are utilized. 3.3.3 Conformational Preferences of Protein Bound A ll trans and 9 cis RAs Having probed the conformational preferences of ATRA and 9 cis RA in the free state, it is illustrative to compare and contrast with the conformations of RA in target protein binding sites. Previous studies in our lab and others have already demonstrated t hat a QM/MM X ray refinement approach is capable of enhancing the quality of X ray structures of metalloproteins and receptors with bound drug like molecules 52,54,55,57,59 61,67 69 Therefore we selected four ATRA and one 9 cis RA protein complex crystal structures from the PDB database and carried out hybrid QM/MM X ray refinement on each of them. This study is an extension of our recent work on ATRA QM/MM

PAGE 65

65 refinement, but herein we largely focus on the comparison between the refined bioactive conformers and the global minimum of free retinoic acid. Figure 3 9 Overlay of the PDB deposited RA conformers (purple) and corresponding QM/MM refined structures (orange). A) 3CWK A300; B) 1CBS A200; C) 2LBD A500; D) 2VE3 A1445; E) 2VE3 B1445; F) 1XDK A500.

PAGE 66

66 Figure 3 9 shows the overlays of the initial PDB structures of ATRA and 9 cis RA with the corresponding refined structures. It is clear that after QM/MM X ray refinement every bound retinoic acid conformer experienced changes in their spatial orientation and position. Table 3 1 lists the backbone 1 5 torsion angles (absolute values) of the six selected bound retinoic acid conformations as well as their QM/MM refined structures. In addition to this analysis, we al so measured the dihedral angles between two planes whose line of intersection contains a methyl substituted double bond ( 1 (C16 C5 C6 C7), 2 (C19 C9 C10 C11) and 3 (C20 C13 C14 C1 5), see Table 3 1). From Figure 3 9 A and Table 3 1 we can see that C19 mo ves in the refined 3CWK ATRA conformer, accompanied by a torsion angle 2 (C19 C9 C10 C11) change from 62 to 2 (the free value found above should be either 0 or 180 degree). When it com e s to 1CBS no significant changes took place in the polyene backbone and the most remarkable displacement caused by the QM/MM refinement involved in the rotation of the C6 C7 single bond, 1 (C5 C6 C7 C8), which adjusts from 33 to 42 which is much closer to the favorable value for this to 9 B and Table 3 1). Similarly the 2LBD and its refined structure are quite similar with the 1 (C5 C6 C7 C8) angle experiencing the largest change from 45 to 48 (Figure 3 9 C and Table 3 1).

PAGE 67

67 Table 3 1 Backbone torsion angles (in degrees) with respect to the PDB deposited RA conformers and their QM/MM re refined geometries. 3CWK A300 1CBS A200 2LBD A500 2VE3 A1445 2VE3 B1445 1XDK A500 PDB QM/MM PDB QM/MM PDB QM/MM PDB QM/MM PDB QM/MM PDB QM/MM | 1 |(C5 C6 C7 C8) 48.7 49. 9 32.5 41.7 45.1 48.1 68.1 68.5 77.6 64.3 58.3 45.4 | 2 |(C7 C8 C9 C10) 174.1 172.0 178.5 169.8 179.8 179.7 163.3 178.7 159.6 178.3 179.3 160.2 | 3 |(C9 C10 C11 C12) 179.4 178.6 178.7 176.1 179.4 178.8 155.4 172.4 161.2 179.3 175.5 174.2 | 4 |(C11 C12 C13 C14) 171.4 173.7 179.8 178.9 176.8 177.2 166.9 160.8 174.3 150.8 176.5 169.2 | 5 |(C13 C14 C15 O1) 27.2 24.8 129.1 49.0 3.5 8.7 134.1 151.5 3.8 126.1 141.1 142.4 | 1 |(C16 C5 C6 C7) 1.8 0.1 6.2 3.2 2.2 0.0 1.4 3.7 0.3 2.6 0.2 2.0 | 2 |(C19 C9 C10 C11) 61.5 2.4 3.0 3.3 1.4 7.0 20.1 2.1 13.6 6.1 179.9 173.2 | 3 |(C20 C13 C14 C15) 4.9 0.2 0.2 1.7 1.3 13.6 27.7 5.7 28.3 11.3 0.5 8.5

PAGE 68

68 Former studies in our lab have already demonstrated that a QM/MM X ray refinement approach was a potent tool in fixing the internal coordinate errors of drug like molecules bound to their receptors, which leads to an immense red uction of ligand conformational strain energy 50 In other words, strain energy itself could be employed as a criterion to quantitatively evaluate the effect of our QM/MM X ray refinement method Therefore, we extracted the coordinates of the retinoic acid conformers from each selected PDB RA protein complex, as well as the refined structure, and computed the global and local strain energy in both the gas phase and PCM solvent model. Here we defi ne the global strain energy as the energy difference between the crystallographic conformation and the lowest energy conformation of the free ligand, whereas local strain energy of a bound ligand represents the energy difference between the experimentally bound conformation and the nearest local minimum on conformational energy surface obtained by unconstrained geometry optimization. Figure 3 10 Exploring the conformational spaces of A) ATRA and B) 9 cis RA with Omega plus HF/6 31G* in the PCM solvent mo del.

PAGE 69

69 Table 3 2. Calculated strain energies using the MP2/CBS method for retinoic acid conformers deposited in PDB and their corresponding QM/MM re refined structures. 3CWK A300 1CBS A200 2LBD A500 2VE3 A1445 2VE3 B1445 1XDK A500 PDB QM/ MM PDB QM/MM PDB QM/MM PDB QM/MM PDB QM/MM PDB QM/MM GAS Local Strain 65.1 1.9 22.4 3.0 30.4 7.0 115.0 6.7 101.3 7.0 40.0 7.1 Global Strain 65.6 2.4 22.9 3.5 30.4 7.0 115.0 6.7 101.3 7.0 40.5 7.6 PCM Local Strain 64.2 2.1 19.5 2.1 27.7 5.6 112.2 6.2 99.6 7.3 37.1 5.1 Global Strain 64.5 2.4 19.8 2.3 27.8 5.6 112.3 6.3 99.7 7.4 37.5 5.4

PAGE 70

70 A similar strategy was used to look for the ATAR and 9 ci s RA global minimum in aqueous environment, which means we carried out the HF/6 31G(d) geometry optimization on all Omega identified low energy conformers with the PCM solvent model and the results are shown in Figure 3 10 Table 3 2 gives the collected da ta for the strain energy calculations in both the gas phase and PCM solvent model using the MP2/CBS method. In the case of 3CWK, the optimization of the entire geometry via QM/MM re refinement causing a 97 % reduction in the local strain energy as compared to the PDB deposited RA conformer (Table 3 2). For 1CBS and 2LBD even though only a few torsion angles were moderately optimized via QM/MM refinement, the local strain dropped from 22.4 to 3.0 and 30.4 to 7.0, respectively (Table 3 2). Both the 2VE3 A1445 and B1445 ATRA PDB deposited conformations had distorted polyene chains and after QM/MM X ray refinement an apparent change of the backbone was seen in each structure (Figure 3 9D and 3 9 re 163 and 155 for A1445 and 159 and 161 for B1445, while after QM/MM refinement they are 178 and 172 for A1445 and 178 and 179 for B1445 (see Table 3 1) which are in better accord with the low energy conformers observed in the gas and PCM solution C9 C10 C13 C14 C9 C10 C11) = 13, C13 C14 C15 ) = 28 for B1445, respectively. QM/MM refinement relaxed the C12 C13 C14) torsion angles for both the A1445 and

PAGE 71

71 B1445 PDB deposited conf ormers remained distorted even after QM/MM refinement and this is the main reason why the re refined A1445 and B1445 ATRA structures still have relatively high local strain energies (6.7 Kcal/mol for A1445 and 7.0 Kcal/mol for B1445). However in comparison with the 115.0 and 101.3 Kcal/mol conformational strain energies found in the initial PDB structures, our QM/MM refinement results for 2VE3 are good enough Finally although the 1XDK crystallographic and QM/MM refined conformation are nearly identical (Ta ble 3 1 and Figure 3 9 F), the local strain trends downward from 40.0 to 7.1 Kcal/mol (Table 3 2). Table 3 2 also reveals the role solvent plays in the calculation of strain energy. In the gas phase the local strain energy of the PDB deposited coordinates ranged from 22.4 to 115.0 Kcal/mol with a mean of 62.4 Kcal/mol while the PCM solvent model reduced the range of local strain energies from 19.5 to 112.2 Kcal/mol with an average value of 60.1 Kcal/mol The strain energies calculated for the crystallograp hically determined ligands relative to the global minimum ranged from 22.9 to 115.0 Kcal/mol in vacuo and 19.8 to 112.3 Kcal/mol in aqueous solution. For the QM/MM refined structures the local strain energy ranged from 1.9 to 7.1 Kcal/mol with an average v alue 5.4 Kcal/mol in the gas phase. Local strain calculations using a continuum solvent model ranged from 2.1 to 7.3 for refined structures with a mean value of 4.7 Kcal/mol Similarly, the PCM solvent model gave a decreased global strain energy ranging fr om 2.3 to 7.4 Kcal/mol versus 2.4 to 7.6 Kcal/mol for the refined retinoic acid conformations in the gas phase. When solvent effect is considered in strain energy calculation, t he 1XDK 9 cis RA refined structure has the most significant reduction of local strain energy in aqueous solution (7.1 5.1 = 2.0 Kcal/mol ) followed by 2LBD (7.0 5.6 = 1.4

PAGE 72

72 Kcal/mol ) then the 1CBS and 2VE3 A1445 refined conformers (3.0 2.1 = 0.9 Kcal/mol and 6.7 6.2 = 0.5 Kcal/mol respectively). However compared to the results in the gas phase, the 3CWK and 2VE3 B1445 refined conformations underwent a slight increase in local strain energy with the PCM solvent model, mainly due to the difference between the local minimum identified in the gas phase and in the PCM solvent model Note that Table 3 3 reports the real space R factor at a given weighting factor, from which we deduce that the reduction in strain energies was not caused by over fitting to quantum mechanics (the weighting factors are small (1 or less) and the real space R factors are ~0.1 or less). Table 3 3 also shows that our QM/MM refinement protocol yields to lower real space R factors than the conventional CNS method (force field only), which demonstrates the QM/MM re refined RA conformers better fit the experimental electron density data. The RMS D values in Figure 3 9 also reveals t hat strain relief arises due to small changes in the geometries between the original PDB and the re refined RA conformers. Table 3 3. Comparing real space R factors/r eal space correlation coefficients ( RSR/ RSCC) of different refinement methods at given weighting factors. 3CWK A300 1CBS A200 2LBD A500 2VE3 A1445 2VE3 B1445 1XDK A500 Weighting Factors w a = 0.3 w a = 0.4 w a = 1.0 w a = 1.0 w a = 1.0 w a = 1.0 CNS/MM 0.060/0.940 0.086/0.914 0.14 1/0.859 0.143/0.857 0.129/0.871 0.043/0.957 QM(HF/6 31G*)/MM 0.049/0.951 0.084/0.916 0.102/0.898 0.072/0.928 0.076/0.924 0.030/0.970 3.4 Conclusions The present study describes a thorough conformational analysis of retinoic acid using Omega and quantum mechanical calculations. A total of 66 and 48 energy minima have been identified and characterized by means of HF/6 31G(d) geometry optimization

PAGE 73

73 in the gas phase for ATRA and 9 cis RA, respectively. It was found that the torsio n angles associated with the pol yene chain have significant impact on the conformational energy, which in turn affects the retinoic acid backbone orientation. Based on the absolute values of these torsion angles, we divided the 66 ATRA energy low energy conformers into eight groups. T he low energy ATRA conformers, not unexpectedly, prefer the double bonds to be coplanar, which is reflected by the reduction of the relative potential energy. A similar structure energy relationship was also observed in the forty eight 9 cis RA low energy conformations. Additionally, the single point energies using the M06 and M06 2X functionals with different basis sets were calculated. The result of these calculations indicates that the M06 2X level of theory with the aug cc pVDZ or aug cc pVTZ basis set is satisfactory in estimating the conformational energy of both ATRA and 9 cis RA at the HF/6 31G(d) optimized geometries in the gas phase. Five selected ATRA conformers from four ATRA protein crystal structures found in the PDB were previously investigate d using the ab initio QM/MM X ray refinement approach and are further analyzed herein with respect to the uncomplexed gas and solution phase conformers. In addition we performed the same analysis on a freshly re refined 9 cis RA complex (PDB ID 1XDK) in this paper as well. Every re refined retinoic acid conformation preferred to arrange all conjugated double bonds of the polyene tail in the same plane with 1 (C5 C6 C7 63), which is in good agreement with the conformational analysis results of free RA using Omega plus HF/6 31G(d) geometry optimization. As a consequence the internal ligand strain energies of QM/MM refined RA structures are lower relative to those from

PAGE 74

74 the corresponding PDB dep osited conformations. In the gas phase the local strain energy computed at the MP2/CBS level of theory has an average value of 5.4 Kcal/mol among refined RA conformers versus the 62.4 Kcal/mol found for the original PDB structures. Local strain gauged rel ative to the PCM solvent model has an average value of 4.7 Kcal/mol while for the PDB conformers this mean value is 60.1 Kcal/mol (relative to MP2/CBS results). The same decrease in ligand strain after QM/MM refinement was also detected in terms of the glo bal strain energy. This again illustrates that QM/MM X ray refinement is a powerful tool in improving the intrinsic coordinate errors of bound drug like molecules, which are mainly induced by inadequate MM models used during standard X ray refinement. Nat ive retinoic acid itself was not a viable oral drug molecule due to its unsatisfactory side effects, yet its derivatives, retinoids and rexinoids such as TTAB 70,71 (CD367), TTNPB 70,72 (Ro 13 7419), TTNN 70,73,74 and LGD1069 (Bexarotene, Targretin; Ligand Pharmaceuticals), play a critical role in anti cancer treatment. The scheme for the rational design of retinoids/rexinoids focused o n narrowing the conformational flexibility by locking one or more of the conjugated double bonds of the RA backbone, which in turn will produce an energetic benefit on binding to the target protein. In order to simplify the synthesis of RA derivatives, med icinal chemists have used benzene rings to lock 1 (C5 C6 C7 C8) thereby forcing it to be 180. However according to our conformational analysis of both free and bound RA conformers, the low energy value of 1 (C5 C6 C7 C8) is around 63 implying non rigid rings might be better choices than the phenyl group. Hence our present study is not merely some benchmark calculation

PAGE 75

75 for different computational chemistry methods, but it also sheds insights on the discovery of more effective anti cancer drugs.

PAGE 76

76 CHAPTER 4 DONEPEZIL 4.1 Background D onepezil (trade name Aricept), (R S ) 1 benzyl 4 [(5,6 d imethoxy 1 indanon) 2 yl]methyl piperidine (see Figure 4 1), is a palliative anti Alzheimer drug designed based on the assumption that pharmacologically maintaining acety lcholine levels in the 75 77 .It inhibits the acetylcholinesterase (AChE) enzyme from rapidly hydrolyzing the neurotransmitter acetylcholine at cholinergic syn apses of neuronal contacts in the central nervous system, thereby increasing the concentration of acetylcholine in numerous brain regions of AD patients with a long duration of action 78 Meanwhile it has been demon strated that donepezil neither adversely affects the synthesis of acetylcholine nor inhibits the activity of choline acetyltransferase 79,80 Hence, clinical trials have shown a significant correlation between recei ving oral donepezil and the improvement in cognitive and stages 81 90 Figure 4 1. Structure diagram of donepezil with atoms number ing.

PAGE 77

77 Early ligand based research on indanone benzylpiperidines applied quantitative structure activity relationship (QSAR) methods to a series of indanone benzylpiperidines synthesized by Eisai 59,91,92 because at t hat time little was known about the structure of acetylcholinesterase (AChE). Due to Sussman and co the crystal structure of Torpedo californica AChE ( Tc AChE ) 93 and subsequen t ly the donepezil Tc AC hE complex were reported 94,95 .This allowed for the elucidation of the interaction between donepezil and AChE and facilitated post hoc rationalizations of the donepezil binding mode as well. Structurally the Tc AChE active site gorge is composed of two ligand binding sites: the acylation site located at the bottom of the gorge and the compartments were connected via a narrow bottlen eck region predominantly formed by the Phe330 and Tyr121 residues. Compared to acetylcholine, donepezil has an enhanced binding affinity with acetylcholinesterase because its longer scaffold affords the ability to span the two anionic binding sites. In add ition previous studies articulated there were three major forces stabilizing the donepezil Tc anionic binding site; (iii) a cation quaternary ammonium ion of donepezil and Phe330 at the bottleneck region 94,95 which results in strong stru ctural complementarity between donepezil and Tc AChE. On the basis of the binding pattern of donepezil towards its target enzyme, many groups have undertaken the rational design of donepezil like compounds via docking and molecular

PAGE 78

78 dynamics simulations 96 101 Nevertheless, as far as we are aware, none of them performed a thorough conformational analysis of donepezil especially in its bound state, which might play a critical role in discovering novel anti Alzheimer dr ugs with higher selectivity and reduced adverse effects. Hence, we describe below the detailed analyses of the conformational preferences of donepezil in both the free and bound state with highly accurate MP2/complete basis set (CBS) methods. Moreover, we employ a parallel QM/MM X ray refinement methodolog y 39,54,55,59 61 to the donepezil Tc AChE complex, thereby, providing deeper insights into the AChE inhibitor binding mechanism. 4.2 Computational Approaches 4.2.1 Ex ploring Donepezil Conformational Energy Surface Omega version 2.4.3 from Openeye Scientific Software 40 was employed to g enerate a conformational ensemble for donepezil. Jonas Bostro m et al. evaluated the performance of Omega with different settings of parameters in 2003 41 and according to their analysis we selected a 5 Kcal/mol cut all unique generated donepezil conformations. Here we were only interested in generating donepezil conformers around the global energy minimum, thus there was no Since our QM/MM X ray refinement protocol used the HF/6 31G* level of theory as th e Hamiltonian to optimize the donepezil bound conformations, further gas phase geometry optimizations employed the same model chemistry. The Gaussian 09 suite of programs 42 were chosen to perform all ab initio V to fully relax all local and global minimum conformations

PAGE 79

79 4.2.2 Single Point Energy Calculation To describe the details of the donepezil conformational energy surface, we performed single point calculations, in the gas phase, for all Omega/ ab i nitio QM identified low energy conformers at several different level of theories computed with Gaussian 09 (including M06L /6 3 11+G**, M06L/aug cc pVDZ, M06 2X/6 311+G**, M06 2X /aug cc pVDZ, M06 2X /aug cc pVTZ, M06/6 311+G**, M06/aug cc pVDZ and M06/aug cc pVTZ). We employed MP2 complete basis set (CBS) calculations as benchmarks and compared the MP2/CBS outcomes with several DFT methods. For a more mathematical description about the strategy of CBS extrapolation, please refer to our previous work on ibuprof en and retinoic acid 50,102 4.2.2 Parallel QM/MM X ray Crystallography R efinement In ab initio molecular orbital calculations, the most time consuming step is computing the so called ele ctron repulsion integrals (E RI) (4 1) Calculating these two electron integrals typically uses the direct self consistent field (SCF) approach to generate the Fock matrix. The F ock matrix is defined as (4 2) where is the one electron integral matrix for fixed nuclei, is the density matrix obtained from basis sets coeffici ent matrix multiplication, is the two electron coulomb integrals and is the two electron exchange integrals, respectively. Here we implement distributed data parallel algorithms 103 into our in house package QUICK with MPI (Message Passing Interface) library so as to increase the

PAGE 80

80 computational efficiency of two electron Coulomb and exchange integrals in Fock matrix. Figure 4 2 presents the distributed and parallel algori thm of executing the quadruple loop of basis shells in multi processors cluster. The outermost loop task of index i was chunked and distributed for each processor and data were transferred outside the inner nested kl loops. Index i, k and l varies from 1 t o n where n is the number of basis functions, whereas j varies from 1 to i since direct SCF method by default formed Fock matrix symmetrized and only the upper triangular Fock matrix element ought to be computed. Here we performed parallel QM/MM X ray refi nement on donepezil Tc AChE crystal structure (PDB 1EVE) with QUICK and AMBER 10 62 and used SANDER program in Amber 10 as an interface to combine gradients from Crystallography & NMR System (CNS) and QM/MM calculation. The QM region was defin itself. Note that we assigned a hydrogen ion to the N14 atom and make it become quaternary ammonium because this charged nitrogen is essential for donepezil to bind in the acylation site of Tc AChE. HF/6 31G* level of theory was selected as the QM region Hamiltonian and the weighting factor was set to 1.0. This calculation was executed on AMD Opteron 2356 2.3GHz quad core server processor with 1000 energy minimization cycles in AMBER and successfully converged within one week. For comparing our parallel QM/MM X ray refinement protocol to classical MM X ray structure determination methods, we also rebuilt the donepezil Tc AChE model with CNS 1.3 and Refmac5.5 (CCP4 program suit) by iterative cycles of simulated annealing at weighting factor = 1.0. For more detail about the CNS and Refmac parameters setting, please see Appendix C

PAGE 81

81 Figure 4 2. Pseudo code of parallel algorithm for computing 2 electron repulsive integrals in QUICK package 4.3 Results and Discussion 4.3.1 Exploring Donepezil Conformational Energy S urface We initially analyzed the donepezil conformational energy surface using Omega, which yielded 66 donepezil conformations all within 5 Kcal/mol. All of these Omega ge nerated geometries were fully relaxed at the HF/6 31G* level of theory, which is consistent with the Hamiltonian used in our choice of basis set in the QM/MM X ray

PAGE 82

82 refinement approach. This analysis resulted in 18 unique low energy donepezil conformations from the original 66 (Fi gure 4 3 A and Figure 4 4 ). Future work of vibrational analysis might be performed at these 18 optimized geometries so as to exclude the transition state. Figure 4 3. A) Identifying donepezil low energy conformers with Omega and H F/6 31G* in gas phase and their relative energies were estimated by B) M06; C) M062x; D) M06L functionals with different basis sets in vacuo

PAGE 83

83 Figure 4 4. Schematic representation of the 18 donepezil low energy conformers identified by Omega and HF/6 31G* in gas phase.

PAGE 84

84 In order to gain further insight on donepezil conformational energy surface, we carried out a series of M06, M06L and M06 2X single point energy calculations on the predicted local and global energy minimum of donepezil, since previous stud ies afforded us strong confidence in the performance of M06 class of density functionals on estimating conformational energies of drug like molecules 39,50,102,104 Note that for all DFT calculations in gas phase we treated the N14 atom as a quaternary ammonium merely for theoretical calculation and it would not exist in vacuum in the real world. Figure 4 3B shows the relative energy values calculated for the 18 donepezil low energy conformations using the M06 funct ional with different basis sets. For E20_01, E20_02, E20_04 and E20_06 the M06 functional computed energies resemble the MP2/CBS results, yet no matter which basis set was selected, they overestimated the MP2/CBS results by about 1 ~ 1.5 Kcal/mol on the re maining 14 low energy conformations. The same trend was observed in the M06L curves and their results were in marked contrast to the MP2/CBS values for the donepezil conformations with higher energies ( see E20_09 to E20_18 in Figure 4 3 C, the M06L /aug cc p VTZ calculation data was not shown because of SCF convergence issues). For single point calculations with the M06 2X functional (Fi gure 4 3 D), the DFT computed donepezil conformational energies agree quite well with the MP2/CBS result using either Pople ba sis set (6 311+G**) or Dunning basis sets (aug cc pVDZ and aug cc pVTZ). Based on the discussion above, one question naturally presents quantitative way to tell the difference between DFT and MP2/CBS methods in terms of donepezil confo certainly yes. If we treat the 18 donepezil low

PAGE 85

85 distribution free two side all treatments multiple comparisons to test our conclusion (Wilcoxon Nemenyi McDonald Thompson Test based on Friedman rank sums). All these non parametric statistical inference results were summarized and listed in Appendix B. For the 6 311+G** basis set, we have very strong evidence (te st statistic = 2 which is much less than the critical value = 18.66 at M06 2X and MP2/CBS outcomes are well matched = 0.05 level) to declare that there is no significant difference between M06 and MP2/CBS results, while we reject the null hypothesis (p value < 0.05) that the M06L /6 311+G** and MP2/CBS methods are equivalent with respect to the donepezil conformational energies For the aug cc pVDZ and aug cc pVTZ, we have reached the conclusion that only M06 2X functional is not different from the MP2/CBS method (fails to reject the null hypothesis with p value > 0.05), whereas neither M06 nor M06L with these basis sets, yield results comparable to the MP2/CBS outcome. Figure 4 5. A) Crystal structure of d onepezil in its binding pocket. Electron density is shown as a wi donepezil bound conformation.

PAGE 86

86 So in summary, in comparison with the M06L and M06 functionals, only the M06 2X level of theory provides reliable estimates of how much the conformational energy changes among different donepezil fully relaxed geometries. Hence, our suggestion would be to use this level of theory to gain insight into the conformational preferences of molecular systems similar with donepezil at a reduced computational cost relative to MP2/C BS methods. 4.3.2 Parallel QM /MM Refinement of D onepezil Tc AChE Complex Crystal S tructure Even though donepezil is a very important drug molecule in treating Alzheimer disease, only one donepezil acetylcholinesterase complex crystal structure has been depo sited in the PDB database with a resolution of 2.50 (PDB entry 1EVE, Torpedo californica acetylcholinesterase, Tc AChE). From the paper reporting this structure we can see the electron density around C28 is missing in the initial difference Fourier map 95 Therefore we reinterpreted the 2F 0 F c map wrapping t he ligand at 1.5 sigma level, but the electron density part surrounding C28 atom was still absent (Figure 4 5 A), which indicated the significant uncertainty in the predicted coordinates for C28 atom. By minimizing the PDB conformation we found that the C28 O27 C2 rotated to 75.1 from 5.3 and this slightly altered donepezil bound conformation is shown in Figure 4 5B. This conformation was use d as the initial ligand geometry for QM/MM X ray refinement. The same geometry was also re refined by conventional force field methods including CNS 1.3 63 and Refmac 5.5 105 (see Computational Approaches part for detail), as well as fitted into the 2F 0 F c electron density map using AFITT 40,106 Moreover, the donep ezil geometry determined by X ray powder diffraction was extracted from the Cambridge Structural Database (CSD) in order to compare with modeling results based on the donepezil Tc AChE X ray crystallography data.

PAGE 87

87 Figure 4 6. Superimposition of donepezil X ray structures plus re modeled bound conformations with their nearest local minima (green) and global minimum (purple) in A) gas phase and B) PCM solvent model. Figure 4 6 shows the donepezil PDB deposited conformation, three re refined X ray structures (QM/MM, CNS and Refmac), the AFITT fitting solution and the CSD

PAGE 88

88 crystal structure overlapping the optimized geometries. From this figure we can see that the position of N14 is the most obvious coordinate error existing in the original PDB conformation whe n compared with its fully relaxed geometries (in the gas phase and with the PCM model). This charged nitrogen in the piperidine ring should adopt a sp 3 hybridized (not an ideal sp 3 hybridization given that the N14 atom is connected to different groups), w hich might give rise to similar values of bond angles C13 N14 C15, C15 N14 C17 and C17 N14 C13. F rom Table 4 1 we can observe that the fully relaxed donepezil geometry (in PCM) has 1 (C13 N14 2 (C15 N14 C17) = 3 (C17 N14 C13) = 113.1, respectively. Table 4 1. Conformational features (in degrees) of six investigated conformations and their optimized geometries (numbers in bracket ) PDB AFITT CNS QM/MM REFMAC CSD Gas phase 1 (C4 C9 C7 C8) 0.1 (10.5) 0.4 (8.9) 0.4 (7.2) 0.7 (10.5) 0.7 (10.5) 2.4 (11.2) 2 (C5 C7 C9 C8) 0.7 (10.0) 0.7 (8.4) 0.6 (6.5) 1.5 (10.0) 0.6 (10.0) 2.7 (10.6) 1 (C13 N14 C15) 113.4 (111.1) 111.9 (111.1) 114.4 (108.5) 111.3 (111.1) 125.7 (111.1) 110.6 (110.9) 2 (C15 N14 C17) 100.9 (111.8) 110.5 (111.8) 100.7 (112.1) 111.4 (111.8) 112.8 (111.8) 112.7 (113.0) 3 (C17 N14 C13) 122.5 (113.0) 112.7 (113.0) 119.5 (113.9) 112.5 (113.0) 121.5 (113. 0) 109.5 (111.9) PCM 1 (C4 C9 C7 C8) 0.1 (10.6) 0.4 (10.1) 0.4 (10.8) 0.7 (10.0) 0.7 (10.6) 2.4 (10.5) 2 (C5 C7 C9 C8) 0.7 (9.9) 0.7 (9.4) 0.6 (10.1) 1.5 (9.8) 0.6 (9.9) 2.7 (10.6) 1 (C13 N14 C15) 113.4 (111.2) 111.9 (111.2) 114.4 (111.2) 111.3 (111.1) 125.7 (111.2) 110.6 (110.9) 2 (C15 N14 C17) 100.9 (111.0) 110.5 (111.0) 100.7 (111.0) 111.4 (111.8) 112.8 (111.0) 112.7 (113.0) 3 (C17 N14 C13) 122.5 (113.1) 112.7 (113.1) 119.5 (113.0) 112.5 (113.0) 121.5 (113.1 ) 109.5 (111.9)

PAGE 89

89 However, in the PDB geometry the 1 (C13 N14 2 (C15 N14 3 (C17 N14 C13) = 122.5, respectively (see Table 4 1). This 1 (C13 N14 C1 5) = 2 (C15 N14 3 (C17 N14 C13) = 112.5 (Figure 4 6 and Table 4 1). Note that AFITT also gave a good result in terms of the N14 atom position using the MMFF94 force field in its fitting process. Another important conformational determinant of donepezil is the placement of carbon atom C8 (the only chiral center) since it controls the orientation of the piperidine and benzyl moieties within the binding pocket. The position of this asymmetric carbon atom lies above or below the C7 C5 C4 C9 plane that defines the cyclopentanone 1 (C4 C9 C7 2 (C5 C7 C9 C8) that quantifies the degree to which the envelope like conformation of the cyclopentanone ring is adopted. Their val ues are summarized in Table 4 1. From this table we observe that the C8 atom in donepezil PDB conformer is essentially planar (about 0.1 from planarity, see Table 4 1 and Figure 4 6 ) with the r emaining four atoms in the cyclopentanone five membered ring. A similar trend was observed for the CNS, Refmac and AFITT methods with Refmac showing the most puckering. The QM/MM refinement approach puckered the cyclopentanone ring a little bit more than the other methods and the details of its impact on donepezil T c AChE binding patterns are discussed below Figure 4 7 shows views of the modeled donepezil conformations plus the CSD geometry binding in the Tc AChE active site gorge with the amino acid residues involved in ligand recognition and stabilization highlighted (Trp84, Trp279 and Phe330). Since

PAGE 90

90 the quaternary ammonium, benzyl and indanone are the three major functional groups mediating donepezil Tc conformations of donepezil were further evaluated by the following metrics: (i) distance between the center of mass of the donepezil phenyl group and that of the six membered benzene ring in the Trp84 indole functional group (d1); (ii) distance between the center of mass of the six membered benzene ring in the donepezil 1 Indanone group and that of the aromatic ring of Trp279 (d2) and (iii) the distance between N14 and the center of mass of the P he330 benzyl ring (d3). Table 4 2 lists these three distances for the various donepezil Tc AChE modeled structures. For the distance between the donepezil phenyl group and the aromatic ring of Trp84, the QM/MM refinement yields shortest d1 distance that is in agreement with the workers 107 They reported that the separation in the benzene dimer was 3.7 in the sandwich configuration using MP2/aug cc pVTZ optimized geomet ries. This arises because the QM/MM refinement reduced the deviation from an ideal envelope conformation for the C4 C5 C7 C8 C9 torsion of the cyclopentanone ring, which in turn altered the spatial orientation of the donepezil benzyl group. From Figure 4 6 we can find that comparing with the other modeling approaches, the phenyl group of the QM/MM refined donepezil conformer overlaps the counterpart of its fully relaxed geometry ( in vacuo and in solution) to the maximum extent. Nonetheless, for this interac tion all methods do a reasonable job of modeling the d1 distance.

PAGE 91

91 Figure 4 7. A) The active site gorge of Tc AChE ; overlays of PDB deposited conformer (white) and B) AFITT (orange); C) CNS (cyan); D) QM/MM (blue); E) Refmac (pink); F) CSD (purple ).

PAGE 92

92 Tab le 4 2. Distances (in ) of cation Tc AChE recognition and association. PDB AFITT CNS QM/MM REFMAC d1 (Trp84 --Benzyl group) 3.88 3.92 3.88 3.86 3.87 d2 (Trp279 --1 Indanone group) 4.07 4.10 4.07 4.11 4.17 d3 (Phe330 --N + ) 4.06 4.10 4.08 4.12 3.93 Sinnokrot et al. also found that the parallel benzene dimer plane was separated by 3.8 in the parallel displaced configuration yet all modeling results have the d2 value at least 0.27 higher than this value. One possible explanation is the fact that the benzene dimer is interaction between the six membered ring within indole due to electronic differences between indole and benzene. For d3 value we use Felder and co workers study 108 on cation between tetramethylammonium and benzene as a reference to evaluate the different modeling methods. They reported that the distance between the charged nitrogen atom of tetramethylammonium and the center of benzene was 4 .18 estimated by the MP2/6 31G* level of theory (configuration 1 ref. 94 ). Our QM/MM and AFITT results agree well with their computational outcome since only these two methods corrected the N14 atom coordinate position. The d3 value for Refmac was 0.25 lower than the MP2/6 31G* result mainly due to the sp 2 hybridized configuration of the N14 atom in its re refined structure (see Figure 4 6 and Table 4 1). 4.3. 3 Strain Energy One useful tool for evaluating the quality of mod eled donepezil conformations is their strain energies distorted structure with high strain energy. In this study the strain energy was defined as

PAGE 93

93 the single point energy difference between a conformer and its fully optimized geometry point energy calculations were employed to calculate the strain energy for the five donepezil bound conformations plus the CSD geometry with respect to the nearest local minimum (local strain) and glo bal minimum (global strain) on the donepezil conformatio nal energy surface (see Figure 4 6 for local/global minimum conformations). As shown in Table 4 3, the QM/MM refinement method generated the lowest local strain energy when compared with other modeli ng methods (4.4 Kcal/mol in gas phase and 3.8 Kcal/mol in PCM solvent model). The AFITT fitting protocol successfully corrected the configuration of the quaternary ammonium yet it retained the planar conformation of the five membered cyclopentanone ring yi elding 17.0 Kcal/mol local strain energy in the gas phase and 16.0 Kcal/mol in the continuum solvent model. Three conventional force field modeling approaches (PDB, CNS and Refmac) neither corrected the position of the charged nitrogen atom nor appropriate ly placed the chiral carbon atom in the 1 Indanone group. Therefore their local strain energies ranged from 55.3 Kcal/mol to 75.6Kcal/mol in vacuo and from 55.2 Kcal/mol to 72.3 Kcal/mol in solvent, with average values of 68.3 and 66.2 Kcal/mol, respective ly. Table 4 3. Strain energies (in Kcal/mol ) of AFITT fitting result, CSD crystal structure, PDB deposited conformer and its re refined geometries A) in vacuo ; B) in PCM solvent model. PDB AFITT CNS QM/MM REFMAC CSD GAS Local 55.3 17.0 74.1 4 .4 75.6 252.8 Global 63.3 22.5 79.0 12.4 83.6 260.8 PCM Local 55.2 16.0 71.0 3.8 72.3 252.7 Global 60.3 17.7 76.1 9.7 77.4 254.7

PAGE 94

94 The extremely high local strain energy of the CSD conformation originates from the rotation of the N14 C17 single bond that in turn gives rise to an energetically unfavorable conformation. From Figure 4 7 we can clearly observe that the phenyl group of CSD donepezil conformation nearly touches the Phe330 residue located in the Tc AChE active site. It is worthwhile to point out that the global strain of the QM/MM re refined conformer was about 8 Kcal/mol higher than its local strain in the gas phase and 6 Kcal/mol higher in aqueous solution, respectively. We ascribe these energy differences to the rotation of the C8 C10 and C10 C11 sigma bonds in the global minimum conformations (gas phase and solvent) that in turn altered the orientation of the piperdine and benzyl rings relative to the re refined geometry and induced more conformational strain (Figure 4 6 ). However given the narrow shape of Tc AChE active site gorge, these low energy conformations do not fit well into the binding pockets, so 6 and 4 7). Another possible factor contributing to this phenomenon was the location of the C28 atom (Figure 4 6). Recall that we used O27 C2 C1) = 75.1 to place the C28 atom in the initial structure of donepezil for modeling (Figure 4 5) and after QM/MM refinement the dihedral angle O27 C2 C1) was 96.4. Whereas the most energy favorable conformations identified by Omega plu s HF/6 31G* ( gas phase and solvent) both have O27 C2 C1) = 180, which were in agreement with the CSD geometry and the AFITT modeling result. Note that CNS and Refmac both returned this dihedral angle to about 5 just as observed in the original PDB conformation.

PAGE 95

95 4.4 Conclusion In the present study we first carried out single point calculations on 18 free state donepezil low energy conformations using MP2/CBS and M06 functionals, which enabled a detailed and accurate understanding of the donepezil con formational energy surface. Further molecular modeling and conformational analysis for bound donepezil conformations were performed with using QM/MM X ray refinement, CNS, Refmac and AFITT fitting. Ab initio geometry optimizations at the HF/6 31G* level of theory were executed on each of these modeled conformers along with the CSD geometry. Compared with the fully relaxed geometry in the gas phase as well as in a continuum solvent model, the original PDB structure incorrectly positioned the N14 atom that pl ays a key role in donepezil Tc AChE recognition and association Moreover a nother notable difference between the PDB deposited conformation and its optimized geometry was the extreme flattening of the envelope conformation of the cyclopentanone ring as a re sult of misplacing the chiral center located on the indanone functional group along with force field errors in modeling five membered rings. Our QM/MM refinement approach improved these two coo rdinate errors and consequently released the conformational str ain existing in the PDB deposited structure which in turn reduced the local strain energy to 3.8 Kcal/mol in aqueous solution and 4.4 Kcal/mol in vacuo Other modeling methodologies did improve some aspects of the structure, but largely retained significa nt strain energies of greater than 17 Kcal/mol.

PAGE 96

96 CHAPTER 5 OSELTAMIVIR 5 .1 Background Oseltamivir, (3R 4R, 5S ) 4 acetamido 5 amino 3 pentan 3 yloxycyclohexene 1 carboxylic acid (see Figure 5 1), is a potent anti flu drug designed to interrupt influenz a virus replication cycle by preventing the neuraminidase (NA, a type of sialidase) catalyzing process of removing the virion progeny terminal Neu5Ac residues (sialic acids) linked to glycoproteins and glycolipids. Based on the previous research of Influenza virus sialidase catalytic mechanism, the inhibition of oseltamivir against neuraminidase is primarily attributed to the cyclohexene core o f this drug that can closely mimic the putative transition state of the sialic acid (a sialosyl cation intermediate ) 109 Therefore the six membered ring of oseltamivir ought to adopt a half chair conformation (the most energy stable form) with C2 symmetry. Figure 5 1. Structure diagram of oseltamivir with atom numbering. This half chair conformation is partially induced by three basic amino acid residues: Arg292, Arg118and Arg371, which together with Tyr347 engages the

PAGE 97

97 carboxylate group of oseltamivir as well. Meanwhile the amino group and acetamido group of oseltamivir are stabilized by Glu119, Asp151 and Arg 152 via electrostatic interactions, respectively 110,111 All these amino acid residues are highly conserved hence in this study we performed a n exhaustively conformational analysis of oseltamivir so as to detect the conformational preference of its bound conformation as well as investigate the binding mode between the inhibitor and neuraminidase residues. 5 .2 Methods 5 .2.1 Pure QM/MM energy mini mization In order to determine the role of molecular potential played in energetically restrained refinement formalism (EREF) exploited in the oseltamivir neuraminidase (NA) crystal structure refinement, we carried out a serial of pure QM/MM geometry optim izations (weighting factor is equal to zero) towards the oseltamivir NA complex in which the QM region only includes the inhibitor itself. Three protocols were employed as attempts to find out the best way of releasing the conformational strains existed in the original PDB oseltamivir NA conformation: Protocol 1: In this protocol we directly used the PDB deposited coordinates as the initial geometry and performed the QM/MM energy minimization with 1000 loops. Protocol 2: In this protocol we first executed a typical 1000 loops MM energy minimization on the inhibitor 2 ) positional restraints on the ligand so as to remove structural artifacts of amino acids to the maximum degree. After that the new positions of all atoms will be used as the reference for the QM/MM optimization. Protocol 3: The first step of this protocol was as same as that of protocol 2. However before the QM/MM cycle we carried out an additional 1000 loops MM energy

PAGE 98

98 2 ) positiona l restraints on oseltamivir in order to optimize its conformation toward a local minimum point on the conformational energy surface. Each protocol was combined with three different MM energy minimization algorithms: conventional conjugated gradient method (CG), Polak Ribiere nonlinear conjugate gradient algorithm (PRCG) and truncated newton linear conjugate (TNCG). For the QM energy minimizations we select four level of theories as the Hamiltonians: HF/6 31G*, M06/6 31G*, M06 2X /6 31G* and M06 HF/6 31G*. Th e MM force field parameters for the protein were taken from AMBER 10(ff99SB) and all QM calculations were done by using Gaussian 09 and embedded into MM region via our upgraded in house package PUPIL 2.0. 5 .2.2 QM/MM X ray structure refinement The deposite d PDB coordinate for 2HT7 (N8 subtype neuraminidase) was chosen as the initial geometry for our QM/MM refinement. The QM region was defined as the oseltamivir The Hartree Fock model chemistry with Pople 6 31G* basis set were employed as the QM Hamiltonian for the QM region. All QM/MM refinement was performed using AMBER 10 plus our in house ab initio code QUICK and CNS (Crystallography and NMR System). 5 .3 Results 5 .3.1 Exploring the conformational energy surface of free state oseltamivir The PDB deposited geometry of oseltamivir (PDB ID 2HT7) has been taken into account in our study as the sta rting point in the conformational search for all possible energy favorable oseltamivir conformations. As a result 76 oseltamivir conformers were generated by Omega at 5 Kcal/mol cut field. We

PAGE 99

99 set up a small relative en ergy threshold (5 Kcal/mol) mainly because our interest here is only in finding oseltamivir global minimum Further ab initio geometry optimizations of all Omega produced free state oseltamivir conformations were done at HF/6 31G* level of theory and final ly it yielded 12 energy minima. Figure 5 2. Schematic representation of the 12 oseltamivir low energy conformations.

PAGE 100

100 Figure 5 3. The relative energies of Omega/QM methods identified oseltamivir energy favorabl e conformers were estimated by A) M06; B) M06 2X ; C) M06 L and D) MP2 level of theories in vacuo and comparing with MP2/CBS results. From Figure 5 2 we can see that the biggest difference among these 12 oseltamivir low energy conformers is the orientation of their hydrophobic pentyloxy functional g roups, which are primarily due to the rotations of sigma bond C1 O9, O9 C13, C13 C16 and C13 C14. In order to precisely portrait the conformational energy surface of oseltamivir and locate the global minimum point, we also performed high level wave functio n based complete basis set (CBS) calculation s on the 12 identified

PAGE 101

101 oseltamivir low energy conformations. MP2/aug cc pVDZ and MP2/aug cc pVTZ single point calculation were executed to extrapolate to the MP2/CBS limit at the HF/6 31G* optimized geometries (F igure 5 3A, the 12 oseltamivir conformers labeled on x axis were named in order of increasing MP2/CBS energy). Overall, the MP2/CBS energy differences among oseltamivir low energy conformers were not large (less than 2.5 Kcal/mol between the highest and lo west energy conformer). Previous work in our group demonstrated that the M06 suite of functional s at HF/6 31G* optimized geometries gave relative energies that closely resemble MP2/CBS calculation for drug like molecules such as ibuprofen, retinoic acid a nd donepezil. Hence here we performed a series of M06 single point calculations including M06, M06 2X and especially M06 HF, which is intentionally selected to test if using fully non local exchange energy in hybrid meta GGA DFT method can improve the accu racy of estimating conformational energy. Figure 5 3 B shows the relative energy values calculated using M06 HF functional with different basis sets for the 12 oseltamivir low energy conformers. All three basis set (6 311+G**, aug cc pVDZ and aug cc pVTZ) w orks well with M06 HF functional on determining the relative energies of conformers close to global minimum point (G39_01, G39_02, G39_03 and G39_04) in comparison with MP2/CBS results. Basically the conformational energy surface in this region is very fla t and the rotation of sigma bond C13 C14 leads to only quite small changes in energy (Figure 5 2 and Figure 5 3B). For G39_05, G39_06 and G39_07, no matter which basis sets were employed, the M06 HF functional results were 0.20 Kcal/mol less than the MP2/C BS values. For the rest oseltamivir low energy conformers, they have different dihedral angle (C2 C1 O9 C13) as compared to the conformers G39_01 to G39_07,

PAGE 102

102 which is primarily ascribed to the rotation of C1 O9 sigma bond. Meanwhile their M06 HF estimated conformational energies were about 0.25 Kcal/mol deviated from MP2/CBS outcomes. Conformation al analyses of retinoic acid and donepezil in our earlier work have already illustrated that M06 2X functional can offer the most reliable estimation of how much the conformational energy changes among the low energy conformers of drug like molecules. Yet for oseltamivir when compared to the MP2/CBS method, M06 2X level of theory with both Pople and Dunning basis sets preferred the G39_03 as the lowest energy conformation in lieu of G39_01 (Figure 5 3C). The same tendency was also observed in using M06 func tional to compute the oseltamivir conformational energy (Figure 5 3D). This result indicates that the M06 functionals using hybrid exchange energy terms, while working well for the molecular systems with long hydrophobic tail (retinoic acid) or few aromat ic rings (donepezil), may not be widely successful in predicting conformational energy of molecular system with multiple hydrogen bond donors and acceptors. 5 .3.2 Conformational profiles of bound state oseltamivir Currently the RCSB PDB database contains 1 5 oseltamivir neuraminidase (NA) X ray crystallography structures and most neuraminidases in these complexes belong to the N1 or N2 subtypes. Since there have already been plenty of publications in elucidating the molecular basis of oseltamivir neuraminida se binding mode and developing novel anti flu drugs based on the 3D structures of N1 or N2 neuraminidase, we herein selected the only N8 subtype neuraminidase in PDB (entry ID 2HT7) with conformation of 150 loops (residues 147 152 ) as the target to study the conformational preference of oseltamivir bound conformation. The deposited coordinates of this oseltamivir neuraminidase complex was then rebuilt by our parallel

PAGE 103

103 X ray refinement methodology and conventional force field approac h embedded in PHENIX 1.8.1 112 plus fitted the ligand only into the 2F 0 F c electron density map using AFITT. Figure 5 4. Stereo view superimposition of A) PDB depo sited oseltamivir conformation that was remodeled by B) PHENIX C) AFITT and D) QM/MM refinement. Electron dens

PAGE 104

104 Table 5 1. Conformational features (in degrees) of oseltamivir bound conformations and their optimized geometries (numbers in bracket) PDB PHENIX AFITT QM/MM Gas phase 1 ( C1 C2=C 3 C4) 0.6(0.4) 1.2(0.4) 1.3(1.0) 0.3(0.4) 2 ( C2=C3 C4 C5) 13.7( 17.5) 14.4( 17.5) 17.5( 15.6) 18.4( 17.5) 3 ( C3 C4 C5 C6) 42.3(50.3) 40.7(50.3) 44.9(48.3) 46.9(50.3) 4 ( C4 C5 C6 C1) 57.0( 66.4) 53.4( 66.4) 56.9( 66.7) 61.0( 66.4 ) 5 ( C5 C6 C1 C2) 47.5(46.6) 36.1(46.6) 39.8(49.2) 45.4(46.6) 6 ( C6 C1 C2=C3) 17.6( 15.7) 9.2( 15.7) 12.9( 18.6) 16.7( 15.7) PCM 1 ( C1 C2=C3 C4) 0.6( 0.9) 1.2( 0.9) 1.3( 0.8) 0.3( 0.9) 2 ( C2=C3 C4 C5) 13.7( 17.7) 14.4( 17 .7) 17.5( 17.7) 18.4( 17.7) 3 ( C3 C4 C5 C6) 42.3(50.2) 40.7(50.2) 44.9(49.6) 46.9(50.2) 4 ( C4 C5 C6 C1) 57.0( 64.8) 53.4( 64.8) 56.9( 63.7) 61.0( 64.8) 5 ( C5 C6 C1 C2) 47.5(43.7) 36.1(43.7) 39.8(42.8) 45.4(43.7) 6 ( C6 C1 C2=C3) 17 .6( 12.8) 9.2( 12.8) 12.9( 12.4) 16.7( 12.8) Figure 5 4 exhibits the overlays between the initial structure for oseltamivir in PDB and its remodeled conformations, in which moderate deviations from the original PDB structures were observed. Compari ng with PDB deposited geometry, the QM/MM re (C2=C3 C4 C5) from 13.7 to C4 C5 (C4 C5 C6 C1) from 57.0 to 61.0 (Table 5 1). Meanwhile the QM/MM approach 1 O9 C13) bond angle from 125.2 to 118.8. We also carried out geometry optimization for PDB, QM/MM and PHENIX remodeled oseltamivir conformers at HF/6 31G* level of theory with polarizable continuum model and after that all three bound conformations were converged to the same local minimum point on the adjusted

PAGE 105

105 O9 C13) bond angle from 125.2 to 118.8. Figure 5 5. Superimposition of four oseltamivir bound conformations with their nearest local minima (orange) and global minimum (magenta).

PAGE 106

106 We a lso carried out geometry optimization for PDB, QM/MM and PHENIX remodeled oseltamivir conformers at HF/6 31G* level of theory with polarizable continuum model and after that all three bound conformations were converged to the same local minimum point on th 2 (C2=C3 C4 3 (C3 C4 C5 4 (C4 C5 C6 O9 C13) values are 17.7, 50.2, 64.8 and 118.5, respectively (Table 5 1). Therefore the QM/MM re refined conformation is in better agreement with its fully relaxed geometry as compared to traditional force field methods re refined oseltamivir conformers in terms of releasing the torsional strain on cyclohexene ring as well as optimizing other bond lengths and bond angles. It thus leads to huge drops of strain energies: from 31.9 Kcal/mol to 3.5 Kcal/mol in both local and global strain using PCM solvent model (Figure 5 5 ). Here we define the global strain of a bound state ligand as the energy difference between its crystallographically determined conform ation and the lowest energy minimum identified by Omega/ ab initio conformational search both, whereas the local strain is estimated as the energy difference between the bound conformation of ligand and its nearest local energy minimum point on the conforma tional energy surface. Figure 5 5 also shows that AFITT fitted oseltamivir conformer has a quite similar cyclohexene ring configuration with its optimized geometry in solution, yet its unique orientation of pentyloxy group give rise to a 4.3 Kcal/mol highe r global strain as compared to that of QM/MM re refined conformer. Note that the local strain and global strain computed in a vacuum are much higher (about 10 Kcal/mol) than in aqueous. This phenomenon is mainly ascribed to the overemphasis on intra molecu lar electrostatic interactions between the positively charged quaternary ammonium (N8 atom) and O12 atom in acetamido group, which in

PAGE 107

107 turn induced a unrealistic rotation of C6 N7 single bond and might not occur in the oseltamivir potentially breaking out the inter molecular ionic interactions between the O12 atom and Arg152 residue in the active site). Figure 5 6. Stereo views of N8 subtype neuraminidase active site with A) PDB; B) PHENIX ; C) AFITT and D) QM/MM conformers. Putativ e hydrogen bonds are shown as yellow dotted lines.

PAGE 108

108 Figure 5 6 exhibits a view of the active site binding of oseltamivir to N8 subtype loops. In the original PDB deposited co crystallized structure, oseltami vir was integrated into the salt bridge networks with several polar amino acid residues: (i) the carboxylate group of oseltamivir was held strongly by three arginine and one tyrosine residues and the distance between O11 Arg118, O12 Arg 292, O11 Arg371, O1 2 Arg371 and O12 Tyr347 are 3.8, 3.5, 3.6, 3.3, 3.4 and 2.9, respectively; (ii) the amino group forms electrostatic interaction with Glu119 and Asp151 and the distance between N8 Glu119 and N8 Asp151 are 4.1, 3.5 and 4.3; (iii) the oxygen atom of acetamido group interacting with Arg152 and their distance is 2.9. According to the overlay view of PDB and QM/MM re refined conformation in Figure 5 5, the lat t er one was subjected to slight translations and rotations from its original position. This tin y shift plus the optimized conformational features of oseltamivir greatly strengthened the ionic interactions between inhibitor and amino acid residues located on the binding site. In the QM/MM refined conformation, the distance of the Arg371 nitrogen atom s to the O11 and O12 oxygen of oseltamivir decreased by about 0.6 and 0.5 when compared with those of the PDB deposited conformer. Another change in the conformation of the local region around oseltamivir carboxylate group is a 0.8 movement of the nitro gen atom of Arg292 towards the O12 atom of the drug molecule. Moreover, the geometric divergences are also observed at the site of interaction between oseltamivir and Glu119, where the QM/MM refinement approach reduced the distances between the carboxylate oxygen of Glu119 and quaternary ammonium of oseltamivir from 4.4 to 3.5 and 3.5 to 2.8, respectively. In particular the QM/MM method reasonably altered the orientation of Asp151 that

PAGE 109

109 consequently shortened the distance between the carboxylate oxygen o f Asp151 and the positively charged N8 atom of oseltamivir from 4.0 to 2.7 Note that PHENIX did the similar conformational modification on Asp151. Meanwhile the QM/MM re refined conformation retained the internal salt bridge between Glu276 and Arg224. T he distances between Glu276 carboxylate oxygen and primary/secondary guanidinyl nitrogen of Arg224 are 2.7 and 2.7 in QM/MM refined conformation, respectively (Figure 5 6D). This internal ionic interaction forces the carboxyl group of Glu276 to orientate outward from the binding site and forms a hydrophobic pocket consisting of chain of Arg224, which in turn allows the accommodation of the large pentyloxy group of oseltamivir. In a nutshell our QM/MM X ray refinement method not only fixed the coordinated error existed in t he fairly strained PDB deposited ligand conformation, but somewhat adjusted the placement of charged conserved neuraminidase residues that directly involved in stabilizing the oseltamivir neuraminidase association as well. 5 .3.3 Pure QM/MM energy minimizat ion of oseltamivir neuraminidase complex Figure 5 QM/MM geometry optimization with various combinations of protocol / minimization algorithm / QM Hamiltonian and in general the differen ces among their conformational features are not large. As shown in Table 5 2, by means of typical conjugated gradient approach (default MM minimization option in Amber), local strain calculations using a continuum solvent model gave strain energies ranging from 5.2 to 7.4 Kcal/mol for QM/MM energy minimization protocol 1, 2.9 to 4.9 Kcal/mol for protocol 2 and 3.1 to 4.2 Kcal/mol for protocol 3, with average values 6.8, 3.8 and 3.8 Kcal/mol, respectively.

PAGE 110

110 Figure 5 7. Overlays of QM/MM minimization ident ified oseltamivir conformers using HF (yellow), M06 (orange), M06 2X (green) and M06 HF (magenta) with 6 31G* basis set as QM Hamiltonian.

PAGE 111

111 Figure 5 8. Stereo view of three lowest strain oseltamivir conformers obtained by different minimization algorithms (cyan: CG; green: PRCG; blue: TNCG) and their corresponding local ( magenta) and global (gray) minimum. Table 5 3 summarized the estimated strain energies of oseltamivir bound conformation determined by Polak Ribiere nonlinear conjugate gradient algorithm (PRCG), in which the local strains ranged from 2.8 to 6.8 Kcal/mol for protocol 1, 3.5 to 5.4 Kcal/mol for protocol 2 and 3.9 to 5.0 Kcal/mol for protocol 3 in aqueous, with average values 4.7, 4.3 and 4.5 Kcal/mol, respectively. The local strain energies calculated for the QM/MM optimized oseltamivir conformers in terms of truncated newton linear conjugate algorithm (TNCG) ranged from 4.1 to 5.6 Kcal/mol for protocol 1, 3.4 to 4.7 Kcal/mol for protocol 2 and 3.3 to 4.7 Kcal/mol for protocol 3 in solution, with average values 4.9, 4.1 and 4.0 Kcal/mol, respectively (Table 5 4).

PAGE 112

112 Table 5 2. Strain energies (in Kcal/mol) of oseltamivir bound conformations optimized by typical conjugated gradient method along with ab initio /DFT QM Hamiltonians. GAS PCM Loca l Global Local Global Protocol 1 HF 21.0 22.8 7.4 7.4 M06 19.4 21.2 5.6 5.6 M06 2X 18.7 20.5 5.2 5.2 M06 HF 20.1 21.9 6.4 6.4 Protocol 2 HF 16.5 18.3 4.9 4.9 M06 14.1 15.9 3.6 3.6 M06 2X 13.7 15.5 2.9 2.9 M06 HF 15.1 16.9 3.9 3.9 Protocol 3 HF 16.0 17.1 4.2 4.2 M06 14.7 16.5 3.8 3.8 M06 2X 13.5 15.3 3.1 3.1 M06 HF 15.2 16.4 3.9 3.9 Table 5 3. Strain energies (in Kcal/mol) of oseltamivir bound conformations optimized by Polak Ribiere nonlinear conjugate gradient algorithm along with ab initio /DFT QM Hamiltonians. GAS PCM Local Global Local Global Protocol 1 HF 20.1 21.9 6.8 6.8 M06 13.3 15.1 3.3 3.3 M06 2X 15.2 17.0 2.8 2.9 M06 HF 18.1 19.9 5.7 5.7 Protocol 2 HF 17.8 18.9 5.4 5.4 M06 14.8 16.0 3.7 3.7 M06 2X 14.3 16.1 3. 5 3.5 M06 HF 15.1 16.9 4.4 4.4 Protocol 3 HF 15.8 17.6 5.0 5.0 M06 15.7 17.6 4.6 4.6 M06 2X 14.8 16.6 3.9 3.9 M06 HF 15.1 17.0 4.4 4.4 Note that the strain energies computed in gas phase lie about 10 Kcal/mol higher than in continuous solvent mo del due to the overemphasized intra molecular interaction as we mentioned before. On the basis of these strain energies data, it appears that the treatments of different conjugated gradient algorithms only have minor impact on

PAGE 113

113 optimizing the bound conforma tion of oseltamivir with respect to the same QM/MM energy minimization scheme. Table 5 4. Strain energies (in Kcal/mol) of oseltamivir bound conformations optimized by truncated newton linear conjugate algorithm along with ab initio /DFT QM Hamiltonians. GAS PCM Local Global Local Global Protocol 1 HF 18.9 20.7 5.6 5.6 M06 17.7 19.5 4.8 4.8 M06 2X 17.1 18.9 4.1 4.1 M06 HF 18.0 19.8 4.9 4.9 Protocol 2 HF 16.6 18.4 4.7 4.7 M06 15.1 16.9 3.9 3.9 M06 2X 14.7 16.6 3.4 3.4 M06 HF 15.9 17.7 4.4 4. 4 Protocol 3 HF 16.3 18.1 4.7 4.7 M06 14.7 16.5 3.8 3.8 M06 2X 14.4 16.2 3.3 3.3 M06 HF 15.4 17.2 4.1 4.1 However pre releasing the conformational strain existed in neuraminidase before performing pure QM/MM minimization (protocol 2 and 3) may fac ilitate the oseltamivir bound conformation closer to its nearest local energy minimum (1~2 Kcal/mol lower in local strain comparing with using PDB deposited geometry for QM/MM minimization directly). Moreover although the energy differences between the ind ividual QM/MM optimized geometries of oseltamivir are small it should nevertheless be pointed out that regardless of which MM minimization algorithm was utilized (CG, PRCG or TNCG), the lowest strain energy conformers were all corresponding to M06 2X /6 31G level of theory (see Table 5 2, 5 3, 5 4 and Figure 5 8). This result indicates that M06 2X functional with Pople basis set should be considered as the QM Hamiltonian and implemented into our future QM/MM X ray refinement package.

PAGE 114

114 5 .4 Conclusion In th is study we presented a thorough conformational analysis for anti flu drug oseltamivir by combining a MMFF94s conformational search with HF/6 31G* geometry optimization in gas phase. Twelve low energy conformers were identified and their accurate single po int energy were estimated by means of MP2/CBS and M06 suite of functionals with 6 311+G**, aug cc pVDZ and aug cc pVTZ basis sets, which brings in a detailed and precise description of oseltamivir conformational energy surface. Meanwhile we performed both pure QM/MM minimization and QM/MM X ray refinement on oseltamivir neuraminidase complex crystal structure so as to investigate the conformational preference of oseltamivir bound/bioactive conformation. Comparing with the PDB deposited inhibitor enzyme X ra y structure, our QM/MM re refined approach not only optimized the conformational features of oseltamivir itself, but adjusted the distance between ligand recognizing amino acid residues (Arg371, Arg292, Glu119Asp151) and oseltamivir as well. It thereby enh anced the drug target association and reduced the local strain energy in aqueous from 31.9 to 3.5 Kcal/mol. Moreover this QM/MM re refined oseltamivir neuraminidase structure might be used as a good template for designing novel anti flu drugs aimed at N8 s ubtype neuraminidase with executing MM energy minimization with freezing ligand before actually running a QM/MM calculation can help to trim the ligand strain moderately and M06 2X functional should be integrated into our QM/MM X ray refinement methodology in the future due to its excellent performance in optimizing ligand bound conformation.

PAGE 115

115 CHAPTER 6 FUTURE WORK The future work or the long term goal of our conformatio nal analysis project is to create a database including low energy conformations, HOMO LUMO data, single point conformational energies and bioactivities of numerous drug like molecules. One possible application of this database is to predict the bioactiviti es of drug candidates for a give target protein on the basis of three dimensional shape based similarities between 12 commercial (FDA appr oved) carbonic anhydrase II (CA II ) inhibitors from DrugBank database 113 and applied Omega on each of them at 10 Kcal/mol cut with MMFF94s force field. All these Omega generated conformations were further optimized at HF/6 31G* level of theory in gas phase. The n umber of conformers for each CA II inhibitor is summarized in Table 6 1. After that we randomly selected 40 compounds from ChEMBL database 114 wh ose binding affinity against CA II have been already measured by experiments. 20 of the m have K i less than 1nM and we labeled Omega identified lowest energy conformations were saved and compared with the 272 energy favor able conformers of 12 commercial CA II drugs in terms of 3D similarities (ScaledColor Score) using ROCS 3.1.2. Consequently we obtained a 27240 matrix in which each element represents a 3D similarity metric between two conformers (commercial vs. non commerc ial). This matrix was used as the training set for a nearest centroid classifier that can be considered as a linear discriminant analysis (LDA) method with regularization so as to figure out the high dimensionality issue 115

PAGE 116

116 Ta ble 6 1. Selected commercial CA II inhibitors from DrugBank. DrugBank ID Drug Name Number of Omega Generated Conformers After HF/6 31G* Geometry Optimization DB00232 Methyclothiazide 269 29 DB00273 Topiramate 201 27 DB00311 Ethoxzolamide 19 5 18 DB00436 Bendroflumethiazide 169 43 DB00562 Benzthiazide 191 15 DB00606 Cyclothiazide 64 19 DB00695 Furosemide 159 10 DB00703 Methazolamide 40 3 DB00869 Dorzolamide 302 54 DB01144 Dichlorphenamide 17 3 DB01194 Brinzolamide 225 32 DB01325 Quine thazone 25 19 Total 1857 272 The key step of this statistical machine learning algorithm is to reduce the distance between classwise mean and overall mean for each feature separately based on the following equation (6 1) where is the overall mean and is the within class mean (class i) for feature j, is the pooled within class standard deviation of feature j, is equal to th e reciprocal of sample numbers in class i minus the reciprocal of total sample numbers n and is a small positive constant (often it is the median of values) to avoid the extremely high due to the near zero valu e of The measurement will shrink toward zero using soften thresholds (6 2)

PAGE 117

117 where + means positive part of the value (a + = a if a > 0 otherwise 0) and is the threshold to be determined during the tuning process. Meanwhile the corresponding can be computed through equation (6 1) (6 3) Therefore the discriminant function should be (6 4) where is a raw vector that represents the 3D similarity values for a is the prior probability of a sam ple belonging to class i. In this binary classification case we just set = 0.5 for both active 4 ) is built on the hypothesis that features in each class have independent normal distribution with the s ame variance. Hence the nearest shrunken centroids might be sorted as a nave Bayes classifier. The final call is then (6 5 ) In principle only features that hav e a non zero value for at least one class can make contributions to the discriminant score function (6 4). In order to find out the optimized threshold for the centroid shrinkage (maximizing the classification accuracy rate), we performed a 5 f olds external cross validation (eliminating the classification bias to the maximum extent possible) 116 on 30 candidate values automatically generated by R / ted this process for 1000 times (1000 threshold value. The pseudo code of the above training and tuning algorithm of nearest

PAGE 118

118 shrunken centroids classifier is summarized in Figur e 6 1 and its corresponding R code is on Appendix D. Figure 6 1. Pseudo code of identifying optimal t hreshold value for nearest shrunken centroids classifier (NSC) with external k fold cross validation and multiple partitions. Figure 6 2 exhibits the 5 folds external cross validation result of 30 threshold values and we obtained the maximum overall classification accuracy rate 0.780 (average value based on 1000 loops multiple partitions) or 78% at delta = 2.586. Meanwhile the classification accuracy rate 2). All these results indicates that our 3D similarity based virtual screening methodology is feasible to predicate the bioacti vity of drug candidates and we will expand this approach to much larger training datasets with drug like molecules against other target proteins.

PAGE 119

119 Table 6 2. 2 by 2 confusion table describing the accuracy rate of label prediction for each class Actual Class Labels Potent Non Active Predicted Class Labels Potent 16 3 Non Active 4 17 Accuracy Rate 0.80 0.85 Figure 6 2. Finding optimal threshold value using 5 folds external cross validation with 1000 loops multiple partitions.

PAGE 120

120 APPENDIX A R CODE AND OUTPUT OF NON PARAMETRIC STATISTIC AL INFERENCE ON RETINOIC ACID CONFOR MATIONAL ENERGY (1) R code and output of examining the difference between MP2/CBS and M06 2X level of theory with respect to the relative energies of 16 ATRA low energy conformers (n = 1 6 and k = 4).

PAGE 121

121 (2) R code and output of examining the difference between MP2/CBS and M06 2X level of theory with respect to the relative energies of 16 9 cis RA low energy conformers (n = 16 and k = 4).

PAGE 122

122 APPENDIX B R CODE AND OUTPUT OF NON P ARAMETRIC STATISTICA L INFERENCE ON DONEPEZIL CONFORMATIONAL ENERG Y (1) R code and output of examining the difference between MP2/CBS and DFT methods with 6 311+G** basis set upon estimating relative energies of 15 donepezil low energy conformers (n = 15 an d k = 4, some calculations of M06L are not converged).

PAGE 123

123 (2) R code and output of examining the difference between MP2/CBS and DFT methods with aug cc pVDZ basis set upon estimating relative energies of 15 donepezil low energy conformers (n = 15 and k = 4, some calculations of M06L are not converged).

PAGE 124

124 (3) R code and output of examining the difference between MP2/CBS and DFT methods with aug cc pVTZ basis set upon estimating relative energies of 15 donepezil low energy conformer s (n = 15 and k = 4, some calculations of M06 are not converged).

PAGE 125

125 APPENDIX C PARAMETER SETTINGS OF DONEPEZIL TC ACHE MODEL RE REFINEMENT USING CLASSICAL FORCE FIELD METHODS (1) CNS 1.3:

PAGE 126

126 (2) Refmac 5.5:

PAGE 127

127

PAGE 128

128 APPENDIX D R CODE FOR NEAREST S HRUNKEN CE NTROIDS CLASSIFIER WITH EXTE RNAL K FOLD CROSS VALIDATION AND MULTIPLE PARTITI ONS.

PAGE 129

129 LIST OF REFERENCES (1 ) Feher, M.; Williams, C. I. J Chem Inf Model 2012 (2 ) Tirado Rives, J.; Jorgensen, W. L. J Med Chem 2006 49 5880. (3 ) Bostro m, J.; Norrby, P. O.; Liljefors, T. J Comput Aided Mol Des 1998 12 383. (4 ) Kleywegt, G. J.; Henrick, K.; Dodson, E. J.; van Aalten, D. M. Structure 2003 11 1051. (5 ) Perola, E.; Charifson, P. S. J Med Chem 2004 47 2499. (6 ) Butler, K. T.; Luque, F. J.; Barril, X. J Comput Chem 2009 30 601. (7 ) Kleywegt, G. J. Acta Crystallogr D Biol Crystallogr 2007 63 94. (8 ) Roothaan, C. C. J. Rev Mod Phys 1951 23 69. (9 ) Moller, C.; Plesset, M. S. Phys Rev 1934 46 0618. (10 ) Headgordon, M.; Pople, J. A.; F risch, M. J. Chem Phys Lett 1988 153 503. (11 ) He, X.; Fusti Molnar, L.; Merz, K. M., Jr. J Phys Chem A 2009 113 10096. (12 ) Hohenberg, P.; Kohn, W. Phys Rev B 1964 136 B864. (13 ) Kohn, W.; Sham, L. J. Phys Rev 1965 140 1133. (14 ) Tao, J. M.; Perde w, J. P.; Staroverov, V. N.; Scuseria, G. E. Physical Review Letters 2003 91 (15 ) Zhao, Y.; Truhlar, D. G. Theor Chem Acc 2008 120 215. (16 ) Jack, A.; Levitt, M. Acta Crystallogr A 1978 34 931. (17 ) Hendrickson, W. A. Method Enzymol 1985 115 252. ( 18 ) Tronrud, D. E.; Teneyck, L. F.; Matthews, B. W. Acta Crystallogr A 1987 43 489. (19 ) Adams, P. D.; Pannu, N. S.; Read, R. J.; Brunger, A. T. Proc Natl Acad Sci U S A 1997 94 5018. (20 ) Brunger, A. T. Nature 1992 355 472. (21 ) Engh, R. A.; Huber, R. Acta Crystallogr A 1991 47 392. (22 ) de Lera, A. R.; Bourguet, W.; Altucci, L.; Gronemeyer, H. Nat Rev Drug Discov 2007 6 811.

PAGE 130

130 (23 ) Tang, X. H.; Gudas, L. J. Annu Rev Pathol 2011 6 345. (24 ) Tallman, M. S.; Andersen, J. W.; Schiffer, C. A.; Appelb aum, F. R.; Feusner, J. H.; Ogden, A.; Shepherd, L.; Willman, C.; Bloomfield, C. D.; Rowe, J. M.; Wiernik, P. H. N Engl J Med 1997 337 1021. (25 ) Warrell, R. P., Jr.; Frankel, S. R.; Miller, W. H., Jr.; Scheinberg, D. A.; Itri, L. M.; Hittelman, W. N.; V yas, R.; Andreeff, M.; Tafuri, A.; Jakubowski, A.; et al. N Engl J Med 1991 324 1385. (26 ) Huang, M. E.; Ye, Y. C.; Chen, S. R.; Chai, J. R.; Lu, J. X.; Zhoa, L.; Gu, L. J.; Wang, Z. Y. Blood 1988 72 567. (27 ) Weis, K.; Rambaud, S.; Lavau, C.; Jansen, J.; Carvalho, T.; Carmofonseca, M.; Lamond, A.; Dejean, A. Cell 1994 76 345. (28 ) Lai, A. Y.; Wade, P. A. Nat Rev Cancer 2011 11 588. (29 ) Anzano, M. A.; Byers, S. W.; Smith, J. M.; Peer, C. W.; Mullen, L. T.; Brown, C. C.; Roberts, A. B.; Sporn, M. B. Cancer Research 1994 54 4614. (30 ) Elstner, E.; Muller, C.; Koshizuka, K.; Williamson, E. A.; Park, D.; Asou, H.; Shintaku, P.; Said, J. W.; Heber, D.; Koeffler, H. P. Proc Natl Acad Sci U S A 1998 95 8806. (31 ) White, K. P.; Hua, S. J.; Kittler, R. C ell 2009 137 1259. (32 ) Tamura, K.; Nakae, D.; Horiguchi, K.; Akai, H.; Kobayashi, Y.; Andoh, N.; Satoh, H.; Denda, A.; Tsujiuchi, T.; Yoshiji, H.; Konishi, Y. Carcinogenesis 1997 18 2133. (33 ) Colston, K. W.; Pettersson, F.; Dalgleish, A. G.; Bissonne tte, R. P. Brit J Cancer 2002 87 555. (34 ) Lammer, E. J.; Chen, D. T.; Hoar, R. M.; Agnish, N. D.; Benke, P. J.; Braun, J. T.; Curry, C. J.; Fernhoff, P. M.; Grix, A. W.; Lott, I. T.; Richard, J. M.; Sun, S. C. New Engl J Med 1985 313 837. (35 ) Kessel, M.; Gruss, P. Cell 1991 67 89. (36 ) Miller, W. H. Cancer 1998 83 1471. (37 ) Beppu, Y.; Kakitani, T. Chem Phys 1990 148 333. (38 ) vanAalten, D. M. F.; deGroot, B. L.; Berendsen, H. J. C.; Findlay, J. B. C. Biochemical Journal 1996 319 543. (39 ) Li, X.; Fu, Z.; Merz, K. M., Jr. J Comput Chem 2012 33 301.

PAGE 131

131 (40 ) OEChem 1.3.4 ed. Santa Fe, NM, USA, 2005. (41 ) Bostrom, J.; Greenwood, J. R.; Gottfries, J. J Mol Graph Model 2003 21 449. (42 ) M. J. Frisch, G. W. T., H. B. Schlegel, G. E. Scuseria, ; M. A Robb, J. R. C., G. Scalmani, V. Barone, B. Mennucci, ; G. A. Petersson, H. N., M. Caricato, X. Li, H. P. Hratchian, ; A. F. Izmaylov, J. B., G. Zheng, J. L. Sonnenberg, M. Hada, ; M. Ehara, K. T., R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, ; Y. Hond a, O. K., H. Nakai, T. Vreven, J. A. Montgomery, Jr., ; J. E. Peralta, F. O., M. Bearpark, J. J. Heyd, E. Brothers, ; K. N. Kudin, V. N. S., R. Kobayashi, J. Normand, ; K. Raghavachari, A. R., J. C. Burant, S. S. Iyengar, J. Tomasi, ; M. Cossi, N. R., J. M Millam, M. Klene, J. E. Knox, J. B. Cross, ; V. Bakken, C. A., J. Jaramillo, R. Gomperts, R. E. Stratmann, ; O. Yazyev, A. J. A., R. Cammi, C. Pomelli, J. W. Ochterski, ; R. L. Martin, K. M., V. G. Zakrzewski, G. A. Voth, ; P. Salvador, J. J. D., S. Dapp rich, A. D. Daniels, ; O. Farkas, J. B. F., J. V. Ortiz, J. Cioslowski, ; and D. J. Fox, G., Inc., Wallingford CT, 2009. (43 ) de Hoon, M. J.; Imoto, S.; Nolan, J.; Miyano, S. Bioinformatics 2004 20 1453. (44 ) Saldanha, A. J. Bioinformatics 2004 20 3246 (45 ) Geiger, J. H.; Vaezeslami, S.; Jia, X. F.; Vasileiou, C.; Borhan, B. Acta Crystallogr D 2008 64 1228. (46 ) Kleywegt, G. J.; Bergfors, T.; Senn, H.; Le Motte, P.; Gsell, B.; Shudo, K.; Jones, T. A. Structure (London, England : 1993 ) 1994 2 1241. (47 ) Kuhnel, K.; Ke, N.; Cryle, M. J.; Sligar, S. G.; Schuler, M. A.; Schlichting, I. Biochemistry 2008 47 6552. (48 ) Renaud, J. P.; Rochel, N.; Ruff, M.; Vivat, V.; Chambon, P.; Gronemeyer, H.; Moras, D. Nature 1995 378 681. (49 ) Bourguet, W.; Pogenbe rg, V.; Guichou, J. F.; Vivat Hannah, V.; Kammerer, S.; Perez, E.; Germain, P.; de Lera, A. R.; Gronemeyer, H.; Royer, C. A. J Biol Chem 2005 280 1625. (50 ) Fu, Z.; Li, X.; Merz, K. M., Jr. J Comput Chem 2011 32 2587. (51 ) Hendrickson, W. A. Methods En zymol 1985 115 252. (52 ) Hsiao, Y. W.; Sanchez Garcia, E.; Doerr, M.; Thiel, W. J Phys Chem B 2010 114 15413. (53 ) Jack, A. a. L., M. Acta Crystallogr A 1978 34 931. (54 ) Li, X.; Hayik, S. A.; Merz, K. M., Jr. J Inorg Biochem 2010 104 512.

PAGE 132

132 (55 ) Li, X.; He, X.; Wang, B.; Merz, K., Jr. J Am Chem Soc 2009 131 7742. (56 ) Nilsson, K.; Hersleth, H. P.; Rod, T. H.; Andersson, K. K.; Ryde, U. Biophys J 2004 87 3437. (57 ) Nilsson, K.; Ryde, U. J Inorg Biochem 2004 98 1539. (58 ) Ryde, U.; Olsen, L.; Nil sson, K. Journal of Computational Chemistry 2002 23 1058. (59 ) Yu, N.; Hayik, S. A.; Wang, B.; Liao, N.; Reynolds, C. H.; Merz, K. M. J Chem Theory Comput 2006 2 1057. (60 ) Yu, N.; Li, X.; Cui, G.; Hayik, S. A.; Merz, K. M., 2nd Protein Sci 2006 15 2 773. (61 ) Yu, N.; Yennawar, H. P.; Merz, K. M., Jr. Acta Crystallogr D Biol Crystallogr 2005 61 322. (62 ) D.A. Case, T. A. D., T.E. Cheatham, III, C.L. Simmerling, J. Wang, R.E. Duke, R. Luo, M. Crowley, Ross C. Walker,W. Zhang, K.M. Merz, B.Wang, S. Hay ik, A. Roitberg, G. Seabra, I. Kolossvry, K.F.Wong, F. Paesani, J. Vanicek, X.Wu, S.R. Brozell, T. Steinbrecher, H. Gohlke, L. Yang, C. Tan, J. Mongan, V. Hornak, G. Cui, D.H. Mathews, M.G. Seetin, C. Sagui, V. Babin, and P.A. Kollman University of Calif ornia, San Francisco., 2008. (63 ) Brunger, A. T.; Adams, P. D.; Clore, G. M.; DeLano, W. L.; Gros, P.; Grosse Kunstleve, R. W.; Jiang, J. S.; Kuszewski, J.; Nilges, M.; Pannu, N. S.; Read, R. J.; Rice, L. M.; Simonson, T.; Warren, G. L. Acta Crystallogr D Biol Crystallogr 1998 54 905. (64 ) R Foundation for Statistical Computing, 2012, URL http://www.R project.org. (65 ) Gilardi, R.; Karle, I. L.; Karle, J.; Sperling, W. Nature 1971 232 187. (66 ) Klucik, J.; Xiao, Y. D.; Hammond, P. S.; Harris, R.; Schmit t, J. D. J Med Chem 2004 47 6831. (67 ) Ryde, U.; Greco, C.; De Gioia, L. J Am Chem Soc 2010 132 4512. (68 ) Ryde, U.; Hsiao, Y. W.; Rulisek, L.; Solomon, E. I. J Am Chem Soc 2007 129 726. (69 ) Ryde, U.; Nilsson, K. J Am Chem Soc 2003 125 14232. (70 ) Bollag, W. Cancer Chemother Pharmacol 1981 7 27. (71 ) Charpentier, B.; Bernardon, J. M.; Eustache, J.; Millois, C.; Martin, B.; Michel, S.; Shroot, B. J Med Chem 1995 38 4993.

PAGE 133

133 (72 ) Westin, S.; Kurokawa, R.; Nolte, R. T.; Wisely, G. B.; McInerney, E. M.; Rose, D. W.; Milburn, M. V.; Rosenfeld, M. G.; Glass, C. K. Nature 1998 395 199. (73 ) Harant, H.; Korschineck, I.; Krupitza, G.; Fazeny, B.; Dittrich, C.; Grunt, T. W. Br J Cancer 1993 68 53 0. (74 ) Gottardis, M. M.; Bischoff, E. D.; Shirley, M. A.; Wagoner, M. A.; Lamph, W. W.; Heyman, R. A. Cancer Res 1996 56 5566. (75 ) Babic, T. J Neurol Neurosurg Psychiatry 1999 67 558. (76 ) Bartus, R. T.; Dean, R. L., 3rd; Beer, B.; Lippa, A. S. Scien ce 1982 217 408. (77 ) Cummings, J. L.; Back, C. Am J Geriatr Psychiatry 1998 6 S64. (78 ) Davidsson, P.; Blennow, K.; Andreasen, N.; Eriksson, B.; Minthon, L.; Hesse, C. Neurosci Lett 2001 300 157. (79 ) Liston, D. R.; Nielsen, J. A.; Villalobos, A.; C hapin, D.; Jones, S. B.; Hubbard, S. T.; Shalaby, I. A.; Ramirez, A.; Nason, D.; White, W. F. Eur J Pharmacol 2004 486 9. (80 ) Sugimoto, H.; Ogura, H.; Arai, Y.; Iimura, Y.; Yamanishi, Y. Jpn J Pharmacol 2002 89 7. (81 ) Gauthier, S.; Feldman, H.; Hecke r, J.; Vellas, B.; Ames, D.; Subbiah, P.; Whalen, E.; Emir, B.; Investigators, D. M. S. Int Psychogeriatr 2002 14 389. (82 ) Gauthier, S.; Feldman, H.; Hecker, J.; Vellas, B.; Subbiah, P.; Whalen, E. J Am Geriatr Soc 2000 48 S2. (83 ) Harry, R. D. J.; Za kzanis, K. K. Hum Psychopharm Clin 2005 20 183. (84 ) Holmes, C.; Wilkinson, D.; Dean, C.; Vethanayagam, S.; Olivieri, S.; Langley, A.; Pandita Gunawardena, N. D.; Hogg, F.; Clare, C.; Damms, J. Neurology 2004 63 214. (85 ) Kaufer, D. I.; Catt, K.; Pollo ck, B. G.; Lopez, O. M.; DeKosky, S. T. Neurology 1998 50 A89. (86 ) Mohs, R. C.; Doody, R. S.; Morris, J. C.; Ieni, J. R.; Rogers, S. L.; Perdomo, C. A.; Pratt, R. D.; Grp, S. Neurology 2001 57 481. (87 ) Rogers, S. L.; Doody, R. S.; Mohs, R. C.; Friedh off, L. T.; Grp, D. S. Arch Intern Med 1998 158 1021. (88 ) Trinh, N. H.; Hoblyn, J.; Mohanty, S. U.; Yaffe, K. Jama J Am Med Assoc 2003 289 210.

PAGE 134

134 (89 ) Wilcock, G.; Howe, I.; Coles, H.; Lilienfeld, S.; Truyen, L.; Zhu, Y.; Bullock, R.; Grp, G. G. S. Drug Aging 2003 20 777. (90 ) Winblad, B.; Kilander, L.; Eriksson, S.; Minthon, L.; Batsman, S.; Wetterholm, A. L.; Jansson Blixt, C.; Haglund, A. Lancet 2006 367 1057. (91 ) Sugimoto, H.; Iimura, Y.; Yamanishi, Y.; Yamatsu, K. J Med Chem 1995 38 4821. (92 ) Sugimoto, H.; Tsuchiya, Y.; Sugumi, H.; Higurashi, K.; Karibe, N.; Iimura, Y.; Sasaki, A.; Araki, S.; Yamanishi, Y.; Yamatsu, K. J Med Chem 1992 35 4542. (93 ) Sussman, J. L.; Harel, M.; Frolow, F.; Oefner, C.; Goldman, A.; Toker, L.; Silman, I. Science 1991 253 872. (94 ) Kryger, G.; Silman, I.; Sussman, J. L. J Physiol Paris 1998 92 191. (95 ) Kryger, G.; Silman, I.; Sussman, J. L. Structure 1999 7 297. (96 ) da Silva, C. H.; Campo, V. L.; Carvalho, I.; Taft, C. A. J Mol Graph Model 2006 25 169. ( 97 ) Deb, P. K.; Sharma, A.; Piplani, P.; Akkinepally, R. R. Mol Divers 2012 (98 ) Doucet Personeni, C.; Bentley, P. D.; Fletcher, R. J.; Kinkaid, A.; Kryger, G.; Pirard, B.; Taylor, A.; Taylor, R.; Taylor, J.; Viner, R.; Silman, I.; Sussman, J. L.; Greenbl att, H. M.; Lewis, T. J Med Chem 2001 44 3203. (99 ) Guo, J.; Hurley, M. M.; Wright, J. B.; Lushington, G. H. J Med Chem 2004 47 5492. (100 ) Niu, C.; Xu, Y.; Luo, X.; Duan, W.; Silman, I.; Sussman, J. L.; Zhu, W.; Chen, K.; Shen, J.; Jiang, H. J Phys Ch em B 2005 109 23730. (101 ) Remya, C.; Dileep, K. V.; Tintu, I.; Variyar, E. J.; Sadasivan, C. Med Chem Res 2012 21 2779. (102 ) Fu, Z.; Li, X.; Merz, K. M., Jr. J Chem Theory Comput 2012 8 1436. (103 ) Alexeev, Y.; Kendall, R. A.; Gordon, M. S. Comput Phys Commun 2002 143 69. (104 ) Merz, K. M.; Molnar, L. F.; He, X.; Wang, B. J Chem Phys 2009 131 (105 ) Vagin, A. A.; Steiner, R. A.; Lebedev, A. A.; Potterton, L.; McNicholas, S.; Long, F.; Murshudov, G. N. Acta Crystallogr D 2004 60 2184. (106 ) Wlod ek, S.; Skillman, A. G.; Nicholls, A. Acta Crystallogr D Biol Crystallogr 2006 62 741.

PAGE 135

135 (107 ) Sinnokrot, M. O.; Valeev, E. F.; Sherrill, C. D. J Am Chem Soc 2002 124 10887. (108 ) Felder, C.; Jiang, H. L.; Zhu, W. L.; Chen, K. X.; Silman, I.; Botti, S. A .; Sussman, J. L. J Phys Chem A 2001 105 1326. (109 ) De Clercq, E. Nat Rev Drug Discov 2006 5 1015. (110 ) Russell, R. J.; Haire, L. F.; Stevens, D. J.; Collins, P. J.; Lin, Y. P.; Blackburn, G. M.; Hay, A. J.; Gamblin, S. J.; Skehel, J. J. Nature 2006 443 45. (111 ) Collins, P. J.; Haire, L. F.; Lin, Y. P.; Liu, J.; Russell, R. J.; Walker, P. A.; Skehel, J. J.; Martin, S. R.; Hay, A. J.; Gamblin, S. J. Nature 2008 453 1258. (112 ) Adams, P. D.; Mustyakimov, M.; Afonine, P. V.; Langan, P. Acta Crystall ogr D Biol Crystallogr 2009 65 567. (113 ) Knox, C.; Law, V.; Jewison, T.; Liu, P.; Ly, S.; Frolkis, A.; Pon, A.; Banco, K.; Mak, C.; Neveu, V.; Djoumbou, Y.; Eisner, R.; Guo, A. C.; Wishart, D. S. Nucleic Acids Res 2011 39 D1035. (114 ) Gaulton, A.; Bel lis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al Lazikani, B.; Overington, J. P. Nucleic Acids Res 2012 40 D1100. (115 ) Tibshirani, R.; Hastie, T.; Narasimhan, B.; Chu, G. Proc Natl Acad Sci U S A 2002 99 6567. (116 ) Ambroise, C.; McLachlan, G. J. Proc Natl Acad Sci U S A 2002 99 6562.

PAGE 136

136 BIOGRAPHICAL SKETCH Zheng Fu was born in Beijing, China in 1982. He obtained a Bachelor of Science degree in chemistry from Peking University, Beijing, Ch ina. In August 2007, he joined the group of Professor Kenneth M. Merz, Jr. for his doctoral studies in computational chemistry/biology at the University of Florida. On 2012 he obtained William and Arlene Ruegamer Award for his excellent performance in bioc hemistry research. He received his Ph.D. from the University of Florida in the fall of 2012.