
Computational Studies of the Structure and Function of Metalloenzymes and the Performance of Density Functional Methods


PAGE 1

COMPUTATIONAL STUDIES OF THE STRUCTURE AND FUNCTION OF METALLOENZYMES AND THE PERFORMANCE OF DENSITY FUNCTIONAL METHODS

By

BRYAN T. OP'T HOLT

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2006

PAGE 2

For Jenny Kay

PAGE 3

iii

ACKNOWLEDGMENTS

For the motivation to complete this dissertation and to focus on my research even when I thought it would be too hard, Jenny should get all the credit. For the ideas and suggestions that kindled my interest in computational and biological chemistry when it started to wane, I recognize Kennie. The desire to complete my educational career with a Ph.D. and the knowledge of the value of a strong education were instilled in me by my parents, and I owe them my best. As for life in the Merz group, I will always be thankful for the contributions of Ken Ayers, Ed Brothers, Guanglei Cui, and Kevin Riley to my research.

PAGE 4

iv

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ..... iii
LIST OF TABLES ..... vi
LIST OF FIGURES ..... ix

CHAPTER

1 INTRODUCTION ..... 1

2 QM AND MM METHODS OF THE STUDY OF METALLOPROTEINS ..... 6
  Metal Binding Site Studies in Gaussian 03 ..... 6
  Molecular Mechanics of HAH1 ..... 10
    The AMBER Force Field ..... 10
    LeAP and sander ..... 12
  Free Energy Simulations of HAH1 ..... 17
    Thermodynamic Integration ..... 20
    Free Energy Perturbation ..... 23
    Potential of Mean Force ..... 24

3 COMPUTATIONAL STUDIES OF THE CU(I) METALLOCHAPERONE HAH1 ..... 26
  Cu(I) Biochemistry and the HAH1 System ..... 26
    Cu Metallochaperones ..... 26
    Cu pathways within the Cell ..... 29
    Copper Homeostasis ..... 31
  Quantum Chemical Characterization of the Cu(I) Binding Site from HAH1 ..... 33
  Creation of a MM Force Field for the Cu(I) Binding Site of HAH1 ..... 36
    The use of the HAH1 dimer as a model for the HAH1-MNK4 heterodimer ..... 36
    Creation of parameter files for Cu(I)-bound HAH1 ..... 37
  Free Energy Calculations of the Cu(I)-Bound HAH1 Dimer ..... 47
    Quantum Energy Calculations on the Small Cu(I) Thiolate Cluster Models ..... 48
    Free Energy Calculations on HAH1 ..... 50
  Conclusions ..... 64

PAGE 5

v

4 ELECTRONIC STRUCTURE OF THE ACTIVE SITE OF AMINOPEPTIDASE FROM Aeromonas proteolytica ..... 66
  AAP Introduction ..... 66
  Effects of 1st-Shell Mutations ..... 72
  Conclusions ..... 81

5 SURVEY OF DENSITY FUNCTIONAL THEORY METHODS ..... 83
  Introduction ..... 83
  Methods ..... 84
  Computational Methods ..... 99
  Results by Property ..... 117
    Bond Lengths ..... 117
    Bond Angles ..... 120
    Ground State Vibrational Frequencies ..... 125
    Ionization Potential ..... 129
    Electron Affinity ..... 134
    Heat of Formation ..... 139
    Hydrogen Bonding Interaction Energy ..... 146
    Conformational Energies ..... 150
    Reaction Barrier Heights for Small Systems with Non-singlet Transition States ..... 158
    Reaction Barrier Heights for Organic Molecules with Singlet Transition States ..... 159
  Conclusions ..... 164
  General Summary of the Survey of DFT Methods ..... 170

6 CONCLUSIONS ..... 178

APPENDIX ..... 180
LIST OF REFERENCES ..... 182
BIOGRAPHICAL SKETCH ..... 193

PAGE 6

vi

LIST OF TABLES

Table ..... page

2-1 The DZpdf basis function used for Cu(I) in the DFT calculations of the cluster models ..... 9
2-2 Windows and weighting factors for a 12-window TI calculation in sander ..... 22
2-3 λ values for a 12-window TI calculation in sander ..... 22
3-1 Geometry parameters of the DFT-optimized multi-coordinate models shown in Figure 3-2 ..... 36
3-2 Atom type, atomic mass, van der Waals radii, and van der Waals well-depths for Cu(I) and Cu(I)-bound S in HAH1 ..... 41
3-3 Bond lengths, bond angles, and associated force constants for the HAH1 Cu(I) binding site ..... 41
3-4 CYM and Cu(I) RESP charges used for the HAH1 Cu(I) binding site ..... 41
3-5 Comparison of the HAH1 active site between the four-coordinate model, the solvated, equilibrated protein, and the X-ray crystal structure of the Cu(I)-bound protein (1FEE) ..... 43
3-6 Summary of rms deviation, rms flexibility, and radius of gyration for the five solvated HAH1 protein models and key active site residues ..... 43
3-7 Reaction energies and energies of solvation for the model systems ..... 50
3-8 Atoms, atom types, atomic masses, and van der Waals parameters used in TI simulations of Cu(I)-bound HAH1 ..... 56
3-9 Bond length parameters for the reactions used in TI calculations of Cu(I)-bound HAH1 ..... 57
3-10 Bond angle parameters for the reactions used in TI calculations of Cu(I)-bound HAH1 ..... 58
3-11 RESP charges used for TI calculations on Cu(I)-bound HAH1 ..... 59
3-12 Free energy changes for TI calculations on the reactions shown in Figure 3-9 ..... 63

PAGE 7

vii

3-13 The free energy difference of changing the coordination environment of Cu(I) in HAH1 ..... 63
4-1 Electrons in the side chains of Asp117 and Asp179 are equally delocalized over the carboxylic acid region, while Glu151 and Glu152 side chains contain one CO bond with more electron density than the other ..... 76
4-2 Several distances are shown for Zn-Zn and Zn-O interactions for the structure shown below in Figures 4-6 and 4-7 ..... 77
5-1 The thirty-seven density functional methods and two wave function methods tested in this survey with appropriate references ..... 90
5-2 Basis sets employed in the survey of DFT methods ..... 97
5-3 Valence shell polarization functions incorporated into the correlation-consistent basis sets of Dunning ..... 98
5-4 The bond lengths and bond angles test set ..... 102
5-5 The test set for ground state vibrational frequency ..... 104
5-6 The ionization potential test set ..... 106
5-7 The electron affinity test set ..... 107
5-8 The heat of formation test set, singlets only ..... 108
5-9 The heat of formation test set, radicals only ..... 111
5-10 The hydrogen bonding test set ..... 113
5-11 The conformational energy test set ..... 114
5-12 The small system radical transition state reaction barrier test set ..... 115
5-13 The organic molecule singlet transition state reaction barrier test set ..... 116
5-14 Bond lengths, bond angles, and vibrational frequencies for HF, MP2, and LSDA methods ..... 119
5-15 Average unsigned ionization potential errors for HF, MP2, LSDA, and B3P86 methods ..... 134
5-16 Average unsigned electron affinity errors for HF, MP2, LSDA, and B3P86 methods ..... 139
5-17 Average unsigned HOF errors for the HF, MP2, and LSDA methods ..... 143

PAGE 8

viii

5-18 Average unsigned hydrogen bond interaction errors for HF, MP2, and LSDA methods ..... 150
5-19 Average unsigned conformational energy errors for HF, MP2, and LSDA methods ..... 155
5-20 Average unsigned errors for the non-singlet transition state reaction test set for HF, MP2, and LSDA methods ..... 159
5-21 Average unsigned errors for the singlet transition state large reaction test set for HF, MP2, and LSDA methods ..... 164
5-22 Rankings of functional/basis set combinations for properties considered in this work ..... 171
5-23 Average functional rankings and standard deviations for the top fifteen functionals along with 6-31+G* and aug-cc-pVDZ basis sets ..... 175
5-24 Performances of the fifteen highest-ranking functionals paired with the 6-31G* basis ..... 176
5-25 Performances of the fifteen highest-ranked density functional methods paired with the aug-cc-pVDZ basis set ..... 177

PAGE 9

ix

LIST OF FIGURES

Figure ..... page

3-1 The proposed mechanism for Cu(I) transfer from HAH1 to the fourth domain of the Wilson's disease protein. Cym indicates a negatively charged Cys residue ..... 34
3-2 Gas-phase optimized structures of the multi-coordinate models of the HAH1 Cu(I) binding site. Cu(I) is in green, S is yellow, and C is gray. The top figure shows the two-coordinate model, followed by the three-coordinate model and the four-coordinate model at the bottom ..... 35
3-3 The 1.80 Å crystal structure for Cu(I)-bound HAH1. PDB ID 1FEE.¹ ..... 40
3-4 Root-mean-squared deviations between the five solvated HAH1 models and the Cu(I)-bound HAH1 crystal structure as a function of time ..... 45
3-5 rmsd between the active site loop regions of the five solvated HAH1 models and the Cu(I)-bound HAH1 crystal structure as a function of time ..... 46
3-6 The isodesmic reaction and solvation of three-coordinate Cu(I) to four-coordinate in the model system ..... 49
3-7 The relative energies of the species in the isodesmic reactions of the model systems in the gas phase (top) and in implicit solvent ..... 50
3-8 A scheme for Cu(I) transfer ..... 52
3-9 Two separate TI simulations were performed to compare two different Cu(I) transfer pathways from HAH1 to MNK4. The two reactions shown have different starting points, but the same endpoint. The top reaction is referred to as Reaction 1 and the bottom is Reaction 2 in the discussion that follows ..... 53
3-10 FEP vdW correction to TI on HAH1. Evaluate trajectory of structure with vdW exclusions removed with the Hamiltonian with vdW exclusions intact ..... 60
3-11 Bond length correction to FEP calculations by PMF: contract Cu(I)-S bond length from ~2.8 Å to ~2.0 Å by PMF analysis of twenty-three windows ..... 61
3-12 PMF curve of solvated HAH1 showing minimum energy Cu(I)-S12B bond length near 2.1 Å for the bonding of Cys 12B to Cu(I) ..... 62

PAGE 10

x

3-13 PMF curve for the binding of Cys 15B to Cu(I) in solvated HAH1, showing a minimum energy bond length of just over 2.1 Å for the Cu(I)-S15B bond ..... 62
3-14 The free energy difference by thermodynamic integration between the different three-coordinate Cu(I)-bound HAH1 dimers ..... 63
4-1 AAP active site inhibited by Tris (left) and BuBA. Investigation of the X-ray structures of these complexes shed light on substrate conformation and a potential mechanism for peptide hydrolysis in AAP. PDB ID 1LOK, 1CP6 ..... 67
4-2 In fluoride inhibition studies of AAP, it was shown that a F⁻ ion displaces a terminal hydroxide group, deactivating the enzyme ..... 68
4-3 A proposed mechanism for AAP peptide hydrolysis showing proton transfer to Glu151, formation of a terminal hydroxyl group, a gem-diolate intermediate, donation of a proton back to the leaving amino group, and reformation of the water/hydroxide bridge. Adapted from Petsko.⁵⁹ ..... 70
4-4 The general model for the QM work is the AAP active site from PDB structure 1AMP, the 1.8 Å resolution structure elucidated by Chevrier and Schalk.⁶⁷ Asp 117 is below the two Zn ions, with Zn2 on the left and Zn1 on the right. The residue at the top of the active site is Glu 151. Zn2 is bound to His 97 and Asp 179, and Zn1 is complexed with His 256 and Glu 152 ..... 74
4-5 B3LYP/6-31G* optimized geometries of two models of the AAP active site. Asp 117 is shown in the upper-right of both pictures, binding each Zn. Zn1 is the ion on the left and Zn2 on the right in each structure. The structure on the left is from an originally water-bridged structure and Glu151 has gained a H, while the structure on the right started with an OH⁻ bridge and Asp 179 gains a H from a crystallographic water ..... 78
4-6 Starting structures (left) and B3LYP/6-31G* optimized geometries for different Zn-Zn bridging species within the active site of AAP. a) a water bridge b) a hydroxyl bridge c) an oxyl bridge ..... 79
4-7 Initial structures (left) and B3LYP/6-31G* optimized geometries for models with Glu151 protonated. The starting structures vary only in the protonation state of the bridging group. Structure a) contains an O²⁻ bridge and the Zn ions in c) are bridged by a hydroxide group ..... 80
5-1 Average unsigned bond length errors (in Å) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with the Pople type basis sets ..... 121
5-2 Average unsigned bond length errors (in Å) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with the Dunning type basis sets ..... 122
5-3 Average unsigned bond angle errors (in degrees) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with the Pople type basis sets ..... 126

PAGE 11

xi

5-4 Average unsigned bond angle errors (in degrees) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with the Dunning type basis sets ..... 127
5-5 Average unsigned vibrational frequency errors (in cm⁻¹) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with the Pople type basis sets ..... 130
5-6 Average unsigned vibrational frequency errors (in cm⁻¹) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with the Dunning type basis sets ..... 131
5-7 Average unsigned ionization potential errors for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals with Pople type basis sets ..... 135
5-8 Average unsigned ionization potential errors for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals with Dunning type basis sets ..... 136
5-9 Average unsigned electron affinity errors (kcal/mol) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals ..... 140
5-10 Average unsigned heat of formation errors (kcal/mol) for the five Pople-style basis sets employed in this study ..... 144
5-11 Average unsigned heat of formation errors (kcal/mol) for the Dunning-type basis functions used in this work ..... 145
5-12 Average unsigned hydrogen bond interaction energy errors (kcal/mol) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Pople type basis sets ..... 151
5-13 Average unsigned hydrogen bond interaction energy errors (kcal/mol) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Dunning type basis sets ..... 152
5-14 Average unsigned conformational energy errors (kcal/mol) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Pople-type basis sets ..... 156
5-15 Average unsigned conformational energy errors (kcal/mol) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Dunning-type basis sets ..... 157
5-16 Average unsigned barrier height energy errors (kcal/mol) for SRBH reactions along with the GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Pople type basis sets ..... 160

PAGE 12

xii

5-17 Average unsigned barrier height energy errors (kcal/mol) for SRBH reactions along with the GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Dunning-type basis sets ..... 161
5-18 Average unsigned barrier height energy errors (kcal/mol) for large singlet transition state reactions along with the GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Pople-type basis sets ..... 165
5-19 Average unsigned barrier height energy errors (kcal/mol) for large singlet transition state reactions along with the GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Dunning-type basis sets ..... 166

PAGE 13

xiii

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

COMPUTATIONAL STUDIES OF THE STRUCTURE AND FUNCTION OF METALLOENZYMES AND THE PERFORMANCE OF DENSITY FUNCTIONAL METHODS

By

Bryan T. Op't Holt

December 2006

Chair: Kenneth M. Merz, Jr.
Major Department: Chemistry

Transition metals work within metalloproteins to catalyze an innumerable array of important reactions in the body. Experimental data on the intermolecular and intramolecular phenomena that govern these biochemical processes are often difficult to obtain with traditional methods of analytical and biological chemistry. However, computational methods have been developed to simulate macromolecular systems in order to understand the dynamics and energetics of protein structure and function.

The first half of this dissertation details the application of quantum mechanical and molecular mechanical methods to the metalloproteins HAH1 and aminopeptidase. The active site of the Cu(I)-binding protein HAH1 is initially characterized using QM methods to determine the geometrical and electrostatic parameters of the system. This information is subsequently used to create a molecular mechanics force field for the active site within the native holo-protein. A series of molecular dynamics simulations and free energy calculations are performed on the protein following this parameterization.

PAGE 14

xiv

The energetics of Cu(I) transfer are discussed and a favorable mechanism for metal ion transport is proposed. The aminopeptidase system is investigated in a different manner, as the effects of first-shell mutations are studied with respect to the electronic structure and coordination of the di-Zn(II) active site. Ab initio methods are employed to describe a series of native and mutant active sites.

The second part of this work describes a large-scale survey of the performance of density functional methods and basis sets at predicting a series of molecular properties and intermolecular interactions. As the computational resources available to the scientific community increase in speed while becoming more affordable, the study of large-scale systems with DFT and ab initio methods will be more plausible. The intention of this study is to individually examine the accuracy of density functional methods, not only to identify which methods work the best for certain properties, but also to note where there may be room for improvement as more methods are developed.

PAGE 15

1

CHAPTER 1
INTRODUCTION

As the scientific community continues to discover more about the roles of metalloenzymes in biological processes, it becomes necessary to investigate the structure and function of these proteins on numerous levels. Experimental methods have been used to study the structure and function of proteins for decades, and now computational methods are available that can be used to further investigate protein systems. The combination of experimental and computational research on biochemical systems enhances the ability to identify potential drug targets, determine catalytic mechanisms of protein function, and describe the energetics of chemistry occurring in the body on the macromolecular and atomic levels.

For example, X-ray crystallography and NMR spectroscopy visualize the structures of proteins. Spectroscopic methods such as EXAFS, XAS, and EPR can be used to elucidate conformational and electronic details of the metal binding sites of metalloenzymes in certain cases, and kinetics and binding affinity studies are routinely used to suggest mechanisms of protein function. However, each of these methods has limitations as the scale of the research approaches the atomic and sub-atomic levels.

Except for the most recent high-resolution structures, X-ray crystallography is not able to fully resolve the protonation states of residues, which is a key factor in protein function. The practical difficulty of the method is compounded by the delicate task of crystallizing a sample (only to have it destroyed by the experiment). Dynamically, X-ray structures reveal the protein's structure at low temperature and in a fixed crystalline

PAGE 16

2

state.¹ On the other hand, NMR methods can sample many conformations of proteins in solution and at room temperature and provide high structural resolution.² Wet chemistry techniques such as assays, titrations, and electrophoresis supply valuable data on reaction rates, equilibrium constants, and binding affinity, but are time consuming. Moreover, the number of distinct systems that can be evaluated consecutively is limited by lab space and lab equipment. A trait that is missing from all of these experimental methods is the ability to predict or measure the energetics of the individual bonds and interactions that play key roles in protein structure and function.

The past twenty years have seen extensive advancements in computational methods and computational ability. From the semi-empirical methods of Dewar et al.³ to the newest density functional methods of the Scuseria and Truhlar groups,⁴,⁵ systems are being better defined both energetically and geometrically. In addition, more complex systems are being investigated today than ever before thanks to massively parallel computer clusters that feature extensive available memory and the ability to perform trillions of operations per second. Furthermore, development of software packages such as AMBER 9.0⁶ and Gaussian 03⁷ allows researchers to carry out a variety of specific calculations to meet their needs.

While the ability to computationally predict or determine protein structure or sequence has not been developed, many of the systematic hurdles of the experimental methods mentioned above can be addressed by some form of presently available computational method. Reliable, accurate data on proteins and parts of proteins can be gathered using robust and well-validated computational techniques. Bonding and nonbonding interaction energies can be quantified, electrostatic potentials can be

PAGE 17

3

calculated, strained systems can be energetically minimized and equilibrated, and the motion of large systems can be followed over time in dynamics simulations. Moreover, computational methods can be amended or parameterized for accuracy, and new methods can be developed that scale less dramatically or are not heavily parameter-dependent.

This dissertation highlights the application of computational methods to a range of chemical systems from small molecules to solvated proteins. Together with experimental data, the application of computational techniques provides a complete picture of the world of biochemistry. The following chapters detail the investigation of metal ion transfer between proteins, the electronic structure of a di-metal active site, and the assessment of the performance of commonly used density functional methods.

Chapter 2 outlines the techniques used in the computational study of metalloproteins. Starting with the use of QM methods to describe the metal binding sites of the Cu(I) binding protein HAH1 and the di-Zn(II) site of the aminopeptidase from A. proteolytica (AAP), the process of defining the geometries and energetics of metal binding sites is explained. The QM data for the HAH1 system were used to construct a MM force field for the Cu(I) binding site. The force field was employed in a long timescale MD simulation to investigate the dynamics of Cu(I)-bound HAH1. Furthermore, the free energy of Cu(I) transfer from HAH1 to the fourth domain of the Menkes Disease Protein, MNK4, is investigated using other MM methods including thermodynamic integration, free energy perturbation, and potential of mean force calculations.

Chapter 3 discusses Cu(I) homeostasis, Cu(I) transport, and the HAH1 system. QM calculations using Gaussian 03⁷ were conducted on a model of the HAH1 active site

PAGE 18

4

based on the crystal structure of the HAH1 dimer, and MM calculations were performed on the entire HAH1 dimer in gas-phase and explicit solvent with AMBER.⁶ The ultimate goal of the HAH1 study was to elucidate details of Cu(I) transfer from HAH1 to the Menkes disease protein MNK. This study involved the QM description of the HAH1 active site, the construction of an MM force field to describe the active site classically, and numerous MD-based calculations. Finally, the results of the study are discussed in terms of protein dynamics and the mechanistic implications of the research.

Chapter 4 summarizes the application of QM methods to the AAP system. Several questions regarding the active site were addressed by modeling variations of the active site of the protein. The first goal was to identify the molecular bridge between the two Zn(II) ions in the active site. The coordination state of each ion was investigated, and the contribution of each ligating residue was studied.

The fifth chapter details the large-scale survey of the performance of commonly used density functional methods and basis functions. The accuracy of thirty-seven density functionals and two wave function methods is evaluated for nine molecular and intermolecular properties. These properties include bond lengths, bond angles, ground state vibrational frequencies, ionization potentials, electron affinities, heats of formation, hydrogen bonding interaction energies, conformational energies, and reaction barrier heights. Finally, a list of the best-performing methods is given.

The final chapter gives a brief summary of the work presented in this dissertation. More complete discussions and conclusions are given within each chapter. The computational investigation of biological systems is necessary to completely understand the mechanisms by which macromolecules function within the body. The metalloprotein


studies presented here incorporate the use of a variety of computational tools to address questions that would be very difficult to answer using other types of experiments. The DFT survey is a prime example of the usefulness of powerful computer clusters, and provides a detailed comparison of commonly used density functional methods. This research is made possible by the continuous development of computational methods and computer power and serves as a testament to the flexibility and far-reaching applicability of QM and MM calculations.


CHAPTER 2
QM AND MM METHODS OF THE STUDY OF METALLOPROTEINS

The behavior of metalloproteins can only be fully described using a combination of several computational techniques. For instance, MM methods do not take electrons into account, while high-level QM calculations are too computationally expensive to be tractable for systems larger than about 120 atoms. The electronic structures, atomic charges, and accurate geometries of metal binding sites should be investigated with quantum mechanical methods whenever possible. Data from these studies form a foundation for the molecular mechanics or semi-empirical methods applicable to large-scale systems. This section discusses the various computational techniques utilized in the studies of HAH1 and AAP presented later in this thesis. A brief discourse on the background and theory of QM methods is given in chapter 5 as an introduction to the DFT study.

Metal Binding Site Studies in Gaussian 03

Creating models of metal binding sites in proteins for the purpose of QM calculations is a common practice in studies that aim to describe the metal binding environment [8-11]. For the study of the Cu(I) binding site of the Cu transport protein HAH1, a model active site was created from the X-ray crystal structure of the Cu(I) binding site in the HAH1 dimer [1] using thiol (HSCH3) or thiolate (SCH3-) ligands instead of the native Cys residues to complex the central ion. Using DFT in Gaussian 03, the thermodynamics, bonding parameters, and electrostatics of the model systems were determined.


The QM calculations of the model clusters should be reliable so that the MM parameters derived from the QM data are also reliable. One approach to ensuring a stable QM structure is to use several initial geometries of the metal cluster so as to avoid getting trapped in a local minimum energy well [9]. Another way in which the accuracy and reliability of calculations of Cu(I) is increased is the development of Cu(I)-specific basis sets. Olsson and Ryde [9,12] modified a double-zeta basis set of Schäfer [13] by the inclusion of diffuse p, d, and f orbitals (DZpdf). Table 2-1 lists the Ryde basis set used for the production QM calculations of the model Cu(I) clusters. The modification from the original Schäfer basis is the enhancement of the p, d, and f orbitals with exponents 0.174, 0.132, and 0.39, respectively. A modified 6-31G* basis set has been created by Pulay et al. that can also be used for Cu and other first-row transition metals [14,15]. The modified 6-31G* basis set was used to generate an initial structure for the use of the Ryde basis set, optimizing the Cu(I) cluster in a stepwise fashion.

The initial structures for the model clusters were created in WebLabViewer Pro [16] from the Cu(I)-bound HAH1 dimer crystal structure 1FEE [1] in Cartesian coordinates and optimized in Gaussian using the default grid at B3LYP/m6-31G*, where m6-31G* is the modified basis set of Pulay et al. [14] Since the HAH1 model clusters are all closed-shell singlet states, RB3LYP could have been specified in the Gaussian input file to request a restricted calculation. While this method is less computationally expensive, restricted and unrestricted calculations will produce the same results for closed-shell singlet species. Further optimization was performed using the Ryde DZpdf basis set for Cu(I) and the split-valence 6-311++G* basis for S, C, and H. Upon final optimization at this level, each of the clusters was subjected to single point calculations for frequencies and thermodynamic


properties. Bonding force constants between Cu(I) and S were also obtained. Gaussian outputs a user-readable table of bonds and associated constants based on internal coordinates.

Electrostatic potential calculations were also performed on the optimized structures. The Merz-Kollman-Singh (MKS) charge method [17,18] was used, and the ionic radius for Cu(I) was set to 0.91 Å. The MKS method spreads the partial atomic charge (the calculated electron moment) of each atom onto the surface of a sphere around the atom defined by its ionic radius. Gaussian ESPs were converted to AMBER charges using the espgen and respgen programs in the AMBER suite. These three calculations account for all of the bonding and electronic properties of the model that are needed to create a MM force field for the native metal binding site of Cu(I)-bound HAH1.

The AAP system was also subjected to geometry optimizations and single point frequency calculations following the same procedures as above. Once again, the active site from a high-resolution crystal structure of the apo form of AAP was used to create an initial structure model for the AAP active site. Residues were abridged and capped at their termini except in cases of adjacent residues, for which the peptide bonds were retained for the calculations. Since the creation of a new force field was not the goal for the AAP system, and the primary interests were the electronic structure of the active site and the parameters of Zn(II) ion coordination within the active site, the B3LYP/6-31G* method was sufficient for the calculations. Also, Zn had already been sufficiently described by the normal 6-31G* basis set, so no modified or larger basis sets were needed for the metal ions [15]. Population analysis for atomic charges was not performed on


the AAP active site. The model for the AAP active site needed to be more robust than that of the HAH1 active site since the goal of the study was to elucidate the finer details of the electronic structure of the complete di-Zn(II) binding site to the first coordination shell. Therefore, the model AAP active site comprised about seventy atoms for the first-shell coordination sphere, compared to 21-23 atoms for the HAH1 model cluster.

Table 2-1. The DZpdf basis function used for Cu(I) in the DFT calculations of the cluster models.

Orbital      Exponent        Coefficient
S 6 1.00     441087.2507     -2.18E-04
             66112.02119     -1.69E-03
             15047.01143     -8.81E-03
             4263.427308     -3.60E-02
             1396.38158      -0.117429705
             511.9605579     -0.288442674
S 2 1.00     203.4542695     -0.426788989
             82.79233703     -0.330441285
S 1 1.00     20.85428563     1
S 1 1.00     9.041067958     1
S 1 1.00     2.751813517     1
S 1 1.00     1.043485652     1
S 1 1.00     0.111722924     1
S 1 1.00     4.10E-02        1
P 3 1.00     2530.096567     1.91E-03
             600.0979295     1.58E-02
             194.0820448     7.63E-02
P 3 1.00     73.67182138     0.238814523
             30.44736969     0.449800159
             13.12271488     0.393376824
P 1 1.00     5.521483997     1
P 1 1.00     2.145792213     1
P 1 1.00     0.767974887     1
P* 1 1.00    0.174           1
D 3 1.00     47.3137437      3.24E-02
             13.15468845     0.168227065
             4.366288575     0.384944296
D 1 1.00     1.412206594     1
D 1 1.00     0.38840713      1
D* 1 1.00    0.132           1
F* 1 1.00    0.39            1

* Represents enhanced orbital exponents. Courtesy of Olsson and Ryde [9].


Molecular Mechanics of HAH1

The completion of the set of QM calculations on the HAH1 model site led to the next phase of the experiment, the description of the MM force field for the Cu(I) binding site. AMBER [6] is a suite of programs that perform a variety of calculations and analyses based on user input and a collection of libraries. The research discussed here utilized AMBER versions 6-9. Despite the many versions used, the force fields and libraries used in the course of this study remained the same. For the purposes of the calculations discussed here, the 1994 force field of Cornell et al. [19] was used in conjunction with the 1994 libraries and the parameters constructed for Cu(I) and Cu(I)-bound ligands after the QM work described earlier. The 1994 force field is a large database of parameters detailing bond lengths, bond angles, bond torsions, nonbonding parameters, and electrostatics for dozens of specific atom types occurring in proteins, nucleic acids, and organic compounds. The parameters within the force field libraries are derived from QM calculations of organic molecules and amino acids. Each of these parameters contributes to the overall energy of the system being investigated.

The AMBER Force Field

The AMBER energy E_AMBER is defined as the enthalpy difference between the folded and unfolded states of a protein and can be calculated as the sum of the energy contributions of the parameters listed above. This expression is known as the AMBER force field (equation 2-1). The first three terms of equation 2-1 are bonding interactions. Bond stretching and bond angle bending contributions to the energy are calculated using force constants applied to each term. Equilibrium values and force constants are user-defined, and the nature of the energy functions sets the equilibrium value at the bottom of a potential well. The third term involves the bond torsion contribution to the total energy


represented by a truncated Fourier series in which the measured torsion angle φ and the phase γ are input and V_n is the rotation barrier height for the torsion [20,21]. The last two terms represent the nonbonding interactions between atoms i and j. The fourth term is the 6-12 Lennard-Jones potential, accounting for the attraction and repulsion between two nonbonded atoms known as van der Waals interactions. The 1/R^12 term accounts for repulsion, while the attraction is calculated in the 1/R^6 term. In this term, A and B are coefficients determined by the lowest-energy distance between the two interacting atoms and the depth of the energy well, and the term can be modified by the inclusion of an artificial energy well in the potential curve. The final term includes the Coulombic interaction between charged nonbonded particles. The magnitude of this interaction depends on the charges of the two interacting particles and the distance between them, embedded in a field with a dielectric constant, ε. In AMBER, a nonbonding cutoff can be defined to set a maximum distance at which these interactions can be felt between two particles [20].

$$E_{AMBER} = \sum_{bonds} K_r (r - r_{eq})^2 + \sum_{angles} K_\theta (\theta - \theta_{eq})^2 + \sum_{dihedrals} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right] + \sum_{i<j}^{atoms} \left[\frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}}\right] + \sum_{i<j}^{atoms} \frac{q_i q_j}{\varepsilon R_{ij}} \qquad (2\text{-}1)$$

The difference between MM and QM methods is the inclusion of nuclear and electronic interactions in QM methods. Except for the electrostatic interactions in the last term of equation 2-1, no quantum effects are considered in the MM force field. Even


then, electrostatic potentials are user-defined or drawn from a library in AMBER, not explicitly calculated as with QM methods. In essence, the molecular dynamics package in AMBER, called sander, relies solely on the AMBER potential to calculate all interactions between the particles in a system in a stepwise fashion. Several files must be prepared in order for a successful simulation or minimization to be performed in sander. The necessary files are created in the LeAP program within AMBER.

LeAP and sander

LeAP is initialized by loading a force field and parameter library suitable for the system being investigated. The Cornell 1994 force field is used throughout this work, which includes the parm94.dat force field and the all_amino94.in amino acid database. These two files form the basis for the atoms and residues that LeAP recognizes. Next, the initial structure file for the system under study is loaded into the interface. A check of the structure will output any faults within the system such as non-integral charge or unknown atom types. In the case of the HAH1 MD and free energy simulations, Cu, Cu-bound S, and any dummy atoms had to be described in separate force field modification files (.frcmod) before LeAP could recognize them as atom types.

At this point, any new parameter files can be loaded into the interface and applied to the structure. Parameter files must be correctly formatted to be AMBER-compatible and may contain user-defined atom names, atom types, and many other atomic parameters. Furthermore, bond connections can be defined or broken, and atomic partial charges can be specified for certain atoms if they are unknown or if they differ from the values in the AMBER library so that the total charge of the system is an integer. Incidentally, AMBER charges differ from any charges that may have been generated


outside of AMBER using a QM method like Merz-Kollman-Singh. Separate programs in AMBER, called espgen and respgen, are used to read the output from a QM charge calculation and convert the QM charges into AMBER charges. A successful check of the system within LeAP should reveal no further unknown atom types or electrostatic errors.

sander uses two files to describe the system. These are the coordinate file (.crd) and the topology file (.top). Once all of the parameters have been specified for the system in LeAP and there are no further errors, the .top and .crd files can be created. At this stage, LeAP will report any parameter errors such as missing bond, angle, or torsion force constants or undefined atom types within the system that need to be added or amended before the two new files are created.

sander reads one more file, the input file for the program. The first simulation performed on HAH1 was a MM minimization. Minimization routines vary, but the default in AMBER is the steepest descent (SD) method. This method calculates the gradient of the energy with respect to the atomic coordinates and takes the next minimization step along a linear path so that the energy of the system is decreased most rapidly. After some number of steps, sander may switch to the conjugate gradient (CG) method of energy minimization to aid in convergence. CG differs from SD in that a more direct route to a minimum energy state is taken. CG is particularly well-suited for finding the minimum of a shallow potential surface or an elongated energy well. Many conditions of the minimization may be defined, including the minimization method and the number of steps to be taken.

Although it is unlikely that the global minimum of a large system of many thousands of atoms will be found by the methods used by sander, a sufficiently


minimized structure will be generated after a large number of minimization steps have been performed. Traits of a sufficiently minimized structure are low vdW, bond, angle, torsional, and electrostatic energies compared to the initial structure. The user can monitor the progress of a minimization run by tracking the decrease of energy components and by visually inspecting the system at any point during the process.

Once the gas-phase structure has been minimized, it can be equilibrated in the gas phase with MD or it can be solvated. The creation of an ordered water box is performed in LeAP. In these simulations, TIP3P water boxes were added to the gas-phase minimized and equilibrated proteins. The TIP3P box is a snapshot of an equilibrated water box [20], and the use of actual water molecules around the protein instead of an external dielectric field is called explicit solvation. The protein fits within the box of user-defined dimensions so that there is plenty of water surrounding the protein. In this research an 8.0 or 10.0 Å water box was used to solvate the protein. The solvated HAH1 protein system comprises more than 19,000 atoms compared to about 2,050 atoms in the gas-phase system.

To protect waters on the edge of the simulated box from being exposed to the vacuum that exists outside of the box, periodic boundary conditions (PBC) are applied to the system. PBC overcomes the problem of the outer solvent molecules being directly exposed to vacuum by replicating the real box containing the solvent and solute and placing imaginary copies on all sides of the real box. That way, if a water molecule or part of the solute leaves the edge of the real box, a water molecule entering the other side of the box from an imaginary neighbor replaces it. Another feature implemented in MD simulations of solvated systems is Particle Mesh Ewald (PME), in which short-range and


long-range interactions are separated and calculated in different ways in order to decrease computational expense. Without PME, the generation of the list of nonbonding interactions like vdW and electrostatics for large systems would take a very long time.

The solute must equilibrate with the newly applied solvent. Restraints are imposed onto the protein within the box so that it does not change conformation as the water equilibrates around it. The temperature of the system is slowly raised to 300 K over thousands of equilibration steps, and the restraints on the protein within the solvent box are also gradually relaxed. Once the system reaches 300 K, the restraints are removed and the whole system is allowed to equilibrate. Once the solvated system is sufficiently equilibrated, it is subjected to long-term MD simulation.

The prediction of molecular motion through time is derived from subjecting a system to Newtonian laws of motion in conjunction with a potential function, in this case the AMBER force field. The force felt by particles in the system is the negative derivative of the potential with respect to position.

$$\mathbf{F}_i = -\frac{\partial V}{\partial \mathbf{r}_i} \qquad (2\text{-}2)$$

Moving particles through time is done in iterations of calculating the incident force on each particle and then moving the particles in reaction to that force. With the initial positions of the particles known, the force is evaluated and the particles are allowed to move. The positions of the particles after a small time step Δt can be calculated using the Verlet equation.

$$\mathbf{r}(t + \Delta t) = 2\mathbf{r}(t) - \mathbf{r}(t - \Delta t) + \frac{\Delta t^2}{m}\mathbf{F}(t) \qquad (2\text{-}3)$$
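The Verlet update can be illustrated with a minimal Python sketch. It assumes a single particle in one dimension moving in a harmonic bond-stretch potential of the form K_r (r - r_eq)^2; the force constant, mass, time step, and arbitrary units are illustrative assumptions rather than values from this work, and the sketch is not a description of how sander integrates the equations of motion.

```python
import math

def harmonic_force(r, k=100.0, r_eq=1.0):
    """Force from V(r) = k (r - r_eq)^2, i.e. F = -dV/dr (illustrative values)."""
    return -2.0 * k * (r - r_eq)

def verlet(r0, v0, mass, dt, n_steps, force=harmonic_force):
    """Propagate r(t+dt) = 2 r(t) - r(t-dt) + F(t) dt^2 / m for n_steps."""
    # The Verlet recurrence needs r(-dt); estimate it from the initial
    # velocity with a second-order Taylor expansion.
    r_prev = r0 - v0 * dt + 0.5 * force(r0) / mass * dt * dt
    r = r0
    trajectory = [r0]
    for _ in range(n_steps):
        r_next = 2.0 * r - r_prev + force(r) / mass * dt * dt
        r_prev, r = r, r_next
        trajectory.append(r)
    return trajectory

# Start 0.1 units from equilibrium, at rest, and follow 1000 steps.
positions = verlet(r0=1.1, v0=0.0, mass=1.0, dt=0.001, n_steps=1000)
print(min(positions), max(positions))
```

Each stored position corresponds to one snapshot of the kind that an MD program writes to its trajectory file; in a real simulation the force would come from the full force field rather than a single harmonic term.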


Subsequent positions are formulated in the same manner until the user-defined number of steps has been performed. Each step results in a new conformation of the system that can be written to an output file. Typical MD step sizes range from 0.5 to 1.5 femtoseconds, and the total length of production-quality MD simulations is generally four nanoseconds (at least four million steps with a 1.0 fs time step).

A collection of these individual snapshots can be analyzed to see how the system changes over time. In the HAH1 system, the geometry of the active site was monitored to ensure that Cu(I) was always bound in the correct coordination and that the active site behaved well. This validated the new parameters that were imposed on the Cu(I) binding site. The root-mean-squared deviation (rmsd) from the crystal structure was monitored to quantify how the simulated protein differed from the reference state during the simulation. Also, the root-mean-squared fluctuation (rmsf) of key atoms and residues was calculated to see which parts of the protein were highly mobile and which maintained more fixed positions. Radius of gyration (radgyr) is another way to quantify the motion of the system in a general manner. This can be interpreted as the rmsd of scattering elements from the particle's center of mass.

$$\mathrm{rmsd} = \sqrt{\frac{\sum_{i=1}^{N_{atoms}} d_i^2}{N_{atoms}}} \qquad (2\text{-}4)$$

$$d_i = \left|\mathbf{r}_{i,sim} - \mathbf{r}_{i,xtl}\right| \qquad (2\text{-}5)$$

Equation 2-4 calculates the rmsd between a reference state and an individual structure generated by the MD simulation. rmsd is probably the most basic function for quantifying the difference between two structures. Good rmsd values between a crystal structure reference state and a MD snapshot of the simulated protein are generally lower


than 3.5 Å. The movement of a subset of the system with respect to the average structure over the whole simulation is the rmsf, and the B-factor is related to rmsf by equation 2-7. A high B-factor identifies a highly mobile subset.

$$\mathrm{rmsf}_i = \sqrt{\langle \mathbf{r}_i^2 \rangle - \langle \mathbf{r}_i \rangle^2} \qquad (2\text{-}6)$$

$$B_i^{AMBER} = \frac{8\pi^2}{3}\,\mathrm{rmsf}_i^2 \qquad (2\text{-}7)$$

Finally, radgyr is a measure of how much a protein spreads out from its center. In essence, it quantifies the level of unfolding of a protein. radgyr is size dependent, as larger proteins have larger radgyr values. It is also shape dependent, and different kinds of structures (coil, sheet, random loop) all have different radgyr values.

$$\mathrm{radgyr} = \left[\frac{1}{2N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(\mathbf{r}_j - \mathbf{r}_i\right)^2\right]^{1/2} \qquad (2\text{-}8)$$

Each of these values can be obtained using another program in the AMBER suite called ptraj. This program reads in a reference state and the coordinate file of the MD simulation (which may contain thousands of conformations for a system of thousands of atoms) and monitors user-selected parameters for the entire MD simulation. ptraj is also used to generate structure files for individual poses.

Free Energy Simulations of HAH1

Once the integrity of the HAH1 parameters had been validated by analyzing the MD simulations, free energy calculations on the protein began. One of the goals of the


HAH1 study was to identify an energetically favorable transition state of Cu(I) transfer between HAH1 and its target MNK. Energetically, the enthalpy, entropy, and free energy of different Cu(I) coordination states could be determined. Of these three properties, free energy provides the most accurate account of the relative energies of different systems.

As with the MD simulations, an initial structure was constructed for the TI calculations. However, the reactions involved in the TI calculations add a level of complexity to this process. TI simulations perturb the system from one state to another. In this case, the initial structure was a three-coordinate Cu(I)-bound active site and the product of the perturbation was a four-coordinate Cu(I) active site in HAH1. These two conformations differ by one atom, which is the hydrogen on the unbound Cys of the initial structure. To compensate for the disappearance of that hydrogen, a dummy atom was created in the final state to ensure that the total number of atoms in each state was the same.

Once the number of atoms in the two states was equal, the initial structure was loaded into LeAP. The initial structure used the same parameters as the MD simulation of the three-coordinate Cu(I) structure. However, the four-coordinate product had different charges and bonds than the starting structure. These changes were outlined in LeAP and represented the final state of the TI simulation. In AMBER, the perturbed partial atomic charges and atom types were input for every atom that was changed during the simulation. At this point, the parameters of both the initial and final states of the TI simulation have been described in LeAP. A check of the system should reveal no unknown atom types and that the total charges of the two states are integer values.
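The two consistency conditions described above (equal atom counts in both end states, with the dummy atom included, and integer total charges) can be expressed as a small Python sketch. This is only an illustration of the logic, not the check that LeAP performs; the function name, the tolerance, and the four-atom charge lists are hypothetical.

```python
def check_ti_end_states(initial_charges, final_charges, tol=1e-3):
    """Check two TI end states described by per-atom partial charges.

    Returns a list of problem descriptions; an empty list means both
    checks passed. The tolerance is an illustrative assumption.
    """
    problems = []
    # Both states must contain the same number of atoms (dummy atoms included).
    if len(initial_charges) != len(final_charges):
        problems.append(
            "atom count mismatch: %d vs %d"
            % (len(initial_charges), len(final_charges))
        )
    # Each state must carry an integer total charge, within the tolerance.
    for label, charges in (("initial", initial_charges), ("final", final_charges)):
        total = sum(charges)
        if abs(total - round(total)) > tol:
            problems.append(
                "%s state has non-integer total charge %.4f" % (label, total)
            )
    return problems

# Hypothetical fragment: the last atom is a hydrogen in the initial state
# and a chargeless dummy atom in the final state.
initial = [0.30, -0.70, 0.15, 0.25]
final = [0.45, -0.45, 0.00, 0.00]
print(check_ti_end_states(initial, final))
```

Keeping the dummy atom in the list with zero charge is what allows the two states to be compared atom by atom.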


The final phase in LeAP for preparing free energy calculations is the generation of the coordinate files and topology files for the two states. The unperturbed topology file will be used to minimize and equilibrate the initial structure prior to the TI simulation. The perturbed topology file will be used in the actual TI calculation and contains all of the information for both states of the simulation. If there are any incomplete parameters pertaining to either state, LeAP will alert the user at this point. The perturbed topology file will only be generated when the perturbed system is fully described.

At this point, the initial structure is minimized and equilibrated in the same manner described above. Gas-phase and solvated protein structures are both equilibrated and prepared for TI simulations. Twelve sander input files are created, one for each window of the TI simulation. The icfe=1 keyword turns on TI in sander. The details of the TI calculations are provided in chapter 3.

Several requirements must be met to ensure the accuracy of the calculations. First, a good potential function must be used to describe the system. In this study, the AMBER force field was used along with the parameters derived for the Cu(I) binding site. Secondly, there must be a way to rapidly update the system as it changes over the course of the free energy simulation. This includes evaluating forces and energies and updating positions through time. Finally, ΔG must be calculated. Equation 2-9 relates the force to the second derivative of position with respect to time, and equations 2-10 and 2-11 reveal how G can be determined in the canonical ensemble (constant N, V, and T). In equation 2-11, the Hamiltonian H is approximated by the AMBER force field. Although equation 2-9 allows rapid calculation of the force, it requires high accuracy. This explains why the length of each time step in MD and TI calculations should be


about 1 fs. In order to sample a sufficiently large conformational space, the number of steps in each window must be very large [22].

$$\mathbf{F} = -\frac{\partial V}{\partial \mathbf{r}} = m\mathbf{a} = m\frac{\partial^2 \mathbf{r}}{\partial t^2} \qquad (2\text{-}9)$$

$$G_{NVT} = -RT \ln Q_{NVT} \qquad (2\text{-}10)$$

$$Q_{NVT} = C \iint e^{-H(\mathbf{x},\mathbf{p})/RT}\, d\mathbf{x}\, d\mathbf{p} \qquad (2\text{-}11)$$

Unfortunately, these calculations take a very long time to converge even for simple systems such as the water dimer. Solving explicitly for ΔG is, in essence, an expansion of H, and higher orders of the expansion take even longer to converge. Therefore, other methods of calculating ΔG must be derived using some approximations.

Thermodynamic Integration

Another functionality of the sander program is the ability to perform thermodynamic integration (TI) calculations. Like other free energy methods, TI perturbs an initial structure to a final structure over a series of windows. TI uses a scaling parameter λ, which varies between 0 and 1 as the system's character progresses toward the final structure. At λ=0, the system exists completely in the initial state, and at λ=1 the system exists completely in the final state. TI is based on the logarithmic form of the G expression (equation 2-10), differentiated with respect to λ and then integrated, where H is V(λ,x).

$$\frac{dG}{d\lambda} = -\frac{RT}{Q_{NVT}}\frac{dQ_{NVT}}{d\lambda} \qquad (2\text{-}12)$$

Now, substituting for Q_NVT:


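$$\frac{dG}{d\lambda} = \frac{\int \frac{\partial V(\lambda,\mathbf{x})}{\partial \lambda}\, e^{-V(\lambda,\mathbf{x})/RT}\, d\mathbf{x}}{\int e^{-V(\lambda,\mathbf{x})/RT}\, d\mathbf{x}} = \left\langle \frac{\partial V(\lambda,\mathbf{x})}{\partial \lambda} \right\rangle_\lambda \qquad (2\text{-}13)$$

And integration yields ΔG:

$$\Delta G = \int_0^1 \left\langle \frac{\partial V(\lambda,\mathbf{x})}{\partial \lambda} \right\rangle_\lambda d\lambda \qquad (2\text{-}14)$$

Yet another obstacle presents itself here in that equation 2-14 is not analytically solvable and must be solved numerically with another approximation.

$$\Delta G \approx \sum_{i=0}^{N-1} \left[ \left\langle \frac{\partial V(\lambda,\mathbf{x})}{\partial \lambda} \right\rangle_{\lambda_i} + \left\langle \frac{\partial V(\lambda,\mathbf{x})}{\partial \lambda} \right\rangle_{\lambda_{i+1}} \right] \frac{(\Delta\lambda_i)}{2} \qquad (2\text{-}15)$$

Or simply,

$$\Delta G \approx \sum_{i=1}^{n} w_i \left\langle \frac{\partial V}{\partial \lambda} \right\rangle_i \qquad (2\text{-}16)$$

sander outputs ∂V/∂λ values. The user can define how often these values are calculated and averaged. w_i is the weighting factor for each window. The TI calculations used in this work contained twelve windows between λ=0 and λ=1. The ∂V/∂λ values from the sander output were averaged for each window, multiplied by the weighting factor for that window, and then all of the weighted values were summed to generate the ΔG value for the TI perturbation. Table 2-2 lists the weighting factors used in the ΔG calculations. λ values for the twelve windows are listed in Table 2-3. For the TI simulations of the solvated protein, 1,000 ∂V/∂λ values were collected for each window (each of those ∂V/∂λ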


values represented 500-step averages within each window). The 1,000 ∂V/∂λ values for each window were then averaged and weighted. Placement of the windows along λ and the weighting factors are based on Gaussian quadrature, which quantifies the integral of the space under the ∂V/∂λ curve with an accuracy similar to that of simpler methods like the midpoint method or the trapezoid method, but the Gaussian method requires only half the sample size of the simpler methods, which saves on simulation time.

Table 2-2. Windows and weighting factors for a 12-window TI calculation in sander.

Window     Weight
1 & 12     0.02359
2 & 11     0.05347
3 & 10     0.08004
4 & 9      0.10158
5 & 8      0.11675
6 & 7      0.12457

Table 2-3. λ values for a 12-window TI calculation in sander.

Window     λ
1          0.00922
2          0.04794
3          0.11505
4          0.20634
5          0.31608
6          0.43738
7          0.56262
8          0.68392
9          0.79366
10         0.88495
11         0.95206
12         0.99078
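As a cross-check on the window placement, the following Python sketch computes 12-point Gauss-Legendre nodes and weights mapped onto the interval from λ=0 to λ=1, which can be compared against Tables 2-2 and 2-3, and then applies the weighted sum of equation 2-16. It is a minimal sketch assuming standard Gauss-Legendre quadrature; the function names are illustrative, and the per-window averages passed to the sum would come from the sander output.

```python
import math

def _legendre(n, x):
    """Return P_n(x) and its derivative from the three-term recurrence."""
    p_prev, p = 1.0, x
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    dp = n * (x * p - p_prev) / (x * x - 1.0)
    return p, dp

def gauss_legendre_unit_interval(n):
    """Gauss-Legendre nodes and weights rescaled from [-1, 1] to [0, 1]."""
    points = []
    for i in range(1, n + 1):
        # Standard starting guess for the i-th root, refined by Newton steps.
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))
        for _ in range(100):
            p, dp = _legendre(n, x)
            step = p / dp
            x -= step
            if abs(step) < 1e-15:
                break
        _, dp = _legendre(n, x)
        w = 2.0 / ((1.0 - x * x) * dp * dp)
        points.append((0.5 * (x + 1.0), 0.5 * w))
    points.sort()
    return [lam for lam, _ in points], [w for _, w in points]

def ti_free_energy(dvdl_means, weights):
    """Weighted sum over windows: dG = sum_i w_i <dV/dlambda>_i."""
    return sum(w * d for w, d in zip(weights, dvdl_means))

lambdas, weights = gauss_legendre_unit_interval(12)
for window, (lam, w) in enumerate(zip(lambdas, weights), start=1):
    print(window, round(lam, 5), round(w, 5))
```

Because the weights sum to one on the unit interval, a constant ∂V/∂λ average across all windows returns that same constant as ΔG, which is a convenient sanity check on any set of window weights.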


Free Energy Perturbation

Free energy perturbation (FEP) differs from TI in that FEP is a more continuous representation of the change from λ=0 to λ=1. Instead of performing twelve separate calculations for the windows of the TI simulation, FEP is a single calculation containing as many windows as the user wants. Each window within an FEP calculation contains a number of equilibration steps followed by a number of data-gathering steps. Another way to perform FEP is to evaluate the trajectory of some initial state with the Hamiltonian of the perturbed state. This closely resembles the calculation performed on HAH1, where the structure generated by the initial Hamiltonian was evaluated using an alternate Hamiltonian that instituted some vdW exclusions. This is discussed in chapter 3.

Instead of perturbing atom types and charges as in the TI simulations, the perturbation in the FEP calculations for the HAH1 study is the exclusion of some vdW forces in the perturbed state. ΔG between two states A and B is defined in equation 2-17.

$$\Delta G = G_B - G_A = -RT \ln \frac{Q_B}{Q_A} \qquad (2\text{-}17)$$

Q has the same form as in equation 2-11. In the general equation for FEP,

$$\Delta G = -RT \ln \left\langle e^{-\Delta H / RT} \right\rangle_A \qquad (2\text{-}18)$$

ΔH is the change in the Hamiltonian (approximated by the AMBER force field) from the initial state to the perturbed state. In the case of the HAH1 study, the difference between the two Hamiltonians should only be in the vdW term.

Essentially, FEP spans the space of two physical endpoints λ=0 and λ=1 with an array of non-physical states in between, characterized by the value of λ.
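The exponential averaging behind FEP can be sketched in a few lines of Python. This assumes that equation 2-18 has the standard exponential-averaging (Zwanzig) form, ΔG = -RT ln⟨exp(-ΔH/RT)⟩, with the average taken over snapshots of the initial state; the function names, the 300 K default, and the sample energies are illustrative assumptions, and the sketch does not reproduce the gibbs or sander implementations.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)

def fep_delta_g(delta_h_samples, temperature=300.0):
    """Estimate dG = -RT ln < exp(-dH / RT) > from per-snapshot dH values.

    delta_h_samples holds H_perturbed - H_initial (kcal/mol) evaluated on
    snapshots generated with the initial Hamiltonian.
    """
    rt = R_KCAL * temperature
    # Subtract the smallest dH before exponentiating to avoid overflow;
    # the shift is added back after taking the logarithm.
    shift = min(delta_h_samples)
    mean_exp = sum(
        math.exp(-(dh - shift) / rt) for dh in delta_h_samples
    ) / len(delta_h_samples)
    return shift - rt * math.log(mean_exp)

def fep_total(window_samples, temperature=300.0):
    """Sum the per-window free energy changes along the lambda path."""
    return sum(fep_delta_g(samples, temperature) for samples in window_samples)

# Hypothetical energy differences (kcal/mol) for two windows.
print(fep_total([[0.4, 0.6, 0.5], [0.2, 0.1, 0.3]]))
```

The exponential average is dominated by the snapshots with the lowest ΔH, which is one reason the perturbation is usually split into many small windows.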


$$\Delta G = \sum_{\lambda_i = 0}^{1} \Delta G_{\lambda_i} \qquad (2\text{-}19)$$

The gibbs program in the AMBER package was created to be able to perform FEP on large systems, with one advantage over TI in sander of being able to define dummy atoms in both states. Despite numerous attempts, I was never able to adequately simulate the HAH1 system with gibbs. After AMBER 7, gibbs was no longer developed or supported in AMBER.

Potential of Mean Force

Potential of mean force (PMF) calculations were performed on the HAH1 system to determine the energetics of the Cu(I)-S bonds within the active site. PMF between two bodies is a function of the distance between their centers of mass [23]. In AMBER, a harmonic potential is applied to a certain bond (or any other parameter), centered at an equilibrium value. Then, over a series of windows, the parameter is varied from an initial position to a final position. The interaction between the two particles involved in the PMF is monitored. W can be an expression of the free energy change with respect to coordinates and is calculated in equation 2-20.

$$W(q) = -k_B T \ln \rho(q) \qquad (2\text{-}20)$$

Here, q is the coordinate and ρ(q) is the probability of q obtaining a certain value. In an MD simulation, the values of q can be collected into bins and then analyzed as a histogram. The histograms for all of the windows are aligned with WHAM [24,25], a weighted histogram analysis program, so that a PMF curve is constructed [26]. In order for a range of possible values of q to be adequately represented, a very large number of samples must be taken. In the case of the HAH1 system, the Cu(I)-S distance was varied from


about 2.8 to 2.0 Å over the course of twenty-three windows. Each window sampled 100,000 steps for a total of over two million samples taken for just 0.8 Å of coordinate space. A biasing potential is applied in the AMBER PMF to ensure that the simulation sufficiently samples the bond lengths of interest to the HAH1 study. The biasing potential is shown in equation 2-21.

$$U(q) = \frac{1}{2} k (q - q_0)^2 \qquad (2\text{-}21)$$

q0 is the equilibrium value that is of interest in the calculation. By assigning a high value to k, a large energy penalty has to be paid by any coordinate that is too far from q0. For the HAH1 PMF calculations a biasing potential of 2,000 kcal/mol was placed around an equilibrium distance of 2.19 Å. The PMF calculation was specified in the sander input files by specifying nmropt=1 for the program to read in the biasing potential and calling a PMF parameter file which contained that information for each window individually. This represents a TI-method of performing the PMF calculation with sander. Unlike the TI simulation of changing the Cu(I) coordination, there is no chemical change between the initial and final states of the PMF simulation, only a conformational change in the bond length that is being adjusted in each window.
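The pieces of the PMF procedure can be sketched in Python: the harmonic biasing potential of equation 2-21, a set of window centers spanning the sampled distance range, and the conversion of a histogram of sampled coordinates into W(q) as in equation 2-20. The evenly spaced windows, the function names, and the 300 K default are assumptions made for illustration. The sketch converts a single histogram directly and does not remove the bias or combine windows, which is the job of WHAM.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def bias_energy(q, q0, k):
    """Harmonic biasing potential U(q) = 1/2 k (q - q0)^2."""
    return 0.5 * k * (q - q0) ** 2

def window_centers(q_start, q_end, n_windows):
    """Evenly spaced restraint centers (an assumption about the spacing)."""
    step = (q_end - q_start) / (n_windows - 1)
    return [q_start + i * step for i in range(n_windows)]

def pmf_from_samples(samples, q_min, q_max, n_bins, temperature=300.0):
    """W(q) = -kB T ln rho(q) from a histogram; the lowest point is set to zero.

    Assumes at least one sample falls inside [q_min, q_max); empty bins are
    skipped because their probability estimate is zero.
    """
    width = (q_max - q_min) / n_bins
    counts = [0] * n_bins
    for q in samples:
        idx = math.floor((q - q_min) / width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    total = sum(counts)
    kt = K_B * temperature
    curve = [
        (q_min + (i + 0.5) * width, -kt * math.log(c / total))
        for i, c in enumerate(counts)
        if c > 0
    ]
    w_min = min(w for _, w in curve)
    return [(q, w - w_min) for q, w in curve]

centers = window_centers(2.8, 2.0, 23)
print(len(centers), centers[0], round(centers[1] - centers[0], 4))
```

With twenty-three evenly spaced windows between 2.8 and 2.0 Å, neighboring restraint centers would sit roughly 0.036 Å apart, so adjacent histograms overlap heavily, which is what a histogram-matching method needs.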

PAGE 40

CHAPTER 3
COMPUTATIONAL STUDIES OF THE CU(I) METALLOCHAPERONE HAH1

Cu(I) Biochemistry and the HAH1 System

Cu Metallochaperones

Several transition metals have been implicated in important intracellular biological processes. These metals, including Fe, Co, Zn, Mo, and Cu, are involved in such central biological roles in part due to their ability to exist in multiple oxidation states in vivo.27 For example, copper exists in both the cuprous and cupric states within the cell and functions as a catalyst in both. Cycling of copper ions between two oxidation states can catalyze the production of highly toxic hydroxyl radicals within the cellular environment, which can damage many intracellular macromolecules. This creates a potential problem within the cell: metal ions such as Cu(I) and Cu(II) are essential for normal cell behavior, yet the free existence of these ions in the cell is clearly toxic. A group of metal-binding proteins labeled metallochaperones has been shown to bind transition metal ions in both prokaryotic and eukaryotic cells. Of particular interest are the chaperones involved in copper and zinc transport in human cells. In human cells, a number of these chaperones deliver Cu ions to other copper-binding proteins or organelles. Noteworthy is the fact that these chaperones are not anti-toxins. Instead, they act to sequester the ion and transport it through the intracellular environment. It has been found that the average concentration of intracellular free copper is on the order of $10^{-18}$ M, translating to less than one unbound copper ion per cell in the human body.28 This low concentration has been attributed to the over-abundance of

PAGE 41

moderate and strong copper-chelating sites including metallothioneins, vesicular sites, and specific copper-binding proteins.29 Yet, with all the other potential copper-binding sites, specific copper chaperones are able to acquire the ions as they enter the cell and distribute them throughout the cell as needed. These include the human antioxidant protein (HAH1), the Menkes and Wilson's ATPases, the human copper chaperone for superoxide dismutase (hCCS), and the human copper, zinc superoxide dismutase (SOD1). Several different copper-transport routes within the cell are responsible for copper homeostasis, ensuring that the total copper concentration (normally in the micromolar range29) in the cell does not get too high or too low.27,29-36 This involves regulation of the amount of copper entering and exiting the cell. Copper must then be delivered from the trans-membrane proteins to the metal-binding sites of the correct proteins in the cytoplasm. Finally, the copper ions must be transported through the cell to the proteins and organelles where they are needed. Proteins that perform each of these functions have been discovered and studied. Defects in the metabolism of intracellular metal ions result in a vast array of health problems. For example, problems within the copper-transport structure of the cell result in Menkes syndrome, Wilson's disease, familial amyotrophic lateral sclerosis (fALS) disorders, and Alzheimer's disease.30 Individual pathways and binding interactions will be discussed later, but a brief overview of metal binding and inter-protein transport is given here. According to Rosenzweig, copper-binding proteins deliver the copper ions to their targets via direct protein-protein interaction.37 Moreover, copper transport between a chaperone and its target is thought to progress through the formation of a series of multi-coordinate

PAGE 42

complexes until the ion has been completely released from the donor and bound by the receptor.1,37 With this in mind, another concept suggests that the chaperone donates its metal ion in an enzymatic fashion, lowering the energy barrier for inter-protein ion transport.29,31 Recently, crystal structures have been solved for many Cu(I) transport proteins38-41 and some transport mechanisms have been suggested.1,31,33,35,37 However, the exact mechanisms by which copper delivery is accomplished are still under intense study. Several key issues should be addressed when considering this problem. For example, the specificity of the donor-target interaction must be understood. Currently, highly conserved secondary structures between the donor and target at the metal binding site are believed to explain the problem of recognition. Possible protein rearrangement during metal transfer must also be investigated. Disulfide bonds near the binding sites may contribute to rearrangement of the fold during metal transfer. Protein docking should also be addressed. It has been suggested that docking involves electrostatic interactions27, and that heterodimers or even higher order oligomers42 may be formed during copper transfer between proteins. The recent structure determinations of many of the proteins involved in copper transport in both apo- and holo-forms have opened this field to computational study. Ideally, using computational tools, an investigator could model the binding sites of several proteins, ultimately using molecular dynamics (MD) simulations to determine the binding and transfer mechanisms for the processes described above. Some ab initio modeling of Cu binding to sulfur and Cys groups has been performed.8-10,38 Currently,

PAGE 43

ZINDO43, PM3(tm), ab initio14, and SIBFA1238 are some of a limited number of methods for which Cu parameters have been established. Cu(I) is generally unstable in aqueous solutions. However, it may be stabilized by sulfur-containing ligands or immobilized by a protein.44 Generally, the copper-binding active site within a protein is characterized by the use of two or more Cys or His ligands for direct binding less than 2.5 Å from the ion, with Met residues or charged amino acids as part of the supporting structure at 3.5 to 8.0 Å away from the metal. The human copper, zinc superoxide dismutase (SOD1) employs four His at its Cu(I) active site. Three His appear to be tightly bound (2.0-2.12 Å) while the other is bound to a lesser extent (3.12 Å). Each His is bound to Cu(I) by the δ1 or ε2 N of the imidazole ring. Cys residues are present at the binding sites of the human copper chaperone for superoxide dismutase (hCCS), HAH1, yeast antioxidant protein (Atx1), the yeast homolog to Menkes disease protein (Ccc2), and the Menkes disease protein (MNK4). These chaperones bind the ion with multiple Cys residues as part of a common MT/CXXC binding motif, which forms part of a turn between an α-helix and a β-sheet. Some Cu(II) binding proteins such as human nitrite reductase and the plant electron-transfer protein plastocyanin employ both Cys and His at their active sites. This study focuses on optimizing structures for the Cu(I) binding sites in HAH1 as well as other multi-coordinate Cu(I) structures, and on observing the differences between two-, three-, and four-coordinate Cu(I) complexes.

Cu pathways within the Cell

Once through the membrane, copper must be delivered to other sites in the cell where it is needed. Metallochaperones perform this task. Due to the specificity of the

PAGE 44

binding and transfer mechanisms between chaperone and target, there is a different chaperone for delivering copper to each target. Chaperones are grouped according to the primary structure of their binding site(s). Atx1-type proteins exhibit the MT/HCXXC37 or MXCXXC29 binding motif at the N-terminus. In yeast, Atx1 is responsible for delivering Cu(I) to Ccc2, the yeast CPx-type copper ATPase, for eventual incorporation into Fet3,27 an important protein in iron metabolism. HAH1 delivers Cu(I) to domain 4 of the Wilson's disease P-type ATPase, MNK4, in human cells. Another pathway involves the chaperone for superoxide dismutase in yeast, yCCS or Lys7, and in humans, hCCS. The function of these proteins is to deliver Cu(I) to the copper, zinc superoxide dismutase, SOD1. This pathway is more complicated than the Atx1 pathway since the proteins involved contain multiple domains and the Cu(I) binding sites are more complex and are not located near the surface of the protein. Atx1-like copper chaperones bind the Cu(I) ion in a loop between an α-helix and a β-sheet near the exterior of the protein. The binding residues are all Cys, although there may be some contributing electrostatic interactions to the binding site from nearby Met, His, and Thr residues. HAH1 exists as a homodimer in the crystal structure, binding one Cu(I) per dimer. Each monomer donates up to two Cys residues for ion binding. Atx1 exists as a monomer, coordinating the Cu(I) in a two- or three-coordinate system in the binding loop described above. The targets for Atx1 and HAH1, Ccc2 and MNK4, respectively, share secondary structure homology with their chaperones. Thereby, the chaperone and target are able to dock and ion transfer can occur. A representation of the Cu(I)-bound HAH1 dimer is shown in Figure 3-3.

PAGE 45

The CCS proteins are more complex, and their structures have not yet been fully resolved crystallographically. yCCS is a 27 kDa two-domain monomer that exists as a 54 kDa homodimer protein42. Each domain possesses its own unique copper binding site. hCCS is a three-domain, 274-residue monomer that exists in vivo as a 548-residue dimer. Once again, each domain shares structure homology with another chaperone in the cell. Domain 1 of both yCCS and hCCS has a Cu(I) binding site similar to those of Atx1 and HAH1, namely the MT/HCXXC or MX/CXXC binding motif. Domain 2 in both yCCS and hCCS has similar folding to SOD1, although each lacks several key structural features of SOD1, differentiating them from each other. It should be noted here that SOD1 exists as a 32 kDa dimer in vivo and utilizes four His residues as its Cu+ binding site8. Finally, domain 3 in hCCS is a small 39-residue feature containing a CXC binding motif. It is believed that this domain is involved in the physical transfer of Cu(I) from hCCS to the Cu binding site in SOD1. Of these pathways, copper transfer between CCS and SOD1 is perhaps the process that has been investigated the most. Of interest is the intra-protein transfer of Cu(I) between domains in CCS and the transfer of Cu(I) from the CXC binding site of hCCS domain 3 to the quad-His binding site in SOD1. It has been suggested that the transfer involves the formation of a heterodimer or even higher order oligomers between monomers of CCS and SOD1.42

Copper Homeostasis

One of the fundamental problems in biological coordination chemistry today is the insufficient understanding of how metal ion concentrations are mediated by intracellular processes.32 On one hand, enough metal ions must be present within the cell to facilitate essential biochemical functions. However, transition metals, and copper in particular, are

PAGE 46

prone to cause problems due to their catalytic nature and the presence of so many favorable metal binding sites. As mentioned, Cu can readily cycle between two oxidation states, which can catalyze the production of toxic radicals within the cell. Moreover, several amino acids can easily bind Cu ions, creating an abundance of copper chelation sites within the cell. Copper chaperones are proteins that bind Cu ions in both the Cu(I) and Cu(II) oxidation states and serve a twofold purpose in maintaining Cu homeostasis. First, the metal-binding site on the metallochaperones must be able to bind Cu ions more readily than the other favorable yet non-essential Cu binding sites throughout the cell. Secondly, the chaperones must act to sequester Cu ions from the intracellular environment, or at least ensure that the Cu ions are always bound within another essential Cu-binding protein. Although there are numerous Cu pathways within the cell,27,36 the copper chaperones involved in these pathways can be divided into two groups: trafficking proteins and metalloregulatory proteins.32 Trafficking proteins are confined to cell membranes and the cytoplasm and include trans-membrane metal transport proteins and water-soluble Cu transport proteins that exist in the cytoplasm, delivering metal ions to specific intracellular target proteins. Metalloregulatory proteins bind ions in a more permanent fashion, using the ions to regulate gene expression and cell function.32,45,46 Initially, Cu pathways were identified but the details of metal binding were unknown. A large number of crystallographic and spectroscopic studies in the last decade have clarified the details of Cu binding in both Cu-transport and Cu-regulatory proteins. One facet of Cu homeostasis that remains largely unresolved is Cu transfer between proteins.

PAGE 47

Isolating membrane-bound proteins for crystallization is a daunting task for crystallographers. Further complication arises in producing crystals of metal-bound proteins. While no structures currently exist for the Cu-binding trans-membrane protein hCtr1, the structures of soluble holo-proteins HAH1 and Atx1 have been determined by NMR2 and X-ray crystallography.1,47 The structure of the fourth domain of the Menkes disease protein MNK4 (the target for Cu(I) transport from HAH1) has also been determined by XRC41, and the interaction between HAH1 and MNK4 has been investigated.48 The copper-transport complex of yCCS (and its human homologue hCCS) and SOD1 has also been well-characterized by X-ray crystallography.39,42,49-51

Quantum Chemical Characterization of the Cu(I) Binding Site from HAH1

The work reviewed in this chapter focuses on Cu(I) transfer from HAH1 to the Menkes disease protein. This pathway was selected due in part to the availability of high-resolution structures of the Cu(I) donor HAH1 and the target domain MNK4. The Cu-binding domains of both the donor and target employ two Cys residues in an MT/CXXC motif to hold Cu in a multi-coordinate state. HAH1 exists as a dimer in solution, with each monomer containing one binding domain. The coordination state of the Cu ion in the dimer is as yet unknown. EXAFS studies of holo-Atx1 suggest a three-coordinate state, with two Cys residues binding tightly to Cu at 2.25 Å and a third, less strongly bound Cys at 2.40 Å, which may be from an adjacent Atx1 molecule.52 The 1.80 Å resolution X-ray crystal structure of the Cu(I)-bound HAH1 dimer reveals four Cys in close proximity to Cu(I). This structure suggests a roughly tetrahedral geometry for Cu(I) with three strongly bound Cys at 2.30 Å and the fourth Cys at 2.40 Å. The coordination environment for MNK4 is believed to be similar to that of the HAH1 monomer or Atx1.

PAGE 48

Cu(I) transport between HAH1 and MNK4 is thought to progress via a multi-coordinate mechanism.1,47,52 Figure 3-1 characterizes the proposed mechanism for Cu(I) transfer as a series of Cu(I)-S bonds forming with the target domain as Cu(I)-S bonds break within the donor. The energetics of this process have yet to be determined, and it is not known if a potential four-coordinate intermediate exists as part of the transfer mechanism. The order of Cu(I)-binding and release is also unknown. In the HAH1 monomer, Cu(I) is bound by Cys12 and Cys15. As the target domain comes into close proximity, one of these two residues releases Cu(I) first. In a similar manner, the target domain must also sequentially form bonds, but it is not known whether Cys14 or Cys17 of MNK4 is the first to complex the incoming Cu(I) ion.

[Scheme, four successive states: (1) HAH1 Cym, Cym; Cu+; MNK4 Cys, Cys. (2) HAH1 Cym, Cym; Cu+; MNK4 Cym, Cys. (3) HAH1 Cym, Cys; Cu+; MNK4 Cym, Cym. (4) HAH1 Cys, Cys; Cu+; MNK4 Cym, Cym.]

Figure 3-1. The proposed mechanism for Cu(I) transfer from HAH1 to the fourth domain of the Wilson's disease protein. Cym indicates a negatively charged Cys residue.

In order to address the details of Cu(I) transport, several models of the HAH1 Cu(I)-binding site were constructed in WebLabViewer Pro16, substituting Cys residues with methylthiolate [SCH3]- ligands. Two-, three-, and four-coordinate models were constructed and geometrically optimized using Gaussian 98.53 Figure 3-2 depicts the optimized structures of the models and Table 3-1 lists some geometrical parameters of the structures.

PAGE 49

Figure 3-2. Gas-phase optimized structures of the multi-coordinate models of the HAH1 Cu(I) binding site. Cu(I) is in green, S is yellow, and C is gray. The top figure shows the two-coordinate model, followed by the three-coordinate model and the four-coordinate model at the bottom.

PAGE 50

The structures were optimized using the B3LYP density functional combined with the Ryde double-zeta (DZpdf) basis set for Cu(I)9 and the 6-311++G** split-valence basis set for all other atoms, using the six Cartesian-type d-orbitals.

Table 3-1. Geometry parameters of the DFT-optimized multi-coordinate models shown in Figure 3-2.

Model     Cu-S (Å)   Cu-S (Å)   Cu-S (Å)   Cu-S (Å)   S-Cu-S (deg)   C-S-Cu (deg)
2-coord   2.23       2.23       5.25       6.00       180.0          101.8
3-coord   2.31       2.35       2.41       4.98       114.6          105.2
4-coord   2.19       2.19       2.19       2.24       109.5          108.9

Upon optimization, the structures were used in single point calculations to determine atomic charges, thermodynamic properties, and bond force constants. Quantum charges were determined using the Merz-Kollman-Singh17,18 method. For these calculations, the Cu(I) ionic radius was set at 0.91 Å. Thermodynamic properties of the model clusters were obtained at the same level of theory as the geometry optimizations. Finally, bond force constants were determined using the z-matrix form of the optimized geometry as the input. The geometrical parameters, bond force constants, and MKS charge data were all used together to create a molecular mechanics (MM) force field for the Cu(I) binding site of HAH1.

Creation of a MM Force Field for the Cu(I) Binding Site of HAH1

The use of the HAH1 dimer as a model for the HAH1-MNK4 heterodimer

The X-ray structure of HAH1 shows that the protein crystallizes as a homodimer,1 but NMR structures show that HAH1 exists as a monomer in solution.2 The target protein, domain 4 of MNK, interacts with the HAH1 monomer in the cell, forming a heterodimer during Cu(I) transfer. Unfortunately no structures exist (NMR or X-ray) for

PAGE 51

the HAH1-MNK4 dimer or between HAH1 and any other MNK domain. Such a structure would be useful in MM and QM studies on Cu(I) transfer. Instead, an acceptable homolog to the HAH1-MNK4 heterodimer must be employed in such studies. Arnesano and coworkers performed a docking study of HAH1 to MNK4 to investigate the interactions between the metal binding domains and the protein-protein interface of the donor and target proteins.54 Docking of the yeast antioxidant protein Atx1 to its target Ccc2 was performed in order to map the protein-protein interactions that facilitate Cu(I) transfer between the two proteins. Superimposition of the docked yeast heterodimer onto the crystal structure of the HAH1 dimer revealed that the two structures could be considered remarkably similar.54 Larin and coworkers performed a manual docking of HAH1 to MNK4.48 This study was performed prior to the elucidation of the NMR or X-ray structures of HAH1 and the HAH1 dimer, although the X-ray structure of MNK4 had already been resolved.41 In the Larin study, the homology between MNK4 and HAH1 was known, so one MNK4 domain was computationally adapted to model HAH1 in a computationally docked HAH1-MNK4 heterodimer. The information provided by these two studies suggests that the use of the HAH1 homodimer as a computational mimic for the HAH1-MNK4 heterodimer is a valid approximation.

Creation of parameter files for Cu(I)-bound HAH1

Cu(I) and S bound to Cu(I) are not defined atom types in the current version of the AMBER suite.6 AMBER 6 and AMBER 7 were used for the bulk of this work. In order to perform molecular dynamics simulations on Cu(I)-bound HAH1 and MNK4, these atom types must be defined in a format that AMBER can understand. This involves creating a force field parameter file that includes all the pertinent information used in the AMBER force field equation. For the purpose of this study, force field parameters

PAGE 52

included molecular mass; two bond length types (Cu-S and S-C) and their force constants; numerous bond angle types (S-Cu-S, C-S-Cu, C-C-S, and H-S-Cu) and their force constants; a host of torsion angles and torsion constants; and van der Waals radii for Cu(I) and S. Initially, Cu-S bond lengths and bond force constants were implemented directly from the QM calculations on the model systems. C-S bond parameters in the metal-binding site were taken from the 1994 force field, as were bond angles, angle force constants, and torsion parameters. Cu-S bonding parameters were varied from the QM-derived values after initial MD simulations with those parameters revealed that the bonds were not strong enough to hold the desired binding site geometry. Once the geometry of the binding site was sufficiently described, the atomic charges were added. Cu(I) binding shifts the charges of the adjacent atoms from their standard values in AMBER. Using the MKS charges output by the Gaussian calculation mentioned earlier, the antechamber package in AMBER was used to generate the restrained electrostatic potential (RESP) charges for use in the AMBER input package LeAP. Charges on bound Cys residues were modified from typical AMBER charges for the HAH1 system to compensate for Cu(I) binding and to ensure an integral charge for the system. Cu(I)-binding residues were specified as the CYM residue type in AMBER. This form denotes a ten-atom, negatively charged cysteine ligand. Conversely, unbound cysteines or other cysteines elsewhere in the protein are defined using the CYS residue type. The CYM side chain is defined as -CH2S-, while the CYS side chain is -CH2SH. Atom types for Cu(I) and copper-bound S atoms also had to be defined. Copper was defined as atom type PP. The four S atoms were identified as SA, SB, SC, and SD.

PAGE 53

Tables 3-2 through 3-4 list the parameters defined in the AMBER force field file for the HAH1 Cu(I) binding site. Table 3-2 gives atomic mass and van der Waals parameters for S and Cu(I). Table 3-3 shows bond lengths, bond angles, and force constants for each, and Table 3-4 lists the RESP charges used for CYM ligands and Cu(I). All Cu(I)-bound S atoms have been kept equivalent, as have bound and unbound residues in terms of atom types and charges. Cu-S bond force constant values in Table 3-3 were increased by a factor of five over the force constants generated by the Gaussian 03 calculations performed on the model systems. This adjustment was implemented after the initial force constant parameters were found not to be strong enough to keep the Cu(I) ion bound within the active site. CT-S-Cu angle force constant parameters were based on the CT-S-H parameters for normal Cys from the parm94 force field. The parameters described here closely match those determined in a similar study for Cu(II) bound to Met and His by Comba and Remenyi.55 Figure 3-3 shows the active site region of the high-resolution crystal structure of Cu(I)-bound HAH1. This site was reproduced in AMBER for MM simulations of HAH1. Cu(I) and the ligating Cys residues from HAH1 are shown. Cu(I) is in green, and the Cys residues are shown in stick form pointing toward Cu(I), creating a nearly tetrahedral binding environment. The two top Cys residues are Cys 12A and Cys 12B, which are more solvent-exposed. Cys 15A and Cys 15B reside close to the monomer interface and have less contact with solvent. The total charges of the different entire-protein models are: unbound = 0; two-coordinate = -1; three-coordinate = -2; and four-coordinate = -3. The holo-HAH1 dimer comprises between 2059 and 2061 atoms depending on the number of coordinating CYM ligands, and includes C- and N-terminal caps on each of the monomers present in the structure.
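The total charges quoted above follow from simple bookkeeping, sketched here in Python. The sketch assumes, as the text states, that the unbound dimer is neutral, that each coordinating CYM residue contributes -1 relative to a neutral CYS, and that the copper ion contributes +1. The function name and its arguments are illustrative and are not part of any AMBER tool.

```python
def net_charge(n_cym, apo_charge=0, cu_charge=1):
    """Net model charge after deprotonating n_cym cysteines and adding one Cu(I).

    Assumes each CYS -> CYM conversion changes the charge by -1.
    """
    return apo_charge - n_cym + cu_charge


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"{n}-coordinate model: {net_charge(n):+d}")
```

For two, three, and four coordinating CYM residues this gives -1, -2, and -3, matching the values listed for the protein models.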

PAGE 54

Figure 3-3. The 1.80 Å crystal structure for Cu(I)-bound HAH1. PDB ID 1FEE.1 Residues labeled in the figure: Cys 12A, Cys 15A, Cys 12B, Cys 15B.

Once the force fields are fully described and the RESP charges are in place, the protein structure is ready to be minimized. The minimization and equilibration process takes several steps. First, the protein is described in LeAP and checked to ensure that all the bonds, angles, torsions and nonbonding parameters are fully described. Counterions (Na+ ions) can be added to adjust the overall charge of the system to zero. An initial gas-phase minimization is performed on the protein. Then, the temperature of the system is gradually raised to 300 K over a series of MD runs. Once the temperature of the system reaches 300 K, it is subjected to a long MD run, allowing the system to reach equilibrium.

PAGE 55

Table 3-2. Atom type, atomic mass, van der Waals radii, and van der Waals well-depths for Cu(I) and Cu(I)-bound S in HAH1.

Atom          Atom type   Mass (au)   van der Waals radius (Å)   Well-depth (kcal)
2-coordinate structure
Cu            PP          63.55       2.50                       0.20
S (Cys 12A)   SA          32.06       2.00                       0.25
S (Cys 15A)   SB          32.06       2.00                       0.25
3-coordinate structure
Cu            PP          63.55       2.50                       0.20
S (Cys 12A)   SA          32.06       2.00                       0.25
S (Cys 15A)   SB          32.06       2.00                       0.25
S (Cys 12B)   SC          32.06       2.00                       0.25
4-coordinate structure
Cu            PP          63.55       2.50                       0.20
S (Cys 12A)   SA          32.06       2.00                       0.25
S (Cys 15A)   SB          32.06       2.00                       0.25
S (Cys 12B)   SC          32.06       2.00                       0.25
S (Cys 15B)   SD          32.06       2.00                       0.25

Table 3-3. Bond lengths, bond angles, and associated force constants for the HAH1 Cu(I) binding site.

Bond      k_bond (kcal/mol·Å²)    r0 (Å)
Cu-S      60.00                   2.19

Angle     k_θ (kcal/mol·rad²)     θ0 (deg)
S-Cu-S    50.00                   109.5
C-S-Cu    93.98                   95.91

Table 3-4. CYM and Cu(I) RESP charges used for the HAH1 Cu(I) binding site.

                                         RESP charge in CYM
Atom   CYS charge   Atom type            2-coordinate   3-coordinate   4-coordinate
N      -0.4157      N                    -0.4408        -0.4157        -0.3630
H       0.2719      H                     0.2468         0.2719         0.2520
Cα      0.0213      CT                   -0.1000        -0.0351         0.0350
Hα      0.1124      H1                    0.0257         0.0508         0.0480
Cβ     -0.1231      CT                   -0.0646         0.0168        -0.5720
Hβ      0.1112      H1                    0.0445         0.0053         0.2440
S      -0.3119      SA, SB, SC, or SD    -0.8290        -0.8682        -1.0920
C       0.5973      C                     0.6016         0.5973         0.6160
O      -0.5679      O                    -0.5636        -0.5679        -0.5040
Hγ      0.1933      HS                    n/a            n/a            n/a
Cu      n/a         PP                    0.5922         0.6484         1.3670
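To give a feel for the magnitude of the Table 3-3 parameters, the following Python sketch evaluates the Cu-S bond and S-Cu-S angle terms. It assumes the standard AMBER harmonic form, E = k (x - x0)^2 with no factor of 1/2, and angle deviations converted to radians; the function and constant names are illustrative only.

```python
import math

# Parameters from Table 3-3.
K_BOND = 60.00      # kcal/(mol*A^2), Cu-S
R0 = 2.19           # A
K_ANGLE = 50.00     # kcal/(mol*rad^2), S-Cu-S
THETA0_DEG = 109.5  # degrees


def bond_energy(r, k=K_BOND, r0=R0):
    """Harmonic bond term, E = k * (r - r0)**2 (AMBER form, no 1/2 prefactor)."""
    return k * (r - r0) ** 2


def angle_energy(theta_deg, k=K_ANGLE, theta0_deg=THETA0_DEG):
    """Harmonic angle term, E = k * (theta - theta0)**2 with angles in radians."""
    delta = math.radians(theta_deg - theta0_deg)
    return k * delta ** 2


if __name__ == "__main__":
    print(bond_energy(2.29))    # Cu-S stretched by 0.10 A
    print(angle_energy(119.5))  # S-Cu-S opened by 10 degrees
```

With these parameters, stretching a Cu-S bond by 0.10 Å costs about 0.6 kcal/mol and opening an S-Cu-S angle by 10 degrees costs about 1.5 kcal/mol.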

PAGE 56

The equilibrated system can be used directly for solvent-phase MD simulations. In order to perform MD simulation in solvent, a few more steps must be taken. The gas-phase minimized and equilibrated structure is again loaded into the LeAP program, where an explicit solvent box is added. In this case, an 8.0 Å TIP3P water box was imposed around the protein, increasing the total number of atoms in the system to over 19,000. The solvated system was subjected to a series of relaxation runs similar to those used for the gas-phase system. Incrementally smaller constraints were placed on the protein part of the solvated system as the system's temperature was increased to 300 K. Eventually, an equilibrated, solvated protein system was obtained. At this point, long timescale MD was performed. Each of the five multi-coordinate solvated protein model systems was subjected to the minimization and equilibration scheme just described. The solvated systems were ultimately simulated over a timescale of at least 3.6 ns. Table 3-5 compares the AMBER-equilibrated HAH1 Cu(I) binding site of the four-coordinate protein model to its QM model cluster counterpart minimized in Gaussian and to the X-ray crystal structure of Cu(I)-bound HAH1, and Table 3-6 lists other key data taken from the long timescale MD simulations. Although the geometries are not exactly reproduced by the MM force field parameters created for the HAH1 active site, the shape of the active site and the local protein environment are reproduced well. Figure 3-4 shows the root-mean-square deviation from the crystal structure for the entire protein sequence for each of the three solvated model systems, and Figure 3-5 displays the rmsd values for the active site loop regions of each protein model.

PAGE 57

The figures below show that the active site regions of all five models reached equilibrium rapidly, after about 400 ps. While the entire proteins are in good agreement with the crystal structure throughout, the complete protein structures took longer to reach equilibrium. For the whole-protein models, rmsd values of between 2.0 and 2.5 Å were achieved by 2,500 ps and were maintained beyond that point in the simulations. The radgyr values are within the expected range, and reveal that the most highly mobile residues are ones near the termini of the monomers.

Table 3-5. Comparison of the HAH1 active site between the four-coordinate model, the solvated, equilibrated protein, and the X-ray crystal structure of the Cu(I)-bound protein (1FEE).

Parameter                QM Model    Protein     X-ray
Cu-S (Cys 12A) (Å)       2.19        2.29        2.30
Cu-S (Cys 15A) (Å)       2.19        2.14        2.39
Cu-S (Cys 12B) (Å)       2.19        2.33        2.30
Cu-S (Cys 15B) (Å)       2.24        2.39        2.32
Cys12A-Cu-Cys15A         109.0 deg   117.5 deg   115.7 deg
Cys12A-Cu-Cys12B         109.5 deg   112.5 deg   109.4 deg

Table 3-6. Summary of rms deviation, rms flexibility, and radius of gyration for the five solvated HAH1 protein models and key active site residues.

                  2-coord   2-coord (cis)   3-coord (B)   3-coord (A)   4-coord
RMSD (Å)
  Total           2.69      2.67            2.27          2.10          2.04
  Backbone        1.89      1.88            1.30          1.34          1.17
  Binding loop    0.89      1.29            0.87          0.94          1.31
  Bind. lp.       0.34      0.71            0.37          0.38          0.54
Radgyr (Å)
  Protein avg.    29.38     27.26           29.35         29.30         29.43
RMSF (Å)
  Total           4.98      4.85            3.60          4.54          5.46
  Backbone        4.80      4.70            3.46          4.37          5.23
  Cu              5.11      2.66            2.19          4.19          3.94
  Cys 12A         5.28      3.83            2.40          4.95          5.08
  Cys 15A         4.10      2.66            1.78          4.38          3.45
  Cys 12B         6.14      3.62            3.44          3.64          4.68
  Cys 15B         4.32      2.15            1.88          2.60          3.21

Additional header labels in the original table: 12B bound, 15B bound.
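The three quantities summarized in Table 3-6 have simple definitions, sketched below in plain Python. This is a minimal illustration and not the analysis code used for this work: the function names are mine, coordinates are given as lists of (x, y, z) tuples, and the rmsd routine assumes the two structures have already been superimposed.

```python
import math


def rmsd(coords, reference):
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    total = sum((a - b) ** 2
                for p, q in zip(coords, reference)
                for a, b in zip(p, q))
    return math.sqrt(total / len(coords))


def radius_of_gyration(coords, masses):
    """Mass-weighted rms distance of the atoms from their center of mass."""
    m_total = sum(masses)
    com = [sum(m * p[i] for m, p in zip(masses, coords)) / m_total
           for i in range(3)]
    total = sum(m * sum((p[i] - com[i]) ** 2 for i in range(3))
                for m, p in zip(masses, coords))
    return math.sqrt(total / m_total)


def rmsf(trajectory):
    """Per-atom rms fluctuation about the time-averaged position.

    trajectory is a list of frames; each frame is a list of (x, y, z) tuples.
    """
    n_frames = len(trajectory)
    result = []
    for atom in range(len(trajectory[0])):
        mean = [sum(frame[atom][i] for frame in trajectory) / n_frames
                for i in range(3)]
        msf = sum(sum((frame[atom][i] - mean[i]) ** 2 for i in range(3))
                  for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result
```

rmsd compares one frame against a fixed reference such as the crystal structure, whereas rmsf measures how far each atom wanders from its own average position over the trajectory, which is why the two can differ substantially for the same model.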

PAGE 58

The rmsf values listed in Table 3-6 reveal details about the flexibility of certain residues as well as the complete protein and the protein backbone for the three models. The small difference between rmsf values for the complete protein compared to the backbone suggests that the flexibility of the protein is not limited to the side chains and that the backbone also moves freely. From the rmsf data for the Cu(I)-binding residues Cys 12A, Cys 15A, Cys 12B, and Cys 15B, it appears that Cys 12A and Cys 12B have comparable magnitudes in each model. The values for Cys 15A and Cys 15B are also similar for each model. The similarity is derived from the location of these residues on the binding loop. Cys 12A and Cys 12B are more solvent-exposed and move more freely due to solvent interactions and being farther from the monomer interface. On the other hand, Cys 15A and Cys 15B show less flexibility as they are close to the interface region and not generally solvent-exposed. The flexibility of the Cys 12 residues may play a role in Cu(I) transfer between binding domains. The rmsf data for Cu(I) show that Cu(I) is least mobile when bound by only three residues. In the four-coordinate model, Cu(I) is more flexible. This may be a result of the Cu(I) moving around within the binding site as different binding ligands move in and out of proximity to the ion. Perhaps Cu(I) maintains a three-coordinate state even in the four-coordinate model, but complexes different residues over time. The results from the MD simulations of the three Cu(I)-bound HAH1 dimer models show that the QM-derived parameters used to construct the MM force field adequately described the system. The rmsd data show that the computationally generated structures maintain the same fold and Cu(I) binding affinity as the protein in vivo. After the MD simulations were completed, the question of deciphering the order of Cu(I) binding and

PAGE 59

[Plot panels: 4Coord, 3Coord B, 3Coord A, 2Coord, 2Coord bridge. Series: whole, backbone. x-axis: Simulation Time (ps); y-axis: rmsd (Å).]

Figure 3-4. Root-mean-squared deviations between the five solvated HAH1 models and the Cu(I)-bound HAH1 crystal structure as a function of time.


[Figure 3-5: five panels (including 4Coord, 2Coord, 3Coord B, and 3Coord A) plotting binding loop rmsd (Å) against simulation time (ps), each with one trace for the loop and one for the loop backbone.]
Figure 3-5. rmsd between the active site loop regions of the five solvated HAH1 models and the Cu(I)-bound HAH1 crystal structure as a function of time.
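The rmsd and rmsf quantities plotted and tabulated in this section can be illustrated with a short Python sketch. It assumes that the trajectory has already been superimposed on the reference structure and is available as a NumPy array of shape (frames, atoms, 3). The function names and the synthetic coordinates are hypothetical and are not taken from the AMBER analysis tools used in this work.

```python
import numpy as np

def rmsd_per_frame(traj, ref):
    """RMSD of every frame from a reference structure (same units as input)."""
    diff = traj - ref                        # shape (frames, atoms, 3)
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))

def rmsf_per_atom(traj):
    """RMS fluctuation of every atom about its time-averaged position."""
    mean_pos = traj.mean(axis=0)             # shape (atoms, 3)
    diff = traj - mean_pos
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=0))

# Synthetic illustration only: 4 atoms, 1000 frames.
# Atom 0 is rigid and atom 3 is the most mobile (hypothetical noise levels, in Å).
rng = np.random.default_rng(0)
ref = rng.normal(size=(4, 3))
sigma = np.array([0.0, 0.1, 0.2, 0.5])
traj = ref + rng.normal(size=(1000, 4, 3)) * sigma[None, :, None]

rmsd = rmsd_per_frame(traj, ref)   # one value per frame, as in Figures 3-4 and 3-5
rmsf = rmsf_per_atom(traj)         # one value per atom, as in Table 3-6
```

In this picture a solvent-exposed residue such as Cys 12 would correspond to an atom with a larger rmsf, and an interface residue such as Cys 15 to one with a smaller rmsf.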


release during Cu(I) transfer remained to be answered. As the next section describes, the thermodynamic details of Cu(I) transfer were elucidated through a series of free energy calculations using several different methods in AMBER.

Free Energy Calculations of the Cu(I)-Bound HAH1 Dimer

Since the HAH1 Cu(I) binding motif has been described using QM and MM methods, the data garnered from those studies can be used to estimate the thermodynamics of Cu(I) transport, so that more can be understood about the mechanism through which Cu(I) is transferred between the active sites of two metalloproteins. This study focused on proposing the order of Cu(I) binding and release based on an energetically favorable Cu(I) transfer pathway. As mentioned, the currently proposed mechanism suggests that Cu(I) is handed off from the donor to the target via a series of multi-coordinate Cu(I) intermediates during which the coordination number of the Cu ion is no lower than two.1,37 The experiments described here attempt to establish the order of Cu(I) binding to the target domain and the order of Cu(I) release from the active site of the donor. Also, this work suggests that a potential four-coordinate Cu(I) intermediate during ion transfer is energetically unfavorable.

In order to address these issues, the three Cu(I) cluster models were subjected to several calculations in Gaussian 03 so that free energy differences between the three models could be ascertained. In another experiment, the MM parameters defined for the system were used again in three kinds of free energy calculations using the AMBER suite. The three complete, solvated protein systems were subjected to thermodynamic integration calculations using single-topology mutations, free energy perturbation calculations, and potential of mean force calculations.


Quantum Energy Calculations on the Small Cu(I) Thiolate Cluster Models

One difficulty in performing free energy calculations on the model system was the difference in number of atoms between different Cu(I) coordination states. For instance, a two-coordinate Cu(I) model cluster (Cu(I)[HSCH3]2[SCH3-]2, overall charge -1) comprises twenty-three atoms, while the three-coordinate model (Cu(I)[HSCH3][SCH3-]3, overall charge -2) contains twenty-two atoms. Because of the difference in the number of atoms, a direct free energy comparison between the two models cannot be made. However, a comparison between models could be made by simulating an isodesmic reaction for protonation of a water molecule. An isodesmic reaction for the conversion of the two-coordinate model to the four-coordinate model, with the transfer of two protons from the unbound methylthiols of the two-coordinate structure to two water molecules, is shown below.

\[
\mathrm{Cu(I)[HSCH_3]_2[SCH_3^-]_2^{\,-1}} + 2\,\mathrm{H_2O}
\;\longrightarrow\;
\mathrm{Cu(I)[HSCH_3][SCH_3^-]_3^{\,-2}} + \mathrm{H_2O} + \mathrm{H_3O^+}
\;\longrightarrow\;
\mathrm{Cu(I)[SCH_3^-]_4^{\,-3}} + 2\,\mathrm{H_3O^+}
\]

Single point energies were calculated for the gas-phase-optimized structures in both the gas phase and the implicitly solvated phase. In this manner, ΔEsolvation could be calculated for each species in the reaction as well. The solvation correction to the gas phase is important, especially for the charged species. Figure 3-6 shows the reaction for Cu(I) becoming four-coordinate from the three-coordinate state and the energy differences between products and reactants, as well as ΔEsolvation for each molecule.

Figure 3-7 shows the reaction profile of changing the coordination of Cu(I) in the model systems. As expected, the addition of implicit solvent around the model mitigates the instability of the charged molecules in the gas phase. This is shown by the highly charged four-coordinate system having a much more favorable ΔEsolv than the two-coordinate


system. The figure illustrates the notion that the four-coordinate state is not energetically favorable, as it is 65.440 kcal/mol higher in energy than the three-coordinate model in the solvated state. Table 3-7 lists the energy differences between the different models in both the gas and solvent phases. The energy difference between the two-coordinate and three-coordinate models is lower by comparison at 48.531 kcal/mol. It should be noted that while two different three-coordinate structures can exist in the protein system, they are indistinguishable in the QM model. So while this experiment showed some relative energies between the three- and four-coordinate models, full protein simulations were needed to compare the energies of the two different three-coordinate states to each other.

[Figure 3-6: thermodynamic cycle for the reaction of the three-coordinate cluster with H2O to give the four-coordinate cluster and H3O+, drawn in the gas phase (ΔErxn,gas) and in solvent (ΔErxn,solv), with a ΔEsolv arrow connecting each gas-phase species to its solvated counterpart.]
Figure 3-6. The isodesmic reaction and solvation of three-coordinate Cu(I) to four-coordinate in the model system.
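The thermodynamic cycle of Figure 3-6 can be checked with a few lines of Python. The sketch below assumes that the solvent-phase reaction energy is the gas-phase reaction energy plus the solvation energy of the products minus that of the reactants; the function name is hypothetical, and the inputs are the rounded entries of Table 3-7, so agreement with the reported 48.5 and 65.4 kcal/mol is expected only to within rounding.

```python
def solvated_reaction_energy(e_rxn_gas, e_solv_products, e_solv_reactants):
    """Close the cycle: dE(rxn, solv) = dE(rxn, gas) + dE_solv(products) - dE_solv(reactants)."""
    return e_rxn_gas + e_solv_products - e_solv_reactants

# Rounded values from Table 3-7, in kcal/mol.
E_SOLV_2COORD = -59.1    # 2coord + 2 H2O
E_SOLV_3COORD = -254.0   # 3coord + H2O + H3O+
E_SOLV_4COORD = -770.0   # 4coord + 2 H3O+

e_2_to_3 = solvated_reaction_energy(244.0, E_SOLV_3COORD, E_SOLV_2COORD)
e_3_to_4 = solvated_reaction_energy(581.0, E_SOLV_4COORD, E_SOLV_3COORD)

print(f"2coord -> 3coord in solvent: {e_2_to_3:.1f} kcal/mol")
print(f"3coord -> 4coord in solvent: {e_3_to_4:.1f} kcal/mol")
```

Both numbers fall within about 1 kcal/mol of the tabulated solvent-phase values, which is consistent with the large gas-phase penalties being offset almost entirely by the solvation of the charged products.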


Table 3-7. Reaction energies and energies of solvation for the model systems.

Reaction energy              Gas     Solvent
ΔErxn 3coord to 4coord       581     65.4
ΔErxn 2coord to 3coord       244     48.5

Solvation energy             ΔEsolv
4coord + 2H3O+               -770
3coord + H2O + H3O+          -254
2coord + 2H2O                -59.1

Values in kcal/mol

[Figure 3-7: energy profiles (E gas and E solv) along the reaction coordinate 2coord + 2 H2O, 3coord + H2O + H3O+, 4coord + 2 H3O+.]
Figure 3-7. The relative energies of the species in the isodesmic reactions of the model systems in the gas phase (top) and in implicit solvent.

Free Energy Calculations on HAH1

Having developed the force field for the Cu(I) binding active site of HAH1, the challenge in setting up the free energy calculations lay in choosing which free energy method to apply and ensuring that the number of atoms in each simulation was the same for each different Cu(I) coordination state. The thermodynamic integration functionality of the sander program in AMBER 8 was chosen.

The purpose of this study was to identify an energetically favorable route for Cu(I) transfer between two three-coordinate, high-affinity Cu(I) binding sites. From the two-coordinate state, there are two possibilities


for Cu(I) transfer. Earlier, the difference between the two Cys residues in the Cu(I) binding site was discussed. Cys 12 is more solvent-exposed in the dimer state, while Cys 15 is along the interface between the donor and target Cu(I) binding domains and is not solvent-exposed. One possibility for Cu(I) transfer is for Cys 12 of the target to bind to the incoming Cu(I) ion first. The other option is for Cys 15 of the target to bind Cu(I) as the two domains come into close proximity.

The solvated, MD-equilibrated three-coordinate HAH1 dimer was initially used to create a starting point for the free energy calculations. This active site featured three Cu(I)-S bonds of approximately 2.3 Å along with a fourth, longer-distance Cu(I)---SH interaction to an unbound Cys residue at nearly 5.0 Å from the Cu(I) ion. In the initial state, both active site Cys residues of the donor domain bound Cu(I) while only one target Cys bound the Cu(I) ion. This structure was mutated in a stepwise fashion to the final structure, which used both target Cys residues to bind Cu(I) while only one donor Cys continued to coordinate the metal ion. Figure 3-8 shows the proposed scheme for Cu(I) transfer. State 0 is the initial state and State 1 is the final state. In a simple model system, such as that in the quantum work described earlier, there is no difference between State 0 and State 1 (Figure 3-8). However, when the entire solvated protein is simulated, conformational and solvation differences between the two states result in a free energy change as Cu(I) is transferred between the two active sites.

Although a simulation of the scheme in Figure 3-8 would yield a reliable value for the free energy difference between the two states, such a method is not currently applicable. The dual-topology free energy perturbation method was implemented in the gibbs program within AMBER. This method allows for the existence of dummy atoms in


both the initial and final states of the simulation. A substantial attempt was made to perform a dual-topology calculation on the scheme in Figure 3-8, but the simulation failed.

[Figure 3-8: schematic of State 0 and State 1, each showing Cu(I) surrounded by the sulfurs of Cys 12A, Cys 15A, Cys 12B, and Cys 15B; three Cu(I)-S distances are labeled 2.3 Å and the fourth, to a protonated sulfur, 5.0 Å.]
Figure 3-8. A scheme for Cu(I) transfer.

Since dual-topology could not be used to obtain the desired information, a single-topology approach was undertaken. Unlike a dual-topology approach, TI is not a continuous blending of initial and final reaction states as the simulation proceeds. Instead, TI calculations blend the reactant and product states in a series of discrete windows. These experiments used twelve windows to describe the reaction from start to finish. The first window would simulate a model whose character was very similar to the reactant, the sixth window would simulate a species whose character was nearly an equal blend of the force fields of the product and reactant, and the final window would simulate a system nearly identical to the product. Using TI in sander, it is possible to define an initial system with dummy atoms and mutate it to a structure without dummy atoms.

In order to compare two different pathways for Cu(I) transfer, this experiment combined a set of two TI calculations. One calculation took the State 0 structure of Figure 3-8 and mutated it into a four-coordinate structure as defined in the previous section. This


simulation represented initial binding by Cys 15 of the target monomer. A similar simulation mutated State 1 from Figure 3-8 into the same four-coordinate structure. This represented initial binding by Cys 12 of the target domain. Since the two TI simulations shared the same endpoint, the energies of the two reactions could be compared. Figure 3-9 depicts the two reaction paths followed in the TI simulations.

[Figure 3-9: two starting structures, labeled Model A and Model B, each a three-coordinate Cu+ site with one protonated, unbound target Cys (Cys 12B or Cys 15B), converging on an equivalent four-coordinate endpoint in which the former thiol hydrogen is a dummy atom (Du); the free energy difference between the two starting structures is indicated.]
Figure 3-9. Two separate TI simulations were performed to compare two different Cu(I) transfer pathways from HAH1 to MNK4. The two reactions shown have different starting points, but the same endpoint. The top reaction is referred to as Reaction 1 and the bottom is Reaction 2 in the discussion that follows.

Each state contains the same number of atoms, but some atoms may have different atom types. Dummy atoms are used as placeholders for the hydrogens that appear and disappear during the simulation. Dummy atoms have the same mass as hydrogen


atoms, but no charge. In each reaction, Cu(I) mutates from a three-coordinate state to a four-coordinate state, so its atom type must be changed. Initially, Cu(I) is defined as atom type PP, with bonds to S12A (atom type SA), S15A (SB), and S15B (SD). A weak interaction is defined between PP and S12B (SC). As the simulation progresses, the ion is mutated to type PQ, which is bonded to SA, SB, SD, and S12B (SF).

Other atoms that change atom type are the sulfur atoms of Cys 12B and Cys 15B. In the top reaction of Figure 3-9, sulfur 12B is assigned atom type SC. It is bound to Cβ of Cys 12B and a hydrogen of atom type HS (the default AMBER atom type for hydrogen bonded to S in Cys). SC also shares a weak interaction with Cu(I). The same atom is changed during the simulation to type SF, which is bound to a dummy atom, Cβ of Cys 12B, and Cu(I). The HS atom in the reactant becomes a dummy atom bound to sulfur SF in the product. SA, SB, and SD do not change atom type in the top reaction and are all considered to be equivalent. Bond angles SF-PQ-SX (X = A, B, or D) in the product are included in the SF and PQ force fields.

The mutation scheme in the lower reaction is similar. In Reaction 2 of Figure 3-9, the metal ion begins as atom type PP, is bound to S12A (SA), S15A (SB), and S12B (SC), and shares a weak interaction with S15B (SD). The sulfur of Cys 15B is initially identified as atom type SD, which is bound to a hydrogen (HS) and Cβ of Cys 15B and has a weak interaction with the metal center. SD is mutated into atom type SE, which bonds a dummy atom and Cu(I); the latter is labeled as atom type PQ in the product of Reaction 2. PQ in the lower reaction binds SA, SB, SC, and SE. In this reaction, SA, SB, and SC are treated equivalently and are not mutated in any manner.
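The windowed TI procedure described above can be sketched numerically. The text does not state where the twelve λ windows were placed or how they were integrated, so the sketch below assumes Gauss-Legendre quadrature, a common choice for TI, and uses a made-up polynomial in place of the ensemble-averaged dV/dλ that each window's MD simulation would supply. The function names are hypothetical.

```python
import numpy as np

def ti_free_energy(mean_dvdl_at, n_windows=12):
    """Estimate dG as the integral of <dV/dlambda> over lambda in [0, 1].

    mean_dvdl_at(lam) must return the ensemble average of dV/dlambda at lam;
    in a real calculation this comes from an MD simulation run at that window.
    """
    # Gauss-Legendre nodes and weights on [-1, 1], mapped onto [0, 1].
    x, w = np.polynomial.legendre.leggauss(n_windows)
    lambdas = 0.5 * (x + 1.0)
    weights = 0.5 * w
    dvdl = np.array([mean_dvdl_at(lam) for lam in lambdas])
    return float(np.dot(weights, dvdl)), lambdas, weights

def toy_dvdl(lam):
    """Hypothetical smooth <dV/dlambda> profile in kcal/mol (illustration only)."""
    return 30.0 - 40.0 * lam + 15.0 * lam ** 2

delta_g, lambdas, weights = ti_free_energy(toy_dvdl)
print(f"dG = {delta_g:.3f} kcal/mol from {len(lambdas)} windows")
```

With this placement the first and last windows sit close to, but not exactly at, the reactant and product, and the middle windows sample near-equal blends of the two force fields, which matches the qualitative description of the twelve windows given in the text.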


Even though no bond exists between Cys 12B and Cu(I) at the start of the top reaction in Figure 3-9, a bond must still be defined, because bonds can be neither created nor destroyed in an MD simulation. In essence, the 5.0 Å Cu(I)-S12B bond with a very weak arbitrary force constant in the reactant is being mutated into a 2.3 Å bond in the product with a well-defined force constant. The same thing occurs over the course of the simulation of the lower reaction for the Cu(I)-S15B bond. This treatment allows one bond to form with the target over the course of the simulation even though no actual bond exists in the initial state.

Tables 3-8, 3-9, and 3-10 list the atom types and force field parameters specified for the reactions shown in Figure 3-9. The parameters were derived from the three- and four-coordinate solvated and equilibrated structures from the long-timescale MD simulations outlined above. Table 3-11 shows the atomic charges for the atoms in the two reactions of Figure 3-9. The charges on the ligating Cys residues can be compared to the default AMBER charges for Cys given in Table 3-4. For brevity, entries referring to atom type SX mean that the parameter is the same for all sulfur atoms within the active site. For example, SX-C refers to any bond between a sulfur in the active site and Cβ of the ligating Cys residue. Likewise, in Table 3-10, in angles such as SX-PP-SC, SX includes any S in that active site that is not SC. Some bond parameters, such as those for HS-SX and C-SX, were adapted from the parm94.dat library instead of from QM calculations on the model clusters, as were some bond angle parameters such as C-S-HS.

The TI calculations were performed on both gas-phase and explicitly solvated proteins. This was done in order to determine the solvation energy of the protein and any differences in active site geometry that may be caused by solvent. There were some


differences between the solvated and gas-phase simulations. No periodic boundary conditions were applied to the gas-phase system, which also was given a high cutoff (> 20 Å) for nonbonding interactions. The solvated system employed periodic boundary conditions, and the nonbonding cutoff was kept at the default value of 8.0 Å. The simulation of the solvated system was performed at constant pressure, while the temperature scaling was set for constant energy dynamics. For the solvated system using constant pressure dynamics, anisotropic pressure scaling was used in conjunction with the TIP3P water box.

Table 3-8. Atoms, atom types, atomic masses, and van der Waals parameters used in TI simulations of Cu(I)-bound HAH1.

Atom                 Atom type   Mass (amu)   van der Waals radius (Å)   Well depth (kcal/mol)
Reaction 1
Cu (reactant)        PP          63.55        2.40                       0.05
Cu (product)         PQ          63.55        2.40                       0.05
S12A                 SA          32.06        2.00                       0.25
S15A                 SB          32.06        2.00                       0.25
S12B (reactant)      SC          32.06        2.00                       0.25
S12B (product)       SF          32.06        2.00                       0.25
S15B                 SD          32.06        2.00                       0.25
HS 12B (reactant)    HS          1.008        0.60                       0.015
HS 12B (product)     DU          1.00         0.00                       0.00
Reaction 2
Cu (reactant)        PP          63.55        2.40                       0.05
Cu (product)         PQ          63.55        2.40                       0.05
S12A                 SA          32.06        2.00                       0.25
S15A                 SB          32.06        2.00                       0.25
S12B                 SC          32.06        2.00                       0.25
S15B (reactant)      SD          32.06        2.00                       0.25
S15B (product)       SE          32.06        2.00                       0.25
HS 15B (reactant)    HS          1.008        0.60                       0.015
HS 15B (product)     DU          1.00         0.00                       0.00
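How a very weak 5.0 Å "bond" can be turned into a 2.19 Å bond over the course of a TI run can be illustrated with the Reaction 1 parameters reported in Table 3-9 (PP-SC in the reactant, PQ-SX in the product). The sketch assumes the AMBER harmonic bond form k(r - r0)^2 and a linear mixing of the two end-state potentials, V(λ) = (1 - λ)V0 + λV1. Whether the simulations in this work used exactly this mixing is not stated, and the function names are hypothetical.

```python
def bond_energy(r, k, r0):
    """AMBER-style harmonic bond energy, k * (r - r0)**2, in kcal/mol."""
    return k * (r - r0) ** 2

# Table 3-9, Reaction 1: (force constant in kcal/mol/A^2, equilibrium length in A).
REACTANT = (0.001, 5.000)   # PP-SC, the weak placeholder bond
PRODUCT = (60.000, 2.190)   # PQ-SX, the fully formed Cu(I)-S bond

def mixed_bond_energy(r, lam, reactant=REACTANT, product=PRODUCT):
    """Linearly mixed potential (assumed form): (1 - lam) * V0 + lam * V1."""
    return (1.0 - lam) * bond_energy(r, *reactant) + lam * bond_energy(r, *product)

def dv_dlambda(r, reactant=REACTANT, product=PRODUCT):
    """For linear mixing, dV/dlambda is simply V1 - V0."""
    return bond_energy(r, *product) - bond_energy(r, *reactant)

def effective_minimum(lam, reactant=REACTANT, product=PRODUCT):
    """Bond length that minimizes the mixed potential at a given lambda."""
    (k0, r0), (k1, r1) = reactant, product
    k_eff = (1.0 - lam) * k0 + lam * k1
    return ((1.0 - lam) * k0 * r0 + lam * k1 * r1) / k_eff

for lam in (0.0, 0.01, 0.5, 1.0):
    print(f"lambda = {lam:4.2f}: bond-term minimum at {effective_minimum(lam):.3f} A")
```

Under these assumptions the bonded term alone pulls the sulfur toward about 2.2 Å as soon as λ is slightly above zero, because the product force constant is so much larger than the reactant one. The nonbonded terms discussed later in this section act in addition to it.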


Table 3-9. Bond length parameters for the reactions used in TI calculations of Cu(I)-bound HAH1.

Bond      kbond (kcal/mol Å2)   r0 (Å)
Reaction 1
PP-SA     60.000                2.190
PP-SB     60.000                2.190
PP-SC     0.001                 5.000
PP-SD     60.000                2.190
HS-SC     274.000               1.336
C-SX      219.354               1.849
PQ-SX     60.000                2.190
DU-SF     274.000               1.336
Reaction 2
PP-SA     60.000                2.190
PP-SB     60.000                2.190
PP-SC     60.000                2.190
PP-SD     0.001                 5.000
HS-SD     274.000               1.336
C-SX      219.354               1.849
PQ-SX     60.000                2.190
DU-SE     274.000               1.336

The TI calculations do not encompass all of the contributions to the free energy change of the reaction. Because bonds cannot be broken or formed in MD simulations, weak interactions were described in places where bonds would be forming over the course of the simulations. This presents a problem in terms of how AMBER deals with bonding and nonbonding interactions. In AMBER, when two atoms share a bond, they are excluded from each other's nonbonding interactions. In our three-coordinate model, Cu(I) is supposed to be bonded to only three Cys ligands. However, since the fourth S had to be bonded to Cu with a weak interaction, its nonbonding interactions were being neglected. In reality, no bond exists between Cu and a fourth Cys ligand. To compensate for this, the topology files generated by LeAP had to be modified for the TI calculations so that the weakly bound S would be removed from the nonbonding exclusions of Cu and


its neighboring atoms. This allowed the metal ion and the other Cys ligands in the active site to have nonbonding interactions with the unbound Cys.

Table 3-10. Bond angle parameters for the reactions used in TI calculations of Cu(I)-bound HAH1.

Angle       kθ (kcal/mol rad2)   θ0 (deg)
Reaction 1
SA-PP-SB    50.000               109.50
SA-PP-SD    50.000               109.50
SB-PP-SD    50.000               109.50
C-SX-PP     93.700               109.50
SX-PP-SC    0.001                109.50
HS-C-SC     0.001                109.50
H-C-SC      100.000              109.50
C-SC-PP     0.001                109.50
C-SC-HS     43.000               96.00
SX-PQ-SX    50.000               109.50
C-SX-PQ     93.700               109.50
H-C-SF      100.000              109.50
C-C-SX      50.000               109.50
C-SF-DU     43.000               109.50
PQ-SF-DU    50.000               109.50
Reaction 2
SA-PP-SB    50.000               109.50
SA-PP-SC    50.000               109.50
SB-PP-SC    50.000               109.50
C-SX-PP     93.700               109.50
SX-PP-SD    0.001                109.50
HS-C-SD     0.001                109.50
H-C-SD      100.000              109.50
C-SD-PP     0.001                109.50
C-SD-HS     43.000               96.00
SX-PQ-SX    50.000               109.50
C-SX-PQ     93.700               109.50
H-C-SE      100.000              109.50
C-C-SX      50.000               109.50
C-SE-DU     43.000               109.50
PQ-SE-DU    50.000               109.50

While this approach solved one problem, it created another. TI simulations can only read in one topology file for each simulation. Therefore, once the fourth Cu(I)-S


bond formed by the end of the simulation, the atoms were still feeling the nonbonding interactions of the fourth Cys. As a correction to the TI calculation, the effects of forming that bond had to be determined. This was done using free energy perturbation. In separate simulations, the products of the two reactions in Figure 3-9 were used as the starting points for a FEP calculation. The perturbation would be the introduction of the vdW exclusions that were removed in the TI calculation.

Table 3-11. RESP charges used for TI calculations on Cu(I)-bound HAH1.

             Reaction 1              Reaction 2
Atom type    Reactant    Product     Reactant    Product
PP           0.6483      n/a         0.6483      n/a
PQ           n/a         1.3670      n/a         1.3670
SA           -0.8682     -1.0448     -0.8682     -1.0448
SB           -0.8682     -1.0448     -0.8682     -1.0448
SC           -0.8485     n/a         -0.8682     -1.0448
SF           n/a         -1.0448     n/a         n/a
SD           -0.8682     -1.0448     -0.8485     n/a
SE           n/a         n/a         n/a         -1.0448
HS           0.5470      n/a         0.5470      n/a
DU           n/a         0.0000      n/a         0.0000
N            -0.4157     -0.4157     -0.4157     -0.4157
H            0.2719      0.2719      0.2719      0.2719
Cα           -0.0351     -0.0351     -0.0351     -0.0351
Hα           0.0508      0.0508      0.0508      0.0508
Cβ           0.0168      0.1011      0.0168      0.1011
Hβ           0.0053      -0.0493     0.0053      -0.0493
C            0.5973      0.5973      0.5973      0.5973
O            -0.5679     -0.5679     -0.5679     -0.5679

Figure 3-10 displays the FEP scheme as a correction to the TI calculations. A four-coordinate model (the product of the TI simulation) with vdW exclusions removed between the newly bound Cys and the rest of the active site was evaluated using the Hamiltonian with the vdW exclusions intact. The FEP calculation was a trajectory analysis of the exclusions-removed structure using the exclusions-present Hamiltonian and only took


one step. The free energy difference between the initial and final states of the FEP simulations equaled the contribution to the total free energy of nonbonding interactions becoming bonding interactions as the new Cu(I)-S bond formed in the TI simulation.

[Figure 3-10: the four-coordinate product (Cu+ bound by Cys 12A, Cys 15A, Cys 15B, and Cys 12B, with a dummy atom, Du, on the newly bound sulfur) drawn twice, once with vdW exclusions removed and once with vdW exclusions in place.]
Figure 3-10. FEP vdW correction to TI on HAH1. The trajectory of the structure with vdW exclusions removed is evaluated with the Hamiltonian that has the vdW exclusions intact.

The FEP simulations revealed another contribution to the total free energy change between the three- and four-coordinate proteins. While the Cu(I)-S bond lengths for the three initially bound Cys ligands remained unchanged throughout the TI and FEP simulations, the new Cu(I)-S bond did not reach the correct length. This was due to the fact that AMBER did not allow the new S to come any closer than about 2.8 Å to the Cu(I) ion while the bonding interactions were turned off. In a sense, the penalty for removing the vdW exclusions was not only the omission of bonding interactions once the new bond had been formed, but also that the new bond was too long. The normal Cu(I)-S bond length was around 2.2 Å with a force constant of 60.00 kcal/mol Å2, but the newly formed Cu(I)-S bond was 2.8 Å with the same force constant.

Another correction to the TI simulation had to be made in the form of reeling in the newly bound S to the metal center. The energy profile of shortening the bond length could be generated by a potential of mean force calculation during which the products of the reactions listed in Figure 3-9 (with vdW exclusions in place) would again be used as starting structures. The four-coordinate structures featured three Cu(I)-S bonds of the


correct length, and one Cu(I)-S bond that was too long. The fourth Cu(I)-S bond would be contracted from 2.8 Å to 2.0 Å over the course of twenty-three windows in the PMF simulation. Figure 3-11 shows the reaction scheme for the PMF experiment, and Figure 3-12 shows the energy profile of shortening the final Cu(I)-S bond. A steep harmonic potential was imposed on the long Cu(I)-S bond with a minimum at 2.1 Å. The energy difference between the initial bond length and the minimum energy bond length on the PMF curve served as the third and final contribution to the free energy change of the reactions in Figure 3-9. The data from the PMF experiments were connected using weighted histogram analysis with the WHAM software.24,25

[Figure 3-11: the four-coordinate product (Cu+ bound by Cys 12A, Cys 15A, Cys 15B, and Cys 12B, with a dummy atom, Du, on the newly bound sulfur) drawn twice with vdW exclusions in place, with the new Cu(I)-S distance labeled 2.77 Å before and 2.03 Å after contraction.]
Figure 3-11. Bond length correction to FEP calculations by PMF: contract the Cu(I)-S bond length from ~2.8 Å to ~2.0 Å by PMF analysis of twenty-three windows.

The overall free energy change from three- to four-coordinate Cu(I) in HAH1 is the sum of the TI mutation, the FEP trajectory analysis for the vdW interaction correction, and the PMF for the bond length correction. Table 3-12 lists the free energy changes for the TI reaction shown above for both gas-phase and aqueous systems. As shown, the addition of solvent lowers the energy barrier of binding the fourth S to Cu(I). Recalling Figure 3-9, the endpoints of each reaction are equivalent. So the free energy difference between the two different three-coordinate reactants can be determined by taking the difference between the total free energy differences of their respective reactions.
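The single-step FEP correction described above, in which a trajectory generated with one Hamiltonian is re-evaluated with another, corresponds to exponential (Zwanzig) averaging of the energy difference over the sampled frames. The sketch below is a generic illustration of that estimator, not a reproduction of the AMBER implementation used here; the function name, the temperature of 300 K, and the per-frame energies are all hypothetical.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def fep_one_step(e_target, e_reference, temperature=300.0):
    """Zwanzig estimate dG = -kT ln < exp(-(E1 - E0) / kT) >_0.

    e_reference: per-frame energies from the Hamiltonian that generated the
                 trajectory (here, vdW exclusions removed).
    e_target:    the same frames re-evaluated with the other Hamiltonian
                 (here, vdW exclusions intact).
    """
    beta = 1.0 / (KB * temperature)
    du = np.asarray(e_target, dtype=float) - np.asarray(e_reference, dtype=float)
    shift = du.min()  # factor out the smallest difference for numerical stability
    return shift - np.log(np.mean(np.exp(-beta * (du - shift)))) / beta

# Hypothetical per-frame energies in kcal/mol, for illustration only.
e_removed = np.array([-1200.4, -1198.9, -1201.2, -1199.7])
e_intact = np.array([-1197.1, -1196.0, -1198.3, -1196.2])
dg_vdw = fep_one_step(e_intact, e_removed)
print(f"vdW exclusion correction: {dg_vdw:.2f} kcal/mol")
```

Because the average is exponentially weighted, the estimate can never exceed the plain mean of the energy differences, and it is dominated by the frames in which the two Hamiltonians agree most closely.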


[Figure 3-12: PMF curve; horizontal axis Cu-S bond length (Å) from 2.0 to 2.8, vertical energy axis from 0 to 16.]
Figure 3-12. PMF curve of solvated HAH1 showing a minimum energy Cu(I)-S12B bond length near 2.1 Å for the bonding of Cys 12B to Cu(I).

[Figure 3-13: PMF curve; horizontal axis Cu-S bond length (Å) from 2.0 to 2.8, vertical energy axis from 0 to 16.]
Figure 3-13. PMF curve for the binding of Cys 15B to Cu(I) in solvated HAH1, showing a minimum energy bond length of just over 2.1 Å for the Cu(I)-S15B bond.

Table 3-12 shows the free energy change of the reactions displayed in Figure 3-9, and Figure 3-14 plots the free energy difference between the two different three-coordinate states in the explicitly solvated protein. These values show that the Model A structure, in which Cys 15B of the target monomer binds Cu first, is energetically favored over Cys 12B binding Cu first by 24.7 kcal/mol.

Table 3-12. Free energy changes for TI calculations on the reactions shown in Figure 3-9.

                                  TI ΔG (kcal/mol)
Model A: Cys 15B binding 1st
  Gas                             224.2
  Solvent                         177.7
Model B: Cys 12B binding 1st
  Gas                             213.9
  Solvent                         153.0

[Figure 3-14: free energy diagram for the solvated reactions showing the 2coord, 3coord A, 3coord B, and 4coord states, with 3coord A and 3coord B separated by 24.7 kcal/mol.]
Figure 3-14. The free energy difference by thermodynamic integration between the different three-coordinate Cu(I)-bound HAH1 dimers.

Table 3-13. The free energy difference of changing the coordination environment of Cu(I) in HAH1.

                  Cys 12B unbound   Cys 15B unbound   ΔΔG
Gas               255.5             252.8             -2.7
Solvent           219.5             193.8             -25.7
Solvent effect                                        -23.0

Values are in kcal/mol.
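Because both reactions end at the same four-coordinate state, the relative free energy of the two three-coordinate starting structures follows from simple differences of the tabulated values. The short sketch below reproduces the 24.7 kcal/mol quoted in the text from Table 3-12 and the differences listed in Table 3-13; the dictionary layout and variable names are hypothetical.

```python
# TI free energy changes from Table 3-12, in kcal/mol.
TI_DG = {
    "Model A": {"gas": 224.2, "solvent": 177.7},   # Cys 15B binding first
    "Model B": {"gas": 213.9, "solvent": 153.0},   # Cys 12B binding first
}

# Shared endpoint: G(start B) - G(start A) = dG(A) - dG(B).
ti_gap_solvent = round(TI_DG["Model A"]["solvent"] - TI_DG["Model B"]["solvent"], 1)

# Free energy changes from Table 3-13, in kcal/mol.
TABLE_3_13 = {
    "gas": {"Cys 12B unbound": 255.5, "Cys 15B unbound": 252.8},
    "solvent": {"Cys 12B unbound": 219.5, "Cys 15B unbound": 193.8},
}

def difference(phase):
    """Cys 15B unbound minus Cys 12B unbound for one phase."""
    row = TABLE_3_13[phase]
    return round(row["Cys 15B unbound"] - row["Cys 12B unbound"], 1)

gap_gas = difference("gas")
gap_solvent = difference("solvent")
solvent_effect = round(gap_solvent - gap_gas, 1)

print(ti_gap_solvent, gap_gas, gap_solvent, solvent_effect)
```

The solvent-phase gap is roughly ten times the gas-phase gap in Table 3-13, which is the numerical basis for attributing the preference between the two three-coordinate states largely to solvation.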


Conclusions

The results from the QM and MM studies on the model systems and the HAH1 dimer can be interpreted to suggest an energetically favorable order of Cu(I) transport between the active site of a donor HAH1 monomer and the active site of the fourth domain of the Cu(I) receptor MNK. The QM calculations done in the first part of this experiment created a foundation for the description of the Cu(I) binding site in HAH1. Further QM work detailed the thermodynamics of Cu(I) thiolate clusters as models of the active sites of the MT/CXXC family of Cu(I)-binding metalloproteins. The first part of the molecular dynamics study was to create a force field to describe the atoms involved in Cu(I) binding in HAH1 based on the QM calculations. Then, MD simulations were performed with the new force field. Analysis of these simulations showed the accuracy and reliability of the new force field parameters. The final stage of the experiment was an investigation of the free energy of Cu(I) transport between two metal binding domains.

The HAH1 dimer was used as a model for Cu(I) transfer from the active site of a HAH1 monomer to the fourth domain active site of the Wilson's disease protein. In this model, Cys 12A and Cys 15A of the HAH1 dimer represent the donor active site, while residues Cys 12B and Cys 15B represent the metal binding site of MNK4, where the corresponding residues are Cys 14 and Cys 17, respectively. Mechanistically, the free energy calculations suggest that when a Cu(I) ion is being transferred from the HAH1 binding site to the MNK4 site, it is energetically more favorable for Cys 17 of the MNK protein fourth domain to bind the incoming Cu(I) before the more solvent-exposed Cys 14 does. Physically, this makes sense because the solvent-exposed Cys 14 is farther away from the protein-protein interface than Cys 17 on the target domain and because solvent interactions would stabilize Cys 14 on the surface of the protein. At that point, Cys 12 of the donor domain would


start releasing Cu(I) as Cys 14 of MNK4 started to bind the ion. There is no evidence that a purely four-coordinate Cu(I) species exists during copper transport. This is supported by the QM results early in the study. Instead, it appears that the Cu(I) ion is nearly always three-coordinate as it is transferred between the two proteins. In the proposed transfer mechanism, Cys 15 of HAH1 is the last donor residue to release the copper ion. When copper transfer is complete, HAH1 is no longer bound to copper and the active site of MNK4 complexes the Cu(I) ion.


CHAPTER 4
ELECTRONIC STRUCTURE OF THE ACTIVE SITE OF AMINOPEPTIDASE FROM Aeromonas proteolytica

AAP Introduction

Zinc-dependent peptidases such as bovine lens leucine aminopeptidase (bLAP), carboxypeptidase A, thermolysin, and the aminopeptidase from Aeromonas proteolytica (AAP) play important roles in tissue repair, carcinogenesis, protein maturation, cell cycle control, the regulation of hormone levels,56,57 and the degradation of DNA, RNA, phospholipids, and polypeptides.58 Improper functioning of aminopeptidases has been linked to health issues including aging, cataracts, inflammation, cystic fibrosis, cancer, and leukemia.56-60 Despite the variety of cellular processes in which aminopeptidases are involved, not much was known about their exact functions or mode of action until recently. Peptidases such as carboxypeptidase A and thermolysin, which utilize a sole Zn2+ ion for catalysis, have been extensively studied and their modes of action are relatively well understood.56,58,60

Aminopeptidase from Aeromonas proteolytica is a dinuclear metallohydrolase which employs two Zn2+ ions to catalytically cleave the N-terminus of a polypeptide chain. Its small size (~32 kDa), high thermal stability, and functionality as a monomer61,62 led to AAP being one of the first peptidases to be isolated and characterized in detail.63 Substituting the spectroscopically silent Zn2+ ions with Co2+ or Cu2+ allowed for further kinetic and mechanistic studies on the protein and did not adversely affect catalytic activity.63,64 In fact, some hyperactive species of AAP were created by these substitutions.65


Native AAP (Figure 4-1) contains two Zn2+ ions in the active site, but can perform its function at 80% efficiency with only one Zn2+ present.66 In fully functioning AAP, both cations are present and perform some catalytic function. The reason why some peptidases function in a mononuclear capacity while others require multiple ions for full efficiency is not yet understood.66 The binding pocket in AAP has been shown to bind all N-terminal amino acids and can accommodate all penultimate residues except Glu and Pro. Being largely hydrophobic in nature,67 the active site preferentially binds hydrophobic residues, with Leu being the most easily cleaved.62

[Figure 4-1: two drawings of the dinuclear AAP active site, each showing Zn1 and Zn2 coordinated by Asp117, Asp179, Glu152, His97, and His256 with a bound inhibitor.]
Figure 4-1. AAP active site inhibited by Tris (left) and BuBA. Investigation of the X-ray structures of these complexes shed light on substrate conformation and a potential mechanism for peptide hydrolysis in AAP. PDB IDs 1LOK, 1CP6.

The metal-binding pocket of AAP is characterized by several Asp, Glu, and His residues which coordinate the Zn cations. X-ray crystallographic studies on native AAP have predicted a tetrahedral (Td) geometry for both cations when no substrate is present,68 although in its closed-shell electronic state Zn2+ shows no preference for either octahedral (Oh) or Td geometry.57 Beyond the divalent cations, other catalytically important features of the binding site include the bridging water/hydroxide molecule and Glu151, each of which has a potentially important role in the proposed catalytic mechanism of AAP. In 1992, Chevrier et al. were the first to produce a high resolution (1.8 Å) crystal structure


68 of native AAP. This pioneering work not only showed that the active site was dinuclear, but it also identified the ke y first shell Zn-complexing residues Asp117, Asp179, Glu152, His256, and His97.67 Further high resolution crystallo graphic studies on inhibitor-bound AAP carried out by several groups have si nce clarified the roles of second shell complexing residues such as Glu151, Tyr225, Ser228, Cys227, and Asp99, the importance of the bridging water/hydroxide a nd other water molecules in the active site, active site coordination geometries upon substrate binding, and have lead to the proposal of several catalytic mechanisms.62,66,69-71 Zn1OH O O Glu151O N N His256O Glu152H Zn2Asp117O N N O Asp179O O His97 F-O O Glu151H Zn2Asp117O N N O Asp179O O His97 Zn1F O N N His256O Glu152 Figure 4-2. In fluoride inhibition st udies of AAP, it was shown that a Fion displaces a terminal hydroxide group, deactivating the enzyme. Several inhibition studies have been perfor med on this system to complement the crystallographic work. The twof old purpose of these studies ha s been to both characterize the nature of the inhibited protein and to investigate possible drug candidates for enzyme inhibition. Beyond the preference for hydrophobic residues in the bindi ng cleft, potential substrates should have a free -amino group in the L-configuration.57 At present, several well-known peptide inhibitors have been shown to inhibit AAP. Potent inhibitors include L-leucinethiol, hydroxamates, -hydroxyamides, and notably 1-butaneboronic acid (BuBA) and Tris.57,59,62,63,72-76 Inhibitor binding to both cations is not necessary for AAP inhibition, and X-ray structures of both th e Trisand BuBA-inhibited enzyme (Figure 4-

PAGE 83

2) have revealed that the water/hydroxide bridge between the cations is broken.72 These data suggest that the μ-aqua bridge is broken to form a terminal hydroxyl group at some point during peptide hydrolysis in order for the enzyme to function properly. This hypothesis is confirmed by fluoride inhibition studies of native AAP.57,77 A single fluoride ion binds to Zn1 in the active site (Figure 4-3), displacing the terminal water/hydroxyl group after substrate binding, and the reaction does not proceed. Inactivation only occurs after substrate binding, suggesting that a terminal hydroxyl group is not present until the carbonyl oxygen of the activated scissile bond has bound to Zn1 and peptide hydrolysis is underway. Interestingly, chloride ions do not inhibit AAP up to a 2 M concentration because they do not bind with sufficient strength to the cation in the active site.57,77

The highest resolution structure of AAP was obtained in 2002 by the Petsko lab.59 The 1.2 Å structure of native AAP in Tris buffer reduced the amount of structural uncertainty due to side chain motion, determined the position of several hydrogen atoms in the protein, and clarified to some degree the geometry of the Tris-bound active site.59 Of note is how the distances between the Zn ions and the bridging O atom change between the unbound native structure 1AMP and the Tris-bound structure 1LOK. In the unbound protein, the Zn1-O and Zn2-O distances are 2.29 and 2.25 Å, respectively. The Tris-bound active site reveals Zn1-O and Zn2-O distances of 1.95 and 2.21 Å. The change in Zn-O distances may be further evidence of the conversion of the bridging water/hydroxide group into a terminal moiety.

Both cations must be present in the active site in order for AAP to be fully efficient, and they must have a task to perform during peptide hydrolysis. Zn2+ has been shown to be a hard acid,78 and the mono-zinc environment has been shown by Christianson and

PAGE 84

Cox to reduce the pKa of a single water molecule from 15.7 in bulk solvent to 9.0.59,79 The pKa of bound water in a di-zinc environment is expected to be much less than 9.0. It has been proposed that the two cations, each acting as a Lewis acid, perform separate but equally important functions in the reaction cycle.

Figure 4-3. A proposed mechanism for AAP peptide hydrolysis showing proton transfer to Glu151, formation of a terminal hydroxyl group, a gem-diolate intermediate, donation of a proton back to the leaving amino group, and reformation of the water/hydroxide bridge. Adapted from Petsko.59

A common thread between several proposed mechanisms has been that the N-terminal amino group binds to Zn2 and Zn1 binds the carbonyl oxygen of the activated scissile bond. The mechanism proposed by Stamper et al. based on kinetic, crystallographic, and spectroscopic studies shows Zn1 binding to the carbonyl group of the scissile bond, followed by the N-terminal amino group binding to Zn2.57,66,71,77,80-82 Holz reasons that carbonyl binding occurs prior to amino binding as a result of inhibition studies of LeuSH on [CoCo(AAP)].57,73 The observed geometry of boron in the study of BuBA-inhibited AAP is further evidence of

PAGE 85

this binding sequence.57,70 As stated earlier, other key players in AAP peptide hydrolysis are Glu151 and the bridging water/hydroxide group. In proposed mechanisms, a bridging or terminal OH- would serve as a nucleophile and Glu151 would act as a general base.57,61,62,66,71,77

Beyond the observations made from fluoride inhibition studies, there is further evidence of a terminal hydroxyl group. BuBA inhibition studies show that when a substrate is bound, the distance between Zn2 and Asp117 and Asp179 is decreased to 3.0 Å from 3.4 Å in the native structure. The decreased distance allows for the formation of a strong hydrogen bond between Glu151 and His97 that does not exist in the unbound protein. Substrate binding also brings Asp99 closer to His97, creating yet another hydrogen bond. The proximity of the two negatively charged residues to Zn2, along with the newly formed H-bonds, effectively stabilizes the charge neutrality of Zn2 and regulates its Lewis acidity. A sufficient decrease in the acidity of Zn2 would facilitate the formation of a terminal water/hydroxide group on Zn1.57,70,83

Chen et al.77 suggested that Glu151 assists in the deprotonation of a terminal water molecule, resulting in a nucleophilic hydroxo moiety, followed by attack by that group on the carbonyl carbon of the scissile peptide bond, forming a gem-diolate intermediate characterized by two oxygens binding to Zn1. The gem-diolate is stabilized through its interaction with both Zn ions.59 At this point, the reaction proceeds toward completion with Glu151 donating a proton back to the penultimate amino group (now the N-terminus of the leaving group), which departs the binding cleft upon C-N bond cleavage, the rate-limiting step of peptide hydrolysis.77 The final step is the reformation of the water bridge between Zn1 and Zn2 as

PAGE 86

the active site returns to its native unbound conformation. Other publications have since supported the mechanism proposed by Chen et al.57,59,84

Many mechanisms have been suggested for peptide hydrolysis by AAP, and there are some contentious points among them. Overall, assumptions are made about the protonation states of the water bridge and Zn-binding residues, and about the conformation of the substrate in the active site. Desmarais et al. contend that uncertainties in the reaction mechanism cannot be clarified without more detailed knowledge of the electronic structure and protonation state of the metal ions, water molecules, and residues in the immediate active site.59 Despite their exhaustive QM/MM study of the AAP peptide hydrolysis mechanism, Schürer et al. suggest that molecular dynamics simulations of the protein are needed to take accurate account of conformational movements of the protein and substrate.71 While Schürer et al. suggest that high-level ab initio or DFT studies on the complete AAP active site would be prohibitively expensive, such calculations have been performed in this study. A series of those calculations has produced some data about the electronic structure and geometry of several active site protonation states.

Effects of 1st-Shell Mutations

Numerous inhibition, crystallographic, kinetic, and computational studies have been performed on the aminopeptidase from Aeromonas proteolytica (AAP) in order to gain a better understanding of the mechanism of the peptide hydrolysis reaction catalyzed by the enzyme.57,59,62,66,67,71,84 However, the research performed on AAP to this point has yet to answer key questions regarding the protonation state of the Zn-Zn bridging species in the native and active states of the enzyme, the role of Glu151 in the reaction, and the electronic structure of the dinuclear center. Crystallization of Tris-inhibited AAP and its structural characterization by XRC to a resolution of 1.2 Å yielded new information

PAGE 87

about the side chain conformations of several residues in inhibited AAP as well as the positions of some hydrogens in the enzyme.59 However, that study was not able to determine the nature of the Zn-Zn bridge in the protein. Schürer, Lanig, and Clark completed a detailed QM/MM study of the AAP peptide hydrolysis mechanism by determining relative energies of several possible intermediate and transition state species using AM1 and VAMP for the QM and MM regions, respectively.71

This computational work entails fully quantum geometric and energetic optimization of the AAP active site using B3LYP/6-31G* in Gaussian 03.85 Here, data are presented from these studies pertaining to the electronic structure and coordination of the di-zinc-containing AAP active site. The active site model employed in these calculations is similar in nature to the one investigated by Schürer et al., comprising the side chains of Asp117, Asp179, His256, His97, Glu152, Glu151, two Zn2+, the bridging species, and crystallographic water molecules within the active site. However, none of the structures of 1st-shell mutations include an inhibitor molecule or the second-shell residues Asp99, Cys227, Ser228, and Tyr225. The initial active site geometry was obtained from the 1.8 Å resolution crystal structure of AAP (Figure 1, PDB-ID 1AMP) obtained by Chevrier et al. in 1992.67 This structure was modified by the removal of the backbone atoms of each residue except for the bridging atoms between Glu151 and Glu152, and by the addition of one or two protons to the bridging oxygen. Other structures were created by protonating Glu151 at the oxygen closest to the di-zinc bridge. The initial structures were used to generate B3LYP/3-21G* optimized geometries in Gaussian 03. Those models were in turn used as starting structures for the final B3LYP/6-31G* optimization. Single point

PAGE 88

energy calculations at a higher level of theory, such as MP2, have not yet been attempted because the computational expense of such calculations would be too high, and the energies resulting from the geometry optimizations performed here are sufficiently accurate for future mechanistic studies.

Figure 4-4. The general model for the QM work is the AAP active site from PDB structure 1AMP, the 1.8 Å resolution structure elucidated by Chevrier and Schalk.67 Asp117 is below the two Zn ions, with Zn2 on the left and Zn1 on the right. The residue at the top of the active site is Glu151. Zn2 is bound to His97 and Asp179, and Zn1 is complexed with His256 and Glu152.

Ultimately, the goal of this work is threefold: to investigate the different protonation states of the water/hydroxide bridge, to measure C-O bonds in the Zn-coordinating carboxylate residues, and to gauge the importance of Glu151 as a proton acceptor in the initial stages of AAP peptide hydrolysis.

By performing calculations on an array of protonation states, the relative energies between possible intermediates of the hydrolysis reaction could be determined, namely in a potential initial proton transfer from the water bridge to Glu151. The first

PAGE 89

system that was investigated was one where a water bridge exists between Zn1 and Zn2. This species was compared to an active site with a hydroxide bridge and a protonated Glu151, with both models containing 78 atoms. The relative energy of the optimized structure of the hydroxide-bridged state shows that it is energetically favored over the water-bridged state by more than 4.10 kcal/mol. However, upon inspection of the optimized water-bridged structure, it was seen that one hydrogen from the water bridge transfers to Glu151. Another feature of the optimized water-bridged structure is the formation of interactions between E152 and H256 and between Zn1 and a crystallographic water that was retained in the active site. The optimized structure of the hydroxide-bridged model with E151 initially protonated reveals protonation of D179. D179 gains a proton from one of the water molecules retained in the active site, while the hydroxide ion formed by that deprotonation interacts with H256 on the other side of the active site. In the end, the energy difference between the two 78-atom models may be attributed to the different interactions and conformations that form during the optimization.

The next study compared a model with an initial hydroxide bridge and unprotonated Glu151 to an oxygen-bridged active site with Glu151 protonated, with each model containing 77 atoms. The hydroxide-bridged model is 4.62 kcal/mol more favorable. Upon comparison of multiple species, the structures with an OH- bridge are lower in energy than either the O2--bridged model or the H2O-bridged model. This trend was also observed during a simple single point energy comparison between different protonation states of the native crystal structure without geometry optimization. This suggests that favorable intermediates in the reaction mechanism may all have bridged or terminal OH- species as opposed to O2- or H2O bridges.
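The relative energies quoted above are differences between total energies of models with the same number of atoms, expressed in kcal/mol. A minimal Python sketch of that bookkeeping is given here. The total energies in the example are invented placeholders (hypothetical values, not results from this work), the function and model names are illustrative only, and the conversion factor of 627.5095 kcal/mol per hartree is the standard one.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard hartree -> kcal/mol conversion


def relative_energies_kcal(total_energies_hartree):
    """Return each model's energy relative to the lowest one, in kcal/mol.

    Only models with identical atom counts should be compared this way.
    """
    lowest = min(total_energies_hartree.values())
    return {
        name: (energy - lowest) * HARTREE_TO_KCAL_PER_MOL
        for name, energy in total_energies_hartree.items()
    }


# Hypothetical total energies (hartree) for two same-size models; placeholders only.
example = {
    "OH- bridge, Glu151-H": -100.010,
    "H2O bridge, Glu151": -100.000,
}
relative = relative_energies_kcal(example)
for name, value in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.2f} kcal/mol")
```

Taking the lowest-energy model as the zero makes every other entry a positive energy penalty, which is how the comparisons in this section are phrased.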

PAGE 90

These data generally support the previous proposal that a high-energy water-bridged active site would initially donate one of its hydrogens to Glu151 in order to produce a nucleophilic hydroxide-bridging species in an exothermic process.77 Further interpretation of these results suggests that an initial hydroxide bridge would not donate its hydrogen to Glu151. In that instance, Glu151 would not have a proton that it could donate to the N-terminus of a polypeptide chain within the active site as proposed. Both systems reveal the stability of a hydroxide bridge over an oxyl- or water-bridge between Zn1 and Zn2. In all cases, the crystallographic water molecules in the active site work together with the bridging species and the surrounding carboxylate residues to establish a robust hydrogen-bonding network within the active site. A more detailed discussion of the optimized geometries of several models is included below.

Table 4-1. Electrons in the side chains of Asp117 and Asp179 are equally delocalized over the carboxylic acid region, while Glu151 and Glu152 side chains contain one C-O bond with more electron density than the other.

                      Asp117          Asp179          Glu151          Glu152
                      C-O 1   C-O 2   C-O 1   C-O 2   C-O 1   C-O 2   C-O 1   C-O 2
H2O bridge            1.28    1.26    1.26    1.28    1.24    1.31    1.36    1.22
OH- bridge            1.27    1.26    1.26    1.27    1.28    1.32    1.30    1.24
O2- bridge            1.27    1.27    1.25    1.28    1.22    1.33    1.28    1.25
GluH + O2- bridge     1.26    1.27    1.24    1.30    1.22    1.33    1.28    1.25
GluH + OH- bridge     1.27    1.27    1.24    1.30    1.23    1.33    1.29    1.25
X-ray                 1.22    1.24    1.22    1.22    1.24    1.23    1.23    1.22

Values in Å.

C-O bond lengths for Asp117, Asp179, Glu151, and Glu152 are listed in Table 4-1. It is clear that both C-O bonds equally share the electrons in the carboxylic regions of the aspartic acid residues, as the bond lengths are nearly identical. The acidic regions of the Glu residues do not share this feature. Instead, one C-O bond is clearly higher in electron

PAGE 91

density, while the other C-O bond is comparatively longer. These data suggest that the oxygens in Asp117 and Asp179 are coordinating equally into the metallic center of the active site. On the other hand, only one oxygen of the side chains of Glu151 and Glu152 is coordinating with the zinc centers. In the case of Glu151, the side chain serves a role in the H-bonding network that exists throughout the active site.

Along with the protonation state of the bridging species, the metal-binding carboxylic amino acids Asp117, Asp179, and Glu152 are interesting research subjects. An analysis of C-O bond lengths and O-Zn distances helps describe bond order and electron density, and metal ion coordination, respectively. Measurement of the C-O distances in the carboxylic acid side chains of the Zn-coordinating residues yields information about the localization of electrons within the carboxyl regions of the coordinating residues.

Table 4-2 lists Zn-Zn, Zn-O, and Zn-Asp117 distances for the structures shown in Figures 4-5 and 4-6. Generally, Zn-Zn distances are around 3.3 Å, which is shorter than the inter-zinc distance in the 1AMP crystal structure. The one exception is structure 4-6f, in which a terminal peroxo group exists and the two zinc ions are separated by more than 4.0 Å.

Table 4-2. Several distances are shown for Zn-Zn and Zn-O interactions for the structures shown below in Figures 4-6 and 4-7.

                      Zn1-Zn2   Zn1-O   Zn2-O   D117-Zn1   D117-Zn2   E151-O
H2O bridge            3.31      1.96    2.02    1.99       2.00       2.55
OH- bridge            3.28      1.96    1.97    1.96       2.00       3.28
O2- bridge            4.23      1.97    4.03    2.02       2.01       3.27
GluH + O2- bridge     3.37      2.05    1.99    2.01       1.98       2.71
GluH + OH- bridge     3.26      1.96    1.95    1.96       1.97       2.77
X-ray                 3.47      2.25    2.29    2.05       2.01       3.30

Values in Å.

PAGE 92

Figure 4-5. B3LYP/6-31G* optimized geometries of two models of the AAP active site. Asp117 is shown in the upper right of both pictures, binding each Zn. Zn1 is the ion on the left and Zn2 on the right in each structure. The structure on the left is from an originally water-bridged structure in which Glu151 has gained a H, while the structure on the right started with an OH- bridge and Asp179 gains a H from a crystallographic water.

The geometry optimizations of several variations of this active site model are discussed here. Generally, the starting structures are the same in each case, and the first three models we investigated differ only by the protonation state of the bridging group while Glu151 is unprotonated (Figure 4-6). As shown in Figure 4-5a, an initial water bridge with an unprotonated Glu151 is optimized to an OH- bridge as the initial water loses a hydrogen to nearby Glu151. The Zn-Zn distance decreases from the 3.47 Å shown in the original crystal structure to 3.31 Å in the optimized structure. The optimization of the second model (Figure 4-5b) is more complicated, as the initial OH- bridge and unprotonated Glu151 become a bridging O2H peroxo species and Glu151 and Glu152 are both protonated. It appears that a crystallographic water and the initial bridging OH- donate one H each to Glu151 and Glu152.
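The two-stage optimization protocol described for these models (B3LYP/3-21G* first, then B3LYP/6-31G* from the resulting geometry) can be sketched as a small Python helper that writes Gaussian input text. This is only an illustration under stated assumptions: the checkpoint file name, title, charge, multiplicity, and the water placeholder geometry are hypothetical, and passing the geometry between stages with Geom=Check and Guess=Read is one possible choice rather than a record of the input files actually used in this work.

```python
def gaussian_opt_input(chk, basis, title, charge, multiplicity, atoms=None):
    """Build the text of a Gaussian B3LYP geometry optimization input.

    If atoms is None, the geometry and orbital guess are read from the
    checkpoint file (Geom=Check Guess=Read) instead of being listed.
    """
    route = f"# B3LYP/{basis} Opt"
    if atoms is None:
        route += " Geom=Check Guess=Read"
    lines = [f"%chk={chk}", route, "", title, "", f"{charge} {multiplicity}"]
    if atoms is not None:
        lines += [f"{sym} {x:.6f} {y:.6f} {z:.6f}" for sym, x, y, z in atoms]
    lines.append("")  # Gaussian expects a blank line after the molecule block
    return "\n".join(lines) + "\n"


# Placeholder geometry (a single water molecule), NOT the AAP active site model.
placeholder_atoms = [
    ("O", 0.000, 0.000, 0.117),
    ("H", 0.000, 0.757, -0.469),
    ("H", 0.000, -0.757, -0.469),
]

# Stage 1: small basis set from explicit starting coordinates.
stage1 = gaussian_opt_input("model.chk", "3-21G*", "stage 1", 0, 1, placeholder_atoms)
# Stage 2: larger basis set, restarting from the stage 1 checkpoint.
stage2 = gaussian_opt_input("model.chk", "6-31G*", "stage 2", 0, 1)
print(stage1)
print(stage2)
```

Pre-optimizing with the smaller basis set is a common way to reduce the number of expensive optimization steps needed at the larger basis set.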

PAGE 93

Figure 4-6. Starting structures (left) and B3LYP/6-31G* optimized geometries for different Zn-Zn bridging species within the active site of AAP: a) a water bridge, b) a hydroxyl bridge, c) an oxyl bridge.

This type of bridging group has not yet been discussed as a possibility in the proposed reaction schemes. However, the formation of the peroxo group may be one consequence of not modeling an inhibitor into the active site. Neither this optimized structure nor any other optimized structure containing a peroxo bridge was shown to have the lowest relative HF energy among structures with a similar number of atoms.

The last model of the first group (Figure 4-6b) contains a bridging O2- ion. This optimization results in the formation of two terminal species. On Zn1, a terminal peroxo group forms, similar in nature, but not in geometry, to the peroxo group formed during the optimization of structure

PAGE 94

4-5b. Then, one of the crystallographic water molecules becomes terminal to Zn2. Glu151 becomes protonated as the water molecule that helps to form the terminal peroxo group on Zn1 donates one of its hydrogens to it. The final two models that were studied both start with a neutral Glu151 while differing by an O2- (Figure 4-6a) and an OH- (Figure 4-6b) bridge.

Figure 4-7. Initial structures (left) and B3LYP/6-31G* optimized geometries for models with Glu151 protonated. The starting structures vary only in the protonation state of the bridging group. Structure a) contains an O2- bridge and the Zn ions in c) are bridged by a hydroxide group.

When model 4-6a is optimized, the formation of another peroxo group is observed as a crystallographic water donates a hydrogen to Asp179 and the remaining OH- binds to the original bridging O2-. Glu151 remains protonated throughout the optimization. Structure 4-6b forms yet another interesting structure upon optimization. In this case, both Glu151 and the OH- bridge retain their original protonation states. However, one of

PAGE 95

the crystallographic water molecules donates a hydrogen to Asp179 while the remaining OH- group complexes with His256.

Conclusions

Here, the initial efforts to detail the 1st-shell electronic structure, geometry, and protonation states of the active site of the aminopeptidase from Aeromonas proteolytica have been described. However, much work remains to be done until a complete picture of the mechanism of peptide hydrolysis in AAP can be revealed.

One purpose of this study was to investigate the different protonation states of the water/hydroxide bridge. To that end, many model active sites were created, each containing one of the three possible bridging species. In each case, the bridge interacted to some extent with the surrounding crystallographic waters in the active site. In some cases, a hydrogen-bonding network was established which helped to stabilize the active site structure. Some minimum energy structures also contained a peroxo species that resulted from an oxo bridge interacting with an active site water molecule. Finally, the DFT minimization calculations suggested that a hydroxide bridge was the most energetically stable, supporting some mechanistic studies previously done by other groups.

Another facet of the AAP study was to measure C-O bonds in the Zn-coordinating carboxylate residues. Each carboxylate side chain that complexes a Zn(II) ion can do so in either a monodentate or bidentate manner. Equivalent C-O bond lengths suggest that the electronic character of the side chain is distributed evenly throughout the carboxylate region and that each partially negative oxygen is interacting with a Zn cation. Residues in which one C-O bond is noticeably longer than the other indicate a residue that binds a metal cation in a monodentate fashion.
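The bond-length criterion described above can be illustrated with a short Python sketch that flags each carboxylate as symmetric or asymmetric from its two C-O bond lengths. The bond lengths are copied from the H2O-bridge and OH- bridge rows of Table 4-1; the 0.03 Å cutoff and the function name are assumptions introduced here for illustration and are not part of the original analysis.

```python
# C-O bond lengths (angstroms) copied from Table 4-1 for two optimized models.
CO_BOND_LENGTHS = {
    "H2O bridge": {
        "Asp117": (1.28, 1.26), "Asp179": (1.26, 1.28),
        "Glu151": (1.24, 1.31), "Glu152": (1.36, 1.22),
    },
    "OH- bridge": {
        "Asp117": (1.27, 1.26), "Asp179": (1.26, 1.27),
        "Glu151": (1.28, 1.32), "Glu152": (1.30, 1.24),
    },
}

# Assumed cutoff (angstroms): a hypothetical threshold chosen for this sketch.
ASYMMETRY_CUTOFF = 0.03


def classify_carboxylate(bond_lengths, cutoff=ASYMMETRY_CUTOFF):
    """Label a carboxylate by the difference between its two C-O bond lengths."""
    first, second = bond_lengths
    return "asymmetric" if abs(first - second) > cutoff else "symmetric"


for model, residues in CO_BOND_LENGTHS.items():
    for residue, lengths in residues.items():
        print(f"{model:12s} {residue}: {classify_carboxylate(lengths)}")
```

With this cutoff, both Asp residues come out symmetric and both Glu residues asymmetric in these two models, matching the qualitative reading of Table 4-1. A different cutoff, or the rows in which Asp179 becomes protonated, could change individual labels.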

PAGE 96

Finally, to gauge the importance of Glu151 as a proton acceptor in the initial stages of AAP peptide hydrolysis, active sites were created with and without this residue. It was shown through DFT minimization that, in species containing an OH- or H2O bridge between the Zn ions, proton transfer occurred between the bridge and the previously unprotonated Glu151 residue. This further supports the notion that the bridging species must be either water or a hydroxide ion. Moreover, it suggests the necessity for an unprotonated Glu151 before substrate binding can occur.

This work has only scratched the surface of the computational work that can be performed on the AAP system. Other studies are currently underway to investigate the effects of 2nd-shell mutations around the active site. Those studies aim to identify other key residues in active site geometry that may also participate in substrate binding or that may be targets for drug interactions. Some structures have been resolved which contain a small molecule substrate. Investigation of these structures could be used to better determine the electronic structure of substrate binding and to locate any substrate interaction with 2nd-shell residues. Another study that is currently being performed is the full QM minimization of the implicitly solvated protein. In this work, the native unbound protein is being minimized along with structures containing 2nd-shell mutations. Comparing the minimized structures of the native protein and mutant proteins will illuminate the effects of the mutations on the overall structure of the active site.

PAGE 97

CHAPTER 5
SURVEY OF DENSITY FUNCTIONAL THEORY METHODS

Introduction

The availability of large-scale parallel high-performance computer clusters is facilitating the application of ab initio methods to large chemical systems such as biomolecules. Density Functional Theory (DFT) methods are a sensible choice for use in such calculations due to their relatively low expense compared to Hartree-Fock (HF) and post-HF methods and for the array of specific functionals which can be employed. However, when presented with a list of all of the DFT methods available, a scientist may only see an alphabet soup. Choosing an appropriate functional and basis set can be a daunting task, even for a seasoned computational chemist. One purpose of this survey is to evaluate a host of widely used DFT methods so that members of the scientific community at large can find which method is best suited to their needs.

Another way that the data presented in this study may be used is to quantify any progress made by recent DFT methods. The number of methods available to computational chemists has greatly increased over the last ten years, and it seems that new functionals are being introduced every month in the literature. This work allows for the comparison of old tried-and-true methods to some of the newer functionals over a broad sampling of molecular properties.

Ultimately, this survey, which comprised more than 150,000 individual computational jobs, is the largest of its kind ever performed. Although some DFT methods have been omitted, a fair sampling of five families of DFT functionals is

PAGE 98

presented and evaluated. The end result is a useful reference guide for future research on large scale organic and biomolecular systems using ab initio methods.

The next section in this chapter outlines the theory and development of ab initio techniques in a general sense, from wave function methods to the most recent density functionals. Specific methods for calculating individual molecular properties are also discussed in the next section. Other sections of this chapter provide in-depth analysis of each facet of this work, addressing each molecular property in turn. The final section recapitulates the entire study with general conclusions.

Methods

All of the calculations in this work were performed using Gaussian 03,1 Rev C.01 and version D.01, and the functionalities therein. More detailed property-specific calculations are described below in individual sections. While the main focus of this work is to evaluate DFT methods, Hartree-Fock and second-order perturbation (MP2) methods are included for further comparison. A brief theoretical introduction to density functional methods is given in this section along with some discussion on the basis sets used in this work.

Schrödinger's equation can yield the exact energy of a system if the complete wave function and Hamiltonian are employed. However, a complete wave function and Hamiltonian are far too computationally expensive to be tractable and are difficult to define for multi-electron systems. A series of approximations has been adopted to simplify the Hamiltonian, thereby limiting the number of calculations that must be performed on a system. A complete Hamiltonian for a system of N electrons takes the form:

PAGE 99

$$\hat{H} = -\sum_{i=1}^{N}\frac{1}{2}\nabla_i^2 - \sum_{A=1}^{M}\frac{1}{2M_A}\nabla_A^2 - \sum_{i=1}^{N}\sum_{A=1}^{M}\frac{Z_A}{r_{iA}} + \sum_{i=1}^{N}\sum_{j>i}^{N}\frac{1}{r_{ij}} + \sum_{A=1}^{M}\sum_{B>A}^{M}\frac{Z_A Z_B}{R_{AB}} \qquad (5\text{-}1)$$

The first term is the classical kinetic energy operator for the electrons. The third term is the Coulombic term for electrostatic interactions between the electrons and the nuclei, and the fourth term represents the charge repulsion between electrons. The second and fifth terms deal with nuclear kinetic energy and charge interactions, respectively, and are reduced to constants by the Born-Oppenheimer approximation, which treats nuclei as fixed point charges in a field of moving electrons. The reduced form of the Hamiltonian after the B-O approximation is known as the electronic Hamiltonian of electrons moving in a field of fixed nuclear point charges:

$$\hat{H}_{elec} = -\sum_{i=1}^{N}\frac{1}{2}\nabla_i^2 - \sum_{i=1}^{N}\sum_{A=1}^{M}\frac{Z_A}{r_{iA}} + \sum_{i=1}^{N}\sum_{j>i}^{N}\frac{1}{r_{ij}} \qquad (5\text{-}2)$$

Following this approximation, the total energy of the system is the sum of the electronic energy and the constant nuclear charge interaction energy, which is dependent on the orientation of the nuclei to each other in space. A nuclear Hamiltonian can be used to account for motion of the nuclei as well. This simply consists of the second term from equation 5-1 and an added potential for nuclear motion.

The B-O approximation and its associated Hamiltonian satisfactorily describe the spatial parameters of the electron field. But to fully characterize an electron, spin must be taken into account. The concept of spin is roughly derived from the Pauli Exclusion Principle to ensure that no two electrons on an atom exist with the same energy or quantum configuration. A common visualization is that of one spin-up and one spin-down electron occupying a full orbital. In other words, one spatial orbital gives rise to

PAGE 100

two unique spin orbitals. Electronic spin satisfies the notion of antisymmetry, which prohibits the existence of two like electrons in the same orbital. A spin orbital, $\chi_i$, represents a complete picture of an electron both spatially and in terms of its spin.2

A new approximation is made to deal with the fully represented electrons, allowing for correct placement of the electrons into orbitals in a manner that satisfies the antisymmetry rule. When an antisymmetric wave function (equation 5-3), comprised of the spin orbitals of a ground state N-electron system, is operated upon by a Hamiltonian, the lowest possible energy is returned (equation 5-4).

$$|\Psi_0\rangle = |\chi_1 \chi_2 \cdots \chi_N\rangle \qquad (5\text{-}3)$$

$$E_0 = \langle\Psi_0|\hat{H}|\Psi_0\rangle \qquad (5\text{-}4)$$

When $E_0$ is minimized with respect to the spin orbitals of $\Psi_0$, the Hartree-Fock equation can be used to determine the optimal spin orbitals for the system:

$$f(i)\,\chi(x_i) = \varepsilon\,\chi(x_i) \qquad (5\text{-}5)$$

This is the central tenet of the Hartree-Fock approximation and is the common starting point for more accurate quantum chemical methods. The Fock operator, $f(i)$, is a one-electron operator, and $v^{HF}(i)$ in equation 5-6 is the effective potential incident upon electron i due to the other electrons in the system. In this representation, the many-body problem of electron-electron interaction has been reduced to a one-electron problem, as electron-electron repulsion has been treated in an average manner.2

$$f(i) = -\frac{1}{2}\nabla_i^2 - \sum_{A=1}^{M}\frac{Z_A}{r_{iA}} + v^{HF}(i) \qquad (5\text{-}6)$$
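The antisymmetry requirement discussed above can be made concrete with a two-electron Slater determinant. The short Python sketch below builds one from two spin orbitals and checks that exchanging the electrons flips the sign of the wave function, and that placing both electrons in the same spin orbital gives zero. The one-dimensional exponential spatial function and all function names are hypothetical choices made only for illustration.

```python
import math


def spin_orbital(spatial, spin_label):
    """Build a spin orbital chi(x) from a spatial function and a spin label."""
    def chi(x):
        r, spin = x
        return spatial(r) if spin == spin_label else 0.0
    return chi


def slater_2e(chi_a, chi_b):
    """Normalized two-electron Slater determinant of spin orbitals chi_a, chi_b."""
    def psi(x1, x2):
        return (chi_a(x1) * chi_b(x2) - chi_b(x1) * chi_a(x2)) / math.sqrt(2.0)
    return psi


# Hypothetical one-dimensional spatial function, for illustration only.
def s_like(r):
    return math.exp(-abs(r))


chi_up = spin_orbital(s_like, "alpha")
chi_down = spin_orbital(s_like, "beta")

psi = slater_2e(chi_up, chi_down)
x1, x2 = (0.3, "alpha"), (1.1, "beta")

forward = psi(x1, x2)    # electron 1 at x1, electron 2 at x2
swapped = psi(x2, x1)    # the two electrons exchanged
same_orbital = slater_2e(chi_up, chi_up)(x1, x2)  # both in one spin orbital

print(forward, swapped, same_orbital)
```

The determinant form builds the sign change in by construction, which is why two electrons cannot occupy the same spin orbital.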

PAGE 101

The field experienced by electron i is related to the spin orbitals of all other electrons in the system. This is referred to as electron correlation. In order to account for this relationship, the HF equation is solved in an iterative manner known as the self-consistent-field (SCF) method. In the SCF method, the spin orbitals are first described by an initial guess, from which an initial average electronic field can be calculated. The spin orbitals are subsequently modified until slight changes in the spin orbitals no longer affect the average field. At this point the energy has converged to the HF minimum. Using this method, however, can produce an infinite number of solutions to the HF equation since the number of accessible orbitals is theoretically very large. Basis functions are imposed on the system to limit the number of molecular orbitals that can be accessed by the electrons in the system. Different types of basis functions are described below, though in principle larger basis sets allow for more potential HF solutions to be explored in the attempt to increase the accuracy of the method. HF is a variational method and will always produce an energy higher than the exact ground state energy.

An example of a complete wave function method developed after HF is second-order Møller-Plesset perturbation theory (MP2). Perturbation theory defines the total Hamiltonian as a zeroth-order term based on a Hamiltonian with known eigenfunctions and eigenvalues plus a perturbation term. The perturbation term may contain many orders of increasingly complex contributions to the exact Hamiltonian.

$$\hat{H} = \hat{H}_0 + \lambda\hat{V} \qquad (5\text{-}7)$$

Equation 5-7 shows the general manner in which perturbation theory derives the exact Hamiltonian using an ordering parameter, $\lambda$, with a small perturbation $\hat{V}$. MP2 is a second-order technique which augments the zeroth-order Hamiltonian with elements of

PAGE 102

two higher order perturbation terms derived from the Taylor expansion of the exact Hamiltonian in terms of the zeroth-order Hamiltonian.2 The first-order correction to the energy is the average of the perturbation Hamiltonian over the unperturbed wave function.3 Through first order (MP1), the energy is equal to the HF energy, since the sum of the zeroth- and first-order energies is the variational integral for the HF wave function. The second-order expansion makes MP2 nonvariational. The second-order correction to the energy accounts for double excitations in the unperturbed wave function.3 Since the energy correction terms are derived from matrix elements of high order expansions, computational expense greatly increases with increasing perturbation order, especially in terms of the amount of physical memory needed to complete the calculation. Consequently, MP2 typically scales as O(n^5) compared to HF, which scales as O(n^4), where n represents the number of orbitals in the system. Furthermore, MP2 is more sensitive to the quality of the initial guess than HF, which can result in energy convergence problems.

Density functional theory differs from the classical wave function methods described above because it uses the electron density, ρ(r), as the primary variable instead of using the complete wave function Ψ(r1, s1, r2, s2, ..., rn, sn). Application of DFT limits the system to its ground state. In practice, DFT provides a description of the ground state that is both mathematically less complicated and less computationally intensive than wave function methods.4 One benefit of DFT is that it is, in principle, an exact method. In other words, if the exact density functional of a system were known, it could be applied to generate the exact ground state energy. With non-density functional methods, the external potential is used to determine the properties of a system. Hohenberg and Kohn


were the first to show that the ground-state electron density determines the Hamiltonian along with the ground-state wave function by replacing the external potential with electron density in the Schrödinger equation.5,6 With the ground-state Hamiltonian and wave function known, all electronic properties of the system can be calculated from the ground-state electron density. The density functional approach simplified the problem of solving the many-body Schrödinger equation to the minimization of a density functional.7 While the density functional can be minimized using several techniques, the Kohn-Sham approach8 is the most widely accepted. The density functionals evaluated in this work are all characterized as Kohn-Sham DFT methods.

DFT is not one method. Rather, it encompasses a variety of methods that can be divided into several groups. One of the more difficult aspects of developing a density functional method is to devise a good approximation of the unknown energy density functional of the system.7 Different approaches to the inception of such approximations have led to different DFT families. Table 5-1 lists the thirty-seven density functional methods (divided into families) and the two wave function methods tested in this survey. Table 5-2 lists the basis sets that were paired with each of the thirty-seven functionals and two wave function methods (when computationally feasible) for each of the nine properties calculated in this work.

At this point, a brief overview of some of the DFT methods used in this study will be presented. More in-depth reviews exist, such as Scuseria and Staroverov's 2005 essay,7 and are suggested material for any interested reader. The Levine text provides an introductory glance at DFT and has been used to prepare this section.3


Table 5-1. The thirty-seven density functional methods and two wave function methods tested in this survey with appropriate references.

Family | Method | Reference
Wave function | HF | 9
Wave function | MP2 | 10
LSDA | SVWNV | 17,18
LSDA | SPL | 17,20
LSDA | cSVWNV(0.3) | 17,18,23
GGA | BLYP | 12,13
GGA | BPW91 | 12,20,21
GGA | PBELYP | 13,16
GGA | PBEP86 | 16,19
GGA | PBEPW91 | 16,20,21
GGA | PBEPBE | 16
GGA | PW91LYP | 13,20,21
GGA | PW91P86 | 19-21
GGA | PW91PW91 | 20,21
GGA | MPWLYP | 13,19,27
GGA | MPWP86 | 19,27
GGA | MPWPW91 | 20,21,27
GGA | MPWPBE | 16,27
GGA | G96LYP | 13,36
GGA | G96P86 | 19,36
GGA | HCTH | 38
Hybrid-GGA | B1LYP | 11-13
Hybrid-GGA | B3LYP | 12-15
Hybrid-GGA | PBE1PBE | 16
Hybrid-GGA | B3P86 | 12,19
Hybrid-GGA | B3PW91 | 12,20-22
Hybrid-GGA | B98 | 24
Meta-GGA | VSXC | 25
Meta-GGA | BB95 | 12,26
Meta-GGA | MPWB95 | 26,27
Meta-GGA | TPSS | 28,29
Meta-GGA | MPWKCIS | 26,30-32
Meta-GGA | PBEKCIS | 16,30-32
Meta-GGA | TPSSKCIS | 28-32
Hybrid-meta-GGA | BB1K | 12,26,33
Hybrid-meta-GGA | B1B95 | 12,26
Hybrid-meta-GGA | TPSS1KCIS | 28-32,34
Hybrid-meta-GGA | PBE1KCIS | 16,30-32,35
Hybrid-meta-GGA | MPW1KCIS | 11,26,30-32,37

Most DFT energy functionals include a correlation functional and an exchange functional. For some families of DFT methods an exact exchange term, similar in form to HF exchange (but employing K-S orbitals in DFT), is included in an effort to approximate the exact density functional. In this chapter the term exact-exchange refers to DFT exact exchange (which is not exact), not HF exact exchange (which is exact). Some of the more recently developed methods contain additional terms dependent on the kinetic energy density.
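The grouping in Table 5-1 follows from which ingredients enter a functional's form: the density alone, the density gradient, the kinetic energy density, and a fraction of exact exchange. The following is a minimal bookkeeping sketch of that classification, not code used in the survey. The `Ingredients` class and the `family` function are hypothetical names introduced only for illustration, and the sketch assumes that any functional using the kinetic energy density also uses the density gradient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ingredients:
    """Which terms a functional depends on (hypothetical helper type)."""
    density_gradient: bool = False
    kinetic_energy_density: bool = False
    exact_exchange: bool = False


def family(ing: Ingredients) -> str:
    """Map a set of ingredients onto one of the five families in Table 5-1."""
    if ing.kinetic_energy_density:
        # Assumption: meta-GGAs build on a GGA, so the gradient is implied.
        return "hybrid-meta-GGA" if ing.exact_exchange else "meta-GGA"
    if ing.density_gradient:
        return "hybrid-GGA" if ing.exact_exchange else "GGA"
    if ing.exact_exchange:
        raise ValueError("hybrid LSDA is not one of the five families surveyed")
    return "LSDA"


# Family assignments below are taken from Table 5-1.
examples = {
    "SVWNV": Ingredients(),
    "BLYP": Ingredients(density_gradient=True),
    "B3LYP": Ingredients(density_gradient=True, exact_exchange=True),
    "TPSS": Ingredients(density_gradient=True, kinetic_energy_density=True),
    "BB1K": Ingredients(density_gradient=True, kinetic_energy_density=True,
                        exact_exchange=True),
}

for name, ing in examples.items():
    print(f"{name:6s} {family(ing)}")
```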


DFT methods can be categorized into many families based on the terms included in their respective functional forms. Members of five of these families are evaluated in this work. The simplest form of the density functional method is the local spin density approximation (LSDA). This class of functional depends solely on the electron density. Generalized gradient approximation (GGA) methods account for the reduced gradient of the electron density along with the electron density itself. Meta-GGA methods add a kinetic energy term to the GGA. Hybrid functionals are so called because they mix an exact exchange term into their functional forms. Hybrid-GGA includes an exact exchange term along with a GGA functional, and hybrid-meta-GGA does the same for meta-GGAs. The hybrid-meta-GGA family is the most recent to be developed and is functionally the most complex, comprising a GGA exchange functional, a GGA correlation functional, an exact exchange term, and a kinetic energy term.

The key approximations for density functional theory begin with the B-O simplified electronic Hamiltonian:

$$\hat{H} = -\frac{1}{2}\sum_{i=1}^{n}\nabla_i^2 + \sum_{i=1}^{n} v(\mathbf{r}_i) + \sum_{j}\sum_{i>j}\frac{1}{r_{ij}}, \quad \text{where } v(\mathbf{r}_i) = -\sum_{\alpha}\frac{Z_\alpha}{r_{i\alpha}}$$ (5-8)

$v(\mathbf{r}_i)$ is the potential energy of the interaction between the electrons and the fixed nuclei in the system, which depends solely on the coordinates $x_i$, $y_i$, and $z_i$ of electron i. From Hohenberg and Kohn, it is known that the exact electronic energy can be determined from the ground state electron probability density, $\rho_0(\mathbf{r})$. Moreover, the ground state energy is a functional of the ground state density:3

$$E_0 = E_v[\rho_0]$$ (5-9)


The total ground state electronic energy can also be written as a sum of its component energies: the kinetic energy, the nuclei-electron interaction potential, and the electron-electron interaction potential. Since Hohenberg and Kohn proved that the ground state density can be used to determine all molecular properties, each of the energy components can also be expressed as functionals of the ground state density.3

$$E_0 = E_v[\rho_0] = T[\rho_0] + V_{Ne}[\rho_0] + V_{ee}[\rho_0]$$ (5-10)

$E_v$ is dependent upon $v(\mathbf{r})$. So there is an established relationship between $\rho_0(\mathbf{r})$ and $v(\mathbf{r})$ such that:

$$V_{Ne}[\rho_0] = \left\langle \psi_0 \left| \sum_{i=1}^{n} v(\mathbf{r}_i) \right| \psi_0 \right\rangle = \int \rho_0(\mathbf{r})\, v(\mathbf{r})\, d\mathbf{r}$$ (5-11)

Now, one of the three terms in equation 5-10 is known, while the other two are not and must be approximated. The remaining two terms, $T[\rho_0]$ and $V_{ee}[\rho_0]$, can be combined into one term, $F[\rho_0]$, which is independent of $v(\mathbf{r})$ and whose value is unknown. In order to approximate $F[\rho_0]$ an initial guess density is required.

In 1965, Kohn and Sham derived a method for finding $\rho_0$ and using it to determine $E_0$.8 Kohn and Sham proposed an imaginary reference system of noninteracting electrons experiencing an external potential field, $v_s(\mathbf{r}_i)$. The external potential is chosen in such a manner that the corresponding ground-state density $\rho_s(\mathbf{r})$ of the reference state equals the ground-state density of the real system, $\rho_0(\mathbf{r})$. Now that the reference density is known, the reference potential is functionally determined although its actual value may not be quantified.3
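For reference, substituting equation 5-11 into equation 5-10 and grouping the two unknown terms gives the compact form from which the Kohn-Sham construction starts. The notation here ($v$ for the nuclear attraction potential, $F$ for the combined functional) is an assumed reconstruction that follows the usual textbook convention.

```latex
E_0 = E_v[\rho_0]
    = \int \rho_0(\mathbf{r})\, v(\mathbf{r})\, d\mathbf{r} + F[\rho_0],
\qquad
F[\rho_0] \equiv T[\rho_0] + V_{ee}[\rho_0]
```

Only the first term depends on the nuclear framework through $v(\mathbf{r})$. The second term has the same form for every system, which is why approximating it is the central problem of the method.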


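The Kohn-Sham one-electron Hamiltonian for the reference system of noninteracting electrons can be expressed as:

$$\hat{H}_{KS} = \sum_{i=1}^{n}\left[-\frac{1}{2}\nabla_i^2 + v_s(\mathbf{r}_i)\right]$$ (5-12)

And the reference system can be related to the real system by equation 5-13, in which a scaling variable $\lambda$ is applied to the electron-electron interaction potential term and where $v_\lambda$ is the external potential which will make $\rho_s = \rho_0$.3

$$\hat{H}_\lambda = \hat{T} + \sum_{i} v_\lambda(\mathbf{r}_i) + \lambda \hat{V}_{ee}$$ (5-13)

Kohn and Sham now define a kinetic energy term $\Delta T[\rho]$:

$$\Delta T[\rho] = T[\rho] - T_s[\rho], \quad \text{where } T_s[\rho] = -\frac{1}{2}\left\langle \psi_s \left| \sum_{i}\nabla_i^2 \right| \psi_s \right\rangle$$ (5-14)

And an electron repulsion term $\Delta V_{ee}[\rho]$:

$$\Delta V_{ee}[\rho] = V_{ee}[\rho] - \frac{1}{2}\iint \frac{\rho(\mathbf{r}_1)\rho(\mathbf{r}_2)}{r_{12}}\, d\mathbf{r}_1\, d\mathbf{r}_2$$ (5-15)

Now $r_{12}$ is the distance between two electrons, and the second term in equation 5-15 is the classical electron repulsion term for a distribution of electron density $\rho$. Having applied the Kohn-Sham method to our system, equation 5-10 becomes:

$$E_v[\rho] = \int \rho(\mathbf{r})\, v(\mathbf{r})\, d\mathbf{r} + T_s[\rho] + \frac{1}{2}\iint \frac{\rho(\mathbf{r}_1)\rho(\mathbf{r}_2)}{r_{12}}\, d\mathbf{r}_1\, d\mathbf{r}_2 + \Delta T[\rho] + \Delta V_{ee}[\rho]$$ (5-16)

The last two terms of 5-16 are unknown, but they only constitute a small correction to the total energy. Together, the two terms comprise the exchange-correlation energy functional $E_{XC}[\rho]$. The total ground state energy can now be expressed as the sum of three density-dependent terms whose values are known and a fourth correction term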


$E_{XC}$ that is unknown. In order for a DFT method to have the ability to accurately calculate molecular properties, it must contain a good approximation to $E_{XC}$.3

Earlier, the concept of spin-orbitals was introduced as they relate to the wave function of a system, and a wave function can be defined as being the Slater determinant of the spin-orbitals. By that definition, the electron probability density can be related to the sum of the squared magnitudes of the spin-orbitals $\theta_i^{KS}$ of the imaginary system as shown in equation 5-17.

$$\rho_0 = \rho_s = \sum_{i=1}^{n}\left|\theta_i^{KS}\right|^2$$ (5-17)

The Hohenberg-Kohn variational method was previously used to minimize the energy functional $E_v[\rho]$ by varying $\rho$. After establishing the relationship between $\rho$ and $\theta_i^{KS}$, it makes sense that the energy functional can also be minimized by varying $\theta_i^{KS}$. Once the KS spin-orbitals have been defined, a new energy functional can be defined in a similar fashion as the Hartree-Fock equation (see equation 5-5). The new equation contains a kinetic energy term, a nuclei-electron interaction potential term, an electronic repulsion term, and a new function $v_{XC}(1)$ called the exchange-correlation potential.3

$$\left[-\frac{1}{2}\nabla_1^2 - \sum_{\alpha}\frac{Z_\alpha}{r_{1\alpha}} + \int \frac{\rho(\mathbf{r}_2)}{r_{12}}\, d\mathbf{r}_2 + v_{XC}(1)\right]\theta_i^{KS}(1) = \varepsilon_i^{KS}\,\theta_i^{KS}(1)$$ (5-18)

$$v_{XC}(\mathbf{r}) = \frac{\delta E_{XC}[\rho(\mathbf{r})]}{\delta \rho(\mathbf{r})}$$ (5-19)

$v_{XC}(\mathbf{r})$ is the functional derivative of the exchange-correlation energy $E_{XC}$. Now, if $E_{XC}$ is known, then $v_{XC}(\mathbf{r})$ is also known. Each of these terms ultimately depends only on $\mathbf{r}$, the positions of the electrons in space.3
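Because the Coulomb and exchange-correlation terms in the one-electron equation depend on the density that the orbitals themselves define, the equation has to be solved iteratively, in the same spirit as the SCF procedure described for HF. The following is a minimal sketch of such a loop on a toy model and is not the procedure implemented in Gaussian 03. The effective matrix `h + u * diag(density)` is a hypothetical stand-in for the density-dependent potential, and the function name, parameters, and numerical values are all assumptions made for illustration.

```python
import numpy as np


def toy_scf(h, u, n_occ, mix=0.5, tol=1e-10, max_iter=500):
    """Iterate a density-dependent eigenproblem to self-consistency.

    h      fixed one-electron matrix (kinetic + external potential stand-in)
    u      strength of the hypothetical density-dependent on-site potential
    n_occ  number of occupied orbitals
    mix    fraction of the new density blended in at each step (damping)
    """
    n = h.shape[0]
    density = np.zeros((n, n))  # crude initial guess
    for iteration in range(1, max_iter + 1):
        # Build the effective one-electron matrix from the current density.
        effective = h + u * np.diag(np.diag(density))
        # Solve the one-electron equations; eigh returns ascending eigenvalues.
        _, orbitals = np.linalg.eigh(effective)
        occupied = orbitals[:, :n_occ]
        # Density as the sum of squared occupied orbitals (cf. equation 5-17).
        new_density = occupied @ occupied.T
        if np.max(np.abs(new_density - density)) < tol:
            return new_density, iteration
        density = mix * new_density + (1.0 - mix) * density
    raise RuntimeError("SCF did not converge")


h = np.array([[0.0, -1.0],
              [-1.0, 1.0]])
density, n_iter = toy_scf(h, u=0.5, n_occ=1)
print("iterations:", n_iter)
print("site populations:", np.diag(density))
```

The damping factor `mix` is one simple way to stabilize the iteration. Production codes use more elaborate convergence accelerators, which are outside the scope of this sketch.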


Unfortunately, the correct $E_{XC}$ is not known. It can only be approximated. The Kohn-Sham method is widely used to estimate $v_{XC}$ and $E_{XC}$. Finally, different approaches to finding $E_{XC}$ give rise to the different families of density functional methods.

The first step in simplifying the problem of generating a reliable form of $E_{XC}$ is to split it into an exchange-energy functional and a correlation-energy functional in this manner:

$$E_{XC} = E_X + E_C$$ (5-20)

$E_X$ is formed based on the KS spin-orbitals, and $E_C$ is the difference between $E_{XC}$ and $E_X$. The magnitude of $E_X$ is much larger than that of the correlation functional, but a good $E_C$ is essential for an accurate DFT method nonetheless.3

With a general foundation of the components of density functionals now established, a brief synopsis of the five DFT families sampled in this study is presented. As mentioned, LSDA is the least complex density functional method used in this work, as the exchange-correlation functional depends only on the electron density. Spin-density functional theory, as proposed by Parr and Yang,4 allows for electrons with different spins to be assigned separate spin-orbitals. In this manner, the electron densities of one type of spin-orbital, $\rho^{\alpha}$, and those of another type of spin-orbital, $\rho^{\beta}$, are dealt with separately. The LSDA exchange-correlation functional takes the form:

$$E_{XC} = E_{XC}[\rho^{\alpha}, \rho^{\beta}]$$ (5-21)

This method treats the atomic environment as a uniform electron gas, and works best in systems where the density changes slowly. With that being said, LSDA functionals perform surprisingly well for equilibrium molecular geometries and vibrational frequencies, considering the method's similarity to unrestricted Hartree-Fock


(which performs poorly at each) and the reality that $\rho$ does not change slowly in molecules. However, for most functionals of this kind, it has been seen that the accuracy of LSDA methods deteriorates when calculating thermodynamic properties.3

Accounting for the rapidly changing electron density is the first correction to the LSDA method. Generalized-gradient approximation (GGA) methods attempt to do this by including gradients of the densities $\rho^{\alpha}$ and $\rho^{\beta}$ in their exchange-correlation functionals. Members of this functional class may contain empirical parameters, such as the B88 exchange, or they may be non-empirical like the PW86 exchange.3

Hybrid functionals incorporate a HF-inspired exact exchange term in their exchange-energy functionals. The KS spin-orbital exact exchange is mixed with the gradient-corrected exchange and correlation functionals. Equation 5-22 gives the form of the exact exchange energy for KS spin-orbitals.

$$E_X = -\frac{1}{4}\sum_{i=1}^{n}\sum_{j=1}^{n}\left\langle \theta_i^{KS}(1)\,\theta_j^{KS}(2) \left| \frac{1}{r_{12}} \right| \theta_j^{KS}(1)\,\theta_i^{KS}(2) \right\rangle$$ (5-22)

The amount of KS exact exchange varies by method, and scaling constants placed with each term in a complete exchange-correlation energy functional can be varied to produce the highest accuracy attainable for a particular method. Such scaling constants are often adjusted to conform to some empirical data.3

The meta-GGA class includes a correction to the kinetic energy term $T$ in the attempt to further address issues not fully characterized by other density functional methods.

The hybrid-meta-GGA class of functional is the most all-encompassing variation of the density functional method. From the basic spin-density approximation of the exchange-correlation energy, the hybrid-meta-GGA functionals add corrections for


rapidly changing electron density. They also mix in a fraction of KS spin-orbital-based exact exchange, and some degree of correction to the kinetic energy. Surprisingly, there are hybrid-meta-GGA methods that are not empirically fit (or where the empirical bias is kept to a minimum) such as TPSSh.

Table 5-2. Basis sets employed in the survey of DFT methods.

Pople-type basis sets | Dunning-type basis sets
3-21G* | cc-pVDZ
3-21+G* | cc-pVTZ
6-31G* | cc-pVQZ (a)
6-31+G* | aug-cc-pVDZ
6-31++G* | aug-cc-pVTZ
 | aug-cc-pVQZ (a)
(a) cc-pVQZ and aug-cc-pVQZ used only for geometrical properties.

The basis sets used in this study range in size from small (3-21G*) to very large (aug-cc-pVQZ), and have all been widely validated.39-43 The Pople-style split-valence bases are the most familiar in quantum chemistry. Five of these basis sets were evaluated in this study: 3-21G*, 3-21+G*, 6-31G*, 6-31+G*, and 6-31++G*. The numbers in each basis set denote the number of Gaussian functions used to describe each orbital. For example, when the 3-21+G* basis set is applied, the inner shells are described by a single basis function comprising a fixed linear combination of three Gaussian type orbitals (GTOs). The valence shells are described by a combination of inner valence and outer valence orbitals. The inner valence group is described by two fixed-combination GTOs, while one GTO is allowed to vary in the linear combination of atomic orbitals to represent the outer valence group. The + refers to diffuse functions which have been added to the heavy atoms. A ++ sign means that diffuse functions have been added to both heavy atoms and hydrogens.44
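The naming rules just described can be applied mechanically. The following is a small sketch, assuming only the conventions stated above (core digit, valence digits, + and ++ for diffuse functions, * and ** for polarization). `PopleBasis` and `parse_pople` are hypothetical names, and the sketch does not reproduce any element-specific exceptions that a particular program may apply.

```python
import re
from typing import NamedTuple, Tuple


class PopleBasis(NamedTuple):
    core_primitives: int                  # GTOs contracted into each core function
    valence_primitives: Tuple[int, ...]   # GTOs per valence function, inner to outer
    diffuse_on_heavy: bool                # "+"  or "++"
    diffuse_on_hydrogen: bool             # "++" only
    polarization_on_heavy: bool           # "*"  or "**"
    polarization_on_hydrogen: bool        # "**" only


_PATTERN = re.compile(r"^(\d)-(\d+)(\+{0,2})G(\*{0,2})$")


def parse_pople(name: str) -> PopleBasis:
    """Decode a Pople-style split-valence basis set name such as '3-21+G*'."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a Pople-style basis set name: {name!r}")
    core, valence, plus, stars = match.groups()
    return PopleBasis(
        core_primitives=int(core),
        valence_primitives=tuple(int(digit) for digit in valence),
        diffuse_on_heavy=len(plus) >= 1,
        diffuse_on_hydrogen=len(plus) == 2,
        polarization_on_heavy=len(stars) >= 1,
        polarization_on_hydrogen=len(stars) == 2,
    )


for basis in ("3-21G*", "3-21+G*", "6-31G*", "6-31+G*", "6-31++G*"):
    print(basis, parse_pople(basis))
```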


Polarized basis sets, identified by *, are larger than normal basis sets. A single star (*) represents the addition of six Cartesian Gaussians ($d_{xx}$, $d_{xy}$, $d_{xz}$, $d_{yy}$, $d_{yz}$, and $d_{zz}$) or five pure d-orbitals ($d_{z^2}$, $d_{xy}$, $d_{xz}$, $d_{yz}$, and $d_{x^2-y^2}$) to first row atoms in the molecule. A second star (**) adds a 2p function onto hydrogen atoms. The necessity for polarized functions derives from the distortion of the electron density of atoms once they form bonds, participate in other local interactions within molecules, or are acted upon by an external field. Although the magnitudes of these distortions may be small, the effects they have on the electronic energy are not negligible. Polarized basis sets attempt to replicate these distortions to more accurately predict molecular energy. Diffuse functions are a necessity for anions and atoms with lone pairs, since regular basis functions do not adequately account for densely populated atomic orbitals far from the nucleus as exist in anions.44 In effect, diffuse functions allow orbitals to occupy more space on the outer edges of the atoms.45

The Dunning-style functions, known as correlation-consistent basis sets, follow a different manner of formalism and nomenclature. These functions are inherently polarized and account for effects related to molecular properties such as near-degeneracy of orbitals, and space and spin polarization. Dynamic correlations, such as orbital changes due to the movement of electrons, are also taken into consideration. Table 5-3 lists the polarization functions within each of the Dunning-type functions used in this survey as reproduced from the Gaussian 03 User's Reference.46

Table 5-3. Valence shell polarization functions incorporated into the correlation-consistent basis sets of Dunning.

Atoms | cc-pVDZ | cc-pVTZ | cc-pVQZ
H | 2s, 1p | 3s, 2p, 1d | 4s, 3p, 2d, 1f
B-Ne | 3s, 2p, 1d | 4s, 3p, 2d, 1f | 5s, 4p, 3d, 2f, 1g
Al-Ar | 4s, 3p, 1d | 5s, 4p, 2d, 1f | 6s, 5p, 3d, 2f, 1g
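The entries of Table 5-3 can be turned into basis function counts per atom, which gives a feel for how quickly the correlation-consistent sets grow. The following is a minimal sketch, assuming each entry gives the number of contracted shells per angular momentum; the function name `count_functions` is hypothetical. It uses 2l+1 functions per pure shell and (l+1)(l+2)/2 per Cartesian shell, consistent with the five pure versus six Cartesian d functions mentioned above.

```python
import re

# Angular momentum quantum number for each shell letter.
ANGULAR_MOMENTUM = {"s": 0, "p": 1, "d": 2, "f": 3, "g": 4}


def count_functions(shells: str, cartesian: bool = False) -> int:
    """Count basis functions for a shell list such as '3s, 2p, 1d'."""
    total = 0
    for count, letter in re.findall(r"(\d+)\s*([spdfg])", shells):
        l = ANGULAR_MOMENTUM[letter]
        per_shell = (l + 1) * (l + 2) // 2 if cartesian else 2 * l + 1
        total += int(count) * per_shell
    return total


# Shell lists for B-Ne, copied from Table 5-3.
for label, shells in [("cc-pVDZ", "3s, 2p, 1d"),
                      ("cc-pVTZ", "4s, 3p, 2d, 1f"),
                      ("cc-pVQZ", "5s, 4p, 3d, 2f, 1g")]:
    print(label, count_functions(shells))
```

For a first-row heavy atom this gives 14, 30, and 55 pure functions for the double, triple, and quadruple zeta sets, which illustrates why the quadruple zeta sets were restricted to the geometry test set.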


By augmenting a typical sp basis with higher order correlation effects, high accuracy basis sets are obtained. The inclusion of correlation effects, dynamic ones in particular, significantly adds to the computational expense of these methods as compared to Pople-type functions. Adding to the expense are the high angular momentum orbitals included in the valence shell polarization listed in Table 5-3. The Dunning-type functions employed in this study include cc-pVDZ, cc-pVTZ, cc-pVQZ, and the diffuse versions of each of these: aug-cc-pVDZ, aug-cc-pVTZ, and aug-cc-pVQZ (the aug qualifier replaces the + and ++ notation used in Pople-style basis sets). Double zeta (DZ) and triple zeta (TZ) sets were used throughout the survey, but quadruple zeta (QZ) functions were only used in conjunction with the geometry test set due to the high computational expense of those basis sets. DZ, TZ, and QZ refer to the number of sizes of contracted Gaussian functions used in linear combinations of atomic orbitals to describe all of the molecular orbitals. For instance, a triple zeta basis employs three different sizes of contracted functions for each atomic orbital.

Computational Methods

The ability to accurately predict the ground state geometry of a molecule is one of the most important (and easily observed) aspects of a good computational method. Bond angles and bond lengths best describe molecular geometry. The test set for the evaluation of geometry performance comprises forty-four molecules containing the atoms C, H, O, N, S, and P. Within the set of molecules are seventy-one bond lengths and thirty-four bond angles. The test set is listed in Table 5-4. Ground state vibrational frequencies are closely related to molecular structure and are included in this portion of the study. Thirty-five molecules supply the 145 vibrational frequencies measured in our test set, which is tabulated in Table 5-5.
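Bond lengths and bond angles of the kind collected in Table 5-4 can be computed directly from optimized Cartesian coordinates. The following is a minimal sketch using only the standard library; the function names are hypothetical and the water geometry is an illustrative input, not a value from this survey.

```python
import math


def bond_length(a, b):
    """Distance between two atoms given as (x, y, z) tuples, in input units."""
    return math.dist(a, b)


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with atom b at the vertex."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cosine = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Guard against rounding slightly outside [-1, 1] for collinear atoms.
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))


# Illustrative water geometry (angstroms), built from r = 0.9572 and 104.52 degrees.
r_oh, theta = 0.9572, math.radians(104.52)
oxygen = (0.0, 0.0, 0.0)
hydrogen_1 = (r_oh, 0.0, 0.0)
hydrogen_2 = (r_oh * math.cos(theta), r_oh * math.sin(theta), 0.0)

print("r(O-H) =", round(bond_length(oxygen, hydrogen_2), 4))
print("a(HOH) =", round(bond_angle(hydrogen_1, oxygen, hydrogen_2), 2))
```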


While this is the first study to encompass so many different DFT functionals and basis sets with a large test set, there have been many previous evaluations of DFT performance on molecular geometry and vibrational frequencies.23,47-55 Work done by Johnson used a group of small molecules from the G2 set to compare the performances of Slater and Becke type functionals along with HF and higher order methods.49 Raymond and Wheeler investigated the accuracy of different Dunning-style basis sets at predicting the geometries of a set of small inorganic molecules.52 Wang and Wilson compiled a group of seventeen small molecules and compared the accuracy of several hybrid-GGA methods at several Dunning-type bases,54 while Riley et al. assessed LSDA and hybrid-GGA methods paired with Pople-type basis sets.23

The methods for gathering geometrical parameters are fairly straightforward. Each molecule was optimized at each basis set/functional combination from the same initial structures. The initial structures were generated using WebLab Viewer Pro56 in Cartesian coordinates or with the Molden57 z-matrix editor. In Gaussian 03, the optimization and frequency calculations were performed with the default numerical grid and default energy and geometry convergence criteria. Previous studies have used geometries obtained at high levels of theory for use in frequency calculations with lower level methods. This work differs from those studies in that frequency calculations were carried out on optimized structures obtained by the same methods. This approach is appropriate for this study since high-level computational values are not available for large biomolecular systems.

The first ionization potential is the amount of energy required to remove one electron from a bound state to infinite separation, or the energy required to generate a


cation from an uncharged system. The IP test set is shown in Table 5-6 and contains thirty-seven small molecules, radicals, and ions.

Electron affinity is quantified as the energy gained by a neutral system when an unbound electron is captured, creating an anion. Table 5-7 lists the twenty-five molecules and radicals comprising the EA test set.

Finally, heat of formation is the difference between the enthalpy of a molecule and the sum of the energies of its individual atomic pieces. This quantity is a general measure of molecular stability and is used in predicting the energy released by reactions and in the calculation of other thermodynamic properties. A negative heat of formation typically indicates a stable molecule whose formation is spontaneous. A positive value indicates that an energy penalty has been paid in order to form the molecule from its elemental pieces. The heat of formation test set is the largest set in this survey and contains 127 singlet species (Table 5-8) and twenty-nine radicals (Table 5-9).

The molecules in the three test sets above were mainly gathered from the Gaussian G2/97 test set,58,59 though a few non-G2 molecules have been added to increase the number of phosphorus-containing compounds. The IP and EA test sets are completely derived from the G2/97 set with the exception of PO2. Of the 156 members of the HOF test set, three molecules (PO2, PH, and CH3PH2) are not from the G2/97 set.

Each of these molecular properties has been previously investigated using DFT in similar surveys.58-68 Curtiss et al. computed EA values for a set of fifty-eight molecules and IP for eighty-eight molecules using one LSDA, two GGA, and three hybrid-GGA DFT methods and the Gaussian-2 method with the 6-311+G(3df,2p) basis.59 Ernzerhof and Scuseria completed a larger survey of the performance of SVWN, SVWNV, BLYP,


B3LYP, VSXC, PBEPBE, and PBE1PBE functionals with the 6-311+G(3df,2p) basis for atomization energies, IP, EA, and bond length.64

Table 5-4. The bond lengths and bond angles test set.

# | Formula | Name | Bond | Angle
1 | H2 | hydrogen dimer | r(H-H) |
2r | CH | methyne | r(C-H) |
3 | CH2 (1A1) | methylene singlet | r(C-H) | a(HCH)
4r | CH2 (3B1) | methylene triplet | r(C-H) | a(HCH)
5r | CH3 | methyl | r(C-H) | a(HCH)
6 | CH4 | methane | r(C-H) | a(HCH)
7r | NH | imidogen | r(N-H) |
8r | NH2 | amino | r(N-H) | a(HNH)
9 | NH3 | ammonia | r(N-H) | a(HNH)
10r | OH | hydroxyl | r(O-H) |
11 | OH2 | water | r(O-H) | a(HOH)
12 | HCCH | acetylene | r(C=C), r(C-H) |
13 | H2CCH2 | ethene | r(C=C), r(C-H) | a(HCC), a(HCH)
14 | H3CCH3 | ethane | r(C-C), r(C-H) | a(HCC), a(HCH)
15r | CN | cyano | r(C=N) |
16 | HCN | hydrogen cyanide | r(C=N), r(C-H) |
17 | CO | carbon monoxide | r(C=O) |
18r | HCO | formyl | r(C=O), r(C-H) | a(HCO)
19 | H2CO | formaldehyde | r(C=O), r(C-H) | a(HCH), a(HCO)
20 | H3COH | methanol | r(C-O), r(C-Hb), r(C-Ha), r(O-H) | a(OCHa), a(COH), a(HbCHb), a(HbCO)
21 | N2 | nitrogen dimer | r(N=N) |
22 | H2NNH2 | hydrazine | r(N-N), r(N-Ha), r(N-Hb) | a(NNHb), a(NNHa), a(HaNHb)
23r | NO | nitric oxide | r(N=O) |
24r | O2 | oxygen dimer | r(O=O) |
25 | HOOH | hydrogen peroxide | r(O=O), r(O-H) | a(OOH)
r denotes non-singlet species


Table 5-4 (Continued). The bond lengths and bond angles test set.

# | Formula | Name | Bond | Angle
26 | CO2 | carbon dioxide | r(C=O) |
27 | C3H6 | propene | r(C=C), r(C-C), r(C3-H), r(C2-H), r(C1-H) | a(CCC), a(HC3C2), a(HC2C3), a(HC2C1), a(HC1C2)
28 | C3H8 | propane | r(C-C), r(C2-H), r(C1-H) | a(CCC), a(HC2C1), a(HC1C2)
29 | C6H6 | benzene | r(C-C), r(C-H) | a(CCC), a(HCC)
30 | N(CH3)3 | trimethyl amine | r(C-N) | a(CNC), a(HCN)
31 | NH2CH3 | methyl amine | r(C-N), r(N-H) | a(HNC), a(HNH)
32 | CH3OCH3 | dimethyl ether | r(C-O) | a(COC), a(HCO)
33 | NH2CHO | formamide | r(C-N), r(C-H), r(C=O), r(N-H) | a(OCN), a(OCH), a(HNC)
34 | N2O | nitrous oxide | r(N=N), r(N=O) |
35r | NO2 | nitrogen dioxide | r(N=O) | a(ONO)
36 | CH3SH | thiomethanol | r(C-S), r(S-H) | a(HSC)
37 | CS2 | carbon disulfide | r(C=S) |
38 | SO2 | sulfur dioxide | r(S=O) | a(OSO)
39r | SN | sulfur nitride | r(S=N) |
40r | S2 | sulfur dimer | r(S=S) |
41r | PH | phosphinidene | r(P-H) |
42r | PH2 | phosphino | r(P-H) | a(HPH)
43 | PH3 | phosphine | r(P-H) | a(HPH)
44 | CH3PH2 | methyl phosphine | r(C-P), r(C-H), r(P-H) | a(HCP), a(HPC)
r denotes non-singlet species


Table 5-5. The test set for ground state vibrational frequency.

# | Formula | Name | Mode (Occurrence)
1 | H2 | hydrogen molecule | g (1)
2r | O2 | oxygen molecule | g (1)
3 | N2 | nitrogen molecule | g (1)
4r | S2 | sulfur molecule | g (1)
5r | CH | methylidyne | (1)
6 | CH2(1A1) | methylene singlet | A1 (2), B2 (1)
7r | CH2(3B1) | methylene triplet | A1 (1), B2 (1)
8r | CH3 | methyl radical | A' (1), A" (1), E' (2)
9 | CH4 | methane | A1 (1), E (1), T2 (2)
10 | C2H2 | ethyne | g (2), u (1), g (1), u (1)
11 | C2H4 | ethene | Ag (3), Au (1), B3g (2), B3u (1), B2g (1), B2u (2), B1u (2)
12 | C2H6 | ethane | A1g (3), A1u (1), A2u (2), Eg (3), Eu (3)
13r | NH | imidogen | (1)
14r | NH2 | amino radical | A1 (2), B2 (1)
15 | NH3 | ammonia | A1 (2), E (2)
r denotes non-singlet species


Table 5-5 (Continued). The test set for ground state vibrational frequency.

# | Formula | Name | Mode (Occurrence)
16 | H2N-NH2 | hydrazine | A (7), B (5)
17r | OH | hydroxyl radical | (1)
18 | H2O | water | A1 (2), B2 (1)
19 | HO-OH | hydrogen peroxide | A (4), B (2)
20r | PH | phosphinidene | g (1)
21r | PH2 | phosphino radical | A1 (2), A1 (2)
22 | PH3 | phosphine | E (2)
23r | CN | cyano radical | (1)
24 | HCN | hydrogen cyanide | (2), (1)
25 | CO | carbon monoxide | (1)
26 | CO2 | carbon dioxide | g (1), u (1), u (1)
27r | HCO | formyl radical | A' (3)
28 | H2C=O | formaldehyde | A1 (3), B2 (2), B1 (1)
29 | CH3OH | methanol | A' (8), A" (4)
30r | NO | nitric oxide | (1)
31 | CH3SH | methanethiol | A' (8), A" (3)
32 | CS2 | carbon disulfide | g (1), u (1), u (1)
33 | SO2 | sulfur dioxide | A1 (2), B2 (1)
34r | SN | nitrogen sulfide | (1)
35 | CH3PH2 | methyl phosphine | A (13)
r denotes non-singlet species


Table 5-6. The ionization potential test set.

# | Formula | Name
1 | CH4 | methane
2 | NH3 | ammonia
3 | OH | hydroxy radical
4 | H2O | water
5 | SH | mercapto radical
6 | SH2(2B1) | hydrogen sulfide (2B1)
7 | SH2(2A1) | hydrogen sulfide (2A1)
8 | C2H2 | ethyne
9 | C2H4 | ethene
10 | CO | carbon monoxide
11 | N2(g) | nitrogen dimer g
12 | N2(u) | nitrogen dimer u
13 | O2 | oxygen dimer
14 | S2 | sulfur dimer
15 | CO2 | carbon dioxide
16 | COS | carbonyl sulfide
17 | CS2 | carbon disulfide
18 | CH2 | methylene
19 | CH3 | methyl radical
20 | CN | cyano radical
21 | CH2OH | hydroxymethyl radical
22 | CH3OH | methanol
23 | CH3SH | methanethiol
24 | NH | imidogen
25 | NH2 | amino radical
26 | C | carbon atom
27 | N | nitrogen atom
28 | O | oxygen atom
29 | S | sulfur atom
30 | H | hydrogen atom
31 | CH3PH2 | methyl phosphine
32 | P | phosphorus atom
33 | PH | phosphinidene
34 | PH2 | phosphino radical
35 | PH3 | phosphine
36 | P2 | phosphorus dimer
37 | PO2 | phosphorus dioxide


Table 5-7. The electron affinity test set.

# | Formula | Name
1 | C | carbon atom
2 | O | oxygen atom
3 | S | sulfur atom
4 | CH | methyne
5 | CH2 (3B1) | methylene triplet
6 | CH3 | methyl radical
7 | NH | imidogen
8 | NH2 | amino radical
9 | OH | hydroxyl radical
10 | SH | mercapto radical
11 | O2 | oxygen dimer
12 | NO | nitric oxide
13 | CN | cyano radical
14 | S2 | sulfur dimer
15 | NCO | isocyanato radical
16 | NO2 | nitrogen dioxide
17 | SO2 | sulfur dioxide
18 | CH3O | methoxy radical
19 | CH3S | methylthio radical
20 | CH2CN | cyanomethyl radical
21 | P | phosphorus atom
22 | PO | phosphorus monoxide
23 | PO2 | phosphorus dioxide
24 | PH | phosphinidene
25 | PH2 | phosphino radical

HOF, IP, EA, and proton affinity were again studied by Curtiss et al. using the SVWN, BLYP, and B3LYP functionals and the Gaussian-3 method with the 6-311+G(3df,2p) basis set.61 The research performed by Brothers and Merz focused on the use of small basis sets (3-21G*, 3-21+G*, and MIDI!) with a variety of LSDA, GGA, and hybrid-GGA DFT methods to calculate heats of formation.60 Brothers and Scuseria have recently presented a small basis set survey of heat of formation using the SVWN, cSVWNV, PBE, TPSS, and TPSSh functionals.68


Table 5-8. The heat of formation test set, singlets only.

# | Formula | Name
1 | CH2(1A1) | methylene
2 | CH4 | methane
3 | NH3 | ammonia
4 | OH2 | water
5 | SH2 | dihydrogen sulfide
6 | C2H2 | acetylene
7 | C2H4 | ethene
8 | C2H6 | ethane
9 | HCN | hydrogen cyanide
10 | CO | carbon monoxide
11 | H2CO | formaldehyde
12 | H3COH | methanol
13 | N2 | nitrogen dimer
14 | H2NNH2 | hydrazine
15 | HOOH | hydrogen peroxide
16 | CO2 | carbon dioxide
17 | SC | carbon monosulfide
18 | H3CSH | methyl thiol
19 | SO2 | sulfur dioxide
20 | COS | carbonyl sulfide
21 | CS2 | carbon disulfide
22 | N2O | nitrous oxide
23 | O3 | ozone
24 | CH3CCH | propyne
25 | CH2=C=CH2 | allene
26 | C3H4 | cyclopropene
27 | CH3CH=CH2 | propylene
28 | C3H6 | cyclopropane
29 | C3H8 | propane
30 | CH2CHCHCH2 | butadiene
31 | C4H6 | 2-butyne
32 | C4H6 | methylene cyclopropane
33 | C4H6 | bicyclobutane
34 | C4H6 | cyclobutene
35 | C4H8 | cyclobutane
36 | C4H8 | isobutene
37 | C4H10 | trans-butane
38 | C4H10 | isobutane
39 | C5H8 | spiropentane
40 | C6H6 | benzene
41 | CH3NH2 | methylamine
42 | CH3CN | methyl cyanide
43 | CH3NO2 | nitromethane


Table 5-8 (Continued). HOF test set, singlets only.

# | Formula | Name
44 | CH3ONO | methyl nitrite
45 | HCOOH | formic acid
46 | HCOOCH3 | methyl formate
47 | CH3CONH2 | acetamide
48 | C2H4NH | aziridine
49 | NCCN | cyanogen
50 | (CH3)2-NH | dimethyl amine
51 | CH3CH2NH2 | trans-ethylamine
52 | CH2CO | ketene
53 | C2H4O | oxirane
54 | CH3CHO | acetaldehyde
55 | HCOCOH | glyoxal
56 | CH3CH2OH | ethanol
57 | CH3OCH3 | dimethyl ether
58 | C2H4S | thiirane
59 | (CH3)2-SO | dimethyl sulfoxide
60 | C2H5SH | ethane thiol
61 | CH3SCH3 | dimethyl sulfide
62 | CH2=CHCN | acrylonitrile
63 | CH3COCH3 | acetone
64 | CH3COOH | acetic acid
65 | (CH3)2-CHOH | isopropanol
66 | C2H5OCH3 | methylethyl ether
67 | N(CH3)3 | trimethyl amine
68 | C4H4O | furan
69 | C4H4S | thiophene
70 | C4H5N | pyrrole
71 | C5H5N | pyridine
72 | H2 | hydrogen dimer
73 | CH3-CH=C=CH2 | methyl allene
74 | C5H8 | isoprene
75 | C5H10 | cyclopentane
76 | C5H12 | n-pentane
77 | C5H12 | neopentane
78 | C6H8 | 1,3 cyclohexadiene
79 | C6H8 | 1,4 cyclohexadiene
80 | C6H12 | cyclohexane
81 | C6H14 | n-hexane
82 | C6H14 | 3-methyl pentane
83 | C6H5CH3 | toluene
84 | C7H16 | n-heptane
85 | C8H8 | 1,3,5,7-cyclooctatetraene


Table 5-8 (Continued). HOF test set, singlets only.

# | Formula | Name
86 | C8H18 | n-octane
87 | C10H8 | naphthalene
88 | C10H8 | azulene
89 | CH3COOCH3 | methyl acetate
90 | (CH3)3-COH | t-butanol
91 | C6H5NH2 | aniline
92 | C6H5OH | phenol
93 | C4H6O | divinyl ether
94 | C4H8O | tetrahydrofuran
95 | C5H8O | cyclopentanone
96 | C6H4O2 | benzoquinone
97 | C4H4N2 | pyrimidine
98 | C2H6O2S | dimethyl sulfone
99 | N≡C-CH2-CH2-C≡N | 1,2-dicyano ethane
100 | C4H4N2 | pyrazine
101 | CH3-C(=O)-C≡CH | acetyl acetylene
102 | CH3-CH=CH-CHO | crotonaldehyde
103 | CH3-C(=O)-O-C(=O)-CH3 | acetic anhydride
104 | C4H6S | 2,5-dihydrothiophene
105 | (CH3)2-CH-CN | isobutane nitrile
106 | CH3-CO-CH2-CH3 | methylethyl ketone
107 | (CH3)2-CH-CHO | isobutanal
108 | C4H8O2 | 1,4-dioxane
109 | C4H8S | tetrahydrothiophene
110 | C4H8NH | tetrahydropyrrole
111 | CH3-CH2-CH(CH3)-NO2 | nitro-s-butane
112 | CH3-CH2-O-CH2-CH3 | diethyl ether
113 | CH3-CH(OCH3)2 | acetaldehyde dimethyl acetal
114 | (CH3)3C-SH | t-butanethiol
115 | CH3-CH2-S-S-CH2-CH3 | diethyl disulfide
116 | (CH3)3C-NH2 | t-butyl amine
117 | C5H6S | 2-methyl thiophene
118 | C5H7N | n-methyl pyrrole
119 | C5H10O | tetrahydropyran
120 | CH3-CH2-CO-CH2-CH3 | diethyl ketone
121 | C5H10O2 | isopropyl acetate
122 | C5H10S | tetrahydrothiopyran
123 | cyc-C5H10NH | piperidine
124 | (CH3)3C-O-CH3 | t-butyl methyl ether
125 | (CH3)2CH-O-CH(CH3)2 | di-isopropyl ether
126 | SO3 | sulfur trioxide
127 | CH3PH2 | methyl phosphine


Table 5-9. The heat of formation test set, radicals only.

# | Formula | Name
1 | CH | methyne
2 | CH2(3B1) | methylene
3 | CH3 | methyl
4 | NH | imidogen
5 | NH2 | amino
6 | OH | hydroxyl
7 | CN | cyano
8 | HCO | formyl
9 | O2 | oxygen dimer
10 | S2 | sulfur dimer
11 | SO | sulfur monoxide
12 | CCH | ethynyl
13 | C2H3 (2A') | vinyl
14 | CH3CO (2A') | acetyl
15 | H2COH (2A) | hydroxymethyl
16 | CH3CH2O (2A") | ethoxy
17 | CH3S (2A') | methylthio
18 | C2H5 (2A') | ethyl
19 | (CH3)2-CH (2A') | isopropyl
20 | C(CH3)3 | t-butyl
21 | NO2 | nitrogen dioxide
22 | NO | nitric oxide
23 | SH | mercapto
24 | CH3O | methoxy
25 | C6H5 | phenyl
26 | PH | phosphinidene
27 | PH2 | phosphino
28 | PH3 | phosphine
29 | PO2 | phosphorus dioxide

Literature values for IP, EA, and HOF were obtained from the G2/97 and G3 test set references for most molecules and from the NIST Chemistry WebBook for phosphorus-containing compounds. IP and EA values were calculated adiabatically using the methods described here. Ionization potentials were calculated by subtracting the total electronic energy of the initial system from that of the more positively charged final system. EA is derived in a similar fashion by subtracting the electronic energy of the initial molecule from that of the more negatively charged product. Heat of formation is


not explicitly listed in the Gaussian output, so it must be calculated using a combination of molecular and atomic energies. In this study, HOF was calculated using the method described in the Gaussian white paper, Thermochemistry in Gaussian.69 Heats of formation were calculated using the same basis sets that were used to optimize the structures. Due to high computational cost, optimization at the aug-cc-pVTZ level could not be completed. Instead, single point frequency calculations were performed using the aug-cc-pVTZ basis set on geometries obtained by the TPSS1KCIS/aug-cc-pVDZ method, which was found by a corresponding study to yield the most accurate geometries for this set of small molecules.70

In order to calculate the heat of formation for a compound, the energies of the individual constituent elements must be calculated along with the thermochemical properties of the compound itself. The energies of the atoms are summed and subtracted from the thermally corrected enthalpy of the optimized molecule. For example, the heat of formation of methane would be:

$$\Delta H_f = H_{298}(\mathrm{CH_4}) - \left[E_{atom}(\mathrm{C}) + 4\,E_{atom}(\mathrm{H})\right]$$ (5-23)

IP and EA calculations were performed using geometry optimizations in Gaussian 03. For HOF calculations, frequency calculations were also performed to generate enthalpy data. For most calculations, the default grid and convergence criteria were used. In some cases, a fine grid was specified. To resolve convergence problems, the quadratically convergent SCF method was implemented or the convergence criteria were relaxed slightly. The effects of implementing these convergence-aiding options were investigated, and the use of these non-default methods was found to have negligible effects on the properties being surveyed.
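The three energy differences described in this section reduce to simple arithmetic on computed energies. The following is a minimal sketch that follows the sign conventions stated in the text (final state minus initial state for IP and EA, molecular enthalpy minus summed atomic energies for the heat of formation). The function names are hypothetical, the numerical energies are placeholders rather than results from this survey, and the sketch covers only the subtraction described above, not the full white paper procedure.

```python
HARTREE_TO_EV = 27.2114
HARTREE_TO_KCAL_PER_MOL = 627.5095


def ionization_potential(e_neutral, e_cation):
    """Energy of the more positively charged system minus that of the initial one."""
    return e_cation - e_neutral


def electron_affinity(e_neutral, e_anion):
    """Energy of the more negatively charged product minus that of the initial one.

    With this convention (the one stated in the text) a bound anion gives a
    negative number.
    """
    return e_anion - e_neutral


def heat_of_formation(h298_molecule, atom_energies, composition):
    """Molecular enthalpy minus the summed atomic energies, as for methane in the text."""
    atomic_sum = sum(count * atom_energies[element]
                     for element, count in composition.items())
    return h298_molecule - atomic_sum


# Placeholder energies in hartree, chosen only to exercise the functions.
ip = ionization_potential(e_neutral=-76.40, e_cation=-75.94)
ea = electron_affinity(e_neutral=-75.70, e_anion=-75.76)
hof = heat_of_formation(
    h298_molecule=-40.40,
    atom_energies={"C": -37.80, "H": -0.50},
    composition={"C": 1, "H": 4},
)

print("IP  (eV):       ", round(ip * HARTREE_TO_EV, 3))
print("EA  (eV):       ", round(ea * HARTREE_TO_EV, 3))
print("HOF (kcal/mol): ", round(hof * HARTREE_TO_KCAL_PER_MOL, 1))
```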

PAGE 127

Hydrogen bonding is an important feature in many physical phenomena including the formation of clusters, protein folding and stability, and the formation of complexes between proteins and small molecules. The hydrogen bonding test set shown in Table 5-10 is made up of ten dimers involving H2O, CH3OH, H2CO, NH3, CH3OCH3, HCOOH, and HCONH2. Experimental values for hydrogen bonding interaction energies are not readily available, so the reference values used for this test set were derived from CCSD(T) calculations performed by Tsuzuki and Lüthi.71

Table 5-10. The hydrogen bonding test set.

 #   H-bonding system     Eint (ref. 71)
 1   H2O···H2O            -4.80
 2   H2O···MeOH           -4.90
 3   H2O···Me2O           -5.51
 4   MeOH···MeOH          -5.45
 5   H2O···H2CO           -5.17
 6   HCOOH···HCOOH        -13.93
 7   NH3···NH3            -2.94
 8   NH3···H2O            -6.36
 9   HCONH2···H2O         -8.88
10   HCONH2···HCONH2      -13.55

Eint values are given in kcal/mol.

A method's ability to accurately predict conformational energies is a measure of its worth as a technique for calculating geometric and electronic properties of molecules as well. The conformational energy test set is composed of ten small organic molecules, each in two different conformations, and is listed in Table 5-11. Experimental values for conformational energies were obtained in studies performed by Zhao, Truhlar, Lynch, and González-García.34,65

The study of reaction barrier heights is split into two segments. One part deals with small molecule reactions with radical (non-singlet) transition state species. The test set for small reaction barrier heights is listed in Table 5-12. The set consists of twenty-three

PAGE 128

hydrogen-transfer reactions, whose experimental barrier heights are also listed in the table. The second group of barrier heights, listed in Table 5-13, contains larger reactions with singlet transition-state species. Most of these reactions were drawn from Jorgensen's PDDG paper.72

Table 5-11. The conformational energy test set.

 #   Molecule             Conformations            ΔE (refs. 34, 65)
 1   ammonia              planar vs. pyramidal     6.
 2   ethylene             orthogonal vs. planar    65.
 3   diethyl ether        tt vs. tg                1.14
 4   formamide            anti vs. eclipsed        20.
 5   formic acid          Z vs. E                  3.9
 6   ethane               anti vs. eclipsed        2.9
 7   methanol             anti vs. eclipsed        1.1
 8   methylamine          anti vs. eclipsed        2.
 9   methylcyclohexane    axial vs. equatorial     1.75
10   propene              anti vs. eclipsed        2.

ΔE values are given in kcal/mol.

Density functional methods have been used to study hydrogen-bonding interactions by several groups.35,71,73-80 Tsuzuki and Lüthi evaluated the BLYP, B3LYP, and PW91PW91 functionals, as well as the MP2 and HF methods, for the prediction of hydrogen bond interaction energies. Those studies were carried out using the Dunning type basis sets, cc-pVxZ (x = D, T, Q, 5). Zhao and Truhlar carried out studies to determine the accuracy of DFT methods for several types of nonbonding interactions: hydrogen bonding, charge transfer, dipole interaction, and the weak (dispersion) interaction. These studies were done using a very large number of functionals along with the 6-31+G(d,p), MG3S (ref. 81; a modified version of 6-311+G(3d2f,2df,2p)), and aug-cc-pVTZ basis sets.

To date, there have been only a limited number of studies concerned with the accuracy with which DFT methods predict conformational energies.81,82 Lynch, Zhao, and Truhlar evaluated the conformational energies of several conformer pairs of 1,2-

PAGE 129

ethanediol and butadiene. These studies were done using a number of functionals based on the MPW exchange functional along with several basis sets. There have been a number of studies carried out to evaluate the accuracy with which DFT methods describe reaction barrier heights.34,83-86 In a recent study, Zhao, González-García, and Truhlar tested the accuracy of a large number of functionals along with several different basis sets for calculating the barrier heights of thirty-eight hydrogen transfer and thirty-eight non-hydrogen transfer reactions.37

Table 5-12. The small system radical transition state reaction barrier test set.35,83-86

 #   Reaction                      Vforward   Vreverse
 1   OH + H2 → H + H2O             5.7        21.8
 2   CH3 + H2 → H + CH4            12.1       15.3
 3   OH + CH4 → CH3 + H2O          6.7        19.6
 4   H + CH3OH → CH2OH + H2        7.3        13.3
 5   H + H2 → H2 + H               9.6        n/a
 6   OH + NH3 → H2O + NH2          3.2        12.7
 7   O + CH4 → OH + CH3            13.7       8.1
 8   H + PH3 → PH2 + H2            3.1        23.2
 9   H + OH → O + H2               10.7       13.1
10   H + H2S → H2 + HS             3.5        17.3
11   NH2 + CH3 → CH4 + NH          8.0        22.4
12   NH2 + CH4 → CH3 + NH3         14.5       17.8

Barrier heights are listed in kcal/mol.

For the hydrogen bonding interaction energy calculations, the counterpoise method of Boys and Bernardi is employed in order to account for the basis set superposition error.87 For each level of theory considered in this work, the geometries of the hydrogen bond dimers are fully optimized on the counterpoise hypersurface. Then, the constituent monomers are also fully optimized using the same level of theory. Because of the difficulties associated with the extraction of the zero-point-exclusive binding energies from experimental data, binding energies obtained at a very high level of theory were

PAGE 130

used as reference values. These reference values were determined at the CCSD(T) basis set limit by Tsuzuki and Lüthi.71

Experimental conformational energy reference data were all obtained from reference 72. These quantities are calculated as potential energy differences, that is, the difference in electronic energy between the most stable conformer and the least stable one. This is the method employed in several other studies.88-90

Table 5-13. The organic molecule singlet transition state reaction barrier test set.

 #   Reaction               Exp.
 1   [structure drawing]    25.1 (ref. 72)
 2   [structure drawing]    33.3 (ref. 72)
 3   [structure drawing]    34.1 (ref. 72)
 4   [structure drawing]    32.9 (ref. 72)
 5   [structure drawing]    4-5 (ref. 72)
 6   [structure drawing]    35.4 (ref. 23)

Barrier heights are given in kcal/mol.

The barrier height reference data for the test set of small radical reactions are in the form of zero-point-exclusive, Born-Oppenheimer barrier heights. These barrier heights are simply calculated as the difference in electronic energy between the transition state

PAGE 131

and the reactants. For the large singlet test set, the data are all taken directly from experiment, so it is necessary to include vibrational effects in the calculation of the barrier heights. These barrier heights are calculated as the difference in the thermally corrected total enthalpy between the transition state and the ground state of the reactant(s). For all six reactions in the singlet set, initial coordinates of the transition states were constructed to have sensible transition-state-like geometries, and the transition state optimization method in Gaussian 03 was used. Reaction barrier heights were obtained from optimized reactant and transition state structures for all functional and basis set combinations except MP2/aug-cc-pVTZ.

Ideally, the theoretical determination of a particular property should be carried out at the level of theory at which the geometry of the system is obtained. Most of the calculations in this work have been completed in just such a fashion. However, because of substantial problems associated with transition state convergence in the investigations involving the small radical barrier height test set, the geometries determined by Lynch and Truhlar at the QCISD/MG3 level were used for all calculations of this property (see http://comp.chem.umn.edu/truhlar/).86

Results by Property

Bond Lengths

The average unsigned bond length errors for the gradient-corrected (GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA) functionals with Pople type basis sets are given in Figure 5-1. HF, MP2, and LSDA results are given in Table 5-14. Inspection of these figures reveals that the hybrid-GGA and hybrid-meta-GGA methods generally yield the lowest bond length errors when paired with these basis sets. The GGA and meta-GGA functionals yield similar results, with errors that are slightly greater than those

PAGE 132

of the hybrid and hybrid-meta functionals. Among the LSDA functionals, SVWN5 and SPL obtain errors that are comparable to those of GGA and meta-GGA, while c-SVWN5 yields errors that are higher than those of the other LSDA methods. Of the gradient-corrected functionals, VSXC, from the meta-GGA class, yields the lowest unsigned bond length errors.

Figure 5-2 gives average unsigned bond length errors for the gradient-corrected methods along with the Dunning type basis sets. It can be seen that, overall, the hybrid-GGA functionals yield the lowest errors. Meta-GGA functionals and hybrid-meta-GGA functionals perform nearly as well, followed by the GGA functionals. LSDA methods yield errors that are generally higher than those of GGA functionals. It should be noted that, for the smallest of the Dunning type basis sets, cc-pVDZ and aug-cc-pVDZ, the hybrid-GGA and hybrid-meta-GGA methods generally outperform all other methods by a significant margin. For the larger cc-pVTZ, cc-pVQZ, aug-cc-pVTZ, and aug-cc-pVQZ basis sets, the hybrid-GGA, hybrid-meta-GGA, and meta-GGA functionals all yield, on average, comparable results. As in the case of the Pople type basis sets, VSXC yields the lowest bond length errors.

Generally, increasing the basis set size results in smaller bond length errors; the most dramatic improvements can be seen when the small 3-21G* and 3-21+G* basis sets are compared to the 6-31G* and 6-31+G* basis sets. The addition of diffuse functions to hydrogen atoms in the 6-31++G* basis set makes little difference in the magnitudes of bond length errors. For the Dunning type bases, the addition of diffuse functions improves accuracy only slightly. It is interesting to note that, for all of the functionals considered here, the bond length errors obtained using the 6-31G* and 6-31+G* basis sets are lower than

PAGE 133

those obtained using the much more computationally expensive cc-pVDZ and aug-cc-pVDZ basis sets.

One key feature of the bond length data is that there is little variation within a particular functional class for a given basis set. For example, the average unsigned bond length errors calculated with the hybrid-GGA functionals along with the 6-31G* basis set are all within 0.002 Å of one another. This trend is especially evident for the GGA, hybrid-GGA, and meta-GGA functionals, whereas there is quite a bit more variation within the LSDA and hybrid-meta-GGA classes of functionals. Another aspect of this trend is that, typically, the amount of variation within a particular functional class decreases as the basis set size increases. Individual functionals that deviate significantly from the other functionals within their class are c-SVWN5, HCTH, B98, VSXC, and BB1K.

Table 5-14. Bond lengths, bond angles, and vibrational frequencies for HF, MP2, and LSDA methods.

Method    3-21G*  3-21+G*  6-31G*  6-31+G*  6-31++G*  cc-pVDZ  cc-pVTZ  cc-pVQZ  aug-cc-pVDZ  aug-cc-pVTZ  aug-cc-pVQZ

Bond lengths (Å)
HF        0.015   0.016    0.015   0.014    0.014     0.013    0.018    0.019    0.012        0.017        0.019
MP2       0.024   0.027    0.011   0.012    0.012     0.014    0.009             0.015        0.009
SVWN5     0.024   0.024    0.018   0.017    0.017     0.021    0.014    0.014    0.019        0.014        0.014
SPL       0.025   0.025    0.018   0.017    0.017     0.022    0.014    0.014    0.019        0.014        0.014
c-SVWN5   0.034   0.035    0.025   0.025    0.025     0.029    0.019    0.018    0.027        0.019        0.018

Bond angles (degrees)
HF        1.85    2.45     1.45    1.55     1.54      1.40     1.50     1.51     1.48         1.53         1.53
MP2       1.55    2.02     1.31    1.39     1.37      1.65     1.22              1.26         1.17
SVWN5     1.80    2.58     1.36    1.27     1.27      1.60     1.25     1.22     1.28         1.24         1.23
SPL       1.80    2.55     1.37    1.27     1.27      1.60     1.25     1.22     1.27         1.23         1.22
c-SVWN5   1.69    2.31     1.52    1.28     1.28      1.75     1.24     1.20     1.25         1.19         1.17

Vibrational frequencies (cm-1)
HF        210     203      236     235      234       211      209               203          207
MP2       151     147      149     141      141       126      122               113          117
SVWN5     83      87       51      50       50        58       51                59           52
SPL       82      87       52      50       50        57       52                58           51
c-SVWN5   93      100      58      57       58        75       63                71           65

PAGE 134

For singlets, the hybrid-GGA and hybrid-meta-GGA methods yield the lowest errors when paired with most basis sets. However, for the TZ bases, the hybrid-GGA, meta-GGA, and hybrid-meta-GGA methods all yield the best singlet results. For the small Pople type basis sets, 3-21G* and 3-21+G*, the LSDA methods yield results that are slightly better than those of the GGA functionals. With larger basis sets, all other density functional classes outperform the LSDA methods. In terms of radical systems, the hybrid-GGA and hybrid-meta-GGA functionals are the density functional methods that yield the lowest bond length errors for all basis sets, although the best results overall are obtained by MP2. For all basis sets, the meta-GGA methods obtain errors that are higher than those of the hybrid-GGA and hybrid-meta-GGA methods.

For all methods except MP2, single and triple bonds are calculated much more accurately than double bonds. For all basis sets, the LSDA, GGA, and meta-GGA methods tend to yield lower unsigned errors for triple bonds than for double bonds. Hybrid-GGA and hybrid-meta-GGA methods also obtain lower errors for triple bonds (compared to single bonds) when paired with the Pople type and cc-pVDZ/aug-cc-pVDZ basis sets, but give higher errors for triple bonds with the larger Dunning type basis sets. Hartree-Fock and MP2 generally obtain lower bond length errors for single bonds than for triple bonds.

Bond Angles

The average unsigned bond angle errors are shown in Figures 5-3 and 5-4 and in Table 5-14. Generally, the hybrid-GGA and hybrid-meta-GGA methods produce the lowest errors. The best results among the small basis sets are generally obtained with the hybrid-GGA functionals along with the 3-21G* basis set. The hybrid-meta-GGA functionals also yield very good results, although there is some variation within this class. The LSDA methods produce bond angle errors that are higher than most of the

PAGE 135

Figure 5-1. Average unsigned bond length errors (in Å) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with the Pople type basis sets. [Bar chart: average unsigned error (Å) versus functional, one series per basis set: 3-21G*, 3-21+G*, 6-31G*, 6-31+G*, 6-31++G*.]

PAGE 136

Figure 5-2. Average unsigned bond length errors (in Å) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with the Dunning type basis sets. [Bar chart: average unsigned error (Å) versus functional, one series per basis set: cc-pVDZ, cc-pVTZ, cc-pVQZ, aug-cc-pVDZ, aug-cc-pVTZ, aug-cc-pVQZ.]

PAGE 137

gradient-corrected functionals. The addition of diffuse functions to the basis set results in a significant increase in the average bond angle errors for these functionals. The SVWN5 and SPL functionals yield very similar errors, while the c-SVWN5 functional obtains results that are only slightly better than its LSDA counterparts when paired with the small bases. Overall, the best small basis set results are obtained by the B1LYP/3-21G* and PBE1PBE/3-21G* methods, which both give an average unsigned bond angle error of 1.36°.

The larger Pople basis sets, 6-31G*, 6-31+G*, and 6-31++G*, generally produce better bond angle results for the basis sets containing diffuse functions. The wave function based methods yield lower errors when paired with the 6-31G* basis set. It should also be noted that the addition of diffuse functions to hydrogen atoms in the 6-31++G* basis set does not result in any significant improvement over the 6-31+G* basis. Once again, the hybrid-GGA and hybrid-meta-GGA methods generally give the lowest unsigned bond angle errors. As in the case of the small Pople basis sets, there is a great deal of variation in the meta-GGA class of functionals. LSDA methods all yield similar results when paired with basis sets containing diffuse functions. However, c-SVWN5 produces errors that are significantly higher than those of SVWN5 and SPL when paired with the 6-31G* basis set. For 6-31G*, HF generally outperforms the GGA and meta-GGA methods but yields errors that are higher than those of all hybrid-GGA and hybrid-meta-GGA methods.

One aspect of the data for the Dunning type basis sets is that the errors obtained with the cc-pVDZ basis set are much higher than those of all other Dunning basis sets for all methods considered except Hartree-Fock. In fact, cc-pVDZ is generally outperformed

PAGE 138

by all other basis sets, with the exception of 3-21+G*, for all functional methods. The hybrid-GGA and hybrid-meta-GGA functionals generally yield the lowest unsigned bond angle errors for the Dunning type basis sets. Generally speaking, increasing the basis set size results in lower unsigned errors; this trend is especially evident for the GGA and meta-GGA functionals paired with non-diffuse basis sets. When diffuse functions are added to the cc-pVTZ basis set, there is a significant increase in accuracy for GGA and meta-GGA functionals. For most functionals there is a small decrease in bond angle error upon addition of diffuse functions to cc-pVQZ.

Overall, the lowest error of 1.11° is obtained by the hybrid-GGA PBE1PBE/aug-cc-pVQZ method. PBEP86 and MPWP86 are the GGA functionals that typically yield the lowest unsigned bond angle errors, while HCTH is the least accurate in this class. It should be noted that BLYP exhibits trends that are quite different from those of other GGA functionals. BLYP/aug-cc-pVDZ yields errors that are much higher than all other GGA/aug-cc-pVDZ combinations, while BLYP/aug-cc-pVTZ produces errors that are significantly lower than all other GGA/aug-cc-pVTZ combinations. LSDA methods all yield similar results with the exception of c-SVWN5/cc-pVDZ. HF performs better when paired with the non-diffuse basis sets, and larger Dunning type bases generally yield larger errors than the smaller Dunning bases. The unsigned bond angle errors obtained with MP2 improve with increasing basis set size and with the addition of diffuse functions to the basis set.

HF, MP2, LSDA, GGA, and meta-GGA methods obtain lower bond angle errors for singlet states for most Dunning and large Pople type basis sets. By and large, the hybrid-GGA and hybrid-meta-GGA methods yield lower errors for radicals than for singlets. Most methods produce better results for radical species when paired with the 3-

PAGE 139

21+G* basis set. For singlets, the LSDA functionals produce the lowest unsigned bond angle errors among all functional based methods when paired with the Dunning type basis sets except cc-pVDZ and with the 6-31+G* and 6-31++G* Pople type basis sets. Hybrid-GGA methods give the lowest errors among DFT methods for 3-21G* and 6-31G*. In terms of radical species, the hybrid-GGA methods give the lowest unsigned bond angle errors for all basis sets considered in this work. Hybrid-meta-GGA methods generally produce errors that are slightly larger than those of hybrid-GGA functionals.

Ground State Vibrational Frequencies

Figures 5-5 and 5-6 show the average unsigned vibrational frequency errors for gradient-corrected DFT methods. Vibrational errors for the LSDA methods as well as HF and MP2 are given in Table 5-14. It has been observed previously that frequency errors for MP2 and HF methods are larger than those of most DFT methods. It is observed in this study that the hybrid-GGA and hybrid-meta-GGA functionals, which include a Hartree-Fock-like exchange term, are less accurate than other classes of DFT functionals in predicting harmonic vibrational frequencies. As expected, HF and MP2 errors are far worse than errors obtained with DFT calculations. Compared with the GGA, meta-GGA, and LSDA classes of functionals, MP2 errors are higher by a factor of about two or three, while HF errors are typically about three to five times higher. The lowest vibrational frequency error obtained by the MP2 method is 113 cm-1 for MP2/aug-cc-pVDZ, while its highest error is 151 cm-1 for MP2/3-21G*. HF errors for vibrational frequencies range from 203 cm-1, for HF/3-21+G*, to 236 cm-1, for HF/6-31G*.
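To make the comparison with the LSDA functionals concrete, the ratios can be recomputed from the 6-31G* entries of Table 5-14. The following is a minimal sketch; the dictionary layout and the function name are assumptions introduced for illustration, and the numbers are read from the 6-31G* column of the vibrational frequency block of the table.

```python
# Average unsigned vibrational frequency errors (cm-1) for the 6-31G* basis set,
# as read from Table 5-14. The data structure itself is an illustrative assumption.
errors_631gs = {"HF": 236, "MP2": 149, "SVWN5": 51, "SPL": 52, "c-SVWN5": 58}


def error_ratios(errors, reference_methods, target):
    """Ratio of the target method's error to the error of each reference method."""
    return {ref: errors[target] / errors[ref] for ref in reference_methods}


lsda = ["SVWN5", "SPL", "c-SVWN5"]
for method in ("MP2", "HF"):
    ratios = error_ratios(errors_631gs, lsda, method)
    print(method, {name: round(value, 1) for name, value in ratios.items()})
```

For these entries the MP2 ratios fall between about 2.6 and 2.9 and the HF ratios between about 4.1 and 4.6, in line with the ranges quoted above.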

PAGE 140

Figure 5-3. Average unsigned bond angle errors (in degrees) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with the Pople type basis sets. [Bar chart: average unsigned error (degrees) versus functional, one series per basis set: 3-21G*, 3-21+G*, 6-31G*, 6-31+G*, 6-31++G*.]

PAGE 141

Figure 5-4. Average unsigned bond angle errors (in degrees) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with the Dunning type basis sets. [Bar chart: average unsigned error (degrees) versus functional, one series per basis set: cc-pVDZ, cc-pVTZ, cc-pVQZ, aug-cc-pVDZ, aug-cc-pVTZ, aug-cc-pVQZ.]

PAGE 142

The inclusion of diffuse functions in the basis set does not greatly affect the ability of the functionals to predict vibrational frequencies. As shown in the figures, 3-21G* and 3-21+G* are nearly identical in performance. There is only a small improvement, on the order of 5 cm-1, with the use of the 6-31+G* and 6-31++G* basis sets as opposed to 6-31G*. Augmented correlation-consistent basis sets do not perform markedly better than their non-diffuse counterparts for the double-zeta and triple-zeta basis sets.

The three LSDA functionals yield lower average errors than the hybrid-meta-GGA and hybrid-GGA functionals, but their errors are generally higher than those of the meta-GGA and GGA functionals. The SPL and SVWN5 functionals perform slightly better than c-SVWN5, with SPL performing the best of the three. The 6-31+G* and 6-31++G* basis sets yield the lowest errors for the SPL functional, around 50 cm-1.

Within the GGA class, there is little variation in performance between functionals with the exception of HCTH, which yields vibrational frequency errors that are significantly higher than those of any of the other functionals. In this class the Dunning style triple-zeta basis sets give the lowest unsigned vibrational frequency errors. Both the diffuse and non-diffuse variants of this basis set produce unsigned errors of 40-44 cm-1. It is interesting to note that the 6-31+G* and 6-31++G* basis sets, which are much less computationally expensive than the cc-pVTZ and aug-cc-pVTZ basis sets, yield errors that are only 5-8 cm-1 higher than these Dunning type basis sets. 6-31+G* and 6-31++G* also give results that typically outperform the cc-pVDZ and aug-cc-pVDZ basis sets. Of the 16 functionals in the GGA family, we find that MPWLYP and MPWP86 perform the best, but their advantage over most of the other functionals is only slight.

PAGE 143

B3LYP is the most accurate of the hybrid-GGA class for calculating vibrational frequencies. B3LYP/cc-pVDZ yields the lowest error in this class at 70 cm-1. PBE1PBE does not perform as well as the other members of this class. The meta-GGA class also shows some variance between functionals. VSXC, which is among the best meta-GGA methods for calculating bond lengths, performs poorly for vibrational frequencies. BB95, MPWB95, MPWKCIS, and PBEKCIS are similar in performance throughout. BB95/aug-cc-pVTZ and MPWB95/aug-cc-pVTZ yield the lowest error of the class at 43 cm-1. Finally, the hybrid-meta-GGA group does not perform as well as most other classes due to its inclusion of HF exchange. TPSS1KCIS and MPW1KCIS are the best functionals in this class; the cc-pVTZ and aug-cc-pVTZ basis sets typically give the lowest average errors within this group.

Ionization Potential

The average unsigned ionization potential errors for each functional/basis set combination are given in Figures 5-7 and 5-8 and in Table 5-15. Table 5-15 gives results for HF, MP2, the LSDA functionals, and B3P86 (which yields high errors) for all basis sets. The best result for ionization potentials is obtained with the hybrid-meta-GGA functional B1B95 combined with the aug-cc-pVTZ basis set, yielding an error of 4.25 kcal/mol. The worst average unsigned error among the density functional methods is 15.05 kcal/mol and is obtained using the hybrid-GGA functional B3P86 along with the 3-21+G* basis set. Overall, the largest unsigned error of 26.53 kcal/mol is obtained with the HF/3-21G* method.
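The average unsigned error quoted throughout this chapter is the mean of the absolute deviations between calculated and reference values over a test set. A minimal sketch of the statistic follows; the function name and the numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def average_unsigned_error(calculated, reference):
    """Mean of |calculated - reference| over paired entries of a test set."""
    if len(calculated) != len(reference):
        raise ValueError("calculated and reference must have the same length")
    if not calculated:
        raise ValueError("the test set must contain at least one entry")
    total = sum(abs(c - r) for c, r in zip(calculated, reference))
    return total / len(calculated)


# Hypothetical ionization potentials in kcal/mol (not taken from this study).
calc = [250.0, 310.5, 198.0]
ref = [254.0, 308.5, 201.0]
print(average_unsigned_error(calc, ref))  # absolute deviations are 4.0, 2.0, and 3.0
```

Because the deviations enter as absolute values, over- and underestimates do not cancel, which is why this measure is used to compare methods across a whole test set.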

PAGE 144

Figure 5-5. Average unsigned vibrational frequency errors (in cm-1) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with the Pople type basis sets. [Bar chart: average error (cm-1) versus functional, one series per basis set: 3-21G*, 3-21+G*, 6-31G*, 6-31+G*, 6-31++G*.]

PAGE 145

Figure 5-6. Average unsigned vibrational frequency errors (in cm-1) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with the Dunning type basis sets. [Bar chart: average error (cm-1) versus functional, one series per basis set: cc-pVDZ, cc-pVTZ, aug-cc-pVDZ, aug-cc-pVTZ.]

PAGE 146

Of the three LSDA functionals, the best ionization potential unsigned error of 7.46 kcal/mol is obtained using the SPL/cc-pVDZ method. Among the GGA functionals, G96P86/cc-pVTZ yields the lowest unsigned error of 4.93 kcal/mol; it should also be noted that the PW91PW91/cc-pVTZ, PBEP86/cc-pVTZ, PBEPW91/aug-cc-pVTZ, and PBEPBE/aug-cc-pVTZ methods all yield errors lower than 5.00 kcal/mol. The B98/aug-cc-pVTZ method yields an average unsigned error of 4.65 kcal/mol, the lowest unsigned error among all hybrid-GGA methods. The functional/basis set combination yielding the best result in the meta-GGA class of functionals is MPWB95/aug-cc-pVTZ, with a calculated value of 4.38 kcal/mol. Among the hybrid-meta-GGA functionals, the lowest average unsigned error is given by the B1B95/aug-cc-pVTZ method, with a value of 4.25 kcal/mol.

As in the case of electron affinities, HF does a very poor job of predicting ionization potentials, and this can be explained in the same way. The cation has fewer electrons than the neutral system and thus exhibits weaker correlation effects. The Hartree-Fock method's inability to describe electron correlation leads to a more accurate prediction for the electronic energy of the cation as compared to the neutral species.

As one might expect, the small 3-21G* and 3-21+G* basis sets typically perform very poorly in predicting ionization potentials compared to the larger basis sets. Generally, the 3-21+G* basis set yields average errors that are substantially lower than those of the 3-21G* basis set. The best ionization potential result for the 3-21G* basis set is obtained with the hybrid-GGA B3LYP functional, with an average error of 7.33 kcal/mol, while the 3-21+G* basis set has its lowest average unsigned error of 5.26 kcal/mol when used in conjunction with the meta-GGA BB95 functional.

PAGE 147

Inspection of the average unsigned errors for individual functionals in Figures 5-7 and 5-8 and in Table 5-15 reveals that the cc-pVTZ, 6-31+G*, 6-31++G*, aug-cc-pVDZ, and aug-cc-pVTZ basis sets all yield fairly similar results that are typically superior to the results obtained with the 3-21G*, 3-21+G*, 6-31G*, and cc-pVDZ basis sets. An exception to this is the LSDA functionals, for which the 6-31G* and cc-pVDZ basis sets yield the lowest average unsigned errors. As the Pople type 6-31+G* and 6-31++G* basis sets are computationally much less expensive to use than the larger Dunning correlation-consistent basis sets, it is very promising, in terms of biological applications, that such high quality results can be obtained using the smaller basis sets. It should be noted that the MPWB95/(6-31+G*, 6-31++G*) methods (4.53 kcal/mol, 4.50 kcal/mol) outperform all other meta-GGA methods with the exception of the MPWB95/(cc-pVTZ, aug-cc-pVTZ) methods (4.49 kcal/mol, 4.38 kcal/mol). Similarly, the B1B95/(6-31+G*, 6-31++G*) methods (4.81 kcal/mol, 4.80 kcal/mol) yield better results than all other hybrid-meta-GGA methods with the exception of the B1B95/(aug-cc-pVDZ, cc-pVTZ, aug-cc-pVTZ) methods (4.64 kcal/mol, 4.53 kcal/mol, 4.25 kcal/mol). For the 6-31++G* basis set, the best GGA result of 5.07 kcal/mol is obtained with the PBEPW91 functional. The best hybrid-GGA result for Pople type basis sets is 5.05 kcal/mol and is obtained with the B98/6-31+G* method.

As with the electron affinities, the addition of diffuse functions to hydrogen atoms in the 6-31++G* basis set seems to have a negligible effect on the calculation of ionization potentials compared to the 6-31+G* basis set. There is also only a small difference between the results obtained with the cc-pVTZ and aug-cc-pVTZ basis sets.
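The accuracy cost of choosing a smaller basis set can be read directly from the errors quoted in the preceding paragraphs. The sketch below ranks the basis sets for a given functional and reports the penalty relative to the best entry; the data structure and function names are assumptions made for illustration, while the numbers are the MPWB95 and B1B95 values quoted in the text.

```python
# Average unsigned ionization potential errors (kcal/mol) quoted in the text.
# The dictionary layout and helper names are illustrative assumptions.
ip_errors = {
    ("MPWB95", "6-31+G*"): 4.53,
    ("MPWB95", "6-31++G*"): 4.50,
    ("MPWB95", "cc-pVTZ"): 4.49,
    ("MPWB95", "aug-cc-pVTZ"): 4.38,
    ("B1B95", "6-31+G*"): 4.81,
    ("B1B95", "6-31++G*"): 4.80,
    ("B1B95", "aug-cc-pVDZ"): 4.64,
    ("B1B95", "cc-pVTZ"): 4.53,
    ("B1B95", "aug-cc-pVTZ"): 4.25,
}


def rank_by_error(errors, functional):
    """Return (basis set, error) pairs for one functional, lowest error first."""
    rows = [(basis, err) for (func, basis), err in errors.items() if func == functional]
    return sorted(rows, key=lambda row: row[1])


def penalty_vs_best(errors, functional, basis):
    """Extra error incurred by a basis set relative to the best one for that functional."""
    best_error = rank_by_error(errors, functional)[0][1]
    return errors[(functional, basis)] - best_error


for functional in ("MPWB95", "B1B95"):
    penalty = penalty_vs_best(ip_errors, functional, "6-31+G*")
    print(f"{functional}/6-31+G*: +{penalty:.2f} kcal/mol relative to the best basis set")
```

With these numbers, using 6-31+G* instead of aug-cc-pVTZ costs about 0.15 kcal/mol for MPWB95 and about 0.56 kcal/mol for B1B95.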

PAGE 148

There is, however, a marked difference in the quality of the cc-pVDZ and aug-cc-pVDZ basis set results.

Table 5-15. Average unsigned ionization potential errors for HF, MP2, LSDA, and B3P86 methods.

Method    3-21G*  3-21+G*  6-31G*  6-31+G*  6-31++G*  cc-pVDZ  cc-pVTZ  aug-cc-pVDZ  aug-cc-pVTZ
HF        26.53   23.00    25.70   24.74    24.76     26.33    25.74    25.19        25.40
MP2       15.59   9.77     10.45   10.42    5.84      11.44    9.78     6.74         5.53
SVWN5     10.74   10.29    7.57    8.52     8.48      7.68     8.33     8.59         8.75
SPL       10.50   10.37    7.49    8.62     8.57      7.46     8.42     8.70         8.83
c-SVWN5   24.93   17.31    22.57   19.08    19.05     23.37    19.85    19.06        18.69
B3P86     15.05   19.01    14.25   15.87    15.86     13.83    15.22    15.84        15.68

Energies are in kcal/mol.

Electron Affinity

The electron affinity average unsigned errors are shown in Table 5-16 and in Figure 5-9. The table gives the results for the HF, MP2, B3P86, and LSDA methods; the figure gives the results for all GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals except the hybrid-GGA functional B3P86, which yields very poor results. The best overall result for electron affinities is obtained using the meta-GGA functional MPWB95 along with the 6-31++G* basis set, yielding an average unsigned error of 3.08 kcal/mol. The worst average unsigned error among the DFT methods is 16.78 kcal/mol and is obtained with the hybrid-GGA B3P86 functional combined with the 3-21+G* basis set. Among all methods studied in this work, the HF/aug-cc-pVTZ method yields the worst result, with an average error of 29.63 kcal/mol.

Among all of the functional types considered here, the LSDA methods produce the highest errors; this is an expected result, as these are the least sophisticated functionals and they lack gradient-dependent terms. It is also interesting to note that results obtained with methods that incorporate the P86 correlation functional, with the exception of

PAGE 149

Figure 5-7. Average unsigned ionization potential errors for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals with Pople type basis sets. [Bar chart: average error (kcal/mol) versus functional, B3P86 omitted, one series per basis set: 3-21G*, 3-21+G*, 6-31G*, 6-31+G*, 6-31++G*.]

PAGE 150

Figure 5-8. Average unsigned ionization potential errors for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals with Dunning type basis sets. [Bar chart: average error (kcal/mol) versus functional, B3P86 omitted, one series per basis set: cc-pVDZ, cc-pVTZ, aug-cc-pVDZ, aug-cc-pVTZ.]


G96P86, are significantly worse than those obtained using the other functionals within a given class. Of the three LSDA functionals, the best result of 6.25 kcal/mol is obtained with SVWN5/6-31+G*. Among the GGA functionals, the PW91LYP/6-31++G* method yields the lowest average unsigned error of 3.56 kcal/mol; the PW91LYP/6-31+G*, PBEPW91/6-31++G*, and PBELYP/6-31++G* functional/basis set combinations all obtain errors lower than 3.60 kcal/mol. The B98/aug-cc-pVTZ method gives the smallest average error among the hybrid-GGA functionals, 3.15 kcal/mol. Among the meta-GGA functionals, the MPWB95/6-31++G* method yields the lowest error with a value of 3.08 kcal/mol; it is also noteworthy that the MPWB95/6-31+G* method gives extremely good results with an error value of 3.12 kcal/mol. The MPW1KCIS/aug-cc-pVTZ method yields an error value of 3.48 kcal/mol, the lowest error among all hybrid-meta-GGA methods.

The Hartree-Fock method performs very poorly in describing electron affinities. This can be explained by the fact that, since an anion has one electron more than its neutral counterpart, correlation effects are stronger in the negatively charged ion than in the neutral system. Because the Hartree-Fock technique neglects correlation effects, there is a pronounced discrepancy in its description of neutral and anionic species.

One salient aspect of these data is that, not surprisingly, the 3-21+G* basis set performs very poorly compared to the larger basis sets for all functionals. This basis set does, however, outperform all other basis sets when combined with the Hartree-Fock


method and gives results that are only slightly worse than those obtained with the larger 6-31+G* and 6-31++G* basis sets when combined with the MP2 method. The lowest unsigned error for the very small (and inexpensive) 3-21+G* basis set is 5.17 kcal/mol, as calculated with the BB95 functional.

It is also surprising that, generally, the 6-31+G* and 6-31++G* basis sets obtain results that are comparable to, or in many cases superior to, the aug-cc-pVDZ and aug-cc-pVTZ results. As can be seen in Figure 5-9 and Table 5-16, the average unsigned errors for the 6-31+G* and 6-31++G* basis sets are generally lower than those for the aug-cc-pVDZ and aug-cc-pVTZ basis sets for the LSDA, GGA, and meta-GGA functional classes. This trend is especially pronounced for the LSDA and GGA functionals. Two of the three LSDA functionals obtain better results when combined with the smaller Pople 6-31G type basis sets than when used in conjunction with the larger Dunning cc-pVXZ type basis sets. The Pople type basis sets outperform the Dunning type basis sets for thirteen of the sixteen GGA functionals. For the hybrid-GGA and hybrid-meta-GGA functionals, the Dunning type basis sets typically outperform the Pople type basis sets by a small margin (5 0 kcal/mol). The meta-GGA functionals represent a mixed bag in terms of this trend: here the smaller basis sets outperform the larger ones (not including the 3-21+G* basis set) for four of the seven functionals studied in this work.

It should be noted that the addition of diffuse functions to hydrogen atoms in the 6-31++G* basis set does not lead to results that are significantly different from those obtained with the 6-31+G* basis set. It is also interesting to note that the aug-cc-pVTZ


basis set generally yields results that are only slightly better than the aug-cc-pVDZ basis set results.

Table 5-16. Average unsigned electron affinity errors for HF, MP2, LSDA, and B3P86 methods.

Method     3-21+G*  6-31+G* 6-31++G*  aug-cc-pVDZ  aug-cc-pVTZ
HF           26.40    29.05    28.95        28.90        29.63
MP2          11.14    10.73    10.62         5.81         4.70
SVWN5         9.37     6.25     6.36         7.58         7.53
SPL           9.92     6.79     6.90         8.14         8.04
c-SVWN5      12.04    14.80    14.65        13.60        13.68
B3P86        16.78    13.02    13.08        13.99        13.53

Energies in kcal/mol

Heat of Formation

Unsigned errors for the heat of formation (HOF) test set are listed in Figures 5-10 and 5-11 along with Table 5-17. Overall, the combination that gives the lowest unsigned error is PBE1KCIS/aug-cc-pVTZ at 3.64 kcal/mol. Neglecting errors from the LSDA, MP2, and HF methods, the overall least accurate combination is PW91P86/6-31G* with an average unsigned error of 51.4 kcal/mol. The MPWLYP/3-21G* method yields an unsigned error of 5.66 kcal/mol, which is the lowest error for the relatively inexpensive 3-21G* and 3-21+G* basis sets.

For the Pople basis sets, the accuracy of HOF calculations is dependent on the size of the basis set for the hybrid-GGA and hybrid-meta-GGA classes of functionals. As shown in Figure 5-10, all of the hybrid-meta-GGA functionals and all but one of the hybrid-GGA functionals yield much higher errors for the 3-21G* and 3-21+G* basis sets than for the 6-31G*, 6-31+G*, and 6-31++G* basis sets, while the other functional classes show no such dependency. The use of diffuse basis sets with GGA and meta-GGA


functionals appears to increase the accuracy of the methods, as 3-21+G*, 6-31+G*, and 6-31++G* typically produce lower HOF errors than their non-diffuse counterparts. The opposite effect is observed when diffuse bases are paired with hybrid-GGA or hybrid-meta-GGA functionals. The most accurate functional/basis combination within the set of Pople bases is TPSSKCIS/6-31+G*, yielding an average unsigned error of 4.76 kcal/mol.

Generally, the hybrid-meta-GGA class of functionals produces the most accurate HOF calculations. Within this class B1B95/6-31G* yields the most accurate results with a 5.03 kcal/mol average unsigned error. For all five functionals in this class, the 6-31G* basis is the most accurate of the Pople type bases. The meta-GGA class of functionals yields errors slightly larger than those of the hybrid-meta-GGA class, on the whole. Functionals employing the TPSS exchange perform the best in this class. BB95 also performs well. TPSSTPSS/6-31+G* and TPSSKCIS/6-31+G* are the most accurate combinations within this functional category, producing average errors of 4.76 kcal/mol. The 6-31+G* basis yields the lowest error for each of the functionals in this class. Within the hybrid-GGA class, no functional's performance is particularly good.

Within the GGA family, the P86 correlation term should be avoided, as these data suggest that it is generally ill-suited for HOF calculations. Four functionals, PBELYP, MPWLYP, MPWPW91, and MPWPBE, perform very well. With the exception of MPWP86, all functionals containing the MPW exchange perform well. MPWLYP/3-21G* gives the most accurate results of the entire class with an average unsigned error of 5.66 kcal/mol, which is remarkable for such an inexpensive method. The accuracy of this method surpasses that of many of the more expensive methods included in this study. As seen in Figure 5-10, the 6-31G* basis generally produces the highest average errors, with a few exceptions.

Figure 5-9. Average unsigned electron affinity errors (kcal/mol) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals. [Bar chart of average error (kcal/mol) by functional; series: 3-21+G*, 6-31+G*, 6-31++G*, aug-cc-pVDZ, aug-cc-pVTZ.]


As seen in Figure 5-11, the Dunning-style correlation-consistent basis sets yield a smaller range of errors than the Pople-style bases. This may be due to the fact that the cc-pVDZ basis, the smallest Dunning-style basis used, is still quite large and is more accurate than the 3-21G* and 3-21+G* bases. As discussed previously, enthalpy calculations were performed at the same functional/basis combination as the geometry optimization of each molecule, the exception being aug-cc-pVTZ, for which single point enthalpy calculations were performed at the TPSS1KCIS/aug-cc-pVDZ geometries. The lowest average error for all Dunning-style bases is obtained with B3PW91/aug-cc-pVTZ at 3.95 kcal/mol. Within each class of functional there is a mixture of accurate and inaccurate methods.

Within the hybrid-meta-GGA class, MPW1KCIS is the most accurate, with the MPW1KCIS/cc-pVTZ method producing an average error of 3.97 kcal/mol. Augmentation of the bases with diffuse functions tends to reduce the accuracy of methods in this functional class. Once again, the meta-GGA functionals prove to be the second most accurate DFT methods for HOF calculation. On the whole, the TPSSKCIS and TPSS methods produce the best results among meta-GGA functionals. However, VSXC/aug-cc-pVDZ is the most accurate combination in the class, with an average unsigned error of 3.98 kcal/mol. The use of DZ versus TZ bases does not seem significant within this class, as the TZ bases produce the largest errors in three of the seven functionals.

The hybrid-GGA class of functionals reveals the same trends with the Dunning-type bases as with the Pople bases. PBE1PBE is the most accurate functional and B3PW91/aug-cc-pVTZ produces the lowest error at 3.95 kcal/mol. Within this class, the expansion of the basis set from DZ to TZ does not enhance the accuracy of the HOF


calculations, as the TZ bases produce nearly equivalent errors for all functionals in the set.

The behavior of the GGA class of functionals with the Dunning type basis sets is again similar to that of the Pople bases. Methods containing the P86 correlation term are again very poor at predicting heats of formation, while those containing the MPW exchange term are more accurate. BPW91 also performs well compared to the rest of the functionals in this group. The most accurate method within this class is HCTH/aug-cc-pVDZ, which produces an average error of 6.83 kcal/mol. Twelve of the sixteen functionals in the class show a decrease in accuracy with the addition of diffuse functions. Again, there is little difference in values obtained with DZ bases as opposed to TZ bases, as most functionals show only a slight increase in accuracy when using the TZ bases instead of the DZ sets.

Of the LSDA methods, c-SVWN5 performs notably well at predicting heats of formation. c-SVWN5/3-21+G* yields an average unsigned error of 9.44 kcal/mol, the best value in this group. In terms of heat of formation, c-SVWN5 is 4 to 13 times more accurate than other LSDA methods. Other LSDA methods do not accurately predict HOF.

Table 5-17. Average unsigned HOF errors for the HF, MP2, and LSDA methods.

Method      3-21G*  3-21+G*   6-31G*  6-31+G* 6-31++G*  cc-pVDZ  cc-pVTZ  aug-cc-pVDZ  aug-cc-pVTZ
HF          289.94   294.01   252.94   256.28   258.10   261.27   248.30       259.40       248.49
MP2          58.23    55.95    83.74    84.53    83.27    96.10
SVWN5       125.83   127.57   135.97   128.06   121.78   124.24   135.06       122.98       133.85
SPL         128.66   117.21   143.48   133.70   127.46   129.90   144.22       128.62       139.49
c-SVWN5      13.45     9.44    19.47    26.46    27.14    19.48    29.44        18.95        18.10

Energies in kcal/mol
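The statement that c-SVWN5 is roughly 4 to 13 times more accurate than the other LSDA methods can be checked directly against the LSDA rows of Table 5-17. The following is a minimal Python sketch of that arithmetic; the dictionary layout and the helper names (`error_ratios`, `ratio_range`) are illustrative assumptions and are not part of the original study.

```python
# Average unsigned HOF errors (kcal/mol) transcribed from the LSDA rows of Table 5-17.
# Column order: 3-21G*, 3-21+G*, 6-31G*, 6-31+G*, 6-31++G*,
#               cc-pVDZ, cc-pVTZ, aug-cc-pVDZ, aug-cc-pVTZ
HOF_ERRORS = {
    "SVWN5":   [125.83, 127.57, 135.97, 128.06, 121.78, 124.24, 135.06, 122.98, 133.85],
    "SPL":     [128.66, 117.21, 143.48, 133.70, 127.46, 129.90, 144.22, 128.62, 139.49],
    "c-SVWN5": [13.45, 9.44, 19.47, 26.46, 27.14, 19.48, 29.44, 18.95, 18.10],
}


def error_ratios(reference, other):
    """Per-basis-set ratio of another method's error to the reference error."""
    return [o / r for r, o in zip(reference, other)]


def ratio_range():
    """Smallest and largest error ratio of SVWN5 and SPL relative to c-SVWN5."""
    ratios = []
    for name in ("SVWN5", "SPL"):
        ratios.extend(error_ratios(HOF_ERRORS["c-SVWN5"], HOF_ERRORS[name]))
    return min(ratios), max(ratios)


if __name__ == "__main__":
    low, high = ratio_range()
    print(f"Other LSDA errors are {low:.1f} to {high:.1f} times the c-SVWN5 error")
```

With the tabulated values, the ratios fall between about 4.5 and 13.5, in line with the range quoted in the text.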


Figure 5-10. Average unsigned heat of formation errors (kcal/mol) for the five Pople-style basis sets employed in this study. [Bar chart of average unsigned error (kcal/mol) by functional; series: 3-21G*, 3-21+G*, 6-31G*, 6-31+G*, 6-31++G*.]


Figure 5-11. Average unsigned heat of formation errors (kcal/mol) for the Dunning-type basis functions used in this work. [Bar chart of average unsigned error (kcal/mol) by functional; series: cc-pVDZ, cc-pVTZ, aug-cc-pVDZ, aug-cc-pVTZ.]


Hydrogen Bonding Interaction Energy

Figures 5-12 and 5-13 give the average hydrogen bonding interaction energy unsigned errors for gradient corrected density functional methods along with Pople and Dunning type basis sets, respectively. Table 5-18 gives the average unsigned hydrogen bond interaction errors for HF, MP2, and LSDA functional methods. Overall, the best result is obtained with MP2/aug-cc-pVDZ, with an average error of 0.25 kcal/mol. The best result among density functional methods is 0.31 kcal/mol, as obtained by MPWLYP/aug-cc-pVTZ. The largest overall error of 10.26 kcal/mol is obtained by the c-SVWN5/aug-cc-pVDZ method.

Not surprisingly, the small Pople type basis sets, 3-21G* and 3-21+G*, generally yield poor results in terms of hydrogen bonding. For most functionals, the errors obtained with these small bases are greater than 2.00 kcal/mol. Some notable examples of small basis methods that perform fairly well are HCTH/3-21+G* (0.84 kcal/mol), G96LYP/3-21+G* (0.90 kcal/mol), and MP2/3-21G* (0.85 kcal/mol). The best result for these small basis sets combined with one of the LSDA methods, which are very computationally inexpensive, is 7.99 kcal/mol, as calculated using c-SVWN5/3-21G*.

As one might expect, the 6-31+G* and 6-31++G* basis sets, which contain diffuse functions, generally outperform 6-31G* in terms of hydrogen bonding. There are seven methods for which this is not the case: HF, BPW91, G96LYP, G96P86, VSXC, BB95, and B1B95. Somewhat surprisingly, there is typically only a small advantage to using the 6-31++G* basis set, which incorporates diffuse functions for hydrogen atoms, as compared to the 6-31+G* basis set.

For the large Pople type basis sets, the MP2 method performs very well, with average unsigned binding energy errors of 0.28 kcal/mol and 0.29 kcal/mol with 6-31++G*


and 6-31+G*, respectively (these values represent the second and third best overall results). Hartree-Fock performs fairly well with these basis sets, with a best value of 0.91 kcal/mol when combined with 6-31G*. The LSDA functionals perform poorly for hydrogen bonding when combined with the large Pople type basis sets. The SVWN5 and SPL functionals both yield errors greater than 6.00 kcal/mol with these bases. The c-SVWN5 functional, which gives results that are substantially better than those of the other two LSDA methods, still only yields a best result of 4.85 kcal/mol (with both the 6-31+G* and 6-31++G* basis sets).

There is a great deal of variation in the hydrogen bonding results obtained with the GGA functionals. The lowest interaction energy error of 0.46 kcal/mol is obtained with the MPWPW91 functional combined with both the 6-31+G* and 6-31++G* basis sets. The highest error of 2.59 kcal/mol is given by PW91P86/6-31G*. Other noteworthy methods in this class are MPWPBE/(6-31+G*, 6-31++G*) (0.47 kcal/mol) and BLYP/6-31++G* (0.55 kcal/mol). It is interesting to note that, generally, functionals containing the P86 correlation functional perform poorly, while functionals containing the MPW exchange functional perform fairly well when used along with the large Pople type basis sets. The MPWP86 functional performs moderately well, with an average error of 1.03 kcal/mol for MPWP86/6-31++G*.

For the large Pople type basis sets, the best result among hybrid-GGA methods is 0.33 kcal/mol, as calculated with the B1LYP/6-31++G* method; it should also be noted that this is the best overall result for these basis sets among density functional methods. B1LYP/6-31+G* gives a slightly higher average unsigned interaction energy error of 0.34 kcal/mol, while B3LYP also performs well, with average errors of 0.36 kcal/mol and 0.38 kcal/mol with 6-31+G* and 6-31++G*


respectively. Among the meta-GGA methods the lowest interaction energy error of 0.42 kcal/mol is obtained with the TPSS1KCIS/6-31++G* method. The VSXC functional performs very poorly compared to the other meta-GGA functionals (indeed, it performs poorly compared to most gradient corrected functionals). It is interesting to note that four of the seven functionals in this class obtain errors lower than 0.50 kcal/mol when combined with the 6-31+G* and 6-31++G* basis sets; these functionals are MPWB95, TPSS, MPWKCIS, and TPSSKCIS. Each of the five hybrid-meta-GGA functionals performs quite well for hydrogen bond interaction energies when paired with the 6-31+G* and 6-31++G* basis sets, with no method obtaining average errors larger than 1.00 kcal/mol. The best result in this class is 0.38 kcal/mol and is given by the MPW1KCIS/6-31++G* method. Other noteworthy methods are BB1K/6-31+G* (0.40 kcal/mol), BB1K/6-31++G* (0.41 kcal/mol), MPW1KCIS/6-31++G* (0.42 kcal/mol), and TPSS1KCIS/6-31+G* (0.42 kcal/mol).

As in the case of the Pople-type basis sets, the Dunning-type basis sets that contain diffuse functions, aug-cc-pVDZ and aug-cc-pVTZ, yield better hydrogen bond interaction energies than the ones that do not, cc-pVDZ and cc-pVTZ, for a majority of the functionals considered in this work. Generally speaking, the cc-pVTZ basis set outperforms the smaller cc-pVDZ basis set for hydrogen bonding; it should be noted that this is not the case for Hartree-Fock or any of the LSDA functionals. The aug-cc-pVTZ basis set typically outperforms the aug-cc-pVDZ basis set for LSDA, GGA, and hybrid-GGA functionals, while the smaller basis, aug-cc-pVDZ, yields better results when combined with the meta-GGA and hybrid-meta-GGA functionals.


HF yields fairly large errors with the Dunning-type basis sets, with the lowest unsigned error being 1.65 kcal/mol for HF/cc-pVDZ and the highest being 1.77 kcal/mol for HF/cc-pVTZ. These values are significantly higher than those obtained with the large Pople type basis sets. The MP2 method performs very well with most Dunning type basis sets; MP2/aug-cc-pVDZ produces an average unsigned error of 0.25 kcal/mol, which is the best value for hydrogen bond interaction energies obtained in this work.

Once again the LSDA functionals perform very poorly compared to the other DFT methods. Among these methods, the SPL and SVWN5 functionals generally yield results that are almost identical for all Dunning-type basis sets. The c-SVWN5 method yields unsigned errors that are significantly higher than those of SPL and SVWN5. The best LSDA result of 5.48 kcal/mol is obtained with SPL/cc-pVDZ. The worst LSDA result is given by c-SVWN5/aug-cc-pVDZ, with a value of 10.26 kcal/mol.

Among the GGA methods, the Dunning-type basis sets outperform the large Pople type bases for nine of the sixteen functionals. The MPWLYP functional performs significantly better than all other functionals in this class, with the best value of 0.31 kcal/mol given by MPWLYP/aug-cc-pVTZ. The highest unsigned error of 2.78 kcal/mol is obtained with the G96LYP/cc-pVDZ method. Both functionals containing the G96 exchange functional, G96LYP and G96P86, give very poor results, with errors that are greater than 2.00 kcal/mol for all Dunning-type basis sets.

For the hybrid-GGA methods, four of the six functionals, B1LYP, B3LYP, PBE1PBE, and B3P86, obtain errors that are between 0.50 kcal/mol and 0.80 kcal/mol; the remaining functional, B3PW91, does not perform as well, producing errors that are above 1.00 kcal/mol for all basis sets. For all Dunning-type basis sets, the B98 functional


produces the best hydrogen bonding results. The best result within this class is given by the B98/aug-cc-pVDZ method, with a value of 0.40 kcal/mol. Among the meta-GGA methods, PBEKCIS stands out as being notably better than all other functionals. Values of 0.32 kcal/mol and 0.34 kcal/mol are obtained when PBEKCIS is combined with the aug-cc-pVDZ and aug-cc-pVTZ basis sets, respectively; these are the lowest unsigned errors within this class. The VSXC and BB95 functionals both perform poorly. Within the hybrid-meta-GGA class of functionals, PBE1KCIS yields the lowest errors for hydrogen bonding interaction energies for all Dunning-type basis sets. The best result in this class is obtained with PBE1KCIS/aug-cc-pVDZ, with a value of 0.36 kcal/mol. The next best functional in this class is TPSS1KCIS, whose lowest error is 0.55 kcal/mol at the TPSS1KCIS/aug-cc-pVDZ level.

Table 5-18. Average unsigned hydrogen bond interaction errors for HF, MP2, and LSDA methods.

Method      3-21G*  3-21+G*   6-31G*  6-31+G* 6-31++G*  cc-pVDZ  cc-pVTZ  aug-cc-pVDZ  aug-cc-pVTZ
HF            1.13     1.60     0.91     1.07     1.08     1.56     1.77         1.68         1.73
MP2           0.85     1.14     0.49     0.29     0.28     1.29     0.42         0.25         0.30
SVWN5         9.68    10.24     6.48     6.20     6.21     5.49     5.76         6.04         5.97
SPL           9.66    10.24     6.48     6.21     6.21     5.48     5.76         6.04         5.97
c-SVWN5       7.99     8.66     5.13     4.85     4.85     9.90    10.06        10.26        10.18

Energies in kcal/mol

Conformational Energies

The average unsigned conformational energy errors are given in Figures 5-14 and 5-15 and in Table 5-19. There are great differences in the conformational energies of the systems considered here. For example, the experimental difference in energy between the orthogonal and planar conformers of ethylene is 65.0 kcal/mol, whereas the experimental value for the conformational energy for the anti and eclipsed forms of methanol is 1.1 kcal/mol.


Figure 5-12. Average unsigned hydrogen bond interaction energy errors (kcal/mol) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Pople type basis sets. [Bar chart of average error (kcal/mol) by functional.]


Figure 5-13. Average unsigned hydrogen bond interaction energy errors (kcal/mol) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Dunning type basis sets. [Bar chart of average error (kcal/mol) by functional; series: cc-pVDZ, cc-pVTZ, aug-cc-pVDZ, aug-cc-pVTZ.]


For this reason, the conformational energies are reported in percent error, that is:

\[ \mathrm{error}\% = \frac{\left| E_{\mathrm{exp}} - E_{\mathrm{theory}} \right|}{E_{\mathrm{exp}}} \times 100 \tag{5.24} \]

Overall, the best result of 6.8% is obtained with the MP2/aug-cc-pVDZ method. The best result among density functional methods is 7.9%, as calculated using MPWB95/cc-pVTZ. The worst conformational energy error is that of VSXC/3-21G*, with a value of 81.9%.

As seen in Figure 5-14 and Table 5-19, the small Pople type basis sets, 3-21G* and 3-21+G*, give conformational energy errors that are typically much greater than those of the larger Pople type basis sets, 6-31G*, 6-31+G*, and 6-31++G*. Generally, 3-21+G* outperforms 3-21G*. There are several exceptions to this rule in the hybrid-GGA and hybrid-meta-GGA classes of functionals; also, 3-21G* yields slightly lower errors than 3-21+G* for the GGA functional HCTH. For these small basis sets, the LSDA method produces conformational energies that are significantly worse than those of the gradient corrected density functional methods. The lowest unsigned error for small Pople type basis sets is obtained with the MP2/3-21+G* method, with a value of 21.7%; for DFT methods the best value of 23.3% is obtained with the PBELYP/3-21+G* method.

For the Pople type basis sets, the best conformational energy results can be found within the meta-GGA and hybrid-meta-GGA functional classes. The best overall result of 12.2% is obtained with the hybrid-meta-GGA B1B95/6-31++G* method. It should be noted that although the BB1K and B1B95 methods perform very well, the remaining three functionals in the hybrid-meta-GGA class, MPW1KCIS, PBE1KCIS, and TPSS1KCIS, yield errors that are about two to four percent higher. Within the meta-GGA


group of functionals, BB95, MPWB95, and TPSS all yield very low conformational energy errors. The lowest unsigned error in this class is produced by the MPWB95/6-31++G* method, with a value of 12.4%. Among the hybrid-GGA functionals, B98 obtains errors that are about one percent lower than those of the next best functional, B3P86. The lowest error in this class is obtained at the B98/6-31++G* level, with an average unsigned error of 14.2%. Two GGA functionals, PBEP86 and PW91P86, produce the best results within their class; both yield an error value of 14.0% when paired with the 6-31++G* basis set. Among the LSDA functionals, SPL and SVWN5 both yield the same error values of 15.6% and 15.5% when paired with the 6-31+G* and 6-31++G* bases, respectively. Hartree-Fock generates errors that are significantly higher than those obtained by most DFT methods; the best error value of 22.1% is obtained with 6-31+G*. The MP2 method obtains errors of 15.5% when paired with 6-31++G* and 16.8% with the 6-31+G* basis set.

The basis sets that include diffuse functions, 6-31+G* and 6-31++G*, generally give unsigned errors that are substantially lower than those obtained using the 6-31G* basis set. It is also interesting to note that 6-31++G* outperforms 6-31+G* for most of the functionals considered here.

Figure 5-15 and Table 5-19 give the conformational energy unsigned errors for the Dunning type basis sets. Here it can be seen that there is no class of functional that stands out as being substantially more accurate than another. Some of the lowest unsigned errors are obtained with BB1K and B1B95 (hybrid-meta-GGA), BB95 and MPWB95 (meta-GGA), and with PBEP86 and PW91P86 (GGA). The B98 functional produces the best hybrid-GGA results, which are not quite as good as the best results obtained by other DFT methods. It is also interesting to note that each of the


LSDA methods studied here yields results that are competitive with many of those obtained with the more sophisticated gradient corrected techniques. Among all density functional methods considered in this work, the lowest unsigned error obtained for this property is 7.9%, as calculated using the MPWB95/cc-pVTZ method. Once again, the VSXC functional (meta-GGA) performs very poorly for describing conformational energies. The MP2 method also yields very good results for all Dunning type basis sets except for cc-pVDZ. The best overall conformational energy result obtained in this study is 6.8% and is given by the MP2/aug-cc-pVDZ method. Hartree-Fock produces errors that are significantly higher than those of most DFT techniques.

Among the Dunning type basis sets, aug-cc-pVDZ and cc-pVTZ tend to yield the lowest errors. The aug-cc-pVDZ basis set gives the best results for all of the hybrid-GGA functionals, all of the LSDA functionals, and for all of the hybrid-meta-GGA functionals except MPW1KCIS. The cc-pVTZ basis set yields the lowest unsigned errors for all of the GGA functionals and for all of the meta-GGA functionals except VSXC and MPWKCIS. The cc-pVDZ basis set produces the largest errors among Dunning type basis sets for each of the computational techniques employed in this study, with the exception of VSXC.

Table 5-19. Average unsigned conformational energy errors for HF, MP2, and LSDA methods.

Method      3-21G*  3-21+G*   6-31G*  6-31+G* 6-31++G*  cc-pVDZ  cc-pVTZ  aug-cc-pVDZ  aug-cc-pVTZ
HF            27.6     38.7     24.2     22.1     22.2     27.7     20.3         22.0         21.0
MP2           30.8     21.7     18.8     16.8     15.5     19.8      7.4          6.8          8.9
SVWN5         51.9     39.1     19.5     15.6     15.4     17.1      9.9          8.8         11.6
SPL           52.1     39.3     19.7     15.6     15.5     17.4     10.1          8.9         11.6
c-SVWN5       51.2     34.4     17.4     17.0     17.0     17.8     10.2         10.0         15.6

Values in percent error
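The percent-error measure of Equation 5.24 can be illustrated with a short Python sketch. It assumes the unsigned (absolute value) form of the error. The two experimental energies are the ethylene and methanol values quoted in the text; the computed values are hypothetical numbers, each chosen to be off by the same 0.5 kcal/mol, and the function names are illustrative only.

```python
def percent_error(e_exp, e_theory):
    """Unsigned percent error of a computed energy relative to experiment."""
    if e_exp == 0:
        raise ValueError("experimental energy must be nonzero")
    return abs(e_exp - e_theory) / abs(e_exp) * 100.0


def average_percent_error(pairs):
    """Mean unsigned percent error over (experimental, computed) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (experimental, computed) pair")
    return sum(percent_error(e, t) for e, t in pairs) / len(pairs)


# Experimental conformational energies (kcal/mol) quoted in the text.
ETHYLENE_EXP = 65.0  # orthogonal vs. planar ethylene
METHANOL_EXP = 1.1   # anti vs. eclipsed methanol

# Hypothetical computed values, each off by the same 0.5 kcal/mol.
ethylene_err = percent_error(ETHYLENE_EXP, 65.5)
methanol_err = percent_error(METHANOL_EXP, 1.6)

if __name__ == "__main__":
    print(f"ethylene: {ethylene_err:.1f}%  methanol: {methanol_err:.1f}%")
```

The same absolute deviation corresponds to less than 1% for ethylene but roughly 45% for methanol, which is why a relative measure weights small conformational energies so heavily and why the percent errors reported here look large next to the kcal/mol errors of the other test sets.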


Figure 5-14. Average unsigned conformational energy errors (%) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Pople-type basis sets. [Bar chart of average error (%) by functional; series: 3-21G*, 3-21+G*, 6-31G*, 6-31+G*, 6-31++G*.]


Figure 5-15. Average unsigned conformational energy errors (%) for GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Dunning-type basis sets. [Bar chart of average error (%) by functional; series: cc-pVDZ, cc-pVTZ, aug-cc-pVDZ, aug-cc-pVTZ.]


Reaction Barrier Heights for Small Systems with Non-singlet Transition States

Figures 5-16 and 5-17 give the average unsigned barrier height errors of the SRBH systems for gradient corrected functionals along with the Pople and Dunning type basis sets, respectively. Table 5-20 gives the SRBH barrier height errors for the HF, MP2, and LSDA functional methods along with all basis sets considered in this work. Overall, the best result is obtained with the BB1K/aug-cc-pVTZ method, with an average unsigned error of 1.05 kcal/mol. The highest error, 21.95 kcal/mol, is produced with the SVWN5/3-21G* functional/basis combination. Again we would like to point out that these barrier heights are based on single point calculations at geometries determined at the QCISD/MG3 level of theory.

Inspection of these data reveals that the DFT methods that include exact exchange, that is, the hybrid-GGA and hybrid-meta-GGA methods, generally yield the lowest barrier height errors. The LSDA methods, which are based solely on the electron density, produce the largest unsigned errors.

The LSDA, GGA, and meta-GGA methods perform poorly for SRBH barrier heights. Each of the LSDA methods produces errors larger than twelve kcal/mol for all basis sets. Of the GGA functionals, only HCTH yields errors smaller than six kcal/mol. The best result in this class is obtained with the HCTH/6-31++G* method, with an average unsigned error of 4.86 kcal/mol. Among the meta-GGA methods, only the VSXC functional obtains errors smaller than six kcal/mol. The smallest error in this class is 4.24 kcal/mol and is given by the VSXC/6-31++G* method.

Among the hybrid-GGA functionals, B1LYP yields the smallest errors for all basis sets; this functional produces its lowest error of 3.11 kcal/mol when paired with the aug-cc-pVTZ basis set. It should be noted that B1LYP/6-31++G* gives a slightly higher error


of 3.23 kcal/mol. In the hybrid-meta-GGA class, the BB1K functional stands out as clearly the best performer; indeed, for each basis set, this functional produces the best results among all methods considered in this work. The lowest error in this class is obtained with the BB1K/aug-cc-pVTZ method, with a value of 1.05 kcal/mol. The next best functional for the calculation of these SRBH barrier heights is B1B95, which produces the second-best results among all methods studied here. The lowest error given by this functional is 2.64 kcal/mol, as calculated using the 6-31G* basis set.

The Hartree-Fock method performs very poorly in describing radical transition state barrier heights; the lowest unsigned error attained with this technique is 10.78 kcal/mol, with the 3-21+G* basis set. MP2 yields fairly good results when paired with the Dunning-type basis sets but produces much larger errors when paired with the Pople-type basis sets. The lowest unsigned error attained with this method is 2.98 kcal/mol at the MP2/aug-cc-pVDZ level.

Table 5-20. Average unsigned errors for the non-singlet transition state reaction test set for HF, MP2, and LSDA methods.

Method  3-21G*  3-21+G*  6-31G*  6-31+G*  6-31++G*  cc-pVDZ  cc-pVTZ  aug-cc-pVDZ  aug-cc-pVTZ
HF  10.79  10.78  12.49  12.84  12.78  11.49  12.51  12.08  13.10
MP2  6.70  5.66  6.53  6.63  6.45  3.46  3.35  2.98  3.14
SVWNV  21.95  19.83  17.73  16.70  16.73  19.31  17.65  18.16  17.12
SPL  21.94  19.75  17.71  16.75  16.79  19.26  17.61  18.20  17.26
cSVWNV  18.82  16.38  14.18  14.18  13.23  16.05  14.18  14.70  13.46
Energies in kcal/mol

Reaction Barrier Heights for Organic Molecules with Singlet Transition States

Figures 5-18 and 5-19 and Table 5-21 show the reaction barrier height errors for the six reactions listed in the LSBH test set. Transition state barrier heights in this study are calculated as the difference between the temperature-corrected total enthalpy of the transition state and that of the reactants. All structures have been fully optimized at each functional/basis set combination.
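The barrier height definition above, together with the average unsigned error reported throughout this chapter, can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the hartree-to-kcal/mol conversion step, and the enthalpy values are assumptions made for the example and are not taken from the calculations in this work.

```python
from statistics import mean

# Conversion factor (kcal/mol per hartree); assumes enthalpies are in hartree.
HARTREE_TO_KCAL = 627.5095


def barrier_height(h_ts, h_reactants):
    """Barrier height in kcal/mol: H(transition state) minus the summed
    temperature-corrected enthalpies of the reactants (all in hartree)."""
    return (h_ts - sum(h_reactants)) * HARTREE_TO_KCAL


def average_unsigned_error(calculated, reference):
    """Mean of |calculated - reference| over a test set of barrier heights."""
    if len(calculated) != len(reference):
        raise ValueError("calculated and reference must have the same length")
    return mean(abs(c - r) for c, r in zip(calculated, reference))


# Hypothetical enthalpies (hartree) for a bimolecular reaction; illustrative only.
example_barrier = barrier_height(-234.512345, [-155.987654, -78.561234])
print(f"Example barrier height: {example_barrier:.2f} kcal/mol")

# Hypothetical calculated and reference barriers (kcal/mol) for three reactions.
print(average_unsigned_error([22.9, 35.1, 30.4], [24.8, 33.5, 30.6]))
```

For a unimolecular rearrangement the reactant list simply contains a single enthalpy.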

PAGE 174

Figure 5-16. Average unsigned barrier height energy errors (kcal/mol) for SRBH reactions with the GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals and Pople-type basis sets. [Bar chart: average error (kcal/mol) versus functional; one series per basis set: 3-21G*, 3-21+G*, 6-31G*, 6-31+G*, 6-31++G*.]

PAGE 175

Figure 5-17. Average unsigned barrier height energy errors (kcal/mol) for SRBH reactions with the GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals and Dunning-type basis sets. [Bar chart: average error (kcal/mol) versus functional; one series per basis set: cc-pVDZ, cc-pVTZ, aug-cc-pVDZ, aug-cc-pVTZ.]

PAGE 176

The values listed in the following tables are averages of the error in transition state barrier height over all six reactions considered. These reactions are:

1) the Diels-Alder reaction of butadiene and ethene forming cyclohexene
2) the Cope rearrangement of 1,5-hexadiene
3) the Claisen rearrangement of allyl vinyl ether to pentenal
4) the electrocyclic rearrangement of cyclobutene to butadiene
5) the 1,5-sigmatropic shift of 2,4-pentanedione
6) the 1,5-sigmatropic shift of 1,3-pentadiene

Overall, the functional that provides the lowest average error over all six reactions for both Pople and Dunning basis sets is B1LYP. The average error for this functional is 2.63 kcal/mol for the 6-31++G* basis and 2.58 kcal/mol for the aug-cc-pVTZ basis. Generally, a marked improvement in accuracy is observed on moving to larger basis sets for each functional. The 3-21G* and 3-21+G* basis sets are less accurate than the larger Pople-type bases by 3-4 kcal/mol, while the triple-zeta Dunning-style basis sets are more accurate than their double-zeta counterparts by nearly 0.5 kcal/mol.

The hybrid-GGA and hybrid-meta-GGA functional classes perform markedly better for predicting barrier heights than the LSDA, GGA, and meta-GGA classes. This trend is the opposite of that observed for frequency calculations, for which functionals that include Hartree-Fock exact exchange perform worse than those without an exact exchange term. Since frequency calculations must be performed for transition state optimizations, this result is somewhat surprising. Moreover, on its own, the HF method is more accurate than most DFT methods at predicting barrier heights when the 3-21G* and 3-21+G* basis sets are used. MP2 also performs well with the smaller basis sets. In fact, for HF, basis sets larger than 3-21G* produce errors nearly twice as large as those given by the smallest bases.

PAGE 177

Among the LSDA functionals, the cSVWN5 functional gives the greatest accuracy, while SPL is slightly less accurate. The average barrier height error for cSVWN5/cc-pVDZ is 11.22 kcal/mol. Typically, average errors within the LSDA class are near 12 kcal/mol, except for the smaller Pople-style basis sets, which returned errors of 18-20 kcal/mol for the SPL and SVWN5 functionals.

Within the GGA class, most functionals yield similar results, with the HCTH functional clearly returning the most accurate results. As mentioned, the accuracy of the barrier height calculations is highly basis set dependent for the Pople-type basis sets. HCTH/6-31+G* and HCTH/6-31++G* both yield an average error of 4.15 kcal/mol over all six reactions, while HCTH/cc-pVTZ produced an average error of 4.05 kcal/mol. For the class as a whole, errors for the 3-21G* and 3-21+G* basis sets average 9-13 kcal/mol, while errors for the larger Pople bases average 6-8 kcal/mol. The Dunning basis sets provide accuracy equivalent to the high-level Pople sets.

Similar results are obtained by the meta-GGA class, in which VSXC is by far the most accurate. Once again, the smaller Pople basis sets yield average errors of 8-11 kcal/mol, while the larger Pople sets and the correlation-consistent sets give average errors of 7-8 kcal/mol. The VSXC functional consistently yields lower errors than the other functionals in this class.

Hybrid-GGA methods perform better than either the GGA or meta-GGA methods, with B1LYP proving to be the most accurate functional tested. Again, a large dependence on the basis is observed with the Pople-type basis sets, as the larger basis sets are much more accurate than 3-21G* and 3-21+G*. The triple-zeta Dunning-type sets are slightly more accurate than the double-zeta sets. B1LYP/6-31++G* and B1LYP/aug-cc-pVTZ are the most accurate functional/basis combinations in the entire test set, producing average errors of

PAGE 178

2.63 and 2.58 kcal/mol, respectively. B3LYP also provides very accurate calculations for the barrier height test set. On the whole, the hybrid-GGA and hybrid-meta-GGA classes provide similar accuracy. Of the hybrid-meta-GGA class, BB1K yields the lowest average errors with the Pople-type bases, while B1B95 performs better when the Dunning-style sets are employed.

Table 5-21. Average unsigned errors for the singlet transition state large reaction test set for HF, MP2, and LSDA methods.

Method  3-21G*  3-21+G*  6-31G*  6-31+G*  6-31++G*  cc-pVDZ  cc-pVTZ  aug-cc-pVDZ  aug-cc-pVTZ
HF  7.79  8.69  13.93  13.89  13.89  13.73  14.32  13.39  14.39
MP2  5.68  5.41  5.18  5.28  7.07  7.71  8.80
SVWNV  19.86  17.81  12.61  12.04  12.08  12.01  11.83  12.54  12.21
SPL  19.82  17.78  12.59  12.02  12.05  11.99  11.75  12.49  11.83
cSVWNV  14.11  13.25  13.01  11.31  11.35  11.22  11.93  11.74  12.33
Energies in kcal/mol

Conclusions

In terms of geometric parameters, the hybrid-GGA and hybrid-meta-GGA functionals generally yield the best results for both bond lengths and bond angles. The LSDA functionals generally do not perform as well as the more sophisticated functionals. The choice of basis set has a large impact on the quality of calculated geometric parameters. In terms of bond lengths, the large Pople-type basis sets, 6-31G*, 6-31+G*, and 6-31++G*, generally perform similarly to or better than the much larger (and more expensive) cc-pVDZ and aug-cc-pVDZ basis sets for all gradient-corrected functionals. For bond angles, the Dunning-type basis sets generally yield the best results. The largest of these bases, aug-cc-pVQZ, generally obtains the lowest bond angle errors for all DFT functional classes. The large Pople-type basis sets that incorporate diffuse functions typically yield bond angles that are only slightly less accurate than those obtained with the aug-cc-pVDZ and cc-pVTZ basis sets. For most functionals, 6-31++G* produces bond angle errors that are only 0.01 to 0.05 degrees higher than those of aug-cc-pVDZ.

PAGE 179

Figure 5-18. Average unsigned barrier height energy errors (kcal/mol) for large singlet transition state reactions with the GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals and Pople-type basis sets. [Bar chart: average unsigned error (kcal/mol) versus functional; one series per basis set: 3-21G*, 3-21+G*, 6-31G*, 6-31+G*, 6-31++G*.]

PAGE 180

Figure 5-19. Average unsigned barrier height energy errors (kcal/mol) for large singlet transition state reactions with the GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals and Dunning-type basis sets. [Bar chart: average unsigned error (kcal/mol) versus functional; one series per basis set: cc-pVDZ, cc-pVTZ, aug-cc-pVDZ, aug-cc-pVTZ.]

PAGE 181

The methods that include exact exchange perform very poorly for calculating the vibrational frequencies of molecules. For large Pople-type and Dunning-type basis sets, these methods generally yield unsigned frequency errors that are 1.5 to 2 times larger than those obtained with methods that do not include exact exchange. For all basis sets, with the exception of 3-21G*, the GGA functionals produce the lowest average frequency errors. For LSDA and GGA functionals, the augmented Pople-type basis sets, 6-31+G* and 6-31++G*, typically produce errors that are slightly lower than those of aug-cc-pVDZ and slightly higher than those of aug-cc-pVTZ. For all functionals, the Pople-type basis sets yield errors that are comparable to the errors computed using all Dunning-type basis sets.

For electron affinities, there is no strong tendency for one functional class to significantly outperform another, with the exception of LSDA, which performs very poorly compared to all other functional groups. It is interesting to note that all functionals containing the P86 correlation functional (GGA and hybrid-GGA) perform very poorly. Functionals incorporating exact exchange tend to yield the smallest errors when combined with the larger Dunning-type basis sets, while the other functional groups, LSDA, GGA, and meta-GGA, all obtain the most accurate results when used in conjunction with the 6-31+G* and 6-31++G* basis sets.

For ionization potentials, the best results are obtained with the hybrid-meta-GGA functionals. It is very promising, in terms of large-scale calculations, that the ionization potential results obtained with the 6-31+G* and 6-31++G* basis sets are comparable to those obtained using the much larger Dunning-type basis sets for most functionals. As

PAGE 182

one might expect, the inclusion of diffuse functions in the basis set greatly improves the results for this property.

For heats of formation, the meta-GGA and hybrid-meta-GGA classes of DFT functionals appear to be the most accurate. It is important to note that in all classes except LSDA, one can find some functional/basis combination that performs well. Overall, the Dunning-style bases are more accurate than the Pople-type sets, with the cc-pVTZ and aug-cc-pVTZ bases yielding the lowest average unsigned errors for our 156-molecule heat of formation test set. However, it should be noted that one can achieve a very high level of accuracy with the MPWLYP/3-21G method. This combination produces an average error of only 5.6 kcal/mol, which is only 2 kcal/mol less accurate than the best result obtained within the entire study. Within the GGA class of functionals, a wide range of accuracies is obtained.

Generally, the hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals yield the best results for hydrogen bond interaction energies. There is a large amount of variation among the GGA functionals, with some giving very good results and others performing very poorly. The MP2 method produces some of the lowest hydrogen bonding interaction energy errors. For both the large Pople-type basis sets and the Dunning-type bases, the addition of diffuse functions typically produces lower unsigned errors. The inclusion of diffuse functions on hydrogen atoms in the 6-31++G* basis does not generally improve the hydrogen bonding interaction energies relative to the 6-31+G* basis. For the large Pople-type and Dunning-type bases that include diffuse functions, there is no clear tendency for one particular basis set to

PAGE 183

consistently produce the lowest errors within the GGA class of functionals; for all of the other functional classes, the 6-31+G* and 6-31++G* bases generally give the best results.

In terms of conformational energies, the meta-GGA and hybrid-meta-GGA functionals produce the lowest average errors. Not surprisingly, the large Pople-type basis sets, 6-31G*, 6-31+G*, and 6-31++G*, yield results that are typically about ten percent better than those obtained using the smaller Pople-type bases, 3-21G* and 3-21+G*. For the large Pople-type basis sets, there is a slight improvement in the calculated conformational energies when diffuse functions are employed. Overall, the basis sets that produce the lowest errors are the Dunning-type bases aug-cc-pVDZ and cc-pVTZ.

One of the most salient aspects of the data concerning the barrier heights of small molecules with radical transition states (SRBH) is that functionals containing exact exchange terms generally produce the lowest average barrier height errors. The LSDA methods, which depend only on the electron density, produce errors that are significantly higher than those of all other methods considered here. In terms of basis sets, the inclusion of diffuse functions typically increases the accuracy with which the barrier heights of these reactions can be calculated. The lowest barrier height errors are generally produced with the 6-31+G*, 6-31++G*, and aug-cc-pVTZ bases.

As in the case of the SRBH reactions, the barrier heights of larger systems with singlet transition states (LSBH) are generally better described by functionals that contain exact exchange. The addition of diffuse functions to the 3-21G*, 6-31G*, and cc-pVTZ basis sets generally results in a lower unsigned average error; in the case of the cc-pVDZ basis set, however, the addition of diffuse functions typically increases the errors slightly.

PAGE 184

For the LSBH reactions, the 6-31+G*, 6-31++G*, and aug-cc-pVTZ basis sets generally produce the lowest errors for most methods studied in this work.

General Summary of the Survey of DFT Methods

Here, we will attempt to summarize the results obtained in the entire study and draw some conclusions concerning the functionals that seem to offer the best compromise in describing all of the physical properties investigated in this work. As we have generated a tremendous amount of data in this study, we will limit our discussion by considering only the results obtained with two popular basis sets, 6-31+G* and aug-cc-pVDZ.

One of the most interesting observations that can be made from the data presented here is that, for many physical properties, the large Pople-type basis sets (6-31G*, 6-31+G*, and 6-31++G*) produce results that are comparable to, or superior to, those given by the much larger and more computationally expensive Dunning-type basis sets. For example, for the B1B95 functional, the 6-31+G* basis set outperforms the aug-cc-pVDZ basis set for bond distances, heats of formation, hydrogen bond interaction energies, and reaction barrier heights (both SRBH and LSBH); the average unsigned bond angle error obtained with the smaller basis set is only 0.034 degrees higher than that of the larger basis, and the average unsigned ionization potential error for 6-31+G* is only 0.28 kcal/mol larger than that of aug-cc-pVDZ. The average unsigned electron affinity, vibrational frequency, and conformational energy errors are larger for 6-31+G* than for aug-cc-pVDZ.

Table 5-22 indicates the rankings of the top five functional/basis set combinations overall and the top three functional/basis set combinations among Pople-type basis sets for each property considered in this work. In the table, it can be seen that for each

PAGE 185

Table 5-22. Rankings of functional/basis set combinations for properties considered in this work.

Rank  Method  Avg. Unsigned Error

Bond Length (Å)
Top five overall
1  VSXC/cc-pVQZ  0.0056
2  VSXC/aug-cc-pVQZ  0.0057
3  VSXC/aug-cc-pVTZ  0.0061
4  VSXC/cc-pVTZ  0.0061
5  TPSS1KCIS/cc-pVTZ  0.0063
Top three among Pople-type basis sets
1  B1B95/6-31+G*  0.0075
2  B1B95/6-31++G*  0.0075
3  B1B95/6-31G*  0.0078

Bond angle (deg)
Top five overall
1  BLYP/aug-cc-pVTZ  1.07
2  PBE1PBE/aug-cc-pVQZ  1.11
3  B3P86/aug-cc-pVQZ  1.12
4  PBE1PBE/cc-pVQZ  1.12
5  B3P86/aug-cc-pVDZ  1.12
Top three among Pople-type basis sets
1  PBE1PBE/6-31++G*  1.22
2  PBE1PBE/6-31+G*  1.23
3  TPSSTPSS/6-31++G*  1.23

Frequencies (cm-1)
Top five overall
1  G96LYP/aug-cc-pVTZ  40
2  PW91LYP/cc-pVTZ  40
3  BLYP/aug-cc-pVTZ  40
4  G96LYP/cc-pVTZ  40
5  MPWLYP/cc-pVTZ  40
Top three among Pople-type basis sets
1  PBEP86/6-31+G*  46
2  PBEP86/6-31++G*  46
3  MPWP86/6-31++G*  46

EA (kcal/mol)
Top five overall
1  MPWB95/6-31++G*  3.08
2  MPWB95/6-31+G*  3.12
3  B98/aug-cc-pVTZ  3.15
4  BB95/6-31++G*  3.35
5  B98/aug-cc-pVDZ  3.42
Top three among Pople-type basis sets
1  MPWB95/6-31++G*  3.08
2  MPWB95/6-31+G*  3.12
3  BB95/6-31++G*  3.35

PAGE 186

IP (kcal/mol)
Top five overall
1  B1B95/aug-cc-pVTZ  4.25
2  MPWB95/aug-cc-pVTZ  4.38
3  MPWB95/cc-pVTZ  4.49
4  MPWB95/6-31++G*  4.50
5  MPWB95/6-31+G*  4.53
Top three among Pople-type basis sets
1  MPWB95/6-31++G*  4.50
2  MPWB95/6-31+G*  4.53
3  BB95/6-31++G*  4.67

HOF (kcal/mol)
Top five overall
1  B3PW91/aug-cc-pVTZ  3.95
2  MPW1KCIS/cc-pVTZ  3.97
3  VSXC/cc-pVTZ  3.99
4  MPW1KCIS/aug-cc-pVTZ  4.10
5  TPSSTPSS/aug-cc-pVTZ  4.73
Top three among Pople-type basis sets
1  TPSSKCIS/6-31+G*  4.76
2  TPSSTPSS/6-31+G*  4.77
3  B3PW91/6-31G*  4.79

Hydrogen Bond Interaction Energy (kcal/mol)
Top five overall
1  MPWLYP/aug-cc-pVTZ  0.31
2  B1LYP/6-31++G*  0.33
3  MPWLYP/aug-cc-pVDZ  0.33
4  B1LYP/6-31+G*  0.34
5  PBE1KCIS/aug-cc-pVTZ  0.36
Top three among Pople-type basis sets
1  B1LYP/6-31++G*  0.33
2  B1LYP/6-31+G*  0.34
3  B3LYP/6-31++G*  0.36

Conformational Energy (% error)
Top five overall
1  MPWB95/cc-pVTZ  7.90
2  B1B95/aug-cc-pVDZ  8.10
3  BB1K/aug-cc-pVDZ  8.30
4  PBEP86/cc-pVTZ  8.30
5  BB95/cc-pVTZ  8.60
Top three among Pople-type basis sets
1  B1B95/6-31++G*  12.20
2  MPWB95/6-31++G*  12.40
3  B1B95/6-31+G*  12.50

PAGE 187

SRBH (kcal/mol)
Top five overall
1  BB1K/aug-cc-pVTZ  1.05
2  BB1K/cc-pVTZ  1.31
3  BB1K/aug-cc-pVDZ  1.69
4  BB1K/6-31+G*  1.95
5  BB1K/6-31++G*  2.58
Top three among Pople-type basis sets
1  BB1K/6-31+G*  1.95
2  BB1K/6-31++G*  2.58
3  BB1K/6-31G*  2.60

LSBH (kcal/mol)
Top five overall
1  B1LYP/aug-cc-pVTZ  2.575
2  B1LYP/cc-pVTZ  2.591
3  B1LYP/6-31++G*  2.631
4  B1LYP/6-31+G*  2.637
5  B3LYP/aug-cc-pVTZ  3.102
Top three among Pople-type basis sets
1  B1LYP/6-31++G*  2.631
2  B1LYP/6-31+G*  2.637
3  B1LYP/6-31G*  3.123

physical property considered here, with the exception of conformational energies, the best results obtained with Pople-type basis sets are comparable to the best results produced by the larger Dunning-type bases.

One of the main goals of this survey is to get a rough estimate of a functional's performance in terms of its ability to describe all of the properties considered in this study. In order to accomplish this goal, we compare the average functional ranks and standard deviations for each of the functionals studied in this work. The average functional rank is given as the mean of a functional's rank over all of the properties considered here, and the standard deviation of these ranks was also calculated. Table 5-23 lists the average functional ranks and standard deviations of the fifteen functionals with the lowest average ranks for the 6-31+G* and aug-cc-pVDZ basis sets.

For both basis sets there are five hybrid-meta-GGA and three meta-GGA functionals

PAGE 188

represented in the top fifteen. The top fifteen of the 6-31+G* basis also included four hybrid-GGA and three GGA functionals, while the top performers from the aug-cc-pVDZ set included five hybrid-GGA and two GGA functionals. In the aug-cc-pVDZ group, each of the top five functionals in terms of average functional rank contains exact exchange terms, whereas only three of the top five of the 6-31+G* set contain an exact exchange term. Also, for both basis sets, the only GGA functional to rank in the top ten is MPWPW91/aug-cc-pVDZ.

Table 5-24 lists the fifteen best functionals for the 6-31+G* basis set along with their unsigned errors for each of the properties considered in this work. For purposes of comparison, the lowest and highest unsigned errors for each property are given, as well as the mean unsigned error averaged over all of the functionals in this study. For the 6-31+G* basis set, the B1B95 functional obtains the lowest average functional rank, with a value of 10.7. However, the standard deviation for this functional is fairly high, with a value of 11.9, since the method performs very well for some properties and relatively poorly for others, as can be seen in Table 5-24. Other functionals that perform notably well are B98, TPSSTPSS, TPSS1KCIS, and PBE1PBE; each of these functionals gives reasonably good results for all of the physical properties here (with the possible exception of vibrational frequencies).

Table 5-25 lists the fifteen best functionals for the aug-cc-pVDZ basis set along with their unsigned errors for each of the properties considered in this work, in the same manner as was done for the 6-31+G* basis. For this basis set there are a number of functionals that perform very well in terms of giving a good description of each of the physical properties in this work. The B98 functional has the lowest average functional

PAGE 189

ranking, with a value of 10.1 (standard deviation = 8.8). When paired with aug-cc-pVDZ, B98 ranks in the top eleven functionals for all properties except HOF and vibrational frequency. B98's predicted heat of formation is in error by an average of 18.38 kcal/mol. TPSS1KCIS, which ranks as third best with the DZ basis, predicts HOF very well but is less accurate for electron affinity, conformational energy, and vibrational frequency. Other functionals of note are B1B95, PBE1PBE, and B3LYP.

Table 5-23. Average functional rankings and standard deviations for the top fifteen functionals along with 6-31+G* and aug-cc-pVDZ basis sets.

6-31+G*
Rank  Functional  Avg. Rank  Std. Dev.
1  B1B95  10.7  11.9
2  B98  11.9  7.5
3  TPSSKCIS  13.6  8.4
4  TPSSTPSS  13.7  8.4
5  PBE1PBE  13.8  10.9
6  B3LYP  13.9  9.0
6  MPWB95  13.9  11.2
8  TPSS1KCIS  14.0  8.2
9  B3PW91  14.2  9.2
9  BB1K  14.2  12.4
11  MPW1KCIS  14.8  10.5
12  MPWPW91  15.8  5.6
13  PBEPW91  16.6  8.5
14  PBE1KCIS  16.7  9.2
15  MPWPBE  16.8  5.3

aug-cc-pVDZ
Rank  Functional  Avg. Rank  Std. Dev.
1  B98  10.1  8.8
2  B1B95  11.7  12.2
3  TPSS1KCIS  12.0  8.1
4  PBE1PBE  12.2  10.0
5  B3LYP  12.3  9.2
6  PBE1KCIS  12.8  10.9
7  TPSSTPSS  13.3  6.3
8  TPSSKCIS  13.6  7.6
9  B3PW91  13.8  9.0
10  MPWPW91  15.3  5.9
11  MPWPBE  15.7  6.7
12  MPW1KCIS  15.9  9.9
13  BB95  16.2  9.3
13  B1LYP  16.2  13.1
15  BB1K  16.7  13.8
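The average functional rank and its standard deviation can be illustrated with a short Python sketch. It ranks each functional within every property (rank 1 is the smallest error) and then averages the ranks. The function name, the tie-handling rule, and the use of the sample standard deviation are assumptions made for illustration. The input is a three-functional, four-property subset of the errors listed for B1B95, B98, and BB1K in Table 5-24, so the resulting ranks are relative to this small subset only and do not reproduce the values in Table 5-23, which rank every functional in the survey.

```python
from statistics import mean, stdev


def average_ranks(errors):
    """errors maps property -> {functional: unsigned error}.

    Returns {functional: (average rank, sample std. dev. of ranks)}.
    Tied functionals share the better rank (an assumed convention).
    """
    ranks = {}
    for by_functional in errors.values():
        for functional, err in by_functional.items():
            # Rank = 1 + number of functionals with a strictly smaller error.
            rank = 1 + sum(other < err for other in by_functional.values())
            ranks.setdefault(functional, []).append(rank)
    return {f: (mean(r), stdev(r)) for f, r in ranks.items()}


# Subset of the errors listed for these functionals in Table 5-24.
errors = {
    "HOF":  {"B1B95": 9.94, "B98": 13.47, "BB1K": 17.34},
    "IP":   {"B1B95": 4.81, "B98": 5.05,  "BB1K": 5.33},
    "EA":   {"B1B95": 5.07, "B98": 3.83,  "BB1K": 6.40},
    "SRBH": {"B1B95": 2.64, "B98": 3.81,  "BB1K": 1.95},
}

for functional, (avg, sd) in average_ranks(errors).items():
    print(f"{functional}: average rank {avg:.2f}, std. dev. {sd:.2f}")
```

A low average rank with a high standard deviation, as reported for B1B95, indicates a functional that is excellent for some properties and weak for others.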

PAGE 190

Table 5-24. Performances of the fifteen highest-ranking functionals paired with the 6-31+G* basis.

Functional  HOF  IP  EA  H-bond  Freq  Length  Angle  Conf E  SRBH  LSBH
B1B95  9.94  4.81  5.07  0.64  104.1  0.0074  1.23  12.47  2.64  4.16
B98  13.47  5.05  3.83  0.43  88.6  0.0094  1.29  14.59  3.81  4.28
TPSSKCIS  4.76  5.99  4.47  0.43  65.6  0.0135  1.26  14.07  6.41  7.04
TPSSTPSS  4.77  5.52  4.83  0.47  65.8  0.0135  1.23  13.73  7.33  7.25
PBE1PBE  5.94  5.34  5.15  0.77  103.4  0.0079  1.23  15.45  3.92  5.49
B3LYP  14.03  5.29  3.91  0.38  84.4  0.0093  1.29  16.54  4.36  3.32
MPWB95  18.31  4.53  3.12  0.44  48.0  0.0161  1.31  12.66  8.57  8.41
TPSS1KCIS  12.06  5.30  5.07  0.43  79.8  0.0090  1.25  16.23  4.37  5.12
B3PW91  8.32  5.48  4.39  0.56  92.5  0.0081  1.26  16.46  3.73  5.01
BB1K  17.34  5.33  6.40  0.40  138.2  0.0098  1.28  12.79  1.95  3.38
MPW1KCIS  7.28  4.86  3.98  0.40  75.9  0.0101  1.60  16.71  5.71  5.41
MPWPW91  8.57  5.16  3.80  0.46  49.7  0.0157  1.29  15.67  7.78  8.17
PBEPW91  17.18  5.10  3.67  0.89  48.8  0.0163  1.28  14.84  8.62  8.41
PBE1KCIS  15.20  4.95  4.22  0.63  91.0  0.0086  1.57  16.18  5.09  5.07
MPWPBE  8.76  5.16  3.87  0.47  48.9  0.0159  1.29  15.79  7.83  8.32
Lowest Err.  TPSSKCIS  MPWB95  MPWB95  B1LYP  G96P86  B1B95  cSVWN5  B1B95  BB1K  B1LYP
Value  4.76  4.53  3.12  0.34  49  0.007  1.28  12.00  1.95  2.64
Highest Err.  SPL  cSVWN5  cSVWN5  SPL  BB1K  cSVWN5  VSXC  VSXC  SPL  SPL
Value  133.7  19.08  14.8  6.21  142  0.025  1.56  44  16.75  12.04
Avg. Err.  15.82  5.78  4.54  0.93  66.01  0.014  1.33  16.24  6.74  6.62

Notes: Errors given in the following units: bond length (Å), bond angle (degrees), frequency (cm-1), ionization potential (kcal/mol), electron affinity (kcal/mol), heat of formation (kcal/mol), hydrogen-bond interaction energy (kcal/mol), conformational energy (percent error), reaction barrier height (kcal/mol). Average errors include all 37 density functionals considered in this work.

PAGE 191

Table 5-25. Performances of the fifteen highest-ranked density functional methods paired with the aug-cc-pVDZ basis set.

Functional  HOF  IP  EA  H-bond  Freq  Length  Angle  Conf E  SRBH  LSBH
B98  18.38  4.90  3.42  0.40  73.9  0.0114  1.24  9.85  4.98  4.68
B1B95  14.13  4.53  4.54  1.13  88.3  0.0093  1.20  8.06  3.72  4.67
TPSS1KCIS  8.35  5.30  4.64  0.55  68.5  0.0109  1.21  11.70  5.48  5.58
PBE1PBE  8.82  5.27  4.66  0.56  86.5  0.0104  1.17  10.71  5.17  5.92
B3LYP  18.66  5.28  3.78  0.63  70.5  0.0111  1.22  12.34  5.37  4.12
PBE1KCIS  11.63  4.84  3.79  0.37  82.2  0.0109  1.54  11.84  6.37  4.71
TPSSTPSS  8.72  5.53  4.49  0.57  50.9  0.0159  1.23  8.99  8.61  8.02
TPSSKCIS  8.31  5.96  4.13  0.55  52.1  0.0157  1.25  9.58  7.51  7.64
B3PW91  12.02  5.42  4.08  1.09  77.6  0.0104  1.15  11.80  4.96  5.52
MPWPW91  8.58  5.36  4.03  0.69  51.1  0.0177  1.27  11.17  8.85  8.77
MPWPBE  8.91  5.27  3.89  0.72  50.9  0.0180  1.27  11.29  8.92  8.85
MPW1KCIS  12.57  4.78  3.58  0.76  70.9  0.0121  1.58  12.64  7.00  5.92
BB95  9.94  5.00  3.85  1.58  52.6  0.0179  1.31  8.98  8.62  8.50
B1LYP  34.07  5.93  4.63  0.68  77.6  0.0092  1.20  12.61  4.15  3.19
BB1K  21.29  6.50  5.94  0.93  119.3  0.0107  1.23  8.31  1.69  4.54
Lowest Err.  PW91LYP  B1B95  B98  PBEKCIS  G96LYP  B1LYP  cSVWN5  B1B95  BB1K  B1LYP
Value  8.24  4.53  3.43  0.32  48  0.009  1.25  8.1  1.69  3.19
Highest Err.  SPL  cSVWN5  B3P86  cSVWN5  BB1K  BLYP  BLYP  VSXC  SPL  SPL
Value  128.62  19.06  13.99  10.26  119  0.029  1.6  51.5  18.2  12.54
Avg. Error  16.23  5.77  4.77  1.00  62.2  0.016  1.30  12.33  7.84  7.15

Notes: Errors given in the following units: bond length (Å), bond angle (degrees), frequency (cm-1), ionization potential (kcal/mol), electron affinity (kcal/mol), heat of formation (kcal/mol), hydrogen-bond interaction energy (kcal/mol), conformational energy (percent error), reaction barrier height (kcal/mol). Average errors include all 37 density functionals considered in this work.
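The basis set comparison made in the general summary for the B1B95 functional can be checked directly against the B1B95 rows of Tables 5-24 and 5-25. The sketch below is only an illustration of that comparison: the dictionary layout and variable names are assumptions, while the numbers are copied from the two tables (first value for the Pople-type basis, second for aug-cc-pVDZ).

```python
# B1B95 unsigned errors: (Table 5-24 value, Table 5-25 value) per property.
b1b95 = {
    "HOF":    (9.94, 14.13),
    "IP":     (4.81, 4.53),
    "EA":     (5.07, 4.54),
    "H-bond": (0.64, 1.13),
    "Freq":   (104.1, 88.3),
    "Length": (0.0074, 0.0093),
    "Angle":  (1.23, 1.20),
    "Conf E": (12.47, 8.06),
    "SRBH":   (2.64, 3.72),
    "LSBH":   (4.16, 4.67),
}

# Properties for which the smaller Pople-type basis gives the lower error.
pople_wins = [prop for prop, (pople, dunning) in b1b95.items() if pople < dunning]
print("Pople-type basis is more accurate for:", ", ".join(pople_wins))

# Gap in the ionization potential error (kcal/mol), rounded to table precision.
ip_gap = round(b1b95["IP"][0] - b1b95["IP"][1], 2)
print("IP error gap:", ip_gap, "kcal/mol")
```

The properties selected this way are the heats of formation, hydrogen bond interaction energies, bond lengths, and both sets of barrier heights, in line with the discussion of B1B95 in the summary.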

PAGE 192

CHAPTER 6
CONCLUSIONS

This chapter provides a rough overview of the work presented in this dissertation. Since the three studies presented in this work are for the most part unrelated, the individual chapters contain more in-depth discussion. These research projects demonstrate the usefulness of computational methods in addressing important biological and chemical issues in the human body. The studies performed on HAH1 and AAP were only possible after much information had been obtained about these systems through physical experimentation. With this in mind, it is a combination of computational and analytical methods that provides a complete description of the geometric and energetic phenomena that occur in nature.

Chapter two outlined the techniques used in the study of metalloenzymes. These methods included density functional and ab initio quantum mechanics calculations on model systems, molecular mechanics, and molecular dynamics simulations of large-scale solvated protein systems. The results of these studies are shown in chapters 3 and 4.

HAH1 is a key protein for Cu homeostasis in the human body. Improper function of HAH1 or of its target protein, the Wilson's disease ATPase, results in the accumulation of too much Cu or an insufficient amount of Cu in the cell, both of which cause death. One of the key unanswered questions involving Cu(I) homeostasis is the mechanism by which the metal ion is transferred from the donor protein to the target active site. The study described here provides evidence that Cys 17 of the MNK4 target domain is energetically favored to be the first target residue to bind Cu(I).

PAGE 193

The AAP system has also been associated with several serious illnesses. One subject of investigation in this work was the di-zinc bridging species in the AAP active site. The electronic structure of the active site was fully characterized by QM calculations, and the contribution of the first-shell residues to Zn ion coordination was determined. The results from this work suggest that the bridging species is most like an OH⁻ group, which then becomes a terminal group upon substrate binding.

The fifth chapter details the large-scale survey of the accuracy of well-known density functional methods for nine molecular and intermolecular properties. Comprising more than 150,000 individual calculations, this study is believed to be the most comprehensive of its kind ever performed. The newer hybrid-meta-GGA methods showed the most promise, with the MPW1KCIS density functional receiving the best overall score for accuracy over all of the properties that were surveyed.

Although many of the methods employed in these studies are approximations, they have all been well validated and can be used to obtain highly reliable and accurate information on the conformations and energetics of chemical systems. The simulation of metalloenzymes is an example of the kind of computational work that has become possible over the last decade thanks to advances in analytical and computational instrumentation, and soon these simulations may be fully quantum mechanical in nature. The continued development of parameters for metal ions and metal-binding active sites within proteins will allow for further identification of the mechanisms of metal-catalyzed processes in the body.

PAGE 194

APPENDIX

Gaussian Keywords

opt
    Perform a geometry optimization.
freq=noraman
    Perform a single-point frequency calculation for vibrational frequencies and ground state thermodynamic properties. Noraman is selected to skip the calculation of Raman spectral data.
IOp(7/33=1)
    Includes bonding force constants in the Gaussian output file.
pop(mk,readradii)
    Perform a population analysis calculation for atomic charges using the Merz-Kollman-Singh method. Readradii allows the user to define ionic radii that are not listed in the Gaussian library of ionic radii. This was used to define the ionic radius of Cu(I).
IOp(6/33=2)
    Used in Gaussian 98 to format the output from charge calculations into AMBER-readable format.
Gen 6D 7F
    Gen specifies the use of multiple basis sets for different atoms within a system. 6D 7F specifies the use of 6 Cartesian d functions and 7 pure f functions in place of the basis set defaults.
scrf(pcm,solvent=water)
    For self-consistent reaction field calculations using the polarizable continuum model for implicit solvation with water as the solvent.
int(grid=ultrafine)
    Requests a pruned (99,590) grid that is finer than the default (75,302) grid.
scf(xqc,maxcycle=N)
    xqc requests the quadratically convergent SCF procedure for energy convergence, with an extra qc step in case the first-order SCF did not converge. This is computationally more expensive than the default method. maxcycle sets the maximum number of iterations allowed before the program exits due to convergence failure.
ts
    Requests geometry optimization to a transition state rather than to a local minimum.
calcall
    Requests the calculation of force constants at every point along the optimization pathway. This also increases the computational expense of the calculation.
noeigentest
    For use in TS searches. This skips the portion of the calculation where Gaussian checks to ensure the existence of only one negative eigenvalue for the structure being optimized to a TS.
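As an illustration of how several of these keywords combine in a route section, the following Python sketch assembles a Gaussian input file as text. The helper function, the choice of B3LYP/6-31+G*, the title, and the approximate water geometry are all assumptions chosen for the example; they are not one of the input files used in this work.

```python
def gaussian_input(title, keywords, charge, multiplicity, atoms,
                   method="B3LYP", basis="6-31+G*"):
    """Build the text of a Gaussian input file (hypothetical helper).

    atoms is a list of (element symbol, x, y, z) tuples in angstroms.
    """
    route = f"# {method}/{basis} " + " ".join(keywords)
    lines = [route, "", title, "", f"{charge} {multiplicity}"]
    lines += [f"{sym:<2} {x:12.6f} {y:12.6f} {z:12.6f}" for sym, x, y, z in atoms]
    lines.append("")  # the molecule specification ends with a blank line
    return "\n".join(lines) + "\n"


# Approximate water geometry, for illustration only.
water = [
    ("O", 0.000000, 0.000000, 0.117000),
    ("H", 0.000000, 0.757000, -0.469000),
    ("H", 0.000000, -0.757000, -0.469000),
]

example = gaussian_input(
    title="water: optimization and frequencies (illustrative)",
    keywords=["opt", "freq=noraman", "int(grid=ultrafine)"],
    charge=0,
    multiplicity=1,
    atoms=water,
)
print(example)
```

Keywords that need extra input sections, such as Gen or pop(mk,readradii), would require additional blocks after the molecule specification and are left out of this sketch.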

PAGE 195

LeAP Commands in AMBER

loadpdb
    Load a PDB structure into LeAP and protonate the structure.
loadamberparams
    Load AMBER parameter files for the system.
edit
    Edit the unit that has been created in LeAP. Charges, atom types, atom names, and perturbation flags can be edited in this manner.
check
    Check the unit for errors.
saveoff
    Save all atom names, atom types, charges, and perturbation information into a library for a unit.
loadoff
    Load information about a unit that was previously created and stored.
solvateBox
    Add a solvent box around a unit.
saveamberparm
    Save any non-default parameters that have been imposed on the system by the previously loaded AMBER parameter files.
saveamberparmpert
    Save perturbation information and the parameters associated with the perturbation.
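A typical sequence of these commands can be sketched as a LeAP script generated from Python. The helper function, file names, unit name, solvent model, and buffer distance below are hypothetical placeholders rather than the settings used for HAH1. The sketch also assumes that a force field leaprc file, which defines the standard residues and the solvent box unit, has already been sourced; it is omitted because its name depends on the AMBER version.

```python
def leap_script(pdb_file, frcmod_file, unit="mol", prefix="system",
                solvent="TIP3PBOX", buffer=10.0):
    """Return a LeAP command script as text (hypothetical helper).

    Assumes a force field leaprc has been sourced beforehand so that
    standard residues and the solvent box unit are defined.
    """
    commands = [
        f"loadamberparams {frcmod_file}",          # extra (e.g. metal site) parameters
        f"{unit} = loadpdb {pdb_file}",            # build the unit from the PDB file
        f"check {unit}",                           # report missing parameters or bad contacts
        f"solvateBox {unit} {solvent} {buffer}",   # add a solvent box around the unit
        f"saveoff {unit} {prefix}.lib",            # store the unit in a library file
        f"saveamberparm {unit} {prefix}.prmtop {prefix}.inpcrd",  # files for sander
        "quit",
    ]
    return "\n".join(commands) + "\n"


# Placeholder file names, for illustration only.
script = leap_script("protein.pdb", "metal_site.frcmod")
print(script)
```

The resulting text could be saved to a file and passed to LeAP; the edit command is interactive and is therefore not included.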

PAGE 196

LIST OF REFERENCES

(1) Wernimont, A. K.; Huffman, D. L.; Lamb, A. L.; O'Halloran, T. V.; Rosenzweig, A. C. Nat Struct Biol 2000 7 766-771.
(2) Anastassopoulou, A.; Banci, L.; Bertini, I.; Cantini, F.; Katsari, E.; Rosato, A. Biochem 2004 43 13046-13053.
(3) Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. J Am Chem Soc 1985 107 3902-3909.
(4) Tao, J.; Perdew, J. P.; Staroverov, V. N.; Scuseria, G. E. Phys. Rev. Lett. 2003 91 146401.
(5) Zhao, Y.; Lynch, B. J.; Truhlar, D. G. Phys. Chem. Chem. Phys. 2005 7 43.
(6) Case, D. A.; Darden, T. A.; Cheatham, T. E. I.; Simmerling, C. L.; Wang, J.; Duke, R. E.; Luo, R.; Merz, K. M. J.; Pearlman, D. A.; Crowley, M.; Walker, R. C.; Zhang, W.; Wang, B.; Hayik, S.; Roitberg, A. E.; Seabra, G.; Wong, K. F.; Paesani, F.; Wu, X.; Brozell, S.; Tsui, V.; Gohlke, H.; Yang, L.; Tan, C.; Mongan, J.; Hornak, V.; Cui, G.; Beroza, P.; Mathews, D. H.; Schafmeister, C.; Ross, W. S.; Kollman, P. A.; 9 ed.; University of California, San Francisco: San Francisco, 2006.
(7) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Chesseman, J. R.; Zakrzewski, V. G.; Montgomery Jr., J. A.; Stratmann, R. E.; Burant, J. C.; Dapprich, S.; Millam, J. M.; Daniels, A. D.; Kudin, K. N.; Strain, M. C.; Farkas, O.; Tomasi, J.; Barone, V.; Cossi, M.; Cammi, R.; Mennucci, B.; Pomelli, C.; Adamo, C.; Clifford, S.; Ochterski, J.; Petersson, G. A.; Ayala, P. Y.; Cui, Q.; Morokuma, K.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Forseman, J. B.; Cioslowski, J.; Ortiz, J. V.; Baboul, A. G.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Gomperts, R.; Martin, R. L.; Fox, D. J.; Keith, T.; AlLoham, M. A.; Peng, C. Y.; Nanayakkara, A.; Gonzalez, C.; Challacombe, M.; Gill, P. M. W.; Johnson, B. G.; Chen, W.; Wong, M. W.; Andres, J. L.; Head-Gordon, M.; Replogle, E. S.; Pople, J. A.; C.01 ed.; Gaussian Inc.: Pittsburgh, PA, 2003.
(8) Ni, B.; Kramer, J. R.; Werstiuk, N. H. J. Phys. Chem. A 2003 107 8949-8954.
(9) Olsson, M. H. M.; Ryde, U. J Am Chem Soc 2001 123 7866-7876.
(10) Ryde, U.; Olsson, M. H. M.; Roos, B. O.; Borin, A. C. Theo. Chem. Acct. 2001 105 452-462.
(11) Siegbahn, P. E. M.; Blomberg, M. R. A. Ann. Rev. Phys. Chem 1999 50 221-249.

PAGE 197

183
(12) Rulisek, L.; Solomon, E. I.; Ryde, U. Inorg Chem 2005 44 5612-5628.
(13) Schäfer, A.; Horn, H.; Ahlrichs, R. J Chem Phys 1992 97
(14) Mitlin, A. V.; Baker, J.; Pulay, P. J Chem Phys 2003 118 7775-7782.
(15) Rassolov, V. A.; Pople, J. A.; Ratner, J. A.; Windus, T. L. J Chem Phys 1998 109 1223-1229.
(16) Molecular Simulations, I.; 3.7 ed., 2000.
(17) Besler, B. H.; Merz, K. M. J.; Kollman, P. A. J Comp Chem 1990 11 431.
(18) Singh, U. C.; Kollman, P. A. J Comp Chem 1984 5 129.
(19) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J Am Chem Soc 1995 117 5179-5197.
(20) Case, D. A.; Pearlman, D. A.; Caldwell, J. W.; Cheatham, T. E. I.; Wang, J.; Ross, W. S.; Simmerling, C. L.; Darden, T. A.; Merz, K. M.; Stanton, R. V.; Cheng, A.; Vincent, J. J.; Crowley, M.; Tsui, V.; Gohlke, H.; Radmer, R.; Duan, Y.; Pitera, J.; Massova, I.; Kollman, P. A. AMBER 7 Users' Manual; University of California: San Francisco, 2002.
(21) Wollacott, A. M. In Chemistry; Pennsylvania State University: State College, PA, 2005.
(22) Rossi, K. In Chemistry; Pennsylvania State University: State College, PA, 1994.
(23) Cramer, C. J. Essentials of Computational Chemistry: Theories and Models; 1st ed.; John Wiley & Sons, Ltd.: Chichester, England, 2002.
(24) Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A. J Comp Chem 1992 13 1011.
(25) Roux, B. Comp Phys Comm 1995 91 275-282.
(26) Chodera, J. D.; Swope, W. C.; Pitera, J.; Seok, C.; Dill, K. JCTC 2005 Submitted.
(27) Harrison, M. D.; Jones, C. E.; Solioz, M.; Dameron, C. T. TIBS 2000 25 29-32.

PAGE 198

184
(28) Rae, T. D.; Schmidt, P. J.; Pufahl, R. A.; Culotta, V. C.; O'Halloran, T. V. Science 1999 284 805-808.
(29) O'Halloran, T. V.; Culotta, V. C. J Biol Chem 2000 275 25057-25060.
(30) Gaggelli, E.; Kozlowski, H.; Valensin, D.; Valensin, G. Chem. Rev. 2006 106 1995-2044.
(31) Huffman, D. L.; O'Halloran, T. V. Ann. Rev. Biochem. 2001 70 677-701.
(32) Lieberman, R. L.; Rosenzweig, A. C. In Comprehensive Coordination Chemistry II: from biology to nanotechnology; 1st ed., 2004, pp 195-211.
(33) Puig, S.; Thiele, D. J. Curr Opin Chem Biol 2002 6 171-180.
(34) Hamza, I.; Schaefer, M.; Klomp, L. W. J.; Gitlin, J. D. Proc. Natl. Acad. Sci. 1999 96 13363-13368.
(35) Mercer, J. F. B.; Barnes, N.; Stevenson, J.; Strausak, D.; Llanos, R. M. BioMetals 2003 16 175-184.
(36) Valentine, J. S.; Gralla, E. B. Science 1997 278 817-818.
(37) Rosenzweig, A. C. Acc. Chem. Res. 2001 34 119-128.
(38) Gresh, N.; Policar, C.; Giessner-Prettre, C. J. Phys. Chem. A 2002 106 5660-5670.
(39) Lamb, A. L.; Torres, A. S.; O'Halloran, T. V.; Rosenzweig, A. C. Biochem 2000 39 14720-14727.
(40) Portnoy, M. E.; Rosenzweig, A. C.; Rae, T.; Huffman, D. L.; O'Halloran, T. V.; Culotta, V. C. J. Biol. Chem. 1999 274 15041-15045.
(41) Gitschier, J.; Moffat, B.; Reilly, D.; Wood, W. I.; Fairbrother, W. J. Nat Struct Biol 1998 5 47-54.
(42) Lamb, A. L.; Torres, A. S.; O'Halloran, T. V.; Rosenzweig, A. C. Nat Struct Biol 2001 8 751-755.
(43) Zerner, M. C.; Bacon, A. D. Theo. Chem. Acct. 1979 53 21-54.
(44) Bertini, I.; Gray, H.; Lippard, S.; Valentine, J. S. Bioinorganic Chemistry; University Science Books: Sausalito, CA, 1994.
(45) O'Halloran, T. V. Science 1993 261 715-725.

PAGE 199

185
(46) Radisky, D.; Kaplan, J. J. Biol. Chem. 1999 274 4481-4484.
(47) Rosenzweig, A. C.; Huffman, D. L.; Hou, M. Y.; Wernimont, A. K.; Pufahl, R. A.; O'Halloran, T. V. Structure 1999 7 605-617.
(48) Larin, D.; Mekios, C.; Das, K.; Ross, B.; Yang, A.; Gilliam, T. C. J Biol Chem 1999 274 28497-28504.
(49) Lamb, A. L.; Pufahl, R. A.; O'Halloran, T. V.; Culotta, V. C.; Rosenzweig, A. C. Nat Struct Biol 1999 6 724-729.
(50) Lamb, A. L.; Wernimont, A. K.; Pufahl, R. A.; O'Halloran, T. V.; Rosenzweig, A. C. Biochem 2000 39 1589-1595.
(51) Ochterski, J. W.; Gaussian, Inc., 2000.
(52) Pufahl, R. A.; Singer, C. P.; Peariso, K. L.; Lin, S.-J.; Schmidt, P. J.; Fahrni, C. J.; Culotta, V. C.; Penner-Hahn, J. E.; O'Halloran, T. V. Science 1997 278 853-856.
(53) M. J. Frisch, G. W. T., H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, V. G. Zakrzewski, J. A. Montgomery Jr., R. E. Stratmann, J. C. Burant, S. Dapprich, J. M. Millam, A. D. Daniels, K. N. Kudin, M. C. Strain, O. Farkas, J. Tomasi, V. Barone, M. Cossi, R. Cammi, B. Mennucci, C. Pomelli, C. Adamo, S. Clifford, J. Ochterski, G. A. Petersson, P. Y. Ayala, Q. Cui, K. Morokuma, P. Salvador, J. J. Dannenberg, D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. Cioslowski, J. V. Ortiz, A. G. Baboul, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komáromi, R. Gomperts, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y. Peng, A. Nanayakkara, M. Challacombe, P. M. W. Gill, B. Johnson, W. Chen, M. W. Wong, J. L. Andres, C. Gonzalez, M. Head-Gordon, E. S. Replogle, and J. A. Pople; A.11 ed.; Gaussian, Inc.: Pittsburgh, PA, 1998.
(54) Arnesano, F.; Banci, L.; Bertini, I.; Bonvin, A. M. Structure 2004 12 669-676.
(55) Comba, P.; Remenyi, R. J Comp Chem 2002 23 697-705.
(56) Christianson, D. W.; Lipscomb, W. N. Journal of the American Chemical Society 1980 108 4998-5003.
(57) Holz, R. C. Coordination Chemistry Reviews 2002 232 5-26.
(58) Christianson, D. W.; Lipscomb, W. N. Accounts of Chemical Research 1989 22 62-69.

PAGE 200

186
(59) Desmarais, W. T.; Bienvenue, D. L.; Bzymek, K. P.; Holz, R. C.; Petsko, G. A.; Ringe, D. Structure 2002 10 1063-1072.
(60) Matthews, B. W. Accounts of Chemical Research 1988 21
(61) Lowther, W. T.; Matthews, B. W. Chemical Reviews 2002 102 4581-4607.
(62) Weston, J. Chemical Reviews 2005 ASAP Article.
(63) Prescott, J. M.; Wilkes, S. H. Methods in Enzymology 1976 45 530-543.
(64) Prescott, J. M.; Wagner, F. W.; Holmquist, B.; Vallee, B. L. Biochemistry 1985 24 5350-5356.
(65) Bennett, B.; Antholine, W. E.; D'souza, V. M.; Chen, G. J.; Ustinyuk, L.; Holz, R. C. Journal of the American Chemical Society 2002 124 13025-13034.
(66) Stamper, C.; Bennett, B.; Edwards, T.; Holz, R. C.; Ringe, D.; Petsko, G. Biochemistry 2001 40 7035-7046.
(67) Chevrier, B.; Schalk, C.; D'Orchymont, H.; Rondeau, J.-M.; Moras, D.; Tarnus, C. Structure 1994 2 283-291.
(68) Bertini, I.; Luchinat, C.; Rosi, M.; Sgamellotti, A.; Tarantelli, F. Inorg Chem 1990 29 1460-1463.
(69) Chevrier, B.; D'Orchymont, H.; Schalk, C.; Tarnus, C.; Moras, D. Eur J Biochem 1996 237 393-398.
(70) De Paola, C. C.; Bennett, B.; Holz, R. C.; Ringe, D.; Petsko, G. A. Biochemistry 1999 38 9048-9053.
(71) Schürer, G.; Lanig, H.; Clark, T. Biochemistry 2004 43 5414-5427.
(72) Baker, J. O.; Prescott, J. M. Biochemistry 1983 22 5322-5331.
(73) Bienvenue, D. L.; Bennett, B.; Holz, R. C. J Inorg Biochem 2000 78 43-54.
(74) Chan, W. W.; Dennis, P.; Demmer, W.; Brand, K. Journal of Biological Chemistry 1982 257 7955-7957.
(75) Wilkes, S. H.; Prescott, J. M. Journal of Biological Chemistry 1983 258 13517-13521.

PAGE 201

187
(76) Wilkes, S. H.; Prescott, J. M. Journal of Biological Chemistry 1987 262 8621-8625.
(77) Chen, G.; Edwards, T.; D'souza, V. M.; Holz, R. C. Biochemistry 1997 36 4278-4286.
(78) Martell, A. E.; Hancock, R. D.; Plenum Press: New York, 1996, p 199.
(79) Christianson, D. W.; Cox, J. D. Annu Rev Biochem 1999 68 33-57.
(80) Bennett, B.; Holz, R. C. Journal of the American Chemical Society 1997 119 1923-1933.
(81) Bennett, B.; Holz, R. C. Biochemistry 1997 36 9837-9846.
(82) Bennett, B.; Holz, R. C. Journal of the American Chemical Society 1998 120 12139-12140.
(83) Christianson, D. W.; Alexander, R. S. Journal of the American Chemical Society 1989 111 6412-6419.
(84) Stamper, C. C.; Bienvenue, D. L.; Bennett, B.; Ringe, D.; Petsko, G. A.; Holz, R. C. Biochemistry 2004 43 9620-9628.
(85) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Montgomery, J. A., Jr.; Vreven, T.; Kudin, K. N.; Burant, J. C.; Millam, J. M.; Iyengar, S. S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.; Scalmani, G.; Rega, N.; Petersson, G. A.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Klene, M.; Li, X.; Knox, J. E.; Hratchian, H. P.; Cross, J. B.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Ayala, P. Y.; Morokuma, K.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Zakrzewski, V. G.; Dapprich, S.; Daniels, A. D.; Strain, M. C.; Farkas, O.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Ortiz, J. V.; Cui, Q.; Baboul, A. G.; Clifford, S.; Cioslowski, J.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Martin, R. L.; Fox, D. J.; Keith, T.; Al-Laham, M. A.; Peng, C. Y.; Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M. W.; Gonzalez, C.; Pople, J. A.; A.1 ed.; Gaussian, Inc: Pittsburgh, PA, 2003.
(86) Szabo, A.; Ostlund, N. S. Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory; 1st, revised ed.; Dover Publications, Inc.: Mineola, NY, 1989.
(87) Levine, I. N. Quantum Chemistry; 5th ed.; Prentice Hall: Upper Saddle River, NJ, 2000.

PAGE 202

188
(88) Parr, R. G.; Yang, W. Density-Functional Theory of Atoms and Molecules; Oxford University Press: New York, 1989; Vol. 16.
(89) Hohenberg, P.; Kohn, W. Phys. Rev. 1964 136 B864-B871.
(90) Kohn, W. In Highlights of Condensed-Matter Theory; Bassani, F., Fumi, F., Tosi, M. P., Eds.; North-Holland: Amsterdam, 1985.
(91) Scuseria, G. E.; Staroverov, V. N. In Theory and Applications of Computational Chemistry: The First 40 Years (A Volume of Technical and Historical Perspectives); C. E. Dykstra, G. F., K. S. Kim, G. E. Scuseria, Ed.; Elsevier: Amsterdam, 2005, pp 669-724.
(92) Kohn, W.; Sham, L. J. Phys. Rev. 1965 140 A1133-A1138.
(93) Roothaan Rev. Mod. Phys. 1951 23 69.
(94) Møller, C.; Plesset, M. S. Phys. Rev. 1934 46 618.
(95) Adamo, C.; Barone, V. Chem. Phys. Lett. 1997 274 242-250.
(96) Becke, A. D. Phys. Rev. A 1988 38 3098-3100.
(97) Lee, C.; Yang, W.; Parr, R. G. Phys. Rev. B 1988 37 785-789.
(98) Stephens, P. J.; Devlin, F. J.; Chabalowski, C. F.; Frisch, M. J. J. Phys. Chem. 1994 98 11623-11627.
(99) Hertwig, R. H.; Koch, W. Chem. Phys. Lett. 1997 268 345-351.
(100) Perdew, J. P.; Burke, K.; Ernzerhof, M. Phys. Rev. Lett. 1996 77 3865-3868.
(101) Slater, J. C. Quantum Theory of Molecules and Solids. Vol. 4: The Self-Consistent Field for Molecules and Solids; McGraw-Hill: New York, 1974.
(102) Vosko, S. H.; Wilk, L.; Nusair, M. Canadian Journal of Physics 1980 58 1200-1211.
(103) Perdew, J. P. Phys. Rev. B 1986 33 8822-8824.
(104) Perdew, J. P.; Wang, Y. Phys. Rev. B 1992 45 13244-13249.
(105) Perdew, J. P.; Chevary, J. A.; Vosko, S. H.; Jackson, K. A.; Pederson, M. R.; Singh, D. J.; Fiolhais, C. Physical Review B 1992 46 6671-6687.

PAGE 203

189
(106) Becke, A. D. J. Chem. Phys. 1993 98 5648-5652.
(107) Riley, K. E.; Brothers, E. N.; Ayers, K. B.; Merz, K. M. JCTC 2005 1 546-553.
(108) Schmider, H. L.; Becke, A. D. J. Chem. Phys. 1998 108 9624-9631.
(109) Voorhis, T. V.; Scuseria, G. E. J. Chem. Phys. 1998 109 400-410.
(110) Becke, A. D. J. Chem. Phys. 1996 104 1040-1046.
(111) Adamo, C.; Barone, V. J. Chem. Phys. 1998 108 664-675.
(112) Staroverov, V. N.; Scuseria, G. E.; Tao, J.; Perdew, J. P. J. Chem. Phys. 2003 119 12129-12137.
(113) Rey, J.; Savin, A. Int. J. Quantum Chem. 1998 69 581-590.
(114) Krieger, J. B.; Chen, J.; Iafrate, G. J.; Savin, A. In Electron Correlations and Materials Properties; Gonis, A., Kioussis, N., Eds.; Plenum: New York, 1999, p 463.
(115) Toulouse, J.; Savin, A.; Adamo, C. J. Chem. Phys. 2002 117 10465-10473.
(116) Zhao, Y.; Lynch, B. J.; Truhlar, D. G. J. Phys. Chem. A 2004 108 2715-2719.
(117) Zhao, Y.; Lynch, B. J.; Truhlar, D. G. Phys. Chem. Chem. Phys. 2005 7 43-52.
(118) Zhao, Y.; Truhlar, D. G. JCTC 2005 1 415.
(119) Gill, P. M. W. Mol. Phys. 1996 89 433-445.
(120) Zhao, Y.; González-García, N.; Truhlar, D. G. J. Phys. Chem. A 2005 109 2012-2018.
(121) Hamprecht, F. A.; Cohen, A. J.; Tozer, D. J.; Handy, N. C. J. Chem. Phys. 1998 109 6264-6271.
(122) Binkley, J. S.; Pople, J. A.; Hehre, W. J. J. Am. Chem. Soc. 1980 102 939-947.
(123) Dunning, T. H. J. Chem. Phys. 1989 90 1007-1023.

PAGE 204

190
(124) Gordon, M. S.; Binkley, J. S.; Pople, J. A.; Pietro, W. J.; Hehre, W. J. J. Am. Chem. Soc. 1982 104 2797-2803.
(125) Hehre, W. J.; Ditchfield, R.; Pople, J. A. J. Chem. Phys. 1972 56 2257-2261.
(126) Pietro, W. J.; Francl, M. M.; Hehre, W. J.; Defrees, D. J.; Pople, J. A.; Binkley, J. S. J. Am. Chem. Soc. 1982 104 5039-5048.
(127) Hinchliffe, A. Modeling Molecular Structures; 2nd ed.; Wiley & Sons, Ltd.: Chichester, 2000.
(128) Foresman, J. B.; Frisch, A. Exploring Chemistry with Electronic Structure Methods; 2nd ed.; Gaussian, Inc.: Pittsburgh, PA, 1996.
(129) Frisch, A.; Frisch, M. J.; Trucks, G. W. Gaussian 03 User's Reference; Gaussian, Inc.: Pittsburgh, PA, 2003.
(130) Bauschlicher, C. W. Chem. Phys. Lett. 1995 246 40-44.
(131) Johnson, B. G.; Gill, P. M. W.; Pople, J. A. J. Chem. Phys. 1992 97 7846-7848.
(132) Johnson, B. G.; Gill, P. M. W.; Pople, J. A. J. Chem. Phys. 1993 98 5612-5626.
(133) Neugebauer, A.; Häfelinger, G. Journal of Molecular Structure-Theochem 2002 578 229.
(134) Neugebauer, A.; Häfelinger, G. Journal of Molecular Structure-Theochem 2002 585 35-47.
(135) Raymond, K. S.; Wheeler, R. A. J. Comp. Chem. 1999 20 207-216.
(136) Scheiner, A. C.; Baker, J.; Andzelm, J. W. J. Comp. Chem. 1997 18 775-795.
(137) Wang, N. X.; Wilson, A. K. J. Chem. Phys. 2004 121 7632-7646.
(138) Wang, N. X.; Wilson, A. K. Molecular Physics 2005 103 345-358.
(139) Schaftenaar, G.; Noordik, J. H. J. Comput.-Aided Mol. Design 2000 14 123-124.
(140) Curtiss, L. A.; Raghavachari, K.; Redfern, P. C.; Pople, J. A. J. Chem. Phys. 1997 106 1063-1079.

PAGE 205

191
(141) Curtiss, L. A.; Redfern, P. C.; Raghavachari, K.; Pople, J. A. J. Chem. Phys. 1998 109 42-55.
(142) Brothers, E. N.; Merz, K. M. J. Phys. Chem. A 2004 108 2904-2911.
(143) Curtiss, L. A.; Raghavachari, K.; Redfern, P. C.; Pople, J. A. J. Chem. Phys. 2000 112 7374-7383.
(144) Curtiss, L. A.; Redfern, P. C.; Raghavachari, K. J. Chem. Phys. 2005 123 124107/124101-124107/124112.
(145) Curtiss, L. A.; Redfern, P. C.; Rassolov, V.; Kedziora, Z.; Pople, J. A. J. Chem. Phys. 2001 114 9287-9295.
(146) Ernzerhof, M.; Scuseria, G. E. J. Chem. Phys. 1999 110 5029-5036.
(147) Lynch, B. J.; Truhlar, D. G. J. Phys. Chem. A 2003 107 8996-8999.
(148) Mole, S. J.; Zhou, X.; Liu, R. J. Phys. Chem. 1996 100 14665-14671.
(149) Rabuck, A. D.; Scuseria, G. E. Chem. Phys. Lett. 1999 309 450-456.
(150) Brothers, E. N.; Scuseria, G. E. J. Chem. Theo. and Computation 2006 2 1045-1049.
(151) Riley, K. E.; Op't Holt, B. T.; Merz, K. M. J. Chem. Theo. and Computation 2006 In Press.
(152) Tsuzuki, S.; Lüthi, H. P. J. Chem. Phys. 2001 114 3949-3957.
(153) Repasky, M. P.; Chandrasekhar, J.; Jorgensen, W. L. J. Comp. Chem. 2002 23 1601-1622.
(154) Halkier, A.; Klopper, W.; Helgaker, T.; Jorgensen, P.; Taylor, P. R. J. Chem. Phys. 1999 111 9157.
(155) Hobza, P.; Sponer, J.; Reschel, T. J. Comp. Chem. 1995 16 1315-1325.
(156) Ireta, J.; Neugebauer, J.; Scheffler, M. J. Phys. Chem. A 2004 108 5692-5698.
(157) Paizs, B.; Suhai, S. J. Comp. Chem. 1998 19 575-584.
(158) Rabuck, A. D.; Scuseria, G. E. Theor. Chem. Acc. 2000 104 439-444.

PAGE 206

192
(159) Rappe, A. K.; Bernstein, E. R. J. Phys. Chem. A 2000 104 6117-6128.
(160) Tuma, C.; Boese, A. D.; Handy, N. C. Phys. Chem. Chem. Phys. 1999 1 3939-3947.
(161) Xu, X.; Goddard, W. A. J. Phys. Chem. A 2004 108 2305-2313.
(162) Lynch, B. J.; Zhao, Y.; Truhlar, D. G. J. Phys. Chem. A 2003 107 1384-1388.
(163) St-Amant, A.; Cornell, W. D.; Kollman, P. A.; Halgren, T. A. J. Comput. Chem. 1995 16 1483.
(164) Dickson, R. M.; Becke, A. D. J. Chem. Phys. 2005 123 Art. No. 111101/111101-111101/111103.
(165) Dybala-Defratyka, A.; Paneth, P.; Pu, J. Z.; Truhlar, D. G. J. Phys. Chem. A 2004 108 2475-2486.
(166) Lynch, B. J.; Fast, P. L.; Harris, M.; Truhlar, D. G. J. Phys. Chem. A 2000 104 4811.
(167) Lynch, B. J.; Truhlar, D. G. J. Phys. Chem. A 2002 106 842-846.
(168) Boys, S. F.; Bernardi, F. Mol. Phys. 1970 19 553-566.
(169) Freile, M. L.; Risso, S.; Curaqueo, A.; Zamora, M. A.; Enriz, R. D. J. Molec. Struct.-Theochem. 2005 2005 107.
(170) Halgren, T. J. Comp. Chem. 1999 20
(171) Zhong, H. Z.; Stewart, E. L.; Kontoyianni, M.; Bowen, J. P. J. Chem. Theo. and Computation 2005 1 230.

PAGE 207

193 BIOGRAPHICAL SKETCH I was born in Chicago in 1979 when my parents worked at the hospital and while my dad was a graduate student at Illinois. I attended three high schools, the last of which, the Alabama School of Mathematics and Science, provided a strong foundation in chemistry and math that carried me into college. I spent four years at the University of South Alabama obtaining a BS in chemistry and working in the laboratory of Doctor Andrej Wierzbicki. He was the first to suggest that I continue my education in chemistry by attending graduate school. I enrolled at Pennsylvania State University in 2001, where I joined Doctor Kennie Merz and his group with the intention of using quantum chemistry to address complex problems that could not be solved in a wet chemistry lab. After graduating from the University of Florida, I joined Doctor Ed Solomon at Stanford University as a postdoctoral researcher in his group. I married Jenny in 2004, and she has been wonderfully supportive of me. I would have to say that the toughest part of graduate school for me was getting out of bed every morning and trying to make a little bit of progress each day. Graduate school guarantees no instant gratification, but offers a lifetime of rewards upon successful completion.


Permanent Link: http://ufdc.ufl.edu/UFE0016360/00001

Material Information

Title: Computational Studies of the Structure and Function of Metalloenzymes and the Performance of Density Functional Methods
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0016360:00001



Full Text












COMPUTATIONAL STUDIES OF THE STRUCTURE AND FUNCTION OF
METALLOENZYMES AND THE PERFORMANCE OF DENSITY FUNCTIONAL
METHODS















By

BRYAN T. OP'T HOLT


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


2006


































For Jenny Kay
















ACKNOWLEDGMENTS

For the motivation to complete this dissertation and to focus on my research even

when I thought it would be too hard, Jenny should get all the credit. For the ideas and

suggestions that kindled my interest in computational and biological chemistry when it

started to wane, I recognize Kennie. The desire to complete my educational career with a

Ph.D. and the knowledge of the value of a strong education were instilled in me by my

parents, and I owe them my best. As for life in the Merz group, I will always be thankful

for the contributions of Ken Ayers, Ed Brothers, Guanglei Cui, and Kevin Riley to my

research.
















TABLE OF CONTENTS




ACKNOWLEDGMENTS ........ iii

LIST OF TABLES ........ vi

LIST OF FIGURES ........ ix

CHAPTER

1 INTRODUCTION ........ 1

2 QM AND MM METHODS OF THE STUDY OF METALLOPROTEINS ........ 6

Metal Binding Site Studies in Gaussian 03 ........ 6
Molecular Mechanics of HAH1 ........ 10
The AMBER Force Field ........ 10
LeAP and sander ........ 12
Free Energy Simulations of HAH1 ........ 17
Thermodynamic Integration ........ 20
Free Energy Perturbation ........ 23
Potential of Mean Force ........ 24

3 COMPUTATIONAL STUDIES OF THE CU(I) METALLOCHAPERONE
HAH1 ........ 26

Cu(I) Biochemistry and the HAH1 System ........ 26
Cu Metallochaperones ........ 26
Cu pathways within the Cell ........ 29
Copper Homeostasis ........ 31
Quantum Chemical Characterization of the Cu(I) Binding Site from HAH1 ........ 33
Creation of a MM Force Field for the Cu(I) Binding Site of HAH1 ........ 36
The use of the HAH1 dimer as a model for the HAH1-MNK4 heterodimer ........ 36
Creation of parameter files for Cu(I)-bound HAH1 ........ 37
Free Energy Calculations of the Cu(I)-Bound HAH1 Dimer ........ 47
Quantum Energy Calculations on the Small Cu(I) Thiolate Cluster Models ........ 48
Free Energy Calculations on HAH1 ........ 50
Conclusions ........ 64

4 ELECTRONIC STRUCTURE OF THE ACTIVE SITE OF AMINOPEPTIDASE
FROM Aeromonas proteolytica ........ 66

AAP Introduction ........ 66
Effects of 1st-Shell Mutations ........ 72
Conclusions ........ 81

5 SURVEY OF DENSITY FUNCTIONAL THEORY METHODS ........ 83

Introduction ........ 83
Methods ........ 84
Computational Methods ........ 99
Results by Property ........ 117
Bond Lengths ........ 117
Bond Angles ........ 120
Ground State Vibrational Frequencies ........ 125
Ionization Potential ........ 129
Electron Affinity ........ 134
Heat of Formation ........ 139
Hydrogen Bonding Interaction Energy ........ 146
Conformational Energies ........ 150
Reaction Barrier Heights for Small Systems with Non-singlet Transition
States ........ 158
Reaction Barrier Heights for Organic Molecules with Singlet Transition
States ........ 159
Conclusions ........ 164
General Summary of the Survey of DFT Methods ........ 170

6 CONCLUSIONS ........ 178

APPENDIX ........ 180

LIST OF REFERENCES ........ 182

BIOGRAPHICAL SKETCH ........ 193
















v
















LIST OF TABLES


Table page

2-1 The DZpdf basis function used for Cu(I) in the DFT calculations of the cluster
models ........ 9

2-2 Windows and weighting factors for a 12-window TI calculation in sander ........ 22

2-3 λ values for a 12-window TI calculation in sander ........ 22

3-1 Geometry parameters of the DFT-optimized multi-coordinate models shown in
Figure 3-2 ........ 36

3-2 Atom type, atomic mass, van der Waals radii, and van der Waals well-depths for
Cu(I) and Cu(I)-bound S in HAH1 ........ 41

3-3 Bond lengths, bond angles, and associated force constants for the HAH1 Cu(I)
binding site ........ 41

3-4 CYM and Cu(I) RESP charges used for the HAH1 Cu(I) binding site ........ 41

3-5 Comparison of the HAH1 active site between the four-coordinate model, the
solvated, equilibrated protein, and the X-ray crystal structure of the Cu(I)-bound
protein (1FEE) ........ 43

3-6 Summary of rms deviation, rms flexibility, and radius of gyration for the five
solvated HAH1 protein models and key active site residues ........ 43

3-7 Reaction energies and energies of solvation for the model systems ........ 50

3-8 Atoms, atom types, atomic masses and van der Waals parameters used in TI
simulations of Cu(I)-bound HAH1 ........ 56

3-9 Bond length parameters for the reactions used in TI calculations of Cu(I)-bound
HAH1 ........ 57

3-10 Bond angle parameters for the reactions used in TI calculations of Cu(I)-bound
HAH1 ........ 58

3-11 RESP charges used for TI calculations on Cu(I)-bound HAH1 ........ 59

3-12 Free energy changes for TI calculations on the reactions shown in Figure 3-9 ........ 63









3-13 The free energy difference of changing the coordination environment of Cu(I) in
HAH1....................63

4-1 Electrons in the side chains of Asp 117 and Asp 179 are equally delocalized over
the carboxylic acid region, while Glu151 and Glu152 side chains contain one
C-O bond with more electron density than the other....................76

4-2 Several distances are shown for Zn-Zn and Zn-O interactions for the structure
shown below in Figures 4-6 and 4-7....................77

5-1 The thirty-seven density functional methods and two wave function methods tested
in this survey with appropriate references....................90

5-2 Basis sets employed in the survey of DFT methods....................97

5-3 Valence shell polarization functions incorporated into the correlation-consistent
basis sets of Dunning....................98

5-4 The bond lengths and bond angles test set....................102

5-5 The test set for ground state vibrational frequency....................104

5-6 The ionization potential test set....................106

5-7 The electron affinity test set....................107

5-8 The heat of formation test set, singlets only....................108

5-9 The heat of formation test set, radicals only....................111

5-10 The hydrogen bonding test set....................113

5-11 The conformational energy test set....................114

5-12 The small system radical transition state reaction barrier test set....................115

5-13 The organic molecule singlet transition state reaction barrier test set....................116

5-14 Bond lengths, bond angles, and vibrational frequencies for HF, MP2, and LSDA
methods....................119

5-15 Average unsigned ionization potential errors for HF, MP2, LSDA, and B3P86
methods....................134

5-16 Average unsigned electron affinity errors for HF, MP2, LSDA, and B3P86
methods....................139

5-17 Average unsigned HOF errors for the HF, MP2, and LSDA methods....................143









5-18 Average unsigned hydrogen bond interaction errors for HF, MP2, and LSDA
methods....................150

5-19 Average unsigned conformational energy errors for HF, MP2, and LSDA
methods....................155

5-20 Average unsigned errors for the non-singlet transition state reaction test set for
HF, MP2, and LSDA methods....................159

5-21 Average unsigned errors for the singlet transition state large reaction test set for
HF, MP2, and LSDA methods....................164

5-22 Rankings of functional/basis set combinations for properties considered in this
work....................171

5-23 Average functional rankings and standard deviations for the top fifteen
functionals along with 6-31+G* and aug-cc-pVDZ basis sets....................175

5-24 Performances of the fifteen highest-ranking functionals paired with the 6-31G*
basis....................176

5-25 Performances of the fifteen highest ranked density functional methods paired
with the aug-cc-pVDZ basis set....................177















LIST OF FIGURES


Figure page

3-1 The proposed mechanism for Cu(I) transfer from HAH1 to the fourth domain of
the Wilson's disease protein. Cym indicates a negatively charged Cys residue....................34

3-2 Gas-phase optimized structures of the multi-coordinate models of the HAH1 Cu(I)
binding site. Cu(I) is in green, S is yellow, and C is gray. The top figure shows
the two-coordinate model, followed by the three-coordinate model and the
four-coordinate model at the bottom....................35

3-3 The 1.80 Å crystal structure for Cu(I)-bound HAH1. PDB ID 1FEE....................40

3-4 Root-mean-squared deviations between the five solvated HAH1 models and the
Cu(I)-bound HAH1 crystal structure as a function of time....................45

3-5 rmsd between the active site loop regions of the five solvated HAH1 models and
the Cu(I)-bound HAH1 crystal structure as a function of time....................46

3-6 The isodesmic reaction and solvation of three-coordinate Cu(I) to four-coordinate
in the model system....................49

3-7 The relative energies of the species in the isodesmic reactions of the model
systems in the gas phase (top) and in implicit solvent....................50

3-8 A scheme for Cu(I) transfer....................52

3-9 Two separate TI simulations were performed to compare two different Cu(I)
transfer pathways from HAH1 to MNK4. The two reactions shown have
different starting points, but the same endpoint. The top reaction is referred to as
"Reaction 1" and the bottom is "Reaction 2" in the discussion that follows....................53

3-10 FEP vdW correction to TI on HAH1. Evaluate trajectory of structure with vdW
exclusions removed with the Hamiltonian with vdW exclusions intact....................60

3-11 Bond length correction to FEP calculations by PMF: contract Cu(I)-S bond
length from ~2.8 Å to ~2.0 Å by PMF analysis of twenty-three windows....................61

3-12 PMF curve of solvated HAH1 showing minimum energy Cu(I)-S12B bond length
near 2.1 Å for the bonding of Cys 12B to Cu(I)....................62









3-13 PMF curve for the binding of Cys 15B to Cu(I) in solvated HAH1, showing a
minimum energy bond-length of just over 2.1 Å for the Cu(I)-S15B bond....................62

3-14 The free energy difference by thermodynamic integration between the different
three-coordinate Cu(I)-bound HAH1 dimers....................63

4-1 AAP active site inhibited by Tris (left) and BuBA. Investigation of the X-ray
structures of these complexes shed light on substrate conformation and a
potential mechanism for peptide hydrolysis in AAP. PDB ID 1LOK, 1CP6....................67

4-2 In fluoride inhibition studies of AAP, it was shown that a F- ion displaces a
terminal hydroxide group, deactivating the enzyme....................68

4-3 A proposed mechanism for AAP peptide hydrolysis showing proton transfer to
Glu151, formation of a terminal hydroxyl- group, a gem-diolate intermediate,
donation of a proton back to the leaving amino group, and reformation of the
water/hydroxide bridge. Adapted from Petsko.59....................70

4-4 The general model for the QM work is the AAP active site from PDB structure
1AMP, the 1.8 Å resolution structure elucidated by Chevrier and Schalk.67 Asp
117 is below the two Zn ions, with Zn2 on the left and Zn1 on the right. The
residue at the top of the active site is Glu 151. Zn2 is bound to His 97 and Asp
179, and Zn1 is completed with His 256 and Glu 152....................74

4-5 B3LYP/6-31G* optimized geometries of two models of the AAP active site. Asp
117 is shown in the upper-right of both pictures, binding each Zn. Zn1 is the ion
on the left and Zn2 on the right in each structure. The structure on the left is from
an originally water-bridged structure and Glu151 has gained a H, while the
structure on the right started with a OH- bridge and Asp 179 gains a H from a
crystallographic water....................78

4-6 Starting structures (left) and B3LYP/6-31G* optimized geometries for different
Zn-Zn bridging species within the active site of AAP. a) a water bridge b) a
hydroxyl- bridge c) an oxyl- bridge....................79

4-7 Initial structures (left) and B3LYP/6-31G* optimized geometries for models with
Glu151 protonated. The starting structures vary only in the protonation state of
the bridging group. Structure a) contains an 02O bridge and the Zn ions in c) are
bridged by a hydroxide group....................80

5-1 Average unsigned bond length errors (in Å) for GGA, hybrid-GGA, meta-GGA,
and hybrid-meta-GGA functionals along with the Pople type basis sets....................121

5-2 Average unsigned bond length errors (in Å) for GGA, hybrid-GGA, meta-GGA,
and hybrid-meta-GGA functionals along with the Dunning type basis sets....................122

5-3 Average unsigned bond angle errors (in degrees) for GGA, hybrid-GGA,
meta-GGA, and hybrid-meta-GGA functionals along with the Pople type basis sets....................126









5-4 Average unsigned bond angle errors (in degrees) for GGA, hybrid-GGA,
meta-GGA, and hybrid-meta-GGA functionals along with the Dunning type basis sets....................127

5-5 Average unsigned vibrational frequency errors (in cm-1) for GGA, hybrid-GGA,
meta-GGA, and hybrid-meta-GGA functionals along with the Pople type basis
sets....................130

5-6 Average unsigned vibrational frequency errors (in cm-1) for GGA, hybrid-GGA,
meta-GGA, and hybrid-meta-GGA functionals along with the Dunning type
basis sets....................131

5-7 Average unsigned ionization potential errors for GGA, hybrid-GGA, meta-GGA,
and hybrid-meta-GGA functionals with Pople type basis sets....................135

5-8 Average unsigned ionization potential errors for GGA, hybrid-GGA, meta-GGA,
and hybrid-meta-GGA functionals with Dunning type basis sets....................136

5-9 Average unsigned electron affinity errors (kcal/mol) for GGA, hybrid-GGA,
meta-GGA, and hybrid-meta-GGA functionals....................140

5-10 Average unsigned heat of formation errors (kcal/mol) for the five Pople-style
basis sets employed in this study....................144

5-11 Average unsigned heat of formation errors (kcal/mol) for the Dunning-type basis
functions used in this work....................145

5-12 Average unsigned hydrogen bond interaction energy errors (kcal/mol) for GGA,
hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Pople
type basis sets....................151

5-13 Average unsigned hydrogen bond interaction energy errors (kcal/mol) for GGA,
hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with
Dunning type basis sets....................152

5-14 Average unsigned conformational energy errors (kcal/mol) for GGA,
hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Pople-type
basis sets....................156

5-15 Average unsigned conformational energy errors (kcal/mol) for GGA,
hybrid-GGA, meta-GGA, and hybrid-meta-GGA functionals along with Dunning-type
basis sets....................157

5-16 Average unsigned barrier height energy errors (kcal/mol) for SRBH reactions
along with the GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA
functionals along with Pople type basis sets....................160









5-17 Average unsigned barrier height energy errors (kcal/mol) for SRBH reactions
along with the GGA, hybrid-GGA, meta-GGA, and hybrid-meta-GGA
functionals along with Dunning-type basis sets ........... .....................................161

5-18 Average unsigned barrier height energy errors (kcal/mol) for large singlet
transition state reactions along with the GGA, hybrid-GGA, meta-GGA, and
hybrid-meta-GGA functionals along with Pople-type basis sets .........................165

5-19 Average unsigned barrier height energy errors (kcal/mol) for large singlet
transition state reactions along with the GGA, hybrid-GGA, meta-GGA, and
hybrid-meta-GGA functionals along with Dunning-type basis sets. ...................166















Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

COMPUTATIONAL STUDIES OF THE STRUCTURE AND FUNCTION OF
METALLOENZYMES AND THE PERFORMANCE OF DENSITY FUNCTIONAL
METHODS

By

Bryan T. Op't Holt

December 2006

Chair: Kenneth M. Merz, Jr.
Major Department: Chemistry

Transition metals work within metalloproteins to catalyze an innumerable array of

important reactions in the body. Experimental data on the intermolecular and

intramolecular phenomena that govern these biochemical processes are often difficult to

obtain with traditional methods of analytical and biological chemistry. However,

computational methods have been developed to simulate macromolecular systems in

order to understand the dynamics and energetics of protein structure and function.

The first half of this dissertation details the application of quantum mechanical and

molecular mechanical methods to the metalloproteins HAH1 and aminopeptidase. The

active site of the Cu(I)-binding protein HAH1 is initially characterized using QM

methods to determine the geometrical and electrostatic parameters of the system. This

information is subsequently used to create a molecular mechanics force field for the

active site within the native holo-protein. A series of molecular dynamics simulations and

free energy calculations are performed on the protein following this parameterization.

The energetics of Cu(I) transfer are discussed and a favorable mechanism for metal ion

transport is proposed. The aminopeptidase system is investigated in a different manner as

the effects of first-shell mutations are studied with respect to the electronic structure and

coordination of the di-Zn(II) active site. Ab initio methods are employed to describe a

series of native and mutant active sites.

The second part of this work describes a large-scale survey of the performance of

density functional methods and basis sets at predicting a series of molecular properties

and intermolecular interactions. As the computational resources available to the scientific

community increase in speed while becoming more affordable, the study of large-scale

systems with DFT and ab initio methods will be more plausible. The intention of this

study is to individually examine the accuracy of density functional methods, not only to

identify which methods work the best for certain properties, but also to note where there

may be room for improvement as more methods are developed.














CHAPTER 1
INTRODUCTION

As the scientific community continues to discover more about the roles of

metalloenzymes in biological processes, it becomes necessary to investigate the structure

and function of these proteins on numerous levels. Experimental methods have been used

to study the structure and function of proteins for decades, and now computational

methods are available that can be used to further investigate protein systems. The

combination of experimental and computational research on biochemical systems

enhances the ability to identify potential drug targets, determine catalytic mechanisms of

protein function, and describe the energetics of chemistry occurring in the body on the

macromolecular and atomic levels.

For example, X-ray crystallography and NMR spectroscopy visualize the structures

of proteins. Spectroscopic methods such as EXAFS, XAS, and EPR can be used to

elucidate conformational and electronic details of the metal binding sites of

metalloenzymes in certain cases, and kinetics and binding affinity studies are routinely

used to suggest mechanisms of protein function. However, each of these methods has

limitations as the scale of the research approaches the atomic and sub-atomic levels.

Except for the most recent high-resolution structures, X-ray crystallography is not

able to fully resolve the protonation states of residues, which is a key factor in protein

function. The practical difficulty of the method is compounded by the delicate task of

crystallizing a sample (only to have it destroyed by the experiment). Dynamically, X-ray

structures reveal the protein's structure at low temperature and in a fixed crystalline









state.1 On the other hand, NMR methods can sample many conformations of proteins in

solution and at room temperature and provide high structural resolution.2 Wet chemistry

techniques such as assays, titrations, and electrophoresis supply valuable data on reaction

rates, equilibrium constants, and binding affinity, but are time consuming. Moreover, the

number of distinct systems that can be evaluated consecutively is limited by lab space

and lab equipment. A trait that is missing from all of these experimental methods is the

ability to predict or measure the energetics of the individual bonds and interactions that

play key roles in protein structure and function.

The past twenty years have seen extensive advancements in computational methods

and computational ability. From the semi-empirical methods of Dewar et al.3 to the

newest density functional methods of the Scuseria and Truhlar groups,4,5 systems are

being better-defined both energetically and geometrically. In addition, more complex

systems are being investigated today than ever before thanks to massively parallel

computer clusters that feature extensive available memory and the ability to perform

trillions of operations per second. Furthermore, the development of software packages such

as AMBER 9.0 (ref. 6) and Gaussian 03 (ref. 7) allows researchers to carry out a variety of specific

calculations to meet their needs.

While the ability to computationally predict or determine protein structure or

sequence has not been developed, many of the systematic hurdles of the experimental

methods mentioned above can be addressed by some form of presently available

computational method. Reliable, accurate data on proteins and parts of proteins can be

gathered using robust and well-validated computational techniques. Bonding and

nonbonding interaction energies can be quantified, electrostatic potentials can be









calculated, strained systems can be energetically minimized and equilibrated, and the

motion of large systems can be followed over time in dynamics simulations. Moreover,

computational methods can be amended or parameterized for accuracy and new methods

can be developed that scale less dramatically or are not heavily parameter-dependent.

This dissertation highlights the application of computational methods to a range of

chemical systems from small molecules to solvated proteins. Together with experimental

data, the application of computational techniques provides a more complete picture of the

world of biochemistry. The following chapters detail the investigation of metal ion

transfer between proteins, the electronic structure of a di-metal active site, and the

assessment of the performance of commonly used density functional methods.

Chapter 2 outlines the techniques used in the computational study of

metalloproteins. Starting with the use of QM methods to describe the metal binding sites

of the Cu(I) binding protein HAH1 and the di-Zn(II) site of the aminopeptidase from A.

proteolytica (AAP), the process of defining the geometries and energetics of metal

binding sites is explained. The QM data for the HAH1 system were used to construct a

MM force field for the Cu(I) binding site. The force field was employed in a long

timescale MD simulation to investigate the dynamics of Cu(I)-bound HAH1.

Furthermore, the free energy of Cu(I) transfer from HAH1 to the fourth domain of the

Menkes disease protein, MNK4, is investigated using other MM methods including

thermodynamic integration, free energy perturbation, and potential of mean force

calculations.

Chapter 3 discusses Cu(I) homeostasis, Cu(I) transport, and the HAH1 system. QM

calculations using Gaussian 03 (ref. 7) were conducted on a model of the HAH1 active site









based on the crystal structure of the HAH1 dimer, and MM calculations were performed

on the entire HAH1 dimer in gas-phase and explicit solvent with AMBER.6 The ultimate

goal of the HAH1 study was to elucidate details of Cu(I) transfer from HAH1 to the

Menkes disease protein MNK. This study involved the QM description of the HAH1

active site, the construction of an MM force field to describe the active site classically,

and numerous MD-based calculations. Finally, the results of the study are discussed in

terms of protein dynamics and the mechanistic implications of the research.

Chapter 4 summarizes the application of QM methods to the AAP system. Several

questions regarding the active site were addressed by modeling variations of the active

site of the protein. The first goal was to identify the molecular bridge between the two

Zn(II) ions in the active site. The coordination state of each ion was investigated, and the

contribution of each ligating residue was studied.

The fifth chapter details the large-scale survey of the performance of commonly

used density functional methods and basis functions. The accuracy of thirty-seven density

functionals and two wave function methods is evaluated for nine molecular and

intermolecular properties. These properties include bond lengths, bond angles, ground

state vibrational frequencies, ionization potentials, electron affinities, heats of formation,

hydrogen bonding interaction energies, conformational energies, and reaction barrier

heights. Finally, a list of the best-performing methods is given.

The final chapter gives a brief summary of the work presented in this dissertation.

More complete discussions and conclusions are given within each chapter. The

computational investigation of biological systems is necessary to completely understand

the mechanisms by which macromolecules function within the body. The metalloprotein

studies presented here incorporate the use of a variety of computational tools to address

questions that would be very difficult to answer using other types of experiments. The

DFT survey is a prime example of the usefulness of powerful computer clusters, and

provides a detailed comparison of commonly used density functional methods. This

research is made possible by the continuous development of computational methods and

computer power and serves as a testament to the flexibility and far-reaching applicability

of QM and MM calculations.














CHAPTER 2
QM AND MM METHODS OF THE STUDY OF METALLOPROTEINS

The behavior of metalloproteins can only be fully described using a combination of

several computational techniques. For instance, MM methods do not take electrons into

account, while high-level QM calculations are too computationally expensive to be

tractable for large systems over 120 atoms or so. The electronic structures, atomic

charges, and accurate geometries of metal binding sites should be investigated with

quantum mechanical methods whenever possible. Data from these studies form a

foundation for the molecular mechanics or semi-empirical methods applicable to large-

scale systems.

This section discusses the various computational techniques utilized in the studies

of HAH1 and AAP presented later in this thesis. A brief discourse on the background and

theory of QM methods is given in chapter 5 as an introduction to the DFT study.

Metal Binding Site Studies in Gaussian 03

Creating models of metal binding sites in proteins for the purpose of QM

calculations is a common practice in studies that aim to describe the metal binding

environment.8-11 For the study of the Cu(I) binding site of the Cu transport protein HAH1,

a model active site was created from the X-ray crystal structure of the Cu(I) binding site

in the HAH1 dimer1 using thiol [HSCH3] or thiolate [SCH3]- ligands instead of the native

Cys residues to complex the central ion. Using DFT in Gaussian 03, the thermodynamics,

bonding parameters, and electrostatics of the model systems were determined.








The QM calculations of the model clusters should be reliable so that the MM

parameters derived from the QM data are also reliable. One approach to ensuring a stable

QM structure is to use several initial geometries of the metal cluster so as to avoid getting

trapped in a local minimum energy well.9 Another way in which the accuracy and

reliability of calculations of Cu(I) is increased is the development of Cu(I)-specific basis

sets. Olsson and Ryde9,12 modified a double-zeta basis function of Shafer13 by the

inclusion of diffuse p, d, and f orbitals (DZpdf). Table 2-1 lists the Ryde basis set used

for the production QM calculations of the model Cu(I) clusters. The modification from

the original Shafer basis is the enhancement of the p, d, and f orbitals with exponents

0.174, 0.132, and 0.39, respectively. A modified 6-31G* basis function has been created

by Pulay et al. that can also be used for Cu and other first-row transition metals.14,15 The

modified 6-31G* basis set was used to generate an initial structure for the use of the Ryde

basis set, optimizing the Cu(I) cluster in a stepwise fashion.
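
The idea of starting from several initial geometries so that the optimization does not settle into a local minimum can be illustrated with a toy calculation. The sketch below is only an illustration under stated assumptions: the one-dimensional double-well function, the steepest-descent minimizer, and the names `energy`, `local_minimize`, and `multi_start` are all hypothetical stand-ins, not the Gaussian optimizer or a Cu(I) cluster energy surface.

```python
def energy(x):
    """Toy 1-D double-well 'energy surface' (hypothetical; not a Cu(I) cluster)."""
    return x**4 - 3.0 * x**2 + x


def gradient(x):
    """Analytic derivative of the toy energy."""
    return 4.0 * x**3 - 6.0 * x + 1.0


def local_minimize(x0, step=0.01, tol=1e-10, max_iter=100_000):
    """Plain steepest descent from a single starting point."""
    x = x0
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


def multi_start(starts):
    """Minimize from every starting point and keep the lowest-energy result."""
    minima = [local_minimize(x0) for x0 in starts]
    return min(minima, key=energy)


# Two of these starts fall into the shallow well, two into the deep one.
starts = [-2.0, -0.5, 0.5, 2.0]
best = multi_start(starts)
```

A single start at `x0 = 2.0` ends in the shallower well, while the multi-start search returns the deeper one; the same reasoning motivates comparing the final energies of cluster optimizations begun from different geometries.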

The initial structures for the model clusters were created in WebLabViewer Pro16

from the Cu(I)-bound HAH1 dimer crystal structure 1FEE (ref. 1) in Cartesian coordinates and

optimized in Gaussian using the default grid at B3LYP/m6-31G*, where m6-31G* is the

modified basis set of Pulay et al.14 Since the HAH1 model clusters are all closed-shell

singlet states, RB3LYP could have been specified in the Gaussian input file to request a

restricted calculation. While a restricted calculation is less computationally expensive, restricted and

unrestricted calculations generally produce the same results for closed-shell singlet species. Further

optimization was performed using the Ryde DZpdf basis set for Cu(I) and the split

valence 6-311++G* basis for S, C, and H. Upon final optimization at this level, each of

the clusters was subjected to single point calculations for frequencies and thermodynamic

properties. Bonding force constants between Cu(I) and S were also obtained. Gaussian

outputs a user-readable table of bonds and associated constants based on internal

coordinates.

Electrostatic potential calculations were also performed on the optimized structures.

The Merz-Kollman-Singh charge method17,18 was used and the ionic radius for Cu(I) was

set as 0.91 Å. The MKS method fits atom-centered point charges so that they reproduce the

QM electrostatic potential sampled on spherical shells around each atom, with the shell sizes

set by the atomic radii (the ionic radius for Cu(I)). Gaussian ESPs were converted to AMBER charges using the espgen and respgen

programs in the AMBER suite. These three calculations account for all of the bonding

and electronic properties of the model that are needed to create a MM force field for the

native metal binding site for Cu(I)-bound HAH1.
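
The charge-fitting step described above can be sketched as a constrained least-squares problem: find point charges that best reproduce the electrostatic potential at a set of sample points while summing to the total charge. The code below is a minimal sketch under stated assumptions, not the espgen/respgen implementation and not the RESP procedure (it has no restraints and no multi-stage fitting). The function names, the three-site geometry, the grid, and the reference charges are all hypothetical, and atomic units are assumed.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_esp_charges(atoms, grid, potential, total_charge):
    """Least-squares charges reproducing `potential` on `grid`, constrained
    (via a Lagrange multiplier) to sum to `total_charge`."""
    n = len(atoms)
    # design[k][i] = 1 / distance between grid point k and atom i
    design = [[1.0 / math.dist(p, a) for a in atoms] for p in grid]
    kkt = [[0.0] * (n + 1) for _ in range(n + 1)]
    rhs = [0.0] * (n + 1)
    for i in range(n):
        for j in range(n):
            kkt[i][j] = sum(row[i] * row[j] for row in design)
        rhs[i] = sum(row[i] * v for row, v in zip(design, potential))
        kkt[i][n] = kkt[n][i] = 1.0  # charge-conservation constraint
    rhs[n] = total_charge
    return solve_linear(kkt, rhs)[:n]  # drop the multiplier


# Hypothetical three-site model with a potential generated from known charges.
atoms = [(0.0, 0.0, 0.0), (2.2, 0.0, 0.0), (-2.2, 0.0, 0.0)]
true_q = [0.4, -0.7, -0.7]
grid = [
    (x, y, z)
    for x in (-4.0, -2.0, 0.0, 2.0, 4.0)
    for y in (-3.0, 0.0, 3.0)
    for z in (-3.0, 0.0, 3.0)
    if all(math.dist((x, y, z), a) > 1.5 for a in atoms)  # skip points near nuclei
]
potential = [sum(q / math.dist(p, a) for q, a in zip(true_q, atoms)) for p in grid]
fitted = fit_esp_charges(atoms, grid, potential, total_charge=sum(true_q))
```

Because the synthetic potential is generated from point charges, the fit recovers them; with a real QM potential the fitted charges are only the best point-charge approximation on the chosen sampling shells.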

The AAP system was also subjected to geometry optimizations and single point

frequency calculations following the same procedures as above. Once again, the active

site from a high-resolution crystal structure of the apo-form of AAP was used to create

an initial structure model for the AAP active site. Residues were abridged and capped at

their termini except in cases of adjacent residues, for which the peptide bonds were

retained for the calculations. Since the creation of a new force field was not the goal for

the AAP system, and the primary interests were the electronic structure of the active site

and the parameters of Zn(II) ion coordination within the active site, the B3LYP/6-31G*

method was sufficient for the calculations. Also, Zn had already been sufficiently

described by the normal 6-31G* basis function, so no modified or larger basis sets were

needed for the metal ions.15 Population analysis for atomic charges was not performed on









the AAP active site. The model for the AAP active site needed to be more robust than

that of the HAH1 active site since the goal of the study was to elucidate the finer details

of the electronic structure of the complete di-Zn(II) binding site to the 1st coordination

shell. Therefore, the model AAP active site comprised about seventy atoms for the 1st

coordination sphere compared to 21-23 atoms for the HAH1 model cluster.


Table 2-1. The DZpdf basis function used for Cu(I) in the DFT calculations of the cluster
models.

Orbital      Exponent         Coefficient
S 6 1.00
             441087.2507      -2.18E-04
             66112.02119      -1.69E-03
             15047.01143      -8.81E-03
             4263.427308      -3.60E-02
             1396.38158       -0.117429705
             511.9605579      -0.288442674
S 2 1.00
             203.4542695      -0.426788989
             82.79233703      -0.330441285
S 1 1.00
             20.85428563      1
S 1 1.00
             9.041067958      1
S 1 1.00
             2.751813517      1
S 1 1.00
             1.043485652      1
S 1 1.00
             0.111722924      1
S 1 1.00
             4.10E-02         1
P 3 1.00
             2530.096567      1.91E-03
             600.0979295      1.58E-02
             194.0820448      7.63E-02
P 3 1.00
             73.67182138      0.238814523
             30.44736969      0.449800159
             13.12271488      0.393376824
P 1 1.00
             5.521483997      1
P 1 1.00
             2.145792213      1
P 1 1.00
             0.767974887      1
P* 1 1.00
             0.174            1
D 3 1.00
             47.3137437       3.24E-02
             13.15468845      0.168227065
             4.366288575      0.384944296
D 1 1.00
             1.412206594      1
D 1 1.00
             0.38840713       1
D* 1 1.00
             0.132            1
F* 1 1.00
             0.39             1

* Represents enhanced orbital exponents.
Courtesy of Olsson and Ryde.9
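
As a quick consistency check on Table 2-1, the size of the DZpdf basis can be tallied from the number of primitives in each contracted shell. The sketch below is illustrative only: the shell lists are read off the table, the names `SHELLS`, `summarize`, and `n_basis_functions` are hypothetical, and whether spherical or Cartesian d and f functions were used in the actual calculations is not stated in the text, so both counts are computed.

```python
# Primitives per contracted shell, read off Table 2-1.
# The final entry of P and D and the single F entry are the starred (enhanced) functions.
SHELLS = {
    "S": [6, 2, 1, 1, 1, 1, 1, 1],
    "P": [3, 3, 1, 1, 1, 1],  # last: P* with exponent 0.174
    "D": [3, 1, 1, 1],        # last: D* with exponent 0.132
    "F": [1],                 # F* with exponent 0.39
}

# Number of angular components per shell type.
SPHERICAL = {"S": 1, "P": 3, "D": 5, "F": 7}
CARTESIAN = {"S": 1, "P": 3, "D": 6, "F": 10}


def summarize(shells):
    """Return {type: (total primitives, contracted shells)}."""
    return {ang: (sum(counts), len(counts)) for ang, counts in shells.items()}


def n_basis_functions(shells, components):
    """Total basis functions on the atom for a given set of angular components."""
    return sum(len(counts) * components[ang] for ang, counts in shells.items())


summary = summarize(SHELLS)
n_spherical = n_basis_functions(SHELLS, SPHERICAL)
n_cartesian = n_basis_functions(SHELLS, CARTESIAN)
```

Without the starred functions the tally is 14 s, 9 p, and 5 d primitives contracted to 8, 5, and 3 shells, which is the usual shape of a double-zeta contraction for a first-row transition metal.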








Molecular Mechanics of HAH1

The completion of the set of QM calculations on the HAH1 model site led to the

next phase of the experiment, the description of the MM force field for the Cu(I) binding

site. AMBER6 is a suite of programs that perform a variety of calculations and analyses

based on user input and a collection of libraries. The research discussed here utilized

AMBER versions 6-9. Despite the many versions used, the force fields and libraries used

in the course of this study remained the same. For the purposes of the calculations

discussed here, the 1994 force field of Cornell et al.19 was used in conjunction with the

1994 libraries and the parameters constructed for Cu(I) and Cu(I)-bound ligands after the

QM work described earlier. The 1994 force field is a large database of parameters

detailing bond lengths, bond angles, bond torsions, nonbonding parameters, and

electrostatics for dozens of specific atom types occurring in proteins, nucleic acids, and

organic compounds. The parameters within the force field libraries are derived from QM

calculations of organic molecules and amino acids. Each of these parameters contributes

to the overall energy of the system being investigated.

The AMBER Force Field

The AMBER energy EAMBER is defined as the enthalpy difference between the

folded and unfolded states of a protein and can be calculated as the sum of the energy

contributions of the parameters listed above. This expression is known as the AMBER

force field (equation 2-1). The first three terms of 2-1 are bonding interactions. Bond

stretching and bond angle bending contributions to the energy are calculated using force

constants applied to each term. Equilibrium values and force constants are user-defined

and the nature of the energy functions sets the equilibrium value at the bottom of a

potential well. The third term involves the bond torsion contribution to the total energy

represented by a truncated Fourier series in which the measured torsion angle φ and the

phase γ are input and V_n is the rotation barrier height for the torsion.20,21

The last two terms represent the nonbonding interactions between atoms i and j.

The fourth term is the 6-12 Lennard-Jones potential, accounting for the attraction and

repulsion between two nonbonded atoms known as van der Waals interactions. The 1/R^12

term accounts for repulsion, while the attraction is calculated in the 1/R^6 term. In this

term, the coefficients A and B determine the lowest-energy distance between the two interacting atoms,

and the term can be modified by the inclusion of an artificial energy well in the potential

curve. The final term includes the Coulombic interaction between charged nonbonded

particles. The magnitude of this interaction depends on the charges of the two interacting

particles and the distance between them embedded in a field with a dielectric constant, ε.

In AMBER, a nonbonding cutoff can be defined to set a maximum distance at which

these interactions can be felt between two particles.20

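$$E_{AMBER} = \sum_{bonds} K_r (r - r_{eq})^2 + \sum_{angles} K_\theta (\theta - \theta_{eq})^2 + \sum_{dihedrals} \frac{V_n}{2}\left(1 + \cos[n\phi - \gamma]\right) + \sum_{i<j}^{atoms} \left(\frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}}\right) + \sum_{i<j}^{atoms} \frac{q_i q_j}{\varepsilon R_{ij}} \tag{2-1}$$
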
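As an illustration of how the five terms of equation 2-1 contribute to the total energy, the following is a minimal Python sketch that evaluates each term for a single bond, angle, torsion, or atom pair. The function names and the numerical inputs are hypothetical and are not taken from the parm94.dat library; consistent units are assumed, and no unit conversion factor is applied to the Coulomb term.

```python
import math


def bond_energy(k_r, r, r_eq):
    """Harmonic bond stretching term K_r (r - r_eq)^2."""
    return k_r * (r - r_eq) ** 2


def angle_energy(k_theta, theta, theta_eq):
    """Harmonic angle bending term K_theta (theta - theta_eq)^2 (radians)."""
    return k_theta * (theta - theta_eq) ** 2


def torsion_energy(v_n, n, phi, gamma):
    """One term of the truncated Fourier series (V_n / 2)(1 + cos[n phi - gamma])."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def lennard_jones_energy(a_ij, b_ij, r_ij):
    """6-12 term: A/R^12 is the repulsion, B/R^6 is the attraction."""
    return a_ij / r_ij ** 12 - b_ij / r_ij ** 6


def coulomb_energy(q_i, q_j, r_ij, epsilon=1.0):
    """Coulombic term q_i q_j / (epsilon R); units are assumed consistent."""
    return q_i * q_j / (epsilon * r_ij)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise each term.
    total = (
        bond_energy(300.0, 1.55, 1.50)
        + angle_energy(50.0, math.radians(112.0), math.radians(109.5))
        + torsion_energy(2.0, 3, math.radians(60.0), 0.0)
        + lennard_jones_energy(4.0, 4.0, 1.2)
        + coulomb_energy(0.3, -0.3, 3.0)
    )
    print(f"toy E_AMBER = {total:.4f}")
```

In a real force field evaluation these functions would be summed over every bond, angle, dihedral, and nonbonded pair in the topology; the sketch only shows the functional form of each term.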
The difference between MM and QM methods is the inclusion of nuclear and

electronic interactions in QM methods. Except for the electrostatic interactions in the last

term of equation 2-1, no quantum effects are considered in the MM force field. Even








then, electrostatic potentials are user-defined or drawn from a library in AMBER, not

explicitly calculated as with QM methods.

In essence, the molecular dynamics package in AMBER, called sander, relies

solely on the AMBER potential to calculate all interactions between the particles in a

system in a stepwise fashion. Several files must be prepared in order for a successful

simulation or minimization to be performed in sander. The necessary files are created in

the LeAP program within AMBER.

LeAP and sander

LeAP is initialized by loading a force field and parameter library suitable for the

system being investigated. The Cornell 1994 force field is used throughout this work,

which includes the parm94.dat force field and the all_amino94.in amino acid database.

These two files form the basis for the atoms and residues that LeAP recognizes. Next, the

initial structure file for the system under study is loaded into the interface. A check of the

structure will output any faults within the system such as non-integral charge or unknown

atom types. In the case of HAH1 MD and free energy simulations, Cu, Cu-bound S, and

any dummy atoms had to be described in separate force field modification files (.frcmod)

before LeAP could recognize them as atom types.

At this point, any new parameter files can be loaded into the interface and applied

to the structure. Parameter files must be correctly formatted to be AMBER-compatible

and may contain user-defined atom names, atom types, and many other atomic

parameters. Furthermore, bond connections can be defined or broken and atomic partial

charges can be specified for certain atoms if they are unknown or if they differ from the

values in the AMBER library so that the total charge of the system is an integer.

Incidentally, AMBER charges differ from any charges that may have been generated








outside of AMBER using a QM method like Merz-Kollman-Singh. Separate programs in

AMBER, called espgen and respgen, are used to read the output from a QM charge

calculation and convert the QM charges into AMBER charges. A successful check of the

system within LeAP should reveal no further unknown atom types or electrostatic errors.

sander uses two files to describe the system. These are the coordinate file (.crd) and

the topology file (.top). Once all of the parameters have been specified for the system in

LeAP and there are no further errors, the .top and .crd files can be created. At this stage, LeAP

will report any parameter errors such as missing bond, angle, or torsion force constants

or undefined atom types within the system that need to be added or amended before the

two new files are created.

sander reads one more file. This is the input file for the program. The first

simulation performed on HAH1 was a MM minimization. Minimization routines vary,

but the default in AMBER is the steepest descent (SD) method. This method calculates

the gradient of the energy with respect to the atomic coordinates and

takes the next minimization step along the direction in which the energy of the system

is decreased most rapidly. After some number of steps, sander may convert to the

conjugate gradient (CG) method of energy minimization to aid in convergence. CG

differs from SD in that a more direct route to a minimum energy state is taken. CG is

particularly well-suited for finding the minimum of a shallow potential surface or an

elongated energy well. Many conditions of the minimization may be defined including

minimization method and number of steps to be taken.
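The steepest descent idea described above can be sketched in a few lines of Python. This is a toy illustration on a two-dimensional elongated harmonic well with a fixed step size, not the minimizer implemented in sander; the function names, the step size, and the test potential are assumptions made for the example.

```python
def steepest_descent(gradient, x0, step=0.01, max_steps=10000, tol=1e-8):
    """Follow the negative gradient with a fixed step until the gradient is small."""
    x = list(x0)
    for _ in range(max_steps):
        g = gradient(x)
        # Stop once every gradient component is below the tolerance.
        if max(abs(gi) for gi in g) < tol:
            break
        # Move each coordinate downhill, i.e. against its gradient component.
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def elongated_well_gradient(x):
    """Gradient of the toy potential V = 10 (x0 - 1.5)^2 + 0.5 (x1 + 2)^2."""
    return [20.0 * (x[0] - 1.5), 1.0 * (x[1] + 2.0)]


if __name__ == "__main__":
    minimum = steepest_descent(elongated_well_gradient, [0.0, 0.0])
    print(minimum)
```

The stiff coordinate converges within a few dozen steps while the soft coordinate needs on the order of two thousand, which illustrates why a switch to conjugate gradient is helpful for elongated energy wells.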

Although it is unlikely that the global minimum of a large system of many

thousands of atoms will be found by the methods used by sander, a sufficiently







minimized structure will be generated after a large number of minimization steps have been

performed. Traits of a sufficiently minimized structure are low vdW, bond, angle,

torsional, and electrostatic energies compared to the initial structure. The user can

monitor the progress of a minimization run by tracking the decrease of energy

components and by visually inspecting the system at any point during the process.

Once the gas-phase structure has been minimized, it can be equilibrated in the gas

phase with MD or it can be solvated. The creation of an ordered "water box" is

performed in LeAP. In these simulations, TIP3P water boxes were added to the gas-phase

minimized and equilibrated proteins. The TIP3P box is a snapshot of an equilibrated

water box,20 and the use of actual water molecules around the protein instead of an

external dielectric field is called explicit solvation. The protein fits within the box of

user-defined dimensions so that there is plenty of water surrounding the protein. In this

research an 8.0 Å or 10.0 Å water box was used to solvate the protein. The solvated

HAH1 protein system comprises more than 19,000 atoms compared to about 2,050 atoms

in the gas-phase system.

To protect waters on the edge of the simulated box from being exposed to the

vacuum that exists outside of the box, periodic boundary conditions (PBC) are applied to

the system. PBC overcomes the problem of the outer solvent molecules being directly

exposed to vacuum by replicating the real box containing the solvent and solute and

placing imaginary copies on all sides of the real box. That way, if a water molecule or

part of the solute leaves the edge of the real box, a water molecule entering the other side

of the box from an imaginary neighbor replaces it. Another feature implemented in MD

simulations of solvated systems is Particle Mesh Ewald (PME), in which short-range and







long-range interactions are separated and calculated in different ways in order to decrease

computational expense. Without PME, the generation of the list of nonbonding

interactions like vdW and electrostatics for large systems would take a very long time.
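The bookkeeping behind periodic boundary conditions can be illustrated with a short Python sketch. It assumes a rectangular box treated one dimension at a time; the function names are hypothetical and the code is not taken from sander.

```python
def wrap_coordinate(x, box_length):
    """Map a coordinate that has left the box back into [0, box_length)."""
    return x % box_length


def minimum_image(dx, box_length):
    """Shortest separation between two particles, allowing for periodic copies."""
    return dx - box_length * round(dx / box_length)


if __name__ == "__main__":
    box = 10.0
    # A particle that drifts past the upper edge re-enters at the lower edge.
    print(wrap_coordinate(10.5, box))
    # Two particles 9.0 apart in the real box are only 1.0 apart via the image.
    print(minimum_image(9.0, box))
```

The first function corresponds to a molecule leaving one face of the real box and re-entering through the opposite face; the second shows why no molecule ever "sees" the vacuum outside the box.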

The solute must equilibrate with the newly applied solvent. Constraints are

imposed onto the protein within the box so that it does not change conformation as the

water equilibrates around it. The temperature of the system is slowly raised to 300 K over

thousands of equilibration steps, and the restraints on the protein within the solvent box

are also gradually relaxed. Once the system reaches 300 K, the constraints are removed

and the whole system is allowed to equilibrate. Once the solvated system is sufficiently

equilibrated, it is subjected to long term MD simulation.

The prediction of molecular motion through time is derived from subjecting a

system to Newtonian laws of motion in conjunction with a potential function, in this case

the AMBER force field. The force felt by particles in the system is the derivative of the

potential with respect to position.

$$F_i = -\frac{\partial V}{\partial r_i} \tag{2-2}$$


Moving particles through time is done in iterations of calculating the incident force

on each particle and then moving the particles in reaction to that force. With the initial

positions of the particles known, the force is actuated and the particles are allowed to

move. The positions of the particles after a small time step Δt can be calculated using the

Verlet equation.

$$r(t + \Delta t) = 2r(t) - r(t - \Delta t) + \frac{F(t)}{m}\Delta t^2 \tag{2-3}$$
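The Verlet update of equation 2-3 can be sketched for a single particle in one dimension. The harmonic force, the unit mass, and the function names below are assumptions chosen so that the result can be compared with the analytical solution r(t) = cos(t); this is not the integrator code used by sander.

```python
import math


def verlet_trajectory(force, mass, r_prev, r_now, dt, n_steps):
    """Propagate positions with r(t+dt) = 2 r(t) - r(t-dt) + F(t)/m * dt^2."""
    positions = [r_prev, r_now]
    for _ in range(n_steps):
        r_t = positions[-1]
        r_before = positions[-2]
        r_next = 2.0 * r_t - r_before + force(r_t) / mass * dt ** 2
        positions.append(r_next)
    return positions


def harmonic_force(r, k=1.0):
    """Force from the toy potential V = k r^2 / 2, i.e. F = -dV/dr."""
    return -k * r


if __name__ == "__main__":
    dt = 0.01
    # Start at r(0) = 1 with zero velocity; r(-dt) is taken from the exact solution.
    traj = verlet_trajectory(harmonic_force, 1.0, math.cos(-dt), 1.0, dt, 100)
    print(traj[-1], math.cos(1.0))
```

Note that the algorithm needs two starting positions, r(t - Δt) and r(t), because velocities do not appear explicitly in equation 2-3.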








Subsequent positions are formulated in the same manner until the user-defined

number of steps has been performed. Each step results in a new conformation of the

system that can be written to an output file. Typical MD step sizes range from 0.5 to 1.5

femtoseconds, and the total length of production-quality MD simulations is generally four

nanoseconds (at least four million steps with a 1.0 fs time step). A collection of these

individual snapshots can be analyzed to see how the system changes over time.

In the HAH1 system, the geometry of the active site was monitored to ensure that

Cu(I) was always bound in the correct coordination and that the active site behaved well.

This validated the new parameters that were imposed on the Cu(I) binding site. The root-mean-squared

deviation (rmsd) from the crystal structure was monitored to quantify how

the simulated protein differed from the reference state during the simulation. The root-mean-squared

fluctuation (rmsf) of key atoms and residues was also calculated to see which

parts of the protein were highly mobile and which maintained more fixed positions.

Radius of gyration (radgyr) is another way to quantify the motion of the system in a

general manner. This can be interpreted as the rmsd of scattering elements from the

particle's center of mass.

$$rmsd = \sqrt{\frac{\sum_{i=1}^{N_{atoms}} d_i^2}{N_{atoms}}} \tag{2-4}$$


$$d_i = \left(r_{i,calc} - r_{i,exptl}\right) \tag{2-5}$$

Equation 2-4 calculates the rmsd between a reference state and an individual

structure generated by the MD simulation. rmsd is probably the most basic function for

quantifying the difference between two structures. Good rmsd values between a crystal

structure reference state and a MD snapshot of the simulated protein are generally lower






than 3.5 Å. The movement of a subset of the system with respect to the average structure

over the whole simulation is the rmsf, and the B-factor is related to rmsf by equation 2-7.

A high B-factor identifies a highly mobile subset.

$$rmsf_i = \sqrt{\langle r_i^2 \rangle - \langle r_i \rangle^2} \tag{2-6}$$


$$B_{AMBER} = \frac{8\pi^2}{3}\, rmsf_i^2 \tag{2-7}$$

Finally, radgyr is a measure of how much a protein spreads out from its center. In

essence, it quantifies the level of unfolding of a protein. radgyr is size dependent, as larger

proteins have larger radgyr values. It is also shape dependent, and different kinds of

structures (coil, sheet, random loop) all have different radgyr values.


$$radgyr = \sqrt{\frac{\sum_{i=1}^{N} \left(r_i - \bar{r}\right)^2}{N}} \tag{2-8}$$
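The three structural measures discussed in this section can be sketched in plain Python. The sketch makes several assumptions that are not stated in the text: coordinates are lists of (x, y, z) tuples, structures are already superimposed (no fitting is performed), all atoms carry equal mass so the center of mass is the geometric center, and the B-factor is taken as (8π²/3)·rmsf². The function names are hypothetical and this is not the ptraj implementation.

```python
import math


def rmsd(reference, snapshot):
    """Root-mean-squared deviation between two pre-aligned structures."""
    n_atoms = len(reference)
    total = sum(
        (a - b) ** 2
        for ref_atom, snap_atom in zip(reference, snapshot)
        for a, b in zip(ref_atom, snap_atom)
    )
    return math.sqrt(total / n_atoms)


def rmsf(atom_positions):
    """Fluctuation of one atom about its average position over all frames."""
    n_frames = len(atom_positions)
    mean = [sum(p[k] for p in atom_positions) / n_frames for k in range(3)]
    msf = sum(
        sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in atom_positions
    ) / n_frames
    return math.sqrt(msf)


def b_factor(rmsf_value):
    """Convert an rmsf value to a B-factor, assuming B = (8 pi^2 / 3) rmsf^2."""
    return (8.0 / 3.0) * math.pi ** 2 * rmsf_value ** 2


def radius_of_gyration(coords):
    """RMS distance of the atoms from their geometric center (equal masses)."""
    n_atoms = len(coords)
    center = [sum(p[k] for p in coords) / n_atoms for k in range(3)]
    total = sum(sum((p[k] - center[k]) ** 2 for k in range(3)) for p in coords)
    return math.sqrt(total / n_atoms)


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    snap = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(ref, snap), radius_of_gyration(ref))
```

For a real trajectory, each frame would first be fitted to the reference structure before the rmsd is evaluated; that step is omitted here for brevity.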




Each of these values can be obtained using another program in the AMBER suite

called ptraj. This program reads in a reference state and the coordinate file of the MD

simulation (which may contain thousands of conformations for a system that is thousands

of atoms big) and monitors user-selected parameters for the entire MD simulation. ptraj

is also used to generate structure files for individual poses.

Free Energy Simulations of HAH1

Once the integrity of the HAH1 parameters had been validated by analyzing the

MD simulations, free energy calculations on the protein began. One of the goals of the








HAH1 study was to identify an energetically favorable transition state of Cu(I) transfer

between HAH1 and its target MNK. Energetically, the enthalpy, entropy, and free energy

of different Cu(I) coordination states could be determined. Of these three properties, free

energy provides the most accurate account of the relative energies of different systems.

As with the MD simulations, an initial structure was constructed for the thermodynamic integration (TI)

calculations. However, the reactions involved in the TI calculations add a level of

complexity to this process. TI simulations perturb the system from one state to another. In

this case, the initial structure was a three-coordinate Cu(I)-bound active site and the

"product" of the perturbation was a four-coordinate Cu(I) active site in HAH1. These two

conformations differ by one atom, which is the hydrogen on the unbound Cys of the

initial structure. To compensate for the "disappearance" of that hydrogen, a dummy atom

was created in the final state to ensure that the total number of atoms in each state was the

same.

Once the number of atoms in the two states was equal, then the initial structure was

loaded into LeAP. The initial structure used the same parameters as the MD simulation

of the three-coordinate Cu(I) structure. However, the four-coordinate product had

different charges and bonds than the starting structure. These changes were outlined in

LeAP and represented the final state of the TI simulation. In AMBER, the perturbed

partial atomic charges and atom types were input for every atom that was changed during

the simulation. Now, the parameters of both the initial and final states of the TI

simulation have been described in LeAP. A check of the system should reveal no

unknown atom types and that the total charges of the two states are integer values.








The final phase in LeAP for preparing free energy calculations is the generation of

the coordinate files and topology files for the two states. The unperturbed topology file

will be used to minimize and equilibrate the initial structure prior to the TI simulation.

The perturbed topology file will be used in the actual TI calculation and contains all of

the information for both states of the simulation. If there are any incomplete parameters

pertaining to either state, LeAP will alert the user at this point. The perturbed topology

file will only be generated when the perturbed system is fully described.

At this point, the initial structure is minimized and equilibrated in the same manner

described above. Gas-phase and solvated protein structures are both equilibrated and

prepared for TI simulations. Twelve sander input files are created; one for each window

of the TI simulation. The icfe=1 keyword turns on TI in sander. The details of the TI

calculations are provided in chapter 3.

Several requirements must be met to ensure the accuracy of the TI calculations. A

good potential function must be used to describe the system. In this study, the AMBER

force field was used along with the parameters derived for the Cu(I) binding site.

Secondly, there must be a way to rapidly update the system as it changes over the course

of the free energy simulation. This includes evaluating forces and energies and updating

positions through time. Finally, ΔG must be calculated. Equation 2-9 shows how the

force is related to the second derivative of position with respect to time, and equations

2-10 and 2-11 reveal how ΔG can be determined in the canonical ensemble (constant N,

V, and T). In equation 2-11, the Hamiltonian H is approximated by the AMBER force

field. Although equation 2-9 allows rapid calculation of the force, it requires high

accuracy. This explains why the time length of steps in MD and TI calculation should be







about 1 fs. In order to sample a sufficiently large conformational space, the number of

steps in each window must be very large.22

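$$F = -\frac{\partial V(r)}{\partial r} = ma = m\frac{\partial^2 r}{\partial t^2} \tag{2-9}$$

$$G_{NVT} = -RT \ln Q_{NVT} \tag{2-10}$$

$$Q_{NVT} = C \iint e^{-H(x,p)/RT}\, dx\, dp \tag{2-11}$$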

Unfortunately, these calculations take a very long time to converge even for

simple systems such as the water dimer. Solving explicitly for G is, in essence, an

expansion of H and higher orders of the expansion take even longer to converge.

Therefore, other methods of calculating G must be derived using some approximations.

Thermodynamic Integration

Another functionality of the sander program is the ability to perform

thermodynamic integration (TI) calculations. Like other free energy methods, TI perturbs

an initial structure to a final structure over a series of windows. TI uses a scaling

parameter λ, which varies between 0 and 1 as the system's character progresses toward

the final structure. At λ=0, the system exists completely in the initial state, and at λ=1 the

system exists completely in the final state. TI is based on the integration of the ln form of

the G expression (equation 2-10), where H is V(λ,x).

$$\frac{dG}{d\lambda} = -\frac{RT}{Q_{NVT}}\left[\frac{dQ_{NVT}}{d\lambda}\right] \tag{2-12}$$

Now, substituting for QNVT:









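$$\frac{dG}{d\lambda} = \frac{\int \frac{dV(\lambda,x)}{d\lambda}\, e^{-V(\lambda,x)/RT}\, dx}{\int e^{-V(\lambda,x)/RT}\, dx} = \left\langle \frac{dV(\lambda,x)}{d\lambda} \right\rangle_\lambda \tag{2-13}$$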

And integration yields AG:


$$\Delta G = \int_0^1 \left\langle \frac{dV(\lambda,x)}{d\lambda} \right\rangle_\lambda d\lambda \tag{2-14}$$


Yet another obstacle presents itself here in that equation 2-14 is not analytically

solvable and must be solved numerically with another approximation.


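$$\Delta G \approx \sum_i \left\langle \frac{\partial V(\lambda,x)}{\partial \lambda} \right\rangle_{\lambda_i} \left(\frac{\lambda_{i+1} - \lambda_{i-1}}{2}\right) \tag{2-15}$$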

Or simply,


$$\Delta G \approx \sum_i w_i \left\langle \frac{\partial V}{\partial \lambda} \right\rangle_{\lambda_i} \tag{2-16}$$

sander outputs ∂V/∂λ values. The user can define how often these values are calculated and averaged. w_i is the weighting factor for each window. The TI calculations used in this work contained twelve windows between λ=0 and λ=1. The ∂V/∂λ values from the sander output were averaged for each window, multiplied by the weighting factor for that window, and then all of the weighted values were summed to generate the ΔG value for the TI perturbation. Table 2-2 lists the weighting factors used in the ΔG calculations. λ values for the twelve windows are listed in Table 2-3. For the TI simulations of the solvated protein, 1,000 ∂V/∂λ values were collected for each window (each of those ∂V/∂λ values represented a 500-step average within that window). The 1,000 ∂V/∂λ values for each window were then averaged and weighted.

Placement of the windows along λ and weighting factors are based on Gaussian quadrature, which quantifies the integral of the space under the ∂V/∂λ curve with an accuracy similar to that of simpler methods like the midpoint method or the trapezoid method, but the Gaussian method requires only half the sample size of the simpler methods, which saves on simulation time.

Table 2-2. Windows and weighting factors for a 12-window TI calculation in sander.
Window     Weight
1 & 12     0.02359
2 & 11     0.05347
3 & 10     0.08004
4 & 9      0.10158
5 & 8      0.11675
6 & 7      0.12457


Table 2-3. λ values for a 12-window TI calculation in sander.
Window     λ
1          0.00922
2          0.04794
3          0.11505
4          0.20634
5          0.31608
6          0.43738
7          0.56262
8          0.68392
9          0.79366
10         0.88495
11         0.95206
12         0.99078
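The weighted sum used to obtain ΔG from the twelve windows can be sketched as follows. The weights and λ values are those of Tables 2-2 and 2-3, with two assumptions: window 1 is taken as 0.00922 and window 8 as 0.68392, the mirror images of windows 12 and 5 about λ = 0.5. The function names and the test integrands are hypothetical; real input would be the per-window ∂V/∂λ averages from the sander output.

```python
# Weights (Table 2-2) and lambda values (Table 2-3) for windows 1 through 12.
WEIGHTS = [0.02359, 0.05347, 0.08004, 0.10158, 0.11675, 0.12457,
           0.12457, 0.11675, 0.10158, 0.08004, 0.05347, 0.02359]
LAMBDAS = [0.00922, 0.04794, 0.11505, 0.20634, 0.31608, 0.43738,
           0.56262, 0.68392, 0.79366, 0.88495, 0.95206, 0.99078]


def window_average(dvdl_samples):
    """Average the dV/dlambda values collected within one window."""
    return sum(dvdl_samples) / len(dvdl_samples)


def ti_delta_g(window_means, weights=WEIGHTS):
    """Weighted sum of per-window <dV/dlambda> values (Gaussian quadrature)."""
    if len(window_means) != len(weights):
        raise ValueError("one average is required per window")
    return sum(w * m for w, m in zip(weights, window_means))


if __name__ == "__main__":
    # Toy integrand dV/dlambda = lambda^2, whose exact integral on [0, 1] is 1/3.
    means = [lam ** 2 for lam in LAMBDAS]
    print(ti_delta_g(means))
```

Because the weights sum to one, a constant ∂V/∂λ is returned unchanged, which is a quick sanity check on any set of quadrature weights.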









Free Energy Perturbation

Free energy perturbation (FEP) differs from TI in that FEP is a more continuous

representation of the change from λ=0 to λ=1. Instead of performing twelve separate

calculations for windows of the TI simulation, FEP is a single calculation containing as

many windows as the user wants. Each window within an FEP calculation contains a

number of equilibration steps followed by a number of data-gathering steps. Another way

to perform an FEP calculation is to evaluate the trajectory of some initial state with the Hamiltonian of

the perturbed state. This closely resembles the calculation performed on HAH1 where the

structure generated by the initial Hamiltonian was evaluated using an alternate

Hamiltonian that instituted some vdW exclusions. This is discussed in chapter 3.

Instead of perturbing atom types and charges as in the TI simulations, the

perturbation in the FEP calculations for the HAH1 study is the exclusion of some vdW

forces in the perturbed state. ΔG between two states A and B is defined in equation 2-17.

$$\Delta G = G_B - G_A = -RT \ln \frac{Q_B}{Q_A} \tag{2-17}$$


Q has the same form as in equation 2-11. In the general equation for FEP,
$$\Delta G = -RT \ln \left\langle e^{-\Delta H / RT} \right\rangle_A \tag{2-18}$$


ΔH is the change in the Hamiltonian (approximated by the AMBER force field) from the

initial state to the perturbed state. In the case of the HAH1 study, the difference between

the two Hamiltonians should only be in the vdW term. Essentially, FEP spans the space

of two physical endpoints 0 and 1 with an array of non-physical states in between,

characterized by the value of λ.










$$\Delta G_{\lambda=0 \rightarrow \lambda=1} = \sum_{\lambda} \Delta G_{\lambda} \tag{2-19}$$
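The exponential averaging at the heart of FEP can be sketched in Python. The sketch assumes the standard exponential-average form ΔG = -RT ln⟨exp(-ΔH/RT)⟩, with energies in kcal/mol and R = 0.0019872 kcal/(mol K); the function names and sample values are hypothetical and this is not the gibbs implementation.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def fep_delta_g(delta_h_samples, temperature=300.0):
    """Exponential average of Hamiltonian differences sampled in the initial state."""
    rt = R_KCAL * temperature
    boltzmann_avg = sum(math.exp(-dh / rt) for dh in delta_h_samples) / len(
        delta_h_samples
    )
    return -rt * math.log(boltzmann_avg)


def total_delta_g(window_delta_gs):
    """Sum the free energy changes of consecutive lambda windows."""
    return sum(window_delta_gs)


if __name__ == "__main__":
    # Hypothetical energy differences (kcal/mol) from snapshots of one window.
    samples = [0.2, 0.5, 0.1, 0.4, 0.3]
    print(fep_delta_g(samples))
```

Because the average is exponentially weighted, snapshots with low ΔH dominate the result, so the computed ΔG is never larger than the plain mean of the samples.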

The gibbs program in the AMBER package was created to be able to perform FEP

on large systems, with one advantage over TI in sander of being able to define dummy

atoms in both states. Despite numerous attempts, I was never able to adequately simulate

the HAH1 system with gibbs. After AMBER 7, gibbs was no longer developed or

supported in AMBER.

Potential of Mean Force

Potential of mean force (PMF) calculations were performed on the HAH1 system

to determine the energetics of the Cu(I)-S bonds within the active site. PMF between two

bodies is a function of the distance between their centers of mass.23 In AMBER, a

harmonic potential is applied to a certain bond (or any other parameter), centered at an

equilibrium value. Then, over a series of windows, the parameter is varied from an initial

position to a final position. The interaction between the two particles involved in the

PMF is monitored. W can be an expression of the free energy change with respect to

coordinates and is calculated in equation 2-20.

$$W(q) = -k_B T \ln \rho(q) \tag{2-20}$$

Here, q is the coordinate and ρ is the probability of q obtaining a certain value. In

an MD simulation, the values of q can be collected into bins and then analyzed as a

histogram. The histograms for all of the windows are aligned with WHAM,24,25 a

weighted histogram analysis program, so that a PMF curve is constructed.26 In order for a

range of possible values of q to be adequately represented, a very large number of

samples must be taken. In the case of the HAH1 system, the Cu(I)-S distance was varied from

about 2.8 Å to 2.0 Å over the course of twenty-three windows. Each window sampled

100,000 steps for a total of over two million samples taken for just 0.8 Å of coordinate

space. A biasing potential is applied in the AMBER PMF to ensure that the simulation

sufficiently samples the bond lengths of interest to the HAH1 study. The biasing potential

is shown in equation 2-21.

$$U(q) = \frac{1}{2} k (q - q_0)^2 \tag{2-21}$$

q_0 is the equilibrium value that is of interest in the calculation. By assigning a high

value to k, a large energy penalty is imposed on any coordinate that is too far from q_0.

For the HAH1 PMF calculations a biasing potential of 2,000 kcal/mol was placed around

an equilibrium distance of 2.19 Å. The PMF calculation was specified in the sander input

files by specifying nmropt=1 for the program to read in the biasing potential and calling

a PMF parameter file which contained that information for each window individually.

This represents a TI-method of performing the PMF calculation with sander. Unlike the

TI simulation of changing the Cu(I) coordination, there is no chemical change between

the initial and final states of the PMF simulation, only a conformational change in the

bond length that is being adjusted in each window.
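The two ingredients of the PMF procedure, the harmonic biasing potential and the conversion of a histogram into a free energy profile, can be sketched in Python. The sketch assumes energies in kcal/mol with k_B = 0.0019872 kcal/(mol K), and it treats a single unbiased histogram; it does not perform the WHAM step that combines and unbiases the windows. All function names and sample values are hypothetical.

```python
import math

K_B = 0.0019872  # Boltzmann constant per mole, kcal/(mol K)


def bias_energy(q, q0, k):
    """Harmonic biasing potential U(q) = k (q - q0)^2 / 2."""
    return 0.5 * k * (q - q0) ** 2


def pmf_from_samples(samples, q_min, q_max, n_bins, temperature=300.0):
    """Bin sampled coordinate values and return (bin center, W) pairs.

    W = -k_B T ln(rho), where rho is the fraction of samples in the bin.
    Empty bins get W = None because the logarithm is undefined there.
    """
    width = (q_max - q_min) / n_bins
    counts = [0] * n_bins
    for q in samples:
        if q_min <= q < q_max:
            # Guard against floating-point round-up at the upper edge.
            index = min(int((q - q_min) / width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    kt = K_B * temperature
    profile = []
    for i, count in enumerate(counts):
        center = q_min + (i + 0.5) * width
        w = -kt * math.log(count / total) if count > 0 else None
        profile.append((center, w))
    return profile


if __name__ == "__main__":
    # Penalty for a Cu(I)-S distance 0.1 from the window center, k = 2000.
    print(bias_energy(2.29, 2.19, 2000.0))
    # Toy set of sampled distances.
    print(pmf_from_samples([2.05, 2.05, 2.05, 2.15], 2.0, 2.2, 2))
```

With a force constant of 2,000 a displacement of only 0.1 from the window center costs about 10 energy units, which is why each window samples a narrow range of the coordinate and many overlapping windows are needed.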














CHAPTER 3
COMPUTATIONAL STUDIES OF THE CU(I) METALLOCHAPERONE HAH1

Cu(I) Biochemistry and the HAH1 System

Cu Metallochaperones

Several transition metals have been implicated in important intracellular biological

processes. These metals, including Fe, Co, Zn, Mo, and Cu, are involved in such central

biological roles in part due to their ability to exist in multiple oxidation states in vivo.27

For example, copper exists in both the cuprous and cupric states within the cell and

functions as a catalyst in both. Cycling of copper ions between two oxidation states can

catalyze the production of highly toxic hydroxyl radicals within the cellular environment

that can result in damage to many intracellular macromolecules. This creates a potential

problem within the cell: metal ions such as Cu(I) and Cu(II) are essential for normal cell

behavior, yet the "free" existence of these ions in the cell is clearly toxic.

A group of metal-binding proteins labeled "metallochaperones" have been shown

to bind transition metal ions in both prokaryotic and eukaryotic cells. Of particular

interest are the chaperones involved in copper and zinc transport in human cells. In

human cells, a number of these chaperones deliver Cu ions to other copper-binding

proteins or organelles. Noteworthy is the fact that these chaperones are not anti-toxins.

Instead, they act to sequester the ion and transport it through the intracellular

environment. It has been found that the average concentration of intracellular "free"

copper is on the order of 10^-18 M, translating to less than one unbound copper ion per cell

in the human body.28 This low concentration has been attributed to the over-abundance of








moderate and strong copper-chelating sites including metallothioneins, vesicular sites,

and specific copper-binding proteins.29 Yet, with all the other potential copper-binding

sites, specific copper chaperones are able to acquire the ions as they enter the cell and

distribute them throughout the cell as needed. These include the human antioxidant

protein (HAH1), the Menkes and Wilson's ATPases, the human copper chaperone for

superoxide dismutase (hCCS), and the human copper, zinc superoxide dismutase

(SOD1).

Several different copper-transport routes within the cell are responsible for copper

homeostasis to ensure that the total copper concentration (normally in the micromolar

range29) in the cell does not get too high or too low.27,29-36 This involves regulation of the

amount of copper entering and exiting the cell. Copper must then be delivered from the

trans-membrane proteins to the metal-binding sites of the correct proteins in the

cytoplasm. Finally, the copper ions must be transported through the cell to the proteins

and organelles where they are needed. Proteins that perform each of these functions have

been discovered and studied. Defects in the metabolism of intracellular metal ions result

in a vast array of health problems. For example, problems within the copper-transport

structure of the cell result in Menkes Syndrome, Wilson's Disease, familial amyotrophic

lateral sclerosis (fALS) disorders, and Alzheimer's disease.30

Individual pathways and binding interactions will be discussed later, but a brief

overview of metal binding and inter-protein transport is given here. According to

Rosenzweig, copper binding proteins deliver the copper ions to their targets via "direct

protein-protein interaction".37 Moreover, copper transport between a chaperone and its

target is thought to progress through the formation of a series of multi-coordinate








complexes until the ion has been completely released from the donor and bound by the

receptor.1,37 With this in mind, another concept suggests that the chaperone donates its

metal ion in an enzymatic fashion, lowering the energy barrier for inter-protein ion

transport.29,31

Recently, crystal structures have been solved for many Cu(I) transport proteins38-41

and some transport mechanisms have been suggested.1,31,33,35,37 However, the exact

mechanisms by which copper delivery is accomplished are still under intense study.

Several key issues should be addressed when considering this problem. For example, the

specificity of the donor-target interaction must be understood. Currently, highly

conserved secondary structures between the donor and target at the metal binding site are

believed to explain the problem of recognition. Possible protein rearrangement during

metal transfer must also be investigated. Disulfide bonds near the binding sites may

contribute to rearrangement of the fold during metal transfer. Protein docking should also

be addressed. It has been suggested that docking involves electrostatic interactions27, and

that heterodimers or even higher order oligomers42 may be formed during copper transfer

between proteins.

The recent structure determinations of many of the proteins involved in copper

transport in both apo- and holo- forms have opened this field to computational study.

Ideally, using computational tools, an investigator could model the binding sites of

several proteins, ultimately using molecular dynamics (MD) simulations to determine the

binding and transfer mechanisms for the processes described above. Some ab initio

modeling of Cu binding to sulfur and Cys groups has been performed.8-10,38 Currently,








ZINDO43, PM3(tm), ab initio14, and SIBFA1238 are some of a limited number of methods

for which Cu parameters have been established.

Cu(I) is generally unstable in aqueous solutions. However, it may be stabilized by

sulfur-containing ligands or immobilized by a protein.44 Generally, the copper-binding

active site within a protein is characterized by the use of two or more Cys or His ligands

for direct binding less than 2.5 Å from the ion, with Met residues or charged amino acids

as part of the supporting structure at 3.5 Å to 8.0 Å away from the metal. The human

copper, zinc superoxide dismutase (SOD1) employs four His at its Cu(I) active site.

Three His appear to be tightly bound (2.0-2.12 Å) while the other is bound to a lesser

extent (3.12 Å). Each His is bound to Cu(I) by the δ1 or ε2 N of the imidazole ring. Cys

residues are present at the binding sites of the human copper chaperone for superoxide

dismutase (hCCS), HAH1, yeast antioxidant protein (Atx1), the yeast homolog to

Menkes disease protein (Ccc2), and the Menkes disease protein (MNK4). These

chaperones bind the ion with multiple Cys residues as part of a common MT/CXXC

binding motif, which forms part of a turn between an α-helix and β-sheet. Some Cu(II)

binding proteins such as human nitrite reductase and the plant electron-transfer protein

plastocyanin employ both Cys and His at their active sites. This study focuses on

optimizing structures for the Cu(I) binding sites in HAH1 as well as other multi-coordinate

Cu(I) structures, and on observing the differences between two-, three-, and four-coordinate

Cu(I) complexes.

Cu pathways within the Cell

Once through the membrane, copper must be delivered to other sites in the cell

where it is needed. Metallochaperones perform this task. Due to the specificity of the








binding and transfer mechanisms between chaperone and target, there is a different

chaperone for delivering copper to each target. Chaperones are grouped according to the

primary structure of their binding site(s). Atx1-type proteins exhibit the MT/HCXXC37 or

MXCXXC29 binding motif at the N-terminus. In yeast, Atx1 is responsible for delivering

Cu(I) to Ccc2, the yeast CPx-type copper ATPase, for eventual incorporation into Fet3,27

an important protein in iron metabolism. HAH1 delivers Cu(I) to domain 4 of the

Menkes disease P-type ATPase, MNK4, in human cells. Another pathway involves the

chaperone for superoxide dismutase in yeast, yCCS or Lys7, and in humans, hCCS. The

function of these proteins is to deliver Cu(I) to the copper, zinc superoxide dismutase,

SOD1. This pathway is more complicated than the Atx1 pathway since the proteins

involved contain multiple domains and the Cu(I) binding sites are more complex and are

not located near the surface of the protein.

Atx1-like copper chaperones bind the Cu(I) ion in a loop between an α-helix and a

β-sheet near the exterior of the protein. The binding residues are all Cys, although there

may be some contributing electrostatic interactions to the binding site from nearby Met,

His, and Thr residues. HAH1 exists as a homodimer in the crystal structure, binding one

Cu(I) per dimer. Each monomer donates up to two Cys residues for ion binding. Atx1

exists as a monomer, coordinating the Cu(I) in a two- or three-coordinate system in the

binding loop described above. The targets for Atx1 and HAH1, Ccc2 and MNK4,

respectively, share secondary structure homology with their chaperones. Thereby, the

chaperone and target are able to dock and ion transfer can occur. A representation of the

Cu(I)-bound HAH1 dimer is shown in Figure 3-3.








The CCS proteins are more complex, and their structures have not yet been fully

resolved crystallographically. yCCS is a 27 kDa two-domain monomer that exists as a 54

kDa homodimer.42 Each domain possesses its own unique copper binding site.

hCCS is a three-domain, 274-residue monomer that exists in vivo as a 548-residue dimer.

Once again, each domain shares structural homology with another chaperone in the cell.

Domain 1 of both yCCS and hCCS has a Cu(I) binding site similar to those of Atx1 and HAH1,

namely the MT/HCXXC or MX/CXXC binding motif. Domain 2 in both yCCS and

hCCS has similar folding to SOD1, although each lacks several key structural features of

SOD1, differentiating them from each other. It should be noted here that SOD1 exists as

a 32 kDa dimer in vivo and utilizes four His residues as its Cu+ binding site8. Finally,

domain 3 in hCCS is a small 39-residue feature containing a CXC binding motif. It is

believed that this domain is involved in the physical transfer of Cu(I) from hCCS to the

Cu binding site in SOD1.

Of these pathways, copper transfer between CCS and SOD1 is perhaps the process

that has been investigated the most. Of interest is the intra-protein transfer of Cu(I)

between domains in CCS and the transfer of Cu(I) from the CXC binding site of hCCS

domain 3 to the quad-His binding site in SOD1. It has been suggested that the transfer

involves the formation of a heterodimer or even higher order oligomers between

monomers of CCS and SOD1.42

Copper Homeostasis

One of the fundamental problems in biological coordination chemistry today is the

insufficient understanding of how metal ion concentrations are mediated by intracellular

processes.32 On one hand, enough metal ions must be present within the cell to facilitate

essential biochemical functions. However, transition metals, and copper in particular, are








prone to cause problems due to their catalytic nature and the presence of so many

favorable metal binding sites. As mentioned, Cu can readily cycle between two oxidation

states, which can catalyze the production of toxic radicals within the cell. Moreover,

several amino acids can easily bind Cu ions, creating an abundance of copper chelation

sites within the cell.

Copper chaperones are proteins that bind Cu ions in both the Cu(I) and Cu(II)

oxidation states and perform a twofold purpose in maintaining Cu homeostasis. First, the

metal-binding site on the metallochaperones must be able to bind Cu ions more readily

than the other favorable yet non-essential Cu binding sites throughout the cell. Secondly,

the chaperones must act to sequester Cu ions from the intracellular environment, or at

least ensure that the Cu ions are always bound within another essential Cu-binding

protein.

Although there are numerous Cu pathways within the cell,27,36 the copper

chaperones involved in these pathways can be divided into two groups: trafficking

proteins and metalloregulatory proteins.32 Trafficking proteins are confined to cell

membranes and the cytoplasm and include trans-membrane metal transport proteins and

water-soluble Cu transport proteins that exist in the cytoplasm, delivering metal ions to

specific intracellular target proteins. Metalloregulatory proteins bind ions in a more

permanent fashion, using the ions to regulate gene expression and cell function.32,45,46

Initially, Cu pathways were identified but the details of metal binding were unknown. A

large number of crystallographic and spectroscopic studies in the last decade have

clarified the details of Cu binding in both Cu-transport and Cu-regulatory proteins. One

facet of Cu homeostasis that remains largely unresolved is Cu transfer between proteins.








Isolating membrane-bound proteins for crystallization is a daunting task for

crystallographers. Further complication arises in producing crystals of metal-bound

proteins. While no structures currently exist for the Cu-binding trans-membrane protein

hCtr1, the structures of soluble holo-proteins HAH1 and Atx1 have been determined by

NMR2 and X-ray crystallography.1,47 The structure of the fourth domain of the Menkes

disease protein MNK4 (the target for Cu(I) transport from HAH1) has also been

determined by XRC41, and the interaction between HAH1 and MNK4 has been

investigated.48 The copper-transport complex of yCCS (and its human homologue hCCS)

and SOD1 has also been well-characterized by X-ray crystallography.39,42,49-51

Quantum Chemical Characterization of the Cu(I) Binding Site from HAH1

The work reviewed in this chapter focuses on Cu(I) transfer from HAH1 to the

Menkes disease protein. This pathway was selected due in part to the availability of high-

resolution structures of the Cu(I) donor HAH1 and the target domain MNK4. The Cu-

binding domains of both the donor and target employ two Cys residues in an MT/CXXC

motif to hold Cu in a multi-coordinate state. HAH1 exists as a dimer in solution, with

each monomer containing one binding domain. The coordination state of the Cu ion in

the dimer is not yet known. EXAFS studies of holo-Atx1 suggest a three-coordinate state,

with two Cys residues binding tightly to Cu at 2.25 Å and a third less strongly bound Cys

at 2.40 Å which may be from an adjacent Atx1 molecule.52 The 1.80 Å resolution X-ray

crystal structure of the Cu(I)-bound HAH1 dimer reveals four Cys in close proximity to

Cu(I). This structure suggests a roughly tetrahedral geometry for Cu(I) with three

strongly bound Cys at 2.30 Å and the fourth Cys at 2.40 Å. The coordination

environment for MNK4 is believed to be similar to that of the HAH1 monomer or Atx1.








Cu(I) transport between HAH1 and MNK4 is thought to progress via a multi-

coordinate mechanism.1,47,52 Figure 3-1 characterizes the proposed mechanism for Cu(I)

transfer as a series of Cu(I)-S bonds forming with the target domain as Cu(I)-S bonds

break within the donor. The energetics of this process have yet to be determined, and it is

not known if a potential four-coordinate intermediate exists as part of the transfer

mechanism. The order of Cu(I)-binding and release is also unknown. In the HAH1

monomer, Cu(I) is bound by Cys12 and Cys15. As the target domain comes into close

proximity, one of these two residues releases Cu(I) first. In a similar manner, the target

domain must also sequentially form bonds, but it is not known whether Cys14 or Cys17

of MNK4 is the first to complex the incoming Cu(I) ion.

[Scheme: stepwise Cu(I) transfer from the Cym ligands of HAH1 to the Cys ligands of MNK4 through Cu(I)-S bonded intermediates.]

Figure 3-1. The proposed mechanism for Cu(I) transfer from HAH1 to the fourth domain
of the Menkes disease protein. Cym indicates a negatively charged Cys
residue.

In order to address the details of Cu(I) transport, several models of the HAH1

Cu(I)-binding site were constructed in WebLabViewer Pro16 substituting Cys residues

with methylthiolate [SCH3]- ligands. Two-, three-, and four-coordinate models were

constructed and geometrically optimized using Gaussian 98.53 Figure 3-2 depicts the

optimized structures of the models and Table 3-1 lists some geometrical parameters of

the structures.






























Figure 3-2. Gas-phase optimized structures of the multi-coordinate models of the HAH1
Cu(I) binding site. Cu(I) is in green, S is yellow, and C is gray. The top figure
shows the two-coordinate model, followed by the three-coordinate model and
the four-coordinate model at the bottom.













The structures were optimized using the B3LYP density functional combined with

the Ryde double-zeta (DZpdf) basis set for Cu(I)9 and the 6-311++G** split-valence

basis set for all other atoms using the six Cartesian-type d-orbitals.

Table 3-1. Geometry parameters of the DFT-optimized multi-coordinate models shown in
Figure 3-2.
Model     Cu-S (Å)   Cu-S (Å)   Cu-S (Å)   Cu-S (Å)   S-Cu-S (deg)   C-S-Cu (deg)
2-coord   2.23       2.23       5.25       6.00       180.0          101.8
3-coord   2.31       2.35       2.41       4.98       114.6          105.2
4-coord   2.19       2.19       2.19       2.24       109.5          108.9
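
The distances and angles in Table 3-1 are simple functions of the optimized Cartesian coordinates. The following is a minimal sketch of how such parameters can be measured; the coordinates are hypothetical (an ideal tetrahedron scaled to a 2.19 Å Cu-S distance), not the optimized geometries from this work, and the function names are illustrative.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points (same units as the input)."""
    return math.dist(a, b)

def angle_deg(a, center, b):
    """Angle a-center-b in degrees, e.g. S-Cu-S with Cu as the center."""
    v1 = [x - c for x, c in zip(a, center)]
    v2 = [x - c for x, c in zip(b, center)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

# Hypothetical coordinates: Cu at the origin and four S atoms along ideal
# tetrahedral directions, scaled so that each Cu-S distance is 2.19 Angstrom.
cu = (0.0, 0.0, 0.0)
scale = 2.19 / math.sqrt(3.0)
sulfurs = [
    (scale, scale, scale),
    (scale, -scale, -scale),
    (-scale, scale, -scale),
    (-scale, -scale, scale),
]

cu_s = [distance(cu, s) for s in sulfurs]          # four Cu-S distances
s_cu_s = angle_deg(sulfurs[0], cu, sulfurs[1])     # one S-Cu-S angle
print([round(d, 2) for d in cu_s], round(s_cu_s, 1))
```

For this idealized geometry every S-Cu-S angle equals the tetrahedral angle, which is the value the four-coordinate model approaches in Table 3-1.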


Upon optimization, the structures were used in single point calculations to

determine atomic charges, thermodynamic properties, and bond force constants. Quantum

charges were determined using the Merz-Kollman-Singh17,18 method. For these

calculations, the Cu(I) ionic radius was set at 0.91 Å. Thermodynamic properties of the

model clusters were obtained at the same level of theory as the geometry optimizations

were performed. Finally, bond force constants were determined using the z-matrix form

of the optimized geometry as the input.

The geometrical parameters, bond force constants, and MKS charge data were all

used in conjunction to create a molecular mechanics (MM) force field for the Cu(I)

binding site of HAH1.

Creation of a MM Force Field for the Cu(I) Binding Site of HAH1

The use of the HAH1 dimer as a model for the HAH1-MNK4 heterodimer

The X-ray structure of HAH1 shows that the protein crystallizes as a homodimer,1

but NMR structures show that HAH1 exists as a monomer in solution.2 The target

protein, domain 4 of MNK, interacts with the HAH1 monomer in the cell, forming a

heterodimer during Cu(I) transfer. Unfortunately, no structures exist (NMR or X-ray) for








the HAH1-MNK4 dimer or between HAH1 and any other MNK domain. Such a structure

would be useful in MM and QM studies on Cu(I) transfer. Instead, an acceptable

homolog to the HAH1-MNK4 heterodimer must be employed in such studies. Arnesano

and coworkers performed a docking study of HAH1 to MNK4 to investigate the

interactions between the metal binding domains and the protein-protein interface of the

donor and target proteins.54 Docking of the yeast antioxidant protein ATX1 to its target

Ccc2 was performed in order to map the protein-protein interactions that facilitate Cu(I)

transfer between the two proteins. Superimposition of the docked yeast heterodimer onto

the crystal structure of the HAH1 dimer revealed that the two structures could be

considered "remarkably similar".54 Larin and coworkers performed a manual docking of

HAH1 to MNK4.48 This study was performed prior to the elucidation of the NMR or

X-ray structures of HAH1 and the HAH1 dimer, although the X-ray structure of MNK4

had already been resolved.41 In the Larin study, the homology between MNK4 and HAH1

was known, so one MNK4 domain was computationally adapted to model HAH1 in a

computationally docked HAH1-MNK4 heterodimer. The information provided by these

two studies suggests that the use of the HAH1 homodimer as a computational mimic for

the HAH1-MNK4 heterodimer is a valid approximation.

Creation of parameter files for Cu(I)-bound HAH1

Cu(I) and S bound to Cu(I) are not defined atom types in the current version of the

AMBER suite.6 AMBER 6 and AMBER 7 were used for the bulk of this work. In order

to perform molecular dynamics simulations on Cu(I)-bound HAH1 and MNK4, these

atom types must be defined in a format that AMBER can understand. This involves

creating a force field parameter file that includes all the pertinent information used in the

AMBER force field equation. For the purpose of this study, force field parameters








included atomic masses, two bond types (Cu-S and S-C) and their force

constants, several bond angle types (S-Cu-S, C-S-Cu, C-C-S, and H-S-Cu) and their

force constants, a host of torsion angles and torsion constants, and van der Waals radii for

Cu(I) and S. Initially, Cu-S bond lengths and bond force constants were implemented

directly from the QM calculations on the model systems. C-S bond parameters in the

metal-binding site were taken from the 1994 force field as were bond angles, angle force

constants, and torsion parameters. Cu-S bonding parameters were varied from the QM-

derived values after initial MD simulations with those parameters revealed that the bonds were

not strong enough to hold the desired binding site geometry. Once the geometry of the

binding site was sufficiently described, the atomic charges were added. Cu(I) binding shifts the

charges of the adjacent atoms away from their standard values in AMBER. Using the

MKS charges output by the Gaussian calculation mentioned earlier, the antechamber

package in AMBER was used to generate the restrained electrostatic potential (RESP) charges for

use in the AMBER input package LeAP. Charges on bound Cys residues were modified

from typical AMBER charges for the HAH1 system to compensate for Cu(I) binding in

order to ensure integral charge of the system.

Cu(I)-binding residues were specified as the CYM residue type in AMBER. This

form denotes a ten-atom negatively charged cysteine ligand. Conversely, unbound

cysteines or other cysteines elsewhere in the protein are defined using the CYS residue

type. The CYM side chain is defined as -CβH2S- while the CYS side chain is -CβH2SH.

Atom types for Cu(I) and copper-bound S atoms also had to be defined. Copper was

defined as atom PP. The four S atoms were identified as SA, SB, SC, and SD.








Tables 3-2 through 3-4 list the parameters defined in the AMBER force field file

for the HAH1 Cu(I) binding site. Table 3-2 gives atomic mass and van der Waals

parameters for S and Cu(I). Table 3-3 shows bond lengths, bond angles, and force

constants for each, and Table 3-4 lists the RESP charges used for CYM ligands and

Cu(I). All Cu(I)-bound S atoms have been kept equivalent, as have bound and unbound

residues in terms of atom types and charges. Cu-S bond force constant values in Table 3-

3 were increased by a factor of five over the force constants generated by the Gaussian 03

calculations performed on the model systems. This adjustment was implemented after the

initial force constant parameters were found not to be strong enough to keep the Cu(I)

ion bound within the active site. CT-S-Cu angle force constant parameters were based on

the CT-S-H parameters for normal Cys from the parm94 force field. The parameters

described here closely match those determined in a similar study for Cu(II) bound to Met

and His by Comba and Remenyi.55 Figure 3-3 shows the active site region of the high

resolution crystal structure of Cu(I)-bound HAH1. This site was reproduced in AMBER

for MM simulations of HAH1. Cu(I) and the ligating Cys residues from HAH1 are

shown. Cu(I) is in green, and the Cys residues are shown in stick form pointing toward

Cu(I) creating a nearly tetrahedral binding environment. The two top Cys residues,

Cys 12A and Cys 12B, are more solvent-exposed. Cys 15A and Cys 15B reside close

to the monomer interface and have less contact with solvent. The total charges of the

different entire-protein models are: unbound=0; two-coordinate=-1; three-coordinate=-2;

and four-coordinate=-3. The holo-HAH1 dimer comprises between 2059 and 2061 atoms

depending on the number of coordinating CYM ligands, and includes C- and N- terminal

caps on each of the monomers present in the structure.








































Figure 3-3. The 1.80 Å crystal structure for Cu(I)-bound HAH1. PDB ID 1FEE.1

Once the force fields are fully described and the RESP charges are in place, the

protein structure is ready to be minimized. The minimization and equilibration process

takes several steps. First, the protein is described in LeAP and checked to ensure that all

the bonds, angles, torsions and nonbonding parameters are fully described. Counterions

(Na+ ions) can be added to adjust the overall charge of the system to zero. An initial gas-

phase minimization is performed on the protein. Then, the temperature of the system is

gradually raised to 300 K over a series of MD runs. Once the temperature of the system

reaches 300 K, it is subjected to a long MD run allowing the system to reach equilibrium.
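
Two bookkeeping checks are implied by the setup described above: the partial charges must sum to an integer, and Na+ counterions must offset that integer. The following is a minimal sketch of both checks, not the LeAP procedure itself. The model charges are the totals quoted in the text; the three fragment charges are hypothetical and chosen only to illustrate the test.

```python
def integral_charge(partial_charges, tolerance=1e-3):
    """Return the total charge as an integer, or raise if it is not integral."""
    total = sum(partial_charges)
    nearest = round(total)
    if abs(total - nearest) > tolerance:
        raise ValueError(f"non-integral total charge: {total:.4f}")
    return nearest

def sodium_ions_needed(total_charge):
    """Number of Na+ ions that bring a non-positive total charge to zero."""
    if total_charge > 0:
        raise ValueError("Na+ cannot neutralize a positively charged system")
    return -total_charge

# Total charges of the entire-protein models as stated in the text.
MODEL_CHARGES = {
    "unbound": 0,
    "two-coordinate": -1,
    "three-coordinate": -2,
    "four-coordinate": -3,
}
counterions = {name: sodium_ions_needed(q) for name, q in MODEL_CHARGES.items()}

# Hypothetical partial charges for a small fragment, chosen to sum to -1.
fragment = [-0.80, 0.10, -0.30]
fragment_charge = integral_charge(fragment)
print(counterions, fragment_charge)
```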









Table 3-2. Atom type, atomic mass, van der Waals radii, and van der Waals well-depths
for Cu(I) and Cu(I)-bound S in HAH1.
Atom          Atom type   Mass (au)   van der Waals radius (Å)   Well-depth (ε, kcal/mol)
2-coordinate structure
Cu            PP          63.55       2.50                       0.20
S (Cys 12A)   SA          32.06       2.00                       0.25
S (Cys 15A)   SB          32.06       2.00                       0.25
3-coordinate structure
Cu            PP          63.55       2.50                       0.20
S (Cys 12A)   SA          32.06       2.00                       0.25
S (Cys 15A)   SB          32.06       2.00                       0.25
S (Cys 12B)   SC          32.06       2.00                       0.25
4-coordinate structure
Cu            PP          63.55       2.50                       0.20
S (Cys 12A)   SA          32.06       2.00                       0.25
S (Cys 15A)   SB          32.06       2.00                       0.25
S (Cys 12B)   SC          32.06       2.00                       0.25
S (Cys 15B)   SD          32.06       2.00                       0.25


Table 3-3. Bond lengths, bond angles, and associated force constants for the HAH1 Cu(I)
binding site.
Bond      kbond (kcal/mol-Å2)    r0 (Å)
Cu-S      60.00                  2.19

Angle     kθ (kcal/mol-rad2)     θ0 (deg)
S-Cu-S    50.00                  109.5
C-S-Cu    93.98                  95.91

Table 3-4. CYM and Cu(I) RESP charges used for the HAH1 Cu(I) binding site.
                                            RESP charge in CYM
Atom   Atom type            CYS charge   2-coordinate   3-coordinate   4-coordinate
N      N                    -0.4157      -0.4408        -0.4157        -0.3630
H      H                     0.2719       0.2468         0.2719         0.2520
Cα     CT                    0.0213      -0.1000        -0.0351         0.0350
Hα     H1                    0.1124       0.0257         0.0508         0.0480
Cβ     CT                   -0.1231      -0.0646         0.0168        -0.5720
Hβ     H1                    0.1112       0.0445         0.0053         0.2440
Sγ     SA, SB, SC, or SD    -0.3119      -0.8290        -0.8682        -1.0920
C      C                     0.5973       0.6016         0.5973         0.6160
O      O                    -0.5679      -0.5636        -0.5679        -0.5040
Hγ     HS                    0.1933       n/a            n/a            n/a
Cu     PP                    n/a          0.5922         0.6484         1.3670











The equilibrated system can be used directly for gas-phase MD simulations. In

order to perform MD simulation in solvent, a few more steps must be taken. The gas-

phase minimized and equilibrated structure is again loaded into the LeAP program, where

an explicit solvent box is added. In this case, an 8.0 Å TIP3P water box was imposed

around the protein, increasing the total number of atoms in the system to over 19,000.

The solvated system was subjected to a series of relaxation runs similar to the gas-

phase system. Incrementally smaller constraints were placed on the protein part of the

solvated system as the system's temperature was increased to 300 K. Eventually, an

equilibrated, solvated protein system was obtained. At this point, long timescale MD was

performed. Each of the five multi-coordinate solvated protein model systems was

subjected to the minimization and equilibration scheme just described. The solvated

systems were ultimately simulated over a timescale of at least 3.6 ns.

Table 3-5 compares the AMBER-equilibrated HAH1 Cu(I) binding sites of the

four-coordinate protein model to its model QM cluster counterpart minimized in

Gaussian and to the X-ray crystal structure of Cu(I)-bound HAH1, and Table 3-6 lists

other key data taken from the long timescale MD simulations. Although the geometries

are not exactly reproduced by the MM force field parameters created for the HAH1 active

site, the shape of the active site and the local protein environment are well reproduced. Figure 3-4

shows the root-mean-square deviation from the crystal structure for the entire protein

sequence for each of the five solvated model systems, and Figure 3-5 displays the rmsd

values for the active site loop regions of each protein model.








The figures below show that the active site regions of all five models reached

equilibrium rapidly after about 400 ps. And while the entire proteins are in good

agreement with the crystal structure throughout, the complete protein structures did take

longer to reach equilibrium. For the whole-protein models, rmsd values of between 2.0 Å

and 2.5 Å were achieved by 2,500 ps and were maintained beyond that point in the

simulations. The radius of gyration (radgyr) values are within the expected range, and reveal that the most

highly mobile residues are ones near the termini of the monomers.
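
For reference, the two structural measures discussed above can be computed from coordinates as in the following minimal sketch. It assumes the frame has already been superimposed on the reference structure (no fitting step is performed), and the coordinates are hypothetical rather than taken from the HAH1 trajectories.

```python
import math

def rmsd(coords, reference):
    """Root-mean-square deviation between two equally sized coordinate sets."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords, reference))
    return math.sqrt(squared / len(coords))

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; unit masses are used if none are given."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3)]
    squared = sum(m * math.dist(c, center) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(squared / total)

# Hypothetical three-atom reference and a frame displaced by 2 Angstrom along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame = [(x, y, z + 2.0) for x, y, z in reference]

frame_rmsd = rmsd(frame, reference)
pair_rg = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
print(frame_rmsd, pair_rg)
```

In practice the rmsd is evaluated after a least-squares fit of each frame onto the reference, so a rigid displacement like the one in this toy example would be removed before the deviation is measured.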

Table 3-5. Comparison of the HAH1 active site between the four-coordinate model, the
solvated, equilibrated protein, and the X-ray crystal structure of the Cu(I)-
bound protein (1FEE).
Parameter            QM Model     Protein      X-ray
Cu-S (Cys 12A)       2.19 Å       2.29 Å       2.30 Å
Cu-S (Cys 15A)       2.19 Å       2.14 Å       2.39 Å
Cu-S (Cys 12B)       2.19 Å       2.33 Å       2.30 Å
Cu-S (Cys 15B)       2.24 Å       2.39 Å       2.32 Å
Cys12A-Cu-Cys15A     109.0 deg    117.5 deg    115.7 deg
Cys12A-Cu-Cys12B     109.5 deg    112.5 deg    109.4 deg


Table 3-6. Summary of rms deviation, rms flexibility, and radius of gyration for the five
solvated HAH1 protein models and key active site residues.
12B bound 15B bound
                2-coord   2-coord (cis)   3-coord (B)   3-coord (A)   4-coord
RMSD (Å)
Total           2.69      2.67            2.27          2.10          2.04
Backbone        1.89      1.88            1.30          1.34          1.17
Binding loop    0.89      1.29            0.87          0.94          1.31
Bind. lp.       0.34      0.71            0.37          0.38          0.54
Radgyr (Å)
Protein avg.    29.38     27.26           29.35         29.30         29.43
RMSF (Å)
Total           4.98      4.85            3.60          4.54          5.46
Backbone        4.80      4.70            3.46          4.37          5.23
Cu              5.11      2.66            2.19          4.19          3.94
Cys 12A         5.28      3.83            2.40          4.95          5.08
Cys 15A         4.10      2.66            1.78          4.38          3.45
Cys 12B         6.14      3.62            3.44          3.64          4.68
Cys 15B         4.32      2.15            1.88          2.60          3.21









The rmsf values listed in Table 3-6 reveal details about the flexibility of certain residues

as well as the complete protein and the protein backbone for the five models. The small

difference between rmsf values for the complete protein compared to the backbone

suggests that the flexibility of the protein is not limited to the side chains and that the

backbone also moves freely. From the rmsf data for the Cu(I)-binding residues Cys 12A,

Cys 15A, Cys 12B, and Cys 15B, it appears that Cys 12A and Cys 12B have comparable

magnitudes in each model. The values for Cys 15A and Cys 15B are also similar for each

model. The similarity is derived from the location of these residues on the binding loop.

Cys 12A and Cys 12B are more solvent-exposed and move more freely due to solvent

interactions and being further away from the monomer interface. On the other hand, Cys

15A and Cys 15B show less flexibility as they are close to the interface region and not

generally solvent-exposed. The flexibility of the Cys 12 residues may play a role in Cu(I)

transfer between binding domains. The rmsf data for Cu(I) show that Cu(I) is least mobile

when bound by only three residues. In the four-coordinate model, Cu(I) is more flexible.

This may be a result of the Cu(I) moving around within the binding site as different

binding ligands move in and out of proximity to the ion. Perhaps Cu(I) maintains a three-

coordinate state even in the four-coordinate model, but complexes different residues over

time.
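
The rmsf values discussed in this paragraph measure how far each atom strays from its own time-averaged position. A minimal sketch of that calculation follows; the two-atom trajectory is hypothetical and the function name is illustrative, so this is not the analysis script used for Table 3-6.

```python
import math

def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation about the mean position.

    trajectory is a list of frames; each frame is a list of (x, y, z) tuples,
    one per atom, with the same atom order in every frame.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        msf = sum(math.dist(frame[i], mean) ** 2 for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result

# Hypothetical trajectory: atom 0 is fixed, atom 1 hops between x = -1 and x = +1.
trajectory = [
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
fluctuations = rmsf(trajectory)
print(fluctuations)
```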

The results from the MD simulations of the three Cu(I)-bound HAH1 dimer models

show that the QM-derived parameters used to construct the MM force field adequately

described the system. The rmsd data show that the computationally generated structures

maintain the same fold and Cu(I) binding affinity as the protein in vivo. After the MD

simulations were completed, the question of deciphering the order of Cu(I) binding and














[Figure 3-4 plots: rmsd versus Simulation Time (ps), 0 to 3600 ps, one panel per model (recoverable panel titles: 4Coord, 2Coord bridge), with whole-protein and backbone traces.]


Figure 3-4. Root-mean-squared deviations between the five solvated HAH1 models and

the Cu(I)-bound HAH1 crystal structure as a function of time.














[Figure 3-5 plots: rmsd versus Simulation Time (ps), 0 to 3600 ps, one panel per model (recoverable panel title: 3Coord B), with binding-loop and loop-backbone traces.]


Figure 3-5. rmsd between the active site loop regions of the five solvated HAH1 models

and the Cu(I)-bound HAH1 crystal structure as a function of time.











release during Cu(I) transfer remained to be answered. As the next section describes, the

thermodynamic details of Cu(I) transfer were elucidated through a series of free-energy

calculations using several different methods in AMBER.

Free Energy Calculations of the Cu(I)-Bound HAH1 Dimer

Since the HAH1 Cu(I) binding motif has been described using QM and MM

methods, the data garnered from those studies can be used to estimate the

thermodynamics of Cu(I) transport so that more can be understood about the mechanism

through which Cu(I) is transferred between the active sites of two metalloproteins. This

study focused on proposing the order of Cu(I) binding and release based on an

energetically favorable Cu(I) transfer pathway. As mentioned, the currently proposed

mechanism suggests that Cu(I) is handed off from the donor to the target via a series of

multi-coordinate Cu(I) intermediates during which the coordination number of the Cu ion

is no lower than two.1,37 The experiments described here attempt to establish the order of

Cu(I) binding to the target domain and the order of Cu(I) release from the active site of

the donor. Also, this work suggests that a potential four-coordinate Cu(I) intermediate

during ion transfer is energetically unfavorable.

In order to address these issues, the three Cu(I) cluster models were subjected to

several calculations in Gaussian 03 so that free energy differences between the three

models could be ascertained. In another experiment, the MM parameters defined for the

system were used again in three kinds of free energy calculations using the AMBER

suite. The three complete, solvated protein systems were subjected to thermodynamic

integration calculations using single topology mutations, free energy perturbation

calculations, and potential of mean force calculations.
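
As a rough illustration of what a thermodynamic integration calculation produces, the sketch below integrates window-averaged dV/dλ values over λ with the trapezoid rule. The quadrature scheme, the evenly spaced λ values, and the linear dV/dλ data are all assumptions made for the example; they are not the sander settings or results of this work, and twelve windows is used only as an illustrative count.

```python
def ti_free_energy(lambdas, dvdl_means):
    """Trapezoid-rule estimate of the integral of <dV/dlambda> over lambda."""
    if len(lambdas) != len(dvdl_means) or len(lambdas) < 2:
        raise ValueError("need matching lists with at least two windows")
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        total += 0.5 * width * (dvdl_means[i] + dvdl_means[i + 1])
    return total

# Hypothetical data: twelve evenly spaced windows from lambda = 0 to 1 and a
# made-up linear profile <dV/dlambda> = 10 - 4*lambda (kcal/mol).
n_windows = 12
lambdas = [i / (n_windows - 1) for i in range(n_windows)]
dvdl = [10.0 - 4.0 * lam for lam in lambdas]

delta_g = ti_free_energy(lambdas, dvdl)
print(round(delta_g, 6))
```

Because the made-up profile is linear, the trapezoid rule recovers its exact integral; real dV/dλ curves are usually not linear, which is why the number and placement of windows matters.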









Quantum Energy Calculations on the Small Cu(I) Thiolate Cluster Models

One difficulty in performing free energy calculations on the model system was the

difference in number of atoms between different Cu(I) coordination states. For instance, a

two-coordinate Cu(I) model cluster ( Cu(I)[HSCH3]2[SCH3-]2-1 ) comprises twenty-three

atoms, while the three-coordinate model ( Cu(I)[HSCH3][SCH3-]3-2 ) contains twenty-

two atoms. Because of the difference in the number of atoms, a direct free energy

comparison between the two models cannot be made. However, the models could be

compared by simulating an isodesmic reaction for protonation of a

water molecule. An isodesmic reaction for the conversion of the two-coordinate model to

the four-coordinate model with the transfer of two protons from the unbound methylthiols

from the two-coordinate structure to two water molecules is shown below.

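\[
\mathrm{Cu(I)[HSCH_3]_2[SCH_3^-]_2^{-1}} + 2\,\mathrm{H_2O}
\;\rightleftharpoons\; \mathrm{Cu(I)[HSCH_3][SCH_3^-]_3^{-2}} + \mathrm{H_2O} + \mathrm{H_3O^+}
\;\rightleftharpoons\; \mathrm{Cu(I)[SCH_3^-]_4^{-3}} + 2\,\mathrm{H_3O^+}
\]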
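
The point of the isodesmic scheme is that every stage contains the same atoms and the same net charge, so energies can be compared directly. The sketch below checks that bookkeeping using the cluster formulas quoted in the text. The helper names are illustrative, and the atom counts per ligand follow from the chemical formulas (HSCH3 has six atoms, SCH3- has five).

```python
# Atom counts and formal charges per species, derived from chemical formulas.
SPECIES = {
    "HSCH3": (6, 0),    # methylthiol
    "SCH3-": (5, -1),   # methylthiolate
    "H2O": (3, 0),
    "H3O+": (4, 1),
}

def cluster(n_thiol, n_thiolate):
    """(atoms, charge) of a Cu(I) cluster with the given ligand counts."""
    atoms = 1 + n_thiol * SPECIES["HSCH3"][0] + n_thiolate * SPECIES["SCH3-"][0]
    charge = 1 + n_thiolate * SPECIES["SCH3-"][1]
    return atoms, charge

def stage(cluster_counts, n_water, n_hydronium):
    """(atoms, charge) for one side of the isodesmic reaction."""
    atoms, charge = cluster(*cluster_counts)
    atoms += n_water * SPECIES["H2O"][0] + n_hydronium * SPECIES["H3O+"][0]
    charge += n_hydronium * SPECIES["H3O+"][1]
    return atoms, charge

two_coord = cluster(2, 2)      # Cu(I)[HSCH3]2[SCH3-]2
three_coord = cluster(1, 3)    # Cu(I)[HSCH3][SCH3-]3
four_coord = cluster(0, 4)     # Cu(I)[SCH3-]4

stages = [stage((2, 2), 2, 0), stage((1, 3), 1, 1), stage((0, 4), 0, 2)]
print(two_coord, three_coord, four_coord, stages)
```

The clusters alone differ in atom count and charge, but each full stage, including the water and hydronium partners, carries the same totals.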

Single point energies were calculated for the gas-phase-optimized structures in both

the gas phase and the implicitly solvated phase. In this manner, the ΔEsolvation could be

calculated for each species in the reaction as well. The solvation correction to the gas-

phase energy is important, especially for the charged species. Figure 3-6 shows the reaction for

Cu(I) becoming four-coordinate from the three-coordinate state and the energy

differences between products and reactants as well as ΔEsolvation for each molecule.

Figure 3-7 shows the reaction profile of changing the coordination of Cu(I) in the

model systems. As expected, the addition of implicit solvent around the model mitigates

the instability of the charged molecules in the gas phase. This is shown by the highly

charged four-coordinate system having a much more favorable ΔEsolv than the two-coordinate









system. The figure illustrates the notion that the four-coordinate state is not energetically

favorable, as it is 65.440 kcal/mol higher in energy than the three-coordinate model in the

solvated state. Table 3-7 lists the energy differences between the different models in both

the gas and solvent phases. The energy difference between the two-coordinate and three-

coordinate models is lower by comparison at 48.531 kcal/mol. It should be noted that

while two different three-coordinate structures can exist in the protein system, they are

indistinguishable in the QM model. So while this experiment showed some relative

energies between the three- and four-coordinate models, full protein simulations were

needed to compare the energies of the two different three-coordinate states to each other.

[Scheme: three-coordinate Cu(I) model + H2O going to four-coordinate Cu(I) model + H3O+, drawn in the gas phase (ΔErxn,gas) and in implicit solvent (ΔErxn,solv).]

Figure 3-6. The isodesmic reaction and solvation of three-coordinate Cu(I) to four-
coordinate in the model system.










Table 3-7. Reaction energies and energies of solvation for the model systems.
                                 Gas       Solvent
ΔErxn 3coord to 4coord           581       65.4
ΔErxn 2coord to 3coord           244       48.5
ΔEsolv 4coord + 2H3O+            -770
ΔEsolv 3coord + H2O + H3O+       -254
ΔEsolv 2coord + 2H2O             -59.1
Values in kcal/mol
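
The entries in Table 3-7 are linked by a simple thermodynamic cycle: the solvent-phase reaction energy equals the gas-phase value plus the solvation energy of the products minus that of the reactants. The sketch below applies that relation to the tabulated numbers as a consistency check. It assumes the ΔEsolv rows are sums over all species on one side of the reaction, which is how the row labels read.

```python
# Values from Table 3-7, in kcal/mol.
DE_RXN_GAS = {"2to3": 244.0, "3to4": 581.0}
DE_SOLV = {
    "2coord + 2H2O": -59.1,
    "3coord + H2O + H3O+": -254.0,
    "4coord + 2H3O+": -770.0,
}

def solvated_reaction_energy(de_gas, de_solv_products, de_solv_reactants):
    """Thermodynamic cycle: gas-phase energy corrected by solvation energies."""
    return de_gas + de_solv_products - de_solv_reactants

de_2to3 = solvated_reaction_energy(
    DE_RXN_GAS["2to3"], DE_SOLV["3coord + H2O + H3O+"], DE_SOLV["2coord + 2H2O"]
)
de_3to4 = solvated_reaction_energy(
    DE_RXN_GAS["3to4"], DE_SOLV["4coord + 2H3O+"], DE_SOLV["3coord + H2O + H3O+"]
)
print(round(de_2to3, 1), round(de_3to4, 1))
```

The cycle gives about 49.1 and 65.0 kcal/mol, within a kcal/mol of the tabulated solvent-phase values of 48.5 and 65.4; the small gaps are presumably due to rounding of the three-significant-figure entries.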

[Plot: E gas and E solv for the species 2coord + 2 H2O, 3coord + H2O + H3O+, and 4coord + 2 H3O+; x-axis: Reaction.]


Figure 3-7. The relative energies of the species in the isodesmic reactions of the model
systems in the gas phase (top) and in implicit solvent.



Free Energy Calculations on HAH1

Having developed the force field for the Cu(I) binding active site of HAH1, the

challenge in setting up the free energy calculations lay in choosing which free energy

method to apply and ensuring that the number of atoms in each simulation was the same

for each different Cu(I) coordination state. The thermodynamic integration functionality

of the sander program in AMBER 8 was chosen. The purpose of this study was to

identify an energetically favorable route for Cu(I) transfer between two three-coordinate

high affinity Cu(I) binding sites. From the two-coordinate state, there are two possibilities











for Cu(I) transfer. Earlier, the difference between the two Cys residues in the Cu(I) binding site

was discussed. Cys 12 is more solvent-exposed in the dimer state, while Cys 15 is along

the interface between the donor and target Cu(I) binding domains and is not solvent-

exposed. One possibility for Cu(I) transfer is for Cys 12 of the target to bind to the

incoming Cu(I) ion first. The other option is for Cys 15 of the target to bind Cu(I) as the

two domains come into close proximity.

The solvated, MD-equilibrated three-coordinate HAH1 dimer was initially used to

create a starting point for the free energy calculations. This active site featured three

Cu(I)-S bonds of approximately 2.3 Å along with a fourth, longer-distance Cu(I)---SH

interaction to an unbound Cys residue at nearly 5.0 Å from the Cu(I) ion. In the initial

state, both active site Cys residues of the donor domain bound Cu(I) while only one target

Cys bound the Cu(I) ion. This structure was mutated in a stepwise fashion to the final

structure which used both target Cys residues to bind Cu(I) while only one donor Cys

continued to coordinate the metal ion. Figure 3-8 shows the proposed scheme for Cu(I)

transfer. State 0 is the initial state and State 1 is the final state. Using a simple model

system, such as that in the quantum work described earlier, there is no difference between

State 0 and State 1 as shown below. However, when the entire solvated protein is

simulated, conformational and solvation differences between the two states result in a

free energy change as Cu(I) is transferred between the two active sites.

Although a simulation of the scheme in Figure 3-8 would yield a reliable value for

the free energy difference between the two states, such a method is not currently

applicable. The dual-topology free energy perturbation method was implemented in the

gibbs program within AMBER. This method allows for the existence of dummy atoms in









[Scheme: State 0 (left) and State 1 (right) of the Cu(I) binding site, with ligands Cys 12A, Cys 15A, Cys 12B, and Cys 15B.]

Figure 3-8. A scheme for Cu(I) transfer.

both the initial and final states of the simulation. A substantial attempt was made to

perform a dual-topology calculation on the scheme presented above, but the simulation

failed.

Since dual-topology could not be used to obtain the desired information, a single-

topology approach was undertaken. Unlike a dual-topology approach, TI is not a

continuous blending of initial and final reaction states as the simulation proceeds. Instead,

TI calculations blend the reactant and product states in a series of discrete windows.

These experiments used twelve windows to describe the reaction from start to finish. The

first window would simulate a model whose character was very similar to the reactant,

the sixth window would simulate a species whose character was nearly an equal blend of

the force fields of the product and reactant, and the final window would simulate a

system nearly identical to the product. Using TI in sander, it is possible to define an

initial system with dummy atoms and mutate it to a structure without dummy atoms. In

order to compare two different pathways for Cu(I) transfer, this experiment combined a

set of two TI calculations. One calculation took the State 0 structure shown above and

mutated it into a four-coordinate structure as defined in the previous section. This









simulation represented initial binding by Cys 15 of the target monomer. A similar

simulation mutated State 1 from Figure 3-8 into the same four-coordinate structure. This

represented initial binding by Cys 12 of the target domain. Since the two TI simulations

shared the same endpoint, the energies of the two reactions could be compared. Figure 3-

9 depicts the two reaction paths followed in the TI simulations.
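The windowed TI estimate described above can be sketched as a weighted sum of per-window averages of dV/dλ. This is a minimal illustration only: the evenly spaced midpoint windows, the function names, and the dV/dλ values are assumptions, and the λ values and integration scheme used by sander in this work are not stated in the text.

```python
import numpy as np

def midpoint_windows(n_windows):
    """Return lambda values and weights for n equal-width midpoint windows (assumed scheme)."""
    lambdas = (np.arange(n_windows) + 0.5) / n_windows
    weights = np.full(n_windows, 1.0 / n_windows)
    return lambdas, weights

def ti_free_energy(weights, dvdl_means):
    """Approximate the integral of <dV/dlambda> over [0, 1] as a weighted sum."""
    weights = np.asarray(weights, dtype=float)
    dvdl_means = np.asarray(dvdl_means, dtype=float)
    if weights.shape != dvdl_means.shape:
        raise ValueError("need one <dV/dlambda> average per window")
    return float(np.dot(weights, dvdl_means))

# Twelve windows: the first lies near the reactant, the sixth near an equal blend,
# and the last near the product.
lambdas, weights = midpoint_windows(12)

# Hypothetical per-window averages (kcal/mol), for illustration only.
dvdl_means = 2.0 * lambdas
delta_g = ti_free_energy(weights, dvdl_means)
```

With this choice the sixth window sits at λ ≈ 0.46, which matches the description of a nearly equal blend of reactant and product force fields.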

[Scheme: Model A (top) and Model B (bottom), each a three-coordinate Cu(I) site mutated to an equivalent four-coordinate endpoint; Du marks dummy atoms.]

Figure 3-9. Two separate TI simulations were performed to compare two different Cu(I)
transfer pathways from HAH1 to MNK4. The two reactions shown have
different starting points, but the same endpoint. The top reaction is referred to
as "Reaction 1" and the bottom is "Reaction 2" in the discussion that follows.

Each state contains the same number of atoms, but some atoms may have different

atom types. Dummy atoms are used as placeholders for the hydrogens that "appear"

and "disappear" during the simulation. Dummy atoms have the same mass as hydrogen









atoms, but no charge. In each reaction, Cu(I) mutates from a three-coordinate state to a

four-coordinate state, so its atom type must be changed. Initially, Cu(I) is defined as atom

type PP, with bonds to S12A (atom type SA), S15A (SB), and S15B (SD). A weak interaction

is defined between PP and S12B (SC). As the simulation progresses, the ion is mutated to

type PQ which is bonded to SA, SB, SD, and S12B (SF). Other atoms that change atom

type are the sulfur atoms of Cys 12B and Cys 15B. In the top reaction of Figure 3-9,

sulfur 12B is assigned atom type SC. It is bound to Cp from Cys 12B and a hydrogen of

atom type HS (the default AMBER atom type for hydrogen bonded to S in Cys). SC also

shares a weak interaction with Cu(I). The same atom is changed during the simulation to

type SF which is bound to a dummy atom, Cp of Cys 12B, and Cu(I). The HS atom in the

reactant becomes a dummy atom bound to sulfur SF in the product. SA, SB, and SD do

not change atom type in the top reaction and are all considered to be equivalent. Bond

angles between SF-PQ-SX (X=A,B, or D) in the product are included in the SF and PQ

force fields.

The mutation scheme in the lower reaction is similar. In reaction two of Figure 3-9,

the metal ion begins as atom type PP, and is bound to S12A (SA), S15A (SB), and S12B (SC)

and shares a weak interaction with S15B (SD). Sulfur of Cys 15B is initially identified as

atom type SD which is bound to a hydrogen (HS), Cp from Cys 15B, and has a weak

interaction with the metal center. SD is mutated into atom type SE, which bonds a

dummy atom and Cu(I) which is labeled as atom type PQ in the product of reaction 2. PQ

in the lower reaction binds SA, SB, SC, and SE. In this reaction, SA, SB, and SC are

treated equivalently and are not mutated in any manner.









Even though no bond exists between Cys 12B and Cu(I) at the start of the top reaction in

Figure 3-9, a bond must still be defined, because bonds can be neither created nor

destroyed in an MD simulation. In essence, the 5.0 Å Cu(I)-S12B bond with a

very weak arbitrary force constant in the reactant is being mutated into a 2.3 Å bond in

the product with a well defined force constant. The same thing occurs over the course of

the simulation of the lower reaction for the Cu(I)-S15B bond. This treatment allows one

bond to "form" with the target over the course of the simulation even though no actual

bond exists in the initial state.
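The way a weak placeholder bond turns into a real bond can be sketched with the harmonic bond term of the AMBER force field, E = k(r - r0)^2, evaluated at intermediate λ. The parameters below are the PP-SC and PQ-SX entries of Table 3-9; the linear mixing rule and the function names are assumptions made for illustration and are not taken from the sander input used in this work.

```python
def bond_energy(r, k, r0):
    """AMBER-style harmonic bond energy, E = k * (r - r0)**2, in kcal/mol."""
    return k * (r - r0) ** 2

def mixed_bond_energy(r, lam, reactant=(0.001, 5.000), product=(60.000, 2.190)):
    """Blend the reactant and product bond terms linearly in lambda (assumed mixing rule).

    reactant: (k, r0) of the weak PP-SC placeholder bond from Table 3-9.
    product:  (k, r0) of the PQ-SX bond from Table 3-9.
    """
    v_reactant = bond_energy(r, *reactant)
    v_product = bond_energy(r, *product)
    return (1.0 - lam) * v_reactant + lam * v_product

# Energy of a 5.0 A Cu-S separation at the start, midpoint, and end of the mutation.
for lam in (0.0, 0.5, 1.0):
    print(lam, mixed_bond_energy(5.0, lam))
```

At λ = 0 a sulfur held 5.0 Å from Cu(I) costs nothing, while at larger λ the same separation is strongly penalized, which is what draws the ligand toward the metal as the simulation proceeds.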

Tables 3-8, 3-9, and 3-10 list the atom types and force field parameters specified

for the reactions shown in Figure 3-9. The parameters were derived from the three- and

four-coordinate solvated and equilibrated structures from the long timescale MD

simulations outlined above. Table 3-11 shows the atomic charges for the atoms in the two

reactions of Figure 3-9. The charges on the ligating Cys residues can be compared to the

default AMBER charges for Cys given in Table 3-4. For brevity, tables referring to atom

type SX mean that the parameter is the same for all sulfur atoms within the active site.

For example, SX-Cp refers to any bond between sulfur in the active site and Cp of the

ligating Cys residue. Likewise, in Table 3-10, in angles referring to SX-PP-SC, SX

includes any S in that active site that is not SC. Some bond parameters, such as those for

HS-SX and Cp-SX, were adapted from the parm94.dat library instead of from QM

calculations on the model clusters, as were some bond angle parameters such as Cp-S-HS.

The TI calculations were performed on both gas-phase and explicitly solvated

proteins. This was done in order to determine the solvation energy of the protein and any

differences in active site geometry that may be caused by solvent. There were some









differences between the solvated and gas-phase simulations. No periodic boundary

conditions were applied to the gas-phase system, which also was given a high cutoff (>

20 Å) for nonbonding interactions. The solvated system employed periodic boundary

conditions, and the nonbonding cutoff was kept at the default value of 8.0 Å. The

simulation of the solvated system was performed at constant pressure, while the

temperature scaling was set for constant energy dynamics. For the solvated system using

constant pressure dynamics, anisotropic pressure scaling was used in conjunction with the

TIP3P water box.


Table 3-8. Atoms, atom types, atomic masses, and van der Waals parameters used in TI
simulations of Cu(I)-bound HAH1.

                                              van der Waals
Atom                Atom type   Mass (amu)    Radius (Å)   Well depth (kcal/mol)
Reaction 1
Cu (reactant)       PP          63.55         2.40         0.05
Cu (product)        PQ          63.55         2.40         0.05
S12A                SA          32.06         2.00         0.25
S15A                SB          32.06         2.00         0.25
S12B (reactant)     SC          32.06         2.00         0.25
S12B (product)      SF          32.06         2.00         0.25
S15B                SD          32.06         2.00         0.25
HS 12B (reactant)   HS          1.008         0.60         0.015
HS 12B (product)    DU          1.00          0.00         0.00
Reaction 2
Cu (reactant)       PP          63.55         2.40         0.05
Cu (product)        PQ          63.55         2.40         0.05
S12A                SA          32.06         2.00         0.25
S15A                SB          32.06         2.00         0.25
S12B                SC          32.06         2.00         0.25
S15B (reactant)     SD          32.06         2.00         0.25
S15B (product)      SE          32.06         2.00         0.25
HS 15B (reactant)   HS          1.008         0.60         0.015
HS 15B (product)    DU          1.00          0.00         0.00










Table 3-9. Bond length parameters for the reactions used in TI calculations of Cu(I)-
bound HAH1.

Bond      k_bond (kcal/mol·Å²)   r0 (Å)
Reaction 1
PP-SA      60.000                2.190
PP-SB      60.000                2.190
PP-SC       0.001                5.000
PP-SD      60.000                2.190
HS-SC     274.000                1.336
Cp-SX     219.354                1.849
PQ-SX      60.000                2.190
DU-SF     274.000                1.336
Reaction 2
PP-SA      60.000                2.190
PP-SB      60.000                2.190
PP-SC      60.000                2.190
PP-SD       0.001                5.000
HS-SD     274.000                1.336
Cp-SX     219.354                1.849
PQ-SX      60.000                2.190
DU-SE     274.000                1.336

The TI calculations do not encompass all of the contributions to the free energy

change of the reaction. Because bonds cannot be broken or formed in MD simulations,

weak interactions were described in places where bonds would be forming over the

course of the simulations. This presents a problem in terms of how AMBER deals with

bonding and nonbonding interactions. In AMBER, when two atoms share a bond, they

are excluded from each other's nonbonding interactions. In our three-coordinate model,

Cu(I) is supposed to be bonded to only three Cys ligands. However, since the fourth S

had to be bonded to Cu with a weak interaction, its nonbonding interactions were being

neglected. In reality, no bond exists between Cu and a fourth Cys ligand. To compensate

for this, the topology files generated by LeAP had to be modified for the TI calculations

so that the weakly bound S would be removed from the nonbonding exclusions of Cu and









its neighboring atoms. This allowed for the metal ion and the other Cys ligands in the

active site to have nonbonding interactions with the unbound Cys.
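The topology edit described above amounts to deleting one atom from the nonbonded exclusion lists of the metal and its neighbors. The sketch below shows that bookkeeping on a plain dictionary of sets. The atom labels, the data layout, and the function name are hypothetical; this is not the AMBER topology file format, and it is not a LeAP or sander interface.

```python
def drop_exclusions(exclusions, atom, partners):
    """Return a copy of the exclusion map in which `atom` and each of `partners`
    no longer exclude one another from nonbonded interactions."""
    updated = {name: set(excluded) for name, excluded in exclusions.items()}
    for partner in partners:
        updated.setdefault(partner, set()).discard(atom)
        updated.setdefault(atom, set()).discard(partner)
    return updated

# Hypothetical exclusion map for the Reaction 1 reactant, in which the weak
# Cu-S12B placeholder bond has put S12B on the exclusion lists of Cu and the
# three bound sulfurs.
exclusions = {
    "CU":   {"S12A", "S15A", "S15B", "S12B"},
    "S12A": {"CU", "S15A", "S15B", "S12B"},
    "S15A": {"CU", "S12A", "S15B", "S12B"},
    "S15B": {"CU", "S12A", "S15A", "S12B"},
    "S12B": {"CU", "S12A", "S15A", "S15B"},
}

# Restore nonbonded interactions between the unbound S12B and the rest of the site.
patched = drop_exclusions(exclusions, "S12B", ["CU", "S12A", "S15A", "S15B"])
```

Returning a modified copy rather than editing in place keeps the original exclusion map available, which mirrors the later need for an exclusions-present description of the same structure.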

Table 3-10. Bond angle parameters for the reactions used in TI calculations of Cu(I)-
bound HAH1.

Angle       k_θ (kcal/mol·rad²)   θ0 (deg)
Reaction 1
SA-PP-SB     50.000               109.50
SA-PP-SD     50.000               109.50
SB-PP-SD     50.000               109.50
Cp-SX-PP     93.700               109.50
SX-PP-SC      0.001               109.50
HS-Cp-SC      0.001               109.50
Hp-Cp-SC    100.000               109.50
Cp-SC-PP      0.001               109.50
Cp-SC-HS     43.000                96.00
SX-PQ-SX     50.000               109.50
Cp-SX-PQ     93.700               109.50
Hp-Cp-SF    100.000               109.50
Ca-Cp-SX     50.000               109.50
Cp-SF-DU     43.000               109.50
PQ-SF-DU     50.000               109.50
Reaction 2
SA-PP-SB     50.000               109.50
SA-PP-SC     50.000               109.50
SB-PP-SC     50.000               109.50
Cp-SX-PP     93.700               109.50
SX-PP-SD      0.001               109.50
HS-Cp-SD      0.001               109.50
Hp-Cp-SD    100.000               109.50
Cp-SD-PP      0.001               109.50
Cp-SD-HS     43.000                96.00
SX-PQ-SX     50.000               109.50
Cp-SX-PQ     93.700               109.50
Hp-Cp-SE    100.000               109.50
Ca-Cp-SX     50.000               109.50
Cp-SE-DU     43.000               109.50
PQ-SE-DU     50.000               109.50


While this approach solved one problem, it created another. TI simulations can

only read in one topology file for each simulation. Therefore, once the fourth Cu(I)-S









bond formed by the end of the simulation, the atoms were still feeling the nonbonding

interactions of the fourth Cys. As a correction to the TI calculation, the effects of forming

that bond had to be determined. This was done using free energy perturbation. In separate

simulations, the products of the two reactions in Figure 3-9 were used as the starting

points for a FEP calculation. The perturbation would be the introduction of the vdW

exclusions that were removed in the TI calculation.

Table 3-11. RESP charges used for TI calculations on Cu(I)-bound HAH1.

                              RESP charge
                 Reaction 1               Reaction 2
Atom type   Reactant    Product      Reactant    Product
PP           0.6483     n/a           0.6483     n/a
PQ           n/a         1.3670       n/a         1.3670
SA          -0.8682     -1.0448      -0.8682     -1.0448
SB          -0.8682     -1.0448      -0.8682     -1.0448
SC          -0.8485     n/a          -0.8682     -1.0448
SF           n/a        -1.0448       n/a         n/a
SD          -0.8682     -1.0448      -0.8485     n/a
SE           n/a         n/a          n/a        -1.0448
HS           0.5470     n/a           0.5470     n/a
DU           n/a         0.0000       n/a         0.0000
N           -0.4157     -0.4157      -0.4157     -0.4157
H            0.2719      0.2719       0.2719      0.2719
Ca          -0.0351     -0.0351      -0.0351     -0.0351
Ha           0.0508      0.0508       0.0508      0.0508
Cp           0.0168      0.1011       0.0168      0.1011
Hp           0.0053     -0.04934      0.0053     -0.0493
C            0.5973      0.5973       0.5973      0.5973
O           -0.5679     -0.5679      -0.5679     -0.5679


Figure 3-10 displays the FEP scheme as a correction to the TI calculations. A four-

coordinate model (the product of the TI simulation) with vdW exclusions removed

between the newly bound Cys and the rest of the active site was simulated, and its trajectory

was evaluated using the Hamiltonian with the vdW exclusions intact. The FEP calculation was a trajectory analysis of the

exclusions-removed structure using the exclusions-present Hamiltonian and only took









one step. The free energy difference between the initial and final states of the FEP

simulations equaled the contribution to the total free energy of nonbonding interactions

becoming bonding interactions as the new Cu(I)-S bond formed in the TI simulation.
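A one-step perturbation of this kind is commonly evaluated with the exponential-averaging (Zwanzig) formula, ΔG = -kT ln⟨exp(-ΔU/kT)⟩, where ΔU is the energy of each saved frame under the target Hamiltonian minus its energy under the sampled one. The sketch below is an illustration under stated assumptions: the temperature of 300 K, the function name, and the ΔU values are hypothetical and are not taken from the simulations reported here.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def fep_one_step(delta_u, temperature=300.0):
    """One-step free energy perturbation estimate in kcal/mol.

    delta_u: per-frame energy differences U_target - U_sampled (kcal/mol), e.g.
             exclusions-present minus exclusions-removed for each trajectory frame.
    """
    delta_u = np.asarray(delta_u, dtype=float)
    if delta_u.size == 0:
        raise ValueError("need at least one frame")
    beta = 1.0 / (KB * temperature)
    x = -beta * delta_u
    x_max = np.max(x)
    # Log of the mean Boltzmann factor, shifted by x_max for numerical stability.
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return float(-log_mean / beta)

# Hypothetical per-frame energy differences, for illustration only.
delta_g = fep_one_step([1.0, 2.0, 1.5, 1.2])
```

Because the average is dominated by frames with small ΔU, the estimate is always less than or equal to the plain mean of ΔU.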

[Scheme: four-coordinate Cu(I) site with a dummy atom (Du) on S of Cys 12B, shown with vdW exclusions removed (left) and with vdW exclusions in place (right).]

Figure 3-10. FEP vdW correction to TI on HAH1. Evaluate trajectory of structure with
vdW exclusions removed with the Hamiltonian with vdW exclusions intact.

The FEP simulations revealed another contribution to the total free energy change

between the three- and four-coordinate proteins. While the Cu(I)-S bond lengths for the

three initially bound Cys ligands remained unchanged throughout the TI and FEP

simulations, the new Cu(I)-S bond did not reach the correct length. This was because

AMBER did not allow the new S to come any closer than about 2.8 Å to the

Cu(I) ion while the bonding interactions were turned off. In a sense, the penalty for

removing the vdW exclusions was not only the omission of bonding interactions once the

new bond had been formed, but also that the new bond was too long. The normal Cu(I)-S

bond length was around 2.2 Å with a force constant of 60.00 kcal/mol·Å², but the newly

formed Cu(I)-S bond was 2.8 Å with the same force constant.

Another correction to the TI simulation had to be made in the form of "reeling in"

the newly-bound S to the metal center. The energy profile of shortening the bond length

could be generated by a potential of mean force calculation during which the products of

the reactions listed in Figure 3-9 (with vdW exclusions in place) would again be used as

starting structures. The four-coordinate structures featured three Cu(I)-S bonds of the









correct length, and one Cu(I)-S bond that was too long. The fourth Cu(I)-S bond would be

contracted from 2.8 Å to 2.0 Å over the course of twenty-three windows in the PMF

simulation. Figure 3-11 shows the reaction scheme for the PMF experiment, and Figure

3-12 shows the energy profile of shortening the final Cu(I)-S bond. A steep harmonic

potential with a minimum at 2.1 Å was imposed on the long Cu(I)-S bond. The energy

difference between the initial bond length and the minimum energy bond length on the

PMF curve served as the third and final contribution to the free energy change of the

reactions in Figure 3-9. The data from the PMF experiments were connected using

weighted histogram analysis with the WHAM software.24,25
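The bond-length correction can be sketched in two steps: lay out the twenty-three window centers between 2.8 Å and 2.0 Å, then read the correction off a PMF curve as the free energy at its minimum minus the free energy at the starting bond length. The even spacing, the function names, and the quadratic test curve are assumptions for illustration; the PMF itself would come from the WHAM analysis, which is not reproduced here.

```python
import numpy as np

def window_centers(r_start=2.8, r_end=2.0, n_windows=23):
    """Evenly spaced restraint centers (Å) for the PMF windows (assumed spacing)."""
    return np.linspace(r_start, r_end, n_windows)

def pmf_correction(r, pmf, r_start):
    """Return (r_min, delta_g): location of the PMF minimum and the free energy
    change on going from r_start to that minimum (kcal/mol)."""
    r = np.asarray(r, dtype=float)
    pmf = np.asarray(pmf, dtype=float)
    order = np.argsort(r)  # np.interp requires increasing x values
    g_start = np.interp(r_start, r[order], pmf[order])
    i_min = int(np.argmin(pmf))
    return float(r[i_min]), float(pmf[i_min] - g_start)

centers = window_centers()

# Hypothetical PMF with a minimum near 2.1 Å, standing in for WHAM output.
example_pmf = 10.0 * (centers - 2.1) ** 2
r_min, delta_g = pmf_correction(centers, example_pmf, r_start=2.8)
```

A negative correction from this function means the contracted bond is lower in free energy than the 2.8 Å starting structure.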

[Scheme: four-coordinate Cu(I) site with the Cu(I)-S12B distance at 2.77 Å (left) and 2.03 Å (right); vdW exclusions in place in both structures.]

Figure 3-11. Bond length correction to FEP calculations by PMF: contract Cu(I)-S bond
length from ~2.8 Å to ~2.0 Å by PMF analysis of twenty-three windows.

The overall free energy change from three- to four-coordinate Cu(I) in HAH1 is

the sum of the TI mutation, the FEP trajectory analysis for vdW interaction correction,

and the PMF for bond length correction. Table 3-12 lists the free energy changes for the

TI reaction shown above for both gas-phase and aqueous systems. As shown, the addition

of solvent lowers the energy barrier of binding the fourth S to Cu(I). Recalling Figure 3-

9, the endpoints of each reaction are equivalent. So the free energy difference between

the two different three-coordinate reactants can be determined by taking the difference

between the total free energy differences of their respective reactions.

























[Plot: x-axis, Cu-S bond length (Å), 2.0 to 2.8.]

Figure 3-12. PMF curve of solvated HAH1 showing minimum energy Cu(I)-S12B bond
length near 2.1 Å for the bonding of Cys 12B to Cu(I).


[Plot: x-axis, Cu-S bond length (Å), 2.0 to 2.8.]

Figure 3-13. PMF curve for the binding of Cys 15B to Cu(I) in solvated HAH1, showing
a minimum energy bond length of just over 2.1 Å for the Cu(I)-S15B bond.

Table 3-12 shows the free energy change of the reactions displayed in Figure 3-9,

and Figure 3-14 plots the free energy difference between the two different three-











coordinate states in the explicitly solvated protein. These values show that the Model A

structure, in which Cys 15B of the target monomer binds Cu first, is energetically favored over

Cys 12B binding Cu first by 24.7 kcal/mol.

Table 3-12. Free energy changes for TI calculations on the reactions shown in Figure 3-9.

                                TI ΔG (kcal/mol)
Model A: Cys 15B binding 1st
  Gas                           224.2
  Solvent                       177.7
Model B: Cys 12B binding 1st
  Gas                           213.9
  Solvent                       153.0




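[Energy diagram, solvated reactions: ΔG levels for the 2-coordinate, 3-coordinate A, 3-coordinate B, and 4-coordinate states; the two 3-coordinate states are separated by 24.7 kcal/mol.]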

Figure 3-14. The free energy difference by thermodynamic integration between the
different three-coordinate Cu(I)-bound HAH1 dimers.


Table 3-13. The free energy difference of changing the coordination environment of
Cu(I) in HAH1.

                 Cys 12B unbound   Cys 15B unbound   ΔG
Gas              255.5             252.8              -2.7
Solvent          219.5             193.8             -25.7
Solvent effect                                       -23.0

Values are in kcal/mol.
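Because both reactions end at the same four-coordinate structure, the entries of Table 3-13 follow from simple differences. The short check below reproduces the ΔG column and the solvent effect from the tabulated totals; the variable and function names are illustrative only.

```python
# Total free energy changes from Table 3-13 (kcal/mol).
totals = {
    "gas":     {"cys12b_unbound": 255.5, "cys15b_unbound": 252.8},
    "solvent": {"cys12b_unbound": 219.5, "cys15b_unbound": 193.8},
}

def reactant_difference(phase):
    """Difference between the two reactions that share one endpoint:
    (Cys 15B unbound) minus (Cys 12B unbound), rounded to 0.1 kcal/mol."""
    values = totals[phase]
    return round(values["cys15b_unbound"] - values["cys12b_unbound"], 1)

dg_gas = reactant_difference("gas")          # -2.7
dg_solvent = reactant_difference("solvent")  # -25.7
solvent_effect = round(dg_solvent - dg_gas, 1)  # -23.0
```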









Conclusions

The results from the QM and MM studies on the model systems and the HAH1

dimer can be interpreted to suggest an energetically favorable order of Cu(I) transport

between the active site of a donor HAH1 monomer and the active site of the fourth

domain of the Cu(I) receptor MNK. The QM calculations done in the first part of this

experiment created a foundation for the description of the Cu(I) binding site in HAH1.

Further QM work detailed the thermodynamics of Cu(I) thiolate clusters as models of the

active sites of the MT/CXXC family of Cu(I)-binding metalloproteins. The first part of

the molecular dynamics study was to create a force field to describe the atoms involved

in Cu(I) binding in HAH1 based on the QM calculations. Then, MD simulations were

performed with the new force field. Analysis of these simulations showed the accuracy

and reliability of the new force field parameters. The final stage of the experiment was an

investigation of the free energy of Cu(I) transport between two metal binding domains.

The HAH1 dimer was used as a model for Cu(I) transfer from the active site of a

HAH1 monomer to the fourth domain active site of the Menkes disease protein (MNK). In this

model, Cys 12A and Cys 15A of the HAH1 dimer represent the donor active site, while

residues Cys 12B and Cys 15B represent the metal binding site of MNK4 which are Cys

14 and Cys 17, respectively. Mechanistically, the free energy calculations suggest that

when Cu(I) is transferred from the HAH1 binding site to the MNK4 site, it is energetically

more favorable for Cys 17 of the MNK fourth domain to bind the

incoming Cu(I) before the more solvent-exposed Cys 14. Physically, this makes sense

due to the fact that the solvent-exposed Cys 14 is farther away from the protein-protein

interface than Cys 17 on the target domain and that solvent interactions would stabilize

Cys 14 on the surface of the protein. At that point, Cys 12 of the donor domain would








start releasing Cu(I) as Cys 14 of MNK4 started to bind the ion. There is no evidence that

a purely four-coordinate Cu(I) species exists during copper transport. This is supported

by the QM results early in the study. Instead, it appears that the Cu(I) ion is nearly always

three-coordinate as it is transferred between the two proteins. In the proposed transfer

mechanism, Cys 15 of HAH1 is the last donor residue to release the copper ion. When

copper transfer is complete, HAH1 is no longer bound to copper and the active site of

MNK4 complexes the Cu(I) ion.














CHAPTER 4
ELECTRONIC STRUCTURE OF THE ACTIVE SITE OF AMINOPEPTIDASE

FROM Aeromonas proteolytica

AAP Introduction

Zinc-dependent peptidases such as bovine lens leucine aminopeptidase (bLAP),

carboxypeptidase A, thermolysin, and the aminopeptidase from Aeromonas proteolytica

(AAP) play important roles in tissue repair, carcinogenesis, protein maturation, cell cycle

control, the regulation of hormone levels,56,57 and the degradation of DNA, RNA,

phospholipids, and polypeptides.58 Improper functioning of aminopeptidases has been

linked to health issues including aging, cataracts, inflammation, cystic fibrosis, cancer,

and leukemia.56-60 Despite the variety of cellular processes in which aminopeptidases are

involved, not much was known about their exact functions or mode of action until

recently. Peptidases such as carboxypeptidase A and thermolysin which utilize a sole

Zn2+ ion for catalysis have been extensively studied and their modes of action are

relatively well understood.56,58,60 Aminopeptidase from Aeromonas proteolytica is a

dinuclear metallohydrolase which employs two Zn2+ ions to catalytically cleave the N-

terminus of a polypeptide chain. Its small size (~32 kDa), high thermal stability, and

functionality as a monomer61,62 led to AAP being one of the first peptidases to be

isolated and characterized in detail.63 Substituting the spectroscopically silent Zn2+ ions

with Co2+ or Cu2+ allowed for further kinetic and mechanistic studies on the protein and

did not adversely affect catalytic activity.63,64 In fact, some hyper-active species of AAP

were created by these substitutions.65









Native AAP (Figure 4-1) contains two Zn2+ ions in the active site, but can perform

its function at 80% efficiency with only one Zn2+ present.66 In fully functioning AAP,

both cations are present and perform some catalytic function. The reason why some

peptidases function in a mononuclear capacity while others require multiple ions for full

efficiency is not yet understood.66 The binding pocket in AAP has been shown to bind all

N-terminal amino acids and can accommodate all penultimate residues except Glu and

Pro. Being largely hydrophobic in nature,67 the active site preferentially binds

hydrophobic residues with Leu being the most easily cleaved.62

Figure 4-1. AAP active site [...] (left) and BuBA. Investigation of the X-ray
structures of these complexes shed light on substrate conformation and a [...]

The metal-binding pocket of AAP is characterized by several Asp, Glu, and His

residues which coordinate the Zn cations. X-ray crystallographic studies on native AAP

have predicted a tetrahedral (Td) geometry for both cations when no substrate is present,68

although in its closed-shell electronic state Zn2+ shows no preference for either octahedral

(Oh) or Td geometry.57 Beyond the divalent cations, other catalytically important features

of the binding site include the bridging water/hydroxide molecule and Glu 151 each of

which have potentially important roles in the proposed catalytic mechanism of AAP. In

1992, Chevrier et al. were the first to produce a high resolution (1.8 Å) crystal structure










of native AAP. This pioneering work not only showed that the active site was dinuclear,

but it also identified the key first shell Zn-complexing residues Asp117, Asp179, Glu152,

His256, and His97.67 Further high resolution crystallographic studies on inhibitor-bound

AAP carried out by several groups have since clarified the roles of second shell

completing residues such as Glu151, Tyr225, Ser228, Cys227, and Asp99, the

importance of the bridging water/hydroxide and other water molecules in the active site,

active site coordination geometries upon substrate binding, and have led to the proposal

of several catalytic mechanisms.62,66,69-71

[Scheme: AAP active site (His97, His256, Asp117, Asp179, Glu152, and two Zn ions) shown with a terminal hydroxide and with a bound F- ion.]

Figure 4-2. In fluoride inhibition studies of AAP, it was shown that a F- ion displaces a
terminal hydroxide group, deactivating the enzyme.

Several inhibition studies have been performed on this system to complement the

crystallographic work. The twofold purpose of these studies has been to both characterize

the nature of the inhibited protein and to investigate possible drug candidates for enzyme

inhibition. Beyond the preference for hydrophobic residues in the binding cleft, potential

substrates should have a free α-amino group in the L-configuration.57 At present, several

well-known peptide inhibitors have been shown to inhibit AAP. Potent inhibitors include

L-leucinethiol, hydroxamates, α-hydroxyamides, and notably 1-butaneboronic acid

(BuBA) and Tris.57,59,62,63,72-76 Inhibitor binding to both cations is not necessary for AAP

inhibition, and X-ray structures of both the Tris- and BuBA-inhibited enzyme (Figure 4-









2) have revealed that the water/hydroxide bridge between the cations is broken.72 These

data suggest that the μ-aqua bridge is broken to form a terminal hydroxyl- group at some

point during peptide hydrolysis in order for the enzyme to function properly. This

hypothesis is confirmed by fluoride inhibition studies of native AAP.57,77 A single

fluoride ion binds to Zn1 in the active site (Figure 4-3), displacing the terminal

water/hydroxyl- group after substrate binding, and the reaction does not proceed.

Inactivation only occurs after substrate binding, suggesting that a terminal hydroxyl-

group is not present until the carbonyl oxygen of the activated scissile bond has bound to

Zn1 and peptide hydrolysis is underway. Interestingly, chloride ions do not inhibit AAP

up to a 2 M concentration because they do not bind with sufficient strength to the cation

in the active site.57,77 The highest resolution structure of AAP was obtained in 2002 by

the Petsko lab.59 The 1.2 Å structure of native AAP in Tris buffer reduced the amount of

structural uncertainty due to side chain motion, determined the position of several

hydrogen atoms in the protein, and clarified to some degree the geometry of the Tris-

bound active site.59 Of note is how the distances between the Zn ions and the bridging O

atom change between the unbound native structure 1AMP and the Tris-bound structure

1LOK. In the unbound protein, the Zn1-O and Zn2-O distances are 2.29 Å and 2.25 Å,

respectively. The Tris-bound active site reveals Zn1-O and Zn2-O distances of 1.95 Å and

2.21 Å. The change in Zn-O distances may be further evidence of the conversion of the

bridging water/hydroxide group into a terminal moiety.

Both cations must be present in the active site in order for AAP to be fully efficient,

and they must have a task to perform during peptide hydrolysis. Zn2+ has been shown to

be a hard acid,78 and the mono-zinc environment has been shown by Christianson and









Cox to reduce the pKa of a single water molecule in bulk solvent from 15.7 to 9.0.59,79

The pKa of bound water in a di-zinc environment is expected to be much less than 9.0. It

has been proposed that the two cations, each acting as a Lewis acid, perform separate but

equally important functions in the reaction cycle. A common thread between several

Figure 4-3. A proposed mechanism for AAP peptide hydrolysis showing proton transfer
to Glu151, formation of a terminal hydroxyl- group, a gem-diolate
intermediate, donation of a proton back to the leaving amino group, and
reformation of the water/hydroxide bridge. Adapted from Petsko.59

proposed mechanisms has been that the N-terminal amino group binds to Zn2 and Zn1

binds the carbonyl oxygen of the activated scissile bond. The mechanism proposed by

Stamper et al. based on kinetic, crystallographic, and spectroscopic studies shows Zn1

binding to the carbonyl group of the scissile bond, followed by the N-terminal amino

group binding to Zn2.57,66,71,77,80-82 Holz reasons that carbonyl binding occurs prior to

amino binding as a result of inhibition studies of LeuSH on [CoCo(AAP)].57,73 The

observed geometry of boron in the study of BuBA-inhibited AAP is further evidence of









this binding sequence.57,70 As stated earlier, other key players in AAP peptide hydrolysis

are Glu151 and the bridging water/hydroxide group. In proposed mechanisms, a bridging

or terminal OH- would serve as a nucleophile and Glu151 would act as a general

base.57,61,62,66,71,77

Beyond the observations made from fluoride inhibition studies, there is further

evidence of a terminal hydroxyl group. BuBA inhibition studies show that when a

substrate is bound, the distance between Zn2 and Asp117 and Asp179 is decreased to 3.0

Å from 3.4 Å in the native structure. The decreased distance allows for the formation of a

strong hydrogen bond between Glu151 and His97 that does not exist in the unbound

protein. Substrate binding also brings Asp99 closer to His97, creating yet another

hydrogen bond. The proximity of the two negatively charged residues to Zn2 along with

the newly formed H-bonds effectively stabilizes the charge neutrality of Zn2 and

regulates its Lewis acidity. A sufficient decrease in the acidity of Zn2 would facilitate the

formation of a terminal water/hydroxide group on Zn1.57,70,83 Chen et al.77 suggested that

Glu151 assists in the deprotonation of a terminal water molecule, resulting in a

nucleophilic hydroxo- moiety, followed by attack by that group on the carbonyl carbon

of the scissile peptide bond, forming a gem-diolate intermediate characterized by two

oxygens binding to Zn1. The gem-diolate is stabilized through its interaction with both Zn

ions.59 At this point, the reaction proceeds toward completion with Glu151 donating a

proton back to the penultimate amino group (now the N-terminus of the leaving group),

which departs the binding cleft upon C-N bond cleavage, the rate-limiting step of peptide

hydrolysis.77 The final step is the reformation of the water bridge between Zn1 and Zn2 as









the active site returns to its native unbound conformation. Other publications have since

supported the mechanism proposed by Chen et al.57,59,84

Many mechanisms have been suggested for peptide hydrolysis by AAP, and there

are some contentious points among them. Overall, assumptions are made about

protonation states of the water bridge and Zn-binding residues, and the conformation of

the substrate in the active site. Desmarais et al. contend that uncertainties in the reaction

mechanism cannot be clarified without more detailed knowledge of the electronic

structure and protonation state of the metal ions, water molecules, and residues in the

immediate active site.59 Despite their exhaustive QM/MM study of the AAP peptide

hydrolysis mechanism, Schürer et al. suggest that molecular dynamics simulations of the

protein are needed to take accurate account of conformational movements of the protein

and substrate.7 While Schürer et al. suggest that high level ab initio or DFT studies on

the complete AAP active site would be prohibitively expensive, those experiments have

been performed in this study. A series of those calculations have produced some data

about the electronic structure and geometry of several active site protonation states.

Effects of 1st-Shell Mutations

Numerous inhibition, crystallographic, kinetic, and computational studies have

been performed on the aminopeptidase from Aeromonas proteolytica (AAP) in order to

gain a better understanding of the mechanism of the peptide hydrolysis reaction catalyzed

by the enzyme.57,59,62,66,67,71,84 However, the research performed on AAP to this point has

yet to answer key questions regarding the protonation state of the Zn-Zn bridging species

in the native and active states of the enzyme, the role of Glu151 in the reaction, and the

electronic structure of the dinuclear center. Crystallization of Tris-inhibited AAP and its

structural characterization by XRC to a resolution of 1.2 Å yielded new information









about the side chain conformations of several residues in inhibited AAP as well as the

positions of some hydrogens in the enzyme.59 However, that study was not able to

determine the nature of the Zn-Zn bridge in the protein. Schürer, Lanig, and Clark

completed a detailed QM/MM study of the AAP peptide hydrolysis mechanism by

determining relative energies of several possible intermediate and transition state species

using AM1 and VAMP for the QM and MM regions, respectively.71

This computational work entails fully quantum geometric and energetic

optimization of the AAP active site using B3LYP/6-31G* in Gaussian 03.85 Here, data are

presented from these studies pertaining to the electronic structure and coordination of the

di-zinc-containing AAP active site. The active site model employed in these calculations

is similar in nature to the one investigated by Schürer et al., comprising the side chains of

Asp117, Asp179, His256, His97, Glu152, Glu151, two Zn2+, the bridging species, and

crystallographic water molecules within the active site. However, none of the structures

of 1st-shell mutations include an inhibitor molecule or the second-shell residues Asp99,

Cys227, Ser228, and Tyr225. The initial active site geometry was obtained from the 1.8

Å resolution crystal structure of AAP (Figure 1, PDB-ID 1AMP) obtained by Chevrier et

al. in 1992.67

This structure was modified by the removal of the backbone atoms of each residue

except for the bridging atoms between Glu151 and Glu152 and the addition of one or

two protons to the bridging oxygen. Other structures were created by protonating Glu151

at the oxygen closest to the di-zinc bridge. The initial structures were used to generate

B3LYP/3-21G* optimized geometries in Gaussian 03. Those models were in turn used as

starting structures for the final B3LYP/6-31G* optimization.

Figure 4-4. The general model for the QM work is the AAP active site from PDB
structure 1AMP, the 1.8 Å resolution structure elucidated by Chevrier and
Schalk.67 Asp 117 is below the two Zn ions, with Zn2 on the left and Zn1 on
the right. The residue at the top of the active site is Glu 151. Zn2 is bound to
His 97 and Asp 179, and Zn1 is completed with His 256 and Glu 152.

Single point energy calculations at a higher level of theory, such as MP2, have not yet been attempted

because the computational expense of such an experiment would be too high, and the

resulting energies from the geometry optimizations performed here are sufficiently

accurate for future mechanistic studies. Ultimately, the goal of this work is threefold: to

investigate the different protonation states of the water/hydroxide bridge, to measure C-O

bonds in the Zn-coordinating carboxylate residues, and to gauge the importance of

Glu151 as a proton acceptor in the initial stages of AAP peptide hydrolysis.
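The two-stage optimization described above (B3LYP/3-21G* followed by B3LYP/6-31G*) could be prepared with input files along the following lines. This is only a minimal sketch and not the input used in this work: the helper name `gaussian_opt_input`, the checkpoint file names, the toy Zn-water fragment, and its charge and multiplicity are all hypothetical placeholders. In practice the second stage would start from the geometry optimized in the first stage rather than from the same coordinates.

```python
def gaussian_opt_input(chk, basis, charge, multiplicity, atoms, title):
    """Return the text of a Gaussian geometry-optimization input file.

    Layout: Link 0 line, route section, blank line, title, blank line,
    charge and multiplicity, Cartesian coordinates, terminating blank line.
    """
    lines = [
        f"%chk={chk}",
        f"#P B3LYP/{basis} Opt",
        "",
        title,
        "",
        f"{charge} {multiplicity}",
    ]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2s} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")  # blank line that ends the molecule specification
    return "\n".join(lines) + "\n"


# Hypothetical toy fragment (coordinates in Angstrom); NOT the AAP active site model.
toy_fragment = [
    ("Zn", 0.00, 0.00, 0.00),
    ("O", 0.00, 0.00, 2.00),
    ("H", 0.76, 0.00, 2.59),
    ("H", -0.76, 0.00, 2.59),
]

# Stage 1: small basis set pre-optimization.
stage1 = gaussian_opt_input("stage1.chk", "3-21G*", 2, 1, toy_fragment,
                            "toy model, B3LYP/3-21G* pre-optimization")
# Stage 2: final optimization (would normally use the stage 1 geometry).
stage2 = gaussian_opt_input("stage2.chk", "6-31G*", 2, 1, toy_fragment,
                            "toy model, B3LYP/6-31G* final optimization")

print(stage1)
```

Pre-optimizing with the smaller basis set keeps the expensive 6-31G* optimization close to a minimum from its first step, which is the motivation for the two-stage scheme.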

By performing calculations on an array of protonation states, the relative energies

between possible intermediates of the hydrolysis reaction could be determined,

namely for a potential initial proton transfer from the water bridge to Glu151. The first









system that was investigated was one in which a water bridge exists between Zn1 and Zn2.

This species was compared to an active site with a hydroxide bridge and a protonated

Glu151, with both models containing 78 atoms. The relative energy of the optimized

structure of the hydroxide-bridged state shows that it is energetically favored over the

water-bridged state by more than 4.10 kcal/mol. However, upon inspection of the

optimized water-bridged structure, it was seen that one hydrogen from the water bridge

transfers to Glu151. Another feature of the optimized water-bridged structure is the

formation of interactions between E152 and H256 and between Zn1 and crystallographic

water that was retained in the active site. The optimized structure of the hydroxide-

bridged model with E151 initially protonated reveals protonation of D179. D179 gains a

proton from one of the water molecules retained in the active site, while the hydroxide

ion formed by that deprotonation interacts with H256 on the other side of the active site.

In the end, the energy difference between the two 78-atom models may be attributed to

the different interactions and conformations that form during the optimization.

The next study compared a model with an initial hydroxide bridge and

unprotonated Glu151 to an oxygen-bridged active site with Glu151 being protonated,

with each model containing 77 atoms. The hydroxide-bridged model is 4.62 kcal/mol

more favorable. Upon comparison of multiple species, the structures with an OH- bridge

are lower in energy than either the O2--bridged model or the H2O-bridged model. This

trend was also observed during a simple single point energy comparison between

different protonation states of the native crystal structure without geometry optimization.

This suggests that favorable intermediates in the reaction mechanism may all have

bridged or terminal OH- species as opposed to O2- or H2O bridges.
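Relative energies such as those quoted above are obtained by subtracting total electronic energies of models with the same number of atoms and converting from Hartree to kcal/mol. A minimal sketch of that arithmetic follows; the function name and the two energies are hypothetical placeholders and are not values from this study.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard conversion factor


def relative_energies(energies_hartree):
    """Map each model name to its energy above the lowest model, in kcal/mol.

    Only models with identical atom counts should be compared this way.
    """
    lowest = min(energies_hartree.values())
    return {
        name: (energy - lowest) * HARTREE_TO_KCAL_PER_MOL
        for name, energy in energies_hartree.items()
    }


# Hypothetical total energies in Hartree; NOT results from this work.
example = {
    "OH- bridge, Glu151 protonated": -100.000000,
    "H2O bridge, Glu151 unprotonated": -99.990000,
}

rel = relative_energies(example)
for name, value in rel.items():
    print(f"{name}: {value:.2f} kcal/mol")
```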









These data generally support the previous proposal that a high-energy water-

bridged active site would initially donate one of its hydrogens to Glu 151 in order to

produce a nucleophilic hydroxide-bridging species in an exothermic process.77 Further

interpretation of these results suggests that an initial hydroxide bridge would not donate

its hydrogen to Glu151. In that instance, Glu151 would not have a proton that it could

donate to the N-terminus of a polypeptide chain within the active site as proposed. Both

systems reveal the stability of a hydroxide bridge over an oxo or water bridge between

Zn1 and Zn2. In all cases, the crystallographic water molecules in the active site work

together with the bridging species and the surrounding carboxylate residues to establish a

robust hydrogen-bonding network within the active site. A more detailed discussion of

the optimized geometries of several models is included below.

Table 4-1. Electrons in the side chains of Asp 117 and Asp 179 are equally delocalized
over the carboxylic acid region, while Glu151 and Glu152 side chains contain
one C-O bond with more electron density than the other.
                   Asp117            Asp179            Glu151            Glu152
                   Cβ-Oγ1  Cβ-Oγ2    Cβ-Oγ1  Cβ-Oγ2    Cγ-Oδ1  Cγ-Oδ2    Cγ-Oδ1  Cγ-Oδ2
H2O bridge         1.28    1.26      1.26    1.28      1.24    1.31      1.36    1.22
OH- bridge         1.27    1.26      1.26    1.27      1.28    1.32      1.30    1.24
O2- bridge         1.27    1.27      1.25    1.28      1.22    1.33      1.28    1.25
GluH + O2- bridge  1.26    1.27      1.24    1.30      1.22    1.33      1.28    1.25
GluH + OH- bridge  1.27    1.27      1.24    1.30      1.23    1.33      1.29    1.25
X-ray              1.22    1.24      1.22    1.22      1.24    1.23      1.23    1.22
Values in Å


C-O bond lengths for Asp117, Asp179, Glu151, and Glu152 are listed in Table 4-1.

It is clear that both C-O bonds equally share the electrons in the carboxylic regions of the

aspartic acid residues as the bond lengths are nearly identical. The acidic regions of the

Glu residues do not share this feature. Instead, one C-O bond is clearly higher in electron









density, while the other C-O bond is comparatively longer. These data suggest that the

oxygens of Asp117 and Asp179 coordinate equally to the metal center of

the active site. On the other hand, only one oxygen of the side chains of Glu151 and

Glu152 is coordinating with the zinc centers. In the case of Glu151, the side chain serves

a role in the H-bonding network that exists throughout the active site.
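The comparison of C-O bond lengths can be made explicit by computing, for each residue, the difference between its two C-O distances. The sketch below does this for the optimized models in Table 4-1; the dictionary layout and function names are illustrative assumptions, and no cutoff separating "symmetric" from "asymmetric" carboxylates is implied by the source.

```python
# C-O bond lengths (Angstrom) transcribed from Table 4-1, optimized models only.
CO_BOND_LENGTHS = {
    "H2O bridge": {"Asp117": (1.28, 1.26), "Asp179": (1.26, 1.28),
                   "Glu151": (1.24, 1.31), "Glu152": (1.36, 1.22)},
    "OH- bridge": {"Asp117": (1.27, 1.26), "Asp179": (1.26, 1.27),
                   "Glu151": (1.28, 1.32), "Glu152": (1.30, 1.24)},
    "O2- bridge": {"Asp117": (1.27, 1.27), "Asp179": (1.25, 1.28),
                   "Glu151": (1.22, 1.33), "Glu152": (1.28, 1.25)},
    "GluH + O2- bridge": {"Asp117": (1.26, 1.27), "Asp179": (1.24, 1.30),
                          "Glu151": (1.22, 1.33), "Glu152": (1.28, 1.25)},
    "GluH + OH- bridge": {"Asp117": (1.27, 1.27), "Asp179": (1.24, 1.30),
                          "Glu151": (1.23, 1.33), "Glu152": (1.29, 1.25)},
}


def asymmetry(model, residue):
    """Absolute difference between the two C-O bond lengths, in Angstrom."""
    first, second = CO_BOND_LENGTHS[model][residue]
    return round(abs(first - second), 2)


def mean_asymmetry(residue):
    """Average C-O asymmetry of one residue over all optimized models."""
    values = [asymmetry(model, residue) for model in CO_BOND_LENGTHS]
    return sum(values) / len(values)


for residue in ("Asp117", "Asp179", "Glu151", "Glu152"):
    print(f"{residue}: mean |d(C-O1) - d(C-O2)| = {mean_asymmetry(residue):.3f} A")
```

A small difference is what would be expected for a carboxylate whose two oxygens share the charge equally, and a large difference points to one localized C-O bond.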

Along with the protonation state of the bridging species, the metal-binding

carboxylic amino acids Asp117, Asp179, and Glu152 are interesting research subjects.

An analysis of C-O bond lengths and O-Zn distances helps describe bond order and

electron density and metal ion coordination, respectively. Measurement of the C-O

distances in the carboxylic acid side chains of the Zn-coordinating residues yields

information about the localization of electrons within the carboxyl regions of the

coordinating residues. Table 4-2 lists Zn-Zn, Zn-O, and Zn-Asp 117 distances for the

structures shown in Figures 4-5 and 4-6. Generally, Zn-Zn distances are around 3.3 Å,

which is shorter than the inter-zinc distance in the 1AMP crystal structure. The one

exception is for structure 4-6f in which a terminal peroxo- group exists and the two zinc

ions are separated by more than 4.0 Å.

Table 4-2. Several distances are shown for Zn-Zn and Zn-O interactions for the structures
shown below in Figures 4-6 and 4-7.
                   Zn1-Zn2  Zn1-O  Zn2-O  D117-Zn1  D117-Zn2  E151-O
H2O bridge         3.31     1.96   2.02   1.99      2.00      2.55
OH- bridge         3.28     1.96   1.97   1.96      2.00      3.28
O2- bridge         4.23     1.97   4.03   2.02      2.01      3.27
GluH + O2- bridge  3.37     2.05   1.99   2.01      1.98      2.71
GluH + OH- bridge  3.26     1.96   1.95   1.96      1.97      2.77
X-ray              3.47     2.25   2.29   2.05      2.01      3.30
Values in Å
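The shortening of the optimized distances relative to the crystal structure can be tabulated directly from Table 4-2. The following sketch subtracts the X-ray value from each optimized value; the column labels, variable names, and function names are assumptions made for illustration.

```python
COLUMNS = ("Zn1-Zn2", "Zn1-O", "Zn2-O", "D117-Zn1", "D117-Zn2", "E151-O")

# Distances (Angstrom) transcribed from Table 4-2.
DISTANCES = {
    "H2O bridge": (3.31, 1.96, 2.02, 1.99, 2.00, 2.55),
    "OH- bridge": (3.28, 1.96, 1.97, 1.96, 2.00, 3.28),
    "O2- bridge": (4.23, 1.97, 4.03, 2.02, 2.01, 3.27),
    "GluH + O2- bridge": (3.37, 2.05, 1.99, 2.01, 1.98, 2.71),
    "GluH + OH- bridge": (3.26, 1.96, 1.95, 1.96, 1.97, 2.77),
}
XRAY = (3.47, 2.25, 2.29, 2.05, 2.01, 3.30)


def deviation_from_xray(model):
    """Optimized minus X-ray distance for every column, in Angstrom."""
    return {
        column: round(value - reference, 2)
        for column, value, reference in zip(COLUMNS, DISTANCES[model], XRAY)
    }


def largest_deviation(model):
    """Column whose optimized distance differs most from the X-ray value."""
    deviations = deviation_from_xray(model)
    return max(deviations, key=lambda column: abs(deviations[column]))


for model in DISTANCES:
    print(model, deviation_from_xray(model))
```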
























Figure 4-5. B3LYP/6-31G* optimized geometries of two models of the AAP active site.
Asp 117 is shown in the upper-right of both pictures, binding each Zn. Zn1 is
the ion on the left and Zn2 on the right in each structure. The structure on the
left is from an originally water-bridged structure and Glu151 has gained an H,
while the structure on the right started with an OH- bridge and Asp 179 gains a
H from a crystallographic water.

The geometry optimizations of several variations of this active site model are

discussed here. Generally, the starting structures are the same in each case, and the first

three models we investigated differ only by the protonation state of the bridging group

while Glu151 is unprotonated (Figure 4-6). As shown in Figure 4-5a, an initial water

bridge with an unprotonated Glu151 is optimized to an OH- bridge as the initial water

loses a hydrogen to nearby Glu151. The Zn-Zn distance decreases from the 3.47 Å shown

in the original crystal structure to 3.31 Å in the optimized structure. The optimization of

the second model (Figure 4-5b) is more complicated, as the model with an initial OH- bridge and

unprotonated Glu151 forms a bridging O2H peroxo- species in which Glu151 and Glu152

are both protonated. It appears that a crystallographic water and the initial bridging OH-

donate one H each to Glu151 and Glu152.
















Figure 4-6. Starting structures (left) and B3LYP/6-31G* optimized geometries for
different Zn-Zn bridging species within the active site of AAP. a) a water
bridge b) a hydroxyl bridge c) an oxo bridge.

This type of bridging group has not yet been discussed as a possibility in the

proposed reaction schemes. However, the formation of the peroxo- group may be one

consequence of not modeling an inhibitor into the active site. Neither this optimized structure

nor any other optimized structure containing a peroxo- bridge was found to have the

lowest relative HF energy among structures with a similar number of atoms. The last model of

the first group (Figure 4-6b) contains a bridging O2- ion. This optimization results in the

formation of two terminal species. On Zn1, a terminal peroxo- group forms, similar in

nature, but not geometry, to the peroxo- group formed during the optimization of structure








4-5b. Then, one of the crystallographic water molecules becomes terminal to Zn2. Glu151

becomes protonated as the water molecule that helps to form the terminal peroxo- group

on Zn1 donates one of its hydrogens to it. The final two models that were studied both

start with a neutral Glu151 while differing by an O2- (Figure 4-6a) and an OH- (Figure 4-6b)

bridge.


Figure 4-7. Initial structures (left) and B3LYP/6-31G* optimized geometries for models
with Glu151 protonated. The starting structures vary only in the protonation
state of the bridging group. Structure a) contains an O2- bridge and the Zn ions
in c) are bridged by a hydroxide group.

When model 4-6a is optimized, the formation of another peroxo- group is observed

as a crystallographic water donates a hydrogen to Asp179 and the remaining OH- binds to

the original bridging O2-. Glu151 remains protonated throughout the optimization.

Structure 4-6b forms yet another interesting structure upon optimization. In this case,

both Glu151 and the OH- bridge retain their original protonation states. However, one of


the crystallographic water molecules donates a hydrogen to Asp179 while the remaining

OH- group complexes with His256.

Conclusions

Here, the initial efforts to detail the 1st-shell electronic structure, geometry, and

protonation states of the active site of the aminopeptidase from Aeromonas proteolytica

have been described. However, much work remains to be done before a complete picture of

the mechanism of peptide hydrolysis in AAP can be revealed.

One purpose of this study was to investigate the different protonation states of the

water/hydroxide bridge. To that end, many model active sites were created, each

containing one of the three possible bridging species. In each case, the bridge interacted

to some extent with the surrounding crystallographic waters in the active site. In some

cases, a hydrogen-bonding network was established which helped to stabilize the active

site structure. Some minimum energy structures also contained a peroxo- species that

resulted from an oxo-bridge interacting with an active site water molecule. Finally, the

DFT minimization calculations suggested that a hydroxide bridge was the most

energetically stable, supporting some mechanistic studies previously done by other

groups.

Another facet of the AAP study was to measure C-O bonds in the Zn-coordinating

carboxylate residues. Each carboxylate side chain that complexes a Zn(II) ion can do so

in either a monodentate or bidentate manner. Equivalent C-O bond lengths suggest that

the electronic character of the side chain is distributed evenly throughout the carboxylate

region and that each partially negative oxygen is interacting with a Zn cation. Residues in

which one C-O bond is noticeably longer than the other indicate a residue that binds a

metal cation in a monodentate fashion.









Finally, to gauge the importance of Glu151 as a proton acceptor in the initial stages

of AAP peptide hydrolysis, active sites were created with and without this residue. It was

shown through DFT minimization that, in species containing an OH- or H2O bridge

between the Zn ions, proton transfer occurred between the bridge and the previously

unprotonated Glu151 residue. This further supports the notion that the bridging species

must be either water or a hydroxide ion. Moreover, it suggests the necessity for an

unprotonated Glu151 before substrate binding can occur.

This work has only scratched the surface of the computational work that can be

performed on the AAP system. Other studies are currently underway to investigate the

effects of 2nd-shell mutations around the active site. That study hopes to identify other

key residues in active site geometry that may also participate in substrate binding or that

may be targets for drug interactions. Some structures have been resolved which contain

some small molecule substrate. Investigation of these structures could be used to better

determine the electronic structure of substrate binding and locate any substrate interaction

with 2nd-shell residues. Another study that is currently being performed is the full QM

minimization of the implicitly solvated protein. In this work, the native unbound protein

is being minimized along with structures containing 2nd-shell mutations. Comparing the

minimized structures of the native protein and mutant proteins will illuminate the effects

of the mutations on the overall structure of the active site.














CHAPTER 5
SURVEY OF DENSITY FUNCTIONAL THEORY METHODS

Introduction

The availability of large-scale parallel high-performance computer clusters is

facilitating the application of ab initio methods to large chemical systems such as

biomolecules. Density Functional Theory (DFT) methods are a sensible choice for use in

such calculations due to their relatively low expense compared to Hartree-Fock (HF) and

post-HF methods and for the array of specific functionals which can be employed.

However, when presented with a list of all of the DFT methods available, a scientist may

only see an alphabet soup. Choosing an appropriate functional and basis set can be a

daunting task, even for a seasoned computational chemist. One purpose of this survey is

to evaluate a host of widely used DFT methods so that members of the scientific

community at large can find which method is best suited to their needs.

Another way that the data presented in this study may be used is to quantify any

progress made by recent DFT methods. The number of methods available to

computational chemists has greatly increased over the last ten years and it seems that new

functionals are being introduced every month in the literature. This work allows for the

comparison of old "tried-and-true" methods to some of the newer functionals over a

broad sampling of molecular properties.

Ultimately, this survey, which comprised more than 150,000 individual

computational jobs, is the largest of its kind ever performed. Although some DFT

methods have been omitted, a fair sampling of five families of DFT functionals is









presented and evaluated. The end result is a useful reference guide for future research on

large scale organic and biomolecular systems using ab initio methods.

The next section in this chapter outlines the theory and development of ab initio

techniques in a general sense from wave function methods to the most recent density

functionals. Specific methods for calculating individual molecular properties are also

discussed in the next section. Other sections of this chapter provide in-depth analysis of

each facet of this work, addressing each molecular property in turn. The final section

recapitulates the entire study with general conclusions.

Methods

All of the calculations in this work were performed using Gaussian 03 Rev C.011

and version D.01 and the functionalities therein. More detailed property-specific

calculations are described below in individual sections. While the main focus of this work

is to evaluate DFT methods, Hartree-Fock and second-order perturbation (MP2) methods

are included for further comparison. A brief theoretical introduction to density functional

methods is given in this section along with some discussion on the basis sets used in this

work.

Schrödinger's equation can yield the exact energy of a system if the complete wave

function and Hamiltonian are employed. However, a complete wave function and

Hamiltonian are far too computationally expensive to be tractable and are difficult to

define for multi-electron systems. A series of approximations have been adopted to

simplify the Hamiltonian, thereby limiting the number of calculations that must be

performed on a system. A complete Hamiltonian for a system of N electrons and M nuclei takes the

form:









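\[
H = -\sum_{i=1}^{N} \frac{1}{2}\nabla_i^2 - \sum_{A=1}^{M} \frac{1}{2M_A}\nabla_A^2 - \sum_{i=1}^{N}\sum_{A=1}^{M} \frac{Z_A}{r_{iA}} + \sum_{i=1}^{N}\sum_{j>i}^{N} \frac{1}{r_{ij}} + \sum_{A=1}^{M}\sum_{B>A}^{M} \frac{Z_A Z_B}{R_{AB}} \tag{5-1}
\]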

The first term is the classical kinetic energy operator for the electrons. The third

term is the Coulombic term for electrostatic interactions between the electrons and the

nuclei, and the fourth term represents the charge repulsion between electrons. The second

and fifth terms deal with nuclear kinetic energy and charge interactions, respectively, and

are reduced to constants by the Born-Oppenheimer approximation which treats nuclei as

fixed point charges in a field of moving electrons. The reduced form of the Hamiltonian

after the B-O approximation is known as the electronic Hamiltonian of electrons moving

in a field of fixed nuclear point charges:

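\[
H_{\mathrm{elec}} = -\sum_{i=1}^{N} \frac{1}{2}\nabla_i^2 - \sum_{i=1}^{N}\sum_{A=1}^{M} \frac{Z_A}{r_{iA}} + \sum_{i=1}^{N}\sum_{j>i}^{N} \frac{1}{r_{ij}} \tag{5-2}
\]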

Following this approximation, the total energy of the system is the sum of the

electronic energy and the constant nuclear charge interaction energy, which is dependent

on the orientation of the nuclei to each other in space. A nuclear Hamiltonian can be used

to account for motion of the nuclei as well. This simply consists of the second term from

equation 5-1 and an added potential for nuclear motion.
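For fixed nuclei, the constant nuclear charge interaction energy is the fifth term of equation 5-1 evaluated at the chosen geometry. The short sketch below computes it in atomic units and adds it to an electronic energy; the function names are illustrative, and the H2 geometry (1.4 bohr) is used only as a familiar test case rather than a system from this work.

```python
import math


def nuclear_repulsion(nuclei):
    """Sum of Z_A * Z_B / R_AB over all unique pairs of nuclei.

    nuclei: list of (Z, (x, y, z)) with coordinates in bohr.
    Returns the repulsion energy in Hartree.
    """
    energy = 0.0
    for a in range(len(nuclei)):
        for b in range(a + 1, len(nuclei)):  # B > A, so each pair counts once
            z_a, r_a = nuclei[a]
            z_b, r_b = nuclei[b]
            energy += z_a * z_b / math.dist(r_a, r_b)
    return energy


def total_energy(electronic_energy, nuclei):
    """Born-Oppenheimer total energy: electronic part plus nuclear repulsion."""
    return electronic_energy + nuclear_repulsion(nuclei)


# H2 with a bond length of 1.4 bohr.
h2 = [(1, (0.0, 0.0, 0.0)), (1, (0.0, 0.0, 1.4))]
print(f"Nuclear repulsion of H2: {nuclear_repulsion(h2):.6f} Hartree")
```

Because this term depends only on the nuclear positions, it is computed once per geometry and is unaffected by the choice of wave function or density functional method.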

The B-O approximation and its associated Hamiltonian satisfactorily describe the

spatial parameters of the electron field. But to fully characterize an electron, spin must be

taken into account. The concept of spin is roughly derived from the Pauli Exclusion

Principle to ensure that no two electrons on an atom exist with the same energy or

quantum configuration. A common visualization is that of one spin-up and one spin-

down electron occupying a full orbital. In other words, one spatial orbital gives rise to








two unique spin orbitals. Electronic spin satisfies the notion of antisymmetry, which

prohibits the existence of two like electrons in the same orbital. A spin orbital, χ,

represents a complete picture of an electron both spatially and in terms of its spin.2
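The antisymmetry requirement can be illustrated for two electrons with a small numerical sketch. The spin orbitals below are hypothetical toy functions of a one-dimensional position and a spin label, chosen only to show that the determinant changes sign when the two electrons are exchanged and vanishes when both electrons are assigned the same spin orbital.

```python
import math


def slater_2e(chi_a, chi_b):
    """Normalized two-electron determinant built from two spin orbitals."""
    def psi(x1, x2):
        return (chi_a(x1) * chi_b(x2) - chi_b(x1) * chi_a(x2)) / math.sqrt(2.0)
    return psi


# Toy spin orbitals: x = (r, s), with r a 1-D position and s = +0.5 or -0.5.
def chi_up(x):
    r, s = x
    return math.exp(-r * r) if s == 0.5 else 0.0


def chi_down(x):
    r, s = x
    return math.exp(-r * r) if s == -0.5 else 0.0


psi = slater_2e(chi_up, chi_down)
x1 = (0.3, 0.5)    # electron 1: position 0.3, spin up
x2 = (-0.7, -0.5)  # electron 2: position -0.7, spin down

print(psi(x1, x2), psi(x2, x1))  # equal magnitude, opposite sign

# Two electrons placed in the same spin orbital give a vanishing wave function.
same_orbital = slater_2e(chi_up, chi_up)
print(same_orbital(x1, x2))
```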

A new approximation is made to deal with the fully represented electrons, allowing

for correct placement of the electrons into orbitals in a manner that satisfies the

antisymmetry rule. When an antisymmetric wave function (equation 5-3), comprised of

the spin orbitals of a ground state N-electron system, is operated upon by a Hamiltonian

the lowest possible energy is returned (equation 5-4).

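\[
|\Psi_0\rangle = |\chi_1 \chi_2 \cdots \chi_N\rangle \tag{5-3}
\]

\[
E_0 = \langle \Psi_0 | H_{\mathrm{elec}} | \Psi_0 \rangle \tag{5-4}
\]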

When E0 is minimized with respect to the spin orbitals of Ψ0, the Hartree-Fock

equation can be used to determine the optimal spin orbitals for the system:

\[
f(i)\,\chi(\mathbf{x}_i) = \varepsilon\,\chi(\mathbf{x}_i) \tag{5-5}
\]

This is the central tenet of the Hartree-Fock approximation and is the common

starting point for more accurate quantum chemical methods. The Fock operator, f(i), is a

one-electron operator and vHF(i) in equation 5-6 is the effective potential incident upon

electron i due to the other electrons in the system. In this representation, the many-body

problem of electron-electron interaction has been reduced to a one-electron problem as

electron-electron repulsion has been treated in an average manner.2

\[
f(i) = -\frac{1}{2}\nabla_i^2 - \sum_{A=1}^{M} \frac{Z_A}{r_{iA}} + v^{HF}(i) \tag{5-6}
\]