Parameterization of Semiempirical Quantum Mechanical Methods for the Prediction of Nuclear Magnetic Resonance Chemical S...

Permanent Link: http://ufdc.ufl.edu/UFE0024844/00001

Material Information

Title: Parameterization of Semiempirical Quantum Mechanical Methods for the Prediction of Nuclear Magnetic Resonance Chemical Shifts in Biologically Relevant Systems
Physical Description: 1 online resource (160 p.)
Language: english
Creator: Williams, Duane
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2009

Subjects

Subjects / Keywords: am1, mndo, nmr, parameterization, pm3, quantum, semiempirical
Chemistry -- Dissertations, Academic -- UF
Genre: Chemistry thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: The applicability of semiempirical quantum mechanical methods for the qualitative description of NMR chemical shifts in biomolecules is presented. Full QM chemical shift calculations are performed on protein systems and other biologically relevant compounds. Semiempirical QM Hamiltonians are used in conjunction with the linear-scaling divide-and-conquer approach to solving the self-consistent field equations. The first study investigates the artifacts that arise as a result of performing geometry optimizations at this level of theory on protein systems. Geometry optimizations were performed on a collection of 32 globular protein systems with a variety of secondary structures and a range in size from 54 to 99 amino acid residues. A detailed analysis of the structures is presented. Included is a comparison of the structures generated by performing geometry optimization in vacuum versus utilizing an implicit Poisson-Boltzmann solvation model. Among the important artifacts observed was an inability of the methods to maintain planarity for the planar side chains of several amino acids. In addition, the inability of the vacuum-minimized structures to mask the charge-charge interactions resulted in several unphysical artifacts, including proton transfer from positively to negatively charged groups. The subsequent studies focus on the development of new NMR-specific semiempirical parameters expressly geared towards the study of proteins and other biologically relevant compounds. New parameters are presented for the prediction of proton and carbon NMR chemical shifts for the AM1 Hamiltonian that were generated using a data set composed of globular proteins. NMR-specific fluorine parameters were also developed to augment the currently available MNDO-NMR parameter set. Detailed comparisons are made with DFT and empirical methods.
The current approach can be employed using semiempirical (AM1/PM3) geometries with good accuracy, and can be executed at a fraction of the cost of ab initio and DFT methods, providing an attractive option for computational NMR studies of much larger systems.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Duane Williams.
Thesis: Thesis (Ph.D.)--University of Florida, 2009.
Local: Adviser: Merz, Kenneth Malcolm.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2011-08-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2009
System ID: UFE0024844:00001


Full Text

PAGE 1

PARAMETERIZATION OF SEMIEMPIRICAL QUANTUM MECHANICAL METHODS FOR THE PREDICTION OF NUCLEAR MAGNETIC RESONANCE CHEMICAL SHIFTS IN BIOLOGICALLY RELEVANT SYSTEMS

By
DUANE E. WILLIAMS

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2009

PAGE 2

© 2009 Duane E. Williams

PAGE 3

To Sary and my Parents

PAGE 4

ACKNOWLEDGMENTS

There are many to whom I owe gratitude for the important role that they have played in helping me to earn this degree. For all of your help, support, patience, prayer, love and sacrifice, please know that I am deeply and sincerely grateful. Thank you Bing, Ken, Andrew, Adrian, my sister, brother, cousins, aunts and uncles. Thanks also to all of my other friends in the Merz group, QTP, Springhill Missionary Baptist Church, and at Penn State. Thank you Kennie for all that you have done throughout my graduate career. Special thanks are due to my mother, Martin, David, and Sary for all of your support.

PAGE 5

TABLE OF CONTENTS

(entry ...... page)

ACKNOWLEDGMENTS ...... 4
LIST OF TABLES ...... 7
LIST OF FIGURES ...... 9
ABSTRACT ...... 11

CHAPTER

1 INTRODUCTION ...... 13
2 THEORY AND METHODS ...... 18
  2.1 Theoretical and Computational Nuclear Magnetic Resonance ...... 18
    2.1.1 General NMR Theory ...... 18
    2.1.2 Computational Considerations ...... 20
  2.2 Basis Sets and QM Descriptions of Molecular Geometries ...... 23
  2.3 Hartree-Fock Theory ...... 28
  2.4 Self-Consistent Field Theory ...... 32
  2.5 The Semiempirical Approximations ...... 33
  2.6 Semiempirical NMR Calculations in the FPT-GIAO Approach ...... 38
  2.7 Divide and Conquer ...... 42
  2.8 Geometry Optimization ...... 43
  2.9 Semiempirical Parameter Optimization Using a Genetic Algorithm ...... 45
3 SEMIEMPIRICAL PROTEIN GEOMETRIES ...... 48
  3.1 Introduction ...... 48
  3.2 Methods ...... 50
    3.2.1 Experimental Data ...... 50
    3.2.2 Data Set Preparation ...... 52
  3.3 Results and Discussion ...... 53
    3.3.1 In Vacuo Optimization Artifacts ...... 53
    3.3.2 Effects on Bond Lengths ...... 57
    3.3.3 Effects on Bond Angles ...... 62
    3.3.4 Effects on Torsional Angles ...... 63
    3.3.5 Effects on Atomic Charges ...... 67
    3.3.6 Effects on Non-Bonded Atomic Contacts ...... 70
    3.3.7 Effects on Hydrogen Bonds ...... 72
  3.4 Conclusions ...... 73
4 ¹H AND ¹³C NMR PARAMETERIZATION OF AM1 FOR PROTEIN SYSTEMS ...... 75

PAGE 6

  4.1 Introduction ...... 75
  4.2 Methods ...... 77
    4.2.1 Parameterization ...... 77
    4.2.2 Experimental Data ...... 80
    4.2.3 Data Set Preparation ...... 86
  4.3 Results and Discussion ...... 87
    4.3.1 General Results ...... 87
    4.3.2 Drawbacks ...... 94
    4.3.3 AM1-NMR ...... 97
    4.3.4 AM1-NMR-C (Carbon) ...... 99
    4.3.5 AM1-NMR-H (Hydrogen) ...... 100
  4.4 Conclusions ...... 101
5 ¹⁹F NMR PARAMETERIZATION OF MNDO FOR BIOLOGICALLY RELEVANT SYSTEMS ...... 114
  5.1 Introduction ...... 114
  5.2 Methods ...... 115
    5.2.1 Parameterization ...... 115
    5.2.2 Experimental Data ...... 119
  5.3 Results and Discussion ...... 121
  5.4 Conclusions ...... 130
6 CONCLUSIONS ...... 144

LIST OF REFERENCES ...... 148
BIOGRAPHICAL SKETCH ...... 160

PAGE 7

LIST OF TABLES

(table ...... page)

2-1 Properties of some common NMR active nuclei ...... 19
2-2 Point charge distributions for s,p basis ...... 36
3-1 Protein data set ...... 51
3-2 Distribution of amino acids in protein data set ...... 52
3-3 AMBER atom types used in parm94 parameter set ...... 61
3-4 Percentage of torsional angles found in regions of the Ramachandran plot ...... 65
3-5 Average number of hydrogen bonds in protein structures ...... 65
3-6 Secondary structural content of proteins ...... 66
4-1 Comparison of standard versus NMR-optimized AM1 parameters ...... 77
4-2 Small molecule ¹H and ¹³C RMS errors using σref values optimized on the small molecule data set ...... 81
4-3 Small molecule ¹H and ¹³C RMS errors using σref values optimized on the protein data set ...... 82
4-4 Protein systems used in the parameterization ...... 83
4-5 Comparison of ¹H RMS errors (ppm) by protein ...... 90
4-6 Comparison of ¹³C RMS errors (ppm) by protein ...... 90
4-7 Protein ¹H and ¹³C RMS errors ...... 91
4-8 RMS errors of ¹H and ¹³C NMR chemical shifts for complete protein set ...... 94
4-9 Correlation of ¹H and ¹³C NMR chemical shifts for complete protein set ...... 94
4-10 Experimental and calculated chemical shifts for small molecule data set ...... 103
5-1 Comparison of standard and NMR-optimized MNDO parameters ...... 122
5-2 Comparison of errors associated with each method for 100 compounds using the shielding constant calculated for CFCl3 as the σref value in Equation 5-1 ...... 124
5-3 Comparison of errors associated with each method for 100 compounds using the average signed error as the σref value in Equation 5-1 ...... 125

PAGE 8

5-4 Comparison of NMR chemical shifts (ppm) grouped to facilitate analysis: fluorinated aliphatic chains ...... 132
5-5 Chains containing heteroatoms ...... 133
5-6 Aliphatic rings ...... 134
5-7 Non-aromatic double bonds ...... 136
5-8 Bicyclic compounds ...... 137
5-9 Five-member heterocycles ...... 138
5-10 Six-member heterocycles ...... 140
5-11 Benzene derivatives ...... 141

PAGE 9

LIST OF FIGURES

(figure ...... page)

2-1 Point charge configurations corresponding to monopole (q), dipole (μ), linear quadrupole (Qαα) and square quadrupole (Qαβ) moments ...... 37
2-2 Schematic diagram of a genetic algorithm ...... 47
3-1 Average heavy atom RMSD (no hydrogen atoms included) for minimized structures relative to crystal structures ...... 54
3-2 Example of proton transfer occurring between charged residues ...... 55
3-3 A more unphysical example of proton transfer in which the Cα from the amino acid backbone loses the hydrogen atom to the carboxylate group of the same amino acid ...... 55
3-4 Chemical bond formation in 1HOE vacuum minimized structure using the PM3 Hamiltonian ...... 56
3-5 A spurious sulfur bond formed between two disulphide bridges in a vacuum minimized PM3 structure ...... 57
3-6 Average bond lengths in protein structures for some common bonds involving heavy atoms ...... 58
3-7 Average bond lengths observed for common bonds involving hydrogen ...... 59
3-8 Average bond angles observed for common bonds ...... 59
3-9 Percentage of residues exhibiting out-of-plane side-chain conformations ...... 66
3-10 Loss of planarity observed in the tyrosine residues of the vacuum minimized structures due to charge-charge interactions ...... 67
3-11 Average lysine charges found in protein systems ...... 68
3-12 Typical salt bridge formation observed when going from the crystal structure (left) to the vacuum minimized structure (right) ...... 69
3-13 Vacuum PM3-CM2 charges before and after minimization for side chain atoms in Figure 3-12 ...... 69
3-14 Energy decomposition using the MM force field parm94 in AMBER ...... 71
4-1 Distribution of amino acids in protein training set ...... 84
4-2 Distribution of amino acids in complete protein data set (black) compared to test ...... 84

PAGE 10

4-3 ¹³C chemical shift correlation for small molecule data set using AM1-NMR parameters ...... 93
4-4 ¹H chemical shift correlation for small molecule data set using AM1-NMR parameters ...... 93
4-5 Decomposition of ¹³C NMR R² value for PDB ID 2JN0 using AM1-NMR-C parameters ...... 96
4-6 Decomposition of ¹³C NMR R² value for PDB ID 2JN0 with SHIFTX ...... 97
5-1 Histogram of the distribution of experimental chemical shifts in the full data set (training set and test set of 123 unique chemical shifts for 100 compounds) ...... 120
5-2 Box and whiskers plot of the average unsigned error for ¹⁹F NMR chemical shifts ...... 128
5-3 Correlation between experimental and calculated (MNDO-NMR//AM1) ¹⁹F chemical shifts using the newly generated fluorine parameters ...... 129

PAGE 11

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

PARAMETERIZATION OF SEMIEMPIRICAL QUANTUM MECHANICAL METHODS FOR THE PREDICTION OF NUCLEAR MAGNETIC RESONANCE CHEMICAL SHIFTS IN BIOLOGICALLY RELEVANT SYSTEMS

By
Duane E. Williams
August 2009

Chair: Kenneth M. Merz, Jr.
Major: Chemistry

The applicability of semiempirical quantum mechanical methods for the qualitative description of NMR chemical shifts in biomolecules is presented. Full QM chemical shift calculations are performed on protein systems and other biologically relevant compounds. Semiempirical QM Hamiltonians are used in conjunction with the linear-scaling divide-and-conquer approach to solving the self-consistent field equations. The first study investigates the artifacts that arise as a result of performing geometry optimizations at this level of theory on protein systems. Geometry optimizations were performed on a collection of 32 globular protein systems with a variety of secondary structures and a range in size from 54 to 99 amino acid residues. A detailed analysis of the structures is presented. Included is a comparison of the structures generated by performing geometry optimization in vacuum versus utilizing an implicit Poisson-Boltzmann solvation model. Among the important artifacts observed was an inability of the methods to maintain planarity for the planar side chains of several amino acids. In addition, the inability of the vacuum-minimized structures to mask the charge-charge interactions resulted in several unphysical artifacts, including proton transfer from positively to negatively charged groups.

PAGE 12

The subsequent studies focus on the development of new NMR-specific semiempirical parameters expressly geared towards the study of proteins and other biologically relevant compounds. New parameters are presented for the prediction of proton and carbon NMR chemical shifts for the AM1 Hamiltonian that were generated using a data set composed of globular proteins. NMR-specific fluorine parameters were also developed to augment the currently available MNDO-NMR parameter set. Detailed comparisons are made with DFT and empirical methods. The current approach can be employed using semiempirical (AM1/PM3) geometries with good accuracy and can be executed at a fraction of the cost of ab initio and DFT methods, providing an attractive option for computational NMR studies of much larger systems.

PAGE 13

CHAPTER 1
INTRODUCTION

Nuclear Magnetic Resonance (NMR) spectroscopy is one of the premier instrumental procedures for examining the structure and dynamics of biologically relevant molecules ranging in size from small organic compounds to large bio-macromolecules [1,2]. NMR takes advantage of the natural properties of certain isotopes to reveal information about an atom's chemical environment through its interaction with electromagnetic radiation. The technique can be applied broadly to acquire a range of important data for biomolecules because it requires relatively mild conditions [1]. By examining the effect of low-energy radiation on certain nuclei, NMR experiments yield a range of information about the molecule of interest. Among the most important pieces of data that can be gathered in an NMR experiment are the chemical shifts of the atoms in the molecule. Chemical shifts are a critical component in the determination of the three-dimensional structures of biological macromolecules by NMR spectroscopy. One example of the utility of chemical shifts is the prediction of protein secondary structure using the chemical shift data for Cα and Cβ atoms in protein systems [3–5]. More recently, the three-dimensional structures of protein systems have been elucidated solely from chemical shift data in both solution and solid-state NMR [6–8]. Accurate theoretical chemical shift prediction can facilitate various types of experimental NMR studies in proteins as well as other biological systems [9]. The ability to make routine, quick, and accurate predictions of fluorine NMR chemical shifts for biological molecules (ranging in size from a few hundred atoms to many thousands) can aid in structure elucidation, give insight into the binding modes of ligands in proteins, and add valuable information on

PAGE 14

dynamics of biological systems. The ideal method of calculating chemical shifts in the biological environment should be versatile and sufficiently fast to be applied to large systems. There are three main classes of computational methods for predicting NMR chemical shifts in large biomolecules: quantum mechanical, classical, and empirical. There is often significant overlap between classical and empirical approaches. A general discussion of the classical/empirical methods is presented in the Theory and Methods section of this dissertation. The quantum mechanical prediction of NMR chemical shifts is the focus of this dissertation. In particular, the focus is on improving the accuracy of approximate QM methods for the prediction of chemical shifts of large proteins and other biomolecules. A brief discussion of the reasons for using a QM approach follows.

Modern QM approaches offer several advantages over empirical or classically based approaches for the prediction of NMR chemical shifts. One advantage is that the electrostatic representation is more accurate because environmental, conformational, polarization and charge transfer effects are explicitly included, in contrast to the process used in simplified models. A second advantage is that the ways in which quantum chemical methods can be improved have been thoroughly documented (e.g., inclusion of correlation and improvements in the basis set). In principle, QM calculations are able to include each of the effects that influence a given chemical shift. This lends QM methods a beneficial quality that empirical methods lack: that of being more general and applicable to the wide variety of organic molecules of interest as ligands in biochemical studies. Unfortunately, the expense of ab initio and Density Functional Theory (DFT) computation prohibits their routine application to protein systems containing thousands of atoms [10].
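As a small illustration of what a QM chemical shift prediction is ultimately compared against, the following is a minimal sketch, assuming the widely used approximation that a chemical shift is the reference shielding minus the computed isotropic shielding (δ ≈ σref − σ, in ppm) and that accuracy is summarized as an RMS error against experiment. The function names and all numerical values are hypothetical and are chosen only for illustration; they are not results from this dissertation.

```python
import math


def shieldings_to_shifts(shieldings, sigma_ref):
    """Convert isotropic shieldings (ppm) to chemical shifts (ppm).

    Assumes the common approximation delta = sigma_ref - sigma.
    """
    return [sigma_ref - sigma for sigma in shieldings]


def rms_error(calculated, experimental):
    """Root-mean-square error (ppm) between calculated and experimental shifts."""
    if len(calculated) != len(experimental):
        raise ValueError("calculated and experimental must have the same length")
    if not calculated:
        raise ValueError("at least one chemical shift is required")
    squared = sum((c - e) ** 2 for c, e in zip(calculated, experimental))
    return math.sqrt(squared / len(calculated))


# Hypothetical proton shieldings and a hypothetical reference shielding (ppm).
sigma_ref = 31.0
computed_shieldings = [23.0, 27.0, 29.5]
experimental_shifts = [8.2, 4.1, 1.2]  # hypothetical "experimental" values

predicted_shifts = shieldings_to_shifts(computed_shieldings, sigma_ref)
print(predicted_shifts)
print(rms_error(predicted_shifts, experimental_shifts))
```

In this sketch the reference shielding is a single number per nucleus type, so treating it as an adjustable value (rather than the shielding computed for a reference compound) simply shifts every prediction by a constant and can remove a systematic offset from the RMS error.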

PAGE 15

We are currently unaware of any ab initio techniques that have been shown to routinely and quickly calculate the NMR chemical shifts of large systems with significant non-bonded interactions (such as proteins), which are of interest to NMR spectroscopists. Semiempirical methods have previously been investigated for their promise in predicting chemical shifts for small organic molecules [11,12]. The standard semiempirical parameters were not developed to reproduce NMR chemical shifts. It has been found that, using these standard parameters, the semiempirical MNDO [13] Hamiltonian underestimates the size of the gap between the atomic HOMO and LUMO and therefore systematically underestimates the atomic excitation energies. This results in an overestimation of the slight paramagnetic effect and consequently renders the method incapable of reproducing chemical shifts. The finite perturbation theory (FPT) developed in the framework of MNDO using gauge-including atomic orbitals (GIAO) has shown promising results for the calculation of ¹H and ¹³C chemical shifts [11,12]. The use of the linear-scaling Divide-and-Conquer (DC) algorithm for the diagonalization of the Fock matrix in conjunction with the finite perturbation theory has expanded the purview of this approach to handle much larger systems [14,15]. This method has recently been implemented using NMR-specific parameters developed to reproduce ¹H, ¹³C, ¹⁴N, ¹⁵N and ¹⁷O chemical shifts and has been shown to give fast and accurate results that can be applied to large protein-ligand complexes [14–16].

The focus of this dissertation is generating new NMR-specific semiempirical parameters with the aim of improving the predictive ability for a subset of biological compounds. The second chapter of this dissertation summarizes the theoretical framework for these calculations: the approximations to the QM Hamiltonian that enable calculations on large systems to be carried out routinely.
Before discussing the calculations, Chapter 2 briefly
summarizes the basis for the atomic property of NMR activity, the physical principles on which the methods are based. In order to more fully explain the semiempirical approximations, this chapter also briefly outlines Hartree-Fock theory and the SCF procedure, highlighting the major differences between ab initio and the semiempirical approximate QM methods. Many of the topics discussed in this chapter represent vast and complex areas of science and it is not feasible to discuss them thoroughly here; therefore, they are briefly outlined in order to give a general framework for the remainder of the dissertation.

Because protein systems are an important target group for these NMR investigations, the third chapter discusses some of the geometric artifacts that result from applying the semiempirical approximations to protein systems. Full QM geometry optimizations are carried out and comparisons are made between two of the more common semiempirical approaches used to investigate protein systems. Additionally, this chapter investigates the importance of solvent in geometry optimizations of protein systems by including the results of full QM calculations carried out in a Poisson-Boltzmann implicit solvent model.

In the fourth chapter the parameterization of the semiempirical AM1 [17] Hamiltonian to reproduce 1H and 13C chemical shifts in protein systems is discussed. These parameters were developed on a training set of 5 protein systems and tested on a set of 10 protein systems. A detailed analysis of the new protein-specific NMR parameters for AM1 is presented. The AM1 Hamiltonian was chosen rather than MNDO because AM1 has a qualitative ability to account for the energetics associated with hydrogen bonding interactions that are particularly important in protein systems.

Chapter five details the parameterization of MNDO to reproduce 19F chemical shifts in a variety of biomolecules.
Although fluorine is not endogenous to protein systems, the use of
fluorine NMR in the study of biological systems is a significant area of research, particularly in drug design [18]. The MNDO Hamiltonian was used because it has already been parameterized to reproduce 1H, 13C, 14/15N, and 17O chemical shifts [12]. Therefore, the new fluorine parameters were developed to augment that general set of NMR parameters.

The final chapter of this dissertation briefly reviews a few of the more important results found in the present work. More detailed discussions of the key points can be found in the conclusion sections of the individual chapters. It is hoped that this dissertation sheds light on the promise as well as the limitations of semiempirical QM methods in the NDDO [19] approximation as applied to the study of large biomolecules in general and proteins in particular.

CHAPTER 2
THEORY AND METHODS

2.1 Theoretical and Computational Nuclear Magnetic Resonance

2.1.1 General NMR Theory

Within the nuclei of atoms, individual protons and neutrons exhibit a magnetic field that is the result of a quantum mechanical property called spin. Nuclei with unpaired protons or neutrons possess a net magnetic dipole moment and a corresponding spin angular momentum, represented by $\mu$ and $S$, respectively. It is useful to relate these terms through their projection onto an arbitrary z-axis:

$$\mu_z = \gamma S_z \tag{2-1}$$

where $\gamma$, the gyromagnetic ratio, is a characteristic of the nucleus. Quantum mechanics dictates that the spin angular momentum is quantized such that $S$ may take zero, integer or half-integer values and $S_z$ may take $2S+1$ different values within the range of $-S$ to $S$. The applications here involve spin-1/2 systems, for which there are two spin states; by convention these states are referred to as spin up and spin down. In the absence of an externally applied magnetic field the spin states are degenerate and there is no preference for the direction of alignment of the individual nuclear magnetic moments. Applying an external magnetic field ($B$) to the system resolves this degeneracy by inducing a net alignment with the magnetic field; by convention, the field is placed on the z-axis and the lower energy orientation is the spin up state. Table 2-1 contains the values of $\gamma$ as well as other properties for some common spin-1/2 nuclei used in NMR studies.
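The quantization rule and Equation 2-1 can be illustrated with a short Python sketch. It is not part of the original work: the function names are hypothetical, and it assumes the standard Zeeman relation $E = -\mu_z B$, which is not written out above. Under that assumption, adjacent $S_z$ levels are separated by $\gamma \hbar B = h(\gamma/2\pi)B$. The value 42.58 MHz/T used in the usage line is the $\gamma/2\pi$ of 1H.

```python
from fractions import Fraction

PLANCK_H = 6.62607015e-34  # Planck constant in J s (exact SI value)


def spin_projections(spin):
    """Return the 2S+1 allowed S_z values (in units of hbar), from -S to +S."""
    s = Fraction(spin)
    if s < 0 or (2 * s).denominator != 1:
        raise ValueError("spin must be a non-negative integer or half-integer")
    return [-s + k for k in range(int(2 * s) + 1)]


def zeeman_splitting(gamma_over_2pi_mhz_per_t, field_tesla):
    """Gap between adjacent S_z levels as (energy in J, frequency in MHz).

    Assumes E = -mu_z * B with mu_z = gamma * S_z, so the gap is
    h * (gamma / 2 pi) * B.
    """
    freq_mhz = gamma_over_2pi_mhz_per_t * field_tesla
    return PLANCK_H * freq_mhz * 1.0e6, freq_mhz


# Usage: the two states of a spin-1/2 nucleus and the 1H splitting at 1 T.
states = spin_projections(Fraction(1, 2))
delta_e, freq = zeeman_splitting(42.58, 1.0)
```

For a spin-1/2 nucleus the list contains exactly the two projections referred to in the text as spin down and spin up.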

Table 2-1. Properties of some common NMR active nuclei.

Isotope   Natural isotopic   Unpaired   Unpaired   gamma/2pi         Chemical shift
          abundance (%)      protons    neutrons   (MHz/T)* [20]     range (ppm)**
1H        99.99              1          0          42.58             13
13C       1.11               0          1          10.71             200
15N       0.37               1          0          4.32              900
19F       100.00             1          0          40.08             700
31P       100.00             1          0          17.25             430

*Data correspond to an applied field of 10^4 gauss = 1 Tesla.
**The chemical shift range is for the majority of compounds. Occasionally signals will be observed outside this range.

The difference in energy between two neighboring spin states is proportional to the strength of the externally applied field:

$$\Delta E = \hbar \gamma_N B = h\nu \tag{2-2}$$

where

$$\nu = \frac{\gamma_N B}{2\pi} \quad \text{and} \quad \hbar = \frac{h}{2\pi} \tag{2-3}$$

Here $h$ is Planck's constant and the subscript N indicates that the gyromagnetic ratio is a characteristic of the nucleus. It is possible to promote a transition between these spin states using an oscillating magnetic field of the appropriate resonance frequency, $\nu$. The frequency of radiation that is required to promote this transition from the lower energy state to the higher energy state is referred to as the resonance or Larmor frequency and can be precisely determined by Equation 2-3. Equation 2-3 implies that the frequency used to promote the transition depends only upon the atom and the externally applied field. However, different atoms experience slightly different fields locally depending on their functional group as well as neighboring non-bonded atoms. The result is that each atom absorbs the radiation at a very specific frequency that is characteristic of its local environment:

$$\Delta E = \hbar \gamma_N (1 - \sigma_N) B = h\nu \tag{2-4}$$

where $\sigma_N$ is the chemical shielding constant, which is characteristic not only of the kind of nucleus but also of its chemical environment. The chemical shift corresponds to the change in resonance frequency, $\nu$, that results from the difference between Equations 2-2 and 2-4. The evaluation of the chemical shifts is the focus of the work presented here because they reveal very particular information regarding the bonding and chemical environment of an atom, and consequently the molecular structure. For many very small organic molecules the NMR spectra consist of a relatively uncomplicated chemical shift pattern that can be interpreted in a straightforward manner. As molecules become more complex, interpreting the chemical shift pattern becomes a more exacting task [1]. Because of the importance of these properties and the uncertainty that continues in their measurements, theoretical prediction to assist in the interpretation of NMR spectra of bio-macromolecules continues to be a very active area of computational research [21].

The determination of protein structures by solution NMR is a multi-step process. This process generally consists of sample preparation, data acquisition, peak-picking, resonance assignment, the collection of structural restraints and, finally, structure calculation and structure refinement. Depending on the speed of the method, an accurate computational prediction of the chemical shifts may have a significant impact on the resonance assignments and structure refinement. Various classical and empirical approaches to chemical shift prediction are currently available to assist with structure refinement [3,22-26].

2.1.2 Computational Considerations

The description of the QM calculation of NMR chemical shifts was given in the early 1950s [27,28]. However, the practicality of carrying out these calculations computationally for relevant systems has proven to be a non-trivial task. In the past two decades advances in the
approach to calculating these quantities, as well as advances in the computational resources, have made chemical shift calculations on small organic molecules feasible with correlated ab initio procedures [29-38].

One of the more significant challenges faced in the early QM calculations of the chemical shifts centered around the vector potential used to describe the effect of the magnetic field on the nucleus. The use of approximate wave functions to describe the atomic orbitals does not guarantee invariance to the gauge of the vector potential [39]. Therefore, if the origin of the gauge were chosen to be x=0, y=0, z=0 in the Cartesian frame, the chemical shielding constant would depend upon the location of the nucleus in the Cartesian frame. This led to arbitrary results that could not easily be compared to experiments or other calculations. The most widely accepted remedy is the use of Gauge Including Atomic Orbitals (GIAOs) [40,41] in the implementation developed by Wolinski, Hinton and Pulay [42]. This is the strategy implemented in the work presented here. Early attempts to implement this approach were unsuccessful due to a variety of technical problems [41,43-45]. Various alternative approaches to ensure gauge invariance were not as widely used, due in large part to the low extensibility of the methods.

Other problems encountered in the QM calculation of chemical shifts included the dependence of the results on electron correlation and on the choice of a basis set [30]. Because electron correlation was found to affect the quality of the results, various post-Hartree-Fock methods have been investigated for their ability to reliably predict the chemical shifts [30,46,47]. Overwhelmingly, the computational cost of performing these calculations has limited their usefulness for the systems of interest here.
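The gauge-origin problem described above can be illustrated with a small Python sketch. It is not taken from the original work and assumes the common textbook choice of vector potential for a uniform field, $A(r) = \tfrac{1}{2} B \times (r - R_O)$, where $R_O$ is the gauge origin; the function names are hypothetical. Moving the origin changes $A$ by a vector that is the same at every point, so the field $B = \nabla \times A$ is unchanged, yet the operator entering an approximate calculation is different. This is why results obtained with a finite basis can depend on the origin.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def vector_potential(b_field, r, origin):
    """A(r) = 1/2 * B x (r - R_O) for a uniform field B and gauge origin R_O."""
    shifted = tuple(ri - oi for ri, oi in zip(r, origin))
    return tuple(0.5 * c for c in cross(b_field, shifted))


def gauge_shift(b_field, origin_a, origin_b):
    """A_a(r) - A_b(r) = 1/2 * B x (R_b - R_a); note that r does not appear."""
    delta = tuple(ob - oa for oa, ob in zip(origin_a, origin_b))
    return tuple(0.5 * c for c in cross(b_field, delta))


# Usage: compare two gauge origins at an arbitrary point in space.
field = (0.0, 0.0, 1.0)
point = (0.3, -1.2, 2.0)
a_first = vector_potential(field, point, (0.0, 0.0, 0.0))
a_second = vector_potential(field, point, (1.0, 2.0, 3.0))
```

The two potentials differ at `point` by exactly `gauge_shift(field, (0, 0, 0), (1, 2, 3))`, whatever point is chosen.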
The implicit inclusion of electron correlation and the speed of density functional theory [48,49] methods have made them a focus of a large number of theoretical NMR investigations [30,50-52].

These studies have shown that in some instances DFT methods are able to outperform MP2 calculations in the prediction of the chemical shifts [53]. However, in addition to being dependent upon the choice of functional, DFT performance is also quite dependent upon the choice of basis set [30].

Empirical approaches have been very successful in the prediction of chemical shifts for protein systems [3,25,26,54-57]. These methods generally benefit from the use of extensive parameterization, even though the actual methods of parameterization may vary. One successful strategy used in these programs has been to employ databases of pre-calculated chemical shift values for the different atom types, then modify these values based on classically calculated quantities for the evaluation of ring current, electric field, hydrogen bonding and solvent effects [55]. As illustrated in Chapter 4, empirical approaches currently offer accurate predictions of chemical shifts for very large systems (several thousand atoms) and do so at a cost that is orders of magnitude smaller than QM approaches. These methods typically require less than a few seconds to determine the chemical shifts for the protein system.

QM methods appropriately recognize the importance of accurate bond lengths, angles and torsions in the NMR calculation and, as a consequence, these calculations must often be run after an initial geometry optimization at an appropriate level of theory. This adds to the cost of the calculation because QM geometry optimization starting from a poor initial structure is more computationally expensive than the NMR calculation itself. Some of the empirical methods can be employed without this step because they are not sensitive to the effect of these geometric details on the chemical shift [55]. With the combination of speed and accuracy offered by this approach, empirical calculations are a very attractive option for NMR studies of very large systems. However,
because protein systems generally comprise only a small number of functional groups, it is easier to generate parameters for unbound protein systems than for a large variety of organic ligands. Empirical methods cannot easily be applied to protein-ligand complexes because parameterization that would account for the wide variety of ligand structures is significantly more challenging.

The semiempirical QM approach to the calculation of NMR chemical shifts is attractive because its computational cost is significantly less than that of ab initio and DFT methods and its flexibility enables it to be used for a large variety of compounds, including protein-ligand complexes. Semiempirical methods are of particular interest here because a tested linear-scaling implementation is employed, enabling their routine use on very large systems. The studies that follow concern the generation of parameters for the semiempirical methods to improve their accuracy for the evaluation of chemical shifts for large protein systems. If the error in semiempirical methods could be significantly reduced for relevant systems, they would be a very attractive option for a range of studies. As illustrated in Chapter 5, the creation of new NMR-specific parameters enables the semiempirical methods to achieve accuracy comparable to the more computationally expensive DFT methods.

2.2 Basis Sets and QM Descriptions of Molecular Geometries

Before addressing the particular method of calculating NMR chemical shifts used in this text, it is first necessary to discuss some of the basic theory that underlies electronic structure calculations. This section is aimed at developing a framework on which to discuss the evaluation of the molecular energies in the Hartree-Fock approximation. Hartree-Fock theory is then presented to serve as a backdrop for the semiempirical methods, the methods through which the NMR chemical shifts are calculated.
All of the discussions of quantum mechanics in this dissertation assume certain common approximations. Two of these approximations can be seen
in the use of the time-independent Schrödinger equation and the omission of relativistic effects (the relativistic nature of spin is not discussed). Other approximations will be presented as they become relevant.

The magnetic resonance of nuclei is evaluated quantum mechanically with respect to the energy of the system. In order to evaluate the energy for any molecular system it is first necessary to establish the coordinates of the system. These can be either user specified or adopted from a previous calculation, as is the case with geometry optimization, which will be discussed in greater detail in a later chapter. The system is then described in terms of atomic orbitals, and ultimately molecular orbitals, that are generally centered on the Cartesian coordinates specified for the nuclei.

At the heart of quantum mechanical procedures is the ability to account for the distribution of electrons in a molecule. Basis functions are employed for this purpose. Basis functions, represented by $\phi(r)$ or simply $\phi$, are functions used to describe a quantum mechanical wave function. Their moduli squared describe the spatial distribution of electrons as a function of their distance from the nucleus. The two common types of basis functions are Slater Type Orbitals [58] (STOs) and Gaussian Type Orbitals [59] (GTOs). The functional forms are shown below:

$$\phi_{STO} = N r^{n-1} e^{-\alpha r} Y_l^m \tag{2-5}$$

$$\phi_{GTO} = N x^i y^j z^k e^{-\alpha r^2} \tag{2-6}$$

where $Y_l^m$ is the complex spherical harmonic, $n$ is the orbital's principal quantum number and $r$ is the distance of the electron from the nucleus (cf. $r$ and $r_{iA}$ presented later). The normalization constant, $N$, ensures that integration of the probability density over all space is unity. The orbital exponent, $\alpha$, is a parameter that is generally fit to reproduce experimental quantities (cf. $\alpha$ and $\zeta$ presented later). It determines the radial extent or diffuse character of the orbital. These values
for $N$ and $\alpha$ are specific to the basis set and are constant throughout the calculations that solve for the energy of the system. The fact that even ab initio methods use these types of basis sets highlights an important fact: parameterization and data set preparation and validation are truly central components of the field of quantum chemistry today.

In the GTOs, x, y and z are the Cartesian coordinates and their exponents i, j and k enable an appropriate description of the different types of orbitals. For example, i+j+k = 0, 1, and 2 represent an s-type, p-type, and d-type Gaussian orbital, respectively. The functional forms of these two types of orbitals differ in the exponential term, resulting in very consequential practical differences. Two significant differences are that GTOs do not have a cusp at r=0 and that they approach zero too rapidly as r increases. Therefore, STOs yield a more physically accurate description of atomic orbitals. However, it is simpler to manipulate GTOs because the product of two Gaussians is a third Gaussian (the Gaussian Product Rule) and because the integrals of Gaussians can be evaluated in a very generic fashion. Therefore, for computational purposes, GTOs have become the more frequently applied choice [60].

In order to address the shortcomings of using GTOs, Slater functions are approximated as a linear combination of GTOs with expansion coefficients optimized to best reproduce the Slater function. Each of the Gaussian terms used to describe the atomic orbital is referred to as a primitive. The simplest form of this linear combination of GTOs that is frequently used in ab initio calculations is the Slater Type Orbital represented as a linear combination of 3 Gaussians, hence the name STO-3G. This is commonly referred to as a minimal basis set because it describes each occupied atomic orbital using just one basis function.
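The differences between the two functional forms, and the Gaussian Product Rule, can be checked numerically with the following Python sketch. It is an illustration only, not code from the original work: the function names are hypothetical, and the three exponents and coefficients are the widely tabulated STO-3G fit to a 1s Slater function with unit exponent, quoted here as an assumption.

```python
import math

# Assumed values: the commonly tabulated STO-3G fit to a 1s Slater function
# with exponent 1.0, as (Gaussian exponent, expansion coefficient) pairs.
STO3G_1S = [(2.227660584, 0.1543289673),
            (0.4057711562, 0.5353281423),
            (0.1098175104, 0.4446345422)]


def slater_1s(r, zeta=1.0):
    """Normalized 1s Slater function: (zeta^3 / pi)^(1/2) * exp(-zeta * r)."""
    return math.sqrt(zeta ** 3 / math.pi) * math.exp(-zeta * r)


def gaussian_s(r, alpha):
    """Normalized s-type primitive: (2 alpha / pi)^(3/4) * exp(-alpha * r^2)."""
    return (2.0 * alpha / math.pi) ** 0.75 * math.exp(-alpha * r * r)


def sto3g_1s(r):
    """Three-primitive Gaussian approximation to the 1s Slater function."""
    return sum(c * gaussian_s(r, a) for a, c in STO3G_1S)


def gaussian_product(alpha, a, beta, b):
    """Gaussian Product Rule in one dimension.

    exp(-alpha (x-a)^2) * exp(-beta (x-b)^2) = K * exp(-gamma (x-p)^2).
    Returns (K, p, gamma).
    """
    gamma = alpha + beta
    p = (alpha * a + beta * b) / gamma
    k = math.exp(-alpha * beta / gamma * (a - b) ** 2)
    return k, p, gamma


# Usage: compare the two forms at the nucleus and far from it.
at_nucleus = (slater_1s(0.0), sto3g_1s(0.0))
far_away = (slater_1s(10.0), sto3g_1s(10.0))
```

In this sketch the Gaussian expansion falls below the Slater function both at the nucleus, where the cusp is missing, and at large r, where the Gaussian tail decays too quickly, while agreeing closely at intermediate distances.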
The minimal basis set has insufficient flexibility to accommodate charged atoms or atoms in which the electron density is not spherically symmetric. This basis set has been found to do fairly well for reproducing geometries
of small organics, but not as well for systems that require a more subtle description of the asymmetric electron density around an atom [60].

There are four very common improvements upon the minimal basis set. These enhancements increase the computational expense to varying degrees based upon the size of the system and the Hamiltonian. The first improvement is to give more flexibility to the basis set by describing valence electrons differently from core electrons. Using split-valence basis sets allows for different orbital exponents to describe the core electrons and the valence electrons. The shape of valence orbitals changes (e.g., upon bonding, valence orbital size may increase along the bonding axis) and can benefit from a slightly different description from that used for the orbitals of the core electrons. Because this allows for the use of more than one orbital exponent, which is represented by the Greek symbol $\zeta$, these basis sets are often called double-zeta or triple-zeta basis sets. Note that this $\zeta$ is analogous to the orbital exponents represented by $\alpha$ previously in Equation 2-6. The second improvement upon the minimal basis set is to add diffuse functions, that is, Gaussian type functions with slightly reduced orbital exponent values for chosen atoms. Diffuse functions represent loosely bound electrons and are particularly useful to represent anions. The third improvement is to add polarization functions by increasing the angular momentum via altering the values of i, j and k, the exponents of x, y and z in Equation 2-6. Polarization functions allow for better representations of some bonding situations. The fourth improvement is to use correlation consistent basis sets, which are designed to account for electron correlation and are useful in post-Hartree-Fock ab initio calculations.
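The effect of these improvements on the size of the basis can be sketched in Python by enumerating the exponents i, j and k of Equation 2-6. The shell layouts below are hypothetical, generic examples for an atom such as carbon and do not correspond to any specific published basis set; the function names are likewise illustrative.

```python
def cartesian_exponents(l):
    """All (i, j, k) with i + j + k = l: the Cartesian Gaussians of one shell."""
    return [(i, j, l - i - j)
            for i in range(l, -1, -1)
            for j in range(l - i, -1, -1)]


def count_functions(shells):
    """Total number of basis functions for a list of shell angular momenta."""
    return sum(len(cartesian_exponents(l)) for l in shells)


# Hypothetical shell layouts (0 = s, 1 = p, 2 = d) for an atom such as carbon.
minimal = [0, 0, 1]                  # 1s, 2s, 2p
split_valence = [0, 0, 1, 0, 1]      # core 1s plus two sets of valence s and p
polarized = split_valence + [2]      # one added set of d functions
diffuse = polarized + [0, 1]         # one added set of diffuse s and p functions

sizes = {name: count_functions(shells)
         for name, shells in [("minimal", minimal),
                              ("split valence", split_valence),
                              ("polarized", polarized),
                              ("diffuse", diffuse)]}
```

Note that i+j+k = 2 produces six Cartesian d functions; many programs use five spherical combinations instead, which would lower the last two counts by one.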

With the one-electron spatial orbitals defined, it is necessary to look at the molecular orbitals. The shapes of the molecular orbitals are conveniently represented as a linear combination of atomic orbitals (LCAO):

$$\psi_i^{\alpha} = \sum_{\mu} C_{\mu i}^{\alpha} \phi_{\mu} \tag{2-7}$$

where $\psi_i$ is the ith molecular orbital, $\alpha$ is the spin of the electron and $\phi_{\mu}$ is the atomic orbital. Solving for the molecular orbital coefficients, $C$, is the critical step in the energy calculations because it reveals the extent to which each atomic orbital contributes to each molecular orbital. Thus, in the LCAO approximation, the molecular orbital coefficients describe the electron density of the entire system in terms of GTOs and enable the calculation of the energy.

In some portions of the following discussion it is useful to define the spin of the electron in order to specify that one electron is being discussed at a time. The systems will be defined in terms of $\alpha$ spin orbitals. Analogous equations exist for $\beta$ spin orbitals. The charge density of all electrons in $\alpha$ spin occupied orbitals is defined as the square of the modulus of the molecular orbitals, as shown below:

$$\rho^{\alpha} = \sum_{i}^{occ} \left| \psi_i^{\alpha} \right|^2 = \sum_{i}^{occ} \psi_i^{\alpha *} \psi_i^{\alpha} \tag{2-8}$$

Inserting Equation 2-7 into Equation 2-8 yields:

$$\rho^{\alpha} = \sum_{i}^{occ} \sum_{\nu} C_{\nu i}^{\alpha *} \phi_{\nu}^{*} \sum_{\mu} C_{\mu i}^{\alpha} \phi_{\mu} \tag{2-9}$$

$$\rho^{\alpha} = \sum_{\mu} \sum_{\nu} \left( \sum_{i}^{occ} C_{\mu i}^{\alpha} C_{\nu i}^{\alpha *} \right) \phi_{\mu} \phi_{\nu}^{*} \tag{2-10}$$

$$\rho^{\alpha} = \sum_{\mu} \sum_{\nu} P_{\mu\nu}^{\alpha} \phi_{\mu} \phi_{\nu}^{*} \tag{2-11}$$

where $P_{\mu\nu}^{\alpha}$ is an element of the molecular orbital density matrix that defines the location of the electrons. The sum over $\mu$ and $\nu$ ensures that all possible combinations of two atomic orbitals are explored.

Taking into account the electron's angular momentum, molecular orbitals can be expressed in terms of spin orbitals. Spin orbitals, represented by $\chi$, are the product of the spatial and spin components, as shown below:

$$\chi(x) = \psi(r)\,\alpha(\omega) \tag{2-12}$$

$$\chi(x) = \psi(r)\,\beta(\omega) \tag{2-13}$$

where $x$ is the general coordinate, $x = \{r, \omega\}$, and the $\alpha$ and $\beta$ spin functions represent spin up and spin down, respectively. Using these definitions of molecular geometries and basis functions it is now possible to discuss the quantum mechanical evaluation of the molecular energy.

2.3 Hartree-Fock Theory

With the molecular geometry specified, the discussion of Hartree-Fock theory continues by defining the total Hamiltonian for the evaluation of the ground state energy in a chemical system. The wave function is a central element in quantum mechanics because when operators act upon it, they return observable properties of the system. In the time-independent Schrödinger equation, the wave function is expressed as an eigenfunction of the Hamiltonian operator whose eigenvalue is the energy of the system, as shown:

$$\hat{H} \Psi = E \Psi \tag{2-14}$$

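In the Hartree-Fock approximation, the ground state wave function for an N electron system is then approximated by a single Slater determinant in terms of the one-electron spin orbitals as:

$$\Psi_0(x_1, x_2, \ldots, x_N) = (N!)^{-1/2} \begin{vmatrix} \chi_1(x_1) & \chi_2(x_1) & \cdots & \chi_N(x_1) \\ \chi_1(x_2) & \chi_2(x_2) & \cdots & \chi_N(x_2) \\ \vdots & \vdots & & \vdots \\ \chi_1(x_N) & \chi_2(x_N) & \cdots & \chi_N(x_N) \end{vmatrix} \tag{2-15}$$

where $(N!)^{-1/2}$ serves as a normalization constant. Therefore, via a single determinant of spin orbitals, this wave function describes a probable location of all electrons in the system based on the choice of geometry and basis set. Determining the appropriate linear combination yields the energy. This depends upon the use of an appropriate Hamiltonian operator.

In order to discuss the Hamiltonian, it is first necessary to present some key terms. There are M nuclei with coordinates $R_A$ and N electrons with coordinates $r_i$. Distances are defined as follows:

$$r_{iA} = \left| r_i - R_A \right| \tag{2-16}$$

$$r_{ij} = \left| r_i - r_j \right| \tag{2-17}$$

$$R_{AB} = \left| R_A - R_B \right| \tag{2-18}$$

Using these terms, the total Hamiltonian for the system is evaluated as a series of two-body interactions in the five-term equation that follows, in which $Z_A$ and $M_A$ denote the charge and mass of nucleus A:

$$\hat{H} = \sum_{A=1}^{M} \sum_{B > A}^{M} \frac{Z_A Z_B}{R_{AB}} - \sum_{A=1}^{M} \frac{1}{2 M_A} \nabla_A^2 - \sum_{i=1}^{N} \sum_{A=1}^{M} \frac{Z_A}{r_{iA}} - \sum_{i=1}^{N} \frac{1}{2} \nabla_i^2 + \sum_{i=1}^{N} \sum_{j > i}^{N} \frac{1}{r_{ij}} \tag{2-19}$$

where,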

(2-20)

It is important to note that this equation is expressed in atomic units. The positive terms indicate that the interaction is repulsive and the negative terms indicate that the interaction is attractive. The first term refers to the repulsive force that is felt between two nuclei as a result of both having positive charges, the second to the kinetic energy of the nuclei, and the third to the attractive force that an electron feels toward a nucleus. The fourth term describes the kinetic energy of an electron and the fifth accounts for the repulsive force between two electrons.

For the studies of interest here, a convenient variation of this Hamiltonian expresses just the electronic component of the Hamiltonian in the Born-Oppenheimer approximation. In this approximation it is assumed that the nuclei are moving so slowly relative to the electrons that they are essentially fixed in their positions. The kinetic energy of the nuclei can therefore be ignored, and the inter-nuclear repulsion terms can then be evaluated separately and subsequently added to the electronic energy. The equation is reduced to the electronic Hamiltonian shown below:

$$\hat{H}_{elec} = \sum_{i=1}^{N} h(i) + \sum_{i=1}^{N} \sum_{j > i}^{N} \frac{1}{r_{ij}} \tag{2-21}$$

where,

$$h(i) = -\frac{1}{2} \nabla_i^2 - \sum_{A=1}^{M} \frac{Z_A}{r_{iA}} \tag{2-22}$$

The electronic energy is then evaluated as:

$$E_0 = \langle \Psi_0 | \hat{H}_{elec} | \Psi_0 \rangle = \sum_{i=1}^{N} \langle \chi_i | h | \chi_i \rangle + \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ \langle \chi_i \chi_j | \chi_i \chi_j \rangle - \langle \chi_i \chi_j | \chi_j \chi_i \rangle \right] \tag{2-23}$$

The first term is the one-electron integral. For the general case, where $i \neq j$, this is defined as:

$$\langle \chi_i | h | \chi_j \rangle = \int dx_1 \, \chi_i^{*}(x_1) \, h(1) \, \chi_j(x_1) \tag{2-24}$$

The second term contains two two-electron integrals that evaluate the interaction energies of one electron with each of the other electrons in the system. They are evaluated as:

$$\langle \chi_i \chi_j | \chi_k \chi_l \rangle = \int dx_1 \, dx_2 \, \chi_i^{*}(x_1) \chi_j^{*}(x_2) \, \frac{1}{r_{12}} \, \chi_k(x_1) \chi_l(x_2) \tag{2-25}$$

With the meaning of the integrals being fully expressed, Hartree-Fock theory continues with the derivation of the Roothaan-Hall equation:

$$FC = SC\varepsilon \tag{2-26}$$

The Fock matrix is represented by $F$, $C$ is the molecular orbital coefficient matrix, and $S$ is the overlap matrix. The goal of the Hartree-Fock procedure is to obtain $\varepsilon$, the diagonal matrix of molecular orbital energies. The Fock matrix is of the form:

$$F_{\mu\nu} = H_{\mu\nu} + \sum_{\lambda} \sum_{\sigma} P_{\lambda\sigma} \left[ (\mu\nu | \lambda\sigma) - \frac{1}{2} (\mu\lambda | \nu\sigma) \right] \tag{2-27}$$

The $H_{\mu\nu}$ term is the core Hamiltonian. The two final integrals correspond to the Coulomb repulsion and exchange integrals. The factor of 1/2 is there because it is only necessary to account for exchange among electrons of the same spin. In the HF procedure, $\varepsilon$ is obtained by solving for the molecular orbital coefficients that best describe the system for the chosen basis set and geometry. In order to solve for $C$, it is necessary to construct $F$. However, the Fock matrix ($F$) depends upon the molecular orbital density matrix ($P$), which in turn consists of molecular orbital coefficients ($C$). Because of this interdependence of terms, the Roothaan-Hall equations must be solved self-consistently. This is accomplished by using the self-consistent field equations that are discussed below.
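As a minimal sketch of what solving the Roothaan-Hall equation involves, the Python example below treats a two-function basis, for which $FC = SC\varepsilon$ reduces to a quadratic in $\varepsilon$. It is not the procedure used in this work: the function names are hypothetical, the matrix elements in the usage lines are illustrative numbers rather than values for any real molecule, and the solver assumes a real symmetric $F$, a normalized basis with $|S_{12}| < 1$, and non-degenerate orbital energies. The density matrix follows the single-spin definition of Equations 2-10 and 2-11.

```python
import math


def solve_roothaan_2x2(f, s12):
    """Solve F C = S C eps for symmetric 2x2 F and S = [[1, s12], [s12, 1]].

    Returns (eps, vectors) sorted by energy; each vector obeys c^T S c = 1.
    """
    f11, f12, f22 = f[0][0], f[0][1], f[1][1]
    # det(F - eps * S) = 0 expands to qa * eps^2 + qb * eps + qc = 0.
    qa = 1.0 - s12 * s12
    qb = -(f11 + f22 - 2.0 * f12 * s12)
    qc = f11 * f22 - f12 * f12
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    eps = sorted([(-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)])
    vectors = []
    for e in eps:
        # Null vector of (F - e * S), taken from the better conditioned row.
        v = (-(f12 - e * s12), f11 - e)
        w = (f22 - e, -(f12 - e * s12))
        c1, c2 = v if abs(v[0]) + abs(v[1]) >= abs(w[0]) + abs(w[1]) else w
        norm = math.sqrt(c1 * c1 + 2.0 * s12 * c1 * c2 + c2 * c2)
        vectors.append((c1 / norm, c2 / norm))
    return eps, vectors


def density_matrix(vectors, n_occ):
    """P_mu_nu = sum over occupied orbitals of C_mu_i * C_nu_i (one spin)."""
    return [[sum(v[m] * v[n] for v in vectors[:n_occ]) for n in range(2)]
            for m in range(2)]


# Usage with illustrative matrix elements (not from a real molecule).
fock = [[-1.0, -0.8], [-0.8, -1.0]]
overlap_12 = 0.5
energies, coeffs = solve_roothaan_2x2(fock, overlap_12)
p = density_matrix(coeffs, n_occ=1)
electrons = p[0][0] + p[1][1] + 2.0 * overlap_12 * p[0][1]  # trace of P S
```

The trace of $PS$ recovers the number of occupied spin orbitals, which is a convenient check that the coefficients are normalized against the overlap matrix.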

2.4 Self-Consistent Field Theory

The Roothaan-Hall equations [61] are solved using the self-consistent field procedure to obtain the molecular energies. In general the Roothaan-Hall equations can be transformed into a standard eigenvalue problem. The key concept in the SCF procedure is that, given an initial guess for the density matrix, the solution to the equations will result in a set of molecular orbital coefficients that define a lower-energy density matrix; this can in turn be used to form a new Fock matrix.

The SCF procedure starts by constructing an approximate Fock matrix using a guess molecular orbital density matrix and calculating the core Hamiltonian, the two-electron integrals, and the overlap matrix. The overlap matrix is then diagonalized to form a transformation matrix. Using this transformation matrix allows the Fock matrix to be transformed and diagonalized to give the coefficient matrix and the orbital energies. Transformation of the coefficient matrix using the same transformation matrix produces a new molecular orbital density matrix. If there is no change between the new density matrix and the starting density matrix, the system is self-consistent and said to be converged. If the new density matrix differs significantly from the initial guess, the procedure is repeated using the new density matrix as the guess. This is done until the system is converged.

Several parts of the individual SCF calculations result in this procedure being computationally expensive. First, the overlap matrix is not initially diagonal because the basis functions used do not form an orthonormal set. Therefore solving the Roothaan-Hall equations first involves an orthogonalization step. Secondly, the two-electron integrals are an expensive part of the calculation because of the sheer number that must be evaluated for large systems. The number of integrals that must be evaluated scales as $O(N^4)$, where N is the total number of basis functions.
This is because each integral couples basis functions that may reside on two, three, or four different nuclear
centers. Because these integrals must be evaluated in each iteration of the SCF cycle, this procedure becomes too computationally demanding to be performed routinely on very large systems.

It is important to note that in the Hartree-Fock approximation the SCF procedure evaluates the energy for one electron at a time in the mean field of all other electrons. In a system with N electrons, each time the solution is found for an individual electron it may improve the description of the mean field for the subsequent electrons. When the solution is found for electron N, it is then incorporated into the mean field for the re-evaluation of the energy for the first electron. If there is no change in the energy, the SCF has converged. The use of the mean field approximation with a single determinant wave function renders Hartree-Fock theory incapable of addressing the correlated motion of electrons in the system. As one electron moves, other electrons may move to an orbital classified as virtual if it relieves the energy of confinement. The first step to improving upon the Hartree-Fock approximation is the inclusion of electron correlation. However, correlated methods are far too computationally expensive to be used for the large systems of interest here. Indeed, Hartree-Fock calculations are currently too time-consuming to be routinely applied to these systems.

In order to evaluate the NMR chemical shifts for these large systems, a semiempirical approach is employed in which a series of approximations are made to reduce the computational cost of the Hartree-Fock approach. The approximations are then accounted for through the use of appropriate parameters. This is discussed further in the section below.

2.5 The Semiempirical Approximations

Semiempirical methods can largely be viewed as an approximation on the Hartree-Fock procedure.
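The cost that motivates these approximations can be made concrete by counting two-electron integrals. The Python sketch below is an illustration, not code from this work; it assumes real basis functions, for which the permutational symmetry of $(\mu\nu|\lambda\sigma)$ leaves roughly $N^4/8$ distinct integrals, and the function name is hypothetical.

```python
def unique_two_electron_integrals(n_basis):
    """Number of distinct (mu nu | lambda sigma) over n_basis real functions.

    There are n(n+1)/2 distinct index pairs (mu nu), and the integral is
    symmetric under exchange of the two pairs.
    """
    pairs = n_basis * (n_basis + 1) // 2
    return pairs * (pairs + 1) // 2


# Usage: doubling the basis multiplies the work by nearly 2^4 = 16.
small = unique_two_electron_integrals(100)
large = unique_two_electron_integrals(200)
growth = large / small
```

Because these integrals are needed in every SCF iteration, a near sixteen-fold increase for each doubling of the system size quickly becomes prohibitive.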
The philosophy of these methods is to significantly reduce the number of calculations to be performed in an effort to appreciably lower the computational expense of
the calculations. To counteract the negative effect of ignoring important parts of the energy calculation, parameters are generated for the semiempirical Hamiltonian that are designed to enable the method to reproduce certain experimental quantities of interest. The parameterization process may implicitly account for electron correlation; therefore, these methods are not strictly an approximation on Hartree-Fock.

The only class of semiempirical methods discussed in this dissertation is that of the Neglect of Diatomic Differential Overlap [19] (NDDO) approximation, developed in the Pople laboratory in the 1960s and modified in the Dewar laboratory in the late 1970s and 1980s. In particular, three modified versions of the NDDO approximation are discussed. These include the Modified Neglect of Diatomic Overlap [62] (MNDO), Austin Model 1 [17] (AM1), and Parametric Model 3 [63] (PM3). AM1 and PM3 can largely be viewed as re-parameterizations of MNDO. MNDO will be the reference point for this discussion.

In the MNDO procedure, one significant approximation made may be summarized with respect to the Fock matrix as:

$$F_{\mu\nu} = H_{\mu\nu} + \sum_{\lambda} \sum_{\sigma} P_{\lambda\sigma} \left[ (\mu_A \nu_B | \lambda_C \sigma_D) - \frac{1}{2} (\mu_A \lambda_C | \nu_B \sigma_D) \right] \tag{2-28}$$

$$(\mu_A \nu_B | \lambda_C \sigma_D) = \delta_{AB} \, \delta_{CD} \, (\mu_A \nu_B | \lambda_C \sigma_D) \tag{2-29}$$

where A, B, C and D indicate the atom centers of the orbitals (which may be the same) and $\delta_{AB}$ is the Kronecker delta function:

$$\delta_{AB} = \begin{cases} 0 & \text{for } A \neq B \\ 1 & \text{for } A = B \end{cases} \tag{2-30}$$

Thus, all three- and four-center integrals are ignored. This approximation greatly reduces the number of unique integrals to be evaluated in the MNDO Hamiltonians relative to the ab initio procedures and enables QM calculations to be performed on systems of much larger sizes. It is

important to note that the overlap matrix (S) is assumed to be equal to the identity matrix (I), so there is no longer a need for a transformation in the SCF procedure. This eliminates several steps from the SCF procedure.

The MNDO procedure treats only valence electrons in what is referred to as the core approximation. The nuclei and the inner-shell electrons are combined into a fixed core potential that is parameterized for each atom. The valence electrons are then treated using Slater-type orbitals to represent s-orbitals and p-orbitals. This method was not initially designed for the handling of d-orbitals, although extensions of the method were subsequently made to account for them [64-69]. The core Hamiltonian matrix elements are expressed as:

\[ H_{\mu\nu} = \begin{cases} \frac{1}{2} S_{\mu\nu} (\beta_\mu + \beta_\nu), & \mu \in A,\ \nu \in B \\ U_{\mu\nu} - \sum_{B \neq A} Z_B\, (\mu\nu | s_B s_B), & \mu,\nu \in A \end{cases} \tag{2-31} \]

where A and B indicate different atom centers. The term $s_B$ represents the valence s-orbital on atom B (see Table 2-2). The $U_{\mu\nu}$ term (which will always be $U_{\mu\mu}$) represents the one-center one-electron energies: the kinetic energy of the electron in an orbital on atom A and the attraction that the electron feels to nucleus A. It is a parameter optimized to reproduce spectroscopic data. The attraction of the electron to other nuclei is represented by subtraction of the repulsion integral from $U_{\mu\mu}$. The $\beta_\mu$ and $\beta_\nu$ terms represent the two-center core resonance integrals and contribute significantly to the bonding energies. The overlap matrix $S_{\mu\nu}$ is used to evaluate the core Hamiltonian; however, it is not evaluated in the SCF procedure.

The one-center two-electron repulsion integrals are represented by parameters. For MNDO and AM1, these parameters are determined from experimental results rather than optimized during the parameterization. Conversely, they were optimized in the PM3 Hamiltonian.
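As a minimal sketch of how Equation 2-31 could be assembled from precomputed quantities, the Python below fills a small core Hamiltonian matrix. The function name, the argument layout (`atom_of`, `core_attr`), and every number in the toy system are hypothetical and chosen only for illustration; this is not the DivCon implementation.

```python
def core_hamiltonian(S, beta, U, atom_of, Z, core_attr):
    """Assemble core Hamiltonian elements following Eq. 2-31.

    S[mu][nu]            overlap between orbitals mu and nu
    beta[mu]             resonance parameter of orbital mu
    U[mu]                one-center one-electron energy U_mumu
    atom_of[mu]          index of the atom that carries orbital mu
    Z[B]                 core charge of atom B
    core_attr[B][mu][nu] integral (mu nu | s_B s_B) for mu, nu on another atom
    """
    n = len(atom_of)
    H = [[0.0] * n for _ in range(n)]
    for mu in range(n):
        for nu in range(n):
            A = atom_of[mu]
            if atom_of[nu] != A:
                # Two-center element: resonance term scaled by the overlap.
                H[mu][nu] = 0.5 * S[mu][nu] * (beta[mu] + beta[nu])
            else:
                # One-center element: U contributes only on the diagonal.
                u = U[mu] if mu == nu else 0.0
                attraction = sum(Z[B] * core_attr[B][mu][nu]
                                 for B in range(len(Z)) if B != A)
                H[mu][nu] = u - attraction
    return H


# Toy two-atom system with one s orbital per atom (all numbers invented).
S = [[1.0, 0.5], [0.5, 1.0]]
beta = [-7.0, -7.0]
U = [-12.0, -12.0]
atom_of = [0, 1]
Z = [1.0, 1.0]
core_attr = [
    [[0.0, 0.0], [0.0, 0.4]],  # (mu nu | s_0 s_0), needed for the orbital on atom 1
    [[0.4, 0.0], [0.0, 0.0]],  # (mu nu | s_1 s_1), needed for the orbital on atom 0
]
H = core_hamiltonian(S, beta, U, atom_of, Z, core_attr)
```

In this toy case the diagonal elements are the one-center energy lowered by the attraction to the other core, and the off-diagonal element depends only on the overlap and the two resonance parameters.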


The two-center two-electron repulsion integrals are evaluated using a multipole expansion to describe the charge distributions. Examples of these multipole moments are given in Table 2-2. The point charge configurations for the monopole, dipole and quadrupole moments are illustrated in Figure 2-1. The D1 and D2 terms (illustrated in Figure 2-1) are functions of the other parameters, $\zeta_s$ and $\zeta_p$ respectively. The $\zeta$ parameter is the optimized Slater orbital exponent, which is atom specific.

Table 2-2. Point charge distributions for s,p basis

Charge Distribution | Point Charge Configuration*
ss | Monopole ($q$)
sp | Dipole ($\mu_\alpha$)
$p_\alpha p_\alpha$ | Monopole + Linear Quadrupole ($q + Q_{\alpha\alpha}$)
$p_\alpha p_\beta$ | Square Quadrupole ($Q_{\alpha\beta}$)

*Point charge configurations are illustrated in Figure 2-1.

The integrals are then calculated as:

\[ (\mu^A \nu^A | \lambda^B \sigma^B) = \sum_{i \in A} \sum_{j \in B} \frac{q_i q_j}{\sqrt{r_{ij}^2 + (\rho_i^A + \rho_j^B)^2}} \tag{2-32} \]

The $\rho_i$ and $\rho_j$ terms are functions of other parameters, including D1 and D2 (shown in Figure 2-1), and thus also depend on $\zeta$. The D1, D2 and $\rho$ terms are referred to as derived or dependent parameters because of their dependence upon the optimized parameters. In the present work, all of the derived parameters, including several that are not discussed here, were recalculated for each set of optimized parameters evaluated.
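The double sum in Equation 2-32 can be sketched in a few lines of Python. This is an illustrative sketch only: the data layout (a list of `(charge, position, rho)` tuples per atom), the dipole built from charges of +1/2 and -1/2 at a distance D1 on either side of the nucleus, and all numerical values are assumptions made for the example, with distances and energies taken to be in atomic units.

```python
import math


def multipole_integral(points_a, points_b):
    """Evaluate Eq. 2-32 for two point-charge configurations.

    Each configuration is a list of (q, (x, y, z), rho) tuples, where rho
    is the additive term attached to that point charge.
    """
    total = 0.0
    for q_i, pos_i, rho_i in points_a:
        for q_j, pos_j, rho_j in points_b:
            r2 = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
            total += q_i * q_j / math.sqrt(r2 + (rho_i + rho_j) ** 2)
    return total


R = 2.0  # hypothetical interatomic distance along z

# ss distributions: a single monopole on each atom.
s_on_a = [(1.0, (0.0, 0.0, 0.0), 0.5)]
s_on_b = [(1.0, (0.0, 0.0, R), 0.5)]
ss_ss = multipole_integral(s_on_a, s_on_b)

# sp distribution on A: a dipole along z built from two half charges.
D1 = 0.4
sp_on_a = [(+0.5, (0.0, 0.0, +D1), 0.3), (-0.5, (0.0, 0.0, -D1), 0.3)]
sp_ss = multipole_integral(sp_on_a, s_on_b)

# One-center limit: at zero separation the monopole term tends to 1/(rho_A + rho_B).
one_center_limit = multipole_integral(s_on_a, [(1.0, (0.0, 0.0, 0.0), 0.5)])
```

The additive $\rho$ terms keep the expression finite at zero separation, which is what lets the two-center formula join smoothly onto the parameterized one-center integrals.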


Figure 2-1. Point charge configurations corresponding to monopole ($q$), dipole ($\mu_\alpha$), linear quadrupole ($Q_{\alpha\alpha}$) and square quadrupole ($Q_{\alpha\beta}$) moments.

The last term needed in order to calculate the MNDO energy of the system is the core repulsion. This term is evaluated by:

\[ E^{\text{core-core}}_{\text{MNDO}} = \sum_{A} \sum_{B > A} Z_A Z_B\, (s_A s_A | s_B s_B) \left( 1 + e^{-\alpha_A R_{AB}} + e^{-\alpha_B R_{AB}} \right) \tag{2-33} \]

The optimized parameter $\alpha$ here is different from $\zeta$, which represents the orbital exponents in the basis set. Both AM1 and PM3 differ from MNDO in that they add a corrective term to the core repulsion in order to address a deficiency of the MNDO core-repulsion function that rendered it incapable of accounting for hydrogen bonds. The core-repulsion term is evaluated as:


\[ E^{\text{core-core}}_{\text{AM1/PM3}} = E^{\text{core-core}}_{\text{MNDO}} + \sum_{A} \sum_{B > A} \frac{Z_A Z_B}{R_{AB}} \left[ \sum_k a_{kA}\, e^{-b_{kA} (R_{AB} - c_{kA})^2} + \sum_k a_{kB}\, e^{-b_{kB} (R_{AB} - c_{kB})^2} \right] \tag{2-34} \]

Because of the addition of these Gaussian terms, three additional types of atomic parameters, $a_k$, $b_k$, and $c_k$, are present in the AM1 and PM3 Hamiltonians. By optimizing more parameters, the AM1 and PM3 Hamiltonians were able to search a larger area of parameter space and consequently improve upon the ability to predict various molecular properties relative to MNDO. One of the more important improvements was in their ability to account for the energetics of hydrogen bonding. PM3 can largely be viewed as a reparameterization of AM1, the main differences being that PM3 uses more optimized parameters and that there are relatively small differences in their performance for first- and second-row elements.

Because the evaluation of many important integrals in the Hartree-Fock approximation is omitted, the semiempirical methods rely heavily on the quality of their parameters to compensate for their considerable number of approximations. However, the use of the minimal basis set gives these methods very limited flexibility. As a consequence, parameters that work very well for one set of properties on one type of system may not be suited for a different set of properties, or even the same property for a different type of system. For this reason it is sometimes necessary to generate new parameters to improve the performance of these methods for the particular property and system of interest.

2.6 Semiempirical NMR Calculations in the FPT-GIAO Approach

The chemical shift is measured as a difference between the shielding constant and a chosen reference value. The shielding constant for a given nucleus is defined as the second partial derivative of the energy with respect to the externally applied magnetic field ($\vec{B}$) and the nuclear magnetic moment ($\vec{\mu}$) [70]. As discussed previously, a problem arises because the use of


approximate wave functions to describe the atomic orbitals does not guarantee gauge invariance. This problem can be reduced through the use of high angular momentum basis sets, and is eliminated in the limit of a complete basis set [30]. Because semiempirical methods use a minimal basis set, it is necessary to specifically account for this issue. We therefore use Gauge Including Atomic Orbitals (GIAOs), in which the origin of the gauge associated with the vector potential of $\vec{B}$ is placed on the individual atomic orbital as [42]:

\[ \chi_\mu = \exp\left[ -\frac{i}{2c} \left( \vec{B} \times \vec{R}_A \right) \cdot \vec{r} \right] \chi_\mu^0 \tag{2-35} \]

where, for a given nucleus A, $\chi_\mu^0$ is the field-independent atomic orbital centered at $\vec{R}_A$, $\chi_\mu$ is now the field-dependent atomic orbital, and c is the speed of light. The chemical shielding constant can be expressed in terms of the semiempirical energy as:

\[ \sigma_{ab} = \left. \frac{\partial^2 E_{\text{Semiempirical}}}{\partial \vec{B}\, \partial \vec{\mu}} \right|_{\vec{B} = 0,\ \vec{\mu} = 0} = \sum_{\mu\nu} P_{\mu\nu} H^{ab}_{\mu\nu} + \sum_{\mu\nu} P^{a}_{\mu\nu} H^{b}_{\mu\nu} \tag{2-36} \]

where the letters a and b indicate the dependence on the operators $\vec{B}$ and $\vec{\mu}$, respectively. The subscript (0) denotes that the vectors are evaluated at the nucleus. Using the definition of GIAOs, the molecular orbitals are given as linear combinations of complex atomic orbitals and coefficients as:

\[ \psi_i = \sum_\mu C_{\mu i}\, \chi_\mu \tag{2-37} \]

The complete density matrix elements are then defined as:

\[ P_{\mu\nu} = \sum_i^{\text{occ}} C^{*}_{\mu i} C_{\nu i} = \sum_i^{\text{occ}} \left( C^{r}_{\mu i} C^{r}_{\nu i} + C^{i}_{\mu i} C^{i}_{\nu i} \right) + i \sum_i^{\text{occ}} \left( C^{r}_{\mu i} C^{i}_{\nu i} - C^{i}_{\mu i} C^{r}_{\nu i} \right) = P^{R}_{\mu\nu} + i P^{I}_{\mu\nu} \tag{2-38} \]
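The decomposition in Equation 2-38 can be checked numerically with a short Python sketch. The coefficient matrix below is invented (it is not normalized and does not come from any calculation), and the function name `density_parts` is hypothetical; the point is only to show that the real and imaginary sums reproduce the complex product term by term.

```python
def density_parts(C, n_occ):
    """Return (P, PR, PI) for complex MO coefficients C[mu][i], per Eq. 2-38."""
    n = len(C)
    P = [[0j] * n for _ in range(n)]
    PR = [[0.0] * n for _ in range(n)]
    PI = [[0.0] * n for _ in range(n)]
    for mu in range(n):
        for nu in range(n):
            for i in range(n_occ):
                c_mu, c_nu = C[mu][i], C[nu][i]
                # Full complex element: conj(C_mu,i) * C_nu,i
                P[mu][nu] += c_mu.conjugate() * c_nu
                # Real part: C^r C^r + C^i C^i
                PR[mu][nu] += c_mu.real * c_nu.real + c_mu.imag * c_nu.imag
                # Imaginary part: C^r C^i - C^i C^r
                PI[mu][nu] += c_mu.real * c_nu.imag - c_mu.imag * c_nu.real
    return P, PR, PI


# Invented coefficients: large real parts, small field-induced imaginary parts.
C = [[0.8 + 0.01j, 0.6 - 0.02j],
     [0.6 - 0.02j, -0.8 + 0.01j]]
P, PR, PI = density_parts(C, n_occ=1)
```

With these inputs the real part is symmetric and the imaginary part is antisymmetric, and the products of two imaginary components are of second order in the small imaginary parts.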


Therefore, the density matrix using GIAOs is complex. The superscripts r and i refer to the real and imaginary components of $C_{\mu i}$, and the capital letters serve the same purpose for $P_{\mu\nu}$. As discussed previously, even in the presence of a strong magnetic field, NMR uses relatively low energy radiation; it is therefore reasonable to assume that the effect of the externally applied field on the overall electron density of a molecule is very small. In the Finite Perturbation Theory (FPT) approach, the assumption is made that the effect of the magnetic field on the overall density matrix is negligible. Relative to $C^{r}_{\mu i} C^{r}_{\nu i}$, the contribution of $C^{i}_{\mu i} C^{i}_{\nu i}$ is considered to be minor and can therefore be neglected. This removes any dependence of $P^{R}$ on the magnetic field. The unperturbed density matrix in Equation 2-36 can now be approximated by [11]:

\[ P_{\mu\nu} \approx P^{R}_{\mu\nu} \tag{2-39} \]

The imaginary component of the complex density matrix is then viewed as the only portion of the density matrix that exhibits a perturbation resulting from the magnetic field. That is to say, the perturbed density matrix in Equation 2-36 can then be approximated as the imaginary component of the complex density matrix [11]:

\[ P^{a}_{\mu\nu} = \left( \frac{\partial P_{\mu\nu}}{\partial B_a} \right)_0 \approx \frac{i P^{I}_{\mu\nu}}{B_a} \tag{2-40} \]

The Fock matrix elements for two orbitals on the same atom center that result from these approximations are shown below:

\[ F_{\mu\nu} = H_{\mu\nu} + G_{\mu\nu} = \left( H^{R}_{\mu\nu} + G^{R}_{\mu\nu} \right) + i \left( H^{I}_{\mu\nu} + G^{I}_{\mu\nu} \right) \tag{2-41} \]

\[ H^{R}_{\mu\nu} = U_{\mu\nu} - \sum_{B \neq A} Z_B\, (\mu\nu | s_B s_B) \tag{2-42} \]

[Equation 2-43: not legible in the source]


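[Equation 2-44: not legible in the source]

\[ G^{I}_{\mu\nu} = -\frac{1}{2} \sum_{\lambda,\sigma \in A} P^{I}_{\lambda\sigma}\, (\mu\lambda | \nu\sigma) \tag{2-45} \]

where

\[ \vec{L}_{R} = (\vec{r} - \vec{R}) \times \nabla \tag{2-46} \]

Here, G represents the two-electron integrals and H represents the one-electron integrals. As stated before, $\mu,\nu \in A$ and $\lambda,\sigma \in B$. For orbitals on different atom centers, the Fock matrix elements are evaluated as shown below:

\[ F_{\mu\nu} = H_{\mu\nu} + G_{\mu\nu} = \left( H^{R}_{\mu\nu} + G^{R}_{\mu\nu} \right) + i \left( H^{I}_{\mu\nu} + G^{I}_{\mu\nu} \right) \tag{2-47} \]

\[ H^{R}_{\mu\nu} = \frac{1}{2} S_{\mu\nu} (\beta_\mu + \beta_\nu) \tag{2-48} \]

\[ H^{I}_{\mu\nu} = \frac{1}{4c} (\beta_\mu + \beta_\nu)\, \vec{B} \cdot \left[ (\vec{R}_\mu - \vec{R}_\nu) \times \langle \chi_\mu | \vec{r} - \vec{R}_\nu | \chi_\nu \rangle - \frac{1}{2} \langle \chi_\mu | \vec{L}_{R_\nu} | \chi_\nu \rangle + \frac{1}{2} \langle \chi_\nu | \vec{L}_{R_\mu} | \chi_\mu \rangle \right] \tag{2-49} \]

\[ G^{R}_{\mu\nu} = -\frac{1}{2} \sum_{\lambda \in A} \sum_{\sigma \in B} P^{R}_{\lambda\sigma}\, (\mu\lambda | \nu\sigma) \tag{2-50} \]

\[ G^{I}_{\mu\nu} = -\frac{1}{2} \sum_{\lambda \in A} \sum_{\sigma \in B} P^{I}_{\lambda\sigma}\, (\mu\lambda | \nu\sigma) \tag{2-51} \]

By contrasting these equations with the standard semiempirical equations, it is established that the real components of the Hamiltonian are unchanged. The presence of the $\vec{B}$ term in all of the imaginary components indicates that these integrals are all defined relative to the external field. Furthermore, the real components of the two-center integrals are evaluated using the real components of the density matrix. The same is true for the imaginary components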


of the two-center integrals. Finally, there is no Coulombic interaction in the imaginary component of the two-electron one-center integrals. This series of approximations enables the diagonalization of the complex density matrix to obtain the eigenvalues for both the perturbed and unperturbed density matrices in a single step. The divide-and-conquer strategy is used for the diagonalization. The calculations can therefore be performed far more quickly and consequently can be applied to much larger systems.

2.7 Divide and Conquer

Semiempirical approximations significantly increase the speed of electronic structure calculations relative to ab initio methods through their handling of the two-electron integrals, among other approximations. However, the improvements in speed gained by the semiempirical approximations alone do not suffice to support the routine application of these methods to very large systems (systems containing several thousands of atoms). This is because the cost of the matrix diagonalization is still high, increasing as O(N^3), where N is the number of basis functions. Whereas the bottleneck in ab initio methods is the evaluation of the multi-center integrals, the bottleneck in the semiempirical approach is shifted to the diagonalization of the matrices.

Using the divide-and-conquer strategy has been shown to significantly increase the speed of semiempirical QM calculations, enabling the routine QM study of large protein structures [71-75]. This strategy works by dividing the molecular orbital density matrix into subsystems, so that the corresponding Fock matrices require less time in the diagonalization step. This avoids the expensive diagonalization of global matrices and results in calculations that scale linearly, O(N), with the number of subsystems. This method is particularly amenable to protein systems because the subsystems are easily defined by residues. The divide-and-conquer strategy has been successful in the study of protein


systems because many of the properties that are of interest are largely local effects [76]. This highlights the importance of using appropriate subsystem sizes when implementing this scheme. This issue has previously been investigated specifically for NMR calculations [77]. For the protein systems examined in these studies the subsystems were 4.0 Å surrounded by two 2.0 Å buffer regions. This scheme was implemented for both the geometry optimizations and NMR calculations. These calculations, and all semiempirical calculations presented here, were performed using the QM package that was previously developed in our laboratory, DivCon [78].

2.8 Geometry Optimization

The selection of the molecular geometry has a significant influence on the QM calculations. It is always important to ensure that the geometry used in a QM calculation is free of significant structural flaws before evaluating the property of interest. Experimental models are often used as the initial structure. Some of the subtle problems that may be found in experimental models include instances in which certain bond lengths or angles are inaccurately modeled. Another subtle, but important, problem is an instance in which non-bonded atoms are too close together, thereby forming bad van der Waals clashes. Geometry optimization serves as a consistent method of relieving such unfavorable interactions in the initial configuration of a system prior to performing other QM calculations. This entails minimizing the energy of a potential function with respect to the molecular coordinates. In the current set of studies, the molecular coordinates are changed to attain a lower energy as judged by the specified Hamiltonian operators. Optimizing the geometry to a local minimum on the potential energy surface is desirable because these minima are generally representative of more stable and more highly populated states.
Additionally, many properties of interest are dependent upon the energy; unfavorable interactions that spuriously raise the energy may contaminate the evaluation of the desired


property [60]. For example, QM NMR calculations are particularly sensitive to the bond lengths and angles, so it is of interest to perform geometry optimizations at an appropriate level of theory prior to performing the NMR calculations.

The two geometry optimization routines used in these studies are steepest descent and conjugate gradient. Both routines use the first derivative of the energy with respect to the molecular coordinates in order to determine the direction of the search. They are therefore referred to as first-order minimization methods. The steepest descent routine calculates the gradient at the current point on the potential energy surface and uses a line search method to determine the necessary step size. The line search method chooses three points along the search direction on the potential energy surface such that the middle point is lower in energy than the two outer points. The points are then fitted to a polynomial function to determine their likely minimum. This strategy is used to determine the distance that should be covered along the gradient (the step size). This is a very effective strategy for moving from a crude conformation to one closer to the local minimum in a short amount of time. However, because the gradient is only evaluated for the current state, the method is slow to converge.

The other minimization routine used in these studies is conjugate gradient. The conjugate gradient routine is very similar to steepest descent in that both are first-order routines and both employ a line search method to determine the step size along the search direction. By computing the gradient at the current location and comparing it to that of the previous location, conjugate gradient is able to converge to a local minimum more quickly. However, this procedure results in the conjugate gradient routine sometimes requiring a longer time to find the local minimum when starting from a poor structure.
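The two first-order routines described above can be sketched on a toy two-variable function in Python. This is an illustration under stated assumptions, not the routine used in DivCon: the three-point bracket with a parabolic fit stands in for the line search, the Polak-Ribiere formula is assumed for the conjugate gradient update (the text does not specify a variant), and the function names, step sizes, and test function are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def line_search(phi, step=0.1, max_expand=60):
    """Bracket a minimum of phi(t) with three points, then fit a parabola."""
    a, fa = 0.0, phi(0.0)
    b, fb = step, phi(step)
    while fb >= fa and b > 1e-12:      # first trial went uphill: shrink it
        b *= 0.5
        fb = phi(b)
    if fb >= fa:
        return 0.0
    c, fc = 2.0 * b, phi(2.0 * b)
    for _ in range(max_expand):        # expand until the far point is higher
        if fc > fb:
            break
        a, fa, b, fb = b, fb, c, fc
        c = 2.0 * b
        fc = phi(c)
    # Vertex of the parabola through (a, fa), (b, fb), (c, fc).
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b if den == 0.0 else b - 0.5 * num / den


def minimize(f, grad, x0, sd_steps=3, max_iter=200, gtol=1e-6):
    """A few steepest descent steps followed by conjugate gradient steps."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for it in range(max_iter):
        if math.sqrt(dot(g, g)) < gtol:
            break
        t = line_search(lambda s: f([xi + s * di for xi, di in zip(x, d)]))
        x = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad(x)
        if it + 1 < sd_steps:
            beta = 0.0                 # steepest descent: forget the old direction
        else:
            # Polak-Ribiere: compares the current and previous gradients.
            diff = [gn - go for gn, go in zip(g_new, g)]
            beta = max(0.0, dot(g_new, diff) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        if dot(d, g_new) >= 0.0:       # not a downhill direction: restart
            d = [-gn for gn in g_new]
        g = g_new
    return x


def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


x_min = minimize(f, grad, [0.0, 0.0])
```

On this elongated quadratic bowl the search should settle near the minimum at (1, -2); the `sd_steps` argument controls how many initial steepest descent steps are taken before the conjugate gradient update is switched on.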
A common strategy is to use the steepest descent routine for a few cycles to minimize a crude structure before using a more quickly


convergent routine. This strategy of using a combination of two minimization routines was employed in the current work.

2.9 Semiempirical Parameter Optimization Using a Genetic Algorithm

Genetic algorithms (GAs) are methods that seek to optimize a set of parameters computationally in a manner modeled after Darwin's theory of natural selection. These are heuristic search strategies that judge fitness based on a user-defined scoring function. For this reason they are easily applied to the optimization of a wide variety of parameters in various disciplines. A schematic diagram of the type of GA used in this work is given in Figure 2-2.

The starting points for the parameterizations performed in this text were individual sets of parameters previously developed for more general purposes. In the initialization step of the GA, a chosen number of new parameter sets (individuals) were developed by randomly mutating the initial set. The new parameter sets plus the initial parameter set comprised the first generation. The number of individuals was kept constant for all generations, and each individual was evaluated for fitness by a chosen scoring function. The number of generations was chosen by the user. If the specified convergence criteria were met, the algorithm would be terminated prior to reaching the chosen number of generations. Each new generation was assembled by a linear combination of the reproductive processes given in Figure 2-2. There are different advantages to each type of reproductive process. For example, crossing-over makes larger steps and enables the algorithm to discover more promising areas of the search space. On the other hand, mutations make smaller steps, staying near the parent, and optimize within the current most promising areas.
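The generational loop described above can be sketched in Python as follows. This is a toy illustration, not the GA used in this work: the mix of reproductive processes (keeping the best individual, uniform crossing-over, Gaussian mutation), the population size, the mutation width, and the quadratic scoring function are all assumptions made for the example, and lower scores are taken to mean better fitness.

```python
import random


def genetic_search(initial, score, pop_size=20, generations=100,
                   sigma=0.05, tol=1e-8, seed=0):
    """Minimize score(params) starting from one initial parameter set."""
    rng = random.Random(seed)

    def mutate(parent):
        # Small random step that stays near the parent.
        return [x + rng.gauss(0.0, sigma) for x in parent]

    def crossover(a, b):
        # Each parameter is inherited from one of the two parents.
        return [rng.choice(pair) for pair in zip(a, b)]

    # First generation: the initial set plus randomly mutated copies of it.
    population = [list(initial)] + [mutate(initial) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=score)
        best = population[0]
        if score(best) < tol:          # convergence criterion met: stop early
            break
        parents = population[:pop_size // 2]
        children = [best]              # the best individual always survives
        while len(children) < pop_size:
            if rng.random() < 0.5:
                child = crossover(rng.choice(parents), rng.choice(parents))
            else:
                child = mutate(rng.choice(parents))
            children.append(child)
        population = children          # population size stays constant
    return min(population, key=score)


# Hypothetical scoring function: squared distance from a made-up target.
target = [1.0, -0.5]


def score(params):
    return sum((p - t) ** 2 for p, t in zip(params, target))


start = [0.8, -0.3]
best = genetic_search(start, score)
```

Because the initial set is part of the first generation and the best individual is always carried forward, the returned parameters can never score worse than the starting set in this sketch.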
As discussed previously, the standard parameters for the semiempirical methods were generated to reproduce a series of experimental quantities including heats of formation, electron affinities, molecular geometries and atomic energies. However, they were not optimized for a


high level of performance in NMR calculations. As a result, it is necessary to search over a large area of parameter space in order to find parameters that would yield a more accurate description of NMR chemical shifts. Some of the parameters in the initial semiempirical parameter set are derived from others and have other features of interdependence. This results in a poorly defined parameter space with multiple minima. Minimizing the chosen goal functions to optimize new parameters via routine optimization schemes such as BFGS or conjugate gradient may keep the new parameters within a local minimum. Finding the global minimum was not our specific goal, as that may have involved straying too far from parameters with any physical meaning; however, a genetic algorithm was used because it had the potential to find parameters covering a broader range of parameter space. Therefore, the genetic algorithm is well suited for the particular applications used here. Furthermore, genetic algorithms have been successfully used on several occasions to generate parameters for semiempirical methods [79-81].


Figure 2-2. Schematic diagram of a genetic algorithm.


CHAPTER 3
SEMIEMPIRICAL PROTEIN GEOMETRIES

3.1 Introduction

A critical step in quantum chemical studies is the specification of an appropriate molecular geometry. Although experimental geometries are often adequate for certain studies, geometry optimization is frequently used to relieve unfavorable interactions in the initial configuration of a system prior to performing other quantum mechanical (QM) calculations. Optimizing the geometry to a local minimum on the potential energy surface is desired because these minima are generally representative of more stable and more highly populated states. Furthermore, many properties of interest are dependent upon the energy; unfavorable interactions that spuriously raise the energy may contaminate the evaluation of the desired property [60].

While QM methods have proven reliable in the modeling of chemical systems, poor scaling with system size prevents their routine application to larger biological systems. Currently, semiempirical methods (AM1 [17] and PM3 [63] in particular) are the more frequently used QM methods for geometry optimization of systems such as protein-ligand complexes, which are larger than a few hundred atoms. Protein structure and dynamics are determined largely by non-covalent interactions including hydrogen bonds, salt bridges, and charge-neutral interactions. These interactions cannot always be appropriately examined using molecular mechanics (MM) methods because these methods cannot account for the relevant QM effects such as polarization and charge transfer. QM approaches have the benefit of more accurate charge distributions and molecular shape [82], both of which are important for accurately representing the nature of non-covalent interactions. Through various approximations in the formulation of the QM Hamiltonian, more computationally feasible approaches, such as semiempirical QM methods, have been developed. These methods have


been parameterized to varying degrees to reproduce experimental data in order to account for various approximations in their formulation. They are also frequently employed because they offer a favorable compromise between speed and accuracy. With the recent development of linear-scaling semiempirical methods, such as those implemented in DivCon [78], it is now feasible to apply these approaches to the study of macromolecules [72,73,83].

A known limitation of semiempirical theory is that the method may not accurately predict properties for systems that were not appropriately accounted for in the parameterization [84]. It is therefore necessary to evaluate the method on new systems before implementing it. Although semiempirical potential functions have been used in the study of protein systems [76,85-87], AM1 and PM3 have not yet been rigorously evaluated for that purpose. Two recent additions to the semiempirical NDDO methods are RM1 [88] and PM6 [89,90]. PM6 has recently been critically evaluated with respect to its applicability for the study of protein systems [90]. The present study is significantly different in that the effect of solvation is examined and a closer examination of the interactions of charged side chains is carried out. However, because PM6 has already been closely evaluated for its use in the study of protein systems, there would be some overlap, and it was not necessary to include PM6 in this study. Because AM1 and PM3 are the most widely used of the semiempirical methods, RM1 was also not included. Furthermore, limiting the number of methods examined was necessary for clarity.
The AM1 and PM3 parameters were based largely on the values observed in small organic molecules [17,63]. For the prediction of a variety of properties in these types of systems, the semiempirical methods have been thoroughly evaluated [91-96]. However, proteins are large polymers made of a small set of functional groups; therefore, neither the extensibility of the Hamiltonian nor the ability to reproduce small molecule data is necessarily relevant to these


systems, as it is more important to correctly capture the behavior of the few functional groups that are present in the protein systems. The degree to which known flaws observed in small systems would manifest themselves in much larger and more complex systems is not entirely obvious a priori. We therefore evaluate the Hamiltonians here with respect to their ability to reproduce various qualities of protein structure, a target area for research in our lab. These qualities include, but are not limited to, bond lengths, angles, torsions, planarity of certain side chains, and van der Waals interactions.

3.2 Methods

3.2.1 Experimental Data

In order to examine the effects of semiempirical minimizations on protein systems, 32 representative small proteins from the Protein Data Bank were studied. The structures chosen were unbound, globular crystal structures and are listed in Table 3-1. It was important that each amino acid was well represented in the data set; the frequency of each amino acid for the data set is given in Table 3-2. It was also critical to select proteins that were well structured and sufficiently large to exhibit a variety of secondary and tertiary characteristics. The proteins ranged in size from 54 to 99 amino acid residues and exhibited a fairly representative set of topological features, including a mixture of α-helices, β-sheets, and random coils. The resolution of the structures ranged from 1.5 to 2.5 Å, and the structures were visually inspected to ensure no significant structural flaws were present. Proteins that ranged in charge (from -9 to +7 e) were chosen in order to facilitate investigation of the effect of performing the geometry optimizations in solvent.


Table 3-1. Protein data set

PDB ID | Description | Resolution (Å) | N res | Charge
1A80 | HIV Capsid C Terminal Domain | 1.70 | 70 | 1
1AIL | N Ter fragment of Ns1 protein | 1.90 | 70 | 2
1B0X | Epha4 Receptor Tyrosine Kinase | 2.00 | 72 | 0
1BCG | Scorpion Toxin Bixtr It | 2.10 | 74 | 0
1BMG | Bovine Beta 2 microglobulin | 2.50 | 98 | 1
1CEI | Colicin E7 Immunity Protein | 1.80 | 85 | 9
1CM3 | H15D Hpr | 1.60 | 85 | 3
1CQY | Starch Binding Domain of Bacillus beta amylase | 1.95 | 99 | 2
1CSP | Major cold shock protein | 2.50 | 67 | 6
1DSL | Beta Crystallin (C ter) | 1.55 | 88 | 0
1EM7 | Helix variant of B1 domain from Strep Protein G | 2.00 | 56 | 3
1ENH | Engrailed Homeodomain | 2.10 | 54 | 7
1F0M | Ephrin Type B receptor | 2.20 | 71 | 1
1FAS | Fasciculin 1 (toxin) | 1.80 | 61 | 4
1FNA | Fibronectin Cell Adhesion Module | 1.80 | 91 | 0
1H75 | Glutaredoxin like protein Nrdh | 1.70 | 76 | 1
1HOE | Alpha amylase inhibitor Hoe 467A | 2.00 | 74 | 5
1HPT | Human Pancreatic Secretory Trypsin Inhibitor | 2.30 | 56 | 1
1HYP | Hydrophobic protein from Soybean | 1.80 | 75 | 1
1KW4 | Polyhomeotic Sam Domain Structure | 1.75 | 70 | 2
1LPL | Hypothetical 25.4 Kda Protein | 1.77 | 95 | 3
1MJC | Major Cold Shock Protein | 2.00 | 69 | 1
1MWP | Amyloid A4 Protein | 1.80 | 96 | 1
1OPS | Type III Antifreeze Protein | 2.00 | 64 | 2
1ORC | Cro Repressor Insertion Mutant | 1.54 | 64 | 3
1PWT | Alpha Spectrin SH3 | 1.77 | 61 | 1
1R69 | Phage 434 Repressor (N ter) | 2.00 | 63 | 4
1SN1 | Neurotoxin Bmk M1 | 1.70 | 64 | 2
1UBI | Ubiquitin | 1.80 | 76 | 0
2CRO | 434 Cro protein | 2.35 | 65 | 6
1WHO | Allergen Phl P 2 | 1.90 | 94 | 7
2OVO | Ovomucoid Third Domain | 1.50 | 56 | 0


Table 3-2. Distribution of amino acids in protein data set.

Amino Acid | Frequency | Percentage
ALA | 149 | 6.32
ARG | 115 | 4.87
ASN | 107 | 4.54
ASP | 132 | 5.60
CYS | 66 | 2.80
GLN | 113 | 4.79
GLU | 156 | 6.61
GLY | 179 | 7.59
HIS | 37 | 1.57
ILE | 121 | 5.13
LEU | 172 | 7.29
LYS | 166 | 7.04
MET | 58 | 2.46
PHE | 80 | 3.39
PRO | 107 | 4.54
SER | 138 | 5.85
THR | 163 | 6.91
TRP | 34 | 1.44
TYR | 88 | 3.73
VAL | 178 | 7.55

3.2.2 Data Set Preparation

The crystal structures were obtained from the Protein Data Bank, and all crystallographic waters were removed. The protonation states of the amino acid residues were assigned assuming a pH of 7, and histidine was not protonated on the NE2 atom. Using the LEaP module of AMBER (AMBER 8.0) [97-99], protons were added to all structures. Next, a restrained minimization was carried out using AMBER to minimize the protons while keeping the coordinates of the heavy atoms constant. These structures were designated as H-minimized structures: crystal structure coordinates of heavy atoms with optimized hydrogen atoms. All of the geometry optimizations using AMBER were carried out with the parm94 parameter set.

Following the initial structure preparation, geometry optimizations were performed in vacuo at the AM1 and PM3 levels. In order to investigate the importance of solvation during the


geometry optimizations, the proteins were also minimized in implicit solvent using a Poisson-Boltzmann solvent model and the PM3 Hamiltonian. All optimizations used the steepest descent routine for 50 steps, followed by conjugate gradient until the convergence criteria were met. These criteria were maximum changes in the energy, gradients, and coordinates of 0.1 kcal/mol, 1.5 kcal/mol/Å, and 0.001 Å, respectively. The structures were then carefully inspected to ensure that no significant errors occurred during the geometry optimization.

In addition to comparisons with the H-minimized structures, the results from the semiempirical minimizations are compared to those of an all-atom minimization performed for the H-minimized structures using AMBER. These minimizations were carried out by first performing a restrained minimization for 1,000 steps (with a restraint of 2.0 kcal/mol/Å^2), followed by 1,500 steps of conjugate gradient using an unrestrained minimization and the Generalized Born implicit solvent model with the AMBER/GBSA module. These structures were designated the AMBER-minimized structures.

With few exceptions, statistics were gathered using an in-house package of Perl scripts that is being developed specifically for the evaluation of protein systems. When possible, parameters were set to be consistent with those previously established in the literature. For example, Ramachandran evaluations were done in 10 degree increments as implemented by Morris et al. [100] The Procheck [101] and VADAR [102] programs were also used to assist in identifying structural flaws.

3.3 Results and Discussion

3.3.1 In Vacuo Optimization Artifacts

The use of a solvation model during the geometry optimization resulted in several noteworthy advantages relative to vacuum optimizations. In general, in the absence of solvent the semiempirical methods could not mask the charge-charge interactions, and exhibited an


increase in the instances of salt-bridge formation. To form the salt bridges, the amino acid side chains moved closer together, causing much more distorted motions in the vacuum optimizations than in the solvated counterparts; as a result, the overall RMSD between the minimized structure and the crystal structure was much higher for the vacuum minimizations than for the solvent minimizations (see Figure 3-1).

Figure 3-1. Average heavy atom RMSD (no hydrogen atoms included) for minimized structures relative to crystal structures.

The first significant artifact observed as a direct result of carrying out the geometry optimizations in vacuum was the transfer of a proton from positively charged residues to negatively charged neighbors. This is illustrated in Figure 3-2, where the lysine residue loses a proton to the carboxyl group of the neighboring glutamate residue. A more troubling example was observed in one instance where the carboxylic acid oxygen of glutamic acid removed the proton from the Cα of the same residue, forming a stable 6-membered ring structure. This is illustrated in Figure 3-3. These are unphysical results that arise because the pKa of the amino


acid residues cannot be accounted for in vacuum. Proton transfer was not observed in the minimizations run in solvent.

Figure 3-2. Example of proton transfer occurring between charged residues. This instance was observed in the 1BMG vacuum-minimized structure using the PM3 Hamiltonian.

Figure 3-3. A more unphysical example of proton transfer in which the Cα from the amino acid backbone loses the hydrogen atom to the carboxylate group of the same amino acid.


A similar unphysical result was seen in several cases in which spurious bonds were formed between neighboring residues. The first instance of this is illustrated in Figure 3-4. Rather than transferring a proton to neutralize the charges, a glutamic acid oxygen formed a bond with the carbon of the guanidyl group of a neighboring arginine residue. Another unphysical bond was formed when two neighboring disulfide groups approached each other too closely, forming an additional bond between sulfur atoms. This is illustrated in Figure 3-5. The distance between the two adjacent sulfur atoms decreased from 3.22 to 2.00 Å. When this minimization was carried out in solvent, the distance still decreased, but only to 3.02 Å. The difference is likely because parts of the residues were solvent exposed and therefore the interaction of the two sulfur atoms was not as strong. The training set used in the parameterization of PM3 does contain compounds with multiple sulfur bonds.

Figure 3-4. Chemical bond formation in the 1HOE vacuum-minimized structure using the PM3 Hamiltonian. Contributing factors include the lack of solvation and the puckered conformation of the arginine side chain.

Other artifacts that arose as a consequence of performing the geometry optimizations in vacuum will be discussed in later subsections. As would be expected, performing the minimizations in vacuum affected neither the bond lengths nor the bond angles for the structures studied here. Therefore, no comparisons of the solvent versus vacuum minimizations are made in the following two subsections.

Figure 3-5. A spurious sulfur bond formed between two disulfide bridges in a vacuum minimized PM3 structure.

3.3.2 Effects on Bond Lengths

There was generally good agreement among all of the methods with respect to the optimum bond lengths. Some common bond lengths for bonds involving carbon and hydrogen are given in Figures 3-6 and 3-7, respectively. To provide a more direct comparison to the AMBER minimized proteins, bond lengths are classified using the atom types used in parm94, which are listed in Table 3-3. The same classification is used in the comparison of the bond angles. AMBER's parm94 force field contains optimized values for the bond lengths and angles for

protein systems and therefore serves as a useful reference for the comparisons made in Figures 3-6, 3-7, and 3-8.

Figure 3-6. Average bond lengths in protein structures for some common bonds involving heavy atoms. In order: Blue - AMBER minimized structures; Red - AM1 minimized structures; Green - PM3 vacuum minimized structures; Purple - H-minimized crystal structures. Bonds are labeled using the parm94 atom types found in Table 3-3.

Figure 3-7. Average bond lengths observed for common bonds involving hydrogen. In order: Blue - AMBER minimized structures; Red - AM1 minimized structures; Green - PM3 vacuum minimized structures. Bonds are labeled using the parm94 atom types found in Table 3-3.

Figure 3-8. Average bond angles observed for common bonds. In order: Blue - AMBER minimized structures; Red - AM1 minimized structures; Green - PM3 vacuum minimized structures; Purple - H-minimized crystal structures (hydrogen atoms were added to the crystal structures using AMBER, therefore angles involving hydrogen are not shown for the crystal structure). Angles are labeled using the parm94 atom types found in Table 3-3.

The average lengths observed in the crystal structures were consistent with those given in the AMBER parm94 force field. Select heavy-atom bond lengths are compared in Figure 3-6. The average differences between the semiempirical bond lengths and those of the crystal structure were quite small; the largest difference was 0.06 Å. The larger deviations in bond lengths were observed in bonds involving polar atoms. One notable instance in which the semiempirical methods tested did not yield lengths consistent with the crystal structure was the carbon-nitrogen bond lengths predicted by PM3. PM3 has previously been shown to overestimate the lengths of carbon-nitrogen bonds for small molecules.84 These correspond to the critical peptide bonds as well as bonds in the side-chains of glutamine and asparagine. These bonds were overestimated by PM3 by an average of 0.04-0.06 Å. AM1 did not exhibit the same problem, but did consistently overestimate the carbon-oxygen bond length for the carboxyl oxygen by an average of 0.05 Å. Both of these discrepancies involve integral parts of the description of the amino acid main chains.

Table 3-3. AMBER atom types used in the parm94 parameter set

Atom type   Description
H           Hydrogen bonded to nitrogen
HS          Hydrogen bonded to sulfur
HC          Aliphatic hydrogen bonded to carbons with no electron withdrawing groups
H1          Aliphatic hydrogen bonded to carbons with one electron withdrawing group
H2          Aliphatic hydrogen bonded to carbons with two electron withdrawing groups
H3          Aliphatic hydrogen bonded to carbons with three electron withdrawing groups
HP          Hydrogen bonded to carbon next to positively charged group
HA          Hydrogen bonded to aromatic carbon with no electron withdrawing groups
H4          Hydrogen bonded to aromatic carbon with one electron withdrawing group
H5          Hydrogen bonded to aromatic carbon with two electron withdrawing groups
O           Carbonyl group oxygen
O2          Carboxyl group oxygen
OH          Oxygen in hydroxyl group
CT          SP3 aliphatic carbon
CA          Aromatic carbon
C           Carbon in carbonyl group
CR          SP2 aromatic carbon in histidine
N           Nitrogen in amide group
N2          SP2 nitrogen in charged groups
N3          SP3 nitrogen in charged groups
NA          SP2 nitrogen in 5-membered rings
S           Sulfur in disulfide linkage and thioethers
SH          Sulfur in cysteine

Figure 3-7 illustrates the average bond lengths for select bonds involving hydrogen. When compared to the parm94 values, greater deviations in the bond lengths were observed for the AM1 structures than for PM3. Comparisons were not made with the crystal structures because the hydrogen atoms were not present in the experimental structures, but were added with AMBER and subsequently optimized using the parm94 force field. In general, PM3 showed good agreement with parm94, with no instance in which the average difference was greater than 0.03 Å. The two instances in which AM1 exhibited its largest average deviation from the parm94

reported lengths (0.04 Å) involved hydrogen-carbon bonds in which the carbon atom was also bonded to a positively charged group or electron-withdrawing group.

3.3.3 Effects on Bond Angles

Although there was generally good agreement in the bond angles among all of the different models examined here, there were a few instances in which important discrepancies were observed. These instances generally involved polar atoms, and are illustrated in Figure 3-8. The first three angles examined in Figure 3-8 were those with the largest discrepancies observed in heavy-atom bond angles. The first angle corresponds to the angle of the peptide bond and indicates a slight overestimation of the angle by both semiempirical methods when compared to the consistent values observed in the crystal structure and the parm94 parameter. AM1 and PM3 predict angles 1.4° and 1.9° larger, respectively, than parm94. The second instance showed a large discrepancy between the parm94 value and that observed in all of the structures. This corresponds to the angle at histidine's nitrogen atom in the imidazole ring. The parm94 data reports this angle as 120°, but all of the structures have angles close to 108°. This is an apparent error in the parm94 parameter, because the interior angles of a 5-membered ring with all angles equivalent are 108°. The third discrepancy was observed in the oxygen-carbon-nitrogen angle of the peptide bond. PM3 showed the largest deviation, exhibiting an angle 4.2° smaller than that observed in the crystal structures. As with the bond lengths, comparisons for angles involving hydrogen atoms were not made with the crystal structures because the hydrogen atoms were added with AMBER and subsequently optimized using the parm94 force field.
When compared to the parm94 values, both semiempirical methods underestimate the size of the C-O-H angle of carboxylic acids, the H-N-H angle of amine groups, and the H-C-H angle of aliphatics that are adjacent to charged groups. In all instances AM1 more closely matched the parm94 data. Lastly, the C-N-H angle of

the imidazole ring of the histidine residues was predicted to be 6.2° and 4.8° higher in the AM1 and PM3 proteins, respectively, than the angle listed in the parm94 parameters.

3.3.4 Effects on Torsional Angles

Torsional angles are far more important for protein systems than for the small molecules on which AM1 and PM3 were parameterized, because of their influence upon the secondary structure of polypeptides. The formation of α-helices and β-strands is critical to the protein folding process, and maintaining appropriate torsional angles enables the secondary structural motifs to develop. Additionally, certain torsional angles are more frequently observed experimentally than others.103 The torsional angles investigated here are of two different kinds. The first are the backbone (φ, ψ) torsional angles that give rise to the Ramachandran data. The second are improper torsional angles that indicate the degree to which certain planar amino acid side chains maintain their planarity.

Studies have previously shown that PM3 fails to predict planarity of the peptide group because of a preference for the pyramidal configuration of the nitrogen atom.104 For this reason, the semiempirical calculations were run using a molecular mechanics correction term (ADDMM in the DivCon program) that places an energy penalty on rotation about the peptide group. This correction is of the form k(sin²θ) and is added to the gradients for rotation about the peptide group. This term did not affect the improper torsions of the amino acid side chains.

A summary of the Ramachandran data for all of the proteins evaluated is given in Table 3-4. There were very few instances in which any method yielded torsional angles in disallowed regions of the Ramachandran plots. The crystal structures fit the Ramachandran data the best, with the largest percentage of torsions in the core region and the smallest percentage of torsions

in all other regions. This should be expected because the experimental structures are generally generated with some care taken to ensure that deviations from the Ramachandran predictions are minimized. The semiempirical minimizations run in vacuum resulted in structures with the largest deviations from the Ramachandran regions. This is even more evident in structures generated at the AM1 level than at the PM3 level. The vacuum results for PM3 were similar to the AMBER minimized results. The use of solvent in the PM3 minimizations made a noteworthy difference in the ability of this method to maintain the proper torsional angles. This is probably because the vacuum minimizations experienced more significant structural changes in general as the methods sought to neutralize the charged side chains. Therefore, in vacuum minimizations, compensating for the energy penalty of having charged side groups was more important than maintaining other structural features.

Another important set of torsional angles is the set of improper torsional angles that help to determine whether certain side chain moieties are kept in the proper planar conformation. Figure 3-9 illustrates the percentage of residues in the protein data set that exhibited out-of-plane bending. Proteins minimized using PM3 suffered the most from out-of-plane bending of the planar side chains. The residues most significantly impacted were arginine, tryptophan, and tyrosine. However, the source of the puckering effect in arginine residues appears to have been different from that of all other residues. The training set used in the parameterization of PM3 contains no compound with a moiety that resembles the guanidyl group of the arginine side chain. PM3 shows a significant preference for the pyramidal configuration for all instances of sp2 hybridized nitrogen atoms, as illustrated in Figure 3-4. However, the conjugated pi system stabilizes the planar conformation.
This shortcoming in the PM3 Hamiltonian results in arginine residues that stray from planarity by as much as 9.07°. AM1 also suffers from an inability to

accurately model this functional group, though to a far lesser extent than does PM3. Tryptophan residues were also affected by this preference for pyramidal nitrogen conformations, though to a lesser extent because the aromatic ring systems appeared to stabilize the planar conformation. The tyrosine residues appear to suffer from a different problem that results in out-of-plane geometries. The polar interaction of the oxygen atom with neighboring hydrogen atoms appears to distort the planarity of the side chain. This is illustrated in Figure 3-10, where the oxygen of tyrosine attempts to form a hydrogen bond with the adjacent lysine residue. The use of solvent lowered the percentage of residues that were out-of-plane for the PM3 minimizations. Solvation reduced the number of out-of-plane tyrosine residues more than it did for other residues. This is because the arginine and tryptophan problem stemmed in part from a preference of the Hamiltonian for a pyramidal nitrogen atom, whereas the tyrosine problem stemmed largely from a tendency to hydrogen bond with neighboring polar atoms, which was quenched by the use of solvent.

Table 3-4. Percentage of torsional angles found in regions of the Ramachandran plot

                                  Region
Structure               Core    Allowed   Generous   Disallowed
Crystal                 92.3    7.3       0.3        0.1
Amber minimized         86.6    12.7      0.4        0.1
AM1 minimized           79.7    18.2      1.7        0.3
PM3 minimized           86.5    12.4      0.9        0.2
PM3 minimized solvent   89.1    7.8       0.3        0.1

Table 3-5. Average number of hydrogen bonds in protein structures

                                        Structure
H bond type   Crystal   AMBER minimized   AM1 minimized   PM3 minimized   PM3 minimized solvent
Main-Main     46.4      45.8              41.6            41.5            43.5
Side chain    23.9      26.4              33.4            29.2            19.9
Total         70.3      72.2              75              71.7            63.3

Table 3-6. Secondary structural content of proteins

                                            Structure
Secondary structure   Crystal   Amber minimized   AM1 minimized   PM3 minimized   PM3 minimized solvent
% helix               27.4      25.0              23.5            23.6            25.3
% sheet               31.3      32.5              29.3            29.2            30.4
% coil                41.2      42.5              47.2            47.2            44.2

Figure 3-9. Percentage of residues exhibiting out-of-plane side-chain conformations. In order: Blue - AMBER minimized structures; Red - AM1 minimized structures; Green - PM3 vacuum minimized structures; Purple - H-minimized crystal structures; Turquoise - PM3 minimized structures using implicit solvation (PB) model.
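Both kinds of torsional angle discussed in Section 3.3.4, the backbone (φ, ψ) angles and the improper torsions used to judge side-chain planarity, reduce to a dihedral angle over four atomic positions. The following is a minimal sketch of that calculation, not the procedure used in this work: the function names `dihedral_deg` and `planarity_deviation_deg` are hypothetical, NumPy is assumed to be available, and the choice to measure planarity as the distance of an improper torsion from 0° or 180° is an assumption made for illustration.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points p0-p1-p2-p3."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def planarity_deviation_deg(p0, p1, p2, p3):
    """Illustrative measure (an assumption): distance of a torsion from 0 or 180 degrees."""
    a = abs(dihedral_deg(p0, p1, p2, p3))
    return min(a, 180.0 - a)

# Hypothetical coordinates: the fourth atom is lifted slightly out of the plane
print(planarity_deviation_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0.1, 1)))
```

Applied to the four atoms that define a backbone φ or ψ angle, the same `dihedral_deg` function would give the values that populate a Ramachandran plot.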

Figure 3-10. Loss of planarity observed in the tyrosine residues of the vacuum minimized structures due to charge-charge interactions.

3.3.5 Effects on Atomic Charges

The semiempirical calculations were performed using atomic partial charges calculated by the CM2 charge model.105 Single point calculations were run for all protein systems to determine whether there was a significant difference between CM2 and CM1 charges for these protein systems. Only very slight differences were observed. There was, however, significant variability among the charges calculated by the different Hamiltonians for some of the atoms. In general, the magnitude of the charges predicted by AM1 was significantly larger than that predicted by PM3. When implicit solvent was used, PM3 predicted charges that were slightly smaller in magnitude than for the non-solvated PM3 structures. Figure 3-11 illustrates this point by depicting the magnitude of the average charges computed for lysine residues for all protein systems. The trends illustrated in Figure 3-11 are representative of all of the amino acids.

Figure 3-11. Average lysine charges found in protein systems. In order: Blue - AMBER minimized structures; Red - AM1 minimized structures; Green - PM3 vacuum minimized structures; Purple - H-minimized crystal structures; Turquoise - PM3 minimized structures using implicit solvation (PB) model.

It is difficult to make a true comparison with experimental data for the charges of the protein systems examined. We therefore compare the semiempirical charges to those given in the parm94 force field. This comparison is less than ideal because MM charges may differ from QM charges for a variety of reasons. Molecular mechanics methods use fixed charges that cannot change to account for polarization and charge transfer, and each amino acid residue carries an integral charge. Conversely, in the semiempirical calculations the total charge of the system stays constant, but atoms (and consequently residues) may change their charges slightly depending upon their local environments. In order to properly compare the QM and MM

charges, the QM charges for the residues were scaled to ensure that they sum to a whole number. This normalization is accounted for in Figure 3-11.

Figure 3-12. Typical salt-bridge formation observed when going from the crystal structure (left) to the vacuum minimized structure (right).

Figure 3-13. Vacuum PM3-CM2 charges before and after minimization for side chain atoms in Figure 3-12.

In general, the partial atomic charges did not change significantly between the initial structures and the minimized structures. The largest changes were observed in the polar atoms, in particular when minimization resulted in the formation of salt-bridges. Figures 3-12 and 3-13 illustrate a typical salt-bridge formed by the vacuum minimization and the resulting change in charge, respectively. In this example the carbonyl oxygen experiences a change in partial charge of 0.08 e due to minimization of the protein. The difference in energy

associated with this charge modification is on the order of 0.04 kcal/mol. While this value is still relatively small, the cumulative effect for a larger protein could be significant.

3.3.6 Effects on Non-Bonded Atomic Contacts

Intermolecular forces play an important role in determining the structure of protein systems. Proteins differ from small molecules in that the van der Waals and electrostatic forces between residues within a protein can make a more significant contribution to the total energy. As discussed earlier, the vacuum minimizations sought to neutralize the charged residues through the formation of salt-bridges, and sometimes through other routes that involved nonphysical interactions. In general this need to neutralize the charges resulted in an overabundance of unfavorable close contacts in the semiempirical geometry optimizations.

Figure 3-14 shows a decomposition of the energy, averaged over all of the protein systems, as measured by the parm94 force field. As would be expected, the AMBER minimized structures have the lowest energies by the AMBER potential and therefore serve as a benchmark by which to measure the other structures. It is not surprising that the bond energies are particularly high for the semiempirical methods, because even small deviations from the parm94 parameters result in high energy penalties. Furthermore, as noted earlier, there are several prominent instances in which the bond lengths for the semiempirical methods deviate from the expected values for protein systems.

Figure 3-14. Energy decomposition using the MM force field parm94 in AMBER. In order: Blue - AMBER minimized structures; Red - AM1 minimized structures; Green - PM3 vacuum minimized structures; Purple - H-minimized crystal structures; Turquoise - PM3 minimized structures using implicit solvation (PB) model.

It is significant that the van der Waals energies are higher for the semiempirical methods, and actually positive for PM3. In the vacuum minimizations, the drive to neutralize the charges is so strong for PM3 that it results in a significantly increased number of close contacts that AMBER's force field views as destabilizing. Approximately half of the proteins minimized with PM3 exhibited a net van der Waals energy that was unfavorable by AMBER's force field. AM1 optimized structures did not exhibit such a high propensity for these sorts of unfavorable interactions. The discrepancy between the AM1 and PM3 charges may offer an explanation for the large van der Waals forces observed in PM3. It appears that PM3 is less content with having large charged groups free in space and is willing to pay the energy penalty by forming more close contacts that neutralize the charge. The problem of large repulsive van der Waals

interactions that was seen in the vacuum minimizations for PM3 was no longer present when a solvation model was used.

3.3.7 Effects on Hydrogen-Bonds

Hydrogen bonding in protein systems is generally considered to be stabilizing and is another critical feature that needs to be captured in order to properly model the secondary and tertiary structures.106 It is known that the semiempirical methods are not able to accurately capture the angles of the hydrogen bonds,95 but both AM1 and PM3 have a qualitative ability to account for the energetics associated with hydrogen bonding interactions. For this study, hydrogen bonds were identified using geometric considerations, as shown in Equation 3-1,

$$E_{hb} = \cos^{2}(\theta_{DHA})\, e^{-(R_{HA} - 2.0)^{2}} \qquad \text{(3-1)}$$

where θDHA is the bond angle formed between the hydrogen bond donor, the hydrogen, and the acceptor, and RHA is the distance between the hydrogen and the acceptor. If Ehb is evaluated to be less than -0.3, the atoms are considered to be hydrogen bonded.

The average number of hydrogen bonds for the protein systems is given in Table 3-5. The vacuum minimized AM1 and PM3 structures had the same number of main-chain hydrogen bonds, but this was slightly lower than in the other protein structures. The vacuum semiempirical methods also exhibited the highest numbers of side chain hydrogen bonds; this, again, is likely an effort to quench the charges. The low numbers of main-chain hydrogen bonds translated into a decreased ability to maintain the secondary structures, as illustrated in Table 3-6. The solvated PM3 geometries maintained a higher number of main-chain hydrogen bonds and consequently yielded a higher percentage of proteins with the secondary structural characteristics shown in Table 3-6.
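The geometric criterion of Equation 3-1 can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the angle is assumed to be supplied in degrees and the distance in ångströms, and, because the cutoff quoted in the text is negative while the product of a squared cosine and an exponential is never negative, the sketch assumes the score is negated before it is compared with -0.3. That sign convention is an assumption and is not stated in the text.

```python
import math

def hbond_score(theta_dha_deg, r_ha):
    """cos^2(theta_DHA) * exp(-(R_HA - 2.0)^2), the right-hand side of Equation 3-1."""
    c = math.cos(math.radians(theta_dha_deg))
    return c * c * math.exp(-((r_ha - 2.0) ** 2))

def is_hydrogen_bonded(theta_dha_deg, r_ha, cutoff=-0.3):
    # ASSUMPTION: the score is taken with a leading minus sign so that a
    # stronger interaction is more negative and can fall below the cutoff.
    return -hbond_score(theta_dha_deg, r_ha) < cutoff

# Hypothetical geometries: a near-linear contact at 2.2 A, and a distant one at 4.0 A
print(is_hydrogen_bonded(160.0, 2.2))
print(is_hydrogen_bonded(180.0, 4.0))
```

Under these assumptions the score is largest for a linear donor-hydrogen-acceptor arrangement at a hydrogen-acceptor distance of 2.0 Å, and falls off smoothly as the contact bends or stretches.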

3.4 Conclusions

This study highlights some of the important artifacts that arise as a result of applying the semiempirical approximations to the modeling of protein systems. Many of the systems examined contained numerous charged residues, and geometry optimizations carried out in vacuum could not account for the need to neutralize the charges. This resulted in several instances of unphysical interactions, including proton transfer and bond formation. By comparing the structures generated by in vacuo geometry optimization with those generated in implicit solvent, a distinction is made in this study between the artifacts that arise due to the Hamiltonian and those that can be rectified by solvation.

With a few noted exceptions, the bond lengths, angles, and torsions were well preserved by the semiempirical methods compared to the crystal structures and the parameters used in the parm94 protein-specific MM force field. Neither semiempirical Hamiltonian (AM1 nor PM3) consistently outperformed the other with respect to how well it was able to model the systems examined here. Because AM1 had fewer close contacts and maintained the planarity of the planar side-chains better than PM3, it may be a better option for studying protein systems. Even though PM3 did maintain the backbone torsional angles better than AM1, this did not result in a better description of the secondary structures. The geometries of the proteins minimized at the PM3 level with solvation clearly outperformed their vacuum counterparts. Both semiempirical methods appear to underestimate the barrier to out-of-plane bending for several of the planar amino acid side chains. There were also flaws in the lengths and angles that describe the peptide bond. Other important shortcomings are detailed in the results section.
The use of a solvation model and additional molecular mechanics corrections for out-of-plane bending and to maintain the peptide bond lengths and angles appear to be the most significant ways in which to improve the applicability of these methods for the types of protein

systems studied here. The semiempirical methods tested were not parameterized specifically for systems of this type. It is therefore a testament to the quality and extensibility of the methods that they perform as well as they do.

CHAPTER 4
1H AND 13C NMR PARAMETERIZATION OF AM1 FOR PROTEIN SYSTEMS

4.1 Introduction

The ability to predict NMR chemical shifts for protein systems (ranging in size from a few hundred atoms to many thousands) routinely, accurately, and quickly can aid in structure elucidation, give insight into the binding modes of ligands in proteins, and add valuable information on the dynamics of these systems. The versatility and accuracy of modern Quantum Mechanical (QM) methods make them a preferred approach to predicting chemical shifts for these systems. Ab initio and Density Functional Theory (DFT) methods perform very well in reproducing experimental chemical shifts for small molecules, but many protein systems are too large for the routine calculation of the shielding constants using these methods. Furthermore, the significant non-bonded interactions exhibited in globular protein systems make it difficult to take advantage of many of the linear-scaling ab initio algorithms that are currently available. Here we present a semiempirical QM methodology that scales well enough to predict NMR chemical shifts for large protein systems, and is specifically parameterized for that purpose.

The state-of-the-art empirical methods of NMR chemical shift prediction are not only fast enough to be used for high throughput applications, but also have good accuracy for the prediction of chemical shifts of many carbon and hydrogen atoms found in protein systems.55 One drawback of empirical methods is that they generally rely on atom typing and cannot easily be applied to the large variety of organic molecules that are of interest as ligands bound to proteins in biochemical and medicinal studies. Therefore, these approaches are generally not amenable to proteins with non-standard amino acids or to protein-ligand complexes. It is in these non-standard systems that our approach shows its greatest promise.

Here we employ the FPT-GIAO approach outlined in the methods section. In previous applications, the AM1 Hamiltonian107 was used for geometry optimizations, and a subsequent single-point NMR calculation was performed using the MNDO Hamiltonian. This approach was chosen because AM1 improved on some structural features that are important to biological systems. Particularly important for proteins is the qualitative ability of AM1 to account for the energetics associated with the hydrogen bonding interactions that are critical to their structure. An improved description of hydrogen bonding makes AM1 a more appropriate Hamiltonian than MNDO for geometry optimizations of protein systems. It would be more consistent to arrive at the geometry and perform the subsequent chemical shift calculations at the same level of theory. Therefore, AM1 was chosen as the Hamiltonian for both the geometry optimization and the NMR calculation. Furthermore, NMR-specific MNDO parameters have already been developed; thus, to distinguish these protein-specific NMR parameters from the more general NMR parameters, a different Hamiltonian was chosen for this parameterization. The new NMR-specific AM1 parameters are given in Table 4-1.

Table 4-1. Comparison of standard versus NMR-optimized AM1 parameters

Atom   Parameter   AM1          AM1-NMR       AM1-NMR H     AM1-NMR C
H      ζs (a.u.)   1.1880780    1.1503664     1.15994203    1.14478000
       βs (eV)     11.3964270   14.96943835   14.69571729   15.15458000
C      ζs (a.u.)   1.8086650    1.76509177    1.68883970    1.68679400
       ζp (a.u.)   1.6851160    1.63588167    1.62877282    1.65943900
       βs (eV)     15.7157830   18.89678664   17.95998381   17.66492000
       βp (eV)     7.7192830    11.9465102    14.64670379   12.08854000
N      ζs (a.u.)   2.3154100    2.3445263     2.16847239    2.12992700
       ζp (a.u.)   2.1579400    2.0516015     2.06083458    2.11709500
       βs (eV)     20.2991100   26.05351514   27.62130377   29.62571000
       βp (eV)     18.2386660   18.66264348   12.20412355   14.56151000
O      ζs (a.u.)   3.1080320    3.10365447    3.63641586    3.40861800
       ζp (a.u.)   2.5240390    2.54342588    2.35923609    2.39947230
       βs (eV)     29.2727730   29.05642758   28.10519165   26.94522300
       βp (eV)     29.2727730   30.23181062   29.27050269   29.68913700
S      ζs (a.u.)   2.3665150    2.3665150     2.45028965    2.51893987
       ζp (a.u.)   1.6672630    1.6672630     1.60125168    1.66726300
       βs (eV)     3.9205660    3.9205660     4.26878731    13.57459970
       βp (eV)     7.9052780    7.9052780     10.11576964   8.83340467

4.2 Methods

4.2.1 Parameterization

Reparameterization of semiempirical QM methods in previous studies has been shown to significantly improve the agreement between experimental and calculated NMR chemical shifts for 1H, 13C, 15N, 17O and 19F nuclei.11,12,96 The NMR-specific parameters for the MNDO approximation and a detailed explanation of the choice of parameters to be optimized were given in these previous works. In the previous MNDO-NMR parameterization,12 the authors ran the initial tests avoiding the alteration of the one center/one electron terms (Uss/pp).
These parameters affect the core energies and heats of formation of single atoms, and large alterations can significantly change the charges, dipole moments, and electronic structure. Since these parameters in the standard MNDO formalism were optimized to reproduce the aforementioned quantities,62 only the Slater atomic orbital exponents (ζs/p) and atomic orbital two center/one

electron resonance parameters (βs/p) were changed, leaving the Uss/pp terms at the MNDO value. While only the ζ and β parameters were directly changed, the derived MNDO parameters were re-calculated as appropriate.

Our method for the fast semiempirical QM NMR calculations has been outlined in a previous publication,77 in which the optimized β(s/p) and ζ(s/p) parameters are adopted.12 The ζ(s/p) parameters describe the character of the atomic orbitals. The β(s/p) parameters describe the attraction of one electron to the nuclei of other atoms. The formalism for AM1-NMR is the same as our MNDO-NMR procedure, and the choice of the type of parameters to be optimized is also the same. Although only 1H and 13C chemical shifts are reproduced in this study, it was necessary to optimize the β(s/p) and ζ(s/p) parameters for C, H, N, O and S to achieve the best agreement between experimental and calculated values. The chemical shifts are then calculated as the difference between a chosen reference value and the calculated shielding constant, according to Equation 4-1.

$$\delta_{calculated} = \sigma_{reference} - \sigma_{calculated} \qquad \text{(4-1)}$$

In addition to generating new parameters for the calculation of the shielding constants, optimization of the reference value in Equation 4-1 was performed to address systematic over- or underestimation of the chemical shifts. During the parameterization this reference value was initially set to the value of the shielding constants calculated for methane for a given set of parameters. Next, the average signed error between the experimental and calculated chemical shifts was calculated (see Equation 4-2). The signed error was then added to the initial reference value, and this adjusted number was used as the final σref value.

$$\text{Average Signed Error} = \frac{\sum \left(\delta_{exp} - \delta_{calc}\right)}{N} \qquad \text{(4-2)}$$
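The reference adjustment described by Equations 4-1 and 4-2 can be illustrated with a short sketch. The function names and the numerical values below are hypothetical and chosen only to show the bookkeeping; they are not data from this work.

```python
def chemical_shifts(sigma_calc, sigma_ref):
    """Equation 4-1: shift = reference shielding - calculated shielding."""
    return [sigma_ref - s for s in sigma_calc]

def average_signed_error(delta_exp, delta_calc):
    """Equation 4-2: mean of (experimental - calculated) shifts."""
    return sum(e - c for e, c in zip(delta_exp, delta_calc)) / len(delta_exp)

def adjusted_reference(delta_exp, sigma_calc, sigma_ref_initial):
    """Add the average signed error to the initial reference value."""
    delta_calc = chemical_shifts(sigma_calc, sigma_ref_initial)
    return sigma_ref_initial + average_signed_error(delta_exp, delta_calc)

# Hypothetical shieldings and experimental shifts (ppm), for illustration only
sigma_calc = [30.0, 28.0, 25.0]
delta_exp = [1.0, 2.5, 7.0]
sigma_ref_0 = 31.0  # stands in for the methane-derived starting value

sigma_ref = adjusted_reference(delta_exp, sigma_calc, sigma_ref_0)
residual = average_signed_error(delta_exp, chemical_shifts(sigma_calc, sigma_ref))
print(sigma_ref, residual)
```

With the adjusted reference, the average signed error recomputed from Equation 4-2 is zero to within floating-point rounding, so any remaining disagreement with experiment is scatter rather than a uniform offset.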

For a given set of parameters, this method essentially sets the average signed error to zero and minimizes the root-mean-squared (rms) error for the data set. The scoring function was then used to optimize the parameters by minimizing the rms error between experimental and calculated chemical shifts via Equation 4-3. All chemical shifts and errors for QM calculations in this work were evaluated using the reported σref value chosen to minimize the average signed and rms error. In all instances the calculations using the new parameters are single-point NMR calculations. The geometries were generated using standard AM1 parameters because the new parameters are NMR-specific and were not tested for their ability to provide realistic geometries, nor were they developed for that purpose.

$$G = \frac{1}{N} \sum_{i=1}^{N} \left(\delta_{i} + \sigma_{i} - \sigma_{ref}\right)^{2} \qquad \text{(4-3)}$$

The new AM1-NMR parameters were optimized using a modified genetic algorithm (GA). A full description of the type of GA used in this work has previously been published.79 GAs have been used successfully in several semiempirical parameterizations, and the applicability of this type of optimization routine has previously been discussed.79,80,96 The GA and all handling of the data were performed using an in-house molecular tools package, MTK++ (Molecular Tool Kit [written in C++]).108 All semiempirical calculations were performed using the DivCon program.78

The results obtained for protein systems using the AM1-NMR approach are compared to those given by the MNDO-NMR protocol and by the empirical SHIFTX program,55 which is freely available. SHIFTX version 1.1, the version available as of the date of this publication, does not have the capability to predict chemical shifts for all atoms in the protein systems. Most notably, the chemical shifts for side-chain carbon atoms could not be predicted.
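The statement that the reference adjustment sets the average signed error to zero can be checked with a short derivation. The lines below are a sketch under stated assumptions: $\delta_i$ is taken to be the experimental shift and $\sigma_i$ the calculated shielding constant of nucleus $i$, and the symbol $S$ for the sum of squared deviations is introduced here only for illustration.

```latex
\begin{align*}
S(\sigma_{ref}) &= \sum_{i=1}^{N} \left(\delta_i + \sigma_i - \sigma_{ref}\right)^2 \\
\frac{\partial S}{\partial \sigma_{ref}}
  &= -2 \sum_{i=1}^{N} \left(\delta_i + \sigma_i - \sigma_{ref}\right) = 0
  \quad\Longrightarrow\quad
  \sigma_{ref} = \frac{1}{N} \sum_{i=1}^{N} \left(\delta_i + \sigma_i\right) \\
\frac{1}{N} \sum_{i=1}^{N} \left[\delta_i - \left(\sigma_{ref} - \sigma_i\right)\right]
  &= \frac{1}{N} \sum_{i=1}^{N} \left(\delta_i + \sigma_i\right) - \sigma_{ref} = 0
\end{align*}
```

Since the second derivative of $S$ with respect to $\sigma_{ref}$ is $2N > 0$, this stationary point is a minimum. For fixed shielding constants, then, the reference value that minimizes the squared deviation is exactly the one for which the average signed error vanishes, leaving only the scatter about that reference for the parameter search to reduce.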

4.2.2 Experimental Data

The large number of protein structures in the Protein Data Bank (PDB) that were solved by NMR is one of many testaments to the important role that NMR plays in the study of protein systems. A primary goal of this parameterization was therefore to improve the ability of the semiempirical NMR protocol to predict the chemical shifts of atoms in this particular environment.

The MNDO-NMR parameters were developed with the broad aim of reproducing the chemical shifts in a wide variety of small organic compounds. Various functional groups present in that training set are not frequently seen in biological applications. A few of these include O3, O=C=C=C=O, CH3-N=N=N, and ON-NO2. Therefore, we initially sought to develop NMR-specific parameters suited to biological compounds using a data set of small molecules with functional groups that are more frequently seen in biology. This data set consisted of 94 small molecules, of which 65 were adopted from the small reference set used in the parameterization of MNDO-NMR. The remaining compounds were added to incorporate more functional groups that may be useful in biological applications. The results for this data set are presented in Tables 4-2 and 4-3. This first set of parameters (referred to as AM1-NMR) resulted in only a modest improvement upon our current MNDO-NMR protocol for this data set. Furthermore, when tested on several large protein systems, the agreement with experiment was not as close as was hoped. It was therefore decided to carry out a parameterization using protein data.

Table 4-2. Small molecule 1H and 13C RMS errors using σref values optimized on the small molecule data set.

NMR method                  MNDO-NMR   AM1-NMR-H  AM1-NMR-C  AM1-NMR    AM1
1H (154 shifts)
  rms error                 0.61       0.63       0.64       0.48       1.42
  R2                        0.96       0.94       0.94       0.96       0.92
  average signed error      0          0          0          0          0
  average unsigned error    0.49       0.59       0.49       0.37       1.24
  σref                      50.010     50.423     49.672     49.716     47.680
13C (176 shifts)
  rms error                 10.54      12.59      10.00      9.86       21.54
  R2                        0.97       0.96       0.97       0.97       0.86
  average signed error      0          0          0          0          0
  average unsigned error    7.91       9.54       7.51       7.78       15.66
  σref                      44.226     85.079     71.956     63.354     58.802
  σref (C-C pi bonds)       44.226     85.079     51.403     63.354     58.802

Errors include the complete data set (training and test sets), with the σref value in Equation 4-1 adjusted by the average signed error. The average signed errors are zero because they are absorbed into the σref values.
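The quantities tabulated above can be computed from paired experimental and calculated shifts as in the following sketch. It assumes that R2 denotes the squared Pearson correlation coefficient, which the text does not state explicitly, and the four data points are hypothetical.

```python
import math


def error_statistics(delta_exp, delta_calc):
    """Return rms, signed and unsigned errors plus squared Pearson correlation."""
    n = len(delta_exp)
    errors = [e - c for e, c in zip(delta_exp, delta_calc)]
    mean_exp = sum(delta_exp) / n
    mean_calc = sum(delta_calc) / n
    s_xy = sum((e - mean_exp) * (c - mean_calc) for e, c in zip(delta_exp, delta_calc))
    s_xx = sum((e - mean_exp) ** 2 for e in delta_exp)
    s_yy = sum((c - mean_calc) ** 2 for c in delta_calc)
    return {
        "rms": math.sqrt(sum(err ** 2 for err in errors) / n),
        "signed": sum(errors) / n,
        "unsigned": sum(abs(err) for err in errors) / n,
        "r2": s_xy ** 2 / (s_xx * s_yy),
    }


# Hypothetical shifts (ppm), chosen so the signed errors cancel exactly.
stats = error_statistics([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5])
print(stats)
```

In this toy case the average signed error is zero while the unsigned and rms errors are both 0.5 ppm, which mirrors how a re-centered σref removes systematic offset without removing scatter.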

Table 4-3. Small molecule 1H and 13C RMS errors using σref values optimized on the protein data set.

NMR method                  MNDO-NMR   AM1-NMR-H  AM1-NMR-C  AM1-NMR    AM1
1H (154 shifts)
  rms error                 0.71       0.71       0.68       0.76       1.43
  R2                        0.96       0.94       0.94       0.96       0.92
  average signed error      +0.37      +0.33      +0.22      +0.59      +0.14
  average unsigned error    0.59       0.54       0.53       0.65       1.28
  σref                      49.603     50.062     49.418     49.090     47.503
13C (176 shifts)
  rms error                 10.82      12.74      10.10      9.91       24.15
  R2                        0.97       0.96       0.97       0.97       0.86
  average signed error      2.46       +1.96      +1.30      0.95       10.91
  average unsigned error    7.84       9.61       7.54       7.73       19.85
  σref                      46.689     83.114     70.905     64.309     47.8900
  σref (C-C pi bonds)       46.689     83.114     49.32      64.309     47.8900

Errors include the complete data set of training and test sets.

The protein structures chosen for the training and test sets are listed in Table 4-4. These are unbound structures that range in length from 46 to 61 amino acid residues. It was essential for the molecules to be sufficiently large and to exhibit a range of secondary structures, including alpha helices, beta sheets, and random coils. The proteins could not be too large because that would slow the parameterization. Figure 4-1 illustrates the distribution of amino acid residues in the training set. Figure 4-2 compares the distribution of amino acid residues in the test set with that of the complete protein data set including both training and test sets. The experimental chemical shifts were taken from the BMRB database.109 There was a single instance in which the reported experimental data point was not used because

it was deemed to be inaccurate (2CA7: Y24-CD1). The proteins used contained only C, H, N, O and S atoms. No metals were present and all water molecules were removed.

Table 4-4. Protein systems used in the parameterization

PDB ID  Description                                 N conformers/  Model/  Missing        BMRB
                                                    Resolution     Chain   residues       accession
1HA8    Pheromone                                   20             1       H,I,K,M,R,V,W  4979
1N87    Pre-mRNA splicing factor PRP19              20             1       H,W            5594
1Q2N    Z Domain Staphylococcal Protein A           10             1       C,G,M,T        5656
1RZS    P22 Cro Protein                             21             1       C              6185
1SZL    F-spondin Protein                           20             1       H              6175
2CA7    Conkunitzin-S1 Kunitz Type Neurotoxin       20             2       H,M,V,W        6506
1YV8    Crambin                                     20             1       H,K,M,Q,W      6455
2B7E    Pre-mRNA Processing Protein 1st FF Domain   12             1       C,H            6850
2FS1    Bacterial Albumin Binding Protein           20             1       C,F,H,R,W      6945
2JN0    ygdR protein from E. coli                   20             1       C,F,W          15079
1EZG    Antifreeze protein (T. molitor)             1.40           B       N,C,E,M,W      5323
1F94    Bucandin                                    0.97           A       C,H,P,Q        5097
1FD3    Beta-defensin 2 (Human)                     1.35           D       C,H,P,Q        4642
1L3K    Ribonucleoprotein A1 (HNRP Human)           1.10           A                      4084
1UBQ    Ubiquitin                                   1.80                   C,W            5387

*Indicates structures used in the training set. All others were included in the test set.

Figure 4-1. Distribution of amino acids in protein training set.

Figure 4-2. Distribution of amino acids in complete protein data set (black) compared to test set.

NMR structures in the PDB are seldom given as a single structure; quite frequently they are given as a series of models. The protein model used was the one that the PDB file designated as the best representative conformer. The second model was used for 2CA7 because of structural flaws in the best representative conformer. All structures were minimized at the AM1 level using the standard parameters prior to performing the single-point NMR calculations. Geometry

optimization, visual inspection and single-point calculations using AMBER were performed to ensure that no models containing significant structural anomalies were used. Although using high-resolution crystal structures may lower the chance of structural defects affecting the quality of the parameterization, NMR structures were used because it is more consistent to use the structures that were actually generated from the NMR experiments. Additionally, in many instances the problems associated with crystal structures ultimately made NMR structures a more suitable option for this type of parameterization. Among the problems encountered were the following. First, in many instances the combination of a chosen crystal structure and matching BMRB data did not meet the general criteria outlined above because the structures were protein-ligand complexes, had incommensurate lengths or lacked the secondary structure of interest. Second, they often had an insufficient number of reported chemical shifts; the number of chemical shifts that they introduced into the reference data was therefore disproportionately small relative to the computational expense. A third important problem was that the crystal structures often had mutations that resulted in a mismatch with the BMRB data. Although several of these problems would not affect the use of these structures in training a statistical method, they were more problematic in a QM study. Because QM calculations account for long-range interactions, a structure with even a single mutation was deemed unsuitable for training purposes, more so than one with small structural flaws that can be remedied by geometry optimization. Furthermore, because of the computational expense of QM models, very large systems were not suitable for training.
Nonetheless, the test set was augmented with five high-resolution crystal structures.

The challenge of accurately reproducing the chemical shifts of amide protons is well documented.110 This is due in part to the varying rates of exchange that these protons experience

with solvent during the experimental NMR procedures. For this reason, exchangeable protons were not used in this parameterization, and the only hydrogen atoms included were those involved in H-C bonds.

4.2.3 Data Set Preparation

Because the goal of this work was to provide parameters that could be used for large biological molecules, semiempirical geometries were used; the large system sizes preclude the use of structures generated via higher-level calculations. For consistency, the geometries were all optimized using AM1, thus minimizing the errors due to variations in the experimental structures. It was important to normalize the structures instead of using experimental geometries because the calculation of the chemical shifts is highly sensitive to bond lengths and angles, and slight variations could significantly impact the quality of the parameters.

Several structures in the test set were high-resolution crystal structures. These included four structures from the cross-validation set initially used to evaluate the SHIFTX program. The last structure, ubiquitin, was chosen because it is commonly used to evaluate the predictive quality of NMR-related programs. In some instances the crystal structure did not contain the coordinates for the side chains of every residue. It was therefore necessary to add all missing atoms using the LEaP program in AMBER 9.111 After all side-chain heavy atom coordinates were built, all of the crystal structures were protonated using AMBER. In a few cases, this involved the removal of the hydrogen atoms that were present in the PDB file. This was done for consistency, and to ensure that there were no significant van der Waals clashes. All crystallographic waters were removed, and a restrained minimization was then carried out using AMBER to minimize the protons with the heavy atom coordinates held constant. All of the geometry optimizations using AMBER were carried out with the ff99SB force field.112

Following the initial structure preparation, geometry optimizations were performed in vacuo at the AM1 level. The optimization used the steepest descent routine for 30 steps, followed by conjugate gradient until the convergence criteria were met. These criteria were maximum changes in the energy, gradients, and coordinates of 0.1 kcal/mol, 1.0 kcal/mol/Å, and 0.001 Å, respectively. The structures were then carefully inspected to ensure that no significant errors occurred during the geometry optimization. These structures were then used to generate the new parameters.

4.3 Results and Discussion

4.3.1 General Results

Three sets of new NMR-specific parameters [ζs, ζp, βs, βp] for the AM1 Hamiltonian are listed in Table 4-1. The first set (AM1-NMR) consists of 14 parameters optimized on small molecules to reproduce both 1H and 13C chemical shifts. The second set (AM1-NMR-C) and third set (AM1-NMR-H) were generated using protein systems as the training set (see Table 4-4); they consist of 18 parameters optimized to reproduce 13C and 1H chemical shifts, respectively. No single parameter set was able to simultaneously achieve the highest level of accuracy possible for all three types of systems of interest. However, the results from the AM1-NMR-C set of parameters suggest that they represent the best balance of accuracy and versatility for the prediction of 1H and 13C chemical shifts for proteins and small molecules.

For consistency, all QM chemical shifts were calculated using a σref value that minimizes the average signed error for this entire data set. This procedure was also implemented in previous semiempirical NMR parameterizations.12,96 The σref values used in the implementation of Equation 4-1 are listed in the various tables in which the results are summarized. With the assumption that the training set was sufficiently large in this parameterization, these reported σref

values should be appropriate for future application. An alternative approach is to use the value of the shielding constants calculated for methane as the σref values. For the protein systems examined, the rms errors using the optimized σref values for MNDO-NMR were found to be 0.74 and 5.11 ppm for 1H and 13C, respectively. Using the shielding constant for the carbon atom in methane as the σref value increased the rms error very slightly, to 5.12 ppm, for 13C. A more significant difference was found for hydrogen, where using methane as the reference value increased the rms error from 0.74 to 0.88 ppm. Therefore, for protein studies using our implementation of the MNDO-NMR procedure, the new σref value is likely to improve results for 1H NMR chemical shift predictions. Despite the small difference that it made for the 13C results, the optimized value for carbon was still used for all calculations presented here; this allowed for consistency when comparing the results to the new parameters.

In the original parameterization of MNDO-NMR a different reference value was used for hydrogen in (C-H), (O-H) and (N-H) bonds. Because polar hydrogen atoms were excluded from the present study, a single reference value was used for all hydrogen atoms. A single reference value was also used for the AM1-NMR and AM1-NMR-H parameter sets. As discussed later, a single reference value was used for hydrogen in the AM1-NMR-C parameter set, but two reference values were used for the carbon atoms. This was the only instance in which more than one reference value was used for an atom.

Both the orbital exponents [ζ] and the resonance parameters [β] were critical for obtaining good agreement with experimental NMR data, and subtle changes in either one significantly affected the results. Semiempirical methods exhibit a strong interdependence between their parameters.
Changing the parameter of one atom has such a significant effect on neighboring atoms that it was necessary to change all of the ζs, ζp, βs and βp parameters for C, H, N, O and S in order to obtain the best agreement between experimental and calculated chemical shifts. This

interdependence makes interpreting the changes that were made to the parameters less straightforward. However, as is apparent from the smaller values of the Slater orbital exponent (ζs and ζp) terms for hydrogen and carbon in Table 4-1, the description of both 1H and 13C chemical shifts benefits from the use of orbital exponents that are more diffuse than those in the standard AM1 parameter set.

The first set of parameters developed (AM1-NMR) is suited for the calculation of NMR chemical shifts for small molecules using the AM1 Hamiltonian. Encouraging results using these parameters for the small molecule data set are plotted in Figures 4-3 and 4-4. However, an improvement in the agreement between experimental and calculated chemical shifts for protein systems was reached by narrowing the scope of the data set used in subsequent parameterizations. These improvements are made clear by examining the results for the individual protein systems as outlined in Tables 4-5 and 4-6. As shown in Table 4-7, the new NMR parameters improved the average rms errors of 1H and 13C chemical shifts by 0.12 ppm and 0.28 ppm, respectively, over the more general MNDO-NMR parameters.

In order to best reproduce the chemical shifts of 13C and 1H nuclei for protein systems it was necessary to develop two separate parameter sets (AM1-NMR-C and AM1-NMR-H). Although the AM1-NMR-C parameters perform well for the prediction of 1H chemical shifts, a much better description of the 1H chemical shifts was attainable with the AM1-NMR-H parameter set by sacrificing the accuracy of 13C (see Tables 4-5 and 4-7). Because both the MNDO-NMR and AM1-NMR parameters already serve as more general-purpose sets, we decided to enhance the AM1-NMR-H parameters for 1H NMR in the protein systems without the limitation of having to ensure good performance for any other type of chemical shift.

Table 4-5. Comparison of 1H RMS errors (ppm) by protein.

PDB ID  MNDO-NMR   AM1-NMR-H  AM1-NMR-C  AM1-NMR    AM1        # 1H shifts
1HA8    0.86       0.67       0.76       0.85       1.28       182
1N87    0.78       0.62       0.70       0.74       1.13       208
1Q2N    0.53       0.60       0.33       0.59       1.04       18
1RZS    0.70       0.57       0.60       0.67       1.06       290
1SZL    0.75       0.63       0.65       0.69       1.07       277
2CA7    0.67       0.57       0.60       0.64       0.94       219
1YV8    0.72       0.54       0.64       0.70       1.00       169
2B7E    0.79       0.63       0.68       0.75       1.13       257
2FS1    0.71       0.53       0.62       0.69       1.01       232
2JN0    0.72       0.54       0.62       0.65       1.06       191
1EZG    0.73       0.67       0.69       0.72       0.98       257
1F94    0.64       0.61       0.59       0.64       1.02       269
1FD3    0.58       0.50       0.52       0.60       0.94       202
1L3K    0.67       0.54       0.59       0.64       0.93       255
1UBQ    0.90       0.83       0.85       0.86       1.07       314

Table 4-6. Comparison of 13C RMS errors (ppm) by protein.

PDB ID  MNDO-NMR   AM1-NMR-H  AM1-NMR-C  AM1-NMR    AM1        # 13C shifts
1HA8    4.49       5.40       4.59       7.34       8.21       51
1N87    6.02       7.24       5.80       8.52       12.98      141
1Q2N    4.62       17.70      4.03       7.78       26.06      191
1RZS    5.09       6.50       4.41       6.43       14.62      210
1SZL    5.26       9.67       4.59       7.57       24.81      243
2CA7    5.05       9.95       4.68       8.15       24.92      208
1YV8    3.87       10.64      3.64       6.77       25.17      162
2B7E    4.82       6.53       4.72       7.44       12.41      179
2FS1    4.32       10.01      4.45       7.10       24.16      209
2JN0    4.89       9.18       4.36       7.25       22.88      183
1L3K    5.20       6.10       4.91       7.53       10.73      181
1UBQ    6.37       10.29      6.43       8.37       23.21      275

Experimental 13C chemical shifts were not available for 1EZG, 1F94 and 1FD3.

Table 4-7. Protein 1H and 13C RMS errors

NMR method                  MNDO-NMR   AM1-NMR-H  AM1-NMR-C  AM1-NMR    AM1
1H (3340 shifts)
  rms error                 0.74       0.62       0.66       0.71       1.05
  R2                        0.86       0.86       0.87       0.86       0.86
  average unsigned error    0.58       0.48       0.51       0.56       0.84
  σref                      49.603     50.062     49.418     49.090     47.503
13C (2233 shifts)
  rms error                 5.11       9.95       4.83       7.60       21.28
  R2                        0.99       0.97       0.99       0.98       0.92
  average unsigned error    3.98       7.70       3.67       6.43       15.51
  σref                      46.689     83.114     70.905     64.309     47.8900
  σref (C-C pi bonds)       46.689     83.114     49.320     64.309     47.8900

Errors include the complete data set of training and test sets. The σref value was chosen to minimize the average rms error and sets the average signed error to zero.

The encouraging results obtained for the proteins of the test set indicate that the protocol used in this study does present an improvement upon other semiempirical QM methods for evaluating NMR chemical shifts in protein systems. This improvement is significant in comparison to standard AM1 parameters. Accounting for other factors might also lead to improvements in this method. First, the inclusion of solvent may be a first step toward achieving greater accuracy. The parameters were developed to reproduce solution-phase chemical shifts, but all of the calculations, including both geometry optimizations and single-point NMR calculations, were run in vacuum. Second, it is not clear what artifacts may have been brought about in this study by the use of NMR structures and the AM1 Hamiltonian to optimize the geometry of these protein systems. If there were undetected significant artifacts, addressing this issue may lead to improved results. Although a recent study has investigated the effect of geometry optimization of

protein systems using PM6,113 it is not clear how similar the artifacts of AM1 geometry optimization would be on protein systems. A detailed examination of the effect of the geometry on the prediction of NMR chemical shifts will be described in a future publication.

Higher-level ab initio and DFT calculations cannot easily be performed on the protein systems used here; therefore, in Tables 4-8 and 4-9, the results for the protein systems are compared to those of the SHIFTX program. The SHIFTX calculations were performed on the experimental structures without any editing of the structure. Because SHIFTX could not perform 13C NMR predictions on the side-chain carbon atoms, only C, Cα, and Cβ are compared. On average the SHIFTX program yielded results that were ~2.5 ppm better than those predicted using the 13C parameters. The best semiempirical models currently have errors roughly twice as large as those of SHIFTX. This is not surprising, since SHIFTX and related programs can use extensive atom typing, whereas this study relies on a single H or C parameter set acting directly, together with indirect effects generated through modification of the allied parameters for N, O and S. SHIFTX, through atom typing, outperforms the present approach, but a QM approach is still more versatile and can be used to facilitate computational studies of chemical shift perturbation upon ligand binding and investigations involving non-standard amino acids.

Figure 4-3. 13C chemical shift correlation for small molecule data set using AM1-NMR parameters. Compounds with the largest deviation from experiment are drawn with the arrow pointing to the particular atom.

Figure 4-4. 1H chemical shift correlation for small molecule data set using AM1-NMR parameters. Compounds with the largest deviation from experiment are drawn with the arrow pointing to the particular atom.

Table 4-8. RMS errors of 1H and 13C NMR chemical shifts for complete protein set.

Atom    MNDO-NMR   AM1-NMR-H  AM1-NMR-C  AM1-NMR    SHIFTX     # Shifts
H       0.81       0.60       0.67       0.82       0.34       844
Cα      4.16       7.03       4.00       7.25       1.98       668
Cβ      5.80       7.60       5.94       7.65       1.93       579
C       3.46       17.23      3.38       8.26       1.82       358

Table 4-9. Correlation (R2) of 1H and 13C NMR chemical shifts for complete protein set

Atom    MNDO-NMR   AM1-NMR-H  AM1-NMR-C  AM1-NMR    SHIFTX     Range (ppm)
H       0.22       0.22       0.24       0.26       0.62       1.70 to 6.14
Cα      0.34       0.06       0.39       0.28       0.84       40.7 to 68.4
Cβ      0.87       0.68       0.86       0.86       0.98       15.4 to 73.4
C       0.02       0.17       0.01       0.01       0.38       170.2 to 181.7

4.3.2 Drawbacks

While the overall rms error is quite low when using the new sets of parameters for protein systems, this method does not appear to differentiate well among atoms of the same type in dissimilar chemical environments. One example of this can be seen in the C chemical shifts of residues ASP31 and THR6 of the 2JN0 structure. The experimental chemical shifts are reported as 176.93 and 177.34 ppm, respectively; the calculated chemical shifts using AM1-NMR-C are 179.10 and 174.99 ppm, respectively. While the order is incorrect, both calculated chemical shifts are well within the rms error of 4.36 ppm. This is illustrated in a decomposition of the R2 values for 2JN0 using the AM1-NMR-C parameters in Figure 4-5. This clustering effect is present in all of the semiempirical methods evaluated here for the protein systems, and it is very evident in carbon atoms. While the overall R2 values are quite high for all carbon atoms, the R2 values for

different clusters are quite low, as is illustrated in Table 4-9. Tests were run to determine whether the buffer size in the divide-and-conquer scheme limited the environmental effect on the chemical shifts. These tests showed no indication that this was the case. It appears that the rms error is sufficiently large to encompass a significant portion of the range of experimental chemical shifts observed for each particular atom type. This renders the method weak at differentiating among these similar atoms in some cases, and results in poor R2 values within these clusters. This is supported by the fact that the clusters with the smaller ranges exhibit worse R2 values. This effect is also found in SHIFTX, as illustrated in Figure 4-6, but to a lesser extent because its rms errors are lower (by a factor of three in this instance).
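The clustering effect can be reproduced with purely synthetic data. The sketch below places evenly spaced "experimental" shifts in the three 13C ranges of Table 4-9 and gives every calculated shift an error of fixed magnitude (4 ppm, alternating in sign). The even spacing, the error pattern and the number of points per cluster are assumptions made only for illustration, and R2 is taken to be the squared Pearson correlation.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy * s_xy / (s_xx * s_yy)


def synthetic_cluster(low, high, n=12, error=4.0):
    """Evenly spaced 'experimental' shifts with alternating +/- errors."""
    step = (high - low) / (n - 1)
    exp = [low + k * step for k in range(n)]
    calc = [d + (error if k % 2 == 0 else -error) for k, d in enumerate(exp)]
    return exp, calc


# Chemical shift ranges (ppm) taken from Table 4-9.
ranges = {"C-alpha": (40.7, 68.4), "C-beta": (15.4, 73.4), "C": (170.2, 181.7)}

r2 = {}
all_exp, all_calc = [], []
for atom_type, (low, high) in ranges.items():
    exp, calc = synthetic_cluster(low, high)
    r2[atom_type] = r_squared(exp, calc)
    all_exp += exp
    all_calc += calc
r2["all"] = r_squared(all_exp, all_calc)

for name, value in r2.items():
    print(f"{name:8s} R2 = {value:.2f}")
```

Even though every point carries the same absolute error, the correlation within the narrow carbonyl range is poor while the pooled correlation is close to one, which is the qualitative behavior described above.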

Figure 4-5. Decomposition of 13C NMR R2 value for PDB ID 2JN0 using AM1-NMR-C parameters. For direct comparison with the SHIFTX results in Figure 4-6, only Cβ, Cα, and C atom types are included. Sections b, c, and d are the Cβ, Cα, and C subsets of section a, respectively. The x-axis is the experimental and the y-axis is the calculated chemical shift. The average 13C rms error for this protein is 4.36 ppm. (a) The overall R2 for all Cβ, Cα, and C atoms is 0.99. (b) R2 for Cβ is relatively high at 0.86 because the large variety in the bonding situations for Cβ atoms results in a large chemical shift range of >50 ppm. (c) Cα R2 = 0.58. With less variety in bonding situations the chemical shift range decreases and the rms error has a greater impact, as exhibited in the decreased R2 value. (d) For very small chemical shift ranges such as that of C, the rms error is too large to distinguish among different atoms of this atom type. It is important to note that despite the poor correlation, the magnitude of the errors is still quite small.

Figure 4-6. Decomposition of 13C NMR R2 value for PDB ID 2JN0 with SHIFTX. Sections b, c, and d are the Cβ, Cα, and C subsets of section a, respectively. The x-axis is the experimental and the y-axis is the calculated chemical shift. The average 13C rms error for this protein is 1.33 ppm. (a) The overall R2 for all Cβ, Cα, and C atoms is 1.00. (b) R2 for Cβ is still high, at 0.99, because the large variety in the bonding situations for Cβ atoms results in a large chemical shift range of >50 ppm. (c) Cα R2 = 0.92. With less variety in bonding situations the chemical shift range decreases and the rms error has a greater impact, as exhibited in the decreased R2 value. (d) For C, which covers a much smaller range of chemical shifts, R2 = 0.46.

4.3.3 AM1-NMR

The first set of parameters was generated using the small molecule data presented in Table 4-10. This data set was geared towards biologically relevant compounds, but contained a more diverse set of functional groups than those that are present in standard unbound protein systems. The variety of functional groups in this data set makes these parameters more general than the others; therefore, this set is referred to simply as AM1-NMR. The chemical shifts for this small molecule data set ranged from 0 to 211.5 ppm and 0 to 9.3 ppm for 13C and 1H, respectively. As shown in Table 4-2, this parameter set performed very well for the small molecule data set. The average rms errors were 9.86 and 0.48 ppm

corresponding to 4.7% and 5.2% of the chemical shift range, respectively. Therefore, using these parameters, the goal of an rms error within 5% of the chemical shift range was met for carbon, and a very close result was achieved for hydrogen. However, these results were generated using a σref value that minimized the error for this data set. It is desirable to have a single σref value that can be used for a variety of systems. In order to determine how extensible this parameter set was, tests were run for the small molecule data set using the σref value that minimized the rms error for the protein data set. As shown in Table 4-3, using the σref value optimized for the protein systems resulted in a significant difference in error for 1H NMR calculations in the small molecule data set, changing the rms error from 0.48 to 0.76 ppm. This difference suggests that the effect of the environment in protein systems was more important than was accounted for in the small molecule parameterization. The difference in error for 13C NMR calculations was much smaller, only changing the rms error from 9.86 to 9.91 ppm. Furthermore, even when the different σref value was used, the 13C NMR results were best for the small molecule data set using these AM1-NMR parameters.

The larger errors for both 13C and 1H NMR calculations were observed for small, crowded systems and for atoms neighboring heteroatoms, sulfur and nitrogen in particular. The molecules with the largest errors are given in the scatter plots in Figures 4-3 and 4-4 for carbon and hydrogen, respectively. Because the data set was more oriented towards biological compounds, there was an overabundance of aromatic systems, and consequently the parameters performed better in these cases. A detailed comparison of the experimental chemical shifts with those calculated using all parameter sets is given in the supplementary material.
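The change in rms error on switching to the protein-optimized σref follows from simple arithmetic. A uniform change in σref shifts every signed error by the same amount; if the original errors average to zero, the new mean squared error is the old mean squared error plus the square of the new average signed error. The sketch below applies this relation to the AM1-NMR values in Tables 4-2 and 4-3; the function name is hypothetical, and the relation assumes a single reference value per nucleus type.

```python
import math


def rms_after_reference_shift(rms_zero_mean, signed_error):
    """rms error after a uniform shift of sigma_ref.

    Assumes `rms_zero_mean` was obtained with an average signed error of
    zero, so the scatter and the offset add in quadrature.
    """
    return math.hypot(rms_zero_mean, signed_error)


# AM1-NMR values (ppm): rms from Table 4-2, signed error magnitude from Table 4-3.
h_rms = rms_after_reference_shift(0.48, 0.59)  # 1H
c_rms = rms_after_reference_shift(9.86, 0.95)  # 13C
print(round(h_rms, 2), round(c_rms, 2))  # prints 0.76 9.91
```

The results match the rms errors reported in Table 4-3, which shows that the larger 1H error comes entirely from the offset in the reference value rather than from any change in scatter.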
This parameter set showed significant improvement compared to the standard AM1 parameters for both the small molecule and protein data sets. It also exhibited an improvement

relative to MNDO-NMR for the overall prediction of 1H NMR chemical shifts for the protein systems. However, the performance for 13C NMR prediction for protein systems was worse than that of the MNDO-NMR procedure. Because no improvement was made in this area using the small molecule training set, more parameterization was done using globular proteins in the data set, which represent an important target application area for us.

4.3.4 AM1-NMR-C (Carbon)

The second set of parameters was optimized to reproduce 13C and 1H NMR data for a set of protein systems. In order to differentiate between these and the third set of parameters (which were generated to reproduce 1H NMR only), these parameters are referred to as AM1-NMR-C, even though they do perform very well for 1H NMR calculations. The 13C chemical shifts for the complete protein data set ranged from 8.49 to 181.70 ppm. When the AM1-NMR-C parameter set was used, the average rms error for the complete protein data set was 4.83 ppm, corresponding to just below 2.5% error. When compared with the original AM1 parameters, the final parameters show a significant improvement in agreement with experimental results for both the small molecule and protein data sets. This parameter set also performed very well for both 1H and 13C NMR predictions in the small molecule data set. As shown in Table 4-3, the σref value generated to optimize performance for the protein systems was still very well suited for the small molecule data set. In fact, this parameter set was least affected by this change, indicating that it is likely to be the most extensible of the methods developed and tested here. The quality of this set of parameters is also demonstrated in Table 4-7, where they show good improvement upon both the standard AM1 and MNDO-NMR results for our data sets.

Large signed errors (-20 ppm) were noticed for carbon atoms that participate in one or more C-C pi bonds when using the AM1-NMR-C parameters.
In protein systems, this only affects the residues with aromatic side chains (HIS, PHE, TRP, TYR). However, the problem was observed

for all of the small molecules in which these bonding situations were present. For this reason a different σref value was used for these carbon atoms. It is important to note that these errors only involved the C-C bonds, and errors associated with carbon having multiple bonds to other atoms were not affected. The use of more than one reference value significantly improves the performance of this parameter set and may be a large part of the reason that these parameters outperform the other methods tested here. For no other semiempirical method tested did a particular functional group exhibit a large signed error that could be easily addressed by the use of a different reference value. Although significant improvements would be made by addressing other instances of systematic errors in this fashion, this procedure would lead to atom typing, which limits the versatility of these methods, and was therefore avoided.

4.3.5 AM1-NMR-H (Hydrogen)

The third set of parameters was developed specifically to reproduce only 1H chemical shifts in protein systems. Attaining high accuracy for proton chemical shifts in protein systems is a significant challenge for a QM model. However, with the focus limited to improving the accuracy of 1H chemical shift calculations for protein systems only, a reasonable goal was to achieve an accuracy of 5% of the chemical shift range. The 1H chemical shifts for the complete set of protein systems ranged from -0.53 to 7.74 ppm. The average rms error for the complete data set, including training and test sets, was 0.62 ppm, corresponding to 7.5% of the chemical shift range for this data set. This is an improvement upon the MNDO-NMR procedure, which yielded an average rms error of 0.74 ppm, corresponding to 8.9% of the chemical shift range. While 7.5% error is at the high end of our target for the prediction of 1H chemical shifts, the R2 of 0.86 is still encouraging. The high R2 value suggests that the general trends are still well preserved.
Furthermore, the results suggest that this parameter set is the best available semiempirical method for the prediction of 1H chemical shifts in protein systems.
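The percentages quoted for the 1H results can be checked with a short calculation. The sketch assumes that "percent of the chemical shift range" means the rms error divided by the difference between the largest and smallest experimental shift; the function name is hypothetical.

```python
def percent_of_range(rms_error, shift_min, shift_max):
    """rms error expressed as a percentage of the experimental shift range."""
    return 100.0 * rms_error / (shift_max - shift_min)


# 1H range for the complete protein set (ppm), as quoted in the text.
h_min, h_max = -0.53, 7.74

am1_nmr_h = percent_of_range(0.62, h_min, h_max)
mndo_nmr = percent_of_range(0.74, h_min, h_max)
print(round(am1_nmr_h, 1), round(mndo_nmr, 1))  # prints 7.5 8.9
```

Under this assumption the calculation reproduces the 7.5% and 8.9% figures given above for AM1-NMR-H and MNDO-NMR.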


4.4 Conclusions

The addition of three new NMR-specific parameter sets for the AM1 Hamiltonian extends the semiempirical QM methodology toward a near-quantitative description of NMR chemical shifts. This approach is more consistent, as it can be used with geometries generated with the same AM1 Hamiltonian. When compared to previously available semiempirical protocols, the reduction in error in protein systems is significant. Furthermore, the methods can be executed at a fraction of the cost of ab initio and DFT methods. The rationale for the development of three parameter sets is outlined, and the possible limitations of the method are given in detail, in the Results and Discussion. The results from the AM1-NMR-C parameter set suggest that it represents the best balance of accuracy and versatility for the prediction of 1H and 13C chemical shifts for proteins and small molecules.

Clearly, the problem associated with using large systems for a training set in semiempirical QM methods is the computational cost. However, the benefit is seen in the significant reduction of the errors when this type of procedure is applied to relevant systems. As research continues to focus on the development of linear-scaling QM algorithms, and the speed of computers increases, these types of studies become more feasible and more important for truly exploring the limitations of the methods in predicting different experimental quantities in larger biological systems. The large number of chemical shifts examined in this study, combined with the promising results seen in the test set, makes it likely that the current parameters are extendible to other protein systems. While semiempirical methods are quantum mechanical and therefore do not require atom typing, they are limited in their flexibility by the use of a minimal basis set and other approximations.
The MNDO-NMR parameters were not developed with a goal as specific as capturing atoms in a protein environment. Therefore, the close agreement seen between the experimental chemical shifts and those calculated using the MNDO-NMR methodology for protein systems is a testament to the quality of the parameterization carried out by Patchkovskii and Thiel. However, by having a more focused target of achieving good agreement for the 1H and 13C chemical shifts found in protein environments, higher accuracy was achieved in these particular systems of interest.


Table 4-10. Experimental and calculated chemical shifts (ppm) for the small-molecule data set.

#   Molecule
    Atom                Exp.  MNDO-NMR   AM1-NMR AM1-NMR-H AM1-NMR-C  Ref

1
    C                 113.00    113.22    122.39    137.14    142.00   12
    H                   2.83      4.02      2.94      5.90      6.26   12
2   methane
    C                   0.00      2.89      0.80     25.30      6.35   12
    H                   0.00      0.07      0.04      0.64      0.05   12
3   ethane
    C                  14.22      9.76     10.41     26.82     13.85   12
    H                   0.74      0.57      0.69      1.14      0.66   12
4   CH2(CH3)2
    C (CH3)            24.15     12.03     12.88     29.40     17.13   12
    C (CH2)            25.73     21.14     20.80     27.59     23.27   12
    H (CH3)             0.80      0.70      0.79      1.25      0.73   12
    H (CH2)             1.25      1.19      1.40      1.60      1.37   12
5   cyclopropane
    C                  10.11     21.58     22.74     29.00     30.29   12
    H                   0.07      0.94      1.01      1.09      1.04   12
6   n-Butane
    C (CH3)            21.30     12.04     12.75     29.54     17.11   12
    C (CH2)            34.77     23.54     23.25     30.38     26.59   12
    H (CH3)             0.80      0.74      0.82      1.28      0.76   12
    H (CH2)             1.23      1.16      1.38      1.58      1.37   12
7   Isobutane
    C (CH3)            32.64     14.47     15.48     32.10     20.58   12
    C (CH)             33.45     31.32     32.17     27.75     34.77   12
    H (CH)              1.59      1.74      2.15      1.98      2.13   12
    H (CH3)             0.77      0.71      0.81      1.24      0.78   12
8   cyclobutane
    C                  30.57     26.52     25.97     29.05     25.61   12
    H                   1.90      1.28      1.39      1.41      1.20   12
9
    C (B)              32.06     23.53     23.07     30.52     26.55   12
    C (C)              44.44     25.90     25.68     33.09     29.83   12
    C (A)              21.59     11.99     12.72     29.48     17.05   12
10
    C (E)              30.11     14.91     15.78     32.66     21.03   12
    C (B)              40.19     33.59     34.43     30.48     37.81   12
    C (C)              41.67     25.57     25.28     32.86     29.39   12
    C (D)              19.38     11.94     12.70     29.59     17.24   12


Table 4-10. Continued

#   Molecule
    Atom                Exp.  MNDO-NMR   AM1-NMR AM1-NMR-H AM1-NMR-C  Ref

11
    C (A)              36.79     40.49     44.75     27.50     48.50   12
    C (B)              39.66     17.06     18.19     34.92     24.20   12
    H                   0.82      0.72      0.86      1.24      0.86   12
12  ethene
    C                 130.46    128.35    131.46    128.64    126.65   12
    H                   5.18      5.86      5.63      5.41      5.95   12
13
    C (A)             123.22    123.13    124.54    123.29    119.57   12
    C (B)             142.48    134.37    135.90    128.22    131.20   12
    C (C)              26.21     15.95     17.28     32.24     21.79   12
14
    C (A)             132.50    131.52    131.12    119.80    125.89   12
    C (B)              33.10     15.91     17.07     33.63     23.39   12
    H                   1.51      1.42      1.44      1.69      1.40   12
15  ethyne
    C                  77.90     86.55     94.86     99.81     90.28   12
    H                   1.33      1.26      0.84      1.09      0.40   12
16  Benzene
    C                 137.23    134.80    133.67    126.20    130.08   12
    H                   7.09      8.00      7.63      7.12      7.97   12
17
    C (B)             138.24    135.50    134.35    126.83    130.71   12
    C (C)             137.62    134.09    132.61    125.93    129.00   12
    C (D)             147.68    142.42    141.59    128.10    138.86   12
    C (A)             134.76    132.88    131.58    124.19    127.76   12
    C (E)              28.72     18.18     19.02     34.75     24.20   12
18
    C (F)             146.00    144.33    142.14    134.66    138.30   12
    C (E)             131.80    127.39    126.69    120.73    121.89   12
    C (D)             145.50    144.28    141.75    136.49    138.79   12
    C (C)             149.40    137.51    136.20    124.65    133.82   12
    C (B)             127.70    127.93    126.99    121.19    125.43   12
    C (A)             146.40    143.94    142.09    135.16    141.34   12
19
    C (F)             193.60    190.96    184.57    173.54    183.60   12
    C (A)              59.20     49.14     53.04     56.62     59.78   12
    C (C)             143.90    135.07    134.25    126.98    129.87   12
    C (D)             185.10    180.31    172.66    162.25    172.86   12


Table 4-10. Continued

#   Molecule
    Atom                Exp.  MNDO-NMR   AM1-NMR AM1-NMR-H AM1-NMR-C  Ref

20  CH3OH
    C                  58.09     54.36     62.28     67.31     58.97   12
    H                   3.30      2.26      2.39      2.62      2.25   12
21
    C (B)              26.57     10.97     11.24     26.05     14.20   12
    C (A)              67.49     62.60     69.52     65.51     63.35   12
    H (B)               1.25      0.90      1.03      1.18      0.81   12
    H (A)               3.53      2.71      2.96      2.92      2.72   12
22
    C                  36.80     27.25     35.19     48.17     30.05   12
    H (A)               0.27      1.46      1.72      2.01      1.43   12
23
    C (B)             201.00    201.15    182.02    163.43    168.07   12
    C (A)               9.50     61.05     60.04     57.55     38.28   12
24
    C (A)             201.72    209.58    198.65    176.65    198.24   12
    C (B)              38.01     20.84     22.58     34.08     24.87   12
    H (B)               1.79      2.05      2.21      2.39      1.82   12
25
    C                 195.20    216.15    209.22    187.28    208.02   12
26
    C                 171.38    190.31    179.44    162.58    175.26   12
27
    C (B)             208.20    208.13    193.60    169.55    193.27   12
    C (A)              37.10     20.23     21.62     34.51     25.05   12
    H                   1.79      1.95      2.08      2.26      1.74   12
28
    C                 172.00    167.35    164.90    160.91    174.48   12
    H (A)               8.00      8.09      7.68      6.77      7.16   12


Table 4-10. Continued

#   Molecule
    Atom                Exp.  MNDO-NMR   AM1-NMR AM1-NMR-H AM1-NMR-C  Ref

29
    C (A)             177.00    172.99    168.61    160.70    176.19   12
    H (B)               1.75      1.40      1.58      1.98      1.39   12
    H (D)               2.71      3.67      3.67      3.27      2.85   12
    H (C)               2.76      2.25      2.52      2.63      2.12   12
30
    C (B)             143.40    129.35    127.51    124.32    126.57   12
    C (A)             194.00    196.49    183.76    163.04    184.72   12
31
    C (A)               7.54     11.16     12.34     28.61     17.22   12
    H                   1.53      1.97      1.99      3.18      2.85   12
    C (B)             121.30    112.40    118.74    131.54    136.93   12
32
    C (B)             151.97    170.10    162.93    151.62    153.81   12
    C (A)             117.84    125.26    123.36    117.41    120.50   12
33
    C (C)             160.17    142.22    145.85    151.44    147.23   12
    C (B)             132.07    130.78    129.93    122.40    125.85   12
    C (A)             143.52    139.89    137.22    128.50    132.95   12
34
    C (B)              47.20     36.55     45.03     57.56     41.93   12
    H (B)               3.50      2.78      2.91      4.13      4.01   12
    C (A)              39.40     36.26     44.70     56.02     40.14   12
    H (A)               2.71      3.24      3.27      4.05      3.76   12
35
    C (A)              47.10     45.22     51.28     58.34     48.61  114
    C (B)              25.70     29.35     29.19     33.64     30.22
    H (A)               2.87      2.35      2.69      2.83      2.47
    H (B)               1.69      1.58      1.78      1.95      1.70
36
    C (C)              46.90     38.01     43.96     52.44     42.05  114
    C (B)              26.90     22.74     22.61     29.42     25.74
    C (A)              24.90     21.82     21.05     30.08     26.21
    H (B)               1.55      1.17      1.35      1.65      1.30
    H (C)               2.80      1.79      2.19      2.78      2.30


Table 4-10. Continued

#   Molecule
    Atom                Exp.  MNDO-NMR   AM1-NMR AM1-NMR-H AM1-NMR-C  Ref

37
    C (B)             117.70    124.68    131.30    134.39    130.29  115
    C (A)             108.00    125.09    122.83    120.86    126.66
    H (B)               6.62      6.12      5.89      5.80      6.21
    H (A)               6.05      6.31      5.92      5.19      6.02
38
    C (D)             121.30    132.96    131.42    124.26    127.06  116
    C (E)             120.30    128.70    128.00    122.99    125.98
    C (F)             122.30    132.48    130.68    123.74    127.83
    C (G)             111.80    122.85    121.92    118.59    119.93
    C (A)             125.20    130.14    135.57    138.58    134.41
    C (B)             102.60    121.29    119.46    119.20    123.14
    H (D)               7.64      8.09      7.69      6.98      7.86
    H (B)               6.56      6.74      6.33      5.64      6.46
39
    C (C)             136.23    128.52    139.52    158.79    165.25  117
    C (A)             122.33    131.03    133.71    138.32    135.97
    C (B)             122.33    124.08    129.54    131.36    128.00
    H (C)               7.73      7.09      7.03      7.38      7.43
    H (A)               7.15      7.14      6.87      6.26      6.82
    H (B)               7.15      6.43      6.17      6.03      6.51
40
    C (C)             134.00    137.57    140.47    151.10    166.19  118
    C (A)             105.10    124.64    121.54    121.37    127.46
    C (B)             134.00    134.95    140.54    147.35    144.51
    H (C)               7.65      7.73      7.44      7.05      7.58
    H (A)               6.25      6.62      6.20      5.81      6.65
    H (B)               7.65      7.05      6.75      6.86      7.45
41
    C (C)             150.60    167.58    167.10    165.15    173.60  119
    C (A)             125.40    128.39    131.56    133.03    128.62
    C (B)             138.10    166.19    158.27    145.41    148.74
    H (C)               7.95      8.75      8.13      8.12      8.02
    H (A)               7.09      7.11      6.86      6.49      6.85
    H (B)               7.69      8.13      7.25      6.82      7.45
42
    C (A)             158.58    144.20    152.28    169.08    178.97  117
    C (B)             156.77    147.39    149.22    153.19    149.46
    C (C)             121.38    127.68    126.91    119.25    122.69
    H (B)               8.78      9.37      9.07      8.60      9.27
    H (A)               9.26      9.64      9.69      9.69      9.96
    H (C)               7.37      8.14      7.68      7.41      8.41


Table 4-10. Continued

#   Molecule
    Atom                Exp.  MNDO-NMR   AM1-NMR AM1-NMR-H AM1-NMR-C  Ref

43
    C (C)             115.02    122.80    122.04    119.23    121.28  120
    C (B)             129.26    137.37    135.55    127.12    131.31
    C (A)             118.37    127.28    126.87    121.64    124.54
    C (D)             145.52    141.04    144.07    134.55    138.83
    H (A)               6.72      7.47      7.13      6.78      7.59
    H (B)               7.18      7.91      7.54      7.06      7.90
    H (C)               6.67      7.12      6.85      6.60      7.31
44
    C (C)             122.40    130.72    130.79    121.17    129.84  121
    C (F)             127.10    135.70    133.70    125.25    129.10
    C (E)             115.20    123.52    122.71    119.80    121.93
    C (D)             144.80    140.00    142.36    133.81    137.41
    C (G)              17.20     16.12     16.30     33.41     22.88
    H (G)               2.17      2.19      2.30      2.59      2.42
45
    C (D)             149.54    141.98    145.21    135.13    139.03  120
    C (A)             117.93    127.25    126.84    121.82    124.64
    C (B)             129.25    137.23    135.46    127.00    131.11
    C (C)             112.42    122.42    122.22    119.64    121.70
    H (A)               6.71      7.45      7.11      6.77      7.57
    H (B)               7.20      7.89      7.53      7.05      7.88
    H (C)               6.60      7.10      6.85      6.60      7.25
    C (E)              29.95     30.11     38.87     50.92     33.45
    H (E)               2.81      2.16      2.47      2.82      2.27
46
    C (D)             150.40    142.78    146.22    135.82    139.36  122
    C (C)             112.50    123.21    122.76    120.33    122.30
    C (B)             128.80    136.94    135.04    126.78    130.84
    C (A)             116.50    127.07    126.70    121.95    124.70
    H (B)               7.22      7.90      7.53      7.06      7.89
    H (E)               2.93      2.29      2.60      2.93      2.41
47
    C (B)              30.72     31.82     40.88     52.04     35.19  123
    H (B)               3.47      2.13      2.38      2.50      1.95
    C (C)             162.13    169.87    167.36    162.81    175.56
    C (A)              35.25     34.46     42.78     54.03     37.54
    H (C)               8.02      7.91      7.53      6.67      6.97
48
    C (C)             129.60    141.35    139.31    131.02    135.75  124
    C (B)             128.40    134.28    133.21    126.13    129.51
    C (A)             133.70    140.13    138.42    129.48    134.10
    C (D)             130.20    129.95    128.07    118.01    124.49
    H (A)               7.79      8.35      7.94      7.33      8.21
    H (B)               7.66      8.18      7.80      7.17      8.04
    H (C)               8.21      8.99      8.51      7.50      8.55
    C (E)             172.60    185.62    173.36    155.87    171.34


Table 4-10. Continued

#   Molecule
    Atom                Exp.  MNDO-NMR   AM1-NMR AM1-NMR-H AM1-NMR-C  Ref

49
    C (C)             128.20    140.61    138.95    130.59    135.47  125
    C (B)             128.40    134.92    133.88    126.39    129.86
    C (A)             133.10    139.27    137.76    129.28    133.64
    C (D)             137.10    134.48    132.76    122.12    130.22
    H (A)               7.55      8.30      7.89      7.31      8.20
    H (B)               7.48      8.20      7.81      7.26      8.11
    H (C)               7.98      8.66      8.23      7.60      8.53
    C (E)             198.00    202.92    189.41    167.65    170.07
    C (F)              26.50     20.49     21.82     35.39     25.97
    H (F)               2.61      2.67      2.79      2.96      2.48
50
    C (C)             128.33    140.89    138.70    130.29    135.22  126
    C (B)             129.03    135.26    134.10    126.51    130.16
    C (A)             132.08    137.95    136.52    128.47    132.77
    C (D)             135.02    133.92    131.81    121.90    129.99
    H (B)               7.48      8.12      7.74      7.22      8.06
    H (C)               7.90      8.11      7.75      7.22      8.10
    C (E)             169.05    166.84    162.07    156.55    172.78
51
    C (C)             115.55    124.40    122.57    116.70    117.32  127
    C (B)             129.79    137.03    135.35    128.85    133.15
    C (A)             120.84    130.74    129.70    120.81    124.05
    C (D)             155.36    155.41    152.92    140.03    145.89
    H (A)               6.93      7.75      7.40      6.81      7.62
    H (B)               7.24      8.05      7.71      7.10      7.94
    H (C)               6.48      7.68      7.26      6.44      7.20
52
    H (B)               1.10      0.89      0.99      1.30      0.87  123
    C (A)              65.94     65.94     72.32     69.68     67.25
    H (A)               3.50      2.59      2.87      2.87      2.66
    C (B)              15.46     11.23     11.47     26.78     14.72
53
    C (B)             171.30    192.89    179.36    161.64    177.27  128
    C (A)              51.50     63.42     69.70     69.52     64.12
    H (A)               3.68      3.50      3.40      3.17      3.06
    C (C)              20.60     16.03     16.75     27.24     16.64
    H (C)               2.21      1.93      2.08      2.28      1.72
54
    C (A)              31.20     31.54     32.71     41.60     35.82  129
    C (B)              31.40     26.85     33.98     27.73     34.73
    H (A)               1.88      1.88      2.18      2.33      1.99
    H (B)               2.75      1.91      2.42      2.11      2.20
55
    C                  18.20      8.12     15.49     20.85     17.44  130
    H                   2.14      0.99      1.43      1.36      1.36


Table 4-10. Continued

#   Molecule
    Atom                Exp.  MNDO-NMR   AM1-NMR AM1-NMR-H AM1-NMR-C  Ref

56
    C (B)             124.90    163.01    154.93    133.33    144.44  131
    C (A)             126.40    137.48    133.51    124.73    130.96
    H (B)               7.20      8.47      7.85      6.91      7.79
    H (A)               6.96      7.46      6.99      6.36      7.24
57
    C (B)              67.97     70.14     76.45     76.76     74.41  123
    C (A)              25.62     30.20     29.00     41.31     36.92
    H (B)               3.77      2.96      3.15      3.27      2.98
    H (A)               1.87      1.84      1.97      2.47      2.17
58
    C (C)              41.62     41.17     42.60     48.18     29.46  132
    H (C)               2.90      2.86      3.13      3.53      3.11
    C (B)             133.18    144.97    144.53    137.28    142.41
    H (B)               6.40      6.79      6.41      5.84      6.62
    C (A)             132.37    138.17    137.23    130.64    134.97
    H (A)               6.50      6.79      6.39      5.85      6.63
59
    C (B)              26.80     22.32     21.70     29.61     25.37  133
    C (C)              68.80     64.28     70.55     67.70     65.72
    C (A)              23.60     21.03     20.18     27.24     23.51
    H (C)               3.65      2.49      2.75      2.67      2.45
    H (B)               1.61      1.60      1.78      1.95      1.73
60
    C (B)              46.61     36.91     42.69     51.00     40.50  120
    C (C)              68.19     64.05     70.75     66.82     65.00
    H (C)               3.70      2.70      3.00      2.78      2.53
    H (B)               2.89      2.25      2.62      3.15      2.67
61
    C (C)              43.70     35.34     35.85     34.65     41.82  134
    C (B)              34.30     26.50     25.98     34.48     30.74
    C (A)              26.90     23.63     23.30     31.15     27.13
    H (C)               1.42      1.62      2.03      1.91      2.06
62
    C (B)              26.96     23.73     23.36     31.20     27.18  135
    C (C)              35.96     25.91     25.57     33.81     30.33
    C (D)              33.33     33.77     34.44     31.50     38.68
    C (A)              26.81     23.43     23.07     31.02     26.98
    C (E)              23.14     14.42     15.18     32.28     20.38
    H (E)               0.86      0.73      0.82      1.30      0.80
63
    C (B)              27.10     24.99     24.90     31.78     27.79  136
    C (C)              41.90     30.46     30.01     35.46     32.67
    C (D)             211.50    210.81    196.38    172.06    196.08
    C (A)              25.00     23.46     23.31     30.87     26.93
    H (C)               2.36      2.20      2.37      2.45      2.02
    H (B)               1.80      1.42      1.68      1.70      1.47


Table 4-10. Continued

#   Molecule
    Atom                Exp.  MNDO-NMR   AM1-NMR AM1-NMR-H AM1-NMR-C  Ref

64
    C (A)             167.80    141.09    151.61    156.31    165.27  137
    C (B)              39.30     31.75     40.55     52.30     35.57
    H (B)               2.83      1.96      2.27      2.58      2.03
65
    C (A)              60.74     77.95     70.73     73.17     79.23  138
    C (B)              37.37     26.45     25.92     32.70     29.46
    C (C)              26.46     24.64     24.30     31.96     27.87
    C (D)              24.68     23.46     23.27     31.01     27.06
    H (2)               3.42      4.10      4.01      3.85      4.06
66
    C (A)              90.99     74.47     68.46     73.54     77.73  139
    C (B)              32.21     23.27     23.75     30.84     27.65
    C (C)              22.68     23.51     23.21     30.91     26.81
    C (D)              25.14     23.17     22.94     30.65     26.70
67  Cyclopentane
    H                   1.45      1.50      1.66      1.92      1.62   12
68
    H (B)               1.18      1.25      1.42      1.70      1.38   12
    H (A)               1.62      1.11      1.32      1.52      1.28   12
69  C(C2H5)4
    H (CH2)             1.27      1.34      1.58      1.73      1.54   12
    H (CH3)             0.72      0.68      0.78      1.26      0.77   12
70
    H (B)               1.89      1.98      2.18      2.43      2.15   12
    H (A)               5.51      6.08      5.78      5.46      6.07   12
    H (C)               1.56      1.33      1.53      1.71      1.50   12
71  H3C-C≡C-CH3
    H                   1.46      1.33      1.48      1.88      1.33   12
72
    H (A)               0.83      0.75      0.86      1.27      0.75   12
    H (B)               1.44      1.45      1.66      1.58      1.45   12
    H (C)               3.45      2.65      2.89      2.89      2.70   12
73
    H (A)               1.01      1.02      1.12      1.24      0.86   12
    H (B)               3.81      3.23      3.67      3.28      3.37   12


Table 4-10. Continued

#   Molecule
    Atom                Exp.  MNDO-NMR   AM1-NMR AM1-NMR-H AM1-NMR-C  Ref

74
    H (B)               1.08      0.76      0.93      1.47      1.01   12
75
    H (A)               4.14      3.58      3.98      3.68      3.72   12
    H (B)               1.66      1.80      2.03      1.93      1.78   12
    H (C)               1.46      1.58      1.75      2.03      1.75   12
76  HOCH2CH2OH
    H                   3.50      2.79      3.03      2.80      2.68   12
77
    H (C)               5.87      6.12      5.98      5.82      6.28   12
    H (A)               5.87      5.81      5.48      4.68      5.33   12
    H (D)               3.93      3.28      3.53      3.53      3.27   12
78
    H (B)               3.38      2.93      3.19      3.10      2.81   12
    H (A)               2.67      2.56      2.86      2.85      2.54   12
79  (CH3)2O
    H                   3.10      2.12      2.31      2.56      2.18   12
80  (C2H5)2O
    H (CH2)             3.29      2.57      2.86      2.85      2.64   12
    H (CH3)             1.02      0.55      0.72      1.37      0.69   12
81
    H (A)               3.74      2.49      2.75      2.67      2.45   12
    H (B)               1.52      1.60      1.78      1.95      1.73   12
    H (C)               1.41      1.23      1.43      1.61      1.39   12
82
    H (B)               3.73      5.00      4.88      4.44      4.70   12
    H (C)               3.97      4.92      4.60      4.38      4.61   12
    H (A)               6.31      6.87      6.37      6.67      7.02   12
    H (CH3)             3.34      2.76      2.85      3.01      2.79   12
83  (CH3)3N
    H                   2.03      1.24      1.64      2.30      1.66   12
84  (C2H5)3N
    H (CH3)             0.90      0.60      0.73      1.10      0.62   12
    H (CH2)             2.39      2.05      2.44      2.58      2.19   12


Table 4-10. Continued

#   Molecule
    Atom                Exp.  MNDO-NMR   AM1-NMR AM1-NMR-H AM1-NMR-C  Ref

85
    H (A)               1.42      1.41      1.53      2.47      2.19   12
    H (B)               1.17      1.46      1.66      2.09      1.78   12
86
    H (eq.) (CH2)       3.44      2.75      3.27      3.33      2.83   12
    H (ax.) (CH2)       2.58      2.18      2.84      3.87      3.21   12
    H (CH3)             2.11      1.60      1.90      2.12      1.48   12
87
    H (A)               1.89      2.77      2.72      2.48      2.22   12
    H (C)               3.18      4.68      4.62      4.10      4.01   12
88
    H (A)               1.74      2.05      2.14      2.31      1.89   12
    H (C)               5.17      6.47      6.27      5.33      5.81   12
89
    H (A)               1.94      2.19      2.31      2.47      1.89   12
    H (C)               3.04      3.74      3.87      3.70      3.24   12
    H (E)               3.52      3.62      3.52      3.24      3.13   12
90
    H (A)               1.70      2.06      2.16      2.36      1.92   12
    H (C)               4.75      6.29      6.06      5.24      5.61   12
    H (E)               3.52      3.51      3.40      3.21      3.10   12
91  C2H5CN
    H (CH2)             2.00      2.47      2.55      3.54      3.40   12
    H (CH3)             1.08      0.98      1.01      1.71      1.33   12
92  CH3NO2
    H                   3.91      4.14      4.47      3.96      3.49   12
93
    H                   3.36      3.87      3.72      3.69      3.75   12
94
    H                   4.45      3.64      3.49      3.49      3.57   12
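Per-method error statistics of the kind discussed in this chapter could be computed from rows of Table 4-10 along the following lines. This is a minimal sketch, not the procedure used in the thesis: it uses only five 13C rows copied from the table as printed, the sign convention (calculated minus experimental) is an assumption, and the function and variable names are illustrative. The resulting numbers describe this small subset only and are not the statistics reported for the full data set.

```python
import math

# Column order of the calculated values in Table 4-10.
METHODS = ["MNDO-NMR", "AM1-NMR", "AM1-NMR-H", "AM1-NMR-C"]

# 13C rows copied from Table 4-10 (ppm): experimental value, then the four
# calculated values in the order given by METHODS.
ROWS = {
    "ethane":  (14.22, [9.76, 10.41, 26.82, 13.85]),
    "ethene":  (130.46, [128.35, 131.46, 128.64, 126.65]),
    "ethyne":  (77.90, [86.55, 94.86, 99.81, 90.28]),
    "benzene": (137.23, [134.80, 133.67, 126.20, 130.08]),
    "CH3OH":   (58.09, [54.36, 62.28, 67.31, 58.97]),
}

def error_statistics(rows, methods):
    """Return mean signed, mean unsigned and RMS error for each method."""
    stats = {}
    for j, method in enumerate(methods):
        # Assumed convention: error = calculated - experimental.
        errors = [calc[j] - exp for exp, calc in rows.values()]
        n = len(errors)
        stats[method] = {
            "mean_signed": sum(errors) / n,
            "mean_unsigned": sum(abs(e) for e in errors) / n,
            "rms": math.sqrt(sum(e * e for e in errors) / n),
        }
    return stats

stats = error_statistics(ROWS, METHODS)
for method in METHODS:
    s = stats[method]
    print(f"{method:10s} signed={s['mean_signed']:7.2f} "
          f"unsigned={s['mean_unsigned']:6.2f} rms={s['rms']:6.2f}")
```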


CHAPTER 5
19F NMR PARAMETERIZATION OF MNDO FOR BIOLOGICALLY RELEVANT SYSTEMS

5.1 Introduction

Fluorinated compounds with impressive biological activity have drawn significant attention to the important role that fluorine plays in biochemistry.18 The utility of fluorine (19F) NMR in the study of such biological compounds has been discussed in detail in previous publications.18,140-143 A few advantages that fluorine NMR provides relative to the more common 1H and 13C NMR methods derive from the natural properties of the fluorine atom. 19F is a spin-1/2 nucleus and is 100% abundant, making it highly amenable to NMR studies. In biological systems the fluorine atom has a relatively small steric footprint. In many instances it can replace a hydrogen atom in a ligand, with an influence on molecular recognition and on the subsequent biological response that can range from minimal to beneficial.18 The fluorine nucleus can be incorporated into proteins and other biological compounds by routine methods, including chemical synthesis and biosynthesis using a living organism.18 The 19F nucleus is highly sensitive to its chemical environment because of the lone pairs of electrons it possesses. Consequently, in protein systems 19F spin-labeled amino acids provide a useful probe for determining conformational change in a specific area of a protein.144 Given that endogenous fluorine is found in biological systems quite infrequently,145 the NMR spectra can typically be attributed solely to 19F labeling; the resulting spectra contain more distinct signals and are easier to interpret than proton NMR spectra.
The simplified 19F NMR spectra can provide valuable information about protein-drug complexes, thereby aiding in the characterization of potential lead compounds at a reduced opportunity cost. Because of the positive effects of fluorine labeling and the additional value of the fluorine atom as a critical component of several highly effective drug compounds currently on the market,146,147 NMR screening that monitors the 19F nucleus of fluorine-containing ligands as selective markers has gained popularity.148-151

The ideal method of calculating fluorine chemical shifts in the biological environment should be versatile and sufficiently rapid to be applied to large systems. Empirical models used to estimate fluorine chemical shifts for proteins have been well established.152-154 Modern QM approaches offer several advantages over empirical or classically based approaches to the prediction of NMR chemical shifts. A few of the main advantages were previously outlined. The NMR calculations are carried out using the FPT-GIAO approach discussed in the methods section. This method has previously been implemented using NMR-specific parameters developed for 1H, 13C, 15N and 17O, and has been shown to give fast and accurate results that can be applied to large protein-ligand complexes.155,156 Here we expand the scope of our approach to handle 19F chemical shift calculations.

5.2 Methods

5.2.1 Parameterization

The systematic underestimation of excitation energies using the standard MNDO parameters results in an overestimation of the variation of the paramagnetic contribution to the NMR chemical shifts.12 Patchkovskii and Thiel (ref. 12) have generated NMR-specific MNDO parameters resulting in significant improvement in the agreement between experimental and calculated NMR chemical shifts for 1H, 13C, 15N and 17O. In the same work, the authors also provided a full explanation of the choice of parameters to be optimized. Our method for fast semiempirical QM NMR calculations using these parameters has previously been published.156 Here the first addition to this set of MNDO-NMR parameters is presented.


Chemical shifts are calculated as the difference between the calculated shielding constant and some reference value, as illustrated in Equation 5-1.

$$\delta_{\mathrm{calculated}} = \sigma_{\mathrm{reference}} - \sigma_{\mathrm{calculated}} \tag{5-1}$$

Previously, the MNDO-NMR parameters for H, C, N and O were generated by minimizing the difference between the experimental and calculated chemical shifts via the goal function shown in Equation 5-2.

$$G = \sum_{X=\mathrm{H,C,N,O}} W_X \, \frac{1}{N_X} \sum_{i=1}^{N_X} \left(\delta_i^X + \sigma_i^X - \sigma_{\mathrm{ref}}^X\right)^2 + P \tag{5-2}$$

Because the parameters were generated simultaneously for all four atoms, a weighting factor, $W_X$, was used to ensure that the difference in chemical shift ranges for each atom X was taken into account. In this equation, $N_X$ is the number of chemical shifts for atom X. The reference shielding constants, $\sigma_{\mathrm{ref}}^X$, were values chosen to minimize the overall RMS deviations from experiment for each atom, and $\delta_i^X$ and $\sigma_i^X$ are the experimental chemical shifts and calculated absolute shieldings, respectively. This equation includes a penalty function, shown in Equation 5-3, which was introduced in order to avoid searching in areas of parameter space that would cause large deviations from the known properties of the single atom. This penalty function specifically prevents the calculated energy, $e_i$, of a single atom from straying too far from the atom's experimental energy, $E_i$. In the current method, the one-center, one-electron terms were kept at the standard values, resulting in no variation of the atomic energies and thereby rendering the penalty function unnecessary.

$$P = \sum_{X=\mathrm{H,C,N,O}} w_X \sum_{E_i \le E_{\max}} \left(E_i^X - e_i^X\right)^2 \tag{5-3}$$

In the present parameterization, the $\sigma_{\mathrm{ref}}$ value used in Equation 5-1 was initially set to the 19F shielding constant that was calculated for CFCl3 for each new set of parameters tested. This was done in an effort to be more consistent with experimental studies, which often use CFCl3 as the standard. However, in many cases parameters performed well for most other compounds but did not perform as well for CFCl3. In these instances the large RMS error was a false indication that the parameters yielded poor results for all compounds, when in fact a more appropriate $\sigma_{\mathrm{ref}}$ was needed (cf. Tables 5-2 and 5-3). This problem was addressed by calculating the average signed error (see Equation 5-4) between the experimental chemical shifts and the calculated shielding constants and using this average signed error as the $\sigma_{\mathrm{ref}}$ value.

$$\text{Average Signed Error} = \frac{\sum \left(\delta_{\mathrm{exp}} + \sigma_{\mathrm{calc}}\right)}{N} \tag{5-4}$$

For a given set of parameters, this method essentially sets the average signed error to zero and minimizes the RMS error for the data set. The scoring function then optimized the parameters by minimizing the RMS error between experimental and resulting calculated chemical shifts via Equation 5-5. All reported chemical shifts are given using the $\sigma_{\mathrm{ref}}$ value chosen to minimize the average signed and RMS error.

$$G = \frac{1}{N} \sum_{i=1}^{N} \left(\delta_i + \sigma_i - \sigma_{\mathrm{ref}}\right)^2 \tag{5-5}$$

Since the goal of this project was to provide parameters that could be used for large biological systems, such as ligands bound to proteins, semiempirical geometries were used in the parameterization: the large size of proteins and other biomolecules precludes the use of structures obtained via higher-level calculations. Although our current procedure for NMR chemical shift calculations uses the MNDO Hamiltonian, geometry optimization was carried out at the AM1 (ref. 17) and PM3 (ref. 157) levels because they are better able to capture some important geometric features, including hydrogen bonding, and are consequently more likely to be used for biological systems.17,157 As discussed below, our results indicate that the choice of Hamiltonian for the geometry optimization has minimal effect on the quality of the NMR calculation. It is important to note that in all instances the calculations using the new parameters are single-point NMR calculations using the standard geometries noted (AM1, PM3, B3LYP; refs. 158, 159), since the new parameters are 19F-NMR specific and have not been tested for their ability to provide realistic geometries. In addition, the starting point for the parameterization incorporated the parameters developed by Patchkovskii and Thiel for C, H, N, and O and standard MNDO parameters for all other atoms.13,78,160 Changes were made only to the fluorine parameters. Gauge-including atomic orbitals (GIAOs) were used for all calculations.10,41

The results obtained using the MNDO-NMR 19F parameters are compared to those of DFT calculations performed at the B3LYP level. This comparison required the selection of the most appropriate basis set for the DFT calculations. There have been several investigations into the optimal geometries and basis sets for accurate NMR chemical shift calculations at the DFT level.161-163 Recently, a study of 19F NMR chemical shift calculations highlighted the importance of diffuse functions for accurately reproducing experimental values.163 The use of diffuse functions for 19F NMR calculations has yielded calculated values within ~10 ppm of experimental results, which is acceptable accuracy for a theoretical chemical shift range of ~500 ppm (~2%). The MNDO-NMR results are therefore compared to DFT calculations performed at the B3LYP/6-31++G(d,p)//B3LYP/6-31G(d,p) level, as the previously mentioned study showed this to have the best agreement with experimental 19F chemical shifts for a range of compounds. Our own tests on the complete data set of 100 compounds supported this choice. The DFT calculations were performed using the Gaussian03 program.164
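The reference-value procedure of Equations 5-1, 5-4 and 5-5 can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the shifts and shieldings are invented numbers, the first entry is taken to play the role of the standard compound, and the function names are not from the thesis. It shows the situation described above, in which a reference fixed to one poorly described standard inflates the RMS error, while setting the reference to the mean of (experimental shift + calculated shielding) zeroes the average signed error.

```python
import math

def reference_from_signed_error(delta_exp, sigma_calc):
    """Reference chosen as the mean of (delta_exp + sigma_calc), cf. Equation 5-4."""
    n = len(delta_exp)
    return sum(d + s for d, s in zip(delta_exp, sigma_calc)) / n

def shifts(sigma_ref, sigma_calc):
    """Equation 5-1: delta_calc = sigma_ref - sigma_calc."""
    return [sigma_ref - s for s in sigma_calc]

def mean_signed_error(delta_exp, delta_calc):
    """Mean of (experimental - calculated) shifts."""
    return sum(e - c for e, c in zip(delta_exp, delta_calc)) / len(delta_exp)

def rms_error(delta_exp, delta_calc):
    """Root-mean-square deviation between experimental and calculated shifts."""
    n = len(delta_exp)
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(delta_exp, delta_calc)) / n)

# Hypothetical values in ppm, invented purely for illustration.
# The first entry stands in for the standard compound.
delta_exp = [0.0, -63.7, -113.2, -162.9]
sigma_calc = [470.0, 441.0, 492.0, 541.0]

sigma_ref_standard = sigma_calc[0]  # reference fixed to the standard compound
sigma_ref_average = reference_from_signed_error(delta_exp, sigma_calc)

rms_standard = rms_error(delta_exp, shifts(sigma_ref_standard, sigma_calc))
rms_average = rms_error(delta_exp, shifts(sigma_ref_average, sigma_calc))
signed_average = mean_signed_error(delta_exp, shifts(sigma_ref_average, sigma_calc))

print(f"reference from standard: {sigma_ref_standard:.2f}, RMS = {rms_standard:.2f}")
print(f"reference from average:  {sigma_ref_average:.2f}, RMS = {rms_average:.2f}")
```

For a single constant reference, the mean of (experimental shift + calculated shielding) is the least-squares optimum, which is why the second RMS value can never exceed the first.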


5.2.2 Experimental Data

The data set comprised a training set of 81 compounds and a test set of 19 compounds, containing 100 and 23 unique 19F chemical shifts, respectively. The structures are shown in Table 5-4. The test set was chosen to represent each class of chemical environment (as grouped in Table 5-4). While the goal was to ensure the robustness of the model, it seemed counterproductive to deliberately exclude the data from a particular chemical environment for the sole purpose of observing the predictive ability of the method. Therefore, all chemical environments in the test set were closely represented in the training set. The data set is diverse in nature and contains C, H, N, O, F, Cl, Br, and S atoms. The compounds contained a variety of small cyclic (3- to 8-membered rings), fused-cyclic, aliphatic and aromatic functional groups, and ranged from singly fluorinated to perfluorinated. Particular effort was made to include compounds that have characteristics relevant to the study of biological systems (drugs, amino acids and bases). A relevant factor here is that some molecules of interest were large, with many degrees of freedom, and would require a conformational search to determine the appropriate conformation(s) to be used. In these instances, an alternate molecule was used that contained the particular functional group of interest. Included in the data set are fluorinated derivatives of three amino acids: tyrosine (TYR), phenylalanine (PHE), and tryptophan (TRP). Additionally, steroids, sugars, pyrrole, pyrazole, imidazole, oxazole, and thiazole rings, as well as two DNA bases (cytosine, uracil), were incorporated into the set. Excluded from the data set were the chemical shifts of fluorine atoms that were involved in bonds other than a single C-F bond. This was done because C-F is the most common bonding arrangement seen in the biological systems in which we are interested.
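The text does not describe an algorithm for choosing the test set, so the following is only one way such a split could be automated, under the assumption that each compound carries a chemical-environment label. It holds out a share of every class while always leaving at least one member of that class in the training set, which mirrors the requirement that every test-set environment be represented in training. The compound labels, class names, and the `test_fraction` and `seed` parameters are all hypothetical.

```python
import random
from collections import defaultdict

def split_by_environment(compounds, test_fraction=0.2, seed=0):
    """Split (name, environment) pairs into train and test lists per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for name, env in compounds:
        by_class[env].append(name)

    train, test = [], []
    for env in sorted(by_class):
        members = sorted(by_class[env])
        rng.shuffle(members)
        # Hold out a share of each class, but always keep at least one
        # member in training so that no test environment is unseen.
        n_test = min(len(members) - 1, max(1, round(test_fraction * len(members))))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Hypothetical compound labels and environment classes, for illustration only.
compounds = (
    [(f"aromatic_{i}", "aromatic") for i in range(10)]
    + [(f"aliphatic_{i}", "aliphatic") for i in range(5)]
    + [("steroid_0", "steroid")]
)
env_of = dict(compounds)
train, test = split_by_environment(compounds)

print(len(train), "training compounds;", len(test), "test compounds")
```

A class with a single member contributes nothing to the test set in this sketch, so its environment is never evaluated without a training example.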
Of the 100 compounds included in the statistics, the experimental chemical shifts ranged from -267.9 ppm to +33.24 ppm, covering a span of roughly 300 ppm. Figure 5-1 illustrates the distribution of the experimental chemical shifts for the combined training and test set of 100 compounds.

Figure 5-1. Histogram of the distribution of experimental chemical shifts in the full data set (training set and test set; 123 unique chemical shifts for 100 compounds).

One problem that arose was the discovery of discrepancies among different reports of experimental chemical shifts for the same compounds. In many instances this was found even when the solvent was the same. It is known that subjectivity in the methods of interpreting and validating raw NMR data limits its reproducibility.165 Fortunately, many of the discrepancies were sufficiently small (<1 ppm, well within the error of higher-level calculations) that they did not have a significant effect on our results. As with the previous MNDO-NMR parameterization, the reference data were preferentially chosen from experiments run under conditions that minimize association and solvent effects. Additionally, in most cases the most recently reported data were used as the reference, as they were measured using higher-resolution instruments. In Tables 5-4A through 5-4H the actual references for the values used are those listed first in column 8; subsequent references listed were in close agreement (off by no more than 2 ppm) and may not have been run using the solvent listed in column 7.

5.3 Results and Discussion

The new fluorine parameters [$\zeta_s$, $\zeta_p$, $\beta_s$, $\beta_p$] are listed in Table 5-1. The chosen parameters were those that yielded the closest agreement with experiment while straying minimally from the original MNDO parameters. To reiterate, this is an effort to preserve any physical meaning that the original parameters were designed to reproduce, to the extent that this is possible without detracting from the correlation for the NMR chemical shifts. For consistency, all chemical shifts listed were calculated using a $\sigma_{\mathrm{ref}}$ value that minimizes the average error for this data set. As mentioned previously, this procedure was also implemented in the first MNDO-NMR parameterization by Patchkovskii and Thiel.13 The $\sigma_{\mathrm{ref}}$ values used in the implementation of Equation 5-1 are listed in Tables 5-2 and 5-3. With the assumption that our training set was sufficiently large and diverse in this parameterization, these reported $\sigma_{\mathrm{ref}}$ values should be applicable to future calculations.


Table 5-1. Comparison of standard and NMR-optimized MNDO parameters

Atom  Parameter           MNDO           MNDO-NMR
H     $\zeta_s$ (a.u.)     1.3319670      1.1782700
      $\beta_s$ (eV)      -6.9890640    -15.2092800
C     $\zeta_s$ (a.u.)     1.7875370      1.6544500
      $\zeta_p$ (a.u.)     1.7875370      1.6544500
      $\beta_s$ (eV)     -18.9850440    -18.5418100
      $\beta_p$ (eV)      -7.9341220    -12.8114400
N     $\zeta_s$ (a.u.)     2.2556140      2.0265800
      $\zeta_p$ (a.u.)     2.2556140      2.0265800
      $\beta_s$ (eV)     -20.4957580    -23.531550
      $\beta_p$ (eV)     -20.4957580    -23.531550
O     $\zeta_s$ (a.u.)     2.6999050      2.2027400
      $\zeta_p$ (a.u.)     2.6999050      2.2027400
      $\beta_s$ (eV)     -32.6880820    -19.8048500
      $\beta_p$ (eV)     -32.6880820    -19.8048500
F     $\zeta_s$ (a.u.)     2.8484870      4.98552600
      $\zeta_p$ (a.u.)     2.8484870      3.75447400
      $\beta_s$ (eV)     -48.2904660    -32.70953400
      $\beta_p$ (eV)     -36.5085400    -87.01227727

Note: Values currently used for MNDO-NMR calculations in the Wang-Merz method; these values were parameterized by Patchkovskii et al.12

The final parameter set resulting from this work includes just 4 optimized parameters to be added to the 16 optimized MNDO-NMR parameters previously published for H, C, N and O.12 The $\zeta_p$ and $\beta_p$ were critical terms for obtaining good agreement with the experimental NMR data. The larger values of the Slater orbital exponent ($\zeta_{s/p}$) terms indicate a preference for less diffuse orbitals. Because the fluorine parameters are being added to pre-existing parameters, little knowledge can be gained about the shortcomings of the MNDO method solely by examining the changes made to the fluorine parameters in this instance. This is because it is difficult to discern whether the changes deemed necessary compensate for a shortcoming in the pre-existing parameters or for one in the method itself. Although AM1 geometries were used in the parameterization, it is unlikely that these geometries were a significant source of error, since the new parameters yield very similar results when DFT geometries are used. As illustrated in Table 5-3, the choice of geometry does not significantly impact the overall quality of fit between the experimental and calculated chemical shifts when the new parameters are used. This suggests that the new parameters can be used to calculate NMR chemical shifts for structures optimized with either the AM1 or PM3 Hamiltonian, and that the more computationally expensive ab initio geometries are not necessary. This is important because our semiempirical QM approach for NMR calculations on large molecules will rely only on semiempirical optimized geometries and not on a more expensive method for geometry optimization. Unlike the chemical shifts, the absolute shielding constants do appear to be influenced by the geometry used. This can be inferred from the deviations in the $\sigma_{\mathrm{ref}}$ values used in the different instances. Because of this difference, when comparing results for different molecules it is of course advisable to use geometries that were optimized using the same level of theory.
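The observation that the reference value changes with geometry while the quality of the calculated shifts does not can be illustrated with a short worked derivation. It is a sketch under a simplifying assumption that the text does not make: that a change of geometry shifts every calculated shielding by the same constant $c$. The symbols follow Equations 5-1 and 5-4.

```latex
\begin{align*}
\sigma_i' &= \sigma_i + c \quad \text{for all } i
  && \text{(assumed uniform offset)} \\
\sigma_{\mathrm{ref}}' &= \frac{1}{N}\sum_{i=1}^{N}\left(\delta_i + \sigma_i'\right)
  = \sigma_{\mathrm{ref}} + c
  && \text{(reference from Equation 5-4)} \\
\delta_i^{\mathrm{calc}\,\prime} &= \sigma_{\mathrm{ref}}' - \sigma_i'
  = \sigma_{\mathrm{ref}} - \sigma_i
  = \delta_i^{\mathrm{calc}}
  && \text{(shift from Equation 5-1)}
\end{align*}
```

Under this assumption a uniform geometry-dependent offset is absorbed entirely by the reference value and the calculated shifts are unchanged. Real geometry changes are not perfectly uniform, so this accounts only for the constant part of the effect.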

Table 5-2. Comparison of errors associated with each method for 100 compounds using the shielding constant calculated for CFCl3 as the σref value in Equation 5-1.

                                          NMR Method
Statistic               Geometry          B3LYP/6-31++G(d,p)  MNDO    MNDO-NMR
RMS Error               AM1               16.42               92.41   90.21
                        PM3               22.51               93.72   92.90
                        B3LYP/6-31G(d,p)  13.79               94.29   89.47
R2                      AM1               0.94                0.38    0.94
                        PM3               0.95                0.36    0.94
                        B3LYP/6-31G(d,p)  0.96                0.33    0.93
Average Signed Error    AM1               6.81                81.78   89.14
                        PM3               15.04               82.83   91.94
                        B3LYP/6-31G(d,p)  8.26                82.96   88.23
Average Unsigned Error  AM1               12.79               81.92   89.14
                        PM3               19.82               83.95   91.94
                        B3LYP/6-31G(d,p)  10.59               83.12   88.23
Reference (σref) Value  AM1               172.29              69.07   466.66
                        PM3               188.91              75.10   470.65
                        B3LYP/6-31G(d,p)  179.19              74.68   464.54

Table 5-3. Comparison of errors associated with each method for 100 compounds using the average signed error as the σref value in Equation 5-1.

                                          NMR Method
Statistic               Geometry          B3LYP/6-31++G(d,p)  MNDO    MNDO-NMR
RMS Error               AM1               14.95               43.03   13.85
                        PM3               16.75               43.85   13.35
                        B3LYP/6-31G(d,p)  10.45               44.80   14.80
R2                      AM1               0.94                0.38    0.94
                        PM3               0.95                0.36    0.94
                        B3LYP/6-31G(d,p)  0.96                0.33    0.93
Average Signed Error    AM1               0.00                0.00    0.00
                        PM3               0.00                0.00    0.00
                        B3LYP/6-31G(d,p)  0.00                0.00    0.00
Average Unsigned Error  AM1               11.77               35.90   10.73
                        PM3               12.96               36.69   10.34
                        B3LYP/6-31G(d,p)  6.75                37.65   11.38
Reference (σref) Value  AM1               178.93              12.71   377.51
                        PM3               173.87              7.73    378.77
                        B3LYP/6-31G(d,p)  187.57              8.28    376.31

The 19F chemical shifts of our data set (aimed at capturing many 19F environments that are of biological interest) ranged from -267.9 ppm to +33.24 ppm. Within this range, calculations using the new parameters on the complete data set of 100 compounds yielded an RMS error of 13.85 ppm, which is just below 5% of the total range of chemical shifts in this data set. Furthermore, these values yielded an R2 value of 0.94. The new parameters performed particularly well for some fluorinated derivatives of DNA bases and amino acids, including uracil, cytosine, phenylalanine, glycine, and tyrosine. For some amino acids that are not included in the data set, the functional groups of interest are represented and accurately calculated using our new parameters. For example, the parameters do well in predicting the chemical shifts of fluorine at various positions on indole rings, the functional group present in tryptophan.
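
The summary statistics reported in Tables 5-2 and 5-3 can be sketched as follows. This is an illustrative sketch rather than the code used in this work: the function name is hypothetical, and R2 is taken here to be the squared Pearson correlation coefficient, which is an assumption since the definition is not stated in this section. The last lines reproduce the arithmetic behind the statement that an RMS error of 13.85 ppm is just below 5% of the experimental shift range.

```python
import numpy as np

def error_summary(delta_calc, delta_exp):
    """Return RMS, R2, average signed and average unsigned errors (ppm)."""
    calc = np.asarray(delta_calc, dtype=float)
    exp = np.asarray(delta_exp, dtype=float)
    err = calc - exp
    r = np.corrcoef(calc, exp)[0, 1]          # Pearson correlation (assumed definition)
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "r2": float(r ** 2),
        "signed": float(np.mean(err)),
        "unsigned": float(np.mean(np.abs(err))),
    }

# RMS error quoted in the text as a fraction of the experimental shift range.
shift_range = 33.24 - (-267.9)                # total range of shifts, in ppm
fraction = 13.85 / shift_range                # about 0.046, i.e. just below 5%
print(f"range = {shift_range:.2f} ppm, RMS/range = {100 * fraction:.1f}%")
```

Note that the average signed error is zero by construction when the reference value is chosen to remove it, as in Table 5-3, whereas the RMS and unsigned errors still measure the remaining scatter.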

The complete list of errors, including average signed and unsigned errors calculated using CFCl3 and the signed error as the σref values, is given in Tables 5-2 and 5-3. For the training set of 100 19F chemical shifts, the optimized parameters yielded an RMS error of 13.47 ppm and an R2 value of 0.95. The results were very promising for the test set of 23 19F chemical shifts, yielding an RMS error of 13.61 ppm and an R2 of 0.90. Tables 5-4A-H contain the experimental chemical shifts and those calculated using the new parameters. The tables are grouped by chemical functionality to facilitate analysis of our results, and the errors associated with each group of compounds are given at the bottom of each table. Examining the errors by group enables us to highlight some of the areas in which the parameters yield favorable results. In particular, the benzene derivatives and the 5-membered heterocycles show good agreement with experiment. This appears to indicate that the chemical shifts of aromatic systems are well described using the new parameters. For comparison purposes, the chemical shifts calculated at the B3LYP level and at the MNDO level using the MNDO-NMR parameters with the original (old) fluorine parameters are also shown in Table 5-4. The compounds included in the test set are denoted with the dagger symbol (†) in the Table 5-4 series; all other compounds were included in the training set. Since the parameters were optimized using AM1 geometries, the NMR data listed in Table 5-4 were calculated using AM1 geometries for the semiempirical NMR calculations and B3LYP geometries for the DFT NMR calculations. When compared with the original MNDO-NMR parameters, the final parameters show significant improvement in agreement with experimental results for our data set.
Figure 5-2 illustrates via box plot the magnitude of the error that resulted from calculations using our new parameters versus those using the original fluorine parameters and the DFT calculations. The

extreme outliers (marked by asterisks) were greater than 3*IQR (interquartile range) above the value of the third quartile. These outliers are discussed below, along with possible areas in which the new method may be limited. The results are significantly better than those using the standard fluorine parameters and compare very well to DFT results calculated at the B3LYP-GIAO/6-31++G(d,p) level for our data set. As shown in Figure 5-2, seventy-five percent of the calculations using our new parameters yielded an absolute error of less than 13.8 ppm, compared with 56.4 ppm for the standard fluorine parameters and 7.4 ppm for DFT. To reiterate, this basis set has been shown to provide the closest agreement with experiment for 19F NMR chemical shift calculations using the B3LYP-GIAO method.163 The correlation between the experimental and theoretical chemical shifts using the new fluorine parameters is shown in Figure 5-3. The mild outliers highlighted there are chemical environments in which the deviation between the calculated and experimental chemical shifts exceeded 27.2 ppm (1.5*IQR above the third quartile). There were three main conditions under which use of the new parameters resulted in chemical shifts that were not in close agreement with the experimental results. These are summarized in the paragraphs below.
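
The outlier criteria used above can be sketched in a few lines. The function names and the sample errors are hypothetical, and NumPy's default percentile interpolation is assumed, which may differ slightly from the quartile convention of the plotting software used for Figure 5-2. The final lines check the quoted thresholds for consistency: if the 13.8 ppm third quartile and the 27.2 ppm mild-outlier fence both refer to the new parameters, they imply an interquartile range of about 8.9 ppm and an extreme-outlier fence near 40.6 ppm.

```python
import numpy as np

def tukey_fences(abs_errors):
    """Upper fences for mild (1.5*IQR) and extreme (3*IQR) outliers."""
    q1, q3 = np.percentile(abs_errors, [25, 75])
    iqr = q3 - q1
    return q3 + 1.5 * iqr, q3 + 3.0 * iqr

def classify_outliers(abs_errors):
    """Split absolute errors into mild and extreme outliers."""
    errs = np.asarray(abs_errors, dtype=float)
    mild_fence, extreme_fence = tukey_fences(errs)
    return {
        "mild": errs[(errs > mild_fence) & (errs <= extreme_fence)],
        "extreme": errs[errs > extreme_fence],
    }

# Hypothetical absolute errors in ppm (not the data behind Figure 5-2).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 18, 100]
result = classify_outliers(sample)

# Consistency check on the thresholds quoted in the text.
q3, mild_fence = 13.8, 27.2
iqr = (mild_fence - q3) / 1.5                 # implied interquartile range
extreme_fence = q3 + 3.0 * iqr                # implied extreme-outlier fence
print(f"IQR = {iqr:.2f} ppm, extreme fence = {extreme_fence:.1f} ppm")
```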

Figure 5-2. Box and whiskers plot of the average unsigned error for 19F NMR chemical shifts calculated by I. B3LYP//B3LYP, II. MNDO-New 19F Parameters//AM1 and III. MNDO-Standard Fluorine Parameters//AM1. All MNDO calculations were performed using the NMR-specific parameters for C, H, N and O given in the fourth column of Table 5-1.

The first difficult chemical environment involved systems in which the fluorine was bonded to an alpha carbon adjacent to a carbon involved in multiple bonds with a heteroatom. The alpha carbon problem is also seen in the B3LYP calculations, where it accounts for all of the outliers (compounds 30, 31, 32, 60 and 73 in Table 5-4). This may be due to the solvent forming hydrogen bonds with the heteroatom as well as the fluorine atom, deshielding the 19F nucleus to a greater extent than can be accounted for by the in vacuo calculations.

Figure 5-3. Correlation between experimental and calculated (MNDO-NMR//AM1) 19F chemical shifts using the newly generated fluorine parameters. Compounds with chemical shifts that deviate from experiment by more than 27.2 ppm are indicated by the compound number used in Table 5-4.

Secondly, rather large errors were noted for CH3F and CF4, where the fluorine atoms were too deshielded by ~30 ppm. In these very small systems it is possible that the contributions of the core electrons are more significant than in the larger systems. Systematic problems with the core electrons were addressed by using a chosen σref value. However, a different σref value may be more appropriate for these particular systems, as the core electrons may be playing a more important role. In the previous MNDO-NMR parameterization it was found that different σref values were required to capture the relative chemical shifts for 1H under different circumstances as well (in that instance, hydrogen atoms involved in N-H, C-H and O-H bonds used unique σref values). Since this problem occurred in so few cases in our data set, there were not sufficient data points to determine a more appropriate σref value to use in these cases. These errors may stem from insufficient flexibility in the single-zeta basis sets used in the MNDO approximation. The source of this error may also have been the source of the larger errors found in some small ring

systems crowded with polar atoms. Polarization functions may be necessary in order to simultaneously describe the shielding felt by the fluorine atom in these and other systems.

Thirdly, large errors were also found for several compounds with chlorine or sulfur atoms. In this case the errors may be a result of the standard MNDO parameters for these atoms, which possibly make the electron density of chlorine and sulfur too diffuse and consequently shield the fluorine atom excessively. Because of the large deviation seen in the calculation of CFCl3 (too shielded by ~90 ppm), it was not appropriate to use this as the reference value.

5.4 Conclusions

The addition of NMR-specific MNDO parameters for fluorine has now extended the semiempirical QM NMR-based methodology for qualitative (to near quantitative) description of chemical shifts. Our approach can be employed using semiempirical (AM1/PM3) geometries with good accuracy, and can be executed at a fraction of the cost of ab initio methods, thereby assisting in the computational study of 19F NMR for much larger systems, including DNA and protein-ligand complexes. The results are comparable to calculations performed at the B3LYP-GIAO level for our data set. There is a marked improvement over the standard MNDO parameters in the agreement between experimental and calculated chemical shifts for our data set. This improvement in the correlation is clearly indicated by the increase in the R2 value from 0.38 with the standard MNDO fluorine parameters to 0.94 with the new ones. Our parameters work well for a variety of functional groups frequently found in biological molecules where 19F is present. Furthermore, the results seen in the test set are promising for the extensibility of the method.
Although the parameters were not generated by a method that allows us to state conclusively that they correspond to a global minimum, the genetic algorithm has allowed us to search a broad range of parameter space, and there has been no evidence suggesting that a significantly lower minimum exists.
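
To make concrete why such a search samples parameter space broadly without guaranteeing a global minimum, a generic genetic-algorithm sketch is given below. It is not the procedure used in this work: the toy objective, the population size, the blend crossover, the Gaussian mutation and all names are assumptions made purely for illustration. The best quarter of each generation is carried over unchanged, so the best error found can never increase, but nothing in the scheme proves that the final point is the global minimum.

```python
import numpy as np

def rms_error(params, x, y):
    """Toy objective: RMS error of the linear model y ~ params[0] * x + params[1]."""
    return float(np.sqrt(np.mean((params[0] * x + params[1] - y) ** 2)))

def genetic_search(objective, bounds, pop_size=40, generations=60, seed=0):
    """Minimize `objective` inside box `bounds` with a simple elitist GA."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T
    pop = rng.uniform(low, high, size=(pop_size, low.size))
    history = []                                   # best score per generation
    for _ in range(generations):
        scores = np.array([objective(p) for p in pop])
        order = np.argsort(scores)
        pop, scores = pop[order], scores[order]
        history.append(scores[0])
        elite = pop[: pop_size // 4]               # best quarter survives unchanged
        n_children = pop_size - len(elite)
        parents = elite[rng.integers(0, len(elite), size=(n_children, 2))]
        mix = rng.random((n_children, low.size))
        children = mix * parents[:, 0] + (1 - mix) * parents[:, 1]       # blend crossover
        children += rng.normal(0.0, 0.05 * (high - low), children.shape)  # mutation
        pop = np.vstack([elite, np.clip(children, low, high)])
    scores = np.array([objective(p) for p in pop])
    best = int(np.argmin(scores))
    history.append(scores[best])
    return pop[best], history

# Hypothetical fitting problem: recover slope 2 and intercept 1.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0
best_params, history = genetic_search(lambda p: rms_error(p, x, y),
                                      bounds=[(0.0, 5.0), (-10.0, 10.0)])
print(best_params, history[0], history[-1])
```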

Halogens present a peculiar problem for the semiempirical methods in the Neglect of Diatomic Differential Overlap (NDDO) family of approximations, in that certain calculations may require polarization or multi-zeta representations that the minimal basis sets do not offer. As discussed previously, this inflexibility appears to prevent us from capturing the relative chemical shifts for compounds including CFCl3. For this reason we have found that the most effective way of obtaining qualitative accuracy for the 19F NMR chemical shifts is to use an optimized reference value rather than the fluorine chemical shift of CFCl3 as the reference. When CFCl3 is used as the reference, large signed errors are found in spite of the high correlations indicated by the R2 values. The limitations of these parameters that we have found are outlined in the results section. These limitations include, but are not limited to, the fact that the parameters were generated for fluorine atoms involved in C-F bonds only. Nonetheless, the present parameters allow one to study the effect of the environment on the 19F chemical shifts seen in many biological systems. Application of the new 19F parameters to biological problems will be illustrated in future publications.84

Table 5-4. Comparison of NMR chemical shifts (ppm) grouped to facilitate analysis. σref values used were optimized to minimize the average signed error for the complete data set of 100 compounds and are given in Table 5-3. † denotes compounds used in the test set. Fluorinated aliphatic chains are presented below.

Structure Exp.     DFT      MNDO     MNDO-NMR  Medium        Ref
1         267.90   271.75   197.86   231.38    CH2Cl2        166 168
2         219.02   220.76   186.13   198.94    CFCl3         169, 170
3         211.50   217.43   187.28   197.75    CH2Cl2        166, 171
4         143.40   143.09   168.78   140.72    CFCl3         172, 173
5         164.00   176.82   178.27   175.06    CFCl3         166, 168, 174
6         78.60    78.84    146.23   71.17     Neat          173, 175, 176
7         63.50    67.14    134.81   65.89     CCl4          177 179
8 a.      74.60    71.33    126.48   78.00     Not Reported  180, 181
8 b.      189.20   186.82   139.85   205.43    Not Reported  180, 181
9         64.60    61.96    134.42   35.61     Not Reported  173, 175

R2                 0.99     0.74     0.95
RMSE               5.04     52.21    17.98
Signed             1.96     12.49    7.64
Unsigned           3.68     47.82    14.25

Table 5-5. Chains containing heteroatoms

Structure Exp.     DFT      MNDO     MNDO-NMR  Medium        Ref
10        226.00   249.70   189.67   207.57    CFCl3         182
11        130.42   128.97   152.99   138.25    Not Reported  183
12 a.     83.60    82.63    131.79   80.14     CDCl3         184
13        76.57    74.45    136.09   78.92     CD2Cl2        185, 186
14        74.21    76.38    139.12   77.90     DMSO          187, 188
15        232.30   233.89   179.73   210.83    Not Reported  189
16 a.     125.20   123.72   131.98   136.45    CFCl3         190
17        55.80    55.12    122.43   66.46     Not Reported  191
18 a.     23.80    36.92    32.01    9.92      Not Reported  192
19        33.24    46.76    26.06    14.60     CFCl3         151
20        68.00    68.17    126.51   69.10     Not Reported  183

R2                 0.99     0.83     0.98
RMSE               10.67    53.43    16.04
Signed             2.94     30.39    0.06
Unsigned           7.28     49.96    12.00

Table 5-6. Aliphatic rings

Structure Exp.     DFT      MNDO     MNDO-NMR  Medium        Ref
21        165.81   177.74   177.81   176.43    CFCl3         193 166, 194
22        184.47   194.05   178.16   176.70    CFCl3         193 166, 194
23        135.15   133.93   133.05   132.94    Not Reported  195, 196
24        132.90   129.64   130.60   138.64    Not Reported  195
25        142.30   146.72   159.62   133.28    Not Reported  197 199
26        197.50   191.42   180.14   180.14    CFCl3         200
27        178.50   195.46   180.01   179.64    CFCl3         200
28        196.10   196.82   174.40   179.55    CFCl3         200
29        180.10   196.44   178.99   182.61    CFCl3         200
30        194.00   159.07   155.95   159.25    CFCl3         201
31        191.00   152.65   161.29   167.42    CFCl3         201
32        186.00   154.56   161.87   168.27    CFCl3         201

Table 5-6. Continued

Structure Exp.     DFT      MNDO     MNDO-NMR  Medium        Ref
33 a.     164.08   160.04   130.78   173.23    CFCl3         183, 202
34 a.     167.87   160.06   130.24   175.56    CFCl3         183, 202
35 a.     108.86   123.21   127.85   107.88    CFCl3         183, 202
35 b.     107.06   115.49   123.87   109.42    CFCl3         183, 202

R2                 0.59     0.56     0.81
RMSE               17.48    21.42    13.93
Signed             2.75     9.13     5.65
Unsigned           13.11    17.46    10.59

Table 5-7. Non-aromatic double bonds

Structure Exp.     DFT      MNDO     MNDO-NMR  Medium        Ref
36        113.00   108.44   119.90   123.74    Not Reported  171
37        81.30    78.30    110.23   70.51     Not Reported  171
38 a.     205.00   197.30   135.07   207.37    Not Reported  171, 203
38 b.     100.00   93.36    106.82   87.59     Not Reported  171, 203
38 c.     126.00   122.71   121.74   133.88    Not Reported  171, 203
39        134.00   126.63   118.71   139.69    Not Reported  171, 204
40 a.     150.00   142.83   89.61    128.35    Not Reported  195, 205
40 b.     117.80   115.64   135.50   124.16    Not Reported  195, 205
40 c.     130.10   127.90   134.62   136.25    Not Reported  195, 205
41 a.     111.50   97.07    129.39   114.50    Not Reported  206
41 b.     109.10   104.83   75.06    122.54    Not Reported  206
42 a.     108.40   107.56   138.06   119.44    Not Reported  207
43 a.     114.10   109.20   133.84   125.53    Not Reported  207
44 b.     120.20   118.04   133.73   125.45    Not Reported  206
45 a.     114.50   111.25   131.99   126.73    CFCl3         208

R2                 0.99     0.02     0.87
RMSE               5.92     30.07    10.49
Signed             4.93     1.38     3.38
Unsigned           4.93     23.48    9.36

Table 5-8. Bicyclic compounds

Structure Exp.     DFT      MNDO     MNDO-NMR  Medium        Ref
46 a.     190.40   208.38   168.40   187.69    Not Reported  209
46 b.     163.30   171.05   161.96   171.91    Not Reported  209
47 a.     200.50   198.15   171.60   182.73    Not Reported  209
47 b.     196.30   190.40   170.04   179.71    Not Reported  209
48        177.60   195.82   177.35   190.16    CFCl3         200
49        188.70   195.10   174.68   184.63    CFCl3         200
50 a.     146.30   156.70   162.48   158.24    Not Reported  209
51        57.20    51.05    131.44   96.43     Neat          210
52 a.     124.70   126.28   111.70   130.02    CCl4          211
52 b.     146.10   144.22   115.02   138.90    CCl4          211
52 c.     166.70   169.47   125.71   155.24    CCl4          211
53 a.     149.40   147.39   110.74   145.89    CCl4          212
53 b.     166.50   163.81   113.90   155.13    CCl4          212
53 c.     163.10   164.07   121.74   157.28    CCl4          212
53 d.     170.30   167.60   116.54   156.50    CCl4          212
54        16.60    12.26    67.12    13.16     CCl4          213

R2                 0.98     0.49     0.92
RMSE               7.15     37.52    14.11
Signed             2.01     14.35    1.62
Unsigned           5.51     31.96    11.33

Table 5-9. Five-membered heterocycles

Structure Exp.     DFT      MNDO     MNDO-NMR  Medium        Ref
55        133.70   127.33   95.22    135.91    CDCl3         214
56        136.60   141.45   114.43   130.56    CDCl3         214
57        131.30   137.82   113.34   130.17    CDCl3         214
58 a.     149.60   152.47   109.94   135.35    CDCl3         214
58 b.     156.70   147.59   92.30    146.93    CDCl3         214
59        142.70   148.61   124.50   142.15    CD3COCD3      214
60        130.30   162.71   129.90   160.46    CDCl3         214
61        118.00   120.32   96.36    120.72    CDCl3         214
62        176.10   175.22   130.33   165.14    CDCl3         214
63 a.     187.80   188.18   127.61   165.59    CDCl3         214
63 b.     130.10   129.62   89.46    124.45    CDCl3         214
64        134.70   119.68   112.85   130.68    CDCl3         214
65 a.     143.50   143.95   112.18   148.32    CDCl3         215
65 b.     189.50   190.54   127.50   170.00    CDCl3         215
66        62.50    67.02    138.34   68.76     CD3COCD3      216
67 a.     67.70    63.80    135.87   77.38     Not Reported  217
67 b.     64.40    61.20    135.14   70.63     Not Reported  217
67 c.     115.40   113.19   66.01    109.87    Not Reported  217

Table 5-9. Continued

Structure Exp.     DFT      MNDO     MNDO-NMR  Medium        Ref
68        64.60    67.77    137.55   72.48     CDCl3         218
69 a.     200.30   192.44   163.58   201.98    CDCl3         219
69 b.     80.10    80.78    137.30   73.29     CDCl3         219
70 a.     139.10   131.85   74.66    134.11    CDCl3         220
70 b.     62.40    66.19    138.13   68.26     CDCl3         220
71 a.     87.04    84.84    128.34   122.81    CFCl3         221, 222
72        84.50    76.07    69.65    97.11     CDCl3         223

R2                 0.96     0.00     0.91
RMSE               8.40     49.47    12.86
Signed             0.56     7.58     0.58
Unsigned           5.33     44.46    9.49

Table 5-10. Six-membered heterocycles

Structure Exp.     DFT      MNDO     MNDO-NMR  Medium        Ref
73 b.     113.10   165.25   110.96   150.76    Not Reported  224
74 b.     157.90   155.84   107.93   145.81    Not Reported  224
75 a.     137.80   140.88   119.78   155.50    Not Reported  225
75 b.     152.40   147.14   112.61   144.88    Not Reported  225
76 a.     135.70   145.85   118.32   151.12    Not Reported  225
77 a.     51.80    40.69    54.62    46.44     CFCl3         208
78        171.00   178.86   140.03   173.65    DMSO-D6       226
79        170.19   165.13   124.39   171.77    C6D6          227, 228
80        203.00   210.08   179.10   195.16    CDCl3         229

R2                 0.84     0.84     0.86
RMSE               18.63    30.48    15.89
Signed             6.31     25.04    4.69
Unsigned           11.53    25.67    11.98

Table 5-11. Benzene derivatives

Structure Exp.     DFT      MNDO     MNDO-NMR  Medium        Ref
81        164.90   159.73   102.81   150.44    Not Reported  230, 231
82        113.41   114.75   117.01   123.39    CDCl3         232 233
83        119.67   121.23   114.67   126.39    CCl4          233
84        106.46   108.42   111.68   120.47    Not Reported  183
85        105.94   106.58   110.92   120.22    Not Reported  183
86 a.     160.10   155.97   115.61   153.15    CFCl3         234
87        115.80   118.53   118.83   125.27    D2O           235
88 b.     159.30   156.03   110.85   156.20    CFCl3         234
89        75.02    76.70    139.34   77.35     CH2Cl2        187

Table 5-11. Continued

Structure Exp.     DFT      MNDO     MNDO-NMR  Medium        Ref
90        73.85    76.99    138.77   77.18     Not Reported  187
91        58.40    62.93    139.12   63.99     Not Reported  183
92        43.50    47.24    135.54   52.60     C6F6          236, 237
93        18.20    10.89    57.39    2.24      Not Reported  204
94        16.80    12.09    65.49    10.96     Not Reported  238
95        140.30   150.30   129.19   143.77    Not Reported  239
96        207.00   192.96   181.50   191.20    Neat          240 233
97 a.     183.35   185.32   147.38   188.35    Not Reported  183
98        64.00    67.86    139.69   63.95     CH2Cl2        166, 241

Table 5-11. Continued

Structure Exp.     DFT      MNDO     MNDO-NMR  Medium        Ref
99        89.20    100.43   152.26   103.45    Not Reported  207 166
100 a.    160.70   167.16   124.93   167.75    CD3COCD3      242
100 d.    148.00   136.68   92.14    138.45    CD3COCD3      242

R2                 0.99     0.20     0.98
RMSE               6.19     51.30    9.64
Signed             0.93     11.86    3.08
Unsigned           4.99     43.00    8.39

CHAPTER 6
CONCLUSIONS

This dissertation has investigated the applicability of semiempirical QM methods to the study of NMR chemical shifts in biomolecules. The performance of semiempirical methods was examined because the routine application of higher levels of theory to large systems is not currently feasible. The research here focuses mainly on the study of protein systems with several hundred atoms. Because geometry optimization is an important and frequently employed step prior to the evaluation of NMR chemical shifts, a study of the effects of geometry optimization upon protein structures was also performed.

In the first study, the semiempirical QM Hamiltonians AM1 and PM3 were evaluated for their ability to accurately model protein structures. Geometry optimizations were performed on a collection of globular protein systems with a variety of secondary structures and sizes of up to 99 amino acid residues. These unbound crystal structures had an average resolution of 1.91 Å. In order to perform the geometry optimizations on these large systems, the semiempirical calculations were carried out using the linear-scaling divide-and-conquer algorithm in our DivCon program.78 We present a detailed analysis of the structures that includes an investigation of the effect of performing geometry optimization in vacuum versus utilizing an implicit Poisson-Boltzmann solvation model. Important artifacts observed included an inability of the methods to maintain planarity for the planar side chains of several amino acids. Furthermore, the inability of the vacuum-minimized structures to mask the charge-charge interactions resulted in several unphysical artifacts, including proton transfer from positively to negatively charged groups. In this study it was found that the frequency with which the most significant unphysical artifacts occurred could be minimized through the inclusion of implicit solvation during the geometry optimization.

In the second study the semiempirical description of NMR chemical shifts was implemented at the AM1 level with NMR-specific parameters in order to reproduce experimental 1H and 13C NMR chemical shifts in protein systems. The methodology adopted here is formally the same as that of the previously published finite perturbation theory GIAO-MNDO-NMR approach.77 Protein-specific NMR parameters were developed on a training set that comprised five globular protein systems with varied secondary structures and a range in size from 46 to 61 amino acid residues. A separate set of parameters was developed using a training set of small organic compounds with an emphasis on functional groups that are relevant to biological studies. This approach can be employed using semiempirical (AM1) geometries. Analysis carried out on 3,340 1H and 2,233 13C chemical shifts for protein systems showed significant improvement over the standard AM1 parameters, reducing the RMS errors from 1.05 and 21.28 ppm to 0.62 and 4.83 ppm for hydrogen and carbon, respectively.

In the final study the semiempirical MNDO methodology for the qualitative description of NMR chemical shifts was extended with the addition of NMR-specific parameters for the fluorine atom. This approach can be employed using semiempirical (AM1/PM3) geometries with good accuracy and can be executed at a fraction of the cost of ab initio and DFT methods, providing an attractive option for the computational study of 19F NMR for much larger systems. Fluorine NMR has grown as a tool to study the structure and dynamics of biological systems, due in part to the increasing value of fluorine in pharmaceutical applications. The data set used in the parameterization was large and diverse and specifically geared towards biologically relevant compounds. The new parameters are applicable to fluorine atoms involved in carbon-fluorine bonds.
Using these parameters yielded results comparable to NMR calculations performed at the DFT (B3LYP) level using the 6-31++G(d,p) basis set. The average R2 and RMS Error for this

data set are 0.94 and 13.85 ppm, respectively, compared to 0.96 and 10.45 ppm when DFT methods are used.

It has been shown that in addition to the choice of basis set and Hamiltonian used for the NMR calculation, the choice of geometry is also an important factor in the quality of ab initio QM NMR calculations.161-163 In the 19F study, it was seen that changing the geometry did not significantly impact the quality of the semiempirical NMR calculations. The 1H and 13C calculations using the MNDO and AM1 Hamiltonians also showed very similar results. An expected improvement was seen with the AM1 Hamiltonian because the parameters were specifically designed to reproduce the chemical shifts in the protein systems. However, the improvement was small even after a highly focused parameterization. It appears that the limiting factor in the semiempirical evaluation of NMR chemical shifts is the use of a minimal basis set. For small molecules, it appears that the limitations introduced by the minimal basis set approximation cannot be accounted for through parameterization to the degree that would enable this method to surpass the results obtained by DFT. While the semiempirical methods are able to differentiate well among atoms in different bonding situations, they are less capable of differentiating among atoms whose chemical environments vary only through nonbonding interactions. This is an important finding of the studies outlined in Chapter 4, because only through a rigorous test such as this could this shortcoming have been identified. One component of the strategy of the semiempirical QM approach was to compensate for the deficiencies in the Hartree-Fock approximation by using parameters that can implicitly account for electron correlation.
However, the embedded minimal basis set removes one of the main advantages of ab initio QM methods: the ability to systematically improve upon the results by including more subtle effects through the use of more sophisticated basis sets.

Practically, the semiempirical approximations are too large for these methods to be considered an overall improvement in accuracy relative to Hartree-Fock theory. The only way to determine whether the semiempirical methods can achieve more accurate results than other QM methods in the evaluation of any quantity of interest is to compare them directly with experiment. The advantage of the semiempirical approaches over other QM methods is their speed, which for decades has enabled them to be applied to the study of much larger systems. In time, the speed of the semiempirical approaches will likely be less relevant because ab initio approaches will offer better accuracy for a majority of even the largest relevant systems. This has been the hope for several decades already, but to date it has not been realized. Regardless of the advances that improve the speed of ab initio procedures, these semiempirical methods will always be able to perform calculations on much larger systems than the ab initio approaches. It is often assumed and hoped that through new algorithms and computational hardware ab initio methods will become sufficiently fast for routine application to large systems. Until this is achieved, the improvement and evaluation of the currently available QM methods is of great value.

LIST OF REFERENCES

(1) Clore, G. M.; Gronenborn, A. M. Crit. Rev. Biochem. Mol. Biol. 1989, 24, 479.
(2) Palmer, A. G. Chem. Rev. 2004, 104, 3623.
(3) Spera, S.; Bax, A. J. Am. Chem. Soc. 1991, 113, 5490.
(4) Ikura, M.; Spera, S.; Barbato, G.; Kay, L. E.; Krinks, M.; Bax, A. Biochemistry 1991, 30, 9216.
(5) Vila, J. A.; Ripoll, D. R.; Scheraga, H. A. J. Phys. Chem. B 2007, 111, 6577.
(6) Cavalli, A.; Salvatella, X.; Dobson, C. M.; Vendruscolo, M. Proceedings of the National Academy of Sciences of the United States of America 2007, 104, 9615.
(7) Robustelli, P.; Cavalli, A.; Vendruscolo, M. Structure 2008, 16, 1764.
(8) Shen, Y.; Lange, O.; Delaglio, F.; Rossi, P.; Aramini, J. M.; Liu, G. H.; Eletsky, A.; Wu, Y. B.; Singarapu, K. K.; Lemak, A.; Ignatchenko, A.; Arrowsmith, C. H.; Szyperski, T.; Montelione, G. T.; Baker, D.; Bax, A. Proceedings of the National Academy of Sciences of the United States of America 2008, 105, 4685.
(9) Baran, M. C.; Huang, Y. J.; Moseley, H. N. B.; Montelione, G. T. Chem. Rev. 2004, 104, 3541.
(10) Wolinski, K.; Hinton, J. F.; Pulay, P. J. Am. Chem. Soc. 1990, 112, 8251.
(11) Weixiong, W.; You, X.; Dai, A. Sci. Sin., Ser. B (Engl. Ed.) 1988, 31, 1048.
(12) Patchkovskii, S.; Thiel, W. J. Comput. Chem. 1999, 20, 1220.
(13) Dewar, M. J. S.; Thiel, W. J. Am. Chem. Soc. 1977, 99, 4899.
(14) Wang, B.; Merz Jr., K. M. J. Am. Chem. Soc. 2005, 127, 5310.
(15) Wang, B.; Merz, K. M. J. Chem. Theory Comput. 2006, 2, 209.
(16) Wang, B.; Brothers, E. N.; van der Vaart, A.; Merz Jr., K. M. J. Chem. Phys. 2004, 120, 11392.
(17) Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. P. J. Am. Chem. Soc. 1985, 107, 3902.
(18) Gerig, J. T. Fluorine NMR; Biophysical Society Biophysics Textbook, 2001.
(19) Pople, J. A.; Santry, D. P.; Segal, G. A. J. Chem. Phys. 1965, 43, S129.

(20) Raff, L. Principles of Physical Chemistry; Prentice-Hall, Inc.: Upper Saddle River, NJ, 2001.
(21) Wishart, D. S.; Case, D. A. Methods Enzymol. 2002, 338, 3.
(22) Osapay, K.; Case, D. A. J. Am. Chem. Soc. 1991, 113, 9436.
(23) Laws, D. D.; Dedios, A. C.; Oldfield, E. J. Biomol. NMR 1993, 3, 607.
(24) Williamson, M. P.; Kikuchi, J.; Asakura, T. J. Mol. Biol. 1995, 247, 541.
(25) Le, H. B.; Pearson, J. G.; Dedios, A. C.; Oldfield, E. J. Am. Chem. Soc. 1995, 117, 3800.
(26) Beger, R. D.; Bolton, P. H. J. Biomol. NMR 1997, 10, 129.
(27) Ramsey, N. F. Phys. Rev. 1950, 78, 699.
(28) Ramsey, N. F. Phys. Rev. 1953, 91, 303.
(29) Ruud, K.; Helgaker, T.; Kobayashi, R.; Jorgensen, P.; Bak, K. L.; Jensen, H. J. A. J. Chem. Phys. 1994, 100, 8178.
(30) Helgaker, T.; Jaszunski, M.; Ruud, K. Chem. Rev. 1999, 99, 293.
(31) Gauss, J.; Werner, H. J. Phys. Chem. Chem. Phys. 2000, 2, 2083.
(32) Haser, M.; Ahlrichs, R.; Baron, H. P.; Weis, P.; Horn, H. Theor. Chim. Acta 1992, 83, 455.
(33) Lee, A. M.; Bettens, R. P. A. J. Phys. Chem. A 2007, 111, 5111.
(34) Gauss, J.; Stanton, J. F. J. Chem. Phys. 1995, 102, 251.
(35) Gauss, J.; Stanton, J. F. J. Chem. Phys. 1995, 103, 3561.
(36) Gauss, J.; Stanton, J. F. J. Chem. Phys. 1996, 104, 2574.
(37) Kollwitz, M.; Gauss, J. Chem. Phys. Lett. 1996, 260, 639.
(38) Kollwitz, M.; Haser, M.; Gauss, J. J. Chem. Phys. 1998, 108, 8295.
(39) Stevens, R. M.; Lipscomb, W. N.; Pitzer, R. M. J. Chem. Phys. 1963, 38, 550.
(40) London, F. J. Phys. Et Le Radium 1937, 8, 397.
(41) Ditchfield, R. J. Chem. Phys. 1972, 56, 5688.
(42) Wolinski, K.; Hinton, J. F.; Pulay, P. J. Am. Chem. Soc. 1990, 112, 8251.

(43) Fukui, H.; Miura, K.; Yamazaki, H.; Nosaka, T. J. Chem. Phys. 1985, 82, 1410.
(44) Ribas-Prado, F.; Giessnerprettre, C.; Daudey, J. P.; Pullman, A.; Hinton, J. F.; Young, G.; Harpool, D. J. Magn. Reson. 1980, 37, 431.
(45) Chesnut, D. B.; Foley, C. K. Chem. Phys. Lett. 1985, 118, 316.
(46) Gauss, J. Chem. Phys. Lett. 1992, 191, 614.
(47) Gauss, J. J. Chem. Phys. 1993, 99, 3629.
(48) Hohenberg, P.; Kohn, W. Phys. Rev. B 1964, 136, B864.
(49) Kohn, W.; Sham, L. J. Phys. Rev. 1965, 140, 1133.
(50) Malkin, V. G.; Malkina, O. L.; Salahub, D. R. Chem. Phys. Lett. 1993, 204, 80.
(51) Malkin, V. G.; Malkina, O. L.; Salahub, D. R. Chem. Phys. Lett. 1993, 204, 87.
(52) Malkin, V. G.; Malkina, O. L.; Casida, M. E.; Salahub, D. R. J. Am. Chem. Soc. 1994, 116, 5898.
(53) Orendt, A. M.; Facelli, J. C.; Radziszewski, J. G.; Grant, D. M.; Michl, J. J. Am. Chem. Soc. 1996, 118, 846.
(54) Xu, X. P.; Case, D. A. J. Biomol. NMR 2001, 21, 321.
(55) Neal, S.; Nip, A. M.; Zhang, H. Y.; Wishart, D. S. J. Biomol. NMR 2003, 26, 215.
(56) Le, H. B.; Oldfield, E. J. Biomol. NMR 1994, 4, 341.
(57) Iwadate, M.; Asakura, T.; Williamson, M. P. J. Biomol. NMR 1999, 13, 199.
(58) Slater, J. C. Phys. Rev. 1930, 36, 0057.
(59) Boys, S. F. Proceedings of the Royal Society of London Series a-Mathematical and Physical Sciences 1950, 200, 542.
(60) Leach, A. R. Molecular Modeling Principles and Applications, 2nd ed.; Pearson Education Limited, 2001.
(61) Roothaan, C. C. J. Rev. Mod. Phys. 1951, 23, 69.
(62) Dewar, M. J. S. T., W. J. Am. Chem. Soc. 1976, 99, 4899.
(63) Stewart, J. J. P. J. Comput. Chem. 1989, 10, 209.
(64) Thiel, W.; Voityuk, A. A. Theor. Chim. Acta 1996, 93, 315.

(65) Thiel, W.; Voityuk, A. A. J. Phys. Chem. 1996, 100, 616.
(66) Thiel, W.; Voityuk, A. A. THEOCHEM 1994, 119, 141.
(67) Thiel, W.; Voityuk, A. A. Int. J. Quantum Chem. 1992, 44, 807.
(68) Thiel, W.; Voityuk, A. A. Theor. Chim. Acta 1992, 81, 391.
(69) Voityuk, A. A.; Rosch, N. J. Phys. Chem. A 2000, 104, 4089.
(70) Pople, J. A.; Mciver, J. W.; Ostlund, N. S. J. Chem. Phys. 1968, 49, 2960.
(71) Yang, W. T.; Lee, T. S. J. Chem. Phys. 1995, 103, 5674.
(72) Dixon, S. L.; Merz, K. M. J. Chem. Phys. 1996, 104, 6643.
(73) Dixon, S. L.; Merz, K. M. J. Chem. Phys. 1997, 107, 879.
(74) York, D. M.; Lee, T. S.; Yang, W. T. Chem. Phys. Lett. 1996, 263, 297.
(75) Lee, T. S.; York, D. M.; Yang, W. T. J. Chem. Phys. 1996, 105, 2744.
(76) Van der Vaart, A.; Gogonea, V.; Dixon, S. L.; Merz, K. M. J. Comput. Chem. 2000, 21, 1494.
(77) Wang, B.; Brothers, E. N.; van der Vaart, A.; Merz, K. M. J. Chem. Phys. 2004, 120, 11392.
(78) Wang, B.; Raha, K.; Liao, N.; Peters, M. B.; Kim, H.; Westerhoff, L. M.; Wollacott, A. M.; van der Vaart, A.; Gogonea, V.; Suarez, D.; Dixon, S. L.; Vincent, J. J.; Brothers, E. N.; Merz, K. M., Jr. DivCon.
(79) Brothers, E. N.; Merz, K. M., Jr. J. Phys. Chem. B 2002, 106, 2779.
(80) Rossi, I.; Truhlar, D. G. Chem. Phys. Lett. 1995, 233, 231.
(81) Hutter, M. C.; Reimers, J. R.; Hush, N. S. J. Phys. Chem. B 1998, 102, 8080.
(82) Duffy, E. M.; Jorgensen, W. L. J. Am. Chem. Soc. 2000, 122, 2878.
(83) Gogonea, V.; Merz, K. M. J. Phys. Chem. A 1999, 103, 5171.
(84) Stewart, J. J. P. J. Comput. Chem. 1989, 10, 221.
(85) Wollacott, A. M.; Merz, K. M. J. Chem. Theory Comput. 2007, 3, 1609.
(86) Raha, K.; Merz, K. M. J. Am. Chem. Soc. 2004, 126, 1020.

(87) Raha, K.; Merz, K. M. Abstracts of Papers of the American Chemical Society 2005, 229, U761.
(88) Rocha, G. B.; Freire, R. O.; Simas, A. M.; Stewart, J. J. P. J. Comput. Chem. 2006, 27, 1101.
(89) Stewart, J. J. P. J. Mol. Model. 2007, 13, 1173.
(90) Stewart, J. J. P. J. Mol. Model. 2008.
(91) Shaffer, A. A.; Wierschke, S. G. J. Comput. Chem. 1993, 14, 75.
(92) Feigel, M.; Strassner, T. THEOCHEM 1993, 102, 33.
(93) Stewart, J. J. P. Semiempirical Molecular Orbital Methods. In Reviews in Computational Chemistry; Lipkowitz, K. B., Boyd, D. B., Eds.; John Wiley & Sons, Inc., 1990; pp 45.
(94) Ferguson, D. M.; Gould, I. R.; Glauser, W. A.; Schroeder, S.; Kollman, P. A. J. Comput. Chem. 1992, 13, 525.
(95) Anh, N. T.; Frison, G.; Solladie-Cavallo, A.; Metzner, P. Tetrahedron 1998, 54, 12841.
(96) Williams, D. E.; Peters, M. B.; Wang, B.; Merz, K. M. J. Phys. Chem. A 2008, 112, 8829.
(97) Case, D. A.; Darden, T. A.; Cheatham, T. E., III; Simmerling, C. L.; Wang, J.; Duke, R. E.; Luo, R.; Merz, K. M., Jr.; Wang, B.; Pearlman, D. A.; Crowley, M.; Brozell, S.; Tsui, V.; Gohlke, H.; Mongan, J.; Hornak, V.; Cui, G.; Beroza, P.; Schafmeister, C.; Caldwell, J. W.; Ross, W. S.; Kollman, P. A. AMBER 8.0 2004.
(98) Weiner, S. J.; Kollman, P. A.; Nguyen, D. T.; Case, D. A. J. Comput. Chem. 1986, 7, 230.
(99) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J. Am. Chem. Soc. 1995, 117, 5179.
(100) Morris, A. L.; MacArthur, M. W.; Hutchinson, E. G.; Thornton, J. M. Proteins 1992, 12, 345.
(101) Laskowski, R. A.; MacArthur, M. W.; Moss, D. S.; Thornton, J. M. J. Appl. Crystallogr. 1993, 26, 283.
(102) Willard, L.; Ranjan, A.; Zhang, H. Y.; Monzavi, H.; Boyko, R. F.; Sykes, B. D.; Wishart, D. S. Nucleic Acids Res. 2003, 31, 3316.
(103) Dunbrack, R. L.; Karplus, M. Nat. Struct. Biol. 1994, 1, 334.
(104) Repasky, M. P.; Chandrasekhar, J.; Jorgensen, W. L. J. Comput. Chem. 2002, 23, 1601.
(105) Li, J. B.; Zhu, T. H.; Cramer, C. J.; Truhlar, D. G. J. Phys. Chem. A 1998, 102, 1820.

(106) Myers, J. K.; Pace, C. N. Biophys. J. 1996, 71, 2033.
(107) Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. J. Am. Chem. Soc. 1985, 107, 3902.
(108) Peters, M. B.; Williams, D. E.; Merz, K. M., Jr. Unpublished Results.
(109) Ulrich, E. L.; Akutsu, H.; Doreleijers, J. F.; Harano, Y.; Ioannidis, Y. E.; Lin, J.; Livny, M.; Mading, S.; Maziuk, D.; Miller, Z.; Nakatani, E.; Schulte, C. F.; Tolmie, D. E.; Wenger, R. K.; Yao, H. Y.; Markley, J. L. Nucleic Acids Res. 2008, 36, D402.
(110) Moon, S.; Case, D. A. J. Biomol. NMR 2007, 38, 139.
(111) Case, D. A.; Darden, T. A.; Cheatham, T. E., III; Simmerling, C. L.; Wang, J.; Duke, R. E.; Luo, R.; Merz, K. M.; Pearlman, D. A.; Crowley, M.; Walker, R. C.; Zhang, W.; Wang, B.; Hayik, S.; Roitberg, A.; Seabra, G.; Wong, K. F.; Paesani, F.; Wu, X.; Brozell, S.; Tsui, V.; Gohlke, H.; Yang, L.; Tan, C.; Mongan, J.; Hornak, V.; Cui, G.; Beroza, P.; Mathews, D. H.; Schafmeister, C.; Ross, W. S.; Kollman, P. A. AMBER 9; University of California, San Francisco, 2006.
(112) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Proteins 2006, 65, 712.
(113) Stewart, J. J. P. J. Mol. Model. 2008, 14, 499.
(114) Buchanan, G. W.; Morin, F. G. Can. J. Chem. 1979, 57, 21.
(115) Elguero, J.; Marzin, C.; Roberts, J. D. J. Org. Chem. 1974, 39, 357.
(116) Fraser, R. R.; Passannanti, S.; Piozzi, F. Can. J. Chem. 1976, 54, 2915.
(117) Pugmire, R. J.; Grant, D. M.; Townsend, L. B.; Robins, R. K. J. Am. Chem. Soc. 1973, 95, 2791.
(118) Fruchier, A.; Pellegrin, V.; Elguero, J.; Claramunt, R. M. Org. Magn. Res. 1984, 22, 473.
(119) Adamczeski, M.; Quinoa, E.; Crews, P. J. Am. Chem. Soc. 1988, 110, 1598.
(120) Bose, A. K.; Srinivasan, P. R. Tetrahedron 1975, 31, 3025.
(121) Smith, W. B.; Proulx, T. W. Org. Magn. Res. 1976, 8, 205.
(122) Katritzky, A. R.; Yannakopoulou, K.; Lue, P.; Rasala, D.; Urogdi, L. J. Chem. Soc., Perkin Trans. 1 1989, 225.
(123) Gottlieb, H. E.; Kotlyar, V.; Nudelman, A. J. Org. Chem. 1997, 62, 7512.
(124) Giam, C. S.; Goodwin, T. E.; Yano, T. K. J. Chem. Soc., Perkin Trans. 2 1978, 831.

(125) Prakash, G. K. S.; Bae, C. S.; Rasul, G.; Olah, G. A. J. Org. Chem. 2002, 67, 1297.
(126) Chou, W. N.; Pomerantz, M.; Witzcak, M. K. J. Org. Chem. 1990, 55, 716.
(127) Bradamante, S.; Pagani, G. A. J. Org. Chem. 1980, 45, 105.
(128) Pelletier, S. W.; Djarmati, Z.; Pape, C. Tetrahedron 1976, 32, 995.
(129) McIntosh, J. M. Can. J. Chem. 1979, 57, 131.
(130) Dauphin, G.; Cuer, A. Org. Magn. Res. 1979, 12, 557.
(131) Gronowitz, S.; Zanirato, P. J. Chem. Soc., Perkin Trans. 2 1994, 1815.
(132) Laurens, T.; Nicole, D.; Rubini, P.; Lauer, J. C.; Matlengiewicz, M.; Henzel, N. Magn. Reson. Chem. 1991, 29, 1119.
(133) Pothier, N.; Rowan, D. D.; Deslongchamps, P.; Saunders, J. K. Can. J. Chem. 1981, 59, 1132.
(134) Batsanov, A.; Chen, L. G.; Gill, G. B.; Pattenden, G. J. Chem. Soc., Perkin Trans. 1 1996, 45.
(135) Aliev, A. E.; Harris, K. D. M. J. Am. Chem. Soc. 1993, 115, 6369.
(136) Stothers, J. B.; Tan, C. T. Can. J. Chem. 1974, 52, 308.
(137) Kalinowski, H. O.; Kessler, H. Org. Magn. Res. 1974, 6, 305.
(138) Abraham, R. J.; Warne, M. A.; Griffiths, L. J. Chem. Soc., Perkin Trans. 2 1997, 881.
(139) Wiberg, K. B.; Pratt, W. E.; Bailey, W. F. J. Org. Chem. 1980, 45, 4936.
(140) Lau, E. Y.; Gerig, J. T. J. Am. Chem. Soc. 2000, 122, 4408.
(141) Lepre, C. A.; Moore, J. M.; Peng, J. W. Chem. Rev. 2004, 104, 3641.
(142) Opella, S. J.; Marassi, F. M. Chem. Rev. 2004, 104, 3587.
(143) Prestegard, J. H.; Bougault, C. M.; Kishore, A. I. Chem. Rev. 2004, 104, 3519.
(144) Leone, M. R.; Rodriguez-Mias, R. A.; Pellecchia, M. ChemBioChem 2003, 4, 649.
(145) O'Hagan, D.; Harper, D. B. J. Fluorine Chem. 1999, 100, 127.
(146) Gerig, J. T. 2000.

(147) Isanbor, C.; O'Hagan, D. J. Fluorine Chem. 2006, 127, 303.
(148) Dalvit, C.; Ardini, E.; Flocco, M.; Fogliatto, G. P.; Mongelli, N.; Veronesi, M. J. Am. Chem. Soc. 2003, 125, 14620.
(149) Dalvit, C.; Ardini, E.; Fogliatto, G. P.; Mongelli, N.; Veronesi, M. Drug Discovery Today 2004, 9, 595.
(150) Shikii, K.; Sakurai, S.; Utsumi, H.; Seki, H.; Tashiro, M. Anal. Sci. 2004, 20, 1475.
(151) Haas, A.; Reinke, H. Angew. Chem. Int. Ed. Engl. 1967, 6, 705.
(152) Gregory, D. H.; Gerig, J. T. Biopolymers 1991, 31, 845.
(153) Pearson, J. G.; Oldfield, E.; Lee, F. S.; Warshel, A. J. Am. Chem. Soc. 1993, 115, 6851.
(154) Lian, C. Y.; Le, H. B.; Montez, B.; Patterson, J.; Harrell, S.; Laws, D.; Matsumura, I.; Pearson, J.; Oldfield, E. Biochemistry 1994, 33, 5238.
(155) Wang, B.; Merz, K. M., Jr. J. Am. Chem. Soc. 2005, 127, 5310.
(156) Wang, B.; Brothers, E. N.; van der Vaart, A.; Merz, K. M., Jr. J. Chem. Phys. 2004, 120, 11392.
(157) Stewart, J. J. P. J. Comput. Chem. 1989, 10, 209.
(158) Becke, A. D. J. Chem. Phys. 1993, 98, 5648.
(159) Lee, C.; Yang, W.; Parr, R. G. Phys. Rev. B 1988, 37, 785.
(160) Dewar, M. J. S.; Rzepa, H. S. J. Am. Chem. Soc. 1978, 100, 58.
(161) Tanuma, T.; Irisawa, J. J. Fluorine Chem. 1999, 99, 157.
(162) Ying, Z.; Wu, A.; Xu, X.; Yan, Y. J. Phys. Chem. A 2007, 111, 9431.
(163) Fukaya, H.; Ono, T. J. Comput. Chem. 2003, 25, 51.
(164) Gaussian 03, R. C., Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Montgomery, Jr., J. A.; Vreven, T.; Kudin, K. N.; Burant, J. C.; Millam, J. M.; Iyengar, S. S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.; Scalmani, G.; Rega, N.; Petersson, G. A.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Klene, M.; Li, X.; Knox, J. E.; Hratchian, H. P.; Cross, J. B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Ayala, P. Y.; Morokuma, K.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Zakrzewski, V. G.; Dapprich, S.; Daniels, A. D.; Strain, M. C.; Farkas, O.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Ortiz, J. V.; Cui, Q.; Baboul, A. G.; Clifford, S.; Cioslowski, J.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Martin, R. L.; Fox, D. J.; Keith, T.; Al-Laham, M. A.; Peng, C. Y.; Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M. W.; Gonzalez, C.; and Pople, J. A.; Gaussian, Inc., Wallingford CT, 2004.
(165) Baran, M. C.; Huang, Y. J.; Mosely, H. N. B.; Montelione, G. T. Chem. Rev. 2004, 104, 3541.
(166) Weigert, F. J. J. Org. Chem. 1980, 45, 3476.
(167) Singer, R. J.; Eisenhut, M.; Schmutzler, R. J. Fluorine Chem. 1971, 1, 193.
(168) Dmowski, W.; Kaminski, M. J. Fluorine Chem. 1983, 23, 219.
(169) Filipovich, G.; Tiers, G. V. D. J. Phys. Chem. 1959, 63, 761.
(170) Vcelak, J.; Chvalovsky, V.; Voronkov, M. G.; Pukhnarevich, V. B.; Pestunovich, V. A. Collect. Czech. Chem. Commun. 1976, 41, 386.
(171) Weigert, F. J. J. Fluorine Chem. 1990, 46, 375.
(172) Harris, R. K. J. Mol. Spectrosc. 1963, 10, 309.
(173) Sartori, P.; Habel, W. J. Fluorine Chem. 1980, 16, 265.
(174) Schmutzler, R. J. Chem. Soc. 1964, Nov., 4551.
(175) Naumann, D.; Kischkewitz, J. J. Fluorine Chem. 1990, 47, 283.
(176) Christe, K. O.; Wilson, W. W. J. Fluorine Chem. 1990, 47, 117.
(177) Solovev, D. V.; Rodin, A. A.; Zenkevich, I. G.; Lavrentev, A. N. Zh. Obsch. Khim. (Russ. Lang.) 1988, 58, 1544.
(178) Bloshchitsa, F. A.; Burmakov, A. I.; Kunshenko, B. V.; Alekseeva, L. A.; Yagupol'skii, L. M. Zh. Obsch. Khim. 1985, 21, 1414.
(179) Aktaev, N. P.; Il'in, G. F.; Sokol'skii, G. A.; Knunyants, I. L. Izv. Akad. Nauk SSSR, Ser. Khim. 1977, 5, 1112.
(180) Schreider, V. A.; Rozhkov, I. N. Izv. Akad. Nauk SSSR, Ser. Khim. 1979, 3, 673.
(181) Burdon, J.; Huckerby, T. N.; Stephens, R. J. Fluorine Chem. 1977, 10, 523.
(182) Jullien, J.; Martin, J. A.; Ramanadin, R. Bull. Soc. Chim. Fr. 1964, 171.

(183) Dungan, C. H.; Van Wazer, J. R. Compilation of Reported 19F NMR Chemical Shifts (1951-1967); John Wiley & Sons, Inc.: New York, 1970.
(184) Hemer, I.; Havlicek, J.; Dedek, V. J. Fluorine Chem. 1986, 34, 241.
(185) Mironova, A. A.; Maletina, I. I.; Iksanova, S. V.; Orda, V. V.; Yagupolsky, L. M. Zh. Org. Khim. 1989, 25, 306.
(186) Bogachev, Y. S.; Serebryanskaya, A. I.; Khutsishvili, V. G.; Korenkova, V. M.; Shapet'ko, N. N. Zh. Obsch. Khim. 1986, 56, 909.
(187) Manatt, S. L. J. Am. Chem. Soc. 1966, 88, 1323.
(188) Pellerite, M. J. J. Fluorine Chem. 1990, 49, 43.
(189) Burdon, J.; Knights, J. R.; Parsons, I. W.; Tatlow, J. C. J. Chem. Soc., Perkin Trans. 1 1976, 18, 1930.
(190) Pitcher, E.; Buckingham, A. D.; Stone, F. G. A. J. Chem. Phys. 1961, 36, 124.
(191) Burger, H.; Niepel, H.; Pawelke, G.; Frohn, H. J.; Satori, P. J. Fluorine Chem. 1980, 15, 231.
(192) Lustig, M.; Ruff, K. J. Inorg. Chem. 1965, 4, 1441.
(193) Bovey, F. A.; Anderson, E. W.; Hood, F. P.; Kornegay, R. L. J. Chem. Phys. 1963, 40, 3099.
(194) Schneider, H. J.; Gschwendtner, W.; Heiske, D.; Hoppen, V.; Thomas, F. Tetrahedron 1977, 33, 1769.
(195) Gash, V. W.; Bauer, D. J. J. Org. Chem. 1966, 31, 3602.
(196) Feeney, J.; Sutcliffe, L. H.; Walker, S. M. Mol. Phys. 1966, 11, 117.
(197) Mitsch, R. A. J. Am. Chem. Soc. 1965, 87, 758.
(198) Cullen, W. R.; Waldman, M. C. J. Fluorine Chem. 1971, 1, 151.
(199) Wheaton, G. A.; Burton, D. J. J. Fluorine Chem. 1977, 1, 25.
(200) Jullien, J.; Stahl-Lariviere, H. Bull. Soc. Chim. Fr. 1967, 1, 99.
(201) Cantacuzene, J.; Ricard, D. Bull. Soc. Chim. Fr. 1967, 5, 1587.
(202) Boswell, G. A. J. J. Org. Chem. 1966, 31, 991.
(203) Koroniak, H.; Palmer, K. W.; Dolbier, W. R., Jr.; Zhang, H. Q. Magn. Reson. Chem. 1993, 31, 748.

(204) Krause, L. J.; Morrison, J. A. J. Am. Chem. Soc. 1981, 103, 2995.
(205) Chambers, R. D.; Edwards, A. R. J. Chem. Soc., Perkin Trans. 1 1997, 3623.
(206) Campbell, S. F.; Hudson, A. G.; Mooney, E. F.; Pedler, A. E.; Stevens, R.; Wood, K. N. Spectrochim. Acta, Part A 1967, 23, 2119.
(207) Olah, G. A.; Chambers, R. D.; Comisarow, M. B. J. Am. Chem. Soc. 1967, 89, 1268.
(208) Mitsch, R. A. J. Am. Chem. Soc. 1965, 87, 328.
(209) Merritt, R. F.; Johnson, F. A. J. Org. Chem. 1966, 31, 1859.
(210) Bystrov, V. F.; Utyanskaya, E. Z.; Yagupol'skii, L. M. Opt. Spektrosk. 1961, 10, 138.
(211) Petrova, T. D.; Savchenko, T. I.; Kukovinets, O. S.; Yakobson, G. G. Izv. Sib. Otdel. Akad. Nauk SSSR Ser. Khim 1974, 2, 117.
(212) Petrova, T. D.; Savchenko, T. I.; Kukovinets, O. S.; Yakobson, G. G. Izv. Sib. Otdel. Akad. Nauk SSSR Ser. Khim 1973, 2, 104.
(213) Christe, K. O.; Pavlath, A. E. J. Org. Chem. 1965, 30, 4104.
(214) Dvornikova, E.; Bechcicka, M.; Kamienska-Trela, K.; Krowczynski, A. J. Fluorine Chem. 2003, 124, 159.
(215) Fabra, F.; Fos, E.; Vilarrasa, J. Tetrahedron Lett. 1979, 34, 3179.
(216) Owen, D.; Plevey, R. G.; Tatlow, J. C. J. Fluorine Chem. 1981, 17, 179.
(217) Koshelev, V. M.; Barsukov, I. N.; Vasilev, N. V.; Gontar, A. F. Chem. Heterocycl. Compd. (N.Y.) 1989, 12, 1699.
(218) Gerus, I. I.; Gorbunova, M. G.; Vdovenko, S. I.; Yagupol'sky, Y. L.; Kukhar, V. P. Zh. Org. Khim. 1990, 26, 1877.
(219) Vasil'ev, N. V.; Savostin, V. S.; Kolomiets, A. F.; Sokolsky, G. A. Khim. Geterotsikl. Soedin. 1989, 5, 663.
(220) Burger, K.; Geith, K.; Norbert, S. J. Fluorine Chem. 1990, 46, 105.
(221) Tiers, G. V. D. J. Phys. Chem. 1962, 66, 764.
(222) Abe, T.; Shreeve, J. M. J. Fluorine Chem. 1973, 3, 17.
(223) Lowe, G.; Potter, B. V. L. J. Chem. Soc., Perkin Trans. 1 1980, 2026.
(224) Chambers, R. D.; Drakesmith, F. G.; Musgrave, W. K. R. J. Chem. Soc. (Resumed) 1965, 5045.

(225) Chambers, R. D.; Hutchinson, J.; Musgrave, W. K. R. J. Chem. Soc. (Resumed) 1965, 5040.
(226) Robins, M. J.; MacCoss, M.; Naik, S. R.; Ramani, G. J. Am. Chem. Soc. 1976, 98, 7381.
(227) Ellermann, J.; Schamberger, J.; Knock, F. A.; Moll, M.; Bauer, W. Monatsh. Chem. 1997, 128, 399.
(228) Robins, M. J.; MacCoss, M.; Naik, S. R.; Ramani, G. J. Am. Chem. Soc. 1976, 98, 7381.
(229) Nakai, K.; Takagi, Y.; Tsuchiya, T. Carbohydr. Res. 1999, 316, 47.
(230) Dean, P. A. W.; Ibbott, D. G. Can. J. Chem. 1976, 54, 177.
(231) Sheppard, W. A.; Foster, S. S. J. Fluorine Chem. 1972, 2, 53.
(232) Kitching, W.; Adcock, W.; Aldous, G. J. Org. Chem. 1979, 44, 2652.
(233) Zweig, A.; Fischer, R. G.; Lancaster, J. E. J. Org. Chem. 1980, 45, 3597.
(234) Cavalli, L. J. Chem. Soc. B 1967, 384.
(235) Soloshonok, V. A.; Belokon, Y. N.; Kukhar, V. P.; Chernoglazova, N. I.; Saporovskaya, M. B.; Bakhmutov, V. I.; Kolycheva, M. T.; Belikov, V. M. Izv. Akad. Nauk SSSR, Ser. Khim. 1990, 7, 1630.
(236) Haas, A.; Hellwig, V. J. Fluorine Chem. 1975, 6, 521.
(237) Clark, J. H.; Jones, C. W.; Kybett, A. P.; McClinton, M. A.; Miller, J. M.; Bishop, D.; Blade, R. J. J. Fluorine Chem. 1990, 48, 249.
(238) Christe, K. O.; Pavlath, A. E. J. Org. Chem. 1965, 30, 3170.
(239) Hebel, D.; Kirk, K. L. J. Fluorine Chem. 1990, 47, 179.
(240) Muller, N.; Carr, D. T. J. Phys. Chem. 1963, 67, 112.
(241) Kobayashi, Y.; Kumadaki, I. J. Chem. Soc., Perkin Trans. 1 1980, 3.
(242) Homer, J.; Thomas, L. F. J. Chem. Soc. 1966, 141.

BIOGRAPHICAL SKETCH

Duane E. Williams was born in Nassau, Bahamas, in 1980. He attended Richard Montgomery High School in Rockville, Maryland, and graduated from the International Baccalaureate program. He then obtained a Bachelor of Science degree in chemistry with a minor in biology from Randolph-Macon College in Ashland, Virginia. As an undergraduate, he carried out research projects in organic and inorganic synthesis. Duane then enrolled in the Ph.D. program at Penn State University as a Bunton Waller Scholar, where he studied quantum mechanics applied to biological systems. After completing all of the qualifying requirements at Penn State, Duane moved to the Quantum Theory Project at the University of Florida to continue his dissertation research with Professor K. M. Merz, Jr.