
Infrared Spectroscopy of Peptide Fragment Structures

Permanent Link: http://ufdc.ufl.edu/UFE0042229/00001

Material Information

Title: Infrared Spectroscopy of Peptide Fragment Structures Parameterization, Conformational Searching and Frequency Calculation
Physical Description: 1 online resource (63 p.)
Language: english
Creator: Yu, Long
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

Subjects

Subjects / Keywords: conformational, infrared, parameterization, peptide
Chemistry -- Dissertations, Academic -- UF
Genre: Chemistry thesis, M.S.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: The structures of b2 and b5 product ions from oligoglycine peptide fragmentation by collision-induced dissociation (CID) are studied through comparison between experimental infrared multiple-photon dissociation (IRMPD) spectra, recorded using the Free Electron Laser for Infrared eXperiments (FELIX), and the theoretical results presented here. As fragmentation of protonated peptides is mediated by nucleophilic attacks, three distinct chemical structures are considered: oxazolone structures protonated on the N-terminus, oxazolone structures protonated on the oxazolone ring N, and macrocycle structures. Conformational searches for these chemical structures are first performed with molecular dynamics simulations, followed by geometry optimizations at various levels of density functional theory (DFT), with selection and refinement based on the electronic and zero-point-energy (ZPE) corrected energies. The structures are analyzed for trans amide bonds, and structural redundancy is eliminated based on root-mean-square deviation (RMSD) analysis. IR linear absorption spectra calculated at the B3LYP/6-31g** level for each of the chemical structures allow an interpretation of the experimental vibrational spectra from FELIX. This shows that b2 exclusively forms oxazolone structures, the majority of which are protonated on the oxazolone ring N. Conversely, for b5 a mixture of oxazolone and macrocycle structures is confirmed. A detailed analysis is presented to determine 1) the effect of the level of theory on geometry optimization, 2) the effect of temperature on the molecular dynamics results, 3) the difference in predicted IR spectra between chemical families, 4) the difference in predicted IR spectra within a chemical family, and 5) the validity of employing RMSD thresholds to eliminate structural redundancies. The main conclusions from this study are that representative IR spectra can be obtained using this approach, but that further improvements are necessary to decrease the overall computational cost.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Long Yu.
Thesis: Thesis (M.S.)--University of Florida, 2010.
Local: Adviser: Polfer, Nicolas Camille.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0042229:00001



Full Text

INFRARED SPECTROSCOPY OF PEPTIDE FRAGMENT STRUCTURES: PARAMETERIZATION, CONFORMATIONAL SEARCHING AND FREQUENCY CALCULATION

By

LONG YU

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2010

© 2010 Long Yu

To my mom, my family and all my dear friends

ACKNOWLEDGMENTS

I gratefully treasure the opportunity to work with Dr. Nicolas Polfer, and to benefit from his long-standing and generous guidance throughout my graduate study and research. I [...] research group and his generosity in granting me access to his Fourier transform ion cyclotron resonance mass spectrometer, which started my experience in mass spectrometry experiments. I thank Dr. Nicolo Omenetto and Dr. Adrian E. Roitberg for the precious time and energy they spent serving on my committee.

I would like to thank Yue Yang, Yilin Meng and Xiao He in the Quantum Theory Project of the chemistry department for kindly sharing their knowledge and experience, which greatly helped my understanding of molecular dynamics and ab initio calculations. I specially thank Xinyu Miao, my undergraduate roommate and one of my best friends, whose timely directions helped me overcome many obstacles I encountered during my methodology development. I thank my fellow alumna and group mate Xian Chen for her tutorials in the experimental techniques and her considerate care during my graduate study. I appreciate the cooperation with Warren Mino in exploring AMBER and Gaussian functions, which was of great aid to my calculations and data analysis. The Polfer and the Eyler research groups have been a great support to me, and this research could not have been accomplished without them.

I thank my fellow alumnus and long-time friend Ou Chen for his ungrudging accommodation in every aspect of my living and studying since my arrival in the United States, and for his crucial help during my application to the University of Florida, without which my entrance into the Ph.D. program would not have been practical.

I thank all my friends and relatives for your support of all kinds. Without you, life would be pure misery.

I thank my mother for her lifetime devotion to my education, and for her deep sacrifice of time and energy, which led to a huge compromise in her otherwise even more successful business career. I will try my best to be her biggest pride.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
2 EXPERIMENTAL BACKGROUNDS AND TECHNIQUES
  2.1 IRMPD
  2.2 FT-ICR Mass Spectrometer
  2.3 Free Electron Laser for Infrared Experiments (FELIX)
  2.4 IR Spectra: Experimental vs. Computational
3 COMPUTATIONAL METHODS
  3.1 Overview
    3.1.1 Peptide Fragments of Interest
    3.1.2 Computational Procedure
  3.2 HyperChem
    3.2.1 Overview
    3.2.2 Visualized Peptide Builder
    3.2.3 Preliminary Geometry Optimization at Semi-empirical Level
  3.3 AMBER: Assisted Model Building with Energy Refinement
    3.3.1 Overview
    3.3.2 Antechamber
    3.3.3 Annealing
  3.4 Gaussian
    3.4.1 Overview
    3.4.2 ESP Calculation
    3.4.3 DFT Calculations
      3.4.3.1 Geometry optimization
      3.4.3.2 Frequency calculation
  3.5 In-House Programs and Scripts
    3.5.1 Overview
    3.5.2 PDB Converters and Input File Generators
    3.5.3 Semi-automatic Analyzers and Filters
      3.5.3.1 Overview
      3.5.3.2 Cis-trans filters
      3.5.3.3 RMSD calculators and analyzers
      3.5.3.4 AMBER/Gaussian energy analyzers
      3.5.3.5 IR spectra grabbers and analyzers
4 RESULTS AND DISCUSSIONS
  4.1 Overview
  4.2 Testing of Procedures
    4.2.1 Energy Comparison
      4.2.1.1 AMBER vs. B3LYP/3-21g DFT single point
      4.2.1.2 AMBER vs. 3-21g and 3-21g vs. 6-31g* optimization
      4.2.1.3 Energy comparison between 6-31g* and 6-31g** optimized structures
    4.2.2 Influence of Annealing Temperature
    4.2.3 Cis-trans Structure Differentiator
  4.3 Final Results Interpretation and Further Methodology Development
    4.3.1 Comparison of IR Spectra between Experimental Results and Energetically Best Results
    4.3.2 Distribution of IR Spectra Peaks in Diagnostic Frequency Regions for B5 Ions
    4.3.3 RMSD-based Redundancy Reduction
    4.3.4 Family Sorting or Structure Clustering
5 CONCLUSIONS

LIST OF REFERENCES
BIOGRAPHICAL SKETCH

LIST OF FIGURES

1-1 Mechanistic scheme showing oxazolone b fragment formation, followed by cyclization into a macrocycle and loss of sequence information for the peptide.

2-1 Impression of the IR-MPD mechanism in polyatomic molecules, showing how the energy pumped into a specific vibrational mode is quickly redistributed over the bath of background states by virtue of IVR. The molecule can thus sequentially absorb many photons on the same transition, while the energy is stored in the bath. Once the internal energy reaches the dissociation threshold (red mark), the molecule can undergo unimolecular dissociation (19).

2-2 The geometry of an open, cylindrical ICR cell. The ions are trapped axially by applying a small DC voltage (Utrap) to each capping cylinder. The cyclotron motion of a trapped ion is indicated. Figure is adapted from Ref. 7.

3-1 B2 structures of interest: (left to right) b2g3_cyc, b2g3_ox_n, b2g3_ox_ox

3-2 B5 ions: (left to right) b5G8_cyc, b5G8_ox_n, b5G8_ox_ox

3-3 Schematic view of the calculation procedure

4-1 (a) B3LYP/3-21G single point vs. AMBER calculated energy for b5g8_cyc. (b) B3LYP/3-21G single point vs. AMBER calculated energy for b5g8_ox_n. (c) B3LYP/3-21G single point vs. AMBER calculated energy for b5g8_ox_ox.

4-2 (a) Energy comparisons for bottom: 3-21g optimized vs. AMBER, and top: 6-31g* optimized vs. 3-21g optimized for b5g8_cyc. (b) Energy comparisons for AMBER and 3-21g optimized and 3-21g optimized vs. 6-31g* optimized for b5g8_ox_n. (c) Energy comparisons for AMBER and 3-21g optimized and 3-21g optimized vs. 6-31g* optimized for b5g8_ox_ox.

4-3 (a) Electronic energy comparison between 6-31g* and 6-31g** results for b5g8_cyc. (b) Electronic energy comparison between 6-31g* and 6-31g** results for b5g8_ox_n. (c) Electronic energy comparison between 6-31g* and 6-31g** results for b5g8_ox_ox.

4-4 Distribution of electronic energies for structures initially generated with annealing temperatures of 300 K and 500 K for b5g8_cyc (left) and b5g8_ox_n (right)

4-5 IR-MPD spectrum of the b2-G3 fragment generated from Gly-Gly-Gly, compared to computed spectra for (A) diketopiperazine structure protonated on a carbonyl O, (B) oxazolone structure protonated on the oxazolone ring N, and (C) oxazolone structure protonated on the N-terminus. Figure is modified from reference (22).

4-6 Mid-IR MPD spectrum of b5-G8 (generated from octaglycine), compared to the lowest-energy conformers for the various chemical structures: (A) macrocycle structure protonated on backbone carbonyl, (B) oxazolone structure protonated on N-terminus, and (C) oxazolone structure protonated on oxazolone ring N. The relative energies to the lowest conformer are indicated. The chemically diagnostic bands are labeled. Figure is modified from reference (22).

4-7 Comparison of mid-IR MPD spectra of b2-G3, b5-G8 and b8-G8

4-8 Distribution of peaks in the 1760 to 1850 cm^-1 region for b5g8_ox_n, with the experimental IR spectrum as background.

4-9 Distribution of peaks in the 1880 to 2000 cm^-1 region for b5g8_ox_ox, with the experimental IR spectrum as background.

4-10 IR spectra from 6-31g* DFT calculation for 300003 and 300043 structures

4-11 Correlation between energy deviation and RMS deviation for any possible pair of candidate structures for b5g8_ox_n. 96 pairs of structures have RMSD values less than 0.8 Angstrom.

4-12 Correlation between frequency deviation and RMSD for b5g8_ox_n structures. 94 pairs of structures have RMSD values less than 0.8 Angstrom.

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

INFRARED SPECTROSCOPY OF PEPTIDE FRAGMENT STRUCTURES: PARAMETERIZATION, CONFORMATIONAL SEARCHING AND FREQUENCY CALCULATION

By

Long Yu

December 2010

Chair: Nicolas Polfer
Major: Chemistry

The structures of b2 and b5 product ions from oligoglycine peptide fragmentation by collision-induced dissociation (CID) are studied through comparison between experimental infrared multiple-photon dissociation (IRMPD) spectra, recorded using the Free Electron Laser for Infrared eXperiments (FELIX), and the theoretical results presented here. As fragmentation of protonated peptides is mediated by nucleophilic attacks, three distinct chemical structures are considered: oxazolone structures protonated on the N-terminus, oxazolone structures protonated on the oxazolone ring N, and macrocycle structures. Conformational searches for these chemical structures are first performed with molecular dynamics simulations, followed by geometry optimizations at various levels of density functional theory (DFT), with selection and refinement based on the electronic and zero-point-energy (ZPE) corrected energies. The structures are analyzed for trans amide bonds, and structural redundancy is eliminated based on root-mean-square deviation (RMSD) analysis. IR linear absorption spectra calculated at the B3LYP/6-31g** level for each of the chemical structures allow an interpretation of the experimental vibrational spectra from FELIX. This shows that b2 exclusively forms oxazolone structures, the majority of which are protonated on the oxazolone ring N. Conversely, for b5 a mixture of oxazolone and macrocycle structures is confirmed.

A detailed analysis is presented to determine 1) the effect of the level of theory on geometry optimization, 2) the effect of temperature on the molecular dynamics results, 3) the difference in predicted IR spectra between chemical families, 4) the difference in predicted IR spectra within a chemical family, and 5) the validity of employing RMSD thresholds to eliminate structural redundancies. The main conclusions from this study are that representative IR spectra can be obtained using this approach, but that further improvements are necessary to decrease the overall computational cost.
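The RMSD-based elimination of structural redundancy mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that conformers are supplied as equally ordered lists of Cartesian coordinates that have already been superimposed; the function names, the greedy lowest-energy-first strategy and the toy coordinates are illustrative assumptions and are not the in-house scripts of this thesis. The 0.8 Angstrom default is the value quoted in the captions of Figures 4-11 and 4-12.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD (Angstrom) between two equally ordered, pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))

def remove_redundant(structures, threshold=0.8):
    """Greedy filter over (name, energy, coords) tuples.

    Candidates are visited in order of increasing energy, and a candidate is
    kept only if its RMSD to every structure kept so far is at least
    `threshold`. The lowest-energy member of each group of near-identical
    conformers therefore survives. (Assumed strategy, for illustration.)
    """
    kept = []
    for candidate in sorted(structures, key=lambda s: s[1]):
        if all(rmsd(candidate[2], k[2]) >= threshold for k in kept):
            kept.append(candidate)
    return kept

# Toy three-atom "conformers" with hypothetical coordinates and energies.
conf_a = ("a", 0.0, [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)])
conf_b = ("b", 0.2, [(0.0, 0.1, 0.0), (1.5, 0.1, 0.0), (1.5, 1.6, 0.0)])  # near copy of a
conf_c = ("c", 0.5, [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)])  # different shape

unique = remove_redundant([conf_c, conf_b, conf_a])
print([name for name, _, _ in unique])  # prints ['a', 'c']
```

A production filter would first superimpose each pair of structures (for example with a Kabsch alignment) before computing the RMSD; that step is omitted here to keep the sketch dependency-free.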

CHAPTER 1
INTRODUCTION

In proteomics, analytical protein identification tools are essential (1-5). Mass spectrometry has taken the place of the Edman degradation method as the predominant method for protein identification, or peptide sequencing, owing to its much better sensitivity, speed and high tolerance to mixtures. In the bottom-up approach (1, 6), peptides are mass analyzed following enzymatic digestion of the protein. The peptide mixtures (7) are separated using various techniques, such as high-performance liquid chromatography (HPLC), and then ionized, usually by electrospray ionization (ESI) (8) or matrix-assisted laser desorption/ionization (MALDI) (9). The peptide ions can then be mass analyzed by various mass spectrometric methods. Tandem mass spectrometry methods, or MS-MS, involve the dissociation of peptides to derive their sequence (7, 10).

The basic premise in peptide sequencing involves a comparison between the sequence information from tandem mass spectra and DNA/protein databases. Typically, only a portion of the fragment ions is recognized and used in the interpretation of mass spectra, and many of the peaks (ions) in tandem mass spectra are regularly not identified. Failure to identify these product ions is mainly due to a lack of understanding of the chemistry behind their generation.

In the most widespread approach for peptide dissociation, namely collision-induced dissociation (CID), protonated peptide cations are collided with an inert gas. CID typically yields abundant cleavage at amide backbone bonds, yielding so-called b and y ions (6, 11). The b ions are thought to be made via a nucleophilic attack from an adjacent carbonyl O, as shown in Figure 1-1. However, as reported by Paizs et al. (12), it is possible that these b ions rearrange to form a cyclic peptide (macrocycle) structure. Re-opening of the macrocycle at a different amide bond than the one where it was put together leads to oxazolone structures with permutated sequences, and thus to a loss of primary structure information.

Unfortunately, conventional mass spectrometry cannot give direct structural information on these fragment structures, and other experimental and computational approaches are required to obtain a deeper understanding. Additional structural information can be obtained from techniques such as hydrogen-deuterium exchange (HDX) (13, 14), ion mobility (15, 16), isotope labeling (17) and infrared multiple-photon dissociation (IRMPD) spectroscopy (18, 19). Of these, IRMPD spectroscopy is particularly useful, as certain diagnostic vibrations yield direct information on the presence of chemical groups. This technique has also provided direct evidence for oxazolone and macrocycle structures, based on diagnostic vibrations (20, 21).

To interpret the experimental IR spectra, comparable theoretical IR spectra are also needed for the proposed candidate fragment ion structures. Obtaining them involves several approaches, from molecular dynamics to density functional theory methods. Previous studies have shown that this approach leads to detailed information on b ions (22).

In this thesis we focus on a discussion of the computational approaches in light of a comparison to IRMPD experiments. The peptides of interest are b2 and b5 fragments generated from oligoglycine peptides. Both oxazolone and macrocycle structures are considered. Different protonation sites are also taken into consideration, and the corresponding fragments are calculated. Conformational searches for each isomer are performed with molecular dynamics simulations to generate candidate structures, which are then optimized using density functional theory methods. The final structures, optimized with the B3LYP/6-31g** DFT method, are presented with their calculated IR spectra compared with experimental results. The diagnostic frequencies in each IR spectrum are utilized for structure determination.

Figure 1-1. Mechanistic scheme showing oxazolone b fragment formation, followed by cyclization into a macrocycle and loss of sequence information for the peptide.

CHAPTER 2
EXPERIMENTAL BACKGROUNDS AND TECHNIQUES

The experimental techniques are described only briefly, as the work described here is purely computational in nature. For the infrared multiple-photon dissociation (IRMPD) experiments, however, it is important to point out the differences between experiment and theory.

2.1 IRMPD

The dissociation of mass-selected ions with line-tunable CO2 lasers was first reported in the late 1970s (23). Because of the very limited tuning ranges of CO2 lasers, IRMPD spectroscopy of trapped ions only became widely useful with the emergence of high-power, widely tunable free electron lasers (FELs). The earlier development of soft ionization techniques for biomolecules, such as electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI), opened up novel avenues for experiments.

Unlike in single-photon dissociation, ions in IRMPD absorb multiple photons in a sequential manner, during which the internal energy grows gradually until the dissociation energy threshold is reached. For most polyatomic molecules, reaching the dissociation threshold typically demands the absorption of several tens of infrared photons. This makes absorption of photons within the same vibrational ladder, also referred to as coherent multiphoton excitation, impractical. Owing to the anharmonic nature of molecular vibrations, the energy difference between levels decreases as one climbs up the ladder, which precludes ladder climbing at a particular frequency. This phenomenon is commonly referred to as the anharmonicity bottleneck. Instead, in IRMPD the absorbed energy is quickly dissipated to the bath of background vibrational states of the molecule by intramolecular vibrational redistribution (IVR) (18). In large molecules with sufficiently high densities of states, the IVR process rapidly removes the population from the excited state into the background states, and the molecule is thus ready for the next single-photon absorption event. This process is commonly referred to as incoherent photon absorption, as successive photon absorption events are not related to one another. A cartoon representation of the IRMPD process is shown in Figure 2-1 (18, 19). Fundamental aspects and applications of IRMPD spectroscopy have been reviewed recently (18).

2.2 FT-ICR Mass Spectrometer

Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometers offer the advantages of high mass resolution and accuracy, as well as an ultra-high vacuum environment (i.e., < 10^-8 Torr). The latter aspect is important in IRMPD experiments, as de-excitation of the ions by collisions is minimized. FT-ICR is based on the principle of measuring cyclotron frequencies in a fixed magnetic field. The ions in the Penning trap are first excited by an oscillating electric field perpendicular to the magnetic field, which increases their cyclotron radius and brings the ions into a coherent phase. The frequency of the cyclotron motion is then measured on two opposing plates by induced-current detection circuitry. The superposition of multiple sine wave components can be deconvoluted by Fourier transform analysis. Finally, the cyclotron frequency is related to m/z by the cyclotron equation:

\[ f_c = \frac{qB}{2\pi m} \]

where \(f_c\) is the cyclotron frequency, \(B\) is the magnetic field strength, \(q\) is the ion charge, and \(m\) is the ion mass. The equation is usually given in terms of angular frequency:

\[ \omega_c = \frac{qB}{m} \]

while the angular frequency is defined as:

\[ \omega_c = 2\pi f_c \]

Figure 2-2 (24) is a schematic representation of an open ICR cell. An FT-ICR mass spectrometer differs from other mass analyzers in several aspects. First, unlike other analyzers, whose detection relies on contact between sensors and ions, FT-ICR MS (Fig. 3) only needs the ions of interest to be close to the detection plates. Second, ions are resolved not in space or time but solely by their cyclotron motion, so all ions in the ICR cell can be detected simultaneously rather than at different places or times. Also, by using a superconducting magnet for the magnetic field, FT-ICR MS can provide unparalleled mass resolution, which makes it even more competitive when large biomolecules are of interest.

2.3 Free Electron Laser for Infrared Experiments (FELIX)

For IRMPD experiments, the absorption of many (tens to hundreds of) photons is needed to overcome the dissociation thresholds, which in turn requires a powerful laser source. The emergence of free electron lasers, with both high laser power and a wide range of tunable wavelengths, truly enables a full spectroscopic analysis in IRMPD experiments.

PAGE 18

18 FELs use a relativistic electron beam as the lasing medium which moves freely through a magnetic structure, hence the term free electron. To create a free electron laser, electrons are accelerated to a relativistic speed (i.e., near the speed of light), and pass through the FEL oscillator, which consists of a transverse ma gnetic field. The electron undergoing such acceleration will release a the emitted light, and both fields add coherently. Unlike conventional undulators which let electrons to radiate individually, in an FEL the instabilities of electron beam and the radiation they emit bunches, and continue to radiate in phase with each other, resulting in higher laser intensity. The wavelength of FEL can be conveniently tuned by adjusting the electron beam energy and magnetic field strength. The IRMPD experiments shown here were performed with the Free Electron Laser for Infrared eXperiments (FELIX) laser located at the FOM Institute for Plasma Physics ecifications of the FELIX light source are as follows 2.4 IR S pectra: E xperimental vs. C omputational Photodissociation of a mass selected ion is detected by the depletion of the precursor ion and the appearance of fragment ions. The IR spectrum is obtaine d by plotting the IRMPD yield versus irradiating wavelengths, where the yield is given by yield = ln[1 ( photofragments/ all_ions)]. Such spectra are then compared to their theoretical counterparts of proposed peptide fragment structures, calculated using DFT method. Note that such calculations overlook the anharmonicity of vibrational modes for

the molecule, and thus a comparison of experimental results to theory gives rise to certain well-known effects: the IRMPD bands are red-shifted and broadened. Further, the IRMPD intensities deviate from linear absorption cross sections (29).

Figure 2-1. Impression of the IRMPD mechanism in polyatomic molecules, showing how the energy pumped into a specific vibrational mode is quickly redistributed over the bath of background states by virtue of IVR. The molecule can thus sequentially absorb many photons on the same transition, while the energy is stored in the bath. Once the internal energy reaches the dissociation threshold (red mark), the molecule can undergo unimolecular dissociation (19).
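
The IRMPD yield defined in Section 2.4 can be illustrated with a minimal sketch. The function name `irmpd_yield` and its arguments are hypothetical and are not taken from the in-house scripts; the sketch only assumes that the integrated ion abundances of the photofragments and of the remaining precursor are available for each wavelength step.

```python
import math


def irmpd_yield(photofragments, precursor):
    """Return -ln[1 - sum(photofragments) / sum(all ions)] for one wavelength step.

    photofragments: iterable of fragment ion abundances (hypothetical input)
    precursor: abundance of the remaining precursor ion (hypothetical input)
    """
    fragment_sum = sum(photofragments)
    all_ions = fragment_sum + precursor
    if all_ions <= 0:
        raise ValueError("no ion signal at this wavelength step")
    fraction = fragment_sum / all_ions
    if fraction >= 1.0:
        # Complete depletion of the precursor: the logarithm diverges.
        return math.inf
    return -math.log(1.0 - fraction)


# Example: half of the ion signal appears as photofragments.
half_depleted = irmpd_yield([30.0, 20.0], 50.0)
```

The logarithmic form gives a zero yield when no fragments are observed and grows without bound as the precursor is fully depleted.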

Figure 2-2. The geometry of an open, cylindrical ICR cell. The ions are trapped axially by applying a small DC voltage (Utrap) to each capping cylinder. The cyclotron motion of a trapped ion is indicated. Figure is adapted from Ref. 7.

CHAPTER 3
COMPUTATIONAL METHODS

3.1 Overview

In this chapter the candidate structures for the b2 and b5 oligoglycine fragment ions are explored. The computational approach involves commercially available software, as well as scripts developed in house to facilitate data flow and analysis.

3.1.1 Peptide Fragments of Interest

For the b2 fragment generated from protonated triglycine, two chemical structures are possible, namely the cyclic diketopiperazine and the oxazolone structure. For the oxazolone structure, protonation at both the amino terminus and the oxazolone ring N is considered. The three isomeric structures are shown in Figure 3-1. For the sake of convenience, the following nomenclature is used: b2g3_cyc denotes the cyclic b2 structure generated from triglycine (GGG). The oxazolone structure protonated at the N terminus is labeled b2g3_ox_n, whereas b2g3_ox_ox denotes the oxazolone structure protonated at the C-terminal oxazolone ring. Similarly, for the larger b5 fragments generated from protonated octaglycine, three distinct chemical structures are considered: b5g8_cyc, b5g8_ox_n and b5g8_ox_ox, as shown in Figure 3-2.

3.1.2 Computational Procedure

In order to obtain reliable frequency spectra from quantum chemical calculations, low-energy candidate structures must be found on the potential energy surface (PES). This requires a thorough exploration of the conformational space for each of these [...] task, as the computational cost is minimized vis-à-vis quantum chemical calculations.

Thus, a combination of force field and quantum chemical calculations is necessary to produce theoretical infrared spectra that serve as a comparison to the experimental results. In this project, the AMBER (Assisted Model Building with Energy Refinement) force field model is employed, as it was specifically developed to model the conformational space of peptides and proteins. While peptide CID product ions are not directly parameterized in AMBER, a parameterization procedure is available for these structures. This procedure involves manually constructing the chemical structures in HyperChem (Gainesville, FL), as explained in detail below. The conformational search in AMBER is carried out by simulated annealing cycles. The resulting conformations are then further optimized with density functional theory (DFT) approaches in the Gaussian03 software package, and harmonic frequency calculations are carried out. All calculations involving AMBER and Gaussian03 are performed in the Unix/Linux environment at the High Performance Computing Center at the University of Florida. Given the large number of conformations and the range of software packages employed, the data flow is managed by in-house scripts and programs. A schematic flow chart of the computational procedure is presented in Figure 3-3 and is explained in detail in the sections below.

3.2 HyperChem

3.2.1 Overview

HyperChem™ is commercial software developed by HyperCube, Inc., a company headquartered in Gainesville, Florida. It is a powerful tool for visually building molecules with a ready-to-use amino acid database, and provides a variety of calculation tools, ranging from molecular dynamics to semi-empirical to ab initio methods.

[...] machine software, which lacks the capability of parallel computing for geometry optimization. Consequently, it is not suitable for large-scale calculations that involve thousands of multi-hour or multi-day jobs.

3.2.2 Visualized Peptide Builder

HyperChem comes with a visual molecule editing interface, which allows convenient construction of bare-bones structures using the built-in amino acid database. These molecules can be readily modified by adjusting the charge state, intramolecular distances and dihedral angles, and by adding chemical moieties (e.g. an oxazolone ring) through bond formation. Each chemical structure is built separately in HyperChem by the procedure described above.

3.2.3 Preliminary Geometry Optimization at Semi-empirical Level

HyperChem has limited capability in geometry optimization, for the reasons mentioned above. The freshly built seed structures are optimized by the semi-empirical method (AM1) implemented in HyperChem. The Cartesian coordinates of the different chemical structures are then employed in the subsequent calculations.

3.3 AMBER: Assisted Model Building with Energy Refinement

3.3.1 Overview

Originally developed under the leadership of Peter Kollman, AMBER is now being developed in an active collaboration by groups at Rutgers University, the University of Utah, SUNY Stony Brook, UC Irvine, the University of Florida and Encysive Pharmaceuticals (25). Today it is one of the most powerful molecular dynamics simulation packages available to the public. The AMBER software contains two parts: a set of molecular mechanical force fields for the simulation of biomolecules (which are in the public

domain, and are used in a variety of simulation programs); and a package of molecular simulation programs which includes source code and demos. In principle, AMBER treats all atoms in a molecule or ion as point charges when simulating electrostatic interactions and hydrogen bonding, and also takes into account bond distances, angles, torsions, etc., to define force constants. The molecular dynamics simulation is based on interactions between atoms obeying rules derived from the restrained electrostatic potential.

3.3.2 Antechamber

Within the software, molecular dynamics (MD) parameters, such as force constants for bond distances, torsional and dihedral angles of existing chemical structures, are stored. When a molecule is imported into AMBER, each atom is assigned an atom type with predefined MD parameters. However, molecules with novel chemical moieties (e.g. an oxazolone ring) are not recognized in AMBER, and hence a parameterization procedure via the Antechamber program is necessary. Before the Antechamber procedure, the seed structures from HyperChem, already optimized at the semi-empirical level (AM1), are optimized further with B3LYP/6-31g* to achieve adequate structures, and the electrostatic potential (ESP) distributions on and around the ions are then calculated using the Hartree-Fock (HF) method. Based on this ESP information, Antechamber first performs a restrained ESP fit, and then determines all of the MD parameters required to perform an MD simulation from the RESP (restrained electrostatic potential) procedure (28). The actual implementation of the above procedure involves many steps, and hence a Linux/Unix script was written to automate this operation. This script has been tested and applied numerous times, and its robustness and efficiency are well demonstrated. The MD force constants and point charges for the molecule are stored in a topology file (file extension .prm), which

remains unchanged throughout AMBER simulations. Conversely, the Cartesian coordinates of the molecule are stored in a coordinate file (file extension .crd).

3.3.3 Annealing

AMBER provides various MD simulation methods, based on a change in internal [...] such an approach, the starting structure is first heated to a specified temperature (here: [...] step-wise fashion (e.g. 10 steps). In other words, the temperature is lowered by a fixed step and held at this temperature for a particular time interval. The temperature is then lowered again, before finally reaching 0 K. The minimized structure is saved and is treated as the starting structure for the next annealing cycle. After a specific number of loops (100 or 300 in our case), the simulation comes to an end, generating as many candidate structures as annealing cycles employed. It is expected that a multitude of conformers is obtained in this way, which is the purpose of a conformational search. An example AMBER run script is attached at the end of this thesis. The generated candidate structures are then optimized by DFT methods.

3.4 Gaussian

3.4.1 Overview

Gaussian™ is one of the most widely used and powerful software packages for ab initio calculations and electronic structure modeling. In these calculations, Gaussian is used for ESP calculations, geometry optimizations and vibrational IR spectra calculations.

3.4.2 ESP Calculation

The necessity of electrostatic potential calculations to parameterize b fragment structures for AMBER has already been discussed. In Gaussian, the keyword line #HF/6-31G* pop=mk IOp(6/33=2) is used to perform such calculations with the Hartree-Fock method and a 6-31g* basis set: HF/6-31G* gives the level of theory and basis set, pop=mk requests Merz-Singh-Kollman charges for the molecule, and IOp(6/33=2) prints the potentials. In the Gaussian output file, charges for each individual atom are provided, and the ESP distribution is given in the form of potential values at a large group of sample points around the molecule/ion.

3.4.3 DFT Calculations

Primarily because of computational cost, optimizations are first carried out with the smaller 3-21g basis set, before moving to 6-31g* and finally 6-31g**. The B3LYP (Becke three-parameter, Lee-Yang-Parr) functional is employed in these calculations. The final IR spectra are calculated at B3LYP/6-31g**. The theory of DFT (density functional theory) has been thoroughly discussed elsewhere (30) [...] quantum mechanics to describe the electronic structure of many-body systems using functionals of the electron density. Current DFT is built on the framework of the Kohn-Sham system (31), within which the many-body system of interacting electrons in an external potential is converted into a system of non-interacting electrons in an effective potential. The total energy of such a system is given by a functional of the electron density:

$$E[\rho] = T_s[\rho] + \int v_{ext}(\mathbf{r})\,\rho(\mathbf{r})\,d\mathbf{r} + V_H[\rho] + E_{xc}[\rho]$$

where $T_s$ is the Kohn-Sham kinetic energy, expressed in terms of the Kohn-Sham orbitals $\varphi_i$:

$$T_s[\rho] = \sum_{i=1}^{N} \int \varphi_i^{*}(\mathbf{r}) \left(-\frac{\hbar^2}{2m}\nabla^2\right) \varphi_i(\mathbf{r})\,d\mathbf{r}$$

$v_{ext}$ is the external potential acting on the interacting system, $E_{xc}$ is the exchange-correlation energy and $V_H$ is the Coulomb energy:

$$V_H[\rho] = \frac{e^2}{2} \int\!\!\int \frac{\rho(\mathbf{r})\,\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}'$$

3.4.3.1 Geometry optimization

The cost of b2 ion calculations is relatively low, with the longest recorded running time for a 6-31g** DFT optimization below 5 hours. In the case of b5 ions, the cost is much higher, as a single optimization can take up to 2 days on a single processor with 1 GB of memory on the HPC server.

3.4.3.2 Frequency calculation

Frequency calculations are performed using 6-31g**. The Gaussian keyword line is: #B3LYP/6-31G** freq test. The calculated frequencies need to be multiplied by a scaling factor before comparisons can be made with experimental results.

3.5 In-House Programs and Scripts

3.5.1 Overview

Given the different software packages involved and the large number (100 to 300) of candidate structures, scripts are required to automate some of the tedious tasks of converting file formats and managing the data flow. Moreover, the volume of results makes manual interpretation and analysis impractical. The programs described below are written in house as Linux/Unix shell scripts and C/C++/Fortran programs, to make the above tasks either fully or at least semi-automated.

3.5.2 PDB Converters and Input File Generators

One task is bridging between different software formats, e.g., conversion from HyperChem output to Gaussian input, Gaussian output to AMBER input, AMBER output to Gaussian input, as well as output/input conversion between the various DFT basis set levels. The standardized form and wide applicability of the PDB (Protein Data Bank) file format make it an attractive intermediate for converting formats. A series of one-way and two-way converters between PDB and other formats was developed here, so that virtually every pair of file formats is connected through a PDB intermediate.

3.5.3 Semi-automatic Analyzers and Filters

3.5.3.1 Overview

To expedite the submission of large numbers of jobs, several batch job submission scripts were written. Within a batch of submitted jobs, the outcomes can vary. An analysis script automatically recognizes the various outcomes, such as normal termination, unfinished jobs, or error-bearing jobs, and sorts them accordingly. The unfinished jobs are then recovered from the Gaussian checkpoint files, and the error-bearing jobs are individually checked and resolved. The large number of conformations also makes it difficult to inspect the results manually, requiring automated data analysis tools. Such analysis includes structure validation, energy extraction and ranking, in particular of zero-point corrected and Gibbs free energy corrected energies. Another important aspect of the efficient use of computational resources is redundancy reduction for structure calculations. Thanks to our in-house codes and scripts, many of these tasks are now largely automated.
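
The outcome sorting described in Section 3.5.3.1 can be sketched as follows. This is an illustration and not the in-house analysis script: the function names and the `*.log` file pattern are assumptions, and the sketch relies only on the "Normal termination of Gaussian" and "Error termination" messages that Gaussian writes to its log files.

```python
from pathlib import Path


def classify_gaussian_log(text):
    """Classify one Gaussian log as 'normal', 'error' or 'unfinished'."""
    if "Error termination" in text:
        return "error"
    non_empty = [line for line in text.splitlines() if line.strip()]
    # A completed job ends with the normal-termination message.
    if non_empty and "Normal termination of Gaussian" in non_empty[-1]:
        return "normal"
    return "unfinished"


def sort_jobs(log_dir, pattern="*.log"):
    """Group the log files in log_dir by outcome (hypothetical helper)."""
    groups = {"normal": [], "unfinished": [], "error": []}
    for path in sorted(Path(log_dir).glob(pattern)):
        text = path.read_text(errors="replace")
        groups[classify_gaussian_log(text)].append(path.name)
    return groups
```

Checking only the last non-empty line for normal termination means that a multi-step job interrupted after its first step is still counted as unfinished and can be restarted from its checkpoint file.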

3.5.3.2 Cis-trans filters

One obstacle en route to the final energetically selected structures is that the conformational search sometimes ends up with chemically unfavored or impractical candidate structures. One example concerns cis amide bonds, as opposed to the more favored trans amide bonds. The AMBER conformational search produces cis structures, which suggests that the cis-trans isomerization barrier is not parameterized correctly. A Fortran code is used to remove non-trans structures by calculating the dihedral angle of each amide bond. In detail, this is done by using an empirically selected threshold value for the cosine of the dihedral angle (see Chapter 4).

3.5.3.3 RMSD calculators and analyzers

The annealing procedure results in many similar conformers. This high redundancy requires sorting the structures into families to reduce the computational cost of the quantum chemistry calculations. In order to sort conformations into families of structures automatically, a family recognition code was developed. This code is based on root-mean-square deviation (RMSD) calculations between the Cartesian coordinates of geometries.

RMSD minimization

The RMSD between two structures is given by:

$$\mathrm{RMSD}(A,B) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{a}_i - \mathbf{b}_i \rVert^2}$$

where $A$ and $B$ are two structures of $N$ atoms each, and $\mathbf{a}_i$ / $\mathbf{b}_i$ is the coordinate value of the $i$-th atom in $A$ / $B$.
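
Returning briefly to the cis-trans filter of Section 3.5.3.2, the cosine test can be sketched as below. The actual filter is a Fortran code that is not reproduced here; the function names are hypothetical, the dihedral is assumed to be the O=C-N-H angle of each amide, and the default threshold is the cosine of the 138.59 degrees quoted in Chapter 4 (about -0.75, where -1 corresponds to a perfect trans amide).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_cosine(p0, p1, p2, p3):
    """Cosine of the dihedral angle p0-p1-p2-p3 (here O, C, N, H)."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane p1, p2, p3
    return _dot(n1, n2) / math.sqrt(_dot(n1, n1) * _dot(n2, n2))


def all_amides_trans(amides, threshold=-0.75):
    """True if every amide (O, C, N, H coordinates) has a cosine below threshold."""
    return all(dihedral_cosine(*atoms) < threshold for atoms in amides)


# Planar model amides: H opposite to O (trans) and on the same side as O (cis).
trans_amide = ((-0.5, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, -1.0, 0.0))
cis_amide = ((-0.5, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 1.0, 0.0))
```

Working with the cosine rather than the angle avoids the sign ambiguity at plus or minus 180 degrees, so a single inequality is enough to flag a structure that contains any non-trans amide.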

However, in order to compare two structures, the coordinates of one must be brought into maximum overlap with those of the other to obtain the minimum RMSD. The Kabsch algorithm (26) provides a mathematical solution to this challenge. Starting from a standardized C source code, a program was developed in house to calculate minimized RMSD values between all conformer pairs within a given chemical family. The results of this RMSD calculation are summarized in a triangular matrix with dimensions equal to the number of conformers.

Weighted RMSD vs. non-weighted RMSD

In the above RMSD approach, the deviations of hydrogen atoms and of nitrogen/oxygen/carbon atoms contribute equally, whereas the similarity in the positions of the heavy atoms is actually of higher priority. Hence the RMSD calculation was modified by multiplying the deviation of each atom by an atomic weight term and dividing the sum of the weighted deviations by the total molecular weight instead of the number of atoms. The program has an internal switch to control this added function. When the weighted option is turned off, it simply sets the weight of each atom to 1, which recovers the non-weighted RMSD calculation.

RMSD calculation: whole molecule vs. subset of molecule

In our calculation, the peptide fragments of interest are oligoglycine b2 and b5 fragments, built from the simplest amino acid residue. In future studies, peptides involving amino acid residues with more complicated side chains will be considered. The complexity of such molecules will adversely affect the usefulness of the RMSD value for the whole molecule. RMSDs of subsets of the molecule (e.g. the backbone), which exclude less relevant information (side chains), may therefore be more suitable. The in-house RMSD code allows specification of the subset of atoms that should be considered in an RMSD calculation.
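
The weighting switch and the atom subset option can be sketched as follows. This is not the in-house C program: the function name and arguments are hypothetical, and the sketch assumes that the two structures have already been superimposed (for example by the Kabsch algorithm), so only the weighted deviation itself is shown.

```python
import math


def weighted_rmsd(coords_a, coords_b, masses=None, subset=None):
    """RMSD between two pre-aligned structures.

    coords_a, coords_b: lists of (x, y, z) tuples in the same atom order
    masses: per-atom weights; None sets every weight to 1 (non-weighted RMSD)
    subset: indices of the atoms to include; None uses the whole molecule
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of atoms")
    n_atoms = len(coords_a)
    if masses is None:
        masses = [1.0] * n_atoms
    indices = range(n_atoms) if subset is None else subset

    weighted_sum = 0.0
    total_weight = 0.0
    for i in indices:
        squared_dev = sum((a - b) ** 2 for a, b in zip(coords_a[i], coords_b[i]))
        weighted_sum += masses[i] * squared_dev
        total_weight += masses[i]
    if total_weight == 0.0:
        raise ValueError("no atoms selected")
    return math.sqrt(weighted_sum / total_weight)


# Two-atom example: a carbon that does not move and a hydrogen displaced by 2.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
plain = weighted_rmsd(structure_a, structure_b)
by_mass = weighted_rmsd(structure_a, structure_b, masses=[12.0, 1.0])
```

In the example the mass-weighted value is smaller than the plain one, because the displaced atom is a light hydrogen; this is the behavior the weighting is meant to produce.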

RMSD analysis: redundancy reduction

With the tools to calculate the RMSD in place, the next question is how to make use of the RMSD matrix, in which the conformers within a chemical family are compared in a pairwise fashion. Since the RMSD value reflects the similarity between two conformers, this [...] arbitrary. A high threshold [...], whereas a low threshold value will not significantly reduce structural redundancy. Other parameters of merit that reflect the similarity in structure are the calculated energies and the computed infrared spectra. A correlation of these parameters with the RMSD value will hence serve as a useful indicator for the RMSD threshold value.

RMSD analysis: family recognition

The most efficient way to perform our conformational search and IR spectra calculation is to adopt a funnel-like procedure, in which a large number of candidate conformers are generated at the molecular dynamics level. Following sorting into families, only representatives of each family are submitted to DFT optimization. Also, during multi-level DFT calculations, further refinement can be performed from one level to the next to reduce computational cost. Family sorting, or structure clustering (27), has the following requirements. First, the generated conformers must all be represented by typical structures. Second, while this is fulfilled, the number of families or typical structures should be minimized. In (27) several typical clustering methods have been presented and compared, in either a top-down (i.e., start from the whole pool of structures and iterate the pool splitting until the desired number of clusters is obtained) or bottom-up (i.e., start from individual structures and iterate the

merging of clusters until the total number of clusters shrinks to the desired value) manner. However, most of these methods are not designed to cover all the conformers with representatives in a strict manner, and therefore do not readily meet our first requirement. Thus an in-house algorithm was designed especially for our application. Procedurally, a straight-down test calculation is performed; that is, all the generated conformers are optimized at the highest DFT level, and their IR spectra are also calculated. The correlations between all possible pairs of structures, such as the mass-weighted RMS deviation, electronic energy deviation and characteristic IR peak deviation, are analyzed to suggest a final set of criteria for such structure clustering. The actual results from this study are presented in the subsequent chapter.

3.5.3.4 AMBER/Gaussian energy analyzers

The energy results are of obvious importance in the refinement of structures. In order to conveniently obtain such energies from a large number of program outputs, several UNIX shell scripts were written to extract energies from AMBER and Gaussian output files, including the generation of ZPE or Gibbs free energy corrected total energies.

3.5.3.5 IR spectra grabbers and analyzers

Similarly, the computed infrared frequencies and intensities for each conformer are automatically extracted by scripts. These results are further analyzed by selectively extracting the frequencies of diagnostic vibrations. This serves to investigate the position of diagnostic peaks as a function of the conformation.
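
The extraction of diagnostic vibrations can be sketched as follows. This is an illustration rather than the in-house scripts, which are UNIX shell scripts: the function names are hypothetical, and the parser assumes the standard "Frequencies --" and "IR Inten --" lines of a Gaussian frequency job (not the high-precision mode listing). No scaling factor is applied here.

```python
def parse_ir_modes(log_text):
    """Return a list of (frequency, IR intensity) pairs from Gaussian log text."""
    frequencies, intensities = [], []
    for line in log_text.splitlines():
        label, separator, values = line.partition("--")
        if not separator:
            continue
        label = label.strip()
        if label == "Frequencies":
            frequencies.extend(float(v) for v in values.split())
        elif label == "IR Inten":
            intensities.extend(float(v) for v in values.split())
    return list(zip(frequencies, intensities))


def modes_in_window(modes, low, high):
    """Keep the modes whose frequency lies in the diagnostic window [low, high]."""
    return [(f, i) for f, i in modes if low <= f <= high]


def strongest_mode(modes, low, high):
    """Most intense mode in the window, or None if the window is empty."""
    window = modes_in_window(modes, low, high)
    return max(window, key=lambda mode: mode[1]) if window else None


# Made-up excerpt in the layout of a Gaussian frequency listing.
sample_log = (
    " Frequencies --   1432.1000              1785.5000              1958.2000\n"
    " Red. masses --      1.2000                 5.1000                 9.8000\n"
    " IR Inten    --     85.0000               410.0000               620.0000\n"
)
modes = parse_ir_modes(sample_log)
```

Applied to every conformer of a chemical structure with a fixed window, for example 1760 to 1820 in the units of the log file, `strongest_mode` gives the position of a diagnostic band as a function of conformation.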

Figure 3-1. b2 structures of interest (left to right): b2g3_cyc, b2g3_ox_n, b2g3_ox_ox.

Figure 3-2. b5 ions (left to right): b5G8_cyc, b5G8_ox_n, b5G8_ox_ox.

Figure 3-3. Schematic view of the calculation procedure.

CHAPTER 4
RESULTS AND DISCUSSIONS

4.1 Overview

The chapter is separated into two sections. In the first section, the procedures used here are systematically tested to provide an understanding of their shortcomings. For instance, the correct parameterization of the peptide fragment structures in AMBER is tested by comparing the AMBER energies to DFT energies. Other aspects of the calculations that are tested include the capability of AMBER to find adequate minimum structures and the dependence of finding low-energy structures on the annealing temperature. In the second section, the comparison between the experimental IR spectra and the calculated spectra of the energetically most favored structures is presented. The predicted frequencies of diagnostic bands are shown for conformers within a chemical structure. Note that since the b2 structures have very limited conformational freedom (only two or three different conformers within the same chemical structure), the first section focuses on the b5 ions.

4.2 Testing of Procedures

4.2.1 Energy Comparison

In order to test the accuracy of AMBER energies, as well as the validity of the parameterization procedure for the b5 chemical structures, AMBER energies are compared to DFT energies for conformations generated by the annealing procedure.

4.2.1.1 AMBER vs. B3LYP/3-21g DFT single point

In Figure 4-1 a-c, single-point DFT electronic energies (B3LYP/3-21g) of all conformers are compared to their AMBER energies for the various chemical structures, b5g8_cyc, b5g8_ox_n and b5g8_ox_ox. Each dot on the graph represents a candidate

structure, whose AMBER and 3-21g single-point relative energies are given by the x and y coordinates, respectively. In the absence of a geometry optimization at the DFT level, the relative energy rankings of identical structures from both computational approaches can be compared. Broadly, the relative rankings from both methods agree, with no significantly off-diagonal points. The R² value is given as a statistical measure of this diagonal fit. This suggests that the parameterization procedure in AMBER was successful, at least in mirroring the energetic picture in DFT. Also, the slopes greater than 1 in the figures indicate a larger range of energy variation for the DFT calculations than for the AMBER results. This might be due to the loosely set cis-trans differentiator threshold and the resulting structures with near-trans but non-trans amides, which are less favored by the DFT calculation and thus result in higher electronic energies. The number of such near-trans amides in a structure and the deviation from 180 degrees of each amide then generate the larger range of the energy distribution.

4.2.1.2 AMBER vs. 3-21g and 3-21g vs. 6-31g* optimization

During an AMBER annealing cycle, the molecule is minimized at the molecular dynamics level. To investigate how the optimization in AMBER compares to DFT, each conformation within a chemical structure is optimized at 3-21g, and those energies are compared to the minimum energies from AMBER. The different levels of DFT, 3-21g, 6-31g* and 6-31g**, are also compared to one another in this way to determine at what level the results converge. Figure 4-2 shows the comparison for 3-21g vs. AMBER, as well as 6-31g* vs. 3-21g, for the three chemical structures. The comparisons between AMBER energies and 3-21g DFT optimized energies for the different chemical structures display different trends. For macrocycle b5g8_cyc, most

points are distributed in the diagonal region, which means that the AMBER energy ranking does a reasonable job compared to 3-21g. Conversely, for the oxazolone structures b5g8_ox_n and b5g8_ox_ox, a large number of points appear in the lower off-diagonal region. This means that many structures that are energetically not favored by AMBER actually end up as lower energy structures after optimization with 3-21g. This suggests that AMBER has a limited capability in geometry optimization, and that [...] Another implication from this study is that structures should not be triaged based on their AMBER energetics. Other strategies must be found to reduce the number of AMBER candidate structures that are submitted to DFT calculations.

The comparison between the energies of the 6-31g* and 3-21g optimized structures shows a much tighter diagonal distribution than that for 3-21g vs. AMBER, as indicated by the higher R² values. For b5g8_cyc even fewer points lie in an off-diagonal position. For the oxazolones the effect is much more pronounced. While there is no ideal correlation in the data, few high-energy 3-21g structures are minimized to low-energy structures at 6-31g*. In conclusion, the geometry-optimized energy ranking from AMBER is a poor indicator of the eventual energy ranking at the DFT level for highly flexible oxazolone structures. On the other hand, the optimized energy rankings at 3-21g and the higher-level 6-31g* do show a higher level of convergence, even if some outliers are observed. In addition, the slopes of the 6-31g* vs. 3-21g results for all three structures are remarkably lower than one, which means that the gap between low- and high-energy structures after 3-21g optimization shrinks upon further optimization at the

6-31g* level. This is understandable since the higher [...] supposed to benefit more from optimization with a higher-level basis set.

4.2.1.3 Energy comparison between 6-31g* and 6-31g** optimized structures

As concluded from the cases in 4.2.1.2, energetic refinement, in which only low-energy structures are retained for higher-level optimization, is not applicable to AMBER-optimized candidate structures. Geometry optimization at 6-31g** is compared to the 6-31g* results in Figure 4-3. For all chemical structures, the data points follow a strictly diagonal distribution. This indicates a convergence in results between both levels of theory, which suggests that only low-energy conformers need to be submitted to 6-31g**.

4.2.2 Influence of Annealing Temperature

The procedure for the AMBER annealing cycle conformational search is given in 3.3.3 (see Figure 4-4). In order to investigate the influence of the annealing temperature on the quality of the conformational search, i.e., whether a particular temperature results in more energetically favored structures, the 6-31g* optimized structures are sorted by their electronic energy. Between the highest and lowest energies of the structures, ten evenly spaced energy intervals are set, and a structure is sorted into a group when its electronic energy falls within the corresponding interval. For structures within the same chemical structure, such as b5g8_cyc, those from 300 K and 500 K annealing are sorted separately, and the results are given in Figure 10. Note that instead of the absolute number of structures in each interval, the normalized result, or distribution percentage, is used for a direct comparison between 300 K and 500 K.
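
The interval sorting described above can be sketched as follows. The function name and the energy values are hypothetical, and the sketch assumes that both annealing temperatures share the same ten intervals, spanning the lowest and highest energy of the pooled structures.

```python
def energy_distribution(energies, e_min, e_max, n_bins=10):
    """Percentage of structures in each of n_bins evenly spaced energy intervals."""
    if not energies:
        raise ValueError("no energies supplied")
    if e_max <= e_min:
        raise ValueError("e_max must be larger than e_min")
    width = (e_max - e_min) / n_bins
    counts = [0] * n_bins
    for energy in energies:
        # The highest energy falls exactly on the upper edge; keep it in the last bin.
        index = min(int((energy - e_min) / width), n_bins - 1)
        counts[index] += 1
    return [100.0 * count / len(energies) for count in counts]


# Made-up relative energies for two annealing temperatures.
energies_300k = [0.0, 0.5, 9.9]
energies_500k = [10.0, 5.0]
pooled = energies_300k + energies_500k
dist_300k = energy_distribution(energies_300k, min(pooled), max(pooled))
dist_500k = energy_distribution(energies_500k, min(pooled), max(pooled))
```

Normalizing by the number of structures from each temperature makes the two histograms directly comparable even when the 300 K and 500 K searches produced different numbers of conformers.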

The trends among the chemical structures are uneven. For b5g8_cyc, plenty of low-energy structures are generated at both temperatures, even if 300 K annealing generated more structures in the lowest energy interval. For b5g8_ox_n, the distributions for 300 K and 500 K closely resemble each other, but few structures end up in the lowest energy group. In this case, neither temperature is optimal. The b5g8_ox_ox results behave completely differently: 500 K annealing yields many more low-energy structures than 300 K annealing. In principle, the rationale behind higher molecular dynamics temperatures is 1) that interconversion barriers between different structures are overcome more easily, and 2) that the conformational search proceeds at a faster rate than at lower temperatures. Only the results for b5g8_ox_ox seem to support these hypotheses. It is conceivable that the number of structures in this search was insufficient to give truly stochastic behavior.

4.2.3 Cis-trans Structure Differentiator

The motivation and mechanism of this differentiator have been discussed in 3.5.3.2. The experimentally less favored cis structures have to be excluded from further geometry optimization to reduce the overall cost. The threshold for the dihedral angle between the C=O and N-H bonds, or for its cosine value, is set empirically. Structures with one [...] threshold) will be labeled as cis structures and excluded from higher-level optimization or frequency calculations. It is then natural to investigate the impact of the threshold selection on the effectiveness of the cis-trans differentiation. As a first step, several 3-21g and 6-31g* optimized structures with both cis and trans amides are picked to validate the in-house program. The results show that cosine values under -0.99 (or 171.89 degrees) are obtained for all the trans amides, which is then set as the threshold for all DFT

optimized structures. Furthermore, the original candidate structures generated by the AMBER conformational search are selectively tested. For some trans amides the cosine value of the dihedral angle remains between -0.99 and -1. For others, the C=O/N-H dihedral is further away from 180° and hence the cosine value is > -0.99. On the other hand, the dihedral angles of cis amides vary greatly, with their cosine values ranging from -0.45 (or 116.74 degrees) to 1. We have hence decided to employ a cosine threshold value of -0.75 (or 138.59 degrees), which generally yields perfect trans bonds after DFT optimization. The value of -0.75 is somewhat arbitrary, but it is empirically found to exclude all cis structures, while the non-ideal trans structures adopt a trans configuration during DFT optimization. The automated program that handles cis-trans filtering of structures converts all AMBER/Gaussian03 files into the PDB file format; the dihedral angle cosine value of each amide is determined, and structures are weeded out based on the thresholds set above.

4.3 Final Results Interpretation and Further Methodology Development

4.3.1 Comparison of IR Spectra between Experimental Results and Energetically Best Results

The experimental IRMPD spectrum of b2_G3 is compared to the energetically most favored candidate structures for b2G3_cyc, b2G3_ox_ox and b2G3_ox_n in the mid-IR range (1200-2000 cm⁻¹) in Figure 4-5. Although b2G3_cyc has a lower predicted energy, the experimental spectrum fails to reproduce the predicted bands, which indicates the absence of such a structure. For the oxazolone structures, b2G3_ox_n yields the most convincing agreement for all of the bands in the spectrum, whereas b2G3_ox_ox shows a slightly poorer match. In terms of reaction mechanism, both of these structures are oxazolones and differ only in the site of proton attachment. The most diagnostic band

for identification of the oxazolone structure is the oxazolone C=O stretch band at 1960 cm⁻¹.

The comparison to theory for the b5 fragments is presented in Figure 4-6. In this larger structure, the amide I (C=O stretch) and amide II (N-H bend) bands dominate the IRMPD spectrum. Nonetheless, there are other bands that can yield diagnostic information. The band at 1830 cm⁻¹, while weak, is consistent with the presence of b5G8_ox_n, as it matches the oxazolone C=O stretch band of such a structure. The corresponding oxazolone stretch band for b5G8_ox_ox is not observed; however, this may result from an incomplete scan of this region of the spectrum. The IRMPD spectra were scanned only to 1940 cm⁻¹, compared to a predicted band position of 1960 cm⁻¹. While the presence of oxazolone structures is validated by diagnostic bands in the higher-frequency (1780-1950 cm⁻¹), uncongested region of the spectrum, the presence of the macrocycle structure is more difficult to establish. Such a structure is thought to be protonated at a backbone carbonyl, which gives rise to a CO-H+ bending mode at 1430 cm⁻¹ for b5G8_cyc. In fact, such a band is confirmed in the IRMPD spectrum. The corresponding band intensities for the oxazolone structures are much weaker in this region, even if some intensity is predicted.

A more compelling case for the presence of the macrocycle structure can be made for the b8 fragment generated from octaglycine, b8G8. The IRMPD spectra for b2G3, b5G8 and b8G8 are contrasted in Figure 4-7. The oxazolone C=O stretch at 1830 cm⁻¹ is not present in this case, and the general appearance of the spectrum matches that of the macrocycle. No computations were attempted for b8G8. Instead, the computations for b5G8 were employed to characterize the diagnostic vibrations.

4.3.2 Distribution of IR Spectra Peaks in Diagnostic Frequency Regions for b5 Ions

The interpretation of the IRMPD spectrum in Figure 4-7 is based on a comparison to the lowest-energy conformers for the 3 chemical structures that are considered. Given the many conformations considered for each of these chemical structures, this raises the question of how reliably the diagnostic oxazolone C=O stretch is predicted as a function of conformation. Figure 4-8 displays the predicted IR frequency of the oxazolone band for the 93 lowest-energy conformations of b5g8_ox_n. Most of the frequencies are predicted in the 1760-1820 cm-1 region, which is unique among the 3 chemical structures and can thus be treated as its signature peak. The same approach is applied to b5g8_ox_ox, where the oxazolone C=O stretching mode is shown to lie in the 1930-1950 cm-1 region. Figure 4-9 shows that the differences between conformations within a chemical structure are much less important than the differences between different chemical structures. This also validates the approach used in this study to interpret the IRMPD results.

4.3.3 RMSD-Based Redundancy Reduction

The importance of root-mean-square deviation (RMSD) calculations for family sorting has been discussed in 3.5.4.3. The first use of RMSD analysis is to eliminate structural duplicates generated by AMBER. Note that this was the initial motivation for developing the RMSD approach, as we found many redundant structures with both identical electronic energies and virtually identical geometries. Below is a typical example. Structures 300003 and 300043 are the 3rd and 43rd conformational search results from 300K annealing. Below are their geometric and energetic comparisons after
optimization by AMBER and 3-21g/6-31g* DFT. The IR spectra obtained from the 6-31g* calculation are also included in Figure 4-10. As can be seen from the comparisons, the two structures started with completely different Cartesian coordinate representations; however, their minimized RMSDs are small at the AMBER and DFT levels. Both structures are very close in energy throughout the different levels of theory. Moreover, the computed IR spectra at 6-31g** are virtually indistinguishable. This is a clear example of a structural duplicate, which suggests that structure 300043 should have been eliminated after the molecular dynamics level. In reality, such structures represent a significant portion of the entire candidate pool. To improve overall computational efficiency and robustness, we aim to apply more annealing cycles during the MD simulation, and to perform redundancy reduction on the results from each level of optimization by RMSD filtering and energy comparison. After manual inspection of around 50 cases, the RMSD and energy difference thresholds are set to 0.5 and 0.1 kcal mol-1, respectively, for AMBER results, to 0.1/1E-4 for 3-21g, and to 0.01/1E-4 for 6-31g*.

4.3.4 Family Sorting or Structure Clustering

As previously mentioned in 3.5.4.2.5, automatically sorting candidate conformers into families is essential in the proposed funnel-like procedure. The mass-weighted RMSD value is used as the primary criterion for the sorting, and similarity can be established based on a low RMSD value. In the procedure above, for instance, the RMSD threshold was empirically determined. Nonetheless, such a procedure is bound to result in an arbitrary determination of the threshold. Here, it is attempted to determine the RMSD threshold in a more objective way, by comparing pairwise RMSD values to
other parameters, such as the similarity in electronic energies and the calculated frequencies. The correlation between RMSD values and energies, as well as IR frequencies, is expected to yield a more reliable estimate of the RMSD threshold value.

The b5g8_ox_n results from the 6-31g** DFT calculation, including the electronic energies and IR spectra, are analyzed here. After cis-trans validation, redundancy reduction and energy refinement, the pool contains 77 valid structures. First, the correlation between the electronic energy deviation (absolute value) and the RMSD of every possible pair of structures is analyzed, and the results are given in Figure 4-11. As shown in Figure 4-11, there is a clear cutoff in the RMSD value at around 0.8 angstrom. For a pair of structures whose RMSD is lower than 0.8, the energy deviation is remarkably small, whereas pairs with a greater RMSD may have either a small or a large energy deviation. Thus, the analysis suggests an RMSD threshold of 0.8 angstrom.

Next, the correlation between the IR spectra deviation and the RMSD is considered. Unlike the energy comparison, a complete comparison of the IR spectra from two structures is less practical. Consequently, a simplified approach has to be taken. For b5g8_ox_n, the signature peak is the oxazolone C=O stretching mode, which lies within 1760 to 1850 cm-1. In other words, instead of comparing the spectra across the entire wavenumber range, only the peak in this region is picked for each structure. The absolute deviation of the frequencies from two structures is then used in the correlation analysis versus RMSD. The results are given in Figure 4-12. In Figure 4-12 a cutoff for RMSD can also be recognized at around 0.8 angstrom. Similar to the case of energy versus RMSD, frequency deviations are kept very small
when the RMSD between two structures is within 0.8 angstrom. However, when the RMSD exceeds this threshold, the frequency deviation can be either very large or very small; thus a safe RMSD threshold is still suggested at 0.8 angstrom.

The above correlation analysis gives the relationship between RMSD and results that are experimentally comparable. However, these structures are already optimized with high-level DFT methods, whereas in order to reduce computational cost the sorting must be performed well before then, i.e., after the molecular dynamics minimization and low-level DFT calculation. A detailed analysis of the data is still ongoing to derive useful RMSD thresholds for lower levels of theory. Given the change in geometries from AMBER to 3-21g to 6-31g*, it is not clear at present how useful this approach will be.

Figure 4-1. (a) B3LYP/3-21G single point vs. AMBER calculated energy for b5g8_cyc. (b) B3LYP/3-21G single point vs. AMBER calculated energy for b5g8_ox_n. (c) B3LYP/3-21G single point vs. AMBER calculated energy for b5g8_ox_ox.
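
The family sorting discussed in Section 4.3.4 can be illustrated with a short Python sketch. It assumes that pairwise RMSD values are already available as a symmetric matrix and that two structures belong to the same family whenever they are connected by a chain of pairs below the threshold (single-linkage grouping). That linkage rule, the function name and the example matrix are assumptions made for illustration; the text specifies only the use of an RMSD threshold.

```python
from typing import Dict, List, Sequence


def sort_into_families(rmsd: Sequence[Sequence[float]],
                       threshold: float = 0.8) -> List[List[int]]:
    """Group structure indices into families by single-linkage on pairwise RMSD.

    rmsd is a symmetric matrix in angstrom; the 0.8 default follows the
    threshold suggested in the text for the 6-31g** level.
    """
    n = len(rmsd)
    parent = list(range(n))  # union-find forest, one tree per family

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if rmsd[i][j] < threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    # Keep the lower index as the family representative.
                    parent[max(root_i, root_j)] = min(root_i, root_j)

    families: Dict[int, List[int]] = {}
    for i in range(n):
        families.setdefault(find(i), []).append(i)
    return sorted(families.values())


# Hypothetical pairwise RMSD matrix (angstrom) for four structures.
example_rmsd = [
    [0.0, 0.3, 1.5, 2.0],
    [0.3, 0.0, 1.6, 2.1],
    [1.5, 1.6, 0.0, 0.6],
    [2.0, 2.1, 0.6, 0.0],
]
families = sort_into_families(example_rmsd)
print(families)
```

Single-linkage grouping can chain dissimilar structures together through intermediates, so a stricter rule (for example, comparing each candidate only to a family representative) may be preferable for large pools.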

Figure 4-2. (a) Energy comparisons for b5g8_cyc: 3-21g optimized vs. AMBER (bottom) and 6-31g* optimized vs. 3-21g optimized (top). (b) Energy comparisons for AMBER vs. 3-21g optimized and 3-21g optimized vs. 6-31g* optimized for b5g8_ox_n. (c) Energy comparisons for AMBER vs. 3-21g optimized and 3-21g optimized vs. 6-31g* optimized for b5g8_ox_ox.

Figure 4-3. (a) Electronic energy comparison between 6-31g* and 6-31g** results for b5g8_cyc. (b) Electronic energy comparison between 6-31g* and 6-31g** results for b5g8_ox_n. (c) Electronic energy comparison between 6-31g* and 6-31g** results for b5g8_ox_ox.

Figure 4-4. Distribution of electronic energies for structures initially generated with annealing temperatures of 300K and 500K for b5g8_cyc (left) and b5g8_ox_n (right).

Figure 4-5. IRMPD spectrum of the b2G3 fragment generated from Gly-Gly-Gly, compared to computed spectra for (A) the diketopiperazine structure protonated on a carbonyl O, (B) the oxazolone structure protonated on the oxazolone ring N, and (C) the oxazolone structure protonated on the N-terminus. Figure is modified from reference (22).

Figure 4-6. Mid-IR IRMPD spectrum of b5G8 (generated from octaglycine), compared to the lowest-energy conformers for the various chemical structures: (A) macrocycle structure protonated on a backbone carbonyl, (B) oxazolone structure protonated on the N-terminus, and (C) oxazolone structure protonated on the oxazolone ring N. The energies relative to the lowest-energy conformer are indicated. The chemically diagnostic bands are labeled. Figure is modified from reference (22).

Figure 4-7. Comparison of mid-IR IRMPD spectra of b2G3, b5G8 and b8G8.

Figure 4-8. Distribution of peaks in the 1760 to 1850 cm-1 region for b5g8_ox_n, with the experimental IR spectrum as background.

Figure 4-9. Distribution of peaks in the 1880 to 2000 cm-1 region for b5g8_ox_ox, with the experimental IR spectrum as background.
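
Peak distributions of the kind shown in Figures 4-8 and 4-9 can be assembled by picking one signature peak per conformer inside the diagnostic window and then binning the resulting frequencies. The Python sketch below is a minimal illustration under stated assumptions: choosing the most intense peak inside the window, the 30 cm-1 bin width, the function names and all numerical peak lists are hypothetical and are not taken from the calculations in this work.

```python
import math
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (frequency in cm-1, IR intensity)


def signature_peak(peaks: List[Peak], lo: float, hi: float) -> Optional[float]:
    """Return the frequency of the most intense peak within [lo, hi], if any."""
    in_window = [(freq, inten) for freq, inten in peaks if lo <= freq <= hi]
    if not in_window:
        return None
    return max(in_window, key=lambda p: p[1])[0]


def bin_counts(freqs: List[float], lo: float, hi: float, width: float) -> List[int]:
    """Count frequencies (all assumed to lie within [lo, hi]) in equal-width bins."""
    n_bins = math.ceil((hi - lo) / width)
    counts = [0] * n_bins
    for freq in freqs:
        index = min(int((freq - lo) // width), n_bins - 1)  # hi falls in last bin
        counts[index] += 1
    return counts


# Hypothetical computed stick spectra for four conformers.
conformers = [
    [(1530.0, 300.0), (1700.0, 500.0), (1790.0, 420.0)],
    [(1695.0, 480.0), (1805.0, 390.0), (1812.0, 100.0)],
    [(1702.0, 510.0), (1778.0, 400.0)],
    [(1690.0, 450.0)],  # no peak in the diagnostic window
]
LO, HI = 1760.0, 1850.0  # window quoted for b5g8_ox_n

signature = [signature_peak(c, LO, HI) for c in conformers]
counts = bin_counts([f for f in signature if f is not None], LO, HI, 30.0)
print(signature, counts)
```

Conformers without any peak in the window are kept as None rather than dropped silently, so that a missing signature band remains visible in the bookkeeping.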

Figure 4-10. IR spectra from the 6-31g* DFT calculation for structures 300003 and 300043.
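
The redundancy reduction of Section 4.3.3, in which a structure such as 300043 is discarded as a duplicate of 300003, can be sketched as a greedy filter. In this Python illustration a structure is dropped when an already-kept structure lies within both the RMSD and the energy-difference thresholds. The function names, the energies and the RMSD values are hypothetical; only the structure labels and the AMBER-level thresholds (0.5 and 0.1 kcal mol-1) come from the text, and the minimized RMSD is assumed to be supplied by a separate routine.

```python
from typing import Callable, Dict, List


def remove_duplicates(energies: Dict[str, float],
                      rmsd: Callable[[str, str], float],
                      rmsd_tol: float, energy_tol: float) -> List[str]:
    """Keep the lowest-energy member of every group of near-identical structures."""
    kept: List[str] = []
    for name in sorted(energies, key=energies.get):  # lowest energy first
        is_duplicate = any(
            rmsd(name, other) < rmsd_tol
            and abs(energies[name] - energies[other]) < energy_tol
            for other in kept
        )
        if not is_duplicate:
            kept.append(name)
    return kept


# Hypothetical AMBER energies (kcal/mol) and minimized pairwise RMSD values.
energies = {"300003": -12.40, "300043": -12.38, "300017": -10.90}
rmsd_table = {
    frozenset(("300003", "300043")): 0.12,
    frozenset(("300003", "300017")): 1.90,
    frozenset(("300043", "300017")): 1.80,
}


def lookup_rmsd(a: str, b: str) -> float:
    """Return the precomputed RMSD for an unordered pair of structures."""
    return rmsd_table[frozenset((a, b))]


kept = remove_duplicates(energies, lookup_rmsd, rmsd_tol=0.5, energy_tol=0.1)
print(kept)
```

Requiring both criteria at once is the conservative choice: two structures with similar energies but different geometries, or similar geometries but different energies, are both retained.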

Figure 4-11. Correlation between energy deviation and RMSD for every possible pair of candidate structures for b5g8_ox_n. 96 pairs of structures have RMSD values less than 0.8 angstrom.

Figure 4-12. Correlation between frequency deviation and RMSD for b5g8_ox_n structures. 94 pairs of structures have RMSD values less than 0.8 angstrom.
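
The pairwise analysis behind Figures 4-11 and 4-12 amounts to listing, for every unordered pair of structures, the RMSD and the absolute deviation of a property (electronic energy or signature frequency), and then comparing the pairs on either side of a candidate cutoff. The following Python sketch shows that bookkeeping. The function names, the four signature frequencies and the RMSD matrix are hypothetical illustrations and not data from this work.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pair_deviations(values: Sequence[float],
                    rmsd: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return (RMSD, |value_i - value_j|) for every unordered pair (i, j)."""
    return [(rmsd[i][j], abs(values[i] - values[j]))
            for i, j in combinations(range(len(values)), 2)]


def summarize(pairs: List[Tuple[float, float]], cutoff: float = 0.8) -> Dict[str, float]:
    """Compare property deviations below and above an RMSD cutoff (angstrom)."""
    below = [dev for r, dev in pairs if r < cutoff]
    above = [dev for r, dev in pairs if r >= cutoff]
    return {
        "n_below": len(below),
        "max_dev_below": max(below, default=0.0),
        "max_dev_above": max(above, default=0.0),
    }


# Hypothetical signature frequencies (cm-1) and pairwise RMSD matrix (angstrom).
frequencies = [1790.0, 1792.0, 1815.0, 1813.0]
rmsd_matrix = [
    [0.0, 0.3, 1.5, 2.0],
    [0.3, 0.0, 1.6, 2.1],
    [1.5, 1.6, 0.0, 0.6],
    [2.0, 2.1, 0.6, 0.0],
]

pairs = pair_deviations(frequencies, rmsd_matrix)
summary = summarize(pairs, cutoff=0.8)
print(summary)
```

For a pool of 77 structures this enumeration covers 77 x 76 / 2 = 2926 pairs, which remains inexpensive once the RMSD values have been computed.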

CHAPTER 5
CONCLUSIONS

The procedure for calculating IR spectra, together with the preceding conformational search and geometry optimizations, is investigated in detail to improve overall efficiency. The best calculated IR spectra are compared with experimental results to provide structural information. Different levels of DFT geometry optimization are tested, and calculation with the 6-31g* basis set is confirmed as an efficient choice. For the conformational search, different starting annealing temperatures are tested; the 300K annealing temperature is the better choice for the macrocyclic structures, whereas 500K prevails for b5g8_ox_ox. Each candidate structure of interest shows a characteristic IR spectrum with signature vibrational modes, which is essential in determining the existence of such a structure in the b5 fragments. The distribution of signature peaks within one chemical family is analyzed, which justifies the effectiveness of the proposed method for calculating the IR spectra. Finally, the use of the root-mean-square deviation (RMSD) as a structure clustering and redundancy reduction parameter has been tested, and an RMSD threshold of 0.8 angstrom is adopted as the criterion for assigning structures to the same family.

The comparison between calculated and experimental IR spectra shows the exclusive existence of oxazolone structures for the b2 fragments, whereas for the b5 fragments both the N-protonated oxazolone structure and the macrocyclic structure exist.

LIST OF REFERENCES

1. M. Tyers, M. Mann, Nature 422, 193 (2003).
2. H. Zhu, M. Bilgin, M. Snyder, Annu. Rev. Biochem. 72, 783 (2003).
3. E. Phizicky, P. I. H. Bastiaens, H. Zhu, M. Snyder, S. Fields, Nature 422, 208 (2003).
4. A. Sali, R. Glaeser, T. Earnest, W. Baumeister, Nature 422, 216 (2003).
5. S. Hanash, Nature 422, 226 (2003).
6. K. Biemann, Biomed. Environ. Mass Spectrom. 16, 99 (1988).
7. H. Steen, M. Mann, Nature Reviews Molecular Cell Biology 5, 699 (2004).
8. J. B. Fenn, M. Mann, C. K. Meng, S. F. Wong, C. M. Whitehouse, Science 246, 64 (1989).
9. M. Karas, F. Hillenkamp, Anal. Chem. 60, 2299 (1988).
10. B. T. Chait, R. Wang, R. C. Beavis, S. B. H. Kent, Science 262, 89 (1993).
11. P. Roepstorff, J. Fohlman, J. Biomed. Mass Spectrom. 11, 601 (1984).
12. A. G. Harrison, A. B. Young, C. Bleiholder, S. Suhai, B. Paizs, J. Am. Chem. Soc. 128, 10364 (2006).
13. S. Campbell, M. T. Rodgers, E. M. Marzluff, J. L. Beauchamp, J. Am. Chem. Soc. 117, 12840 (1995).
14. M. K. Green, C. B. Lebrilla, Mass Spectrom. Rev. 16, 53 (1997).
15. D. E. Clemmer, M. F. Jarrold, J. Mass Spectrom. 32, 577 (1997).
16. T. Wyttenbach, M. T. Bowers, Top. Curr. Chem. 225, 207 (2003).
17. I. Garcia, K. Giles, R. H. Bateman, S. J. Gaskell, J. Am. Soc. Mass. Spec. 19, 1781 (2008).
18. J. R. Eyler, Mass Spectrometry Reviews 28, 448 (2009).
19. N. C. Polfer, J. Oomens, Mass Spectrometry Reviews 28, 468 (2009).
20. N. C. Polfer, J. Oomens, S. Suhai, B. Paizs, J. Am. Chem. Soc. 129, 5887 (2007).

21. U. Erlekam et al., J. Am. Chem. Soc. 131, 11503 (2009).
22. X. Chen, L. Yu, J. D. Steill, J. Oomens, N. C. Polfer, J. Am. Chem. Soc. 131, 18272 (2009).
23. R. L. Woodin, D. S. Bomse, J. L. Beauchamp, J. Am. Chem. Soc. 100, 3248 (1978).
24. K. Hakansson, H. J. Cooper, R. R. Hudgins, C. L. Nilsson, Current Organic Chemistry 7, 1503 (2003).
25. http://ambermd.org/
26. W. Kabsch, Acta Cryst. A32, 922 (1976).
27. J. Shao, S. W. Tanner, N. Thompson, T. E. I. Cheatham, J. Chem. Theory Comput. 3, 2312 (2007).
28. J. Oomens, B. G. Sartakov, G. Meijer, G. v. Helden, Int. J. Mass Spectrom. 254, 1 (2006).
29. C. I. Bayly, P. Cieplak, W. D. Cornell, P. A. Kollman, J. Phys. Chem. 97, 10269 (1993).
30. N. Argaman, G. Makov, arXiv:physics/9806013v2 (1999).
31. W. Kohn, L. J. Sham, Phys. Rev. 140, A1133 (1965).
32. http://www.rijnhuizen.nl/felix/
33. N. C. Polfer, J. Oomens, Phys. Chem. Chem. Phys. 9, 3804 (2007).
34. D. Oepts, A. F. G. van der Meer, P. W. van Amersfoort, Infrared Phys. Technol. 36, 297 (1995).

BIOGRAPHICAL SKETCH

Long Yu was born in the city of Liaoyang, Liaoning province, in the People's Republic of China. He received a Bachelor of Science degree in physics from the department of Special Class for the Gifted Young, University of Science and Technology of China, in June 2004. After graduation he worked as a research assistant, first at the University of Science and Technology of China and then at the City University of Hong Kong. In March 2006 he was admitted to the Ph.D. program in the University of Florida chemistry department and was officially enrolled in July 2006. Long received a Master of Science degree in December 2010 and continued his Ph.D. studies in the Department of Chemistry, University of Florida, supervised by Dr. Nicolas Polfer in the physical division.