<%BANNER%>

Constant pH Replica Exchange Molecular Dynamics Study of Protein Structures and Dynamics

Permanent Link: http://ufdc.ufl.edu/UFE0041533/00001

Material Information

Title: Constant pH Replica Exchange Molecular Dynamics Study of Protein Structures and Dynamics
Physical Description: 1 online resource (221 p.)
Language: english
Creator: Meng, Yilin
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

Subjects

Subjects / Keywords: Chemistry -- Dissertations, Academic -- UF
Genre: Chemistry thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, terriorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: CONSTANT pH REPLICA EXCHANGE MOLECULAR DYNAMICS STUDY OF PROTEIN STRUCTURE AND DYNAMICS Solution pH is a very important thermodynamic variable that affects protein structure, function and dynamics. Enormous effort has been made experimentally and computationally to understand the effect of pH on proteins. One category of computational method to study the effect of pH is the constant-pH molecular dynamics (constant-pH MD) methods. Constant-pH MD employs dynamic protonation in simulations and correlates protein conformations and protonation states. Therefore, constant-pH MD algorithms are able to predict pKa value of an ionizable residue as well as to study pH-dependence directly. A replica exchange constant-pH molecular dynamics (constant-pH REMD) method is proposed and implemented to improve coupled protonation and conformational state sampling. By mixing conformational sampling at constant pH (with discrete protonation states) with a temperature ladder, this method avoids conformational trapping. Our method was tested on seven different biological systems. The constant-pH REMD not only predicted pKa correctly for model peptides but also converged faster than constant pH MD. Furthermore, the constant-pH REMD showed its advantage in the efficiency of conformational samplings. The advantage of utilizing constant-pH REMD is clear. We have studied the effect of pH on the structure and dynamics of C-peptide from ribonuclease A by constant-pH REMD. The mean residue ellipticity at 222 nm at each pH value is computed, as a direct comparison with experimental measurements. The C-peptide conformational ensembles at pH 2, 5, and 8 are studied. The Glu2-Arg10 and Phe8-His12 interactions and their roles in the helix formation are also investigated. Constant-pH REMD method is applied to the study of hen egg white lysozyme (HEWL). pKa values are calculated and compared with experimental values. Factors that could affect pKa prediction such as hydrogen bond network and interaction between ionizable residues are discussed. Structural feature such as coupling between conformation and protonation states is demonstrated in order to emphasize the importance of accurate sampling of the coupled conformations and protonation states.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Yilin Meng.
Thesis: Thesis (Ph.D.)--University of Florida, 2010.
Local: Adviser: Roitberg, Adrian E.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0041533:00001

Permanent Link: http://ufdc.ufl.edu/UFE0041533/00001

Material Information

Title: Constant pH Replica Exchange Molecular Dynamics Study of Protein Structures and Dynamics
Physical Description: 1 online resource (221 p.)
Language: english
Creator: Meng, Yilin
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

Subjects

Subjects / Keywords: Chemistry -- Dissertations, Academic -- UF
Genre: Chemistry thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, terriorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: CONSTANT pH REPLICA EXCHANGE MOLECULAR DYNAMICS STUDY OF PROTEIN STRUCTURE AND DYNAMICS Solution pH is a very important thermodynamic variable that affects protein structure, function and dynamics. Enormous effort has been made experimentally and computationally to understand the effect of pH on proteins. One category of computational method to study the effect of pH is the constant-pH molecular dynamics (constant-pH MD) methods. Constant-pH MD employs dynamic protonation in simulations and correlates protein conformations and protonation states. Therefore, constant-pH MD algorithms are able to predict pKa value of an ionizable residue as well as to study pH-dependence directly. A replica exchange constant-pH molecular dynamics (constant-pH REMD) method is proposed and implemented to improve coupled protonation and conformational state sampling. By mixing conformational sampling at constant pH (with discrete protonation states) with a temperature ladder, this method avoids conformational trapping. Our method was tested on seven different biological systems. The constant-pH REMD not only predicted pKa correctly for model peptides but also converged faster than constant pH MD. Furthermore, the constant-pH REMD showed its advantage in the efficiency of conformational samplings. The advantage of utilizing constant-pH REMD is clear. We have studied the effect of pH on the structure and dynamics of C-peptide from ribonuclease A by constant-pH REMD. The mean residue ellipticity at 222 nm at each pH value is computed, as a direct comparison with experimental measurements. The C-peptide conformational ensembles at pH 2, 5, and 8 are studied. The Glu2-Arg10 and Phe8-His12 interactions and their roles in the helix formation are also investigated. Constant-pH REMD method is applied to the study of hen egg white lysozyme (HEWL). pKa values are calculated and compared with experimental values. Factors that could affect pKa prediction such as hydrogen bond network and interaction between ionizable residues are discussed. Structural feature such as coupling between conformation and protonation states is demonstrated in order to emphasize the importance of accurate sampling of the coupled conformations and protonation states.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Yilin Meng.
Thesis: Thesis (Ph.D.)--University of Florida, 2010.
Local: Adviser: Roitberg, Adrian E.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0041533:00001


This item has the following downloads:


Full Text





CONSTANT pH REPLICA EXCHANGE MOLECULAR DYNAMICS STUDY OF
PROTEIN STRUCTURE AND DYNAMICS















By

YILIN MENG


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2010



























2010 Yilin Meng



























To my family









ACKNOWLEDGMENTS

At the completion of my graduate study at the University of Florida, I would like to

take great pleasure in acknowledging the people who have supported me over these

years.

I primarily thank my advisor, Professor Adrian E. Roitberg. Throughout the years

working in his group, I have learned a tremendous amount from him. His guidance and

encouragement supported me to overcome the obstacles not only in research but also

in my personal life. There is no way I would have achieved my goal without his support

and help.

I am thankful for the support and guidance of my committee members, Professors

Kenneth M. Merz Jr., Nicolas C. Polfer, Stephen J. Hagen, and Arthur S. Edison. I also

would like to thank Professors So Hirata, Joanna R. Long, Carlos L. Simmerling, and

Wei Yang for their guidance in my research. I am very grateful for the assistance and

helpful discussions from my colleagues in the Roitberg group, especially Dr. Daniel

Sindhikara, Dr. Gustavo Seabra, Dr. Lena Dolghih, Dr. Seonah Kim, Jason Swails,

Danial Dashti, Billy Miller, Dwight McGee, and Sung Cho. I appreciate all my friends at

the Quantum Theory Project, the Department of Chemistry and Physics.

I thank the source of funding that supported my graduate study. My research was

supported by National Institute of Health under Contract 1R01 A1073674. Computer

resources and support were provided by the Large Allocations Resource Committee

through grant TG-MCA05S010 and the University of Florida High-Performance

Computing Center.









I want to acknowledge my wife, Xian who encouraged me and supported me to

complete this work. Finally, I am very grateful for my whole family for their love and

encouragement.









TABLE OF CONTENTS

page

A C KNOW LEDG M ENTS ............... ....................... ................ ............... 4

L IS T O F T A B LE S ........................ ................. ........... ..... .............................. 9

LIST OF FIGURES.................................. ......... 10

LIST OF ABBREVIATIONS ............... .................... ............ ............... 17

A BSTRACT ........................ ............................................. 19

CHAPTER

1 INTRODUCTION .............................. ............. .................. 21

1.1 Acid-Base Equilibrium ............... ................ 21
1.2 Amino Acids and Proteins.................................. .... ........ .. ................. 22
1.3 lonizable Residues in Proteins and the Effect of pH on Proteins................... 25
1.4 Measuring pKa Values of lonizable Residues ............ ................................ 29
1.5 M olecular M odeling ............. ..... ............. ......... ... ....... ........ 38
1.6 Potential Energy Surface .............................................................................. 39
1.7 Molecular Dynamics, Monte Carlo Methods and Ergodicity........................... 41
1.8 Theoretical Protein Titration Curves and pKa Calculations Using Poisson-
Boltzm ann Equation ....... .... ................ .......................... ... .. .... ...... 44
1.9 Computing pKa Values by Free Energy Calculations.................................... 48
1.10 pKa Prediction Using Empirical Methods......................................... 53
1.11 Constant-pH Molecular Dynamics (Constant-pH MD) Methods................. 53

2 THEORY AND METHODS IN MOLECULAR MODELING............................ 59

2.1 Potential Energy Functions and Classical Force Fields .................................. 59
2.1.1 Potential Energy Surface... .. .............. .......................... 59
2.1.2 Force Field Models .......... ................................... 60
2 .1.3 P rotein Force F ield M odels............................................. ... .. ............... 63
2.2 Molecular Dynam ics (M D) Method ....... ... ................. ......... ..... ............. 64
2.2.1 M D Integrator ........................ ........ ...................... 64
2.2.2 Thermostats in MD Simulations......................... ..... ............. 65
2.2.3 Pressure Control in MD Simulations............... .................... 68
2.3 Monte Carlo (MC) Method ......................... ........ ...... .................... 70
2.3.1 Canonical Ensemble and Configuration Integral ................................... 70
2.3.2 Markov Chain Monte Carlo (MCMC) ......... .... .... ..................... 71
2.3.3 The Metropolis Monte Carlo Method ......... .... .... ..................... 73
2.3.4 Ergodicity and the Ergodic Hypothesis...... ........ .................... 74
2.4 Solvent M models ........... ........... ............... ............... .... ........... 74
2.4.1 Explicit S olvent M odel ................................................... ............... 75









2.4.2 The Poisson-Boltzmann (PB) Implicit Solvent Model............................ 77
2.4.3 The Generalized Born (GB) Implicit Solvent Model............................. 79
2.5 pKa Calculation Methods.................................................. 80
2.5.1 The Continuum Electrostatic (CE) Model ............................................. 80
2.5.2 Free Energy Calculation Methods ......... ....... ........................... 82
2.5.3 Constant-pH MD Methods ......................................................... 87
2.6 Advanced Sam pling Methods ....................................................................... 94
2.6.1 The Multicanonical Algorithm (MUCA)............................ .... ........... 95
2.6.2 Parallel Tem pering .................. ............. ... ................... .......... 96
2.7 Replica Exchange Molecular Dynamics (REMD) Methods............................. 97
2.7.1 Temperature REMD (T-REMD) ............. .......... ........... .......... ........ 99
2.7.2 Ham iltonian REM D (H-REM D) ......... ............................................. 105
2.7.3 Technical Details in REM D Sim ulations ............................................ 105

3 CONSTANT-pH REMD: METHOD AND IMPLEMENTATION........................... 114

3.1 Introduction ........................................... ........... 114
3.2 T heory and M ethods ............................... .. ....... .. .... .............. .............. 114
3.2.1 Constant-pH REMD Algorithm in AMBER Simulation Suite .................. 114
3.2 .2 S im ulation D etails................................. .................................... 118
3.2.3 Global Conformational Sampling Comparison Using Cluster Analysis.. 120
3.2.4 Local Conformational Sampling and Convergence to Final State ......... 122
3.3 Results and Discussion........................................... ............... 122
3.3.1 Reference Compounds.............................. ............... 122
3.3.2 M odel peptide A D FDA .................................. ............... 124
3.3.3 Heptapeptide derived from OMTKY3................................................. 128
3.4 Conclusions ........ .......... ................................ ... ............ 136

4 CONSTANT-pH REMD: STRUCTURE AND DYNAMICS OF THE C-PEPTIDE
O F RIBO NUCLEASE A ........... ................ ............... ......... ............... 137

4.1 Introduction ............. ...... ... ... ....... ......... ............... .......... 137
4 .2 M methods ......................... ....... ...... ......................... ............... 143
4.2.1 Simulation Details......................................... 143
4.2.2 C luster A analysis ............ .... ... ...... ......... ... .. ... ...... ......... 144
4.2.3 Definition of the Secondary Structure of Proteins (DSSP) Analysis ...... 145
4.2.4 Computation of the Mean Residue Ellipticity ..................................... 145
4 .3 R results and D discussion ............................................................. 150
4.3.1 Testing Structural Convergence ..................... .... .. .................. 150
4.3.2 pKa Calculation and Convergence...................................... 151
4.3.3 The Mean Residue Ellipticity of the C-peptide ............... ................ 151
4.3.4 Helical Structures in the C-peptide ................................................. 153
4.3.5 The Two-Dimensional Probability Densities ....................................... 157
4.3.6 Important Electrostatic Interactions: Lysl-Glu9 and Glu2-Argl0........... 160
4.3.7 Important Electrostatic Interactions: Phe8-His12................................. 164
4.3.8 Cluster Analysis Results................... ....................... ............... 167
4.4 Conclusions ............. .... ................................ ..... .... ........... 168









5 CONSTANT-pH REMD: pKa CALCULATIONS OF HEN EGG WHITE
LYSOZYME .............................. .... ............. ........ ....... .......... 170

5.1 Introduction .............. ................................... ...... .................. ... 170
5.2 Simulation Details .............. ......... ..................... ........ .. ..... 174
5.3 Protein Conformational and Protonation State Equilibrium Model .................. 176
5.4 NM R Chem ical Shift Calculations ...................................... ......................... 177
5.5 R results and D iscussions.................................................. 178
5.5.1 Structural Stability and pKa Convergence.............................. ............ .. 178
5.5.2 pKa Predictions ............ .. .. .................. .... ..................... 182
5.5.3 Constant-pH REMD Simulations with a Weaker Restraint .................... 184
5.5.4 Active Site lonizable Residue pKa Prediction: Asp52 .......................... 187
5.5.5 Active Site lonizable Residue pKa Prediction: Glu35........................... 189
5.5.6 Correlation between Conformation and Protonation............................. 193
5.5.7 Conformation-Protonation Equilibrium Model............ .. ..... ....... 197
5.5.8 Theoretical NMR Titration Curves ........ ........ .................... 201
5.6 Conclusions ........ .......... .............. .................. ... ............ 203

LIST OF REFERENCES .......... ............ ......... ................ ............... 206

BIOGRAPHICAL SKETCH ............... ........... ... ... ...................... 221









LIST OF TABLES


Table page

1-1 Intrinsic pKa values of ionizable residues in proteins.26 ................. ............ 29

3-1 The REMD pKa predictions of reference compounds ................................... 123

3-2 pKa predictions and Hill coefficients fitted from the Hill's Plot........................ 125

3-3 Correlation coefficients between MD and REMD cluster populations............... 128

4-1 Correlation coefficients between two sets of cluster populations...................... 151

5-1 Simulation details of constant-pH REM D runs............................................... 175

5-2 Predicted pKa values and their RMS errors relative to experimental
measurements from the restrained REMD simulations .............................. 183

5-3 Predicted pKa values and their RMS errors relative to experimental
measurements from weakly restrained REMD simulations. ........................... 185

5-4 Distance between Glu35 carboxylic oxygen atoms and neighboring residue
side-chain atoms in 1AKI crystal structure......................... ....................... 190









LIST OF FIGURES


Figure page

1-1 A) Structure of an amino acid named alanine. An amino group (-NH2), a
carboxylic acid group (-COOH), a side chain (-R, in this case, a methyl
group) and a hydrogen atom are bonded to a central carbon atom (Ca). B)
Dihedral angles (p and yp of alanine dipeptide................ .................... 23

1-2 A Ramachandran plot (a contour plot showing the probability density of (0,p)
pairs) of tyrosine generated from the simulation of a heptapeptide which will
be described later in chapter 3. In this figure, a left-handed a-helix is also
show n. ......... .... .............. ................................. ........................... 2 5

1-3 A diagram showing the cartoon representation of an enzyme at low pH
(acidic) and at around the optimal pH value. EH indicates the structure at low
pH and E stands for the zwitterion form, which is the active species in our
model.13.................... ............................. ............... 26

1-4 The reaction schemes showing the enzyme reactions at which pH values are
smaller than the optimal pH value. Ks, Ks, K, and K2 are equilibrium
constants of corresponding reactions and kcat is the rate constant of the rate-
determining step. This model can be used to explain how pH value affects
enzyme catalysis in the pH range that is larger than optimal pH.13'14................. 27

1-5 A) An example of titration curve. B) An example of Hill's plot on the basis of
the titration described in Figure 1-5A. The two plots are generated from
constant-pH MD simulations of an aspartic acid in a pentapeptide. ................. 30

1-6 13C NMR titration curves of aspartate residues in HIV-1 protease/KNI-272
complex taken from Wang et a/.,1996.27 In this figure, Asp Cy chemical shifts
are plotted as a function of pD. Asp25 and Asp125 do not change
protonation states in this pD range. But isotope shift experiments show that
Asp25 is protonated and Asp125 is deprotonated in this pD range. "Reprinted
with permission from Wang, Y. X.; Freedberg, D. I.; Yamazaki, T.; Wingfield,
P. T.; Stahl, S. J.; Kaufman, J. D.; Kiso, Y.; Torchia, D. A. Biochemistry 1996,
35, 9945-9950. Copyright 1996 American Chemical Society." ..................... 32

1-7 Thermodynamic cycle used to compute pKa shift. Both acid dissociation
reactions occur in aqueous solution. A thermodynamic cycle is a series of
thermodynamic processes that eventually returning to the initial state. A state
function, such as reaction free energy in this case, is path-independent and
hence, unchanged through a cyclic process.......... ...... ....... ............... 49

1-8 Thermodynamic cycle utilized to calculate the difference between AG1 and
AG2. In Figure 1-7 and Figure 1-8, protein-AH represents the ionizable
residue in protein environment. AH represents the reference compound









which is usually the ionizable residue with two termini capped. In practice, a
proton does not disappear but instead becomes a dummy atom. The proton
has its position and velocity. The bonded interactions involving the proton are
still effective. However, there is no non-bonded interaction for that proton.
The change in protonation state is reflected by changes of partial charges in
the ionizable residue...................................... ........... 50

2-1 A diagram showing bond-stretching coupled with angle-bending. A cross
term calculating coupling energy is adopted when evaluating the total
potential energy. .................. ............. .. .......................... ............... 62

2-2 A diagrammatic description of TIP3P and TIP4P water models. A) TIP3P
model. The red circle is oxygen atom and the black circles are the hydrogen
atoms. Experimental bond length and bond angle are adopted. B) TIP4P
model. Oxygen and hydrogen atoms are labeled with same color as in the
TIP3P model. TIP4P model also employs the experimental OH bond length
and HOH bond angle. Clearly, the fourth site (green circle) which carries
negative partial charge has been added to the TIP4P model ............................ 77

3-1 Methods to perform exchange attempts. A) Only molecular structures are
attempted to exchange. The protonation states are kept the same. B) Both
molecular structures and protonation states are attempted to exchange. ........ 115

3-2 Titration curves of blocked aspartate amino acid from 100 ns MD at 300K
and REMD runs. Agreement can be seen between MD and REMD
s im u la tio n s ................................................ .................... 1 2 3

3-3 Cumulative average protonation fraction of aspartic acid reference compound
vs Monte Carlo (MC) steps at pH=4. .................................................... 124

3-4 The titration curves of the model peptide ADFDA at 300K from both MD and
REMD simulations. MD simulation time was 100 ns and 10 ns were chosen
for each replica for REMD runs. ................................................. 125

3-5 Cumulative average protonation fraction of Asp2 in model peptide ADFDA vs
Monte Carlo (MC) steps at pH=4 .............. ............................ ...... 126

3-6 Backbone dihedral angle (cp, yp) normalized probability density
(Ramachandran plots) for Asp2 at pH 4 in ADFDA. Ramachandran plots at
other solution pH values are similar. For Asp2, constant-pH MD and REMD
sampled the same local backbone conformational space. Phe3 and Asp4
Ramachandran plots also display the same trend ................................ ..... 127

3-7 Cluster populations of ADFDA at 300K. A) MD vs REMD at pH 4.
Trajectories from MD and REMD simulations are combined first. By
clustering the combined trajectory, the MD and REMD structural ensembles
will populate the same clusters. The fraction of the conformational ensemble
corresponding to each cluster (fractional population of each cluster) was









calculated for MD and REMD simulation, respectively. Two sets of fractional
population of clusters were generated, and hence plotted against each other.
B) Two REMD runs from different starting structures at pH 4. Large
correlation shown in Figure 3-7B suggests that the REMD runs are
converged. Large correlations between two independent REMD runs are also
observed at other solution pH values. Correlations between MD and REMD
simulations can be found in Table 3-3 ...... ..... ............ ............... .... .... 128

3-8 A) Titration curves of Asp3 in the heptapeptide derived from protein
OMTKY3. B) Titration curves of Lys5 and Tyr7 in the heptapeptide derived
from protein OMTKY3. C) shows the Hill's plots of Asp3. The pKa values of
Asp3 are found through Hill's plots ........... .... ..... .. .. ........... ........ ..... 129

3-9 A) Cumulative average protonation fraction of Asp3 of the heptapeptide
derived OMTKY3 vs MC steps. B) and C) is cumulative average protonation
fraction of Tyr7 and Lys5 in the heptapeptide vs MC steps, respectively.
Clearly, faster convergence is achieved in contant-pH REMD simulations. ..... 131

3-10 Dihedral angle ((p, p) probability densities of Asp3 at pH 4. A) Constant-pH
MD results. B) Constant-pH REMD results. The two probability densities are
almost identical, indicating that constant-pH MD and REMD sample the same
local conformational space. All others also show very similar trend. ................ 133

3-11 The root-mean-square deviations (RMSD) between the cumulative ((p, p)
probability density up to current time and the ((p, p) probability density
produced by entire simulation. ((p, p) probability density convergence
behaviors at other pH values also show that REMD runs converge to final
distribution faster. ............................. ....... .............................. 134

3-12 Cluster population at 300 Kfrom constant pH MD and REMD simulations at
pH=4. Cluster analysis is performed using the entire simulation. The
populations in each cluster from the first and second half of the trajectory are
compared and plotted. Ideally, a converged trajectory should yield a
correlation coefficient to be 1. A) Constant pH MD. B) Constant pH REMD.
Much higher correlation coefficient can be seen in constant pH REMD
simulation, suggesting much better convergence is achieved by the constant
pH REM D run. ................................. ......................... ................ 135

4-1 Cluster population at 300 K from constant pH REMD simulations at pH 2. A)
Cluster analysis is performed on the trajectory initiated from fully extended
structure. The populations in each cluster from the first and second half of
the trajectory are compared and plotted. B) Two REMD runs from different
starting structures at pH 2. Correlation coefficients at other pH values can be
found in Table 4-1 ....................... ................. ............... .............. 150

4-2 Cumulative average fraction of protonation vs Monte Carlo (MC) steps. Only
the two glutamate residues are shown here and the histidine residue is found









to show the same trend. The pH values are selected such that the overall
average fraction of protonation is close to 0.5. .................. .............. 152

4-3 Computed the mean residue ellipticity at 222 nm as a function of pH values.
A bell-shaped curve at 300 K is obtained with a maximum at pH 5. The effect
of temperature on mean residue ellipticity at 222 nm is also demonstrated..... 153

4-4 Helical Content as a function of residue number................ .. ........... 154

4-5 A) Time series of Ca RMSDs vs the fully helical structure at pH 5. The first
two residues at each end are not selected because the ends are very
flexible. B) Probability densities of the Ca RMSDs. Clearly, the structural
ensemble at pH 5 contains more structures similar to the fully helical
structure. C) Time series of Ca radius of gyration at pH 5. D) Probability
density of the Ca radius of gyration. More compact structures are found at pH
5 ............... ..... ....................... ............................................ 15 5

4-6 A) Probability densities of number of helical residues in the C-peptide. B)
Probability densities of the number of helical segments in the C-peptide. A
helical segment contains continuous helical residues. The probability of
forming the second helical segment is very low at all three pH values, thus
only the first helical segment is further studied. C) Probability densities of the
starting position of a helical segment. D) Probability densities of the length of
a helical segment (number of residues in a helical segment) ............ .......... 156

4-7 2D probability density of helical starting position and helical length, pH = 2..... 158

4-8 2D probability density of helical starting position and helical length, pH=5....... 158

4-9 2D probability density of helical starting position and helical length, pH=8....... 159

4-10 2D probability density of helical length and Ca-RMSD at pH = 2. ....... ........ 159

4-11 2D probability density of helical length and Ca-RMSD at pH = 5. ....... ........ 160

4-12 2D probability density of helical length and Ca-RMSD at pH = 8. ....... ........ 160

4-13 A) Probability density of Lysl-Glu9 distance (A). The distance is the
minimum distance between the side-chain nitrogen atom of Lysl and the
side-chain carboxylic oxygen atoms of Glu9. B) Probability density of Glu2-
Arg10 distance (A). The distance is the minimum distance between side-
chain carboxylic oxygen atoms of Glu2 and guanidinium nitrogen atoms of
A rg ......... ...................................................... ............................ 16 2

4-14 Two-dimensional probability density of Lysl -Glu9 and Glu2-Argl0 at pH 5.
Apparently, Lysl-Glu9 and Glu2-Argl0 salt-bridges cannot be formed
sim ultaneously .... ........ ......... .............. ................ ..... .......... 162









4-15 A) Two-dimensional probability density of Glu2-Arg10 salt-bridge formation
and helical length at pH 5. According to the plot, the Glu2-Arg10 salt-bridge
can be found in four-residue, six-residue and non-helical structures. B) Two-
dimensional probability density of Glu2-Arg10 salt-bridge and the helix
starting position at pH 5. If a helix begins from Thr3, it cannot have a Glu2-
Arg10 salt-bridge. Thus, one role of the Glu2-Arg10 salt-bridge is to prevent
helix formation from Thr3. ..... ............................... ... .. ............. 163

4-16 A) Probability density of Phe8 backbone to His12 ring distance. The distance
is the minimum distance between Phe8 backbone carbonyl oxygen atom and
His12 imidazole nitrogen atoms. B) Probability density of Phe8 ring to His12
ring distance. The distance is the minimum distance between Phe8 aromatic
ring carbon atoms and His12 imidazole nitrogen atoms ............. ............... 164

4-17 A) Two-dimensional probability density of Glu2-Arg10 distance and Phe8-
His12 backbone-to-ring distance at pH 5. B) Correlations between Glu2-
Arg10 salt-bridge and Phe8-His12 contact at pH 5........................................ 166

4-18 A) Two-dimensional probability density of helical segment length and Phe8-
His12 interaction. B) Two-dimensional probability density of helical segment
starting position and Phe8-His12 interaction. Phe8-His12 also stabilizes four-
residue and six-residue structures. Helices begin at Lys7 and Phe8-His12 is
coupled. Unlike Glu2-Arg10, Phe8-His12 stabilizes helices starting from
T h r3 ...................... .. .. ......... .. .. ......... ...................................... 1 6 7

4-19 A) Top 20 populated clusters and average helical percentage. B) Probability
densities of the Ca-RMSD vs the fully helical structure of the top 2 populated
clusters. C) Helical Percentage as a function of residue number of the top 2
populated clusters. D) Probability density of the Glu2-Arg10 and Phe8
backbone-His12 ring interactions in the second most populated cluster......... 169

5-1 Crystal structure of HEWL (PDB code 1AKI). Residues in red represent
aspartate and residues in blue are glutamate ................................................ 171

5-2 A simple schematic view of the conformation-protonation equilibrium in a
constant-pH simulation. .............. .. ..... ....... ........ ............... 176

5-3 Ca RMSD vs crustal structure (PDB code: 1AKI). A) Ca RMSD vs 1AKI from
REMD without restraint on Ca. B) Ca RMSD vs 1AKI from REMD with
restraint on Ca. The restraint strength is 1 kcal/mol-A2.............. ................. 179

5-4 pKa prediction error as a function of time. The predicted pKa at a given time is
a cumulative result. For each ionizable residue, the time series of its pKa
error is generated at a pH where the average predicted pKa is closest to that
pH value. In this way, we try to eliminate any bias toward the energetically
favored state. A flat line is an indication of convergence. Glu35 is not shown
here due to poor convergence ........ ............. ........... ............. 180









5-5 A) pKa prediction convergence to its final value. Similarly, the pKa value at a
given time is a cumulative average. A flat line having y-value of 0 is expected
when pKa calculation convergence is reached. The same pH values are
chosen for each ionizable residue as in Figure 5-4. B) Asp52 pKa prediction
convergence to its final value at multiple pH values. The pH values are
selected in such a way that the pKa calculated at this pH will be used to
compute composite pKa........ .. ................ ......... ... ............... 181

5-6 RMS error between predicted and experimental pKa vs pH value. A minimum
of pKa RMS error can be found near the pH at which 1AKI crystal structure is
reso lived ............ .. .................................................................. .. 184

5-7 A) Ca RMSD of HEWL from weaker restraint REMD simulations. The RMSDs
are larger than those with stronger restraints. When comparing RMSDs at
different pH for simulations using weaker restraint, RMSDs are greater at pH
3 and 4 than those at pH 4.5. B) pKa prediction deviation from final value at
pH 4.5 from constant-pH REMD with 0.1 kcal/molA2................. ................. 186

5-8 Asp52 in the crystal structure of 1AKI. Its neighbors that having strong
electrostatic interactions are also shown. ...... .. ................. ................... 188

5-9 A) Time series of Asp52 carboxylic oxygen atom OD1 to Asn59 and Asn44
ND2 distances at pH 3 in the 1 kcal/mol-A2 constant-pH REMD run. B) Time
series of Asp52 carboxylic oxygen atom OD2 to Asn59 and Asn44 ND2
distances under the same condition. Hydrogen bonds which are stabilizing
deprotonated Asp52 are formed in a large extent even at a low pH............... 188

5-10 A) Time series of the Glu35 heavy atoms (excluding two carboxylic oxygen
atoms) RMSD relative to crystal structure 1AKI. B) Probability distribution of
the RMSD. The conformation centered at RMSD ~0.1 A is labeled as
conformation 1. The one centered at ~0.6 A is named conformation 2.
Apparently, an extra conformation (conformation 3) is visited by the weakly
restrained REM D sim ulation ............................................ ......... .................. 191

5-11 A) Representative Structure of conformation 1. B) Representative Structure
of conformation 2. The structure ensemble is generated from REMD
simulations with stronger restraining potential. The carboxylic group of Glu35
in conformation 2 is clearly pointing toward the amide group of Alal 10.
Deprotonated form of Glu35 tends to decrease the electrostatic energy.
Furthermore, conformation 1 does not particularly favor the protonated
Glu35. No significant stabilizing factor is found for the protonated Glu35......... 192

5-12 Representative Structure of conformation 3 from cluster analysis. Glu35 is in
the hydrophobic region, consisting of Gln57, Trpl08 and Ala110.
Conformation 1 and 2 in the weakly restrained simulations are basically the
same as those demonstrated in Figure 5-11 ............................... .............. 193









5-13 A) Correlation between side chain dihedral angle land protonation states.
B) Correlation between side chain dihedral angle X2and protonation states.... 194

5-14 Minimal distance between Asp119 side chain carboxylic oxygen atoms (OD1
and OD2) and Arg125 guanidinium nitrogen atoms. Since guanidinium group
has three nitrogen atoms, the minimal distance is the shortest distance
between Asp119 OD1 (or OD2) and those three nitrogen atoms.................... 196

5-15 A) Probability distribution of Asp119 CG to Arg125 CZ distances. The
Asp119 CG to Arg125 CZ distance is used to distinguish conformations. B)
Coupling between conformations and protonation states ............................ 197

5-16 K12/K12,h as a function of pH and its dependence on pKa,i and pKa,2................ 199

5-17 A) Fraction of each species as a function of pH titrationn curves) obtained
from equations based on conformation-protonation equilibrium. The effect of
K12, h is tested. B) Comparison of titration curves derived from actual
simulations and from the equilibrium equations................. ....... ............ 200

5-18 Theoretical NMR chemical shifts as a function of pH. It's plotted to see if the
conformation-protonation equilibrium model can reproduce experimental
titration curve based on NMR chemical shift measurements ......................... 202









LIST OF ABBREVIATIONS


ACE Analytical Continuum Electrostatic

BAR Bennett Acceptance Ratio

CD Circular Dichroism

CE Continuum Electrostatic

CPHMD Continuous Constant-pH Molecular Dynamics

CPL Circularly Polarized Light

DOF Degree of Freedom

DOS Density of States

DSSP Definition of the Secondary Structure of Proteins

EAF Exchange Attempt Frequency

EFP Effective Fragment Potential

FEP Free Enery Perturbation

FDPB Finite Differece Poisson-Boltzmann

GB Generalized Born

HEWL Hen Egg White Lysozyme

HH Henderson-Hasselbach

H-REMD Hamiltonian Replica Exchange Molecular Dynamics

LCPL Left Circularly Polarized Light

MC Monte Carlo

MCMC Markov Chain Monte Carlo

MCCE Multiconformation Continuum Electrostatic

MD Molecular Dynamics

MDFE Molecular Dynamics based Free Energy (calculation)









MM

MUCA

NMR

NPT

NVE

NVT

PB

PBC

PES

PDF

PMF

QM

QM/MM

RCPL

REM

REMD

REX-CPHMD

RF

RMSD

TI

T-REMD

V-REMD


Molecular Mechanics

Multicanonical

Nuclear Magnetic Resonance

Isothermal-isobaric Ensemble

Microcanonical Ensemble

Canonical Ensemble

Poisson-Boltzmann

Periodic Boundary Condition

Potential Energy Surface

Probability Distribution Function

Potential of the Mean Force

Quantum Mechanics

Hybrid Quantum Mechanical Molecular Mechanical

Right Circularly Polarized Light

Replica Exchange Method

Replica Exchange Molecular Dynamics

Replica Exchange Continuous Constant-pH Molecular Dynamics

Radio-Frequency

Root-Mean-Square Deviation

Thermodynamic Integration

Temperature Replica Exchange Molecular Dynamics

Viscosity Replica Exchange Molecular Dynamics









Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

CONSTANT pH REPLICA EXCHANGE MOLECULAR DYNAMICS STUDY OF
PROTEIN STRUCTURE AND DYNAMICS

By

Yilin Meng

August 2010

Chair: Adrian E. Roitberg
Major: Chemistry

Solution pH is a very important thermodynamic variable that affects protein

structure, function and dynamics. Enormous effort has been made experimentally and

computationally to understand the effect of pH on proteins. One category of

computational method to study the effect of pH is the constant-pH molecular dynamics

(constant-pH MD) methods. Constant-pH MD employs dynamic protonation in

simulations and correlates protein conformations and protonation states. Therefore,

constant-pH MD algorithms are able to predict pKa value of an ionizable residue as well

as to study pH-dependence directly.

A replica exchange constant-pH molecular dynamics (constant-pH REMD) method

is proposed and implemented to improve coupled protonation and conformational state

sampling. By mixing conformational sampling at constant pH (with discrete protonation

states) with a temperature ladder, this method avoids conformational trapping. Our

method was tested on seven different biological systems. The constant-pH REMD not

only predicted pKa correctly for model peptides but also converged faster than constant

pH MD. Furthermore, the constant-pH REMD showed its advantage in the efficiency of

conformational samplings. The advantage of utilizing constant-pH REMD is clear.









We have studied the effect of pH on the structure and dynamics of C-peptide from

ribonuclease A by constant-pH REMD. The mean residue ellipticity at 222 nm at each

pH value is computed, as a direct comparison with experimental measurements. The C-

peptide conformational ensembles at pH 2, 5, and 8 are studied. The Glu2-Argl0 and

Phe8-Hisl2 interactions and their roles in the helix formation are also investigated.

Constant-pH REMD method is applied to the study of hen egg white lysozyme

(HEWL). pKa values are calculated and compared with experimental values. Factors

that could affect pKa prediction such as hydrogen bond network and interaction between

ionizable residues are discussed. Structural feature such as coupling between

conformation and protonation states is demonstrated in order to emphasize the

importance of accurate sampling of the coupled conformations and protonation states.









CHAPTER 1
INTRODUCTION

1.1 Acid-Base Equilibrium

Acids and bases are common in our daily lives. For example, vinegar is acidic and

ammonia is basic. According to the Bronsted-Lowry definition, an acid is a chemical

compound that can donate protons and a base is a chemical compound that can accept

protons. An acid can be converted to its conjugate base by transferring a proton to a

base and a base is converted to its conjugate acid by accepting a proton. For simplicity,

the conversion between an acid and its conjugate base can be described by the

reaction: HA H+ + A- where HA is an acid, A- is its conjugate base, and

H+ represents proton (in aqueous environment, H+ is hydronium ion H30O).

There exists an equilibrium state between any acid-base conjugate pair. At

equilibrium, the concentration of each species is constant. In an acid-base reaction, an

acid dissociation constant is used to describe this equilibrium. The acid dissociation

constant has the definition of Eq. 1-1.

K (H,+)(aA-) (1-1)
aHA

Here Ka is the acid dissociation constant and aH+, aA- and aiA represent the

activity of each species, respectively. In Eq. 1-1, the activity of each individual species

(take alHA as an example) can be expressed as:

[HA]
aHA YHA] (1-2)

In Eq. 1-2, YHA is the activity coefficient of HA, [HA] is the concentration of HA, and

c is the standard concentration which is 1 M. In an ideal solution, the activity

coefficients are unity. The concentration of each species is divided by standard









concentration in order to make the acid dissociation constant dimensionless. For

simplicity, the acid dissociation constant is expressed using the concentration of each

species from now on.

The Ka indicates the strength of an acid: the stronger the acid is, the larger the Ka

is. The order of magnitude of Ka can span over a broad range. Therefore, a logarithmic

(base 10) measure of the Ka is more frequently adopted:

pKa = loglo Ka (1-3)

Combining Eq. 1-1 and Eq. 1-3, we can express the pKa value as:

([ ] (1-4)
pKa = pH loglo ) (1-4)

Eq. 1-4 is the Henderson-Hasselbalch (HH) equation. It allows one to solve

directly for pH values instead of calculating the concentration of hydronium ions first.

When [A-] = [HA], the HH equation becomes pKa = pH. Therefore, the pKa value of an

acid is numerically equal to the pH value at which the acid and its conjugate base have

the same concentrations. The acid dissociation constant represents the

thermodynamics of an acid dissociation reaction because the pKa value is proportional

to the Gibbs free energy of the reaction. For simple compounds such as acetic acid,

temperature is the most important factor that affects its pKa value. However, for complex

molecules such as proteins and peptides, the effect of environment is also crucial and

will be discussed in this dissertation.

1.2 Amino Acids and Proteins

The goal of this dissertation is to study the acid-base equilibrium in peptide and

protein systems and its effect on peptide and protein conformations by constant-pH

REMD method. Thus, an introduction to peptide and protein, especially their structures









will be helpful. Amino acids have the generic structure as shown in Figure 1-1A. Each

amino acid consists of an amino group (-NH2), a carboxylic acid group (-COOH) and a

distinctive side chain (-R). All three groups are connected to a carbon atom which is

called carbon alpha (Ca). There are twenty naturally occurring side chains and they can

be divided into groups based on their physical or chemical properties. For example, one

way to categorize the twenty side chains is based on their acid/base properties in

aqueous solution. Therefore, an aspartic acid is an acidic amino acid and a lysine is a

basic amino acid. For an amino acid, its carboxylic group can react with the amine

group of another amino acid. This condensation reaction forms a peptide bond which

links the two amino acids and yields a water molecule.

As a consequence of the condensation reaction, proteins are formed. A protein is

a string of amino acids connected by peptide bonds and folded into a globular structure.

A protein often consists of a minimum of 30 to 50 amino acids.1 Shorter chains of amino

acids are often called peptides. Each amino acid in a protein or peptide is called a

residue. The peptide bonds form the backbone of a protein.











A B

Figure 1-1. A) Structure of an amino acid named alanine. An amino group (-NH2), a
carboxylic acid group (-COOH), a side chain (-R, in this case, a methyl group)
and a hydrogen atom are bonded to a central carbon atom (Ca). B) Dihedral
angles (p and yp of alanine dipeptide.









A protein usually has four levels of structure which are called primary structure,

secondary structure, tertiary structure and quaternary structure. The primary structure is

the sequence of amino acids. The folding of a protein is determined by its primary

structure.

Next, the secondary structure (e.g. a-helix, 3-strand, or loop) is the three-

dimensional structure of local segments of a protein. As mentioned earlier, proteins fold

themselves into functional structures after they are formed. After folding, protein

backbones often possess certain types of fold or alignment. The term of secondary

structure is used to describe the three-dimensional structures of such manners. The two

most common secondary structures found in proteins are a-helices and 3-strands.

The local secondary structure of a particular residue in a protein can be described

by a Ramachandran plot which is a two-dimensional histogram (or probability

distribution) of backbone dihedral angle pair (,0p). As demonstrated in Figure 1-1B,

backbones can rotate around the N-Ca and Ca-C bonds, forming dihedral angles 0 and

p. Backbone conformations of a residue can be described by specifying (0,P). Three

main regions are populated in general in a Ramachandran plot, corresponding to the

three main stable conformations a residue has: the right-handed a-helix region near

(4=-57o, 0=-470), the 3-strand region near (0=-1250, p=1500) and the polyproline II

region near (0=-750, p=1450). The most populated region indicates the most stable

conformation of a residue. An example of Ramachandran plot is shown in Figure 1-2.

Furthermore, the tertiary structure is the three-dimensional positions of all atoms in

a protein. The tertiary structures yield information about protein side chains, for

example, salt bridges. Finally, the quaternary structure defines the positions of all atoms











in a protein containing multiple peptide chains, for example, the hemoglobin tetramer. It

is the highest level of protein structures.


150- 'l'

1.6E-3
100- P-sheet PPII 3.2E-3

50- -- 4.8E-3
6.4E-3
0- I 8.0E-3
S, Left-handed
I 9.6E-3
-50 a-helix
-50l
1.1E-2
a-helix 1.3E-2
-100-
1.4E-2
-150- 1.6E-2

-150 -100 -50 0 50 100 150



Figure 1-2. A Ramachandran plot (a contour plot showing the probability density of
(4,ip) pairs) of tyrosine generated from the simulation of a heptapeptide which
will be described later in chapter 3. In this figure, a left-handed a-helix is also
shown.

Proteins perform vital functions, which are important to our lives. Almost all cell

activities depend on proteins. For example, hemoglobin can transport oxygen molecules

from lung to cells;1 many chemical reactions occurring in living organisms are catalyzed

by proteins called enzymes; and proteins are also involved in cell signaling. Mutations in

the proteins, aggregation and misfolding of proteins can cause many diseases. For

example, many cancers result from the mutations in the tumor suppressor p53.2'3 Thus,

understanding protein structures and functions is important.

1.3 lonizable Residues in Proteins and the Effect of pH on Proteins

An ionizable residue in a protein is a residue with a side chain that can donate or

accept proton(s). There are seven ionizable residues: ASP, GLU, HIS, CYS, TYR, LYS

and ARG. Ionizable residues define the acid-base properties of that protein.








Consequently, the solution pH value becomes an important thermodynamic variable

affecting protein structure, dynamics, folding mechanism, and function4. Many biological

phenomena such as protein folding/misfolding,5-8 substrate docking9 and enzyme

catalysis are pH-dependent.10-12

A good example of how pH value affects proteins is the pH-dependence of

enzyme kinetics. Most enzymes possess an optimal pH value, at which the reaction rate

is largest. Enzyme catalysis is pH-dependent because the active sites of enzymes in

general contain important acidic or basic residues. Only one form (acidic or basic) of the

ionizable residue is catalytically active, thus the concentration of the catalytically active

species will affect the kinetics. Consider a simple reaction model (Figure 1-3 and Figure

1-4) to demonstrate how pH value affects enzyme reaction rate. In this model, only the

zwitterion form is active; no intermediate exists for the enzyme reaction and the

protonation-deprotonation steps are faster than catalysis steps. Furthermore, the rate-

determining step does not depend on pH value.


HOOC NH3+ -00C NH3



Enzyme Enzyme





EH E

Figure 1-3. A diagram showing the cartoon representation of an enzyme at low pH
(acidic) and at around the optimal pH value. EH indicates the structure at low
pH and E stands for the zwitterion form, which is the active species in our
model.13









K5
Ks kt
E+S ES E+P

K K2


EH EHS

Figure 1-4. The reaction schemes showing the enzyme reactions at which pH values
are smaller than the optimal pH value. Ks, Ks, K, and K2 are equilibrium
constants of corresponding reactions and kcat is the rate constant of the rate-
determining step. This model can be used to explain how pH value affects
enzyme catalysis in the pH range that is larger than optimal pH.13'14

The equilibrium constants shown in Figure 1-4 are not independent of each others.

The relationship among them is given by:

KsK2 = K,, K1 (1-5)

According to the above equation, if K1 = K2, then the substrate binding will not be

affected by pH value of the solution. If it is not the case, then the binding is pH-

dependent. After applying steady-state approximation to the [ES], the reaction rate can

be written as:

kcat [E]o[S]
v= (1-6)
K,+[S] (1+[H+]/K2)+K,[H+]/K1

where [E]0 is initial concentration of the enzyme and [H+] is the concentration of

hydronium ions. At low pH, increasing the concentration of hydronium ions (pH value

decreases) will decrease the reaction rate. The same kind of model can also be applied

to derive the effect of pH on reaction rate when the pH is higher than optimal. Likewise,

only the zwitterion form is catalytically active. The conclusion is that pH value too high

or too low will lower the enzyme catalytic reaction rate.









Given the importance of the solution pH, knowing the pKa value of an ionizable

residue in a protein is important because it will indicate the average protonation state of

that ionizable residue at a certain pH value. However, the pKa value of an ionizable

residue is highly affected by its protein environment.15'16 Two major factors affect protein

pKa values: one is the desolvation effect and the other is the electrostatic interaction.

Other factors such as hydrogen bonding and structural rearrangement are also able to

affect protein pKa values.

An ionizable side chain in the interior of a protein can have a different pKa value

from the isolated amino acid in solution, which is caused by dehydration effect.17-19 For

example, Asp26 of the thioredoxin, which lies in a deep pocket of the protein, has a pKa

value of 7.517 while the pKa value of a water-exposed aspartic acid is 4.0.20 The Garcia-

Moreno group has been employing site-direct mutagenesis method to study the effect of

desolvation18'19,21-23 and will be described later in this chapter. Their research on the

buried ionizable residues provides a probe of the dielectric constant inside the protein,

which is an important parameter for the pKa prediction on the basis of the Poisson-

Boltzmann equation.

Electrostatic interactions such as salt-bridges are also able to affect pKa values.

For example, His31 and Asp70 form a salt-bridge in the T4 lysozyme.24 The formation of

this salt-bridge shifts the pKa of Asp70 to 0.5 and changes the pKa of His31 to 9.1.

Interestingly, Asp26 in the thioredoxin has been shown to form a salt-bridge with Lys57

when it is in the deprotonated form.25 The formation of a salt-bridge should reduce the

pKa value of Asp26. Therefore, the pKa value of 7.5 is the combined result of

desolvation effect and electrostatic interaction.









Each ionizable residue has its own intrinsic pKa value. The intrinsic pKa value of an

ionizable residue is defined as the pKa value measured when this residue is fully solvent

exposed and is not interacting with any other groups,20 for example, an aspartate

residue with two termini blocked. This kind of dipeptide is often used as reference (or

model) compound in the theoretical calculation of protein pKa values. The intrinsic pKa

values are reported in Table 1-1:

Table 1-1. Intrinsic pKa values of ionizable residues in proteins.26
Residue Name Intrinsic pKa value
ASP 4.0
GLU 4.4
HIS 6.7
CYS 8.0
TYR 9.6
LYS 10.4
ARG 12.0

1.4 Measuring pKa Values of Ionizable Residues

A general way to determining the pKa value of an acid experimentally is through

titration. In experiments, the pH values are measured by a pH meter as a function of the

volume of base added to the solution. Therefore, a titration curve will be obtained

(Figure 1-5A shows an example of titration curve) and the pKa value is the pH value at

which the deprotonated and protonated species have the same concentrations. Another

way of presenting a titration curve is by plotting the fraction of deprotonation

(protonation) vs the pH value. A Hill plot (an example is shown in Figurel-4B), which

can be obtained by plotting log([A-]/[HA]) as a function of pH, is used to study titration

behavior. After fitting to the modified HH equation: pH = pKa + k log( ), the x-

intercept is the pKa value and the slope (k) is the Hill coefficient which reflects

interactions between ionizable residues. The HH equation will be represented as a










straight line in a Hill plot, with a slope of unity. If only one ionizable residue is present in

the system of interest, or an ionizable residue does not couple with other ionizable

residue(s), the HH equation should be reproduced. A non-zero slope reflects statistical

error (random error). Interacting ionizable residues will demonstrate non-HH behavior

and possess non-unity slope in a Hill's plot. When k > 1, we say the proton binding is

positively cooperative which means binding of the first proton will increase the binding

affinity of the other one. When k < 1, the binding of protons is negatively cooperative

which means the binding of one proton will decrease the affinity of the other proton.

w Titrations (Constant-pH MD runs)
0.8 2- Linear Fit, slope=0.89, RZ=1.O
C
S 0.6- 1

0A4 0.
S 2 3 4/ 5 6
T Solution pH
LL 0.2- -1

0.0 -2
2 3 4 5 6
Solution pH A B

Figure 1-5. A) An example of titration curve. B) An example of Hill's plot on the basis of
the titration described in Figure 1-5A. The two plots are generated from
constant-pH MD simulations of an aspartic acid in a pentapeptide.

However, determining pKa value of protein ionizable residues by measuring

solution pH as a function of the volume of base is difficult because there are multiple

ionizable residues in a protein in general. An experimental technique that is site-specific

is preferred.

Nuclear Magnetic Resonance (NMR) is one of the most frequently employed

spectroscopic methods in chemistry, physics and biological science. One application of

the NMR method is to measure pKa values of individual ionizable residues. NMR









spectroscopy measures the absorption of radio-frequency (RF) radiation by a nucleus in

magnetic field. Only a nucleus with a spin quantum number that equals half of an

integer is able to generate NMR signal. Furthermore, the absorption is affected by the

chemical environment around that nucleus. Electron density around a nucleus provides

a shielding effect to the external magnetic field for the nucleus. Thus, different chemical

environment (electron density) around a nucleus will affect its resonance frequency,

resulting in chemical shift. Changes in protonation state are able to result in changes in

the chemical shift of the nuclei around the ionizable site (for example, Cy of Asp, C5 of

Glu, and N6 and N, of His). Subsequently, at a given pH value, the equilibrium between

the protonated and deprotonated species can yield a weighted average chemical shift,


obs = p + l+l1n(PKa-pH) (1-7)

Here Sobs, 6p and AS are the chemical shift observed, chemical shift of the

protonated species, the change in chemical shifts caused by titration, respectively, and

n is the Hill's coefficient. In Eq. 1-7, the HH equation is implied. Therefore, chemical

shifts will be measured at different pH values and a titration curve will be obtained.

Figure 1-6 demonstrates a titration curve generated by NMR spectroscopy.

However, in practice, one-dimensional NMR spectra are often too complicated to

be interpreted for proteins. Introducing a new spectrum dimension will allow the ability to

simplify the spectra and yield more useful information. In two-dimensional NMR

spectroscopy, the sample is excited by one or more pulses in the so-called "preparation

time". Then the resulting magnetization is allowed to evolve for time tl, and the signal is

not recorded during time t,. Following the evolution time, one or more pulses will be









applied to the sample and the resulting signal will be measured as a function of a new

time variable t2.

1H, 13C and 15N NMR are frequently employed in experiments to determine protein

pKa values.14 Proton NMR has shown to be particularly useful in studying histidine pKa

values. It is also employed to study the acid-base equilibrium of tyrosine residues. 13C

NMR experiments can be performed to determine the pKa values of lysine and

aspartate.

182

Asp129
181- --- -----* -
"Asp29 0
I :---* -
s 180- Asp6. &.Asp _..- .-.D -.--


179 .---
S- 3Asp0







S176- Asp25
176


175
2 3 4 5 6 7
pD

Figure 1-6. 13C NMR titration curves of aspartate residues in HIV-1 protease/KNI-272
complex taken from Wang et a/.,1996.27 In this figure, Asp Cy chemical shifts
are plotted as a function of pD. Asp25 and Asp125 do not change protonation
states in this pD range. But isotope shift experiments show that Asp25 is
protonated and Asp125 is deprotonated in this pD range. "Reprinted with
permission from Wang, Y. X.; Freedberg, D. I.; Yamazaki, T.; Wingfield, P. T.;
Stahl, S. J.; Kaufman, J. D.; Kiso, Y.; Torchia, D. A. Biochemistry 1996, 35,
9945-9950. Copyright 1996 American Chemical Society."

One example of measuring the pKa value of an ionizable residue using NMR

technique is the determination of the pKa value of Asp26 in Escherichia coli









thioredoxin.17'25'28-30 NMR method, especially the 2D-NMR technique, has been

intensively employed in the investigations of the pKa value of Asp26. Escherichia coli

thioredoxin has two redox forms. The oxidized form has a disulfide bond linking Cys32

and Cys35, while the two cysteine residues are not bonded in the reduced form. Hence,

the two cysteine residues are ionizable in the reduced form, which makes the

investigations more complicated. Asp26 is located at the bottom of a hydrophobic cavity

near the active site disulfide and is completely buried in the protein. In 1991, Dyson et

al. investigated pH effect on the thioredoxin in the vicinity of active site, using 2D

NMR.28 Both oxidized and reduced thioredoxin have been studied. CaH and COH

chemical shifts of Cys32 and Cys35, and NH, CaH and CH chemical shifts of Asp26 as

a function of pH value have been measured. Those chemical shifts have been found to

titrate with a pKa value of 7.5. Since the cysteine residues in the oxidized thioredoxin

are not ionizable, they proposed that the apparent pKa is the pKa value of Asp26. In the

same year, experiments performed by Langsetmo et al. measured electrophoretic

mobility of the wild-type and D26A mutation of the oxidized thioredoxin, as a function of

pH. A pKa of 7.5 has been obtained from their experiments.17 In 1995, Wilson et al.

measured the chemical shifts of CGH1, CGH2 and C3 atoms of Cys32 and Cys35 using

the reduced form of thioredoxin.30 Both the wild-type and D26A mutation have been

studied. Comparing the titration curves between the wild-type and the D26A mutation, a

titration showing pKa value > 9 has been found missing in the D26A thioredoxin

experiment. Adopting that the cysteine residues in the reduced thioredoxin have pKa

values of 7.1 and 7.9 derived from Raman spectroscopy, they concluded that Asp26

has an apparent pKa of greater than 9. However, their results were challenged by the









pKa determinations of Cys32 and Cys35 in the reduced form of thioredoxin. In 1995,

Jeng et al. studied the titration behaviors of Cys32 and Cys35 in the reduced form of

thioredoxin by 13C NMR experiments.29 Their pKa values were found to be 7.5 and 9.5.

Their pKa values of Cys32 and Cys35 challenged the results obtained by Wilson et al. In

order to elucidate the pKa value of Asp26 in the reduced thioredoxin, Jeng and Dyson

measured the pKa value of Asp26 in 1996 using 2D-NMR.29 The 13C chemical shift of

the carboxylic group, which is bonded to titrating site, as well as the CPH1 and CPH2

proton chemical shifts was measured as a function of pH value. The authors believed

that the pH effect on 13C chemical shift of the carboxylic group should result from

titration due to its close distance to the titrating site. The apparent pKa value obtained

from their experiments has been shown between 7.3 and 7.5, which is the same as the

pKa value of Asp26 in the oxidized form.

Fluorescence spectroscopy can be utilized to determine pKa values as well.

Fluorescence is the emission of light by a substance when it is relaxing from electronic

excited state (Si) to electronic ground state (So). In fluorescence spectroscopy, the

substance is first excited from So to one of many vibrational states of S, by absorbing a

photon. Following the excitation, relaxation to the vibrational ground state S, occurs

through collisions with other molecules. Once in the ground vibrational state of S1, the

substance will return to one of many vibrational states of So by emitting a photon. Since

the substance can return to various vibrational states in the electronic ground state, a

band of emission wavelengths will be observed. The absorption and emission

wavelengths are different (emission photons have a larger wavelength) and the









difference in wavelength is called Stokes shift. The average time the substance stays in

its electronic excited state is called the fluorescence lifetime.

In biophysical chemistry, the tryptophan fluorescence is frequently employed to

study the conformational changes in proteins. In general, tryptophan has a maximal

absorption wavelength of 280 nm31 and maximal emission wavelength of 300~350

nm.32'33 Changes in the environment of a tryptophan residue will affect the emission

wavelength and/or intensity. Furthermore, it has been noticed that tryptophan

fluorescence is sensitive to the polarity of the local environment. One advantage of

tryptophan fluorescence spectroscopy is that the chromophore is intrinsic; no change is

made to the protein.

If the change in protonation state of an ionizable residue affects the spectrum of a

neighboring tryptophan residue, which is the main fluorescent species in a protein, then

fluorescence spectroscopy can be employed to generate a titration curve. Therefore,

the pKa value will be obtained. One example of determining pKa value by fluorescence

spectroscopy is measuring the pKa of Glu35 in HEWL performed by the Imoto group.34

The Trpl08 is in van der Waals contact with Glu35. Changes in protonation state of

Glu35 can induce a large shift in intensity of Trpl08 fluorescence signal.

Another way of obtaining a titration curve is the potentiometric method. The

potentiometric titration measures pH value as a function of the volume of titrant added.

The volume of titrant added at each dosing can be used to calculate moles of hydrogen

ion released from (or bound by) a peptide or protein, and hence number of hydrogen

ions released (or bound) per molecule. Plotting number of hydrogen ions released (or

bound) per molecule as a function of pH will generate a titration curve. By utilizing









potentiometric titration, a titration curve of the entire peptide or protein can be obtained.

The Garcia-Moreno group has been utilizing the potentiometric method, combined with

other experimental techniques and protein pKa calculations, to investigate pKa values of

ionizable residues buried deep in a protein.18,19,21-23 As mentioned earlier in the last

section, protein environment can shift the pKa value of an ionizable residue. In nature, a

small portion of the ionizable residues are buried in the deep pockets of the protein,

inaccessible to water.22,35 Those buried ionizable residues are crucial to the protein

functions such as catalysis,12'36 and ion or electron transport.37'38 Determining and

understanding the pKa values of buried ionizable residues is important for biological

research. The Garcia-Moreno group performed site-directed mutagenesis experiments,

mutating a nonpolar residue which is inaccessible to water to an ionizable residue. The

pKa value of the mutated ionizable residue is determined experimentally and predicted

theoretically. By combining experimental and theoretical determination, the dielectric

effect and electrostatic interactions will be elucidated. One example of the mutagenesis

experiment is mutating Val66 in a "hyperstable variant" of the staphylococcal nuclease

(SNase) to glutamate.19'21 The original and mutated forms of the "hyperstable variant" of

SNase are called PHS and PHS/V66E. The PHS nuclease can be made by mutating

three residues of the wild-type SNase: P117G, H124L, and S128A. Val66 has been

found in the core region of the SNase and inaccessible to aqueous environment. The

potentiometric titrations have been performed on both PHS and PHS/V66E. The

difference between the two titration curves represents the Glu66 titration plus other

titrations affected by the mutation, although it is assumed that the latter effect is not









significant. The difference in hydrogen ions (Avi) bound to PHS and PHSN66E was

fitted to the following equation,

10n(pH-pKa)
1+10n(pH -pKa)
Ai l+lOn(pH-pK,) (1-8)

where n is the Hill's coefficient, pH is the solution pH value, and pKa in this case is the

pKa value of Glu66.

The pH-dependence of PHS and PHSN66E stability was also demonstrated by

the guanidine hydrochloride denaturation free energy profiles. The Trpl40 fluorescence

was recorded as a probe of the denaturation. The difference in denaturation free energy

profiles was also fitted nonlinearly to obtain the pKa value of Glu66. The pKa value of

Glu66 has been determined to be 8.8 from potentiometric titration and 8.5 from the

protein stability study. The pKa shift of 4.4 (on the basis of the potentiometric

measurements, and glutamate has an intrinsic pKa value of 4.4) is among the largest

ones for acidic ionizable residues. Once the experimental pKa value is accurate

obtained, a "reverse pKa prediction" can be performed to investigate the dielectric

constant inside the protein, which is an important parameter in the continuum

electrostatic model and will be explained later this chapter. In fact, the direct

potentiometric measurements were first carried out by the Garcia-Moreno group on

PHS and PHSN66K.18 A pKa value of 6.38 was found for Lys66, while the pKa value of

lysine model compound is 10.4.

Recent site-directed mutagenesis studies on PHS have extended to Leu38.22

Mutations to aspartate, glutamate and lysine were conducted. Similar to their treatment

on Val66 mutations, potentiometric titration and protein denaturation experiments were

conducted to determine pKa values by the Garcia-Moreno group. For the PHS/L38E,









NMR technique was employed to facilitate Glu38 pKa measurement. PHS/L38K has

shown a pKa value close to the intrinsic value of lysine. After mutation, lysine was found

to adjust its side-chain to let water molecules penetrate. However, L38D and L38E have

shown elevated pKa values. Both Asp38 and Glu38 were still inaccessible to water,

although structural rearrangement was also observed. Their pKa values were further

perturbed by electrostatic interactions with surface carboxylic groups. Their

investigations have unveiled how conformational changes, desolvation and electrostatic

interactions affect pKa values.

1.5 Molecular Modeling

Experimental techniques such as spectroscopy are fundamental to the study of

protein structure and function. For example, NMR spectroscopy is frequently employed

in biological science, X-ray crystallography can be applied to resolve protein structures

and circular dichroism (CD) spectrometry is employed to determine the secondary

structure of a protein. However, the advances in computational power combined with

the leap in theory make experiments not the only way to understand biological

molecules.

Molecular modeling offers another way to investigate structures and properties of

biological molecules. It combines theories developed in the fields of physics, chemistry

and biology with the computer resources to simulate the behaviors of molecules.

Results from simulations are often compared to experimental observations in order to

validate the method and understand the behavior of biological molecules from an

atomistic level.









1.6 Potential Energy Surface

Molecules possess more than one stable configuration in general. In principle, all

possible molecular configurations need to be considered in order to simulate a molecule

correctly. A potential energy surface (PES), which is a surface defined by the potential

energies of all possible configurations, can be utilized to fulfill this requirement. The

local minima of a PES indicate stable conformations of a molecule. There are multiple

ways to generate a PES. Quantum mechanical calculations offer the most accurate way

to construct a PES. By solving the Schrodinger Equation, one can obtain energies and

wave function of the molecule. In the field of chemistry, electronic structure theory

utilizes quantum mechanics to describe the motion of electrons, in the framework of

Born-Oppenheimer approximation. The Born-Oppenheimer approximation states that

the electronic relaxation caused by nuclear motion is instantaneous because of the

huge difference in the masses of electrons and nuclei. Thus, electronic motion and

nuclear motion are decoupled. The eigenvalue of the electronic Schrodinger equation at

each nuclear configuration is the potential energy of nuclei at that geometry. Solving

Schrodinger equation at different configurations will yield the PES of a molecule.

However, the cost of electronic structure calculations is very expensive, which hinders

the use of high-level of theory when studying large biological molecules.

Due to the cost of electronic structure methods, an alternative way to describe a

PES is to use a classical mechanical model. One of the commonly used algorithms is

the all-atom force field in which the PES is computed without solving the Schrodinger

equation. In an all-atom force field model, no electrons are present and each atom is

represented by a single particle (in contrast to the united-atom force field model where a

functional group is represented by a particle). Atoms interact with each other via bonded









and non-bonded potential energy terms. Equation 1-9 shows an example of all-atom

force field model that is frequently adopted in the simulations of proteins:
1 1
U(qN) = Zbonds kb(rT )2 + Eangles ka ( 0o)2 + dihedrals n= [1 +


cos(n y)] + iE + I + 4E[ 12 (al6 (1-9)
i=I j + 4cr [rri

The first three summations are bonded terms and they represent interactions of

bond stretching, valence angle bending, and torsions, respectively. In Eq. 1-9, bond

stretching and angle bending are considered by a harmonic potential. The torsion term

is expressed as Fourier series due to the periodic nature of a dihedral angle. The latter

two summation terms are the non-bonded interaction terms. The two components in the

double summation represent electrostatic interactions and van der Waals interactions,

respectively. Electrostatic potential is represented by Coulomb interaction. qj and q, are

partial charges on atom i, and j, respectively. ri is the distance between the two atoms.

In Eq. 1-9, van der Waals interaction is calculated by the Lennard-Jones potential, in

which Eif is the well depth and ocy is the distance when repulsive and attractive

potentials are equal. Solvent effect is also considered when implicit solvent such as the

Generalized Born (GB) model39'40 is adopted (solvent models will be briefly described in

the next chapter). The cost of all-atom force field model is low compared with ab-initio

methods because it utilizes pre-defined parameters when calculating potential energies.

The strategy of generating those parameters is via fitting to experimental data and

quantum mechanical calculations. One must notice that the parameters are often

internally consistent which means parameters of different force fields are in general

non-transferrable. The all-atom force field models are utilized much more frequently









than the quantum mechanical methods when simulating large systems such as proteins.

However, force fields such as Eq. 1-9 do not allow bond breaking or forming. Thus, they

are not able to study reactions. Nowadays, linear scaling techniques in electronic

structure theory are developed in order to fill the gap between force fields and the high

accuracy ab-initio methods.41'42 One example of the linear scaling algorithm is the

DivCon program developed by the Merz group.43

The balance between computational accuracy and cost is the main theme in the

computational chemistry.44 One category of schemes attempting to achieve this balance

is the so-called hybrid quantum mechanical molecular mechanical (QM/MM)

methods.41'45-47 The basic idea of the QM/MM methods is that different regions of a

system may play different roles. For example, if one wants to study an enzymatic

reaction, the potential energy calculation involving the active site should be done by a

quantum mechanical model because the classical force field is not able to describe

bond forming/breaking. On the other hand, the bulk water (assuming no water molecule

participates in enzymatic reaction) and the protein environment of the enzyme can be

represented by the force field in order to save simulation time. In the QM/MM

methodology, different regions of a system are treated by different level of theory and

interact with each other. The QM/MM approaches have become a key area in the

simulation of proteins.48'49

1.7 Molecular Dynamics, Monte Carlo Methods and Ergodicity

Accurately simulating the behavior of a molecule requires more than knowing the

PES. A molecule often has more than one minimum on the PES. Finding the correct

probability distribution of molecular conformations is also important because the

majority of experiments measure molecular properties as averages over molecular









structures. Sampling algorithms such as molecular dynamics (MD) and the Metropolis

Monte Carlo (MC) method are crucial to molecular modeling.

For a system containing N number of particles, there are 6N degrees of freedom

(DOF). Half of the DOF comes from coordinates and the other half represents the

momentum of all particles. The 6N-dimensional space defined by those DOF is called

the phase space. Both MD and MC methods sample the molecular phase space. Over

time, the system will generate a trajectory in the phase space.

MD utilizes the equation of motion to propagate a system in the phase space (The

details of molecular dynamics will be presented in the next chapter). Each particle in the

system has velocity and position and Newton's second law (Eq. 1-10) is applied to

control the dynamics:

F = m a = -VU (1-10)

The force on any particle in the system is given by the negative gradient of the

potential energy. The equation of motion is usually solved numerically. By propagating

the equation of motion, the phase space will be explored and a probability distribution

for DOFs will be obtained. Therefore, molecular properties are able to be computed by

averaging over times:

(A)MD =li too =0 A(ti) (1-11)

In Eq. 1-11, A is the property of interest. t is the total simulation time. N is the size

of the sample taken during the entire simulation. The bracket stands for taking average.

A(ti) is the value of A at time ti in the simulation.

In contrast to MD, the Metropolis MC method (from now on, we will call the

Metropolis MC method as MC method unless otherwise mentioned) does not utilize the









equation of motion. MC method samples the phase space through a Markov chain (the

details of Monte Carlo method will be presented in the next chapter). In MC algorithm, a

new state (for example, a new molecular configuration) is randomly selected and the

transition probability relationship between the current state and the new state is

calculated by the detailed balance equation. Then a Metropolis criterion50 is applied to

accept or reject the transition to the new state. The Markov chain can be applied

because the system is assumed to be at equilibrium. Likewise, after a sufficient number

of transitions, the phase space will be explored and molecular properties can be

computed by averaging over ensemble:

(A)Mc = JA() p(x)d2 (1-12)

Here A(2) is the value of A in state x. p(2) is the normalized probability density of

state x.

The MD and the MC methods represent two different ways of sampling phase

space and computing average molecular properties. According to the ergodic

hypothesis, the time average is equal to the ensemble average:

(A)MC = J A(2)p(x)dx = limrtoo Et=oA(ti) = (A)MD (1-13)

The ergodic hypothesis is often assumed to be true in molecular simulations. This

hypothesis makes MD and MC methods equivalent in sampling phase space. If the

system is ergodic, the phase spaces generated by MD and MC should be the same

because the phase space does not depend on sampling technique. The same behavior

should also extend to any observable properties.

Conformational sampling in a MD or MC simulation is essential in the study of

complex systems such as polymers and proteins. One major concern is that the PES of









a complex system is very rugged and contains a lot of local energy minima.51 Thus,

kinetic trapping would occur as a result of the low rate of potential energy barrier

crossing, especially when the barrier is high. In order to overcome this kinetic trapping

behavior, generalized ensemble methods (advanced sampling methods)52'53 are

frequently employed in molecular simulations. Popular generalized ensemble methods

include multicanonical algorithm,54'55 simulated tempering method,56'57 parallel

tempering method,5860 and replica exchange molecular dynamics (REMD) method.61'62

A more thorough description of MD, MC and the advanced sampling methods will be

presented in the next chapter.

1.8 Theoretical Protein Titration Curves and pKa Calculations Using Poisson-
Boltzmann Equation

Studying protein titration curves theoretically has a long history. As early as 1957,

Tanford and Kirkwood presented their study of protein titration curve.63 In their model,

proteins were considered to be low-dielectric spheres with discrete unit charges on

ionizable residues. They proposed that the pKa value of an ionizable residue can be

calculated from its intrinsic pKa value and pair-wise electrostatic interactions with other

ionizable residues. Calculating the pair-wise electrostatic interactions involves using

empirical parameters. A protein titration curve showing average charge as a function of

pH value was plotted. The Tanford-Kirkwood model was further extended and utilized to

study lysozyme by Tanford and Roxby.64 The equations used to generate a titration

curve in the Tanford and Roxby paper were the same as those Tanford and Kirkwood

used. However, they employed an iterative approach to generate titration curves and

pKa values for all ionizable residues. In their approach, each ionizable residue was

initially assigned a pKa value that is equal to its intrinsic value. At a given pH, the









average charge on each site (representing fraction of deprotonation/protonation) can be

computed. Those average charges were then employed to update pKa values. This

process was repeated until self-consistent average charge and pKa value of a site was

obtained. Therefore, a titration curve can be produced by plotting average charge as a

function of pH value.

In 1990, Bashford and Karplus utilized the finite difference Poisson-Boltzmann

(FDPB) equation in the calculation of pKa values.65 A detailed description of the FDPB

method will be present in the next chapter. The pKa shift of an ionizable residue relative

to a model compound is calculated (in their paper, intrinsic pKa is a quantity defined as

the pKa value of an ionizable residue when other sites are neutral, that is, no

interactions between ionizable sites). Given a molecular configuration, three terms are

calculated by FDPB equation for each ionizable site: the Born solvation free energy, the

pair-wise electrostatic interactions with non-ionizable residues (represented by partial

charges), and the pair-wise electrostatic interactions between ionizable sites. Summing

the three terms yields the electrostatic work of charging the ionizable side-chain, and

hence yields the pKa shift.

A protein titration curve is represented by plotting fraction of protonation vs pH

value. Considering a protein with N ionizable sites and each site can have two states

(protonated and deprotonated), there are 2N possible macro-states and each macro-

state can be represented by an N-dimensional vector. Once the FDPB equation is

solved, free energy differences of each vector relative to completely deprotonated are

computed. Thus, the fraction of protonation of an ionizable site can be calculated by

taking the Boltzmann weighted average of the 2N macro-states.









The FDPB method forms the foundation of the continuum electrostatic (CE)

models, which are frequently utilized when studying protein pKa values.16'65-71 The

FDPB method has been implemented into many modeling software packages such as

UHBD72 and DELPHI.73 Many modifications have been done to improve its

performance. In 1991, Beroza et al. employed the Metropolis MC method to sample 2N

numbers of protonation states, instead of calculating the protonation fraction at a given

pH value directly.74 After using MC sampling of protonation states, the number of

ionizable residues included in the simulation can increase dramatically.

Solving the FDPB equation requires the dielectric constant in a protein as an input

parameter and the dielectric constant is very important because the electrostatic energy

is inversely proportional to it. It is considered as the most important adjustable

parameter in FDPB-based pKa calculations.16 Thus, one question arisen from utilizing

FDPB method is how to choose dielectric constant for proteins. The values between 4

and 20 are typically adopted in the FDPB calculations.67 Direct experimental

determination of the interior dielectric constant is extremely difficult. In practice, the

protein dielectric constants are measured utilizing protein powders, which will cause

problems in interpreting the resulting dielectric constants.18,75,76 Research has been

performed to find an optimal interior dielectric constant for protein pKa predictions.

However, considering the difference in protein environment, no single dielectric constant

can yield experimental pKa values for both internal and surface residues in a protein.77

In 1996, Simonson and Brooks studied charge screening effect and protein dielectric

constant by MD simulations.78 What they found was that protein dielectric constant can

range from ~4 in the interior of protein to a much higher value (~30) in the region near









the surface. As mentioned in section 1.4, the Garcia-Moreno group conducted site-

directed mutagenesis experiments in the deep pocket of a protein where water is

inaccessible and measured the pKa value of mutated ionizable residue.18,19,21-23,77 Then,

the experimental pKa value was put back into FDPB equation in order to examine

protein interior dielectric constant. The protein interior dielectric constants were found to

be ~11.18 Mehler and his co-worker employed a sigmoidal screened electrostatic

interaction to treat the protein dielectric environment.79'80 Their method had been

applied to Glu35 and Asp66 in hen egg white lysozyme and had obtained satisfactory

results.80

Another problem in the FDPB-based pKa calculation is that the FDPB equation is

often solved on the basis of one structure such as X-ray crystal structure. The entropic

effect is missing when a single structure is used. To improve the performance of the CE

model in pKa calculations, protein conformational sampling is also considered in order to

incorporate conformational flexibility into pKa calculations.81-86 In the 1990s, You and

Bashford developed an algorithm in which 36 side-chain conformations of ionizable

residues are adopted in the calculation of pKa values.86 In 1997, Alex and Gunner

proposed to use Monte Carlo method to sample (2M)NLK possible states instead of just

2N protonation states.81 Here N is the number of ionizable residues and each one can

have M possible conformations. Furthermore, each one of the K non-ionizable residue

possesses L number of possible conformations. The Gunner group further extends this

algorithm to the so-called multiconformation continuum electrostatic method (MCCE).83

Recently, Barth et al. proposed a rotamer repacking technique combined with FDPB

method and was given the name FDPB_MF.82 In the FDPB_MF method, the









conformational space of side-chain of ionizable residues was defined by a rotamer

probability distribution. Each rotamer was given a weight and was interacting with other

ionizable residues in a mean-field scheme.

1.9 Computing pKa Values by Free Energy Calculations

MD-based free energy (MDFE) calculations87'88 have also been employed to

predict pKa values. MDFE calculations combine free energy calculation algorithms with

MD propagations. MD propagations sample phase space and generate a

conformational ensemble. Free energy calculation methods calculate the free energy

difference between two states on the basis of the phase space sampled by MD. Free

energy perturbation (FEP) and thermodynamic integration (TI) are two frequently

employed free energy calculation methods and will be explained with more details in the

next chapter. Free energy calculation algorithms such as FEP and TI methods can be

used to compute pKa because Ka is associated with the free energy of reaction.

Early pKa calculations utilizing free energy calculations were conducted by the

Warshel et al.,89'90 Jorgensen et al.,91 and Merz92 with the FEP method and classical

force fields. In the 1980s, Warshel et al. proposed a protein dipole Langevin dipole

(PDLD) model for the pKa calculations.90 In the PDLD model, proteins were treated as

particles having partial charges and polarizable dipoles, while the solvent molecules

nearby were viewed as Langevin dipoles. The bulk water that is far away from ionizable

residues was still treated as dielectric continuum. Electrostatic interactions between

charges and dipoles, and dipoles and dipoles were computed.

Jorgensen et al. combined ab-initio quantum mechanical calculations and classical

FEP calculations in 1989.91 Jorgensen et al. calculated the pKa difference between two

acids, AH and BH. The gas-phase dissociation free energy of AH and BH were








computed by quantum mechanical methods. The solvation free energy calculations

were conducted using MC FEP method for the neutral molecules and the anions. One

shortcoming of their calculations is that only small organic molecules were investigated

due to the computational cost of quantum mechanical methods.

In 1991, Merz performed classical FEP calculations for three glutamate residues in

two proteins (HEWL and human carbonic anhydrease II).92 The glutamate dipeptide was

utilized as a model compound to eliminate the gas-phase dissociation free energy

calculations.

When MDFE calculations utilizing the classical force fields are performed,

quantum effects such as bond forming/breaking cannot be simulated. Thus, the pKa

shift of an ionizable residue relative to its intrinsic pKa value (pKa value of the reference

compound which is defined in section 1.3 of this dissertation) is computed by the free

energy calculations. A diagrammatic explanation of pKa shift calculation utilizing the

MDFE method is demonstrated in Figure 1-7 and Figure 1-8.

Model
AH > A- + H*


AG1 AG2 AG3


AGprotein
Protein-AH > Proetin-A- + H*

Figure 1-7. Thermodynamic cycle used to compute pKa shift. Both acid dissociation
reactions occur in aqueous solution. A thermodynamic cycle is a series of
thermodynamic processes that eventually returning to the initial state. A state
function, such as reaction free energy in this case, is path-independent and
hence, unchanged through a cyclic process.









G model(AH->A-)
AH > A-


AGI AG2



Protein-AH N Proetin-A-

AGprein(proteinAH->proteinA-)

Figure 1-8. Thermodynamic cycle utilized to calculate the difference between AG1 and
AG2. In Figure 1-7 and Figure 1-8, protein-AH represents the ionizable
residue in protein environment. AH represents the reference compound which
is usually the ionizable residue with two termini capped. In practice, a proton
does not disappear but instead becomes a dummy atom. The proton has its
position and velocity. The bonded interactions involving the proton are still
effective. However, there is no non-bonded interaction for that proton. The
change in protonation state is reflected by changes of partial charges in the
ionizable residue.

Equations 1-14 to 1-20 explain how pKa values will be computed from free energy

calculations using force fields:
1
PKa,protein 2.303kT AGprotein (1-14)

1
pKa,model 2303kT Gmodel (1-15)

In Eq. 1-14 and 1-15, AGprotein and AGmodel are the acid dissociation reaction free

energy of the ionizable residue in protein and the reference compound, respectively.

Therefore, the pKa shift between ionizable residue in protein environment and the

reference compound can be calculated as pKa,protein PKa,model 2.33k (AGprotein -

AGmode ). According to the thermodynamic cycle shown in Figurel-6A, (AGproten -

AGmodel) = -AG1 + AG2 + AG3. Here, AGI and AG2 are the free energy difference

between two protonated species, and between two deprotonated species, respectively.









AG3 is equal to zero because the free energy difference between two protons that are in

the same environment is zero. However, calculating AGI and AG2 directly utilizing MDFE

calculations is not preferable because the difference between the reference compound

and the protein system is very large. A simple way to determine the difference between

AGI and AG2 is needed. Therefore, the thermodynamic cycle shown in Figurel-6B is

employed. By utilizing that thermodynamic cycle, (-AG1 + AG2) can be expressed as

(AG(proteinAH proteinA-) AG(AH A-)), where AG(proteinAH proteinA-)

and AG(AH A-) are the free energy difference between the protonated and

deprotonated ionizable residue in protein and the reference compound, respectively.

AG(proteinAH proteinA-) and AG(AH A-) can be further expressed as:

AG(proteinAH proteinA-) =

AGQM(proteinAH proteinA-) + AGMM(proteinAH proteinA-) (1-16)

And

AG(AH A-) = AGQM(AH A-) + AGMM(AH A-) (1-17)

In Eq. 1-16 and Eq. 1-17, the MM in the subscripts stands for the free energy

differences which are calculated by classical force fields. The quantum mechanical

contributions (labeled by QM in the subscripts) to the free energy difference of an

ionizable residue in protein environment and its reference compound are assumed to be

the same:

AGQM(proteinAH -< proteinA-) = AGQM(AH A-) (1-18)

Combining all derivations and assumption, the difference between two acid

dissociation reaction free energies can be written as:

AGprotein AGmodel = AGM(proteinAH proteinA-) AGMM(AH A-) (1-19)









Thus, subtracting Eq. 1-15 from Eq. 1-14 yields:


pKaprotein = pKamode 2.303T (AGMM(proteinAH proteinA-) AGMM(AH -

A-)) (1-20)

AGMM(proteinAH -> proteinA-) and AGMM(AH A-) are are computed by MDFE

calculations (for example, TI). A more detailed description of the MDFE methodology

and how to compute AG(proteinAH proteinA-) and AG(AH A-) will be explained

in the next chapter.

An example of using classical force field MDFE calculations to study pKa values is

given by Simonson et al.15 The pKa values of Asp20 (experimental pKa of 2, which is

lower than the intrinsic Asp pKa value), Asp26 (experimental pKa of 7.5) in thioredonxin,

and Asp14 (with an experimental pKa around 4) in ribonuclease A were evaluated by TI

calculations. The aspartate dipeptide was taken as the model compound; both explicit

and implicit water models were used in their simulations. Proton dissociation was

represented by changes in the partial charges of carboxylic group only. The free energy

change caused by the disappearance of the proton van der Waals interaction was not

considered because the van der Waals radius of the proton in aspartate is zero in the

AMBER force field. Correct protonation free energies have been obtained. Entropic and

enthalpic effects are also correctly obtained. However, several problems have also been

found with the MDFE-based pKa calculations. For example, interactions between

ionizable sites are not able to be incorporated directly. Furthermore, their free energy

differences have shown dependence on the force fields and solvation models.

Hybrid quantum mechanical/molecular mechanical (QM/MM) methods can be

coupled with free energy calculation simulations.48'93 Recently, the Cui group has









conducted pKa calculations using FEP calculations coupled with SCC-DFTB

method.94'95 A detailed description of QM/MM free energy calculations of pKa values can

be found in a recent review by Kamerlin et al.48

1.10 pKa Prediction Using Empirical Methods

Empirical models are also employed to study protein pKa values. According to Lee

and Crippen,16 the seemingly most accepted empirical method is PROPKA which is

developed by the Jensen group.96-101 The PROPKA method involves using 30

parameters obtained from 314 residues in 44 proteins. QM calculations and the

effective fragment potential (EFP) method,102'103 which is a QM/MM method, are

employed to generate those parameters. In the PROPKA method, a pKa value is

calculated by adding "perturbations" to its intrinsic pKa values. Three types of

perturbations are considered: the hydrogen bonding, desolvation effect and charge-

charge interactions. A detailed description of the PROPKA method can be found in a

review by Jensen et al.97

1.11 Constant-pH Molecular Dynamics (Constant-pH MD) Methods

Traditionally, MD simulations have been performed in a manner of constant

protonation state. The protonation state of an ionizable residue is assigned before a MD

simulation is started. Moreover, the protonation states are not allowed to change during

MD propagations. Performing constant protonation state MD simulations requires

knowing the pKa values of all ionizable residues beforehand. Not knowing the pKa value

may result in wrong assignment of protonation state. In addition, if pKa values are near

the solution pH values, constant protonation state MD simulations are not able to reflect

this situation. More importantly, constant protonation state MD simulations cannot be

employed to study the coupling between conformations and protonation states. Thus,









constant-pH MD algorithms were developed in order to correlate protein conformation

and protonation state.104 The purpose of constant-pH MD is to describe protonation

equilibrium correctly at a given pH value. Therefore, its applications include pKa

predictions and studying pH effects. One category of constant-pH MD methods uses a

continuous protonation parameter.105-115 Earlier models include a grand canonical MD

algorithm developed by Mertz and Pettitt in 1994115 and a method introduced by

Baptista et al. in 1997.106 In the Mertz and Pettitt model, protons are allowed to be

exchanged between a titratable side chain and water molecules. Baptista et al. used a

potential of mean force to treat protonation and conformation simultaneously. Later,

Borjesson and HCnenberger developed a continuous protonation variable model in

which the protonation fraction is adjusted by weak coupling to a proton bath, using an

explicit solvent.107'108 More recently, the continuous protonation state model has been

further developed by the Brooks group.109-114 They developed a constant-pH MD

algorithm by the name of continuous constant-pH molecular dynamics (CPHMD). In the

CPHMD method, Lee et al.114 applied A-dynamics116 to the protonation coordinate and

used the Generalized Born (GB)40,117 implicit solvent model. They chose a A variable to

control protonation fraction and introduced an artificial potential barrier between

protonated and deprotonated states. The potential is a biasing potential to increase the

residency time close to protonation/deprotonation states and it centered at half way of

titration (A=1/2). The CPHMD method was then extended by incorporating improved GB

model and REMD algorithm for better sampling. The applications of CPHMD and replica

exchange CPHMD included predicting pKa values of various proteins,110'114 studying









proton tautomerism109 and pH-dependent protein dynamics such as folding112'113 and

aggregation.111

In addition to continuous protonation state models, discrete protonation state

methods have also been developed to study pH-dependence of protein structure and

dynamics.118-131 The discrete protonation state models utilize a hybrid molecular

dynamics and Monte Carlo (hybrid MD/MC) method. Protein conformations are sampled

by molecular dynamics and protonation states are sampled using a Monte Carlo

scheme periodically during a MD simulation. A new protonation state is selected after a

user-defined number of MD steps and the free energy difference between the old and

the new state is calculated. The Metropolis criterion is used to accept or reject the

protonation change. Various solvent models and protonation state energy algorithms

were used in discrete protonation state constant pH MD simulations.

Burgi et al.130 presented their constant-pH MD method using discrete protonation

state model and applied it to hen egg white lysozyme (HEWL). The lysozyme was

dissolved into explicit water. Short TI calculations (20 ps of dynamics) were carried out

to provide classical free energy difference between old and new protonation states at

each MC attempt. The MC move is evaluated based on the following free energy

difference:

AG = kBT In 10 (pH pKa,ref) + AGprot,MM AGref,MM (1-21)

In the above equation, pH is a parameter and represent the pH value of the

solution, pKa,ref is the pKa value of the model compound (reference compound),

AGprot,MM and AGref,MMis the classical force field proton dissociation free energy given

by TI for the protein and reference compound, respectively. One pitfall of the method









developed by Burgi et al. is the choice of simulation time of TI. The 20 ps TI calculation

represents neither single-structure protonation free energy nor an average of the entire

ensemble.

The Baptista group proposed their constant-pH MD method using the FDPB

method to calculate protonation energies and their MD was done in explicit

solvent.118,123-126 The MD propagations are conducted at fixed protonation states. The

MC moves in the protonation states are performed at fixed molecular configurations.

The MD propagation is able to generate a conditional PDF of coordinates and moment

given protonation states, while the MC sampling is able to yield a conditional PDF of

protonation states given molecular configurations. Baptista et al. proved that the hybrid

MD and MC method is able to generate an ergodic Markov chain.118 Hence, conditional

probability distributions yielded by MD and MC are able to generate a joint probability

distribution satisfying semigrand canonical ensemble. The work done by Baptista et al.

provides the theoretical justification for combined MD and MC sampling in the discrete

protonation state constant-pH methods. In practice, MD simulations are conducted in

explicit water to sample conformational space. A new protonation state is selected and

the free energy difference is calculated using the structure at that moment and the

continuum electrostatic model. The MC transition is evaluated and if the move is

accepted, a short MD run is performed to relax the solvent. After solvent relaxation, MD

steps continue for solute and solvent. The Baptista group applied their constant-pH MD

method to the study of protonation-conformation coupling effect,123 the pH-dependent

conformation states of kyotorphin,124 pKa predictions of the HEWL125 and the redox

titration of cytochrome c3.126









Walczak and Antosiewicz also employed the FDPB method to determine

protonation energy but they used Langevin Dynamics to propagate coordinates

between MC steps.128 This method is further extended by Dlugosz and Antosiewicz.119-
122,128 The extended method combines conventional MD simulation using the analytical

continuum electrostatic (ACE)132 scheme to sample conformations with the FDPB

method for the MC moves. Succinic acid119 and a heptapeptide derived from ovomucoid

third domain (OMTKY3)122 have been studied by Dlugosz and Antosiewicz. This

heptapeptide corresponds to residues 26-32 of OMTKY3 and has the sequence of

acetyl-Ser-Asp-Asn-Lys-Thr-Tyr-Gly-methylamine. Nuclear magnetic resonance (NMR)

experiments indicated the pKa of Asp is 3.6,122 0.4 pKa unit lower than the value of

blocked Asp dipeptide. In their studies, the conventional molecular dynamics (MD)

simulations were carried out to sample peptide conformations. Their method predicted

the pKa to be 4.24.

Mongan et al. developed a method combining the GB model and the discrete

protonation state model and implemented it into the AMBER simulation suite.127 In

Mongan's method, the GB model was used in protonation state transition energy as well

as solvation free energy calculations. Therefore, solvent models in conformational and

protonation state sampling are consistent and the computational cost is small. More

recently, the accelerated molecular dynamics (AMD)133'134 method was combined with

Mongan's constant-pH algorithm to enhance conformational sampling.129 This model

has been utilized to calculate pKa values of an enzyme and to explore the protonation-

conformation coupling. The continuous protonation state model developed by the









Brooks group, the discrete protonation state model proposed by Baptista et al. and by

Mongan et al. will be further explained in chapter 2.









CHAPTER 2
THEORY AND METHODS IN MOLECULAR MODELING

Molecular Modeling or molecular simulation is a way to study molecules using

theories developed in the fields of physics, chemistry and biology coupled with the

computer resources. With the development of computer power and parallel

computation, molecular modeling is more and more often involved in the research of

biology, chemistry and physics.42 Understanding the underlying theory and methods of

molecular modeling is necessary in order to perform simulations and analyze the data

generated. In this chapter, the basic theory and methods of constant-pH replica

exchange molecular dynamics method and protein pKa calculations methods are

described.

2.1 Potential Energy Functions and Classical Force Fields

2.1.1 Potential Energy Surface

Molecular modeling studies molecules, which in general possess more than one

configuration for a chemical formula in general. In principle, all possible molecular

configurations need to be considered in order to simulate a molecule correctly. A

potential energy surface (PES), which is a surface defined by the potential energies of

all possible configurations, can be utilized to fulfill this requirement. The concept of PES

is a result of the Born-Oppenheimer approximation. The Born-Oppenheimer

approximation states that the electronic relaxation caused by nuclear motion is

instantaneous because of the huge difference in the masses of electrons and nuclei.

Thus, electronic motion and nuclear motion are decoupled. Electronic energy, which is

computed at a fixed nuclear geometry (molecular structure), is the potential energy of

nuclei at that structure. Local minima on the PES indicate stable conformations of a









molecule. Quantum mechanics forms the foundation of understanding the molecular

behaviors and offers the most accurate way to construct a PES. Ideally, the Schrodinger

equation is solved for electronic energy at all possible nuclear configurations and hence,

yields the PES of a molecule.

2.1.2 Force Field Models

Although quantum mechanical calculations generate very accurate energies,

performing a molecular simulation using quantum mechanical method is too time-

consuming even through the use of parallel computation, especially for large systems

such as polymers and proteins. Force field (equivalent to molecular mechanics) models

have been designed to solve this problem. Force field models ignore electrons and

calculate the potential energy of a system based on nuclear geometry only. Force field

calculations are fast because the potential energy functions are simple and

parameterized.

In a force field model, the potential energy of a system has the following

contributions in general: bond stretching (vibration), angle bending, bond rotation

(torsion), electrostatic interaction, and the van der Waals interaction. The former three

contributions are often called the bonded interactions and the last two belong to non-

bonded interactions.

In many force field models, such as the AMBER force field,135 bond stretching

energy between atoms i and j is the second order truncation of the Taylor expansion of

potential energy function about equilibrium distance and hence, can be formulated as a

harmonic potential:

Ubond = kj(rij rij, )2 (2-1)









where ky is the force constant, ri is the distance between two atoms and ri,eq is the

equilibrium distance between the two atoms. One drawback of this function is that a

bond cannot be broken and has infinite energy when two atoms are infinitely apart.

Therefore, such a potential energy can be applied to bond stretching near equilibrium

distance only. A simplest remedy is to include higher order Taylor expansion terms but

this increases the computation time. For example, expansions up to the fourth-order are

adopted in the general organic force field MM3.136 This Taylor expansion strategy is

also employed in deriving angle-bending potential functions. Torsions (or dihedral

angles) are periodic and hence, Fourier series is adopted as torsion potential energy

function. One example of the formula of torsion potential energy is displayed in Eq. 1-9.

The van der Waals interaction in a force field model should be able to reproduce

the repulsion and attraction between two particles having no permanent charges. This

attractive interaction is generally called dispersion. Quantum mechanics indicates that

the dispersion energy is inversely proportional to the sixth-power of the distance

between two particles (say atoms) i and j (under the dipole-dipole interaction

approximation):137

Udispersion (2-2)
rij6

where by is a constant specific to i and j and ry is the distance between i and j. There

is no theoretical derivation for the repulsive interaction. However, for computational

simplicity, the repulsive energy is taken to be inversely proportional to the twelfth-power

of the distance. A simple way to combine repulsive and attractive potentials is just

adding up the two potentials. Thus, van der Waals interaction is governed by the

Lennard-Jones potential shown in Eq. 1-9. Due to the fact that van der Waals









interaction decays very fast as a function of inter-particle distance, it is often called

"short-range interaction".

Electrostatic interaction is often considered as the "long-range interaction". The

simplest model of electrostatic interaction is the point-charge model which is adopted in

the AMBER force field. Partial charges are assigned to each atom and Coulomb's law is

applied to calculating interaction energy. More complicated models such as calculating

electrostatic energy through dipole moment-dipole moment interaction have also been

employed.137

Bond, angle and torsion interactions are coupled. Thus, the coupling effects

(cross terms) should be incorporated into force fields. Mathematically, cross terms are

generated from multi-dimensional Taylor expansions. For example, the angle-bending

accompanied by two bond-stretching motions (shown in Figure 2-1) is formulated to be

(as in MM3):

Bond -ngle ijk [( rij,eq) + (rik rik,eq )](ijk Oijk,eq) (2-3)








] (k
Stretching-Bending
Coupling
(Cross Term)

Figure 2-1. A diagram showing bond-stretching coupled with angle-bending. A cross
term calculating coupling energy is adopted when evaluating the total
potential energy.

The force field is simply a function and corresponding parameters. Thus, obtaining

parameters is crucial for force field development. Given a potential energy function,









parameters are required to reproduce experimental data or quantum mechanical

calculation results as much as possible.

2.1.3 Protein Force Field Models

Computer simulations of biological molecules often involve thousands of atoms or

even more,138 especially when using explicit solvent models. Many simulations on

proteins choose to use force fields to reduce computational cost. Popular protein force

fields include (but are not limited to) AMBER99SB,139 CHARMM22,140 GROMOS96,141

and OPLS force fields.142 In general, a simple potential energy function like Eq.1-9 is

employed in the protein force fields. Protein force field parameters are in general

optimized on the basis of small molecules. Take the AMBER force field (Eq. 1-9) as an

example; there are bonded and non-bonded terms in it. In the non-bonded terms, the

partial charges are fitted to quantum mechanical calculation using Hartree-Fock/6-31G*

level of theory in vacuum. This level of theory typically overestimates dipole moment,

and hence the resulting partial charges can satisfactorily approximate the condensed-

phase charge distribution. The Lennard-Jones parameters have been obtained from

reproducing liquid properties following the work of Jorgensen et al.142 After the partial

charges are assigned, the Lennard-Jones parameters are fitted to reproduce

experimental data such as heat capacity, liquid density, and the heat of vaporization.

The bond stretching and angle bending parameters are derived by fitting to

structural and vibrational experimental data of small molecules that make up proteins.

The bond and angle parameters should ensure that the geometries of simple protein

fragments are close to experimental data. The torsion dihedrall angle) parameters can

be obtained from quantum mechanical conformational energy calculations. Determining

torsion parameters is often the last step of force field parameter optimizations. Given









the previous obtained individual energy term parameter sets, the torsion parameters are

adjusted to best fit quantum mechanical conformational energies, for example, the

Ramachandran plot of a model compound. Detailed description of the protein force field

parameter determinations can be found in the paper of Cornell et a.,143 MacKerell et

al.,140 and Hornak et al.139

2.2 Molecular Dynamics (MD) Method

2.2.1 MD Integrator

As mentioned in the introduction, MD samples the phase space utilizing the

equation of motion. A trajectory in the phase space will be generated over time. The

ergodic hypothesis is assumed to be true, that is, the time average of any property at

equilibrium is equivalent to the ensemble average. Thus, given a set of initial positions

and moment and a method to compute forces, a MD simulation can be applied to any

system. For a simple system such as a harmonic oscillator moving along one axis, there

exists an analytical solution of the trajectory (the coordinate and momentum as a

function of time can be expressed analytically). However, it's almost impossible to know

the analytical solution of complex systems such as polymers or proteins. Therefore,

numerical integrators are implemented to propagate positions and velocities of particles.

One of the frequently used integrator is the leap-frog algorithm:41'144

(t + At) = q(t) + ( t +I At) At (2-4)

v(t+ t) = (t At)+ a(t)t (2-5)

(t) =- F(t) VU(t) (2-6)
a(t>)=- (2-6)
m m

Here, q and v stand for the position and velocity of a particle respectively; a(t), F(t)

and U(t) represent the acceleration, the force and the potential energy at time t; and At









is the time step used in MD simulation. One frequently employed potential energy

function is the force field model introduced in the previous section. According to Eq. 2-4,

2-5 and 2-6, the leapfrog algorithm propagates positions and velocities in a coupled

way. The velocity at time t can be calculated by velocities at t + At and t At by the

following equation:

v(t) = [vt + At) + v(t At) (2-7)

One important issue in the MD propagation is choosing a proper time step that

optimizes speed of propagation and accuracy of the simulation. A too small time step

will waste simulation time in sampling the same conformation, whereas a too large time

step can bring two atoms too close and hence cause instability of the trajectory. In

general, a time step is a tenth of the period of fastest motion. In biological molecules,

the fastest motion is the bond stretching and bonds with hydrogen atoms in particular.

Thus, one way to increase time step without reducing accuracy is to remove the degree

of freedom having highest frequency. One commonly employed algorithm to achieve

this goal is the SHAKE algorithm.145 When using the SHAKE algorithm to remove

heavy-atom-to-hydrogen DOF, the heavy-atom-to-hydrogen bond length is fixed. The

fixed bond lengths act as distance constraints between heavy and hydrogen atoms.

Lagrangian multipliers have been utilized to keep the bond lengths constant. By

employing the SHAKE algorithm, a large time step such as 2 fs could be used. Methods

that can integrate the equation of motion more efficiently are popular area of research.

2.2.2 Thermostats in MD Simulations

Before describing thermostats in MD simulations, the concept of thermodynamic

ensemble (statistical ensemble) should be introduced first. An ensemble is a large









amount of replicas of the system of interest (it may contain infinite number of replicas).

All replicas in an ensemble are considered at once. Each replica represents the system

in one possible state. Thermodynamic ensembles are characterized by macroscopic

thermodynamic properties. Several frequently employed thermodynamic ensembles are

microcanonical ensemble (NVE ensemble), canonical ensemble (NVT ensemble),

isothermal-isobaric ensemble (NPT ensemble), and grand canonical ensemble.

MD simulations are controlled by Newton's second law. This makes a MD

simulation conserve the total energy and represent a system in the microcanonical

(NVE) ensemble, where number of particles (N), volume (V), and total energy (E) are

constant. However, our system of interest is in the canonical (NVT) ensemble, in which

number of particles (N), volume (V), and temperature (T) are constant. Therefore,

maintaining a constant temperature in a MD simulation is necessary. Any algorithm that

can maintain constant temperature and approximate the NVT ensemble is called a

thermostat. Popular thermostats include Berendsen thermostat,146 Langevin

dynamics147 and Nose-Hoover thermostat.148 The Berendsen thermostat and Langevin

dynamics are utilized in our MD simulations and thus explained here.

In a MD simulation, the temperature can be written as:

T 1N mi (2-8)
S(3N-n)kB i=l 2

Here N is the number of particles, n is number of constrained degree of freedom,

mi and vi are the mass and velocity of particle i. Thus, temperature is a function of

velocities of all particles. The simplest way to control temperature is to rescale velocity

at each time step. However, this will cause discontinuity in the momentum trajectory in

phase space.









Berendsen et al. introduced a weak coupling method to an external heat bath to

MD simulations. The heat bath can add or remove heat from the system in order to

maintain a constant temperature. The rate of temperature change is governed by Eq. 2-

9:

dTt =1 (To T(t)) (2-9)
dt TT

where To is the temperature of the bath and CT is the coupling time which indicates the

time scale a system relaxes to target value. By employing a coupling time, the MD

propagation can avoid sudden change in velocities.

Since temperature is computed from velocities of all the atoms, what the

Berendsen thermostat really does is to multiply all velocities with a scaling factor 2

(shown in Eq. 2-10) in order to rescale the current temperature Tto the target value To.


S= At 12 (2-10)
I TT

By rescaling velocities, the Berendsen thermostat controls the temperature in MD

simulations. As mentioned before, the coupling time t, determines how tightly the

system and the heat bath coupled together. A large t, means the coupling is weak. It

takes long time for the system to relax from current temperature to target temperature.

As T, co, the internal energy will be conserved and the microcanonical ensemble will

be restored. If t, is small, the coupling between the system and the heat bath is strong

and the velocity scaling factor is large. However, large velocity scaling factor will cause

large disruption in the momentum part of the phase space trajectory. The larger the

scaling factor is, the less natural the trajectory is.









Langevin dynamics belongs to the category of stochastic thermostat.137 It mimics

the Brownian motion of a particle. Instead of Newton's second law, the equation of

motion of MD method when using stochastic thermostat becomes:

di_ 1 VU
d- = -- -yi +A(t) (2-11)
dt mi dqi

In Eq. 2-11, vi, qi and mi are the velocity, position and mass of particle i

respectively, U is the potential energy, y is the friction coefficient and A(t) is a random

force at time t. The amplitude of this force is determined by fluctuation-dissipation

theorem (Eq. 2-12).

(Ai (tl)Aj(t2)) = 2ykBT6i (tl t2) (2-12)

(Ai(t)Aj (t2)) is the time correlation of A on particle i at time t, with A on particle j

at time t2. y is the friction coefficient, kB is the Boltzmann constant, T is the temperature,

6,i is the Kronecker delta function and S(t, t2) is the Dirac delta function. Langevin

dynamics can be used as thermostat because the equation of motion is temperature

dependent via the random force term.

2.2.3 Pressure Control in MD Simulations

Most biological experiments are performed in a constant pressure and constant

temperature situation (NPT ensemble). Therefore, pressure control techniques

(barostats) should be used in simulations to maintain system pressures and it is done

by adjusting the system volumes. Since the number of particles is constant during a

simulation, another application of maintaining pressure is to regulate system density

which should be at certain appropriate value. A generally employed barostat is the

Berendsen barostat.146









The pressure of a system in a simulation is calculated using the virial theorem of

Clausius and can be expressed as:

p= INkBT .I E .i+ r .j (2-13)

In the above equation, P is pressure, N is the number of particles, and T is the

temperature. ri and v(?n) are the distance and interaction energy between atoms i and

j, respectively.

Analogous to temperature control, the pressure can be maintained simply by

rescaling volume at each time step although the system volume will be disrupted too

much. Berendsen barostat was developed in order to smooth the change in volume.

The Berendsen barostat, in which the algorithm is the same as Berendsen thermostat,

utilizes a pressure bath. The rate of pressure change is governed by following equation:

dP(t) 1(Po- P(t) (2-14)
dt rp

where -p is the coupling constant and Po is the pressure of the bath.

The change in pressure is reflected by adjusting system volume. The coordinates

of all particles in the system are scaled by a factor 11/3 and A is formulated as:

A = 1 K -(P Po) (2-15)
Tp

The K in the above equation is the isothermal compressibility. It represents the

volume fluctuation caused by pressure change:

K = -- (2-16)
v ap









2.3 Monte Carlo (MC) Method

2.3.1 Canonical Ensemble and Configuration Integral

In statistical mechanics, an ensemble is a collection of a very large number of

systems and each system is a replica (on a thermodynamic level) of a particular

thermodynamic system of interest. If the thermodynamic system of interest has a

volume of V, N number of particles and temperature T, then an ensemble containing a

very large number of such systems is called the canonical ensemble. The canonical

ensemble is important because it best represents systems of interest in practice.

Because each system of the canonical ensemble is not isolated, the energy of each

system is not fixed. Thus, there is a probability of finding a system with energy Ei and

the probability distribution of systems in the canonical ensemble is the so-called

Boltzmann distribution (Eq. 2-17).

pi = le-Ei/(kBT) (2-17)

Here Q is the partition function and is essentially a normalization factor. E, is the

quantum energy of a system.

Q = Zi e-El/(kBT) (2-18)

In classical mechanics, the Hamiltonian function H is employed to describe the

total energy of a system and can be expressed as H(p, q) where p and q are moment

and positions respectively. In general, the Hamiltonian can be separated into kinetic

energy which depends only on moment and potential energy which depends only on

positions. In addition to using the Hamiltonian instead of quantum energy, the energy

levels become continuous in the classical limit. Thus, the partition function will be written

as an integral.









Q = ffe-PH(pq)dpdq (2-19)

Here /p = 1/(kBT). After integrating the kinetic energy term, the partition function

has the form of Eq. 2-20 and is called configuration integral.

Z = f e-U() dq (2-20)

Thus, the Boltzmann distribution in the classical limit is given by Eq. 2-21:

P = e-pu (2-21)
z

2.3.2 Markov Chain Monte Carlo (MCMC)

The definition of Markov chain is crucial to the MCMC methods, so it will be

explained first in this section. Consider a stochastic process at discrete steps (tl,t2, ...)

for a system that has a set of states (S,S2, ...) with finite size. We define that the system

is in state Xt at step t. The conditional probability of XtS = S, given that Xtn_, is in state

Si, etc, is:

P(Xtn = Si IXt_, = SiXtn2 = Sk, ...Xt, = Sh) (2-22)

A Markov process is defined in Eq. 2-22 with the property that the conditional

probability of Xt, = S, only depends on its previous state Xt"_ = Si:

P(Xtn = Si IXtn_ = SXtn-2 = Sk, ...,Xt = Sh) = P(Xtn = Sj IXtn_ = Si) (2-23)

The corresponding sequence of states (XI,X2,...) is called a Markov chain. The

conditional probability P(Xt, = Sj IXt,_ = Si) is essentially the transition probability from

state Si to S, and is denoted as w(i j). Based on the probability theory, a transition

probability has the properties w(i j) 2 0 and yi w(i j) = 1. Thus, the probability of

Xt, = S, can be written as:

P(Xt, = Sj) = P(Xtn = Sj IXtl = Si) P(Xt,, = Si) = w(i j)P(Xt n = Si) (2-24)









A change in P(X,, = Si) with respect to step is governed by the master equation:

dPX = i w(j i)P(Xt. = S) + Ej w(i j)P(Xt = Si) (2-25)

At equilibrium (or under steady-state approximation), it is clear that P(Xt = Si)

should not change with steps. This leads to:

w(j i)(X = Si) = Ej w(i j)P(Xtn = Si) (2-26)

Since the Markov chain introduced above possesses discrete and finite number of

states, the transition probability can be described as a matrix, which is called the

transition matrix. The (i,j)th element of the transition matrix represents w(i j). The

probability distribution can be represented by a row vector. Multplying a probability

distribution with transition matrix will generate a new probability distribution. If a Markov

chain is time-homogeneous (the definition of time is essentially a step due to the

stochastic nature of a Markov chain), the elements of transition matrix are constants

(time-independent). When a probability distribution vector is not changed by multiplying

with the transition matrix, the distribution is said to be stationary.

At equilibrium, the elements of the transition matrix are independent of time. The

equilibrium distribution is an eigenvector of the transition matrix with an eigenvalue of 1.

Hence, multiplying equilibrium probability distribution with transition will not change it.

Properties of a Markov chain include: a Markov chain is irreducible, if all states

communicate with each other; a Markov chain is called periodic, if number of steps

needed to move between two states is not periodic; it is positive recurrent, if the

expectation value of the return time to a state is finite. These properties are closely

related to the ergodicity of a Markov chain.









The MCMC methods are Monte Carlo samplings from a probability distribution by

employing a Markov chain whose equilibrium probability distribution is the intended

probability distribution. States sampled by Monte Carlo method form a Markov chain.

The transitions in MCMC must satisfy the detailed balance equation:

w(j i)P(Xeq = Si) = w(i j)P(Xeq = Si) (2-27)

A Markov chain is said to be reversible when it satisfies the detailed balance

equation.

2.3.3 The Metropolis Monte Carlo Method

In 1953, Metropolis et al.50 proposed an algorithm to sample the phase space of a

system at equilibrium by the MC method. According to the Metropolis algorithm, at

configuration i, a new configuration j is chosen, both configurations are weighted by

Boltzmann distribution (Eq. 2-21) and the detailed balance condition (Eq. 2-27) is

employed to evaluate the transitions (MC moves) between configurations,

P(i)w(i j) = P(j)w(j i) (2-28)

In the above equation, P(i) is the Boltzmann weight of configuration i and w(i -> j)

is the transition probability from configuration i toj. Inserting Eq. 2-21 into Eq. 2-28 and

rearranging Eq. 2-28 yields:

w(i-j) -P)= P -fe(U()-U(i)) = e-A (2-29)
w -j i) P(i)

And the transition probability from configuration i toj can be written as:

w(i j) = minl{1, e-^} (2-30)

In practice, the new configuration is accepted if A < 0. However, if A > 0, a random

number between zero and one is generated and is compared with e-A. If the random

number is less than or equal to e-, then the new configuration is accepted. Otherwise,









the current configuration is kept and is added to the configuration ensemble. This

accept/reject criterion is the so-called Metropolis criterion. The MC sampling with the

Metropolis criterion generates a Markov chain whose equilibrium PDF is the Boltzmann

distribution. Compare the Metropolis MC with MD, MC method simulates a system in

the canonical ensemble without controlling temperatures; the bottleneck of MC sampling

is the potential energy difference while the bottleneck of MD is the energy barrier.

2.3.4 Ergodicity and the Ergodic Hypothesis

In statistical mechanics, ergodic (adjective of ergodicity) is a word used to describe

a system which satisfies the ergodic hypothesis. The ergodic hypothesis states that

over a long period of time, the time average and the ensemble average of a property

should be the same. In our simulations, the ergodic hypothesis is often assumed to be

true. Ergodicity breaking (the ergodic hypothesis does not hold) often means that the

system is trapped in a local region of the phase space. One example when the ergodic

hypothesis does not hold is the spontaneous magnetization of a ferromagnetic system

below Curie temperature. The ensemble average of net magnetization is zero since spin

up and spin down are degenerate states and the population of either states should be

the same. However, a net magnetization exists when temperature is below Curie

temperature. Ergodicity is often discussed in a Markov chain. A Markov chain is called

ergodic when all its states are irreducible, periodic and have positive recurrent.

2.4 Solvent Models

Because proteins are stable and perform their functions in condensed phase,

especially in aqueous solution, representing the solvation effect is of great importance.

One frequently used solvent model in MD simulations is the water model. Two ways of

representing aqueous solution are present here: the explicit and the implicit solvent









models. As its name indicates, the explicit water model employs water molecules in the

simulation and the implicit water model treats water as a dielectric continuum.

2.4.1 Explicit Solvent Model

Different types of water molecules such as SPC/E,149 TIP3P,150 and TIP4P150 are

developed. Water molecules parameters are fitted to bulk water properties such as

density, heat of vaporization, and dipole moment.150 The density of liquid water is an

important physical quantity to check the water models. The density of liquid water

shows a maximum at 40 C and water models should correctly reflect this. TIP3P failed

to achieve that, while TIP4P and TIP5P151 and their variants were able to reproduce this

trend. Take the TIP3P and TIP4P water models as examples. A simple diagrammatical

description of TIP4P and TIP4P water models are shown in Figure2-2. The TIP3P water

model has one oxygen atom and two hydrogen atoms. The geometry of TIP3P water is

the same as experimental geometry with OH bond length of 0.9572 A and HOH angle of

104.520. Only oxygen atom has a van der Waals radius. Thus, the van der Waals

interactions only occur among oxygen atoms. Partial charges are placed on oxygen

atom and hydrogen atoms. The partial charge on the oxygen atom is -0.834e and the

partial charge on each hydrogen atom is 0.417e, where e is the charge of an electron.

When computing interactions (Coulomb interaction and Lennard-Jones interaction)

between two TIP3P water molecule, there are 3x3=9 distances needed to be

calculated. The TIP4P water model, as its name implies, has four sites. Similar to the

TIP3P water model, experimental geometry (bond length and bond angle) is also

adopted in the TIP4P model. The only atom, in the TIP4P molecule, having the van der

Waals interaction is oxygen too. However, for the TIP4P model, the negative partial









charge is located on the fourth site, instead of being placed on the oxygen atom, as in

the TIP3P model. The use of the fourth site carrying negative charge is able to improve

electrostatic properties of water such as dipole moment. The positive partial charges are

still placed on hydrogen atoms. The new partial charges are -1.04e and 0.52e. New

Lennard-Jones potential parameters have also been employed for the TIP4P water

model to achieve better fitting results. Computing the interactions between a pair of the

TIP4P molecules requires knowing 9 distances for electrostatic interactions and 1

distance for the Lennard-Jones potential. Therefore, using TIP4P model in a simulation

will be computationally more expensive than using TIP3P model. For a five-site water

model such as TIP5P, 17 distances are needed in order to calculate water-water

interactions.

When simulating a molecule with explicit water molecules, the periodic boundary

condition (PBC) is utilized in order to mimic reality.152 Otherwise, water molecules

evaporate into vacuum. Ewald summation153 or Particle-Mesh Ewald (PME)

summation154 is employed to compute the long-range electrostatics efficiently when the

PBC is employed.

One advantage of employing the explicit water model is that the solvent-solute

interaction can be represented. For example, studying the hydrogen bonding between

water molecules and proteins requires using the explicit water model. However, it

suffers from computational cost. CPU time is approximately proportional to number of

inter-atomic interactions.









q=-0.834e


0.9752 A


0.9752


0.9752


0.9752 A


q=0.417e q=0.417e A q=0.52e q=0.52e B

Figure 2-2. A diagrammatic description of TIP3P and TIP4P water models. A) TIP3P
model. The red circle is oxygen atom and the black circles are the hydrogen
atoms. Experimental bond length and bond angle are adopted. B) TIP4P
model. Oxygen and hydrogen atoms are labeled with same color as in the
TIP3P model. TIP4P model also employs the experimental OH bond length
and HOH bond angle. Clearly, the fourth site (green circle) which carries
negative partial charge has been added to the TIP4P model.

2.4.2 The Poisson-Boltzmann (PB) Implicit Solvent Model

An alternative way of representing solvation effect is to reproduce the PES after a

molecule is dissolved in solvent. The solution-phase potential energy of a molecule can

be computed by adding solvation free energy to the gas-phase potential energy. Given

the correct solution-phase PES, correct forces can be generated for the equation of

motion. Thus, the key issue is finding the accurate free energy of solvation. A dielectric

continuum model can be employed to calculate free energy of solvation. In the dielectric

continuum model, the free energy (work) of assembling a charge distribution is

expressed as:

G = Jp(r')(r)dq (2-31)

Here p(r) is the charge density of the molecule and 0(r) is the electrostatic

potential.

The Poisson-Boltzmann model utilizes the Poisson-Boltzmann equation to

describe the electrostatic potential as a function of charge density. In practice, the









linearized PB equation (Eq. 2-32), which utilizes the first order truncation of Taylor

series expansion of the hyperbolic sine, is often employed.

V [E(r)V0(r)] = -47tp(r) + E()/i()K204() (2-32)

In the above equation, E is the dielectric constant, A is a switching function which is

zero when electrolyte is inaccessible and otherwise one, and K2 is the Debye-Hcckel

parameter.

For simple cases such as spherical charge distributions, the solutions to PB

equation are analytical and simple. Consider dissolving a sphere with charge q and

radius a and the charge is uniformly distributed on the surface. The charge density on

the surface can be expressed as:

p(x)= q (2-33)
47ra2

Here x is any point on the surface. From outside of the sphere, the electrostatic

potential at r is calculated by:

(r) q (2-34)

Integrating the right-hand side of Eq. 2-31 from infinity to a with Eq. 2-33 and Eq.

2
2-34 will yield G = The free energy of solvation is the difference between gas-phase
2Ea

and solution-phase free energies. Thus, it can be written as:

cAGso = 1- 2 (2-35)

This is the so-called Born equation and is the basis of the generalized Born (GB)

method which will be introduced later.









For complex systems such as proteins, there is no analytical solution to the

linearized PB equation.73 Therefore, this equation is solved iteratively until self-

consistent is achieved for the charge density and electrostatic potential.

2.4.3 The Generalized Born (GB) Implicit Solvent Model

Solving the linearized PB equation is computationally expensive. An approximate

method to the PB implicit solvent model is proposed as the GB method.39'117 Using the

GB implicit solvent can greatly shorten the simulation time, which makes the GB

frequently employed in molecular simulations. Similar to Eq. 2-35, the free energy of

solvation in the GB method is given by:

AGso0 = 2-1) fj (2-36)

Here qj and qj are charges on nuclei i and j. fGB is calculated by:


fGB = (rij + aae- /4ai) (2-37)

Here ai is the effective Born radius of charge qi, and riy is the distance between

the two charges.

Another approximation in the GB method is the Coulomb field approximation.40

This approximation estimates the effective Born radius by integrating the energy density

of a Coulomb field over the molecular volume. The integral is often evaluated

numerically. One should notice that the GB theory involves two approximations to

reproduce the PB results. The first approximation contains Eq. 2-36 and 2-37. The

second one is the Coulomb field approximation. Further approximations are often

introduced to reduce the time computing the effective Born radii in practice. The pair-

wise approximation155 is often applied. In this approximation, the van der Waals radius









of an atom and a function dependent on positions and the van der Waals radii of atom

pairs are utilized to compute the effective Born radius.

2.5 pKa Calculation Methods

2.5.1 The Continuum Electrostatic (CE) Model

The basic idea of the CE model is also given in Figurel-6. Since computing the

pKa value of an ionizable residue in a protein directly is difficult (breaking a bond plus

dissolving all species into water), a model compound is utilized and the pKa shift is

calculated via the thermodynamic cycles shown in Figure 1-7 and Figure 1-8. Like the

MDFE calculations, the CE model also computes the pKa value of an ionizable residue

relative to its intrinsic value (or model compound value according to the definition of

Bashford and Karplus; the definition of the intrinsic pKa can be found in section 1.3).

The pKa value of an ionizable residue is written as:

pKa(protein) = pKa(model) + 2.303k(GAH (protein) AGAHA(model)) (2-38)

In the above equation, pKa(model) is the intrinsic pKa value of an ionizable

residue and can be found in Table 1-1. AGAHA(protein) and AGAHA(model) is the free

energy difference between protonated and deprotonated species for that ionizable

residue and its reference compound (the reference compound utilized in the CE model

is an isolated ionizable residue with two ends capped and fully exposed to aqueous

environment.), respectively. Eq. 2-38 is essentially the same as Eq. 1-20. The difference

between MDFE methods and the CE model is how the free energy differences between

the protonated and deprotonated species on the right-hand side of Eq. 1-20 are

generated. MDFE methods compute the two free energy differences via free energy

calculation algorithms while the CE model calculates them via FDPB method. In this









continuum electrostatic model, proteins are considered as low-dielectric regions

surrounded by high-dielectric continuum representing water. Protonation is represented

by adding a unit charge to the ionizable site.

In the continuum electrostatic model, AGAHA(protein) and AGAHfA(model) are

assumed to differ only in their electrostatic contributions. This assumption will result in

the cancellation of non-electrostatic free energy contributions. Thus, calculating the

electrostatic work of charging a site in the ionizable residue and in the reference

compound from zero to unit charge is required. This electrostatic work can be further

decomposed into three terms. For any ionizable site in a fixed protein structure, the

electrostatic work consists of three terms: the Born solvation free energy (AGBorn), the

background free energy which is the interaction of the ionizable site with non-ionizable

charges (AGback), and the interaction with other ionizable sites (AGinteract ). For the

reference compound, only the first two terms exist. Thus, AGAHA(protein) can be

written as:

AGAH _A(protein) = AGorn (protein) + AGback (protein) + AGinteract (protein)(2-39)

And AGAHA(model) can be written as:

AGAHA(model) = AGorn (model) + AGback (model) (2-40)

Linearized PB equation (described in Section 2.3.2) is solved for electrostatic

potentials using finite difference method. For an ionizable site i, the Born solvation is

determined by Eq. 2-35. The background free energy is calculated using Eq. 2-41:

AGback = k qi qkp(r, ,rk) (2-41)

Here qk is non-ionizable partial charge and (rT, rk) stands for the electrostatic

potential produced at rkby a unit charge place at ri. The electrostatic interaction with









other ionizable sites can also be evaluated by Eq. 2-41 except that charges on ionizable

sites must be used. After computing all components on the right-hand sides of Eq. 2-38

and Eq. 2-39, the pKa of ionizable residue i will be obtained.

To produce a titration curve, a protein containing N ionizable residues is

considered here. Each ionizable residue has two states: protonated and deprotonated.

Thus, there are 2N numbers of macro-states for that protein. Each macro-state can be

represented by a vector x=(x,, x2, XN), whose elements xi is 0 or 1 according to

whether ionizable site i is deprotonated or protonated. The free energy of x relative to

the vector whose components are all zero (this is equivalent to the free energy change

when charging the non-zero components in the vector) is given by Eq. 2-42:

AG(j) = WAGi -xi +2i=1Wj ,(q +xi)(qf +x) (2-42)

Here AGi = AGBorn (protein) + AGback (protein) AGBorn (model) AGback (model)

for ionizable site i, Wij is the electrostatic interaction between unit charges at ionizable

site i and j, and qO is the charge of site i when it is in the deprotonated state. Thus, 80,

which is the fraction of protonation of site i, can be written as (Eq. 2-43):

0 -xe e-paG()-2.303v( )pH
Oi e-pAG(n)-2.303v(2)pH (2-43)

Here p/ = 1/kT and v(x) is the number of non-zero components in x. Summing

up individual 0i will generate a titration curve of the entire protein.

2.5.2 Free Energy Calculation Methods

As mentioned previously, the pKa value is proportional to the standard free energy

of reaction. Therefore, free energy calculation methods can be employed to compute

the pKa value of ionizable residue one is interested in. In this section, two frequently









used free energy calculation methods: thermodynamic integration (TI)156,157 and free

energy perturbation (FEP)158 are described. Both TI and FEP belong to the so-called

"slow-growth" or equilibrium method and can be employed to compute the free energy

difference between two states. In other words, each transition should be reversible.

In the TI method, initial state A (having potential energy UA(q), where q is the

molecular structure) and final state B (having potential energy Us(q)) are connected by

a reaction coordinate A (this reaction coordinate does not necessarily have any physical

significance). The simplest scheme of constructing the potential energy as a function of

A is:

U(A) = (1 )UA +UB (2-44)

Slowly transforming A from zero to one converts state A to B; the intermediate

values of A correspond to a mixed system without physical meaning.

The Helmholtz free energy A in the canonical ensemble (or the Gibbs free energy

G in the isothermal-isobaric ensemble) is formulated as:

A = -ksT In Q = -ksT InZ (2-45)

where Q is the partition function and Z is the configuration integral. From now on, our

derivation will focus on the canonical ensemble and the Helmholtz free energy but can

be extended to isothermal-isobaric ensemble and the Gibbs free energy in the same

manner (this statement also holds when the free energy perturbation method is

described later). Following Eq. 2-45, the Helmholtz free energy as a function of A is:

A(A) = -kBT In Z(A) = -ksT f e-U(q,~)/kBTdq (2-46)

Here, U is the potential energy function and q is molecular structure.

The free energy difference can be written as:









AAA-B = AB AA = fo A/dA dA (2-47)

Then,

aA (A) In Z(A) 1 0Z(A)
OA(= --kBT nn = -kBTiZ( (2-48)
an an z(n) an

Plugging the explicit form of configuration integral into the derivative leads to:

S-A e-U(q,,A)/kBTdq (e-(q)kBdq (2-49)

a (e"U(qB e-U(q,)/kBT(_l/kaT) au() (2-50)
OA OA

Therefore,

-kT z()= kT e-U(q,)/kBT(- 1/kT)an) dq (2-51)
z(n) 0an z(= a

Since the integration is over coordinate space, the configuration integral can be

moved into the integral. Eq. 2-51 now becomes:

OA(A) 1 Z(A) -U(q,_)/kBT 1)
= _-kT z _= f B dq (2-52)
a0 ) 0z Z(1) dA

The first term in the integrand is the Boltzmann weight factor P(q,A). Rewriting Eq.

2-51 yields:

OAA) U ) UW(A)
d= fP(qA) dq = ( -n (2-53)

Thus, the final form of AAAB is:

AAA-B = Jo OA/d dA = fo (-)n dA (2-54)

In both Eq. 2-53 and 2-54, the bracket represents an ensemble average generated

at 2.

In pKa calculations, state A (or B) represents the protonated species and the other

represents the deprotonated species. Each intermediate value A corresponds to a

mixed protonated and deprotonated state, without any physical meaning. When









classical force fields are applied, the proton becomes a dummy atom in the

deprotonated state but retains its position and velocity in the protein (or model

compound). Furthermore, state A and B only differ in charge distributions. Dissociation

free energy can be computed using methods of numerical integration (such as

trapezoidal rule or Gaussian quadrature) to treat Eq. 2-54. As explained in the previous

chapter, the quantum mechanical contributions to the proton dissociation free energy

are assumed to be the same for protein and the model compound. Therefore,

subtracting dissociation free energy of model compound from that of protein will yield

the pKa shift relative to the pKa value of the model compound.

The FEP method, which was initially introduced by Zwanzig in 1954,158 is another

frequently employed free energy calculation method. Consider two states (A and B) with

partition functions QA and QB, respectively, and the Helmholtz free energy AA and Ag,

respectively. The free energy difference from A to B can be expressed as:

AAA-B = AB AA = -kBT ln(QB/QA) (2-55)

Suppose the configuration integrals Z are adopted instead of partition functions.

The potential energy function of state A and B is UA(q) and Us(q), respectively, where q

is the molecular structure. Thus,

AAA B = -kT ln(ZB/ZA) = -ksT n(f(e-UB (q)/k T/ZA)dq) (2-56)

According to Zwanzig, Us(q) can be written as the sum of UA(q) and a

perturbation term AU(q).

UB(q) = UA(q) + AU(q) (2-57)

AAAB = -kBT In(f(e-(UA (q)+AU(q))/kB T/ZA)dq) (2-58)

AA _B ---kBTln ( U A(q)/kBT'-e-AU(q)/kBT )
AAAzB -kT n T e -U A dq) (2-59)
ZA









The Boltzmann weight factor of state A has the form:

PA(q) = eUA(q)/kBT/ZA (2-60)

Therefore,

AAA-B = -kBTln(f PA(q)e-AU(q)/kBTdq) = -kBT In(e-AU(q)/kBT)A (2-61)

The bracket with subscript A stands for the ensemble average performed on the

structural ensemble generated from state A. Substituting AU(q) with Us(q) UA(q), Eq.

2-61 becomes:

AAA-B = -kBTln(e(UB(q)-UA(q))/kBT)A (2-62)

In order to compute AAA-B, one simulation of state A is performed. Once a

configuration q is generated, the potential energy difference at configuration q is

computed. The ensemble average of e-(UB(q)-UA(q))/kBTcan be calculated easily and

hence, AAA-B is obtained. According to Eq. 2-62, if the potential energy difference

between the two states (perturbation) is too large, the free energy difference given by

FEP calculation can be unreasonably large. Thus, FEP calculations cannot accurately

reflect the true free energy difference of large changes in Hamiltonian (basically,

potential energy). Only similar Hamiltonians contributes to the free energy difference.

In order to compute the free energy difference between two very different systems

(such as calculating free energy difference from benzene to toluene), intermediate

systems mixing the two very different systems (end points) are adopted in such a way

that the differences between neighbors can be treated as perturbations. To be specific,

a coupling parameter can be adopted in the same fashion as TI. The sum of free energy

difference between intermediate systems (each intermediate state has a specific

coupling parameter A,) will be the targeted free energy difference.









In practice, computing AAAB (forward free energy difference) is equally easy (or

hard) as computing AABA (backward free energy difference) and one is exactly the

opposite of the other in principle. Evaluation of forward and backward free energy

differences provides an indication of convergence. The Bennett Acceptance Ratio

(BAR) method159'160 is a frequently employed scheme to reduce sampling bias and

statistical error.

In 1985, Jorgensen et al.161 proposed a "double-wide" scheme to perform FEP

calculations in order to reduce the computational cost. The double-wide FEP can be

explained by the following example. Suppose AA(A2 -, A,) is to be computed. Instead of

performing two MD simulations at Ai and A, only one MD simulation at A(lj is
\2 I

conducted. The AA (At A,(j)) and AA (A, -, A(i)) are calculated then the objective

free energy difference can be obtained. If N configurations of each MD simulation are

taken in order to compute AA(At -, A), the conventional FEP scheme requires 4N

potential energy calculations, while double-wide FEP only requires 3N.

2.5.3 Constant-pH MD Methods

As described in the previous chapter, the constant-pH MD methods want to

describe protonation equilibrium correctly at a given pH value. The constant-pH MD

models sample protonation state space explicitly, along with the sampling of

conformational space. In practice, two protonation state sampling schemes have been

developed. One scheme utilizes a binary protonation state space: only the protonated

and deprotonated states are defined. MC steps have been performed periodically during

MD propagations, which sample the conformational space. At each MC step, a new









protonation state is selected and the free energy difference between the old and new

states is computed. The Metropolis criterion is the applied to evaluate the MC move.

Since a binary protonation state space is adopted, this scheme is generally called the

discrete protonation state model. The other scheme employs a continuous protonation

state space. Not only the completely protonated and deprotonated species are defined,

fractional protonation states also exist in the simulation. The MD propagations sample

both conformational and protonation state space. The latter scheme is named

continuous protonation state model. In this section, the CPHMD model developed by

the Brooks group and two discrete protonation state constant-pH MD methods

developed by Baptsta et al. and by Mongan et al. are described to provide a brief

overview.

In the CPHMD method, Lee et al.114 applied A-dynamics116 to the protonation

coordinate and used the Generalized Born (GB) implicit solvent model. They chose a A

variable, which is bound between 0 and 1, to control protonation fraction. A = 0

represents an ionizable residue in its protonated state, while = 1 corresponds to the

deprotonated ionizable residue. Due to its continuous nature, = 0 and A = 1 are rarely

sampled. Thus an arbitrary value A, is adopted such that any A value smaller than Ap is

defined to be protonated, while any A is greater than 1 Ap is set to be deprotonated.

To ensure an unbounded reaction coordinate is practically used, a new coordinate 0 is

introduced and is propagated in a MD simulation. A is expressed as:

A = sin2 (0). (2-63)

An artificial potential barrier between the protonated and deprotonated states has

been introduced. The potential is a biasing potential to increase the residency time








close to protonation/deprotonation states and it is centered at half way point of titration

(A=1/2). The formula of this biasing potential used by Lee et al. is

Ubias = -4 ( )2 (2-64)

where p/ is an adjustable parameter controlling the height of the biasing potential. A

value of 1.25 kcal/mol is found enough to provide occupation time in the protonated and

deprotonated states.

The total potential of the system, which provides the forces for MD propagation,

has the form:

Utotal = Ubond + Uangle + Utorsion + Uelec () + Uvdw () + UGB () + Unonpolar +

ni= [-Umodel () + UH(Oi) + Uias, (0i)] (2-65)

Here, the first five terms are essentially defined by Eq. 1-9. UGB is the GB solvation

free energy which will be explained in the next chapter. Unonpolar is the energy related to

surface accessible areas. i in Eq. 2-65 represents an ionizable residue. Umodel is a

potential of the mean force (PMF) in the titration coordinate for a model compound. The

AGMM(AH A-) shown in Eq. 1-17 can be represented by Umodel (2 = 0) Umodel ( =

1). The Umodel in Eq. 2-65 is fit to a two-parameter parabolic function having the form

Umodel = -Ai(sin2() B)2. UH (i) = 2.303ksT sin2(O) (pKA pH), which is the

chemical potential of adding a fractional proton to the solution at pH. The term

-Umodei (0i) + UPH(Oi) is essentially the quantum mechanical dissociation free energy of

a fractional proton. The CPHMD method also assumes Eq. 1-18 is true.

Another feature of the CPHMD method is using an extended Hamiltonian. A kinetic

energy term of titration coordinate 0 is employed in CPHMD:









Ko = ~1M72 (2-66)

The fictitious mass Mi controls the speed of response of the protonation state

change to the force on it.

Baptista et al.118 proposed that MD simulations incorporating protonation state

change is essentially a semigrand canonical ensemble. The joint PDF can be written as:

exp [pn -pH (p,q,Ps,,qsn)
P(p,q,s,s,) exp [fn -fl (p,q,ps,qs)] (2-67)
Sf exp [fPn -pH (p,q,ps,qsn)]dpdqd psdqs

Here, p, q is the moment and coordinates of solute, respectively. Ps and qs is the

moment and coordinates of solvent, respectively. n is the vector containing protonation

state information of each ionizable residue. The details of ni is explained in the

continuum electrostatic model. n is essentially the number of protonated ionizable

residues. [ is the chemical potential of protons and fl = 1/kT. The Hamiltonian

contains quantum mechanical and classical force field terms. The quantum mechanical

part in their model is assumed not to depend on coordinates and moment. The

introduction of dummy atom to replace the proton in a deprotonated residue makes

kinetic energy only a function of moment.

Two conditional samplings have been considered by Baptista et al.: one is

conformational sampling under a fixed protonation state, the other one is protonation

state samping under a fixed structure. The PDF of conformations at fixed protonation

state is:

P(p,q, p, q Iexp[- (p,qpsqs) (2-68)
f exp [-fpHc(p,q,ps,qs,n)ldpdqd psdqs

where Hc is the classical Hamiltonian. Due to the fact that quantum mechanical

Hamiltonian depends only on protonation state, which is fixed in conformational









sampling, the quantum contribution is a constant and is canceled. The PDF of

protonation states at fixed coordinates is given in Eq. 2-69:

P ) exp [-2.303npH G(q,n)] (2-69)
P exp [-2.303npH -fl G(q,n)]

where AG is the free energy of a protonation state relative to the completely

deprotonated state. In their model, FDPB-based method is executed to calculate free

energy difference.

Combining the two conditional samplings, one is able to generate an ensemble

satisfying Eq.2-67. In order to prove the above statement, one must show the Markov

chain constructed by transition matrix and the two conditional probabilities satisfies the

following condition,

p = limn,,o pWn (2-70)

In the above equation, p is the joint PDF as defined in Eq. 2-67, p is a joint PDF

depend on the same variables as p, and W is transition matrix. Proving Eq. 2-70 holds

means that one must prove the Markov chain defined by p and W is ergodic.

In order to prove a Markov chain is ergodic, one needs to prove (a) the Markov

chain is irreducible; (b) the chain needs to be periodic; (c) the transition matrix

elements are time-independent; and (d) the limiting distribution should be stationary.

The detailed proof is given by Baptista et al. in their 2002 paper. Their proof justified the

discrete protonation state constant-pH method which samples conformational space at

fixed protoation state and samples protonation state at fixed structure.

In 2004, Mongan et al.127 proposed a constant-pH MD method and implemented in

the AMBER suite. This algorithm follows the scheme proposed by Baptista et al.118 but

employs the GB model in both MD and MC. Given a protein with N titratable sites, the









discrete protonation state model means protonation states of a protein are described by

a vector x=(xl, X2, ..., XN) where each xi is some integer representing the protonation

state of titratable residue i. In AMBER, five amino acids are designed to be titratable:

aspartate, glutamate, histidine, lysine and tyrosine. For each titratable residue, different

protonation states have different partial charges on the side chain. This model also

includes syn and anti forms of protons for the aspartate and glutamate side chains as

well as the 6 and E proton locations for histidine.

At each Monte Carlo step, a titratable site and a new protonation state for that site

are chosen randomly and the transition free energy at this fixed configuration is used to

evaluate the MC move.

Considering a titratable site A in a protein environment, its protonated form is

protA-H and deprotonated form is protA-. The equilibrium between the two forms is

governed by their free energy difference. This free energy difference is the ensemble

average of different configurations. However, the free energy difference cannot be

computed by a molecular mechanics (MM) model since the transition between two

forms deals with bond breaking/forming and solvation of a proton which involves

quantum mechanical effects.

The above problems can be solved by using a reference compound. The

reference compound has the same titratable side chain as protA-H but with known pKa

value (pK,,ref). Following Mongan et al., we assume the transition free energy can be

divided into the quantum mechanics (QM) part and the molecular mechanics (MM) part.

We further assume that the quantum mechanical energy components are the same

between the reference compound and the protA-H. Since the pKa of the reference









compound is known, its transition free energy from deprotonated form to protonated

form at a given pH is:

AGref = kT In 10 (pH pK,ref) (2-71)

So the QM component of the transition free energy can be expressed as:

AGref,M = AGref AGref,MM (2-72)

Here AGref,MM is the molecular mechanics contribution to the free energy of

protonation reaction for that reference compound. In practice, the QM component of the

transition free energy also contains errors from MM calculations so it's actually a non-

MM component. Since the approximation of the QM component of the transition free

energy is:

AGref,QM = AGprotein ,QM (2-73)

Then the transition free energy from protA- to protA-H can be calculated as:

AG = kBT In 10 (pH pKa,ref ) + AGMM AGref,MM (2-74)

Here, AGMM is the molecular mechanics contribution (electrostatic interactions in

nature) to the free energy of the protein titratable site. Hence, by using a reference

compound, the QM effects are not needed. Effectively, we compute ApKa relative to the

reference compound. Computing ApKa can also help canceling some error introduced

by GB solvation model through the use of AGef,MM. In AMBER, a reference compound

is a blocked dipeptide amino acid possessing titratable side chain (for example, acetyl-

Asp-methylamine). Five reference compounds were constructed corresponding to five

titratable residues. The values of AGef,MM for each reference compound are obtained

from thermodynamic integration calculations at 300 K and set as internal parameters in

AMBER. The AGMM is calculated by taking the difference between the potential energy









with the charges of the current protonation state and the potential energy with the

charges of the new protonation state. If the transition is accepted, MD steps are carried

out to sample conformational space in the new protonation state. If the MC attempt is

rejected, MD steps are also carried out with no change to the protonation state.

2.6 Advanced Sampling Methods

Conformational sampling in a MD or MC simulation is essential in the study of

complex systems such as polymers and proteins. One major concern is that the PES of

a complex system is very rugged and contains a lot of local energy minima. Thus,

kinetic trapping would occur as a result of the low rate of potential energy barrier

crossing, especially when the barrier is high. To overcome this kinetic trapping behavior,

generalized ensemble methods can be employed in molecular simulations. As its name

implies, a generalized ensemble method differ from the canonical ensemble method in

the weight factor of a state. The weight factor in the canonical ensemble is Boltzmann

weight. However, a non-Boltzmann weight factor can be used in a generalized

ensemble method (This does not mean that Boltzmann factor is prohibited in a

generalized ensemble method. In fact, parallel tempering which belong to the family of

generalized ensemble method, does adopt Boltzmann factor.). By choosing a non-

Boltzmann weight factor, the system is able to perform a random walk in the potential

energy space. Thus, potential energy barriers will be overcome easily and more

conformations will be visited. Frequently utilized generalized ensemble algorithms

include the multicanonical (MUCA) method and replica exchange molecular dynamics

(REMD) method. In this section, the MUCA and parallel tempering will be introduced

briefly. Due to the importance of REMD method to this dissertation, the details of REMD

method will be explained in the next section.









2.6.1 The Multicanonical Algorithm (MUCA)

In canonical ensemble, the probability of visiting a state in the energy space is:

Pcanonical (T, E) o n(E)e-E/k T (2-75)

Here, n(E) is the density of states (DOS), which means the number of states

between E and E + dE. e-E/kBT is the Boltzmann factor. As potential energy increases,

the Boltzmann factor decreases but the DOS increases rapidly. A bell-shaped

probability distribution function (PDF) of E can be observed. However, in the MUCA

method,54'55'137 the PDF is designed to be flat (a constant), although it still can be written

in the form of Eq. 2-76:

PMUCA (E) oc n(E)WMUCA (E) = (2-76)

where WMUCA (E) is the multicanonical weight factor and n(E) is DOS. The

multicanonical weight factor needs to be inversely proportional to the DOS in order to

generate a flat PDF. However, the DOS of a system is in general unknown, which

makes the multicanonical weight a-priori unknown. Generating correct distribution of

n(E) is the central task of a MUCA simulation. In practice, short simulations are

performed in order to determine the DOS in an iterative manner. Details of determining

the DOS can be found in the paper of Okamoto and Hansmann published in 1995.162

After the DOS is resolved, the canonical ensemble PDF will be contained. Thus, the

average of any quantity can be determined by Eq. 1-11 or Eq. 1-12, depending on either

MD or MC simulation is performed.

Another way to explore the DOS is by using the Wang-Landau algorithm.163'164 In

the Wang-Landau algorithm, the DOS is recorded by a histogram g(U) and initially set

to unity for all its elements. Another histogram which is called visit histogram is also









constructed with initial values set to zero. The visit histogram represents the number of

visits to each energy level. Monte Carlo moves are made. Instead of being evaluated by

the Metropolis criterion, they are evaluated by the DOS,

w(i j) = min 1, 9(Ui (2-77)

where w(i j) is the transition probability from state i to state j. Each time an energy

level is visited, the corresponding element of the DOS histogram is updated by

multiplying the current value with a modification coefficient that is greater than 1. The

initial value of the modification coefficient is fo = e z 2.71828. Every time a MC move is

performed, the corresponding element of the visit histogram is also updated. The MC

moves will continue until the visit histogram is flat. At this stage, the DOS are

converged. In order to achieve a finer convergence, a second round of the above

process will be performed. This time, the modification coefficient fi in the second round

is given by fi = o. The visit histogram is then reset to zero. This process will iterate

until a modification coefficient that is approximately 1 is achieved (in the paper of Wang

and Landau, the final value of the modification coefficient is 1.00000001). By utilizing

Wang-Landau algorithm, the DOS will be obtained and a random walk in the potential

energy space will be achieved.

2.6.2 Parallel Tempering

In 1986, Swendsen and Wang firstly performed parallel tempering (replica

exchange MC) simulations to investigate spin glass.59 Multiple non-interacting copies

(replicas) of the system are simulated at different temperatures. At each temperature,

MC simulation is conducted to sample the conformational space. Structures or

temperatures of the two replicas are attempted to be exchanged periodically. The









detailed balance condition is applied and the weight factor of the state is the Boltzmann

weight factor. The Metropolis criterion has been utilized to accept or reject the move.

Hansmann et al.58 first utilized the parallel tempering algorithm in the study of a

biomolecule (7-residue Ket-enkephalin). Other applications of the parallel tempering

algorithm include X-ray structures determination performed by Falcioni and Deem.165

A MC simulation at a high temperature accepts the transition attempts more often

than doing that at a low temperature. Thus, the simulation at high temperatures tends to

visit more conformations in conformational space. Exchanging structures with replicas

at lower temperatures can help them avoid getting trapped in the conformational space.

The acceptance ratio, which is the averaged fraction of successful exchange

attempts, is an important issue in the parallel tempering method. It is correlated with

temperature distribution of replicas. According to Kofke,166 the acceptance ratio is the

area of overlap between the potential energy PDF at two temperatures. Given the

number of replicas, if the temperatures of the two replicas are too different, the overlap

between the two potential energy PDFs will be small. Therefore, accepting an exchange

attempt is unlikely, which makes parallel tempering simulation inefficient. However, if

the temperatures of the two adjacent replicas are too close, the overlap between two

PDFs will be large, and hence the acceptance ratio will be large. But the conformational

space sampled by two adjacent replicas will be too close. More replicas than actually

needed are utilized to achieve the same goal, and hence computer resource is wasted.

2.7 Replica Exchange Molecular Dynamics (REMD) Methods

Due to the correlation between conformation and protonation sampling, correct

sampling of protonation states requires accurate sampling of protein conformations.

Hence, generalized ensemble methods such as multicanonical algorithm and REMD









should be used to avoid kinetic trapping which comes from low rates of barrier crossing

in constant temperature MD simulations. REMD has been applied to the continuous

protonation state constant-pH method (REX-CPHMD) by Khandogin et al.110-113 They

have performed REX-CPHMD simulations to predict pKa values110 and to explore pH-

dependent protein dynamics.111-113

The REMD, which is the MD version of parallel tempering, have been developed

by Sugita and Okamoto in 1999.62 The theory of REMD is essentially the same as

parallel tempering. In their method, temperatures are attempted to be exchanged. This

leads to the unique part of REMD: the treatment of velocities after accepting an

exchange attempt, because the velocities must reflect the temperature correctly. Sugita

and Okamoto proposed to rescale the velocities in order to recover the desired

temperature when temperatures are swapped. Similar to other generalized-ensemble

methods, REMD algorithm wants to make the system perform a random walk in either

temperature or potential energy space, and hence avoid kinetic trapping. The

advantage of REMD over other generalized-ensemble method is that the weight factor

is Boltzmann weight which is a-priori known. This advantage makes REMD very

frequently employed in the MD simulations of complex systems. The REMD algorithm

has been applied to studies of peptides, proteins, protein-membrane system in order to

describe free energy landscape, amyloid formation, structure prediction and binding.

Many extended versions such as solute-tempering REMD167 and structure-reservoir

REMD168-170 have been proposed to improve the performance of REMD algorithm. The

REMD variants will be briefly explained later in this section.








2.7.1 Temperature REMD (T-REMD)

A thorough description of the T-REMD algorithm can be found in the original paper

of Sugita and Okamoto.62 In T-REMD, N non-interacting copies (replicas) of a system

are simulated at N different temperatures (one each). Regular MD is performed and

periodically an exchange of configurations between two (usually adjacent) temperatures

is attempted. Suppose replica i at temperature Tm and replica at temperature Tn are

attempting to exchange; the following satisfies the detailed balance condition:

Pm(i)Pnj)w(i G j) = Pn(i)Pm(j)w(j i) (2-78)

Here w(i j) is the transition probability between two states i and j and Pm(i) is

the population of state i at temperature m (in REMD assumed Boltzmann weighted).

Since,

Pm(i) oc e-H(pi,qi)/kBTm (2-79)

where H is the Hamiltonian of the state, q represents the molecular structure, and p

stands for momentum. The Hamiltonian consists of kinetic energy (K) and potential

energy (U) terms and can be written as:

H(p,q) = K(p) + U(q) (2-80)

In the original derivation of exchange probability, Sugita and Okamoto mentioned

that exchanging two replicas (states) is equivalent to exchanging temperatures. The

moment of each replica after exchange attempt need to be rescaled:

Pn(i) = Tnm pm(i) (2-81)

Pm 0) = TmTnPn () (2-82)

After inserting Eq. 2-79 and Eq. 2-80 into Eq. 2-78, the detailed balance equation

becomes:









exp[-[K(pm(i)) + U(qi)]/kBTm} exp[-[K(pn(j)) + U(qj)]/kBTn} w(i j) =

exp{-[K(pn(i)) + U(qi)]/kBTn} exp{-[K(pm(j)) + U(qj)]/kBTm w(j i) (2-83)

According to Eq. 2-81 and Eq. 2-82,

K(pm(i)) =(Tm/Tn)K(p,(i)) (2-84)

K(p (j))= (Tn/Tm)K(pm(j)) (2-85)

Therefore, kinetic energy contributions on both sides of Eq. 2-83 will be canceled

out, leaving only potential energy terms contribute to exchange probability.

w(i-j) exp [-U(qi)/kBTn]-exp [-U(qj)/kBTm]
(2-86)
w(->i) exp [-U(qi)/kBTm]-exp [-U(qj)/kB Tn

Further manipulation of Eq. 2-86 yields:

w(i->j) exp [(B ) (U(qi)- U(qj))]
wU(i->i) Ik kB Tn

If the Metropolis criterion is applied, the exchange probability is obtained as:

w(i -j) = min{1, exp ( ) ((q) U(qj))l (2-87)

If the exchange attempt between two replicas is accepted, the temperatures of the

two replicas will be swapped and velocities rescaled to the new temperatures by

multiplying all the old velocities by the square root of the new temperature to old

temperature ratio:


Vnew = Vold (2-88)
Sl Told

Here, Vnew and Vold are the new and old velocities, respectively. Tnew and Told are

the temperatures after and before an exchange is accepted, respectively.

The acceptance ratio is the average value of the exchange probabilities between

two temperatures:


100









Pc = (min 1, exp [( ) (U(qi) U(q;))) (2-89)

For a given system, the potential energy function is independent of temperature

but the potential energy PDF in a canonical ensemble depends on temperature. The

potential energy PDF can be considered as a Gaussian function (to the second order

truncation of the Taylor expansion of the PDF at the potential energy value

corresponding to maximum probability). The Gaussian is centered at mean potential

energy of the system with a variance -2 = kBT2CV, where C, is the heat capacity. At this

stage, the Gaussian function expression of the potential energy PDF is not adopted. It

will be employed later in this section. The potential energy PDF at temperature Tm is

currently written as:
1
Pm(U) = -n(U)exp(-U/kBT) (2-90)
Qm

where n(U) is the DOS and the exponential term is the Boltzmann weight factor as a

function of potential energy. Recall that in the probability theory, the average quantity

can be expressed as:

(A) = fP(A)- A -dA (2-91)

Extend Eq. 2-91 to the bivariate case and notice that the two PDFs are

independent. The acceptance ratio can be rewritten as,

Pac = ffC Pm(U) P(U') min 1,)exp ([(U ) (U- U')l dUdU' (2-92)

Let a function g(U, U') to denote min 1, exp [(-m (U U')}, P =

1/kBTm and ln = l/kBTn, then,

g(U, U') = min{1, exp[(fm fln)(U U')]} (2-93)


101








Without loss of generality, we can assume that Pm > /,, which means T, < T,.

Therefore, another way of writing Eq. 2-93 is g(U, U') = 1 when U > U' and g(U, U') =

exp[(P, ,S)(U U')] when U < U'. Inserting g(U, U') into Eq. 2-92 will lead to:

Pacc = foo Pm (U)dU f00 1 P,(U')dU' + fcm Pm(U)dU fS exp[(fn fSn)(U -

U')] P (U')dU' (2-94)

For simplicity, we denote fm Pm(U)dU 0f exp[(fP fl,)(U U')] P(U')dU' as

h(U, U'). Inserting Eq. 2-90 into h(U, U'),

h(U, U') = f0c-n(U)e-mUdU f0 e mU e-fmu' e-fnU epnU' -n(U')e-nu'dU'

(2-95)

Since U and U' are independent, Eq. 2-95 can be rewritten as:

h(U, U') = fc n(U)e-PmU enmU e-PnUdU fm ~1n(U')e-PnU' e-fmu' ePnU'dU'

(2-96)

Simplifying Eq. 2-96 will formulate h(U, U') as:
h(U, U') = f~ n(U) e-PnUdU fu n(U')e-m'dU' (2-97)
00-oo QM Qn

Recall that a partition function is just a normalizing constant. Qm and Qn in Eq. 2-

97 can switch their positions in the integrand. Thus, Eq. 2-97 becomes:

h(U,U') = fo Pn (U)dU fU Pm (U')dU' (2-98)

Inserting Eq. 2-98 into Eq. 2-94,

Pacc = fo Pm (U)dU fU Pn (U')dU' + f0o Pn(U)dU fu P,(U')dU' (2-99)

Each term on the right-hand side of Eq. 2-99 can be interpreted as an overlap

between two PDFs. The sum is the entire overlap between two PDFs. Therefore, the


102








average exchange probability is just the overlap between potential energy PDFs at two

temperatures.

Next, let us consider the temperature distribution in the simplest case, in which the

heat capacity is a constant. As mentioned earlier, a potential energy PDF of a canonical

ensemble can be written as a Gaussian function,

P(U)= P(U)exp 2k(T2) (2-100)

where U is the average potential energy, P(U) is the probability density of finding U at

temperature T, and C, is the heat capacity. Since the PDF should be normalized, it is

easy to find the relationship between P(U) and the standard deviation of the Gaussian

function:

P(U) = 1/ 2nkBT2C (2-101)

For simplicity in the derivation of the acceptance ratio, the Gaussian PDF at

temperature T, will be written as Eq. 2-102 from now on:

P(U) 1 exp -(u-U)2 (2-102)

Recall that one assumption to distribute temperatures is to maintain a random

walk in temperature space. Hence, a constant acceptance ratio should be achieved for

any two adjacent temperatures. As shown previously, the acceptance ratio is the

overlap between two potential energy PDFs. Consider two potential energy PDFs at

temperatures Tm < Tn. The PDF at Tm will be to the left of the PDF at Tn. After finding

the potential energy Uintersect where the two Gaussian PDFs intersect, the overlap

between two PDFs can be computed by integrating the left Gaussian PDF from


103









Uintersect to infinity and the Gaussian on the right from minus infinity to Uintersect and

adding them up,

pc = UU 2- exp m-ml2 dU + uiect 1 exp -(u U)2 dU(2-103)
acc Uintersect A2 mn2 2 I 2 2 0 f-- 2oFn2

Complementary error functions will be utilized and Eq. 2-103 will become,

Pacc = erf (Uinters ct + erf c (Un-Uintersect) (2-104)

According to Rathore et al.,171 the acceptance ratio can be approximate to:

Pacc -erf c U (2-105)

where a = (am + an)/2.

For a geometric distribution of temperatures where Tn = cTm, am + n =

kBCvTTm(c + 1). The average potential energy difference can be computed as,

(Un Ur) CAT = C,(Tn Tm) = C,(c 1)Tm (2-106)

Thus, if the heat capacity does not change with temperature, the temperature term

in the numerator and denominator in Eq. 2-105 will be canceled, which means the

acceptance ratio will be a constant. Furthermore, Eq. 2-105 also signals the number of

replicas needed to cover a temperature range as a function of system size. In order to

have a non-zero Pacc, (Un Um)/a 1. This leads to, CAT/( k(CTm(c + 1)) 1.

Further simplifications lead to:

ATm Tm (2-107)

Since the heat capacity is O(N), where N is the number of particles, the number of

replicas to cover a temperature range is O(N1/2).


104









2.7.2 Hamiltonian REMD (H-REMD)

Instead of preparing replicas with different temperatures, another way to overcome

potential energy barriers is simply changing the PES to reduce potential energy

barriers.61 And this is the basic idea of H-REMD. In H-REMD algorithm, replicas differ in

their Hamiltonians but have the same temperature. Likewise, regular MD is performed

and an exchange of configurations between two neighboring replicas is attempted

periodically. Let us consider replica with Hamiltonian H, and replica j with Hamiltonian

Hm are attempting to exchange. By employing the detailed balance equation (Eq. 2-78)

and Boltzmann weight of a molecular structure, the transition probability can be written

as:

w(i j) = min{1, exp[-p(Hn (i) + Hm(j) Hm(i) Hn(j))] (2-108)

2.7.3 Technical Details in REMD Simulations

Temperature distributions have been explored in order to optimize the

performance of REMD method. For systems having constant heat capacity, a

geometrical distribution of temperatures has been adopted. Sugita and Okamoto,62 and

Kofke166 believed that the most efficient way to exploit REMD algorithm is letting each

replica spend the same amount of simulation time at each temperature (a random walk

in temperature space). In practice, this is achieved by producing the same acceptance

ratio for each replica, given that each replica only attempts to exchange with its

neighbors in temperature space. Under the condition that the system has a constant

heat capacity, a geometrical distribution of temperatures (7Tl/T = c) is achieved.

Sanbonmatsu and Garcia suggested an iterative method to distribute temperatures for

replicas in 2002.172 They have chosen the averaged values of potential energy as a

function of temperature to maintain a random walk in the temperature space. In 2005,


105









Rathore et al.171 suggested that an acceptance ratio of 0.2 yields the best performance,

based on constant heat capacity assumption. They have chosen Go-type model of

protein A and the Lennard-Jones liquid to study the deviation of heat capacity relative to

the final value as a function of acceptance ratio. A minimum of deviation at acceptance

ratio around 0.2 has been observed. Kone and Kofke173 have performed similar study

for the parallel tempering simulations. They also considered a random-walk model in

temperature space through replica exchange moves. The acceptance ratio is given by:

Pacc = erfc(1 C1/2) (2-109)

where B = Pi/flo, Pi is the Boltzmann weight factor, and C is the heat capacity which is

assumed to be constant in their study. Without loss of generality, flo is greater than Pf,.

The mean-square displacement of this random-walk (Eq. 2-110) has been maximized

with respect to acceptance ratio. The maximum is shown near an acceptance ratio of

20%.

02 Oc (1nB)2Pacc(B) (2-110)

where U2 is the mean-square displacement, B and Pac are shown in Eq. 2-109.

Temperature distributions in parallel tempering simulation of villin headpiece

subdomain HP-36 have been investigated by Trebst et al.174 HP-36 will undergo helix-

coil transition at high temperatures and hence, the heat capacity will not be held

constant. The diffusion of a replica in temperature space has been introduced to judge

the performance. In their method, a replica is labeled "up" when its previous visit of the

extreme temperature is the highest temperature; it is labeled "down" when its previous

visit of the extreme temperature is the lowest. For each temperature Ti, two histograms

nup (Ti) and down (Ti) are recorded. The two histograms keep the record of the number


106









of visits from replicas with label "up" and "down", respectively. The average fraction of

replicas traveling from the lowest to highest temperature can be calculated as:

f(T) = nup (Ti) (2-111)
nup (Ti)+ndown (Ti)

The diffusivity D(T) is adopted and has the form:

aT
D(T) oc T (2-112)
df/dT

They have pointed out that the diffusivity is temperature dependent, a minimum of

diffusivity has been observed around the temperature where heat capacity is at

maximum. The plot showing diffusivity vs temperature indicates that random walk is

suppressed the most when phase transition occurs. The numbers of round-trip between

temperature extremes of each replica has been maximized to generate an optimal

temperature distribution. More recently, Nadler and Hansmann175-177 suggested that the

optimal number of replicas between the lowest and highest temperatures in explicit-

solvent simulation has the following formula: Noptima = 1 + 0.594V ln(Tmax /Tmin),

where the C is the heat capacity, and Tmax and Tmin is the highest and lowest

temperature, respectively. They also proposed that the optimal temperature distribution

i-1
can be formulated as: Toptimal (i) = Tmin Tn ) -1
Gmin

In addition to replica temperature distribution, exchange attempt frequency (EAF)

is also an important issue in parallel tempering and REMD sampling efficiency. In 2001,

Opps and Schofield178 investigated the effect of EAF for parallel tempering. Two-

dimensional spin system and a polypeptide in vacuum have been selected to test the

effect of EAF on the properties such as order parameter and radius of gyration of the

polypeptide. They suggested that the most efficient scheme is to attempt after a few MC


107









steps. The situation is more complicated in the case of REMD. In general, thermostats

are used in MD propagations to maintain a canonical ensemble is satisfied. It is argued

that exchanges in REMD should happen when system temperature stabilizes.179

Attempting to exchange frequently may prevent the system from heat dissipation. This

argument was supported by studies of a peptide Fs21 performed by Zhang et al.179

They have suggested that 1 ps of exchange attempt interval is desirable for REMD.

However, Sindhikara et al.180 have later shown that small exchange attempt interval

(even as small as a few MD steps) does not affect heat dissipation, given that REMD

exchange is done properly. Conformational sampling deviation relative to long

simulation time reference calculation as a function of EAF has been investigated. They

have pointed out that large EAF (small exchange attempt time interval) is preferred.

Abraham and Gready181 studied the effect of EAF based on a 23-residue peptide in

explicit water. By examining the potential energy autocorrelation time, they argued that

an exchange period below 1 ps is too short for replica exchange attempts to be

independent, and hence reduce the tempering efficiency. However, their conclusion was

not supported by an investigation of tempering efficiency performed by Zhang and

Ma.182 Zhang and Ma utilized the transition matrix and its correlation functions. The

autocorrelation function of transition probability can be written as a function of

eigenvalues of transition matrix. The decay time has been explored in order to

understand the tempering efficiency. Zhang and Ma found that tempering efficiency

increases monotonically as EAF increases.

Thermostat effects on the performance of REMD have also been explored. Earlier

work has been done by the Garcia group.172 They have studied if the potential energy


108









PDFs satisfy the Boltzmann distribution: ln[P(U, TI)/P(U, T2)] = k +c,
Skg BT2 kg _Ti/

where P(U, T) is the potential energy PDF at temperature T and c is a constant. They

have found that Nose-Hoover and the Anderson thermostats satisfy the above

condition, while the Berendsen thermostat does not. Rosta et al.183 investigated the

thermostat artifact in the REMD simulations in 2009. The current REMD exchange

scheme assumes Boltzmann distribution (canonical ensemble) in the calculation of

exchange probability. However, the Berendsen thermostat cannot preserve the

Boltzmann distribution. Thus REMD simulations of bulk water and protein folding are

performed and the temperature is controlled by Berendsen thermostat and Langevin

dynamics. They have studied the potential energy PDFs and thermal unfolding under

the two thermostats. The Berendsen thermostat has been shown to produce a shift

average potential energy and prolonged tails for potential energy PDF for bulk water,

while no such effect has been seen when Langevin dynamics is employed. An

increased probability of folding at low temperatures has been reported by Berendsen

thermostat, whereas the probability of folding is decreased at high temperatures. The

authors proposed that REMD simulations performed with thermostats that can generate

a Boltzmann distribution, such as Langevin dynamics, Andersen and Nose-Hoover

thermostats.

In a REMD simulation, the number of replicas needed to cover a temperature

range scales as 0(f1/2), where f is the degree of freedom of the system. Given a large

system, the number of replicas needed is large. For example, 64 replicas have been

used in a REMD study of p-hairpin surrounded by explicit water molecules (4342 atoms

in each replica) to cover the temperature range from 270 K to 695 K.184 A number of


109









methods have been developed to reduce the number of replicas needed in REMD

simulations. In 2002, Fukunishi et al.61 proposed Hamiltonian-REMD (H-REMD). In the

H-REMD scheme, replicas differ in their Hamiltonians but have the same temperature.

The exchange strategies in the paper of Fukunishi were to scale hydrophobic

interactions and to scale van der Waals interactions. In 2005, Liu et al.167 published a

method with the name replica exchange with solute tempering. In the replica exchange

with solute tempering algorithm, the protein-water interactions and water-water

interactions are scaled such that the exchange probability does not depend on the

number of explicit water molecules. The number of replicas in replica exchange with

solute tempering simulation to cover the same temperature range is significantly

reduced when comparing with original REMD algorithm. Lyman et al.,185 and Liu and

Voth later,186'187 have developed resolution exchange schemes to improve the

performance of REMD. Coarse-grained models (low resolution) are employed to replace

the role of high-temperature replicas. The Simmerling group has contributed the hybrid

explicit/implicit solvation model188 in order to reduce the number of replicas needed in

REMD simulations with explicit water molecules. Each replica is propagated in an

explicit water box. At an exchange attempt, the solute and its solvation shell, which is

calculated on-the-fly, are placed in dielectric continuum. Exchange probabilities are

calculated based on the potential energies of the solute and the hybrid solvent. The

usage of a hybrid solvent can shrink the number of replicas from 40 to 8, in a test case

of polypeptide Alalo simulated at temperatures from 267 K to 571 K. Structural reservoir

techniques168-170 have also been incorporated into REMD algorithm. High temperature

MD simulations are performed first to generate a structural reservoir. Structures in the


110









reservoir will be brought to replicas via exchanges. One advantage of using structural

reservoir is that non-Boltzmann weight factors can be chosen in the calculation of

exchange probabilities.170 Recently, Ballard and Jarzynski189 proposed to use non-

equilibrium work simulations to accept exchange attempts. Kamberaj and van der

Vaartl90 developed a new scheme to perform exchanges, in which the generalized

canonical PDF have been employed to achieve a flat potential of the mean force in

temperature space. The Wang-Landau algorithm163'164 has been adopted in order to

estimate the DOS in temperature space and the round-up time between extreme

temperatures has been minimized. More recently, solvent viscosity has been selected

as a parameter in addition to temperature for REMD method.191 This method is named

V-REMD and it is essentially a two-dimensional REMD method. The motivation of

choosing viscosity as a parameter is that the lower the viscosity, the faster a protein will

diffuse, and sample the conformational space. In this algorithm, one replica is selected

to have normal viscosity, others use reduced viscosities. The mass of solvent molecules

is scaled by a factor of 22 when the viscosity is scaled by a factor of A. Changing the

mass of solvent molecules does not affect the potential energy at an exchange attempt.

Thus, the exchange probability of the V-REMD is the same as conventional T-REMD.

The author applied V-REMD to the study of trialanine, deca-alanine, and a 16-residue 3-

hairpin peptide. By using the V-REMD, replica numbers are reduced by a factor of 1.5 to

2.

The replica exchange method (REM) can be coupled with other generalized-

ensemble methods in order to enhance conformational sampling. The Okamoto group

have coupled REM with MUCA and simulated tempering. The two new schemes are


111









called multicanonical replica exchange method,192 and replica exchange simulated

tempering,193,194 respectively. The details of coupled REM and generalized-ensemble

methods can be found in a review by Mitsutake et al.53

Due to its stochastic nature, the REMD algorithm has been employed to

investigate thermodynamics rather than kinetics.195 However, a properly designed

scheme of analyzing the REMD trajectory in phase space can yield information about

kinetics. In 2005, Levy and his coworkers195 designed a kinetic network and used

master equation to solve for the transition rate from REMD simulations. The structures

at all temperatures are grouped into states based on their structural similarity (they

selected a 42 dimensional Euclidean distance space based on Ca-Ca distances, instead

of clustering, to group their structures). A state is denoted as a node and an edge

stands for a transition between two nodes. A total of 800,000 nodes and 7.347x 109

edges were obtained. The master equation has been utilized to describe the transitions

between two states. Since they discretized the conformational space into states, the

master equation is written in a matrix notation, d = KP(t), where K is the transition
dt

matrix and P(t) is probability distribution of states at time t. Instead of solving for

eigenvalues of the transition matrix or solving the differential equation numerically, the

authors actually simulated the path satisfying the master equation. Likewise, this

Markov state model has been employed in the study of protein folding too.

In 2006, van der Spoel and Seibertl96 studied protein folding rate based on

Arrhenius equation. The folding mechanism in their investigation has been assumed to

be two-state. A binary folding indicator, which is the RMSD relative to the native state,

has been adopted by the authors. Hence, the first-order reaction rate equation has been


112









set up. Then, the rate equation was integrated and averaged over all trajectories in

order to generate an derived fraction of folded structures. A fitting parameter x2, which

is equal to the difference between derived and actual fraction of folded structures, was

minimized numerically with respective to energy barriers and pre-exponential factors. In

this manner, the Arrhenius reaction rate will be resolved from REMD simulations.

Yang et al.197 proposed to use diffusion equation to extract kinetics from REMD

simulation in 2007. The Fokker-Planck equation has been employed to extract local drift

velocity and diffusion coefficient from REMD simulations. Langevin dynamics on the

reaction coordinate is performed using drift velocity and diffusion coefficient. The free

energy landscape will be reconstructed based on drift velocity and diffusion coefficient.

In 2008, Buchete and Hummer198 demonstrated that both local conformational

transition rate as well as globally folding rates can be accurately extracted from REMD

simulations, without any assumption in temperature dependence of the kinetics

(Arrhenius and non-Arrhenius). Similar to Levy and coworkers, Buchete and Hummer

have also adopted the master equation operating on discretized space to describe

transitions. Conditional probability of state j at time t, given the initial state i, was

computed by the master equation. The likelihood of seeing Nji number of transitions in a

time interval has been maximized with respective to the natural log of transition rate

constant (transition matrix elements) and the natural log of equilibrium population of

state i. Thus, the rate constants will be generated. A detailed description can be found

in the paper of Buchete and Hummer.


113









CHAPTER 3
CONSTANT-pH REMD: METHOD AND IMPLEMENTATION

3.1 Introduction

In this chapter, the constant-pH REMD algorithm used in the AMBER simulation

suite is presented and is employed to study model systems. We first tested our method

based on five dipeptides and a model peptide having the sequence Ala-Asp-Phe-Asp-

Ala (ADFDA). The two ends of model peptide ADFDA were not capped so the two

ionizable side chains would have different electrostatic environment. The pKa values of

the two Asp residues are expected to be different due to the difference in electrostatic

environment.

Then our constant-pH REMD method is applied to a heptapeptide derived from

OMTKY3, the same heptapeptide as Dlugosz and Antosiewicz studied in their paper.

NMR experiments indicated the pKa of Asp is 3.6,122 0.4 pKa unit lower than the value of

blocked Asp dipeptide. Dlugosz and Antosiewicz performed constant-pH MD

simulations and their method predicted the pKa to be 4.24.122 Our purpose is to show

that the REMD algorithm coupled with a discrete protonation state description can

greatly improve pH-dependent protein conformation and protonation state sampling.

3.2 Theory and Methods

3.2.1 Constant-pH REMD Algorithm in AMBER Simulation Suite

In the case of constant pH molecular dynamics, the potential energy of the system

depends not only on the protein structure but also on the protein protonation state.

* Reproduced in part with permission from Meng, Y.; Roitberg, A.E. Constant pH
Replica Exchange Molecular Dynamics in Biomolecules Using a Discrete Protonation
Model, J. Chem. Theory. Comput. 2010, 6, 1401-1412. Copyright 2010 American
Chemical Society.


114








Likewise, when coupling REMD algorithm with constant-pH MD, one can either attempt

to exchange molecular structures only or swap both structures and protonation states at

the same time. For simplicity, let us consider two replicas where replica 0 has

temperature To, protein structure qo and protonation state no, while replica 1 has

temperature T1, structure q- and protonation state n-. A diagrammatic description of the

two exchange algorithms is shown in Figure 3-1.

(A)
q1, nq i q l qo, n




x X
q no qno q% no t


Figure 3-1. Methods to perform exchange attempts. A) Only molecular structures are
attempted to exchange. The protonation states are kept the same. B) Both
molecular structures and protonation states are attempted to exchange.
The first way of performing an exchange attempt is that replica 0 tries to jump from
state (qo, no) to state (ql, no) at temperature To in one Monte Carlo step. Similarly,

replica 1 attempts to transit from state (ql, n-) to state (qo, n1) at temperature T1.

Protonation states are kept at exchange attempts and only change during dynamics.

Therefore, the detailed balance equation now becomes:

w(P30qonofi, qn, q, nof, qon,) exp(-8,0E(qo,no))-exp(-AE(q,, n))
w(A0qno, Aqonl -> 0qono, Aq1n1) exp(-/0E(q,,no))-exp(-AE(qo,nl)) (3-1)

Here w(floqono, fllqln -- floq1no,1lqo0nl) is the transition probability of swapping

structures. If Metropolis criterion is used, this exchange probability can be written as:

w(f-o ono, iqlni floq1no, lqonl) = min{l, exp(-A)} (3-2)


115









In Eq. 3-2, A has the form:

A = lo [E(qo, no) E(ql, no)] fl [E(qo, n) E(ql, ni)] (3-3)

Here fo = 1/kBTo, fl1 = 1/k T1 and E is the potential energy. If the protonation

states of two adjacent replicas at an exchange attempt are the same, the exchange

probability of our constant pH REMD will be equivalent to the conventional REMD

exchange probability. However, if it is not the case, four potential energy terms are

needed to calculate exchange probability. Under this circumstance, the constant-pH

REMD becomes a REMD algorithm that combines both temperature and Hamiltonian

REMD algorithms.

One possible concern of exchanging only structures would be the role of kinetic

energy, especially when no and nl are different. In the REMD algorithm developed by

Sugita and Okamoto, the kinetic energy terms in the Boltzmann factors cancel each

other on average through velocity rescaling (Eq. 2-88). Only potential energies are

required to compute exchange probabilities. There is a problem in canceling kinetic

energy terms when the numbers of particles of two systems attempting to exchange are

not the same. However, according to the constant-pH MD algorithm proposed by

Mongan et al., a proton does not leave the molecule but becomes a dummy atom when

an ionizable side chain is in deprotonated state. Furthermore, that dummy atom retains

its position and velocity which are controlled by molecular dynamics. Hence, the kinetic

energy contributions to the Boltzmann weight will be cancelled out during exchange

probability calculation, leaving only potential energy useful for the calculation.

The second possibility consists of exchanging protonation states as well as

molecular structures at REMD Monte Carlo moves. For instance, replica 0 attempts to


116









move from state (qo, no) to state (ql, n1) at temperatures To in one MC move and replica

1 attempts to jump from state (ql, n1) to state (qo, no) at temperature T-. The detailed

balance equation now can be written as:

w(fqn,,/1q1n1 ->pi4q1,n1,/~1qno, ) ,n ) w(qn qnw(qn ->~i0)qln1) (3-4)
w(/o0q1n1,, /1qon ->/oq0ono,Aflq,1n1) w(/A1qon ->fA1qn,1) w(/oqln -> P0qono)

This equation states that the exchange probability is the product of MC transition

probabilities at temperature To and T-. If the protonation states of two adjacent replicas

are the same at an exchange attempt, the exchange probability of constant-pH REMD

becomes the exchange probability of conventional temperature-based REMD. If no and

n, are different, then each MC transition is essentially the protonation state change step

in constant-pH MD plus a structural transition. For example, consider the MC transition

at temperature To,

w(foqono floqlnl) = minl{, exp(-Ai)} (3-5)

In Eq. 3-5, A1 has the form:

A, = lo [E(ql, no) E(qo, no)] + (pH pKa,ref) + lo [Eelec (ql, nl) Eelec (ql, no)] -

Po AGref,MM (3-6)

The first term in A6 derives from the transition in configuration at fixed protonation

state no, and the rest corresponds to protonation state change at fixed structure q-. Eeiec

represents the electrostatic component of potential energy. Similarly, the transition

probability of MC jump at T- can be expressed as:

w(Plqlnl fllqono) = min{l, exp(-A2)} (3-7)

And


117









A2 = 1 [E(qo, nl) E(ql, ni)] (pH pKa,ref ) 1 [Eelec (qo, nl) Eelec (qo, no)] +

Pi AGref,MM (3-8)

Therefore, similar to Eq. 3-2, the exchange probability can be written as:

w(floqonoi, f qnl floqinl, fllqono) = min{l, exp(-A')} (3-9)

And

A' = A + pf [Eezec (ql, nl) Eezec (ql, no)] fl [Ee ec (qo, nl) Eeiec (qo, no)] +

(fo Pl) AGref,MM (3-10)

In Eq. 3-10, A is the same quantity as in Eq. 3-3.

The exchange probability calculation in the second method of coupling REMD and

constant-pH MD utilizes the same energy terms required by the first method since

obtaining electrostatic potential energies does not require extra energy calculations. The

advantage of implementing the second exchanging protocol over the first one should

not be significant because it is the conformational sampling at higher temperature that

greatly improves conformational sampling at lower temperatures. Allowing protonation

states to change at exchange attempts does not provide extra gains in conformational

sampling. In addition, one can always choose to sample protonation state space during

the MD propagation. Therefore, only the first method of performing exchanges was

implemented.

3.2.2 Simulation Details

Constant pH REMD simulations were carried out first on five reference

compounds: blocked Aspartate, Glutamate, Histidine, Lysine and Tyrosine to test our

method and implementation. The experimental pKa values of those reference

compounds are known and listed in Table 3-1. We later performed constant pH REMD


118









simulations on a model peptide ADFDA (Ala-Asp-Phe-Asp-Ala, unblocked termini) and

the heptapeptide derived from OMTKY3 (residues 26 to 32 with blocked termini). Four

replicas were used in the reference compounds and ADFDA REMD simulations. The

temperatures were 240, 300, 370 and 460 K for all six molecules. The pH range for the

study of acidic side chains was sampled from 2.5 to 6 and the pH range of histidine-6 is

from 5.5 to 8. The basic side chains were titrated from pH 9 to 12. An interval of 0.5 was

chosen for all titrations.

Eight replicas were chosen for the heptapeptide with a temperature range from

250 to 480 K. 10 ns were used for each replica in all REMD simulations and an

exchange was attempted every 2 ps. A MC move to change protonation state was

attempted every 10 fs. A second set of REMD runs was done with the same overall

conditions but different initial structures in order to check simulation convergence.

To compare conformational and protonation state sampling, 100 ns of constant pH

MD simulations were carried out for aspartate reference compound and ADFDA at the

same pH values as in the REMD runs. For the heptapeptide, one set of 10 ns constant

pH MD simulations were done at all pH values simulated by REMD method.

Constant pH REMD and MD simulations were done using the AMBER 10

molecular simulation suite.199 The AMBER ff99SB force field139 was used in all the

simulations. The SHAKE algorithm145 was used to constrain the bonds connecting

hydrogen atoms with heavy atoms in all the simulations which allowed use of a 2 fs time

step. OBC Generalized Born implicit solvent model200 was used to model water

environment in all our calculations. The Berendsen thermostat, 146 with a relaxation time

of 2 ps, was used to keep the replica temperature around their target values. Salt


119









concentration (Debye-Huckel based) was set at 0.1M. The cutoff for non-bonded

interaction and the Born radii was 30 A.

3.2.3 Global Conformational Sampling Comparison Using Cluster Analysis

In our study, global conformational samplings have been compared utilizing cluster

analysis.169'188 Cluster analysis is a technique to group "similar" structures and each

group is called a cluster. A cluster analysis measures the similarity between two objects.

In the cluster analysis we performed, protein backbone similarity (measured by

backbone RMSD) is considered and the hierarchical agglomerative clustering algorithm

is employed. Hierarchical algorithm basically creates a hierarchy of clusters and a

hierarchical algorithm can be agglomerative or divisive. The hierarchical agglomerative

algorithm starts with considering every object as a cluster and combines similar clusters

into one cluster, while the divisive algorithm starts with one cluster containing all objects

and divides it into more groups.

In our work, the cluster analysis was done using the Moil-View program.201 The

MD and REMD trajectories (having same number of frames) at 300 K and under the

same solution pH value were first combined. The ptraj module of the AMBER package

has been utilized to create the combined trajectory. The "trajin" keyword was used to

read in two trajectories and the "trajout" command generated the trajectory we need.

The combined trajectory was clustered based on peptide backbone atoms root-

mean-square deviations (RMSDs). A cluster cutoff RMSD of 1.5 A is chosen for both

ADFDA and the heptapeptide during our analysis. By clustering the combined trajectory,

the MD and REMD conformational samplings will populate the same clusters. The

fraction of the conformational ensemble corresponding to each cluster (fractional

population of each cluster) was calculated for MD and REMD run, respectively. Two


120









sets of fractional population of clusters were generated. One must note that the

fractional population of each cluster from MD and REMD trajectory may not be the

same. Therefore, the correlation between the two sets of fractional population can be

investigated by plotting one set against the other and doing a linear fitting.

The Moil-View program will generate a file pointing out which cluster a snapshot in

the combined trajectory belongs to. Thus, the fractional population of each cluster was

obtained for MD and REMD simulation. If the MD and REMD simulations produced the

same structural ensemble, the fractional population of a cluster from MD simulation will

be the same as that from REMD simulation. Cluster population fraction from REMD

simulation vs that from MD simulation was plotted (see Figure 3-7A). The correlation

coefficient values which represent the correlations between MD and REMD cluster

population were calculated at each solution pH value by doing linear regression.169'188 A

high correlation between MD and REMD cluster population indicates that the structure

ensembles are similar to each other. This method provides a direct comparison of

global conformational sampling between MD and REMD simulations.

The same technique was used when studying convergence of constant pH REMD

and MD trajectories (see Figure 3-7B and Figure 3-12). When investigating

convergence of conformational sampling, snapshots from two constant-pH REMD

simulations (or two constant-pH MD simulations) were combined. The two constant-pH

simulations should have the same temperatures and solution pH values. They only

differ in initial structures. A high correlation coefficient indicates the two structural

ensembles are similar and two conformational samplings are converged, while a poor


121









correlation means the structural ensembles are different and the conformational

sampling depends on initial condition.

3.2.4 Local Conformational Sampling and Convergence to Final State

In our study, the local conformational sampling was examined by comparing the

probability distribution of backbone dihedral angle pair ((p, p). Essentially, we are

comparing the Ramachandran plot of a residue. Each ((p, p) probability density was

computed by inning (p and p angle pairs 10 x 100, which would lead to a 36x36

histogram. These two dimensional histograms were normalized into populations and the

contours were plotted. The metric used to evaluate ((p, p) probability density

convergence was the root-mean-squared deviation (RMSD) between the cumulative ((p,

p) histogram and the one produced by using all configurations. Each cumulative

histogram was constructed by using ((p, p) pairs up to current time and following the

same algorithm mentioned earlier in this section. Essentially, we were computing the

RMSD between two matrices. The RMSD between the cumulative probability density at

time t and the final probability density (all configurations were utilized to compute final

probability density) is given by,


RMSD(t) = J 61jl[Pi (t Pi,final 2 /36 x 36 (3-11)

where Pij (t) is the ijth element of the cumulative probability density of the ((p, p) pairs

at time t and Pij,fina is the corresponding element in the final probability density matrix.

3.3 Results and Discussion

3.3.1 Reference Compounds

We first applied our constant pH REMD method to the reference compounds.

Table 3-1 shows the pKa values predicted by REMD simulations (10 ns for each replica)


122










as well as the reference pKa values. All our pKa values were calculated by fitting to the

HH equation. Agreement between constant pH REMD predictions and the reference

values can be seen.

Table 3-1. The REMD pKa predictions of reference compounds.
pKa Aspartate Glutamate Histidine Lysine Tyrosine
REMD 3.97(0.01) 4.41(0.01) 6.40(0.03) 10.42(0.01) 9.61(0.01)
Reference 4.0 4.4 6.5 10.4 9.6
The numbers in parenthesis are the standard errors.


The pH titration curves of the same reference compounds showed agreement

between MD (100 ns) and REMD simulations. Figure 3-2 demonstrates the REMD and

MD titration curves of aspartic acid reference compound as an example.

1.0-
MD run
S--- REr.1D run
0.8-

0
t /
re 0.6 /
LL

C: 0.4-


S0.2


0.0
3 4 5 6 7
Solution pH


Figure 3-2. Titration curves of blocked aspartate amino acid from 100 ns MD at 300K
and REMD runs. Agreement can be seen between MD and REMD
simulations.

We further studied the convergence of protonation states sampling. REMD and

MD protonation fraction (cumulative protonation fraction) were plotted with respect to

MC attempts for aspartate reference compound at all pH values. Figure 3-3

demonstrated the protonated fraction versus time at pH 4 as one example. According to


123










Figure 3-3, it suggests that although the final pKa predictions are the same between

REMD and MD simulations, the protonation state sampling during REMD simulations

clearly converges faster than that in a MD run.

-MD, pH=4
6- --REMD, pH=4




0


.2
t5
g 05,





0 50000 100000 150000 200000
MC Titration Steps

Figure 3-3. Cumulative average protonation fraction of aspartic acid reference
compound vs Monte Carlo (MC) steps at pH=4.

3.3.2 Model peptide ADFDA

The model peptide ADFDA (as zwitterion) was chosen as a more stringent test of

our constant pH REMD method. The charged termini will provide different electrostatic

environment for each titratable Asp residue and hence a correct constant pH REMD

model should reflect this difference between titration curves of the two Asp residues.

The Asp2 residue is closer to the NH3+, so the deprotonated state is favored and the

pKa value of Asp2 residue should shift below 4.0 (which is the pKa value of the

reference aspartic dipeptide). The Asp4 residue is closer to the COO- negative charge

and hence the pKa value should shift above 4.0.

The titration curves of the model peptide ADFDA from REMD simulations are

shown in Figure 3-4. We can clearly see that Asp2 and Asp4 have different titration


124










curves from each other and from the reference compound. The pKa value and Hill

coefficient for each Asp residue were obtained by fitting titration curves to a Hill plot.

The results are shown in Table 3-2. The REMD pKa predictions reflect the difference

between Asp2 and Asp4 due to different peptide electrostatic environments. We also

displayed the MD titration curves of Asp2 and Asp4 in Figure 3-4 and listed the MD pKa

predictions and corresponding Hill coefficients in Table 3-2. The titration curve of Asp2

residue only showed a small difference between MD and REMD simulation. But we can

see differences in titration behaviors of Asp4 between MD and REMD calculations when

solution pH is below 5. Interestingly, Lee et al. studied blocked Asp-Asp peptide using

CPHMD method, reporting different Hill coefficient for each of the two Asp residues.


Model peptide ADFDA Titration curves at 300K
1.0,- -.- Asp2MD
Asp2 REMD
v Asp4 MD
0.8 v Asp4 REMD
S6Asp reference
C- :
04

u-

0 / .
0 /
S 0.2-


0.0 2
2 3 4 5 6 7
Solution pH


Figure 3-4. The titration curves of the model peptide ADFDA at 300K from both MD and
REMD simulations. MD simulation time was 100 ns and 10 ns were chosen
for each replica for REMD runs.

Table 3-2. pKa predictions and Hill coefficients fitted from the Hill's Plot
Asp2 Asp4
pKa Hill Coefficient pKa Hill Coefficient
REMD 3.74 0.87 4.38 0.67
MD 3.76 0.89 4.54 0.85


125









Convergence rates of Asp2 titration behavior were compared between REMD and

MD calculations due to the fact that Asp2 titration curves are very close. The cumulative

protonated fractions versus MC attempts at pH 4 are shown in Figure 3-5. Likewise,

faster convergence in protonation state sampling can be seen for REMD simulation

even though both REMD and MD calculations resulted in the same final protonated

fraction. Clearly, our constant pH REMD method accelerates the convergence of

sampling of protonation states.

MD, pH=4, Asp2
0.5 REMD pH=4, Asp2



.0

0



c 0.3






0 20000 40000 60000 80000 100000
MC Titration Steps

Figure 3-5. Cumulative average protonation fraction of Asp2 in model peptide ADFDA
vs Monte Carlo (MC) steps at pH=4.

In addition to protonation state sampling, we also evaluated the conformational

sampling in constant pH MD and REMD simulations. First, distribution of backbone (p

and yp angle pairs (Ramachandran plots) of residue Asp2, Phe3 and Asp4 in ADFDA at

each solution pH were studied. The regions in Ramachandran plots sampled by MD and


126










REMD simulations are the same at all pH values. Ramachandran plots for residue Asp2

at pH 4 are shown in Figure 3-6 as an example.

(A) (B)
"MD, pH=4, Asp2 24E-2 15 REMD, pH=4, Asp2 24E-2

100- 2.1E-2 100- 21E-2
1.8E-2 18E-2
50- 50-
1.5E-2 15E-2
07 / 10 -E-2 rs 0 ?E-2
4 E.
-50- -50-
E-3 6E-3

0, .150.-

.150 .100 .50 0 50 100 150 -150 -100 -50 0 50 100 150
phi phi

Figure 3-6. Backbone dihedral angle (cp, yp) normalized probability density
(Ramachandran plots) for Asp2 at pH 4 in ADFDA. Ramachandran plots at
other solution pH values are similar. For Asp2, constant-pH MD and REMD
sampled the same local backbone conformational space. Phe3 and Asp4
Ramachandran plots also display the same trend.

Since the Ramachandran plot only represented local conformational sampling, we

also evaluated global conformational sampling by clustering MD and REMD trajectories

and comparing the cluster populations. The MD and REMD cluster population R2 values

are listed in Table 3-3. A plot of cluster populations from MD and REMD trajectories at

solution pH of 4 is shown in Figure 3-7A as an example. The large R2 values indicate

that the MD and REMD sampled the same conformational space and generated the

same structure ensemble. The small size of ADFDA and simple structure of each

residue make 100 ns long enough for MD to sample the relevant conformations.

We further studied the convergence of REMD simulations by comparing global

conformation distribution between two REMD simulations starting from two different

structures. Cluster populations of the two REMD simulations at solution pH 4 are


127










displayed in Figure 3-7B. The R2 value is 0.959 at pH 4. This large correlation tells us

that the two REMD simulations provide the same structure ensemble and hence the two

simulations are converged.


Table 3-3. Correlation coefficients between MD and
pH=2.5 pH=3
R2 0.94 0.90
pH=4.5 pH=5
R2 0.85 0.98
The R values were calculated by linear regression.


REMD cluster populations.
pH=3.5 pH=4
0.79 0.93
pH=5.5 pH=6
0.92 0.96


ADFDA, pH=4 Linear Fit, R=0.93


C
20-
5
w
0

.o 10.
0
10 -
S5-


30 35


ADFDA, pH=4 -Linear Fit, R =0.96


0 5 10 15 20
% Population of REMD Run 1


25 30


Figure 3-7. Cluster populations of ADFDA at 300K. A) MD vs REMD at pH 4.
Trajectories from MD and REMD simulations are combined first. By clustering
the combined trajectory, the MD and REMD structural ensembles will
populate the same clusters. The fraction of the conformational ensemble
corresponding to each cluster (fractional population of each cluster) was
calculated for MD and REMD simulation, respectively. Two sets of fractional
population of clusters were generated, and hence plotted against each other.
B) Two REMD runs from different starting structures at pH 4. Large
correlation shown in Figure 3-7B suggests that the REMD runs are
converged. Large correlations between two independent REMD runs are also
observed at other solution pH values. Correlations between MD and REMD
simulations can be found in Table 3-3.

3.3.3 Heptapeptide derived from OMTKY3

We first compared the protonation state sampling between constant pH REMD

and MD simulations. Titration curves of Asp3, Lys5 and Tyr7 from two sets of


128


0 5 10 15 20 25
% Population of REMD Run









simulations are plotted in Figure 3-8A and 3-8B. For each titratable residue, titration

curves generated by constant pH REMD and MD are close to each other. Since the pKa

value of Asp3 in this heptapeptide is experimentally determined to be 3.6, it will be

interesting to evaluate how our predicted values compare to the experimental result.

The pKa values of Asp3 were calculated based on Hill's plots which are displayed in

Figure 3-8C. The predicted pKa value is 3.7 for both REMD and MD simulations and

they are in excellent agreement with the experimental pKa value. Following the same

procedures, our predicted pKa values of Lys5 and Tyr7 from constant pH REMD and

MD simulations were obtained. Not surprisingly, the REMD and MD schemes yielded

essentially the same predicted pKa values for Lys5 and Tyr7.

(A) MD, Asp3
REMD, Asp3

1.0-
/

0.8 /
0 /
uL 0.6 /
-o




0.4 -
/
Q_

0.2


0.0 i i ,
4 6 8 10
Solution pH

Figure 3-8. A) Titration curves of Asp3 in the heptapeptide derived from protein
OMTKY3. B) Titration curves of Lys5 and Tyr7 in the heptapeptide derived
from protein OMTKY3. C) shows the Hill's plots of Asp3. The pKa values of
Asp3 are found through Hill's plots.


129










--MD, Lys5
REMD, Lys5
MD, Tyr7
-REMD, Tyr7




/


1.0-


0.8-
-.



0
Lu 0.6-
--

C .
0)
| 0.4-
0.
0)

0.2-


0.0-


10


12


Solution pH


--MD, Asp3
REMD, Asp3


4 5 6


Solution pH


Figure 3-8. Continued


130


(B)


/


4


I










Although the final pKa predictions are the same for constant pH REMD and MD

simulations, constant pH REMD showed clear advantage in the convergence of

protonation state sampling. Again, we chose the cumulative average protonation

fraction vs MC steps to reflect protonation state sampling convergence for all three

titratable residues. Several representative plots are shown in Figure 3-9. The trend that

constant pH REMD simulations produce faster convergence in protonation fraction is

universal. Therefore, it is very clear that constant pH REMD method is better than

constant pH MD in protonation state sampling.


(A) MD, pH=4, Asp3
0.5 REMD, pH=4, Asp3




0.4


M'-
0
--

0
0. -



0.2-




01 ._
0 20000 40000 60000
MC Titration Steps



Figure 3-9. A) Cumulative average protonation fraction of Asp3 of the heptapeptide
derived OMTKY3 vs MC steps. B) and C) is cumulative average protonation
fraction of Tyr7 and Lys5 in the heptapeptide vs MC steps, respectively.
Clearly, faster convergence is achieved in contant-pH REMD simulations.


131










(B)


0 20000 40000 60000
MC Titration Steps


--MD, pH=10, Lys5
REMD, pH=10, Lys5


40000


60000


MC Titration Steps


Figure 3-9. Continued


132


MD, pH=9, Tyr7
REMD, pH=9, Tyr7


(C)


0.3-





0.2-








0.1


0.0- -


20000













Conformational sampling is an important issue in constant pH studies. We first

looked at the conformational sampling on peptide backbones. We evaluated backbone

conformational sampling through Ramachandran plots. Six residues (from Ser2 to Tyr7)

are studied here. Not surprisingly, Ramachandran plots from constant pH REMD and

MD simulations are very close, suggesting that the overall local conformational

samplings are similar. The Ramachandran plots of Asp3 at pH 4 are shown in Figure 3-

10 as examples. The only exception is Tyr7 in acidic pH values. Tyr7 can visit the left-

handed alpha helix conformation during constant pH REMD runs but is not able to do

that in constant pH MD runs. In general, constant pH REMD and MD yielded the same

Ramachandran plots for the heptapeptide.

(A) (B)
150- MD, pH=4, Asp3 4E-2 150- 9 REMD, pH=4, Asp3 46-2
100- 36E2 00- 3 6E-2
100- 100 -
3 2E.2 3 23 -2
50- R-E 50- '. "
A4E-
0 0-
-0 -2
I 2E-2 z&-2
8E-3 8E-3
-100- -100

-150- 0 -150- 0
-150 .100 .50 0 50 100 150 -150 -100 -50 0 50 100 150
phi phi

Figure 3-10. Dihedral angle ((p, p) probability densities of Asp3 at pH 4. A) Constant-pH
MD results. B) Constant-pH REMD results. The two probability densities are
almost identical, indicating that constant-pH MD and REMD sample the same
local conformational space. All others also show very similar trend.

As demonstrated earlier, the overall samplings of ((p, p) distribution by constant

pH REMD and MD are similar for Ser2 to Thr6. It is interesting to determine how fast

each sampling scheme reaches the final distribution. We studied evolution of backbone

conformational sampling based on cumulative data as what we did in the case of


133










protonation state sampling convergence. As described in the METHOD section, the

RMSD between the (cp, p) distribution up to current time versus total simulation time

was calculated. The smaller a RMSD is, the closer a probability distribution reaches to

the final distribution. Deviations were calculated starting from the second nanosecond

with time intervals incremented by 100 ps. The cumulative time-dependence RMSD of

Asp3 and Lys5 are also shown in Figure 3-11 as examples. As seen in the figures,

these curves decrease faster in constant pH REMD simulations. Figure 3-11 suggests

that although the final (cp, p) probability distributions are similar between constant pH

REMD and MD simulations, the constant pH REMD simulation clearly reaches the final

state faster.


(A) (B)
o0.006- -MD, pH=4, Asp3 o0.00 -MD, pH=4, Lys5
-REMD, pH=4, Asp3 -REMD, pH=4. Lys5
0.005- 0.006
00.005

0.004
0.0030003
0.003
0.002-
0.0020.002
0002-
0.001 0.001


0 2000 4000 6000 8000 2000 4000 6000 8000
Time (ps) Time (ps)


Figure 3-11. The root-mean-square deviations (RMSD) between the cumulative ((p, p)
probability density up to current time and the ((p, p) probability density
produced by entire simulation. ((p, p) probability density convergence
behaviors at other pH values also show that REMD runs converge to final
distribution faster.

Cluster analysis was also applied to study the convergence of conformation

sampling in the heptapeptide. By comparing cluster populations between the first and


134










second half of one trajectory, one could check the convergence of that simulation. The

two halves of a structural ensemble should yield the same populations in each cluster if

convergence is reached. For example, simulations at pH 4, both constant pH REMD

and MD yield about 20 clusters and the correlations coefficients are calculated through

a linear regression. Cluster population plots and correlation coefficients are shown in

Figure 3-12. A much higher correlation coefficient can be seen in constant pH REMD

simulation, suggesting the two halves of the constant pH REMD simulation at pH 4

populate each cluster much more similarly than the corresponding constant pH MD

does. Hence, much better convergence is achieved by the constant pH REMD run.


(A) (B)
(A) ----Linear Fit, pH=4, R2=0.54 (B).89
40- 25 Linear Fit, pH=4, R2=0.89
C.
0 1(I0
( 35-
%D E al
2 20-
0
30- s

25 -
I / i15-

S20-

C 1- Cu



-I I n I o I* a
Q. f n
0 5 10 15 20 25 30 5 40 0 5 10 15 20 25
% Population of MD Run (the first half) % PNpulaliori of REMD Run (the first half)


Figure 3-12. Cluster population at 300 K from constant pH MD and REMD simulations
at pH=4. Cluster analysis is performed using the entire simulation. The
populations in each cluster from the first and second half of the trajectory are
compared and plotted. Ideally, a converged trajectory should yield a
correlation coefficient to be 1. A) Constant pH MD. B) Constant pH REMD.
Much higher correlation coefficient can be seen in constant pH REMD
simulation, suggesting much better convergence is achieved by the constant
pH REMD run.


135









3.4 Conclusions

In our work, we have applied replica exchange molecular dynamics (REMD)

algorithm to the discrete protonation state model developed by Mongan et al. in order to

study pH-dependent protein structure and dynamics. Seven small peptides were

selected to test our constant pH REMD method. Constant pH molecular dynamics (MD)

simulations were ran on the same peptides for comparison. The constant REMD

method results are encouraging. The constant REMD method can predict pKa values in

agreement with literature and experimental results. Constant pH REMD method also

displays advantage in convergence behaviors during protonation states and

conformational sampling.

The REMD algorithm has been proven beneficial to study pH-dependent protein

structures. Our future work will include studies of pH-dependent protein dynamics and

application of this constant pH REMD to large proteins.


136









CHAPTER 4
CONSTANT-pH REMD: STRUCTURE AND DYNAMICS OF THE C-PEPTIDE OF
RIBONUCLEASE A

4.1 Introduction

The protein and peptide folding problem202 is an important aspect of protein

science and biophysical chemistry.203 In 1961, Anfinsen studied the refolding of

denatured ribonuclease (RNase).204 He first increased the temperature of the protein

and the protein lost its functional three-dimensional shape (native state). When Anfinsen

lowered the temperature, he found that the RNase was able to refold into its normal

shape, without any other help. His experiment raised questions about protein folding. In

general, people are interested in the thermodynamics (such as free energy landscape,

folding pathway, and interactions in a protein), folding kinetics (such as how fast a

protein folds), and native state prediction for a given sequence in protein folding.202 Both

experimental and theoretical approaches have been employed to understand protein

folding.205,206

From now on, our introduction to protein folding will focus on computer

simulations. In a protein folding simulation, the concept of free energy landscape always

plays an important role.202'207 Many questions can be answered once the free energy

landscape is obtained. Levinthal,208 in 1968, proposed that it is impossible for a protein

to search all its conformations during folding process because the time taken to visit all

conformations will be much longer than the folding time observed. His argument is well

known as the "Levinthal's paradox". Thus, proteins must fold to their native states along

some well-defined folding pathways. The "new" view of protein folding is the free energy

landscape theory, which provides a statistical view of the folding landscape.202,203'207

The folding process does not require chemical-reaction-like steps between specific


137









states. Basically, a protein folds on a funnel-shaped free energy landscape, which is

defined by the amino acid sequence of the protein. Folding process is a directed visit of

conformations on a landscape in order to reach the native state, which is the most

thermodynamically stable conformation. Changing temperature, adding denaturant to

the protein solution, or changing solution pH value of the protein system is able to

change the free energy landscape, and hence affect protein folding. The free energy

landscape of a protein is often rugged51 and requires advanced sampling techniques

such as REMD method to sample the conformational space. Due to the visual limitation,

a free energy landscape is frequently projected onto one or two reaction coordinates. In

practice, the free energy landscape is often projected onto several important reaction

coordinates such as the radius of gyration of a protein, the number of backbone

hydrogen bonds, and native contacts. Principal component analysis has also been

carried out to generate the folding free energy landscape. The relative free energy

(potential of the mean force, PMF) can be calculated by the following,

AF(B A) = F(A) F(B) = -kBTln(P(A)/P(B)) (4-1)

where AF(B A) is the relative PMF between state A and state B defined by reaction

coordinatess, P(A) and P(B) are the probability density of find state A, and B along the

reaction coordinatess, respectively.

Knowing the free energy landscapes can help people understand folding

mechanisms. Transition states, intermediates, and folding pathways can be obtained

from a folding free energy landscape. For example, when the free energy barrier

between folded and unfolded state is disappeared, the folding is called downhill folding,

in which the folding time is determined by diffusion rate on the free energy landscape.


138









One example of the protein folding free energy landscape studies is simulating the

folding of C-terminal 3-haripin of protein G, performed by Zhou et al. in 2001.184 The

OPLSAA force field, SPC explicit water model, and REMD algorithm have been

employed in their simulation. The free energy landscape has been projected onto seven

different reaction coordinates such as radius of gyration, number of hydrogen bonds,

and fraction of native contacts. Two-dimensional free energy landscapes along those

reaction coordinates were generated in order to elucidate the folding pathway. Four

different states were found in the folding landscape, native state, unfolded state, and

two intermediate states. Structural features of each state were also characterized. The

formation of hydrophobic core and hydrogen-bonding in the folding process has been

investigated. They have found that the hydrophobic core and hydrogen bonds formed

almost simultaneously after initial collapse.

Although not investigated in this chapter, protein folding kinetics is also an

important aspect of protein folding.209 One example of the folding kinetics study is

seeking the speed of protein folding.210 Computer simulations have been performed to

elucidate folding kinetics.211 The Pande group at Stanford University pioneered

computer simulations of folding kinetics.206'211-213 When studying protein folding kinetics,

the Pande group conducted multiple independent MD simulations starting from different

initial conditions. The probability of the native state in the structure ensemble was

computed after a pre-defined simulation time. Assuming the folding mechanism is two-

state folding and follows the first-order reaction kinetics, and the transition time is much

shorter than staying time in either state, the probability of barrier-crossing can be given

by,


139









P(t) = 1 e-kt (4-2)

where t is simulation time and k is the folding rate. In the limit of t < 1/k, Eq. 4-2 can

be simplified to P(t) ~ kt, according to the Taylor expansion. The probability of barrier-

crossing can be computed by using the fraction of simulations that crossed the barrier.

Other methods utilized to explore folding kinetics include Markov state models.195'198'214-
217

One example of predicting folding time is given by studying the C-terminal 3-

hairpin of protein G. In their studies, Pande and co-workers213 utilized the OPLSAA

force field and the GB implicit solvent model using water-like viscosity via Langevin

collision coefficient. A total simulation time of 38 ps has been accumulated through

2700 independent simulations, among which 8 completely folded trajectories were

found. Thus, a folding time of 4.7+1.7 ps can be derived from Eq. 4-2, which is in

agreement with the experimental result of 6 ps. Furthermore, the folding free energy

landscape has been generated and the folding pathway and folding intermediates etc

have also been probed.

Another area of protein folding simulation is to probe protein folding through the

unfolding simulations. The unfolding simulations adopt the assumption that folding

processes follow the reverse pathways of unfolding processes. Both temperatures and

denaturants can be employed to denature proteins. Levitt and Daggett have been

performed unfolding simulations extensively.218-220

The C-peptide, residues 1 to 13 from the N-terminus of RNase A, is a peptide well

studied by experiments.5,'7221-226 In 1971, Brown and Klee223 first observed the presence

of a-helix of C-peptide through circular dichroism (CD) spectroscopy. This peptide was


140









further studied extensively by the Baldwin group.5,7'222,224,226 CD spectroscopy showed

that the C-peptide demonstrated pH-dependent a-helix formation. The mean residue

ellipticity at 222 nm of the C-peptide showed a bell-shaped pH profile, having a

maximum at pH value of 5. Mutation experiments indicated that the Glu2 and Hisl2 in

the C-peptide were crucial to the pH-dependent helix formations.5'7'224'226 Maximal mean

residue ellipticity occurred at pH 5 because both the glutamate and histidine residues

are charged at that pH. NMR experiments on an analog of the C-peptide (RN-24) by the

Wright group also confirmed the formation of complete and partial helix.225 Two side

chain interactions were believed to stabilize the partial helix formation in the C-peptide

and its analogs in the mutation experiments and NMR studies.7,224-226 A salt-bridge

between Glu2 and Arg10 side chains was proposed to improve the helix formation as

the pH values increased to 5. The interaction between Phe8 and Hisl2 was also

believed to improve helix formation as the pH values reduced to pH of 5.

The folding and side chain interactions of C-peptide and its analogs were also

extensively studied by molecular simulations.227-235 Schaefer et a/.232 studied the helical

conformations and folding thermodynamics. The Okamoto group228-230'233-235 has

performed thorough investigations of the C-peptide using a multicanonical algorithm

(MUCA) and the replica exchange method (REM) in both implicit solvent and explicit

solvent. They have studied secondary structures of the C-peptide, roles of Glu2 and

Hisl2 in the C-peptide, helix-coil transition, and dielectric effect in the implicit solvent.

Ohkubo and Brooks231 utilized REMD simulations with the GB model to explore the

helix-coil transition of short peptides including the C-peptide. Conformational entropy as

a function of temperature has been explored for the C-peptide and its analogues


141









(different chain length). The conformational entropy has been found to be proportional

to chain length over a wide range of temperatures. Felts and co-workers227 carried out

REMD simulations with the AGBNP implicit solvent model to study the folding free

energy landscape of the C-peptide. The free energy landscape was projected onto

radius of gyration and helical length. The possible interaction between Glu2-Argl0 was

also explored. Dielectric effects of AGBNP solvation model on helical length and salt-

bridge has been investigated too. In 2005, Sugita and Okamoto233 performed replica

exchange multicanonical algorithm simulations in explicit solvent to explore the folding

mechanism and side-chain interactions such as Glu2-Argl0 and Phe8-Hisl2. They

constructed folding free energy landscape along the principal component axes. The

correlations between Glu2-Argl0 and Phe8-Hisl2 interactions and the C-peptide

conformations have been elucidated. They have found that the minimum free energy

conformation possess both interactions. They have also suggested that the purpose of

Glu2-Argl0 salt-bridge is to prevent a-helix extending to N-terminus of the C-peptide

and the Phe8-Hisl2 stabilizes the alpha-helix conformation toward the C-terminus.

More importantly, Khandogin etal.112 studied the pH-dependent folding of the C-peptide

with REX-CPHMD. Important electrostatic interactions such as the Lysl-Glu9, Glu2-

Arg10 and Phe8-Hisl2 interactions were also investigated.

The C-peptide has also been selected to test the effect of force fields on protein

folding simulations and simulation convergence. In 2004, Yoda et a/.234'235 tested six

commonly employed force fields (AMBER94, AMBER96, AMBER99, CHARMM22,

OPLS-AA/L, and GROMOS96) on the C-peptide as well as the C-terminal fragment

from the B1 domain of the G-peptide in explicit water using generalized-ensemble


142









method. Melting curves have been studied. Secondary structures of both peptides were

also computed and compared with experimental data. AMBER99 and CHARMM22 were

found showing best agreement for the C-peptide.

In this chapter, we present a study of the C-peptide using constant-pH REMD

method introduced in the previous chapter. The effect of pH on the folding of C-peptide

and the structural ensemble is studied. We compare directly with experimental

measurements of helicity, namely the mean residue ellipticity at 222 nm. Important

electrostatic interactions such as Glu2-Argl0 salt-bridge and Phe8-Hisl2 interaction are

also examined.

4.2 Methods

4.2.1 Simulation Details

The C-peptide we simulated has the sequence: KETAAAKFERQHM. The N-

terminus of the C-peptide (lysine) is charged while the C-terminus (methionine) is

capped with an amide. For our study, constant-pH REMD simulations were performed

starting from a completely extended structure at pH values 2, 3, 4, 5, 6.5 and 8. Eight

replicas were chosen with a temperature range from 260 to 420 K. A simulation time of

44 ns were used for each replica in all REMD runs and an exchange was attempted

every 2 ps. The structures obtained from the first 4 ns were discarded, resulting in a 40

ns of production time for each replica. Glu2, Lys7, Glu9 and Hisl2 are selected to be

titratable. A MC move to change protonation state was attempted every 10 fs. A second

set of REMD runs was done at pH values of 2, 5 and 8 starting from a fully helical initial

structure in order to check simulation convergence. The three pH values are selected to

represent low pH, pH where maximum helicity was observed experimentally and high

pH, respectively.


143









AMBER 10 molecular simulation suite199 was used to simulate the C-peptide. The

AMBER ff99SB force field139 was used in all the simulations. The SHAKE algorithm145

was used in all the simulations which allowed use of a 2 fs time step. OBC Generalized

Born implicit solvent model200 was used to model water environment in all our

calculations. The Berendsen thermostat,146 with a relaxation time of 2 ps, was used to

keep the replica temperature around their target values. Salt concentration (Debye-

Huckel based) was set at 0.1 M. The cutoff for non-bonded interaction and the Born

radii was 30 A (this cutoff is longer than the peptide).

4.2.2 Cluster Analysis

When studying the folding of C-peptide, the roles of cluster analysis are two-fold.

One role is to compare structural ensembles and check convergence at particular

temperature and solution pH value, while the other is to analyze a single ensemble of

structures to investigate protein structures and interactions. As described in the

previous chapter, cluster analysis was done using the Moil-View program201 and the Ca

RMSD has been chosen to measure structure similarity.

When comparing conformational sampling, two different ways of comparisons

have been adopted. The first way is to compare the first and the second halves of one

trajectory. In this case, cluster analysis was performed on a single trajectory and the

cluster information can be utilized to study folding thermodynamics and interactions in

the C-peptide. The second way is to compare the structural ensembles produced by

simulations starting from the fully extended and fully helical structures. In the second

case, the two trajectories (having same number of frames) at 300 K and under the same

solution pH value were first combined. Then the combined trajectory was clustered on

the basis of peptide backbone atoms root-mean-square deviations (RMSDs). The


144









population fraction corresponding to each cluster was obtained for both trajectories. The

correlation coefficient, which represents the correlation between the cluster populations

of the two trajectories, was calculated at each solution pH value by doing linear

regression. A high correlation indicates that the structure ensembles are close to each

other. This method provides a direct comparison of global conformational sampling

between the two trajectories. A cluster cutoff RMSD of 2.0 A is chosen during our

analysis.

4.2.3 Definition of the Secondary Structure of Proteins (DSSP) Analysis

The secondary structures of the C-peptide have been explored by DSSP

algorithm,236 which is proposed by Kabsch and Sander. The DSSP algorithm identifies

the secondary structure of a residue by hydrogen bond calculations. The calculation is

based on electrostatic energy between backbone carbonyl group and amide group,

U = qlq2 ( + --- 332 kcal/mol (4-3)
ON rTCH rOH TCN

In the above equation, ql and q2 are the partial charges on each atoms. If the

electrostatic energy is below -0.5 kcal/mol, then a hydrogen bond will assigned to

corresponding carbonyl and amide groups. The secondary structure of a residue is

labeled by one letter: G for 310 helix, H for alpha-helix, I for pi-helix, B for antiparallel

beta-sheet, b for parallel beta-sheet, and T for turns.

4.2.4 Computation of the Mean Residue Ellipticity

CD spectroscopy is one of the most commonly used techniques to study protein

secondary structures and folding.237 Chiral molecules absorb left circularly polarized

light (LCPL) and right circularly polarized light (RCPL) differently. CD spectroscopy


145









measures the difference in absorbance of LCPL and RCPL of a chiral molecule. It can

provide information of protein secondary structures.

Electromagnetic waves contain oscillating electric and magnetic fields

perpendicular to each other and to the propagating directions. A circularly polarized light

(CPL) has an electric field vector rotating along its propagation direction but maintains

its magnitude. This is in contrast to linearly polarized light which has an electric field

vector oscillating in one plane but change its magnitude. When a LCPL is propagating

toward an observer, the electric field vector rotates counterclockwise, while the RCPL

rotates clockwise.

When a circularly polarized light passes through chiral molecules, the difference in

the absorption of LCPL and RCPL is given by:

AE(A) = EL( ) ER(A) (4-4)

where EL and ER is extinction coefficient of LCPL and RCPL, respectively and A is

wavelength. AE has the dimensions of (cm M)-1 or cm2 dmol-1. The extinction

coefficient E can be calculated by Beer-Lambert law: E = A/c 1 where A is the

absorbance, c is the concentration, and I is the width of the cuvette. This difference

gives CD spectroscopy. Many CD instruments record signal in ellipticity, 0, which is

measured in degrees. The ellipticity can be calculated as: 0 = 32.98(AL AR) = 32.98

c 1 AE, where 32.98 has unit of degree. A more frequently adopted measurement of

CD is the molar ellipticity [0],238

1000
[0] = = 3298 An(A) (4-5)
1Here, the molar ellipticity has units of de

Here, the molar ellipticity has units of deg cm2 dmol-1


146









The integrated intensity of a CD band is called rotational strength. Theoretically,

for a electronic transition from ground state (0) to excited state (i), the rotational

strength can be calculated as,

Roi = Im((WolI e IA Ii) (lI I l lo)) (4-6)

where i0 and Cip is the wavefunction of electronic ground and excited state,

respectively; ,e and Pm is the electronic transition and magnetic transition dipole

moment operator, respectively; and Im stands for the imaginary part. Eq. 4-6 suggests

that the frequently adopted units of rotational strength are Debye-Bohr magnetons

(DBM, 1 DBM=9.274 x 10-39 erg cm3, where erg is the cgs unit of energy). Eq. 4-6 is

origin-dependent because the magnetic transition dipole moment operator is origin-

dependent. In order to avoid this origin-dependence, the dipole-velocity formulation can

be employed,

Ro, = -(eh/2ntmvo)Im((V |lvi) (Vi Aim lo)) (4-7)

Here, e is the charge of an electron, m is the mass of an electron, and v0o is the

frequency of the transition.

According to the paper of Sreerama and Woody,238 CD spectrum can be

calculated as, assuming each CD band (CD transition) is a Gaussian function of

wavelength,

AEk = 2.278RkAk/Ak (4-8)

where AEk, Rk, Ak, and Ak is the CD, rotational strength, wavelength and half-bandwidth

(one half of the width at 1/e of its maximum) of the kth transition, respectively. In Eq. 4-

8, the constant 2.278 has the dimensions of DBM-1 cm2 dmol-1.


147









The far ultraviolet (far UV, with a wavelength smaller than 250 nm) CD spectra of

proteins can yield important information about the secondary structures of proteins.238 In

the far UV range, peptide bonds in a protein are the main chromophores. Thus, the CD

spectra in the far UV range are reported on a residue basis (mean residue ellipticity). In

a protein CD spectrum, a positive band at ~190 nm and two negative bands at 208 nm

and 222 nm can be found for a-helix.239 In particular, a strong negative band at 222 nm

is a leading indication of the presence of helical structures. Structures containing 3-

sheet will show two bands in CD spectra: a positive band at ~198 nm and a negative

band at -215 nm.240

Computing protein CD spectra using quantum mechanical methods combining

with Eq. 4-7 is only possible in principle due to the size and complexity of protein

structures. The matrix method241 using pre-determined parameters has been adopted to

tackle this problem. In the matrix method, a secular matrix is constructed based on

transition energies and interactions between transitions. A protein is considered as a set

of independent chromophores. Each local transition energies and interactions between

transitions in different chromophores are utilized to construct the secular matrix. A

transition on a local chromophore is represented by a charge distribution. The charge

distributions, as parameters, are determined from quantum mechanical wavefunctions

or experiments or a combination of both.242-244 The off-diagonal elements of the secular

matrix, which represent the interactions between transitions in different chromophore,

are further simplified by charge-charge (monopole-monopole) electrostatic

interaction,238

Vj,kl = Em En ijm qkln /rijm,kln (4-9)


148









Here, Vj,ki is the electrostatic energy between transition j on chromophore i and

transition 1 on chromophore k. m sums over the point charges of transition j on

chromophore i and n sums over the point charges of transition 1 on chromophore k, and

r denotes for the distance between two charges.

Diagonalization of the secular matrix using a unitary transformation will yield the

eigenvalues and eigenvectors corresponding to all transitions of the protein.

Eigenvalues provide information about transition energies and the eigenvectors

describe the mixing of local transitions. The rotational strength can be obtained from

eigenvectors.

In this work, the algorithm developed by the Woody group238,244 was used to

compute the mean residue elliptcity. Detailed description of their algorithm can be found

in the paper of Sreerama and Woody. The peptide transitions (two mi* transitions at

140 and 190 nm, respectively and one nn* transition at 220 nm) were computed using

the Matrix method241 in the origin-independent form.245 Transition charge distributions

monopolee charges) are obtained from INDO/S246 semi-empirical electronic structure

calculations. Side chain transitions of phenylalanine, tyrosine and tryptophan were also

included in the calculations. The a-helix formation can be characterized by two negative

bands at 208 and 222 nm, and a positive band at 192 nm. Following the experiments

performed by the Baldwin group, the mean residue ellipticity at 222 nm ([8]222) was

calculated to generate the pH profile.

In practice, Woody's program reads in one protein structure in PDB format and

yields the mean residue ellipticity and the rotational strength as a function of

wavelength. Therefore, the ptraj module of the AMBER 10 package has been utilized to


149










generate a protein structural ensemble in order to find out an ensemble average of the

mean residue ellipticity at 222 nm.

4.3 Results and Discussion

4.3.1 Testing Structural Convergence

Conformational sampling convergence is investigated utilizing cluster analysis, as

described earlier. Two ways of checking conformational sampling of the simulations

from the fully extended structure are utilized. One way is to compare the first and the

second halves of the trajectory and the other way is to compare to the structural

ensembles produced by simulations starting from a fully helical structure. The R2 values

of the cross clustering are listed in Table 4-1. Plots demonstrating the cluster population

correlations from both ways at pH 2 are showed in Figure 4-1 as an example. The large

R2 values indicate that converged structural ensembles are achieved through 40 ns

simulations.

(A) pH=2 (B) pH=2
15 Linear Fit, R=0O.90 20- Linear Fit, R =0.95

2 15-


1I lo


S 2 5
5-r


0 5 10 15 05 10 15 20
% Population % Population
First Half REMD strating from extended structure

Figure 4-1. Cluster population at 300 K from constant pH REMD simulations at pH 2. A)
Cluster analysis is performed on the trajectory initiated from fully extended
structure. The populations in each cluster from the first and second half of the
trajectory are compared and plotted. B) Two REMD runs from different
starting structures at pH 2. Correlation coefficients at other pH values can be
found in Table 4-1.


150









Table 4-1. Correlation coefficients between two sets of cluster populations.
pH = 2 pH = 3 pH = 4 pH = 5 pH= 6.5 pH = 8

R2
0.90 0.92 0.90 0.94 0.93 0.85
(E vs E)

R2
0.95 ... 0.88 -__ 0.84
(E vs H)


151


E vs E means comparing the first and the second halves of the trajectories starting from the fully
extended structure. E vs H stands for comparing structural ensemble given by simulations starting from
fully extended and fully helical structures, respectively.

4.3.2 pKa Calculation and Convergence

Four residues of the C-peptide are titratable in our constant-pH REMD simulations:

Glu2, Lys7, Glu9 and Hisl2. Lys7 is always protonated in the pH range of 2 to 8, as

expected. Thus, only the data from glutamate and histidine residues are analyzed. For

each glutamate and histidine residue, the fraction of deprotonation at each pH value is

obtained and a Hill's plot is utilized to find out the pKa value. The pKa values are 3.1, 3.7

and 6.5 for Glu2, Glu9 and Hisl2 respectively.

The cumulative average fraction of protonation vs constant-pH MC attempts is

chosen to study the convergence of the pKa calculation. The cumulative average

fraction of protonation represents the time evolution of the protonation state sampling.

As shown in Figure 4-2, a stabilized fraction of protonation is achieved through 40 ns

simulations.

4.3.3 The Mean Residue Ellipticity of the C-peptide

The mean residue ellipticity of the C-peptide at each pH value and at 300 K was

computed. The pH-profile of the [e]222 (Figure 4-3) is clearly a bell-shaped curve, in

agreement to the experimental pH-profile of the [e]222. The maximum of our calculated









[e]222 is at pH value of 5, with a numerical value of ~ -6400 deg cm2 dmol-1. However,

the computed values of [e]222 at the ends (pH = 2, 3, and 8) suggest that the helix is

more populated in the simulations than in experiments at those pH values.

0.8- Glu2 at pH 3
SGlu9 at pH 4


0.6
o


S0.4
0






0.0 ,-, -,
0 20000 40000 60000
MC steps (total time = 40 ns)

Figure 4-2. Cumulative average fraction of protonation vs Monte Carlo (MC) steps. Only
the two glutamate residues are shown here and the histidine residue is found
to show the same trend. The pH values are selected such that the overall
average fraction of protonation is close to 0.5.

As mentioned in the section 2.2.2, the protonation state model involves using

parameters fitted at 300 K, thus results obtained at temperatures other than 300 K

should be viewed qualitatively, not quantitatively. C-peptide at a temperature lower than

300 K shows a more negative [8]222 (more helical), while the [8]222 becomes less

negative (less helical) when the temperature is higher than 300 K. Experiments showed

that the pH-profile becomes flat at high temperatures.5 Our results also reflect the same

trend: pH profile of the [e]222 at 420 K is flat and less negative than those at 300 K, while

the pH profile at 280 K is still bell-shaped and more negative.


152













-7.0
-7.0 --T = 420 K


-6.5- .
E N)



S -5.5 -
o 3
(, 5.5 /\
CN
4 -5.0 --1.5 o


-4.5- -

I I 1 .0
2 3 4 5 6 7 8
Solution pH Value

Figure 4-3. Computed the mean residue ellipticity at 222 nm as a function of pH values.
A bell-shaped curve at 300 K is obtained with a maximum at pH 5. The effect
of temperature on mean residue ellipticity at 222 nm is also demonstrated.

4.3.4 Helical Structures in the C-peptide

In order to examine the helical conformations in different environments, constant-

pH REMD at pH values 2, 5, and 8 are selected to represent the pH range. The

secondary structures of the C-peptide were computed utilizing the DSSP algorithm.236

Any residue which according to the DSSP algorithm belongs to the 310-helix or a-helix

conformation is called helical. The helical percentages of each residue are shown in

Figure 4-4. The maximum helical percentage of a residue is ~ 55% at pH 2 and 5, and

the maximum helical percentage is ~ 40% at pH 8. The averaged helical percentage at

pH 5 is around 30%, which is in good agreement with experiments (29+2%). Figure 4-4

suggests that the C-peptide contains a lot of non-helical structures, even at pH 5 where

the helical content is maximal.


153









-- pH =2
60- pH = 5
pH = 8

50-


( 40








10


0-
1 2 3 4 5 6 7 8 9 10 11 12 13
Residue Number

Figure 4-4. Helical Content as a function of residue number.

We calculated the Ca RMSD vs the fully folded structure (the fully helical structure

has a Ca RMSD of 0.8 A relative to the ribonuclease A X-ray structure, Thr3 to Hisl2

are chosen to calculate Ca RMSD) and the Ca radius of gyration (Rg) of the C-peptide.

The time series and the probability density of RMSDs and Rg are illustrated in Figure 4-

5. According to Figure 4-5B, two conformations can be seen at all three pH values. The

conformation with the smaller RMSD represents structures closer to the fully helical

structure and the structural ensemble at pH 5 possesses more such kind of structures

than the other two structural ensembles. Figure 4-5D demonstrate the probability

density of the Rg, and it suggests that the C-peptide is more compact at pH 5 than at pH

2 and 8. The results of Rg agree with the results of RMSDs because the helical

structures are more compact.


154










pH=5


I, 1 d I J I I


i Ill. i r i I II 11 ii il I'
10000 20000 3000 40000
Time (ps)


pH=5





i ,,ll I ,


Ca Radius of Gyration (A)


Figure 4-5. A) Time series of Ca RMSDs vs the fully helical structure at pH 5. The first
two residues at each end are not selected because the ends are very flexible.
B) Probability densities of the Ca RMSDs. Clearly, the structural ensemble at
pH 5 contains more structures similar to the fully helical structure. C) Time
series of Ca radius of gyration at pH 5. D) Probability density of the Ca radius
of gyration. More compact structures are found at pH 5.

We further studied the details of the C-peptide structural ensemble with respect to

pH values. The studies of helical structure were on the basis of our DSSP results. We

first show the probability density of total number of helical residues at pH 2, 5 and 8 in

Figure 4-6A. As expected, simulations at pH 5 generated the smallest number of non-

helical structures and the percentage is ~ 25%. Simulation at pH 8 generated the most

non-helical structures and ~ 37% of the structural ensemble possesses no helical


155


5

I I


-pH = 2
- pH= 5
- pH=8


3
Ctx RMSD (A)


10



9.

., a
03*^--


10000 20000 30000 40000
Time (ps)











residue. For those structures possessing helical residues, structures having four helical

residues are the most probable and structures containing three helical residues are also

common at all three pH values. Besides, structures possessing six helical residues are

also found. Furthermore, simulation at pH 5 yielded more configurations possessing

seven-residue and longer helices. Thus, longer helical chains are formed more often at

pH 5.

(A) Cl pH=2 (B) -- pH = 2
0,4 IpH=5 07- =lpH = 5
SpH =8 H-- l=8
0,6
0.3-
0.5-

0.4
1 02

01 0.2




0 2 4 6 8 10 0 1 2
Number of Helical Residues Number of Helical Segments

(C) pH = 2 (D) i l pH =2
0.4- pH = 5 I pH = 5
I pH=8 0.4- I--pH=8

03-


Z 0.2 Z

0-1
02 -2



00 M _Q3
2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 11
Helix Starting Position Helical Length

Figure 4-6. A) Probability densities of number of helical residues in the C-peptide. B)
Probability densities of the number of helical segments in the C-peptide. A
helical segment contains continuous helical residues. The probability of
forming the second helical segment is very low at all three pH values, thus
only the first helical segment is further studied. C) Probability densities of the
starting position of a helical segment. D) Probability densities of the length of
a helical segment (number of residues in a helical segment).


156









Next, the number of helical segments (a helical segment contains continuous

helical residues) is studied and shown in Figure 4-6B. The number of helical segment

ranges from zero to two at all three pH values. However, C-peptide structures having

two helical segments are really rare. The probability densities of having two helical

segments at pH 2 and 8 are ~ 0.05, while that at pH 5 is ~ 0.1.

Due to the small population of the second helical segment, the analysis of the

helical length (number of helical residues in a segment) and the helix starting position

(residue number of the amino acid initiating a helical segment) is focused on the first

helical segment. Figure 4-6C demonstrates the probability density of helix starting

position in the C-peptide. The helix starting position is affected by pH. The most

probable starting position is affected by solution pH. At pH 2, Lys7 is the most favorable

position to start a helix but the most probable place to initiate a helix is Thr3 at pH 5 and

8. At pH 2 and 5, Thr3, Ala6 and Lys7 are favorable positions to start a helix, while Thr3

and Lys7 are the favorable place to start a helix at pH 8. However, the effect of solution

pH on the helical segment length is not as significant as the effect on helix starting

position. Figure 4-6D shows that the three-residue or four-residue helices are dominant

at all three pH values.

4.3.5 The Two-Dimensional Probability Densities

Two-dimensional (2D) probability density can be employed to study the

correlations between important variables. The peaks in the plots indicate the coupling

between two variables and represent stable conformations. The more populated a

region is, the more stable the corresponding conformation is. The 2D probability

densities between helix starting position and helical length are illustrated in Figures 4-7

to 4-9. Helices consisting of Thr3-Ala5, Lys7-Argl0 and Glu9-Hisl2 are present at all


157











three pH values, while the number of helical conformations is more at pH 5 and 8. At pH


2 and 5, the most probable helix formation is the four-residue helix starting from Lys7


(Lys7-Argl0). The 2D-probability densities reveal that the six-residue (Lys7-Hisl2) helix


and the seven-residue (Ala6-Hisl2) helix are found stable at pH 5. At pH 8, Thr3-Ala5


becomes the most favorable helical formation. Lys7-Argl0 and Lys7-Hisl2 are also


favorable. At pH 8, a new seven-residue helix (Thr3-Glu9) is found.


pH=2 0
10- 1 IF




'-i'- -'
= ',,.: 4








I I I
2 4 6 8 10
Helical Segment Starting Position


Figure 4-7. 2D probability density of helical starting position and helical length, pH = 2.


2 4 6 8 10
Helical Segment Starting Position


Figure 4-8. 2D probability density of helical starting position and helical length, pH=5.



158


pH 5








S,______
.. .-:- : ..
~-.-..-.

",,,',,











10 pH =8
10-,


E '-







2-

I I I
2 4 6 8 10
Helical Segment Starting Position

Figure 4-9. 2D probability density of helical starting position and helical length, pH=8.

2D-probability densities correlating helical length and Ca RMSDs relative to fully

helical structure are shown in Figures 4-10 to 4-12. As expected, structures having long

helices (helical length > 7) correspond to the conformations with RMSDs smaller than

2.2 A and this region is more populated at pH 5. Interestingly, configurations possessing

four-residue helix can also yield RMSDs smaller than 2.2 A, suggesting that structures

having partial helix can be similar to the fully helical too.

5-
pH=2 o





S- II











Helical Segment Length

Figure 4-10. 2D probability density of helical length and C-RMSD at pH = 2.

S ,' o ll '- '159

159














pH=5 0
1 OE-3
2 0E-3
; 3 OE-3
4 0E-3
4-~ 5 OE-3
I 6 OE-3
-" 70 OE-3
8 O8E-3
I .I I 9 OE-3
S 3 il ). 1 OE-2
S- 1 E-2
C] 1 2E-2
S13E-2
S1 4E-2
15E-2
O 2 < 1 6E-2
18E-2
-I k1 9E-2
2 0E-2
II 1E-2
21E-2

-2 3E-2
42 4E-2

0 2 4 6 8 10

Helical Segment Length


Figure 4-11. 2D probability density of helical length and Ca-RMSD at pH = 5.



pH = 8 0
S, 1 OE-3
2 OE-3
3 OE-3
4 0E-3
4- ;5 O0E-3
S30E-3
S7 0E-3
S -,' 80E-3
9 0E-3
1 EE-2
3- I" 1 1E-2
o 1 2E-2
13E-2
S .'/, "14E-2

O 2 1 E-2
S1 217E-2
1 9E-2
19E-2

21E-2
-1" J 2' 2 3E-2
24E-2

0 2 4 6 8 10

Helical Segment Length


Figure 4-12. 2D probability density of helical length and Ca-RMSD at pH = 8.


4.3.6 Important Electrostatic Interactions: Lysl-Glu9 and Glu2-Argl0


The salt-bridge between Glu2 and Arg10 was found in the X-ray structure of


RNase A.247 Amino acid substitution experiments on the C-peptide indicated this salt-


bridge is crucial to the increase in helical content when the pH value is increasing to pH





160









5.7,224 Proton NMR experiments done by Osterhout et a/.225 suggested that this salt-

bridge stabilizes partial helix instead of complete helix. They proposed that the RN-24

structural ensemble contains three major conformations: unfolded, complete folded and

partial helix with Glu2-Argl0 interaction. Hansmann et a/.229 also proposed that the salt-

bridge stabilizes partial helix by performing multicanonical simulations. Felts et al.227

found that the salt-bridge is only significantly found in the globular non-helical C-peptide

structures. Sugita and Okamoto233 studied the C-peptide using multicanonical REM and

explicit solvent. They found that Glu2-Argl0 salt-bridge does not stabilize helix directly,

but to stop the helix extending to the N-terminus. In the REX-CPHMD study performed

by Khandogin et al., they found that Lysl -Glu9, instead of Glu2-Argl0, contributes to

the helix formation.

The Lysl-Glu9 and Glu2-Argl0 interactions are studied in our work. Figure 4-13A

and 4-13B show the probability density vs charge distance of the two interactions at pH

2, 5 and 8. At pH 2, neither Lysl-Glu9 nor Glu2-Argl0 salt-bridge is formed, consistent

with mostly protonated glutamate. At pH 5 and 8, Glu2-Arg10 salt-bridge is clearly

formed (Figure 4-13A) while the Lysl-Glu9 salt-bridge is formed in a much less extent

(Figure 4-13B). Figure 4-14 shows the correlation between the two salt-bridges at pH 5.

Clearly, the two salt-bridges cannot be formed at the same time. The effect of Glu2-

Arg10 salt-bridge on helical structure formation can be reflected by conditional

probabilities. The probabilities of finding helical residue(s) given that the Glu2-Argl0

salt-bridge is formed are calculated at pH 2, 5 and 8. The conditional probabilities are

0.64, 0.73 and 0.63, respectively. Although at pH 2, the probability of forming Glu2-


161











Arg10 salt-bridge is low (~ 1%), the chance of having a helical structure is 63% once it


is formed. This clearly shows the stabilizing effect of Glu2-Argl0 on helix formation.

(A) -pH=2 (B)
0010 -pH = 5 005 -pH=2
-pH= 8 -pH=5
-pH= 8
0 008- 0.04


0006 003-


C 0.004 0.02-


0002- 0,01 -


00000sMO- 0.00
5 10 15 20 25 30 5 10 15 20 25 30
Lysl-Glu9 Distance (A) Glu2-Arg10 Distance (A)

Figure 4-13. A) Probability density of Lysl-Glu9 distance (A). The distance is the
minimum distance between the side-chain nitrogen atom of Lysl and the
side-chain carboxylic oxygen atoms of Glu9. B) Probability density of Glu2-
Arg10 distance (A). The distance is the minimum distance between side-chain
carboxylic oxygen atoms of Glu2 and guanidinium nitrogen atoms of Argl0.


25- pH=5 0
8.0E-5
1.6E-4
2.4E-4
20 3.2E-4
4.OE-4
*< 4.8E-4
5. 5.6E-4
S- '. "'- 6.4E-4
15-
15 7.2E-4
S* r : 8.0E-4
0)
"- .*
0 .* ; ..
.- 10 '- '
-j


5


I I I I I
5 10 15 20 25

Glu2-ArglO Distance (A)


Figure 4-14. Two-dimensional probability density of Lysl-Glu9 and Glu2-Argl0 at pH 5.
Apparently, Lysl -Glu9 and Glu2-Argl0 salt-bridges cannot be formed
simultaneously.


162










The correlation between Glu2-Argl0 salt-bridge and helical length, and helix

starting position are further studied. Figure 4-15A shows that the Glu2-Argl0 salt-bridge

can be found in non-helical configurations, four-residue and six-residue helices at pH 5.

Moreover, in the six-residue helix, the Glu2-Argl0 salt-bridge is always formed. The

same pattern is obtained at pH 8, thus the pH 8 results are not shown here. Figure 4-

15B shows the correlation between the salt-bridge and helix starting position at pH 5.

When a helix is initiated at Thr3, the salt-bridge is not formed. When a helix begins at

Ala4, Lys7 and residues behind Lys7, only the salt-bridge is seen. However, in the non-

helical configurations and helices begin at Ala6, both states are found. Besides, Lys7 is

the most probable place to initiate a helix when the salt-bridge is formed. Again, no salt-

bridge is found when a helix starts at Thr3. Combining the correlations between Glu2-

Arg10 and helical length, and Glu2-Argl0 and helix starting position, the salt-bridge

clearly has the effect that preventing forming helices near the N-terminus and stabilizing

partial helix near the C-terminus (Lys7-Argl0 and Lys7-Hisl2).

35- 35
pH=5 pH=5
| 1 1 M-

oM 0



,5 "ttto a 2 2,. A 2


0 2 4 6 8 10 0 2 4 6 8 10
Helical Segment Length A Helix Starting Position B

Figure 4-15. A) Two-dimensional probability density of Glu2-Argl0 salt-bridge formation
and helical length at pH 5. According to the plot, the Glu2-Arg10 salt-bridge
can be found in four-residue, six-residue and non-helical structures. B) Two-
dimensional probability density of Glu2-Arg10 salt-bridge and the helix
starting position at pH 5. If a helix begins from Thr3, it cannot have a Glu2-
Arg10 salt-bridge. Thus, one role of the Glu2-Argl0 salt-bridge is to prevent
helix formation from Thr3.


163










4.3.7 Important Electrostatic Interactions: Phe8-His12

Hisl2 is believed to be responsible for the decrease in helical content when

solution pH values increase from 5 to 8.226 Hisl2 was found to interact with Phe8.221

However, the nature of the Phe8-Hisl2 interaction is not completely clear. A weak

hydrogen bond between the charged side chain of Hisl2 (proton donor) and the

aromatic ring of Phe8 (proton acceptor) is supported by the configuration in RNase A X-

ray structure247 and ion screening experiments222,226 but is in contrast to proton NMR

experiments.221 A contact between the aromatic ring of Hisl2 and backbone carbonyl

oxygen of Phe8 has been proposed to explain the proton NMR results. Sugita and

Okamoto studied the interaction between the aromatic ring of Phe8 and the charged

ring of His12.233 They observed the contact between two rings has been made and

stabilizes helix near the C-terminus. However, the REX-CPHMD results showed that the

interaction between backbone carbonyl oxygen of Phe8 and the charged side-chain of

Hisl2 is responsible for the increased helical content at pH 5.112

(A)
0.06- -pH = 2
_pH= 5
0.05- -pH = 8

0.04

J3 0.03-

002



4 8 12 16
Phe8 Backbone-Hisl2 Ring Distance (A)

Figure 4-16. A) Probability density of Phe8 backbone to Hisl2 ring distance. The
distance is the minimum distance between Phe8 backbone carbonyl oxygen
atom and Hisl2 imidazole nitrogen atoms. B) Probability density of Phe8 ring
to Hisl2 ring distance. The distance is the minimum distance between Phe8
aromatic ring carbon atoms and Hisl2 imidazole nitrogen atoms.


164












---pH= 2
-pH=5
--pH= 8


0,01





0.00 ---
4 8 12 16 20
Phe8 Ring-His12 Ring Distance (A)

Figure 4-16. Continued

We also studied ring-ring and backbone-ring interactions between Phe8 and Hisl2

at pH 2, 5 and 8. The ring-ring interaction is represented by minimum distance between

aromatic atoms in Phe8 and the two side-chain nitrogen atoms of Hisl2. The backbone-

ring interaction is represented by minimum distance between backbone carbonyl

oxygen atom of Phe8 and the two side-chain nitrogen atoms of Hisl2. Figure 4-16A and

4-16B show the probability densities of each distance at three pH values. We found that

the backbone-ring contact is made at all three pH values. However, forming such a

contact at pH 8 is much less favorable than doing that at pH 5. Interestingly, Phe8

backbone and Hisl2 ring close contact and Glu2-Argl0 salt-bridge formation are

coupled (Figure 4-17). The ring-ring contact is observed at pH 5 but not at pH 8. At pH

2, the ring-ring contact is formed but is much less probable. More importantly, the

integrated probability of making a backbone-ring contact is larger than the integrated

probability of forming a ring-ring contact at pH 2 and 5. In order to separate

configurations making a contact from the rest, a cutoff distance of 4.0 A and 5.0 A is

adopted, in the case of backbone-ring and ring-ring contact, respectively. The integrated


165










probability (area under the curve) of making backbone-ring contact and ring-ring contact

is 0.34 and 0.22, respectively, at pH 5. The integrated probability is 0.23 and 0.14,

respectively, at pH 2. Thus, the Phe8 backbone-Hisl2 ring interaction is the major form

of the contact.

8 7
pH=5 0 pH=5 0
3 DE.4 3 8E.4




5 10 15 2 30 2.5 3.0 3 0E 3
0 6-



I I4



5 10 15 20 2. 3.0 3
Glu2-Arg10 Distance (A) A Glu2-Arg10 Distance (A) B

Figure 4-17. A) Two-dimensional probability density of Glu2-Argl0 distance and Phe8-
Hisl2 backbone-to-ring distance at pH 5. B) Correlations between Glu2-
Arg10 salt-bridge and Phe8-Hisl2 contact at pH 5.

We further examine the correlation between the Phe8 backbone-Hisl2 ring

contact and helical properties such as helical length and helix starting position. The

backbone-ring contact is found in the four-residue and six-residue helices at pH 2 and 5.

At pH 8, it can be seen in the four-residue helix. The 2D probability densities are similar

at the three pH values, thus only the plot at pH 5 is shown as an example (Figure 4-18A

and 4-18B). Similar to the Glu2-Arg10 salt-bridge, Lys7 is the most favorable place to

initiate a helix with a contact between Phe8 and His12. Thus, the Phe8-His12

backbone-ring contact stabilizes the helix formation near the C-terminus (Lys7 to Arg 0

and Lys7 to His12). However, unlike the Glu2-Arg10 interaction, helix formation initiated

from Thr3 is able to form a contact between Phe8 and His12. Phe8-His12 contact does

not affect helix formation near the N-terminus.


166










(A) (B)
12 12
pH =5 1 pH =5 I=


46 (N
I I ...
C ,,,:-:


;o 0 '

.: I I





0 2 4 6 8 10 0 2 4 6 8 10
Helical Segment Length Helix Segment Starting Position

Figure 4-18. A) Two-dimensional probability density of helical segment length and
Phe8-His12 interaction. B) Two-dimensional probability density of helical
segment starting position and Phe8-His12 interaction. Phe8-His12 also
stabilizes four-residue and six-residue structures. Helices begin at Lys7 and
Phe8-Hil gis2 coupled. Unlike Glu2-Argl0, Phe8-Hisl2 stabilizes helices





Cluster analysis is performed to find out significant conformations and to examine

important electrostatic interactions. The structures at pH 5 are clustered because both

Glu2-Argl0 and Phe8-Hisl2 contacts are more probable than at pH 2 or 8 so that the

contacts can be studies in clusters. The top 20 populated clusters and their average

helical percentage is plotted in Figure 4-19A. The most populated cluster shows the

largest average helical content and the second most populated cluster shows a much

lower helical content (close to the lowest among 20 clusters). The most populated

cluster corresponds to the conformation yielding small Ca-RMSDs (< 2.2 A) relative to

the fully helical structure (Figure 4-19B). Interestingly, the plot showing helical

percentage vs the residue number (Figure 4-19C) reveals that the second most

populated cluster only shows helical structures between Lys7 and Hisl2. Thus, helices

are only formed near the C-terminus. Figure 4-19D demonstrates the probability density


167









of the Glu2-Argl0 and Phe8-Hisl2 interactions. Compare with the corresponding

probability densities on the basis of the entire structural ensemble, forming a contact

between Glu2-Argl0, and Phe8-Hisl2 is more probable in the structures belong to the

second most populated cluster than in the entire structural ensemble. This is especially

obvious for the Glu2-Argl0 interaction. Results obtained from the second most

populated cluster confirm that Glu2-Argl0 and Phe8-Hisl2 contacts, especially the

Glu2-Argl0 contact, stabilize partial helix formation near the C-terminus.

4.4 Conclusions

In this chapter, we have studied the pH-dependent helix formation of the C-peptide

of ribonuclease A using constant-pH REMD simulations. The mean residue ellipticity at

222 nm at each pH value is computed and utilized to gauge helical content. The pH

profile clearly demonstrates a bell-shaped curved with a maximal helicity at pH 5, in

good agreement with experimental results. The pH effect on the C-peptide structural

ensembles is studied at three representative pH values: 2, 5 and 8, representing the two

ends in the pH profile and the pH value yielding the maximum helical content. At pH 2,

helices consisting of Thr3-Ala5, Lys7-Argl 0 and Glu9-Hisl2 are formed and the Lys7-

Arg10 is the most stable one. At pH 5, additional six-residue (Lys7-Hisl2) and seven-

residue (Ala6-Hisl2) helices are stable helices but the most probable helix is the same

as that at pH 2. At pH 8, the most favorable helix switched to Thr3-Ala5. Lys7-Hisl2 and

a new seven-residue helix (Thr3-Glu9) are also present.

Glu2-Argl0 salt-bridge formation and its role in the helix formation are studied. We

find that the salt-bridge is formed and is more probable at pH 5. The Glu2-Argl0 salt-

bridge is found to stabilize helix formation near the C-terminus. The nature of Phe8-

Hisl2 interaction and its role in helix formation are also explored. Backbone carbonyl


168












oxygen of Phe8 and side-chain charge of Hisl2 contact is the major form. The role of


Phe8 and Hisl2 contact is similar to that of the Glu2-Arg10 salt-bridge. Results from


cluster analysis on trajectory generated at pH 5 confirmed the effects of Glu2-Argl0 and


Phe8-Hisl2 interactions.


w pH=5(B)


ll m
*


* *
*


0 5 10 15 20
Population of Cluster (%)


6 8
Residue Number


-the most populated cluster, pH = 5
-the second most populated cluster, pH = 5


0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5
Cu RMSD (A)


Glu2-Argl0, the second most
most populated cluster, pH = 5
Phe8 backbone-Hisl2 ring, the
second most populated cluster,
pH = 5











4 8 12 16
Distance (A)


Figure 4-19. A) Top 20 populated clusters and average helical percentage. B)
Probability densities of the Ca-RMSD vs the fully helical structure of the top 2
populated clusters. C) Helical Percentage as a function of residue number of
the top 2 populated clusters. D) Probability density of the Glu2-Argl0 and
Phe8 backbone-His12 ring interactions in the second most populated cluster.


169


C 40-


.
30-

4









CHAPTER 5
CONSTANT-pH REMD: pKa CALCULATIONS OF HEN EGG WHITE LYSOZYME

5.1 Introduction

Hen egg white lysozyme (HEWL, shown in Figure 5-1) has been selected to test

pKa prediction methods or constant-pH methods for a long time.125 This protein is a 129

amino acids enzyme and is the first enzyme to have its three-dimensional structure

determined by X-ray crystallography.248'249 Lysozyme can be found in the secretions

such as tears and saliva. The function of this enzyme is to catalyze the hydrolysis of a

polysaccharide and the reaction has an optimal pH around 5.125 By hydrolyzing

polysaccharides, lysozyme can damage the cell walls of certain bacteria. HEWL is a

monomeric single-domain enzyme whose active site is situated in a cleft between two

regions. Two residues are crucial to the catalysis, Glu35 and Asp52. During the

hydrolysis, a covalent enzyme-substrate intermediate is formed.249 In this process,

Glu35 acts as the proton donor and Asp52 becomes the nucleophile.249 The starting

point of the catalytic mechanism is the donation of a proton from Glu35 to the substrate.

Then, Asp52 will attack the anomeric carbon of the substrate and form a covalent bond

with the substrate. In the final step, the enzyme-substrate complex is hydrolyzed by a

water molecule and the initial protonation states of Glu35 and Asp52 are restored.

HEWL has been a good test system of pKa prediction studies for several reasons.

First, accurate predicting the pKa values of both ionizable residues in active site can

help people identify proton donor and nucleophile in HEWL according to a simple

criterion proposed by Nielsen and McCammon in 2003.250 They proposed that if

catalytic mechanism involves two acidic residues, then the proton donor should have a

pKa value of at least 5.0 and the pKa of nucleophile should be at least 1.5 pH units lower


170









than that of proton donor. Second, the pKa values of HEWL acidic residues were

determined by Bartik et al.251 using two-dimensional proton NMR. It shows several

ionizable residues having pKa values much different from their intrinsic pKa values.

Furthermore, there are more than 100 PDB entries of the wild-type HEWL structure, the

effect of structural variation can be tested for pKa calculation methods, especially for the

FDPB method.250 Thus, our constant-pH REMD method will be tested on HEWL.





















Figure 5-1. Crystal structure of HEWL (PDB code 1AKI). Residues in red represent
aspartate and residues in blue are glutamate.

Various constant-pH methods have been tested on HEWL. Burgi et al.130 utilized

their constant-pH method to predict pKa values of HEWL. The RMS error between

predicted and experimental pKa values was determined to be from 2.8 to 3.8 pH units.

In 2004, Lee et al.114 applied their CPHMD method to four proteins: turkey ovomocoid

(PDB code 10MT), bovine trypsin inhibitor (1BPI), HEWL (193L) and ribonuclease A

(7RSA). The overall pKa RMS error relative to experimental data was around 1 pH unit.


171









For HEWL, the average absolute error of all ionizable residues (including the termini)

was 1.6 pH units, while the average absolute error of pKa values of acidic ionizable

residues relative to experimental data was 1.5 pH units. However, the pKa values of

Glu35 and Asp52 were both 5.8, indicating that CPHMD results were not able to predict

proton donor and nucleophile. In the same year, Mongan et al.127 published their

discrete protonation state constant-pH MD method. HEWL was also selected as the test

system. In the study of performed by Mongan et al., four different crystal structures of

HEWL were utilized (1AKI, 1LSA, 3LZT, and 4LYT). The RMSD of pKa values of all

ionizable residues relative to experimental results were 0.86, 0.77, 0.88, and 0.95 for

1AKI, 1LSA, 3LZT, and 4LYT, respectively. In addition to pKa predictions, Mongan et al.

also studied protonation-conformation correlation. Principal component analysis of a

trajectory was conducted and projected onto the first two (largest eigenvalues)

eigenvectors and association between conformation and protonation was observed. In

2006, Khandogin and Brooks110 utilized REX-CPHMD method to predict pKa values of

10 proteins. The RMS error values between REX-CPHMD and experimental pKa values

ranged from 0.6 to slightly greater than 1 pH unit. For HEWL, the RMS error between

predicted and experimental pKa values was 0.6 pH unit and the maximum absolute error

is 1.0 pH unit. So far, their HEWL pKa prediction RMS error is the smallest among

constant-pH pKa calculations on HEWL. Machuqueiro and Baptista presented HEWL

pKa predictions from their stochastic titration constant-pH MD with explicit water model

in 2008.125 The RMS error between predicted and experimental pKa values were 0.82,

and 1.13 for generalized reaction field,252 and PME154 treatment of long-range

electrostatics, respectively. A comparative FDPB calculation (single crystal structure,


172









which is the same as that utilized in constant-pH MD, and a protein dielectric constant of

2) was also conducted and the RMS error was found to be 2.76. Since the constant-pH

method proposed by Baptista requires FDPB calculation, the selection of dielectric

constant inside the protein was crucial. Machuqueiro and Baptista performed constant-

pH MD utilizing three different dielectric constants (E=2, 4, and 8) combined with PME

treatment of long-range electrostatics. The pKa RMS error values were 1.13, 1.02, and

1.12 for E =2, 4, and 8, respectively. More recently, the constant-pH MD proposed by

Mongan et al.127 was coupled with accelerated molecular dynamics (AMD)133'134 and

tested on HEWL by Williams et al.129 Constant-pH AMD and MD simulations of 5 ns in

length have been performed. Only acidic ionizable residues in HEWL were taken into

consideration by constant-pH scheme. RMS error values between predicted and

experimental pKa values were calculated. The constant-pH AMD yielded an overall

RMS error value of 0.73, while the original constant-pH MD pKa RMS error was 0.80.

The pKa RMS error of aspartates were 0.75, and 1.46 from constant-pH AMD, and MD,

respectively. The pKa RMS error of glutamates were 0.85, and 1.04 from constant-pH

AMD, and MD, respectively. In general, recent works utilizing various constant-pH

schemes have achieved RMS error values in the range of 0.6~1.13 for HEWL.

In this chapter, we present a study of HEWL using constant-pH REMD algorithm.

Both structural restrained and unrestrained simulations were done. pKa values from

constant-pH REMD are compared with experimental values. We also investigated the

pKa convergence, effect of structural restraint and conformation-protonation

correlations.


173









5.2 Simulation Details

Crystal structure 1AKI (PDB code) has been taken as HEWL starting structure in

our study. Water molecules in the crystal structure were striped first. Only aspartate and

glutamate residues were studied so there are nine ionizable residues selected.

Hydrogen atoms were added by the LEaP module in the AMBER suite. The post-

processed crystal structure was then minimized and heated from 0 K to 300 K. The

restart structure from the heating process was taken as the initial structure for our

constant-pH REMD simulations. In this chapter, all REMD runs refer to constant-pH

REMD simulations for simplicity. The pH range was from 2 to 6 in an increment of 0.5

pH unit.

Two sets of REMD simulations were performed: the unrestrained ones (ntr=0 in

AMBER) and the restrained ones (ntr=1 in AMBER). In each REMD run, an exchange of

structures was attempted every 500 MD steps. 1000 exchange attempts were intended

to use for both sets. Thus Simulation time of each replica in each set is 1 ns. In the

unrestrained REMD runs, we chose the highest temperature to be 320 K in the hope

that HEWL will not unfold at all temperatures. In the restrained REMD runs, Ca atoms

from residue 3 to 126 were restrained by harmonic potentials. The restraining harmonic

potential has the following form: Ures = k(q, re ), where and qre, are Cartesian

coordinates at current time and Cartesian coordinates of the reference structure,

respectively, k is the force constant of the harmonic potential which determines the

strength of a restraint. In our simulations, the reference coordinates are the initial Ca

atoms coordinates. By putting restraining harmonic potential on Ca atoms, the

secondary structure of HEWL will be preserved and the highest temperature will be


174









increase to 420 K in order to achieve better side-chain conformational sampling. The

force constant of the harmonic potentials was 1.0 kcal/mol-A2 (setting restraint_wt=1 in

AMBER).

Several other REMD simulations were done according to results from the two sets

of REMD runs. The general goal of those simulations was to test what we proposed

from the two previous sets. First, another 1 ns constant-pH REMD simulation with

restraint on Ca atoms was continued for all the pH values in order to check the pKa

convergence of the restrained simulations. Likewise, 1000 exchange attempts were

conducted in those 1 ns simulations and the restraint strength is still 1.0 kcal/mol-A2.

Second, a new set of constant-pH REMD simulations with restraint on Ca atoms was

performed. The force constant adopted in the second set was 0.1 kcal/mol-A2 so that

the effect of restraint strength can be tested. The details of constant-pH REMD

simulations can be found in Table 5-1.

Table 5-1. Simulation details of constant-pH REMD runs
Restrained Restraint Number of Temperature Simulation Exchange
pH values
or not Strength Replicas (K) Time (ns) Attempts

2-6 No 0 4 280-320 1 1000


2-6 Yes 1 8 280-420 2 2000


3, 4, 4.5 Yes 0.1 8 280-420 2 2000

The restraint strength was represented by the force constant of a harmonic potential. The unit of force
constant is kcal/mol-A2. For the REMD simulation with 1 kcal/mol-A2 restraint, it was actually performed in
two stages. Each stage lasted for 1 ns and the purpose of the second stage was to check the pKa
convergence.

All simulations were done using the AMBER 9 molecular simulation suite253 with

the AMBER ff99SB force fields.139 The SHAKE algorithm145 was used to allow a 2 fs

time step. OBC Generalized Born implicit solvent model200 was used to model water


175








environment in all our calculations. The Berendsen thermostat, 146 with a relaxation time

of 2 ps, was used to keep the replica temperature around their target values. Salt

concentration (Debye-Huckel based) was set at 0.1M. The cutoff for nonbonded

interaction and the Born radii was 30 A.

5.3 Protein Conformational and Protonation State Equilibrium Model
Suppose an ionizable side chain has only two conformations in equilibrium and

each conformer has its own equilibrium in protonation state. We can use 1p, 1d, 2p and

2d to label conformer 1 in protonated form, conformer 1 in deprotonated form,

conformer 2 in protonated form, and conformer 2 in deprotonated form, respectively.

The equilibrium among all species is demonstrated in Figure 5-2.

pKa,1

Id 1p


$ $

2d 2p
pKa,2

Figure 5-2. A simple schematic view of the conformation-protonation equilibrium in a
constant-pH simulation.
Then, K12, the equilibrium constant between conformation 1 and 2 is

K12 [p] + [ ] (5-1)
[2p]+ [2d]

In the above model, pKa,1 and pKa,2 represent protonation equilibrium within each

conformation. They can be expressed as:

pKa, = pH log( ] (5-2)
[ld]


176









and


pKa,2 = pH log( ). (5-3)
[2d]

So, the pKa of that ionizable residue is

pKa = pH log( (5-4)
[Id] + [2d]

5.4 NMR Chemical Shift Calculations

Theoretical NMR chemical shift titration curve was generated. Due to the limitation

of system size, full quantum mechanical NMR calculations were performed only on

ionizable residue dipeptide (ionizable residue with two ends blocked). The structure of

ionizable dipeptide was extracted from the representative structures (representing

different side chain conformations) generated from cluster analysis. Proper protonation

states were assigned for each structure. All full quantum mechanical NMR calculations

were done in Gaussian03 software package254 using B3LYP functional and 6-311 ++G**

basis set. Isotropic magnetic shielding constants were computed in vacuum using GIAO

method.255 Tetramethylsilane (TMS) was used as reference in order to obtain the

chemical shift.

Recently, Merz and co-workers256 developed an automated fragmentation

quamtum mechanical/molecular mechanical (AF-QM/MM) approach to study protein

properties. They have applied their method to compute protein chemical shift of Trp-

Cage. In this AF-QM/MM model, one residue and the atoms near it (less than 4 A) are

assigned to the QM region and the rest of a protein will be put into the MM region.

During NMR calculations, all atoms in the MM region will be viewed as point charges.


177









We applied this AF-QM/MM method to 1AKI to calculate chemical shift as well. Again,

all AF-QM/MM calculations were based on representative structures.

5.5 Results and Discussions

5.5.1 Structural Stability and pKa Convergence

Since changing protonation state during simulation will cause discontinuity in force

and energy, structural stability in our simulations is important. We chose to use Ca

atoms root-mean-square deviation (RMSD) vs 1AKI structure as our metric. Figure 5-3A

shows us the Ca RMSD vs time in unrestrained REMD runs. In Figure 5-3A, HEWL is

instable at all the pH simulated. The RMSD can reach a very high value (~ 18 A) during

simulations. Even at pH=4 where Ca RMSD values are small relative to the rest, the Ca

RMSD can still go beyond 3 A. pKa predictions from unrestrained REMD runs shouldn't

be used.

Figure 5-3B shows the RMSDs in the restrained REMD runs. Although the RMSD

values are small and stable throughout 2 ns simulations, the restrained REMD

simulations still reveal problems, according to Figure 5-3B. Our simulations use 1AKI

which is resolved at pH=4.5 as starting structure. As pH is moving away from 4.5, one

may expect HEWL will adopt conformations a little bit different from 1AKI. So a bigger

RMSD should be expected where the pH value is far away from 4.5. This behavior has

been confirmed in the work of Mongan et al. However, putting restraint on Ca atoms

results in the same RMSDs in the entire pH range. This may have negative effect on

pKa predictions at pH values far away from 4.5.


178












20



16


12


8-



4
,.. .' ,* '

0---,-----
0 2000
F


l l i : r 1|



4000 6000 8000
rame Number (total time=1 ns)


0.4-



0.2-


0 2000 4000 6000 8000 1000
Frame Number (total time=2 ns) B


Figure 5-3. Ca RMSD vs crustal structure (PDB code: 1AKI). A) Ca RMSD vs 1AKI from
REMD without restraint on Ca. B) Ca RMSD vs 1AKI from REMD with restraint
on Ca. The restraint strength is 1 kcal/mol-A2.

In order to check protonation state sampling convergence from the restrained

REMD simulations, pKa prediction error (predicted value minus experimental value)

against time as well as time evolution of prediction deviation (predicted pKa value at


179


-pH=2
pH=3
pH=4
-pH=5
pH=6


10000


-- pH=2
pH=3
pH=4
pH=5
pH=6





S. I


1 i


0 6 1 i i '' ,' l'
0.6-- i I









current time minus the final predicted pKa value) are followed and demonstrated in

Figure 5-4 and 5-5. According to those plots, stabilizations in pKa predictions are seen

after a few hundred picoseconds of simulations. Increasing simulation time wouldn't

change average pKa predictions and their errors relative to experimental values. In

order to show convergence in protonation state sampling is reached in a wide range of

pH, a representative plot of Asp52 pKa deviations are shown in Figure 5-5B.

Convergence is clearly seen over the pH range.

Glu7
S 4- Asp18
-I Asp48
S.-Asp52
E
Ec Asp66
Co Asp87


0
x 2-
~Asp101


S--------________________


o



S" -




0 1000 2000
Time (ps)

Figure 5-4. pKa prediction error as a function of time. The predicted pKa at a given time
is a cumulative result. For each ionizable residue, the time series of its pKa
error is generated at a pH where the average predicted pKa is closest to that
pH value. In this way, we try to eliminate any bias toward the energetically






favored state. A flat line is an indication of convergence. Glu35 is not shown
here due to poor convergence.
0
0 2 z





0 1000 2000
Time (ps)

Figure 5-4. pKa prediction error as a function of time. The predicted pKa at a given time
is a cumulative result. For each ionizable residue, the time series of its pKa
error is generated at a pH where the average predicted pKa is closest to that
pH value. In this way, we try to eliminate any bias toward the energetically
favored state. A flat line is an indication of convergence. Glu35 is not shown
here due to poor convergence.


180











- Glu7
Aspl8
Asp48
- Asp52
Asp66
-Asp87
Asp101
Aspll9


0-


1000
Time (ps)


1500


4-




2-




0-




-2-


1500


2000


-pH=2
pH=2.5
pH=3
pH=3.5
pH=4
pH=4.5


2000


Time (ps)


Figure 5-5. A) pKa prediction convergence to its final value. Similarly, the pKa value at a
given time is a cumulative average. A flat line having y-value of 0 is expected
when pKa calculation convergence is reached. The same pH values are
chosen for each ionizable residue as in Figure 5-4. B) Asp52 pKa prediction
convergence to its final value at multiple pH values. The pH values are
selected in such a way that the pKa calculated at this pH will be used to
compute composite pKa.





181


-Lt .









5.5.2 pKa Predictions

A popular way to study the accuracy of pKa prediction is to look at the pKa RMS

error relative to experimentally measured pKa values. In general, a Hill's plot is used to

generate pKa for each ionizable residue because Hill's plot can combine results from all

simulations. Mongan et al. proposed a way to calculate pKa without using Hill's plot in

their constant-pH MD paper. They called pKa values calculated in their way composite

pKa values. A composite pKa is an average of all pKa values having an absolute offset

less than 2 pH units. Here an offset means the difference between predicted pKa and its

corresponding pH values.

Table 5-2 shows pKa values and the pKa RMS error values from the 2-ns

restrained REMD runs. Composite pKa values, pKa values obtained from Hill's plots and

their RMS error values relative to experimental measurements are also listed in Table 5-

2. We used the same experimental pKa values as Mongan et al. did to calculate pKa

RMS error. In our work, the pKa predictions from Hill's plots yield a RMSD value of 0.84,

while utilizing composite pKa values produces a RMS error value of 0.87. According to

constant-pH simulation literatures, the RMS error values of HEWL pKa prediction are

around 0.8 for acidic ionizable residues. So there is no significant improvement in pKa

prediction from our simulations.

However, as we mentioned in the structural stability discussion, putting a restraint

on Ca atoms of a protein lowers the ability to adjust its conformations. The further a pH

value is away from crystal pH, the more a structure ensemble is skewed from the

correct one. Simulations performed around pH 4.5 are less affected by the restraint than

simulations done at pH values far away from 4.5. Since the less a structural ensemble is

skewed, the less human error in pKa predictions. So one may expect smaller pKa RMS


182










error relative to experimental values will be seen around pH 4.5. pKa prediction RMS

error relative to experimental values are plotted against pH values in Figure 5-6. As

expected, a minimum having RMS error of 0.74 at pH 4.5 can be found. An RMS error

of 0.74 is among the best published HEWL predictions.


Table 5-2. Predicted pKa values and their RMS errors relative to experimental


measurements from the restrained REMD simulations.
Ex pH pH pH pH pH pH pH pH Com Hill
Exp2 Hill
2 2.5 3 3.5 4 4.5 5 6 p

Glu7 2.85 3.61 3.58 3.46 3.03 2.99 2.93 2.36 3.37 3.27 3.23


Aspl8 2.66 1.59 1.54 1.51 1.61 1.91 2.35 2.5 3.69 1.63 1.4


Glu35 6.2 3.76 3.65 4.36 4.14 4.31 4.53 4.76 4.61 4.27 4.58


Asp48 2.5 1.88 1.98 2.14 2.34 2.6 2.45 1.96 2.9 2.23 2.01


Asp52 3.68 2.71 2.45 2.63 2.82 3.05 2.72 2.77 3.99 2.73 2.68


Asp66 2.0 2.5 2.69 2.86 2.92 3.12 2.72 3.09 4.04 2.8 2.73


Asp87 2.07 2.32 2.43 2.64 2.49 2.54 2.64 2.79 3.62 2.51 2.42


Asp101 4.09 4.52 4.4 4.14 4.03 3.79 3.55 3.44 3.96 3.89 3.85


Asp119 3.2 2.71 2.78 3.01 3.01 3.25 3.01 2.89 3.97 2.96 2.9

RMS
1.04 1.1 0.91 0.89 0.83 0.74 0.79 1.12 0.87 0.84
Error
ihtf thl b Lv"f~r f~~ vn~ntln rlifLr~m "f~r f~~ h rmrft V\~in r~~


iniszabl resiu (pse a n s poaeperm fo deiiin vands "Hill stands for te c~ au otiemfo h
Hill's plot. The force constant of the harmonic potential used here is 1.0 kcallmol-A2.


183


I"











1.2







a)
1. -1


S1.0
E


S0.9-

(I)
0.8-


0.7
0.7 -- | --- i -------- --- | -
2 4 6
pH Value


Figure 5-6. RMS error between predicted and experimental pKa vs pH value. A
minimum of pKa RMS error can be found near the pH at which 1AKI crystal
structure is resolved.

5.5.3 Constant-pH REMD Simulations with a Weaker Restraint

Based on what have been found so far, we propose that reducing restraint

strength on Ca atoms will yield better pKa predictions. This is because reducing restraint

strength will increase degree of freedom in conformation sampling. HEWL can relax its

structure further, even at pH 4.5. Thus a more accurate structure ensemble can be

produced. This, in turn, will improve pKa calculations. Constant-pH REMD simulations

with a weaker restraint (harmonic potential on Ca atoms) of 0.1 kcal/mol-A2 were carried

out at three different pH values to test our hypothesis. First, as shown in Figure 5-7A, all

three simulations generate larger Ca RMSDs relative to 1AKI than those simulations

with stronger restraint do. This means HEWL relaxes more when a weaker restraint is

used. Besides, the Ca RMSD fluctuations in all three runs are bigger than those in the 1

kcal/mol-A2 REMD runs. This means more conformational space is visited. Another


184









interesting point in the weaker-restrained REMD runs is that the Ca RMSDs at pH 3 and

4 are larger than those at pH 4.5. Simulations at pH 3 and 4 do tend to sample

conformations that are different from at pH 4.5.

The pKa prediction results are listed in Table 5-3. pKa prediction deviation from the

final value vs time at pH value of 4.5 is shown in Figure 5-7B to demonstrate

protonation state sampling convergence. According to Table 5-3, nearly 0.1 pH unit

improvement in the RMS error of predicted pKa values can be seen at each pH for the

weakly restrained REMD runs. However, among all three RMS error values, the best

one is still obtained at pH 4.5 indicating that restraint is still favoring simulations near pH

4.5. After reducing the restraint strength, our best pKa RMS error relative to

experimental values is 0.62.

Table 5-3. Predicted pKa values and their RMS errors relative to experimental
measurements from weakly restrained REMD simulations.
pH=3 pH=4 pH=4.5

1 0.1 1 0.1 1 0.1

Glu7 3.46 3.71 2.99 3.38 2.93 3.34

Aspl8 1.51 1.57 1.91 1.76 2.35 2.23

Glu35 4.36 5.09 4.31 5.23 4.53 5.24

Asp48 2.14 2.27 2.6 2.48 2.45 2.71

Asp52 2.63 2.47 3.05 2.88 2.72 3.29

Asp66 2.86 2.63 3.12 2.66 2.72 2.93

Asp87 2.64 2.52 2.54 2.79 2.64 2.88

Asp101 4.14 3.82 3.79 3.77 3.55 3.54

Asp119 3.01 2.22 3.25 2.21 3.01 3.38

RMSE 0.91 0.84 0.83 0.72 0.74 0.62
In Table 5-3, the number 1 in the second row means the force constant of the restraining potential is 1
kcal/mol-A2, while 0.1 stands for 0.1 kcal/mol-A2. RMSE stands for RMS Error.


185
















1.51



1.0 I
1.0-


--pH=3
pH=4
pH=4.5


j



: I ~I '


0.5-


0 2000


4000
4000


6000
6000


8000 10000
8000 10000


Frame Number (total time=2 ns)


- Glu7
SAsp18
Glu35
Asp48
Asp52
- Asp66
Asp87
-Asp101
- Asp119


.......-T....


1000
1000


150
1500


20
2000


Time (ps)


Figure 5-7. A) Ca RMSD of HEWL from weaker restraint REMD simulations. The
RMSDs are larger than those with stronger restraints. When comparing
RMSDs at different pH for simulations using weaker restraint, RMSDs are
greater at pH 3 and 4 than those at pH 4.5. B) pKa prediction deviation from
final value at pH 4.5 from constant-pH REMD with 0.1 kcal/mol-A2


186









5.5.4 Active Site lonizable Residue pKa Prediction: Asp52

Accurate calculations of the pKa values of ionizable residues in active site are

important because their protonation states are crucial in enzyme reactions. In the case

of HEWL, Asp52 works as a nucleophile. This requires Asp52 to be deprotonated during

reactions which has an optimal pH around 5. In both restrained REMD, Asp52 is indeed

deprotonated around pH 5. However, the error of Asp52 relative to experimental value

is about 1 pH unit. Mongan and co-workers also had the same trend except that a

bigger error was obtained in their simulations. They claimed that Asp52-Asn46

hydrogen bond caused the very low predicted pKa of Asp52.127

Asp52 and residues that strongly interact with it (three asparagine residues:

Asn44, Asn46 and Asn59) in the crystal structure of 1AKI (hydrogen atoms are added

and proper protonation state is chosen at pH 4.5) are shown in Figure 5-8. We studied

those interactions which are represented by atom-to-atom distances in our REMD

simulations. We find that Asp52 is closer to Asn59 and Asn44 rather than to Asn46,

indicating that Asp52 has stronger interactions with Asn59 and Asn44 than with Asn46.

Time series of Asp52 carboxylic oxygen atoms to Asn59 and Asn44 ND2 distances at

pH 3 are shown in Figure 5-9. As can be seen from Figure 5-9A and 5-9B, Asp52 and

Asn44, Asn59 stay within hydrogen-bonding distance for a long time at pH as low as 3.

Furthermore, hydrogen-bonding distances between Asp52 and Asn44, and between

Asp52 and Asn59 are coupled. Two oxygen atoms in the carboxylic group of Asp52 are

able to work as proton acceptors simultaneously. This means that the deprotonated

form of Asp52 is over-stabilized by hydrogen-bonding, even at low pH values.


187










ASN46


ASN59


AS452


Figure 5-8. Asp52 in the crystal structure of 1AKI. Its neighbors that having strong
electrostatic interactions are also shown.


- Asp52 OD1 and Asn59 ND2
- Asp52 OD1 and Asn44 ND2


U I] II


10-


8


6


4


-Asp52 002 and Asn59 ND2
- Asp52 OD2 and Asn44 ND2


,11 ,


0 2000 4000 6000 8000 10000
Frame Number


0 2000 4000 6000
Frame Number


Figure 5-9. A) Time series of Asp52 carboxylic oxygen atom OD1 to Asn59 and Asn44
ND2 distances at pH 3 in the 1 kcal/mol-A2 constant-pH REMD run. B) Time
series of Asp52 carboxylic oxygen atom OD2 to Asn59 and Asn44 ND2
distances under the same condition. Hydrogen bonds which are stabilizing
deprotonated Asp52 are formed in a large extent even at a low pH.

Next, hydrogen bond analysis was conducted with PTRAJ module in the AMBER

suite for both sets of restrained REMD simulations. Hydrogen bonds can be found

between Asp52 and all three asparagines (Asn44, Asn46, and Asn59) in both sets. The

occupation times of Asp52-Asn44 and Asp52-Asn59 hydrogen-bonding are longer than


188


8000 10000









that of Asp52-Asn46 hydrogen-bonding. Furthermore, the Asp52-Asn44 and Asp52-

Asn59 hydrogen-bonding are coupled according to the distances demonstrated in

Figure 5-9. Asp52 is protonated only when the entire carboxylic group is pointing away

from Asn44 and Asn59. The Asp52-Asn44 and Asp52-Asn59 hydrogen-bonding, not the

Asp52-Asn46 hydrogen-bonding, is responsible for low predicted pKa value of Asp52.

The hydrogen bond contents are similar in both strongly and weakly restrained REMD

simulations. This indicates that the hydrogen-bonding effect on Asp52 in our simulations

is too strong. Reducing restraint strength doesn't help the conformational sampling of

Asp52.

5.5.5 Active Site lonizable Residue pKa Prediction: Glu35

Glu35 is another problematic case in our study. In the 1 kcal/molA2 runs, it's the

largest single residue error: the error is almost 2 pH units. Excluding Glu35 will lower

the pKa RMS error value by nearly 0.2 pH unit. In the 0.1 kcal/molA2 runs, the pKa value

of Glu35 is improved, having an error around 1 pH unit. This is the main reason that

smaller pKa RMS errors relative to experimental data are found in all three 0.1

kcal/mol-A2 REMD simulations. Although the pKa error of Glu35 in the weakly restrained

REMD simulation is large, the good news for weakly restrained REMD simulations is

that Glu35 can be correctly identified as proton donor based on the criterion proposed

by Nielsen and McCammon: Glu35 has a pKa value ~5.2 and the pKa difference

between Asp52 and Glu35 is greater than 1.5 pH units.

The predicted pKa value of Glu35 was determined to be 5.32 in the study

performed by Mongan et al. They claimed that a similar hydrogen-bonding effect as

Asp52 demonstrated was responsible for the low predicted pKa value of Glu35.127

However, hydrogen-bonding analysis of our data does not show any significant


189









hydrogen-bonding is formed by Glu35, which is in contrary to what Mongan et al.

claimed.

In the 1AKI crystal structure, Glu35 side-chain is in the vicinity of Gln57, Trp108

and Ala110 side-chains. Several key distances between Glu35 carboxylic group and

Gln57, Trp108 and Ala110 side chains in the crystal structure are listed in Table 5-4.

According to Table 5-4, Glu35 is in a hydrophobic region except that a close distance

between Glu35 OE2 atom and Ala110 backbone amide nitrogen atom. The hydrophobic

effect is the main reason of an elevated pKa value of Glu35. However, when the

carboxylic group is pointing toward the Ala110 amide group, the deprotonated form of

Glu35 will be favored. If such a conformation is stable throughout simulations, the

predicted pKa value will be smaller than what it supposed to be. We think one reason of

a low predicted pKa value is that Glu35 is stuck in conformations stabilizing

deprotonated form. But the weakly restrained simulations allow Glu35 to relax structure

further and visit conformations stabilizing protonation more frequently.

Table 5-4. Distance between Glu35 carboxylic oxygen atoms and neighboring residue
side-chain atoms in 1AKI crystal structure.
Glu35 OE1 Glu35 OE2
Gln57 CB 3.56 5.25
Gln57 CG 3.85 5.84
Trp108 CB 5.36 3.43
Trp108 CG 5.43 3.94
Trp108 CD1 4.65 3.67
Ala110 N 4.65 3.09
Ala110 CB 4.19 3.48
The unit of all distances in Table 5-4 is A.

Glu35 heavy-atom RMSD relative to 1AKI as well as cluster analysis on the basis

of those RMSDs are chosen to study Glu35 conformational sampling. Distributions of


190










heavy-atom RMSD, which are shown in Figure 5-10, show that 2 conformations are

found in the strongly restrained simulations: one centered at RMSD ~0.1 A (we label

that conformation as conformation 1) and the other centered at ~0.6 A (it is labeled as

conformation 2). However, an extra conformation (conformation 3) is visited by the

weakly restrained REMD simulations. Cluster analysis is employed to separate those

conformations. For conformation 2, the carboxylic group of Glu35 points toward the

Ala110 amide group in both sets of the restrained REMD runs (Figure 5-11). The

carboxylic group in conformation 1 also points toward the Ala110 amide group, although

in a lesser extent. However, conformation 3 (shown in the weakly restrained runs only)

contains configurations in which Glu35 carboxylic group is pointing away from Ala110

amide group (Figure 5-12B). In this conformation, the Glu35 side-chain is in the

hydrophobic region and the protonated species is favored. A too-low percentage of

conformation 3 is responsible for the low predicted pKa value of Glu35.

REMD, pH=4.5, res=1.0
1 0- REMD, pH=4.5, res=0.1



S0.6-
P 0,6

0,4

02

00
0 2000 4000 6000 8000 10000
Frame Number (total time=2 ns) A

Figure 5-10. A) Time series of the Glu35 heavy atoms (excluding two carboxylic oxygen
atoms) RMSD relative to crystal structure 1AKI. B) Probability distribution of
the RMSD. The conformation centered at RMSD ~0.1 A is labeled as
conformation 1. The one centered at ~0.6 A is named conformation 2.
Apparently, an extra conformation (conformation 3) is visited by the weakly
restrained REMD simulation.


191










- REMD, pH=4.5, res=1.0
- REMD, pH=4.5, res=0.1


0.04
0


0.02



0.00
0.0 0.2 04 0.6 0.8 1.0
RMSD of Glu35 vs 1AKI (A) B

Figure 5-10. Continued















-71 .24 86 1'
-81 .30 -1544



REMD, pH=4.5, res=1.0 A REMD, pH=4.5, res=1.0 B

Figure 5-11. A) Representative Structure of conformation 1. B) Representative Structure
of conformation 2. The structure ensemble is generated from REMD
simulations with stronger restraining potential. The carboxylic group of Glu35
in conformation 2 is clearly pointing toward the amide group of Alal 10.
Deprotonated form of Glu35 tends to decrease the electrostatic energy.
Furthermore, conformation 1 does not particularly favor the protonated Glu35.
No significant stabilizing factor is found for the protonated Glu35.


192























REMD, pH=4.5, res=0.1

Figure 5-12. Representative Structure of conformation 3 from cluster analysis. Glu35 is
in the hydrophobic region, consisting of Gln57, Trpl08 and Ala110.
Conformation 1 and 2 in the weakly restrained simulations are basically the
same as those demonstrated in Figure 5-11.

Another possible reason of underestimating pKa value of Glu35 is the use of

implicit solvent in constant-pH MD and REMD simulations. Imoto et al. suggested that

Glu35 and Asp52 were coupled by two water molecules through hydrogen-bonding.

Glu35 carboxylic group acted as a proton donor in the hydrogen-bonding. Thus the

protonated form of Glu35 was stabilized and contributed to the elevated pKa value. Two

water molecules are indeed found between Glu35 and Asp52 in the 1AKI crystal

structure and they are within hydrogen-bonding distances to Glu35 and Asp52. If the

hypothesis is true, the use of implicit solvent breaks this hydrogen-bonding network.

Thus a stabilizing factor of protonated Glu35 is missing. A constant-pH algorithm

employing explicit solvent is needed to study this effect.

5.5.6 Correlation between Conformation and Protonation

As described earlier, one advantage of utilizing constant-pH methods is that the

conformational sampling and the protonation state sampling are directly coupled. In this


193









work, side-chain dihedral angles are chosen to study conformation-protonation coupling.

Asp119 land X2 dihedral angles at pH 3 will be shown as representatives. Two

dimensional histograms between dihedral angles and protonation states are displayed

in Figure 5-13. A two-dimensional (2D) histogram is generated by putting bins in

dihedral angle and protonation state space (As explained in the second chapter,

considering syn and anti configuration of protons will generate five protonation states in

the case of ionizable aspartate in AMBER. They can be labeled as 0, 1, 2, 3 and 4 in

which state 0 stands for deprotonated state and the rest represent protonated species).


150- 1500 150-

S 13 03
100 -00 l c 2 h angle











around -170. In Figure 5-13A, we can clearly see that conformation 1 is coupled with
-150- 0 -"',


0- ( 0N 36.0



0 1 2 3 4 0 1 2 3 4
Protonatiaon Stae A Protonalion State B


Figure 5-13. A) Correlation between side chain dihedral angle xland protonation states.
B) Correlation between side chain dihedral angle x^and protonation states.

Our 2D histograms can show the correlations between dihedral angle distribution

and protonation state distribution. Two conformations are obtained in X1 space:

conformation 1 having X1 angle around -60 while conformation 2 having X1 angle

around -170. In Figure 5-13A, we can clearly see that conformation 1 is coupled with

protonated form and most structures in conformation 2 are in deprotonated state.

According to Figure 5-13B, similar behavior can be seen in 72 space too. Most


194









deprotonated Asp119 are found having X2 near 40 and -1400, while configurations

showing -750 and 1000 of X2 are protonated.

A closer look at the 1AKI crystal structure reveals that side-chains of Asp119 and

Arg125 are close to each other (the carboxylic group of Asp119 and the guanidinium

group of Arg125 are in hydrogen bond distance). Since Arg125 has a positive charge on

its guanidinium group, it stabilizes the deprotonated Asp119 when two side chains are

close to each other. We calculated pKa of Asp119 in 1AKI using H++ (H++ is a web-

based FDPB server developed by Alexy Onufriev's group at Virginia Tech. The FDPB

equation is solved on the basis of only one protein structure).257'258 The calculated pKa

of Asp119 using FDPB method is -1.1, 0.7 and 1.3 when the internal dielectric constant

is set to be 2, 4, and 6, respectively. All three pKa values are much lower than

experimental pKa value of 3.2. This behavior agrees with what we just explained:

Asp119-Arg125 side-chain coupling stabilizes the deprotonated form of Asp119. The

single structure FDPB-based pKa calculations yield such low pKa values because only

one conformation is visited by Asp119. Therefore, Asp119 must sample other

conformations in order to yield accurate pKa predictions. Time evolution of distance

between Asp119 and Arg125 side chain is shown in Figure 5-14 to reflect that

conformations other than crystal conformation are visited in our constant-pH REMD

runs. In Figure 5-14, we can clearly see that the close contact between Asp119 and

Arg125 side-chains can be broken during our simulations. Allowing side-chains to move

will result in a pKa value of 3.0 in our simulations. The comparison between constant-pH

and single-structure FDPB algorithm clearly demonstrates the importance of

conformational sampling in pKa calculations.


195









Asp119 D1
16- Asp119 OD2


12









0 2000 4000 6000 8000 10000
Frame Number (2 ns in total)

Figure 5-14. Minimal distance between Asp119 side chain carboxylic oxygen atoms
(OD1 and OD2) and Arg125 guanidinium nitrogen atoms. Since guanidinium
group has three nitrogen atoms, the minimal distance is the shortest distance
between Asp119 OD1 (or OD2) and those three nitrogen atoms.

Therefore, another way to look at conformations is combining both Asp119 and

Arg125. Now distances between Asp119 CG and Arg125 CZ atoms are selected to

distinguish different conformations. Figure 5-15A shows the CG-CZ distance probability

distribution. The probability distributions also reveal that two conformations exist. One

conformation is centered at CG-CZ distance of 4.2 A which represents the Asp119 and

Arg125 coupling is on. The other conformation is actually representing all structures not

belonging to the previous conformation. Based on the distance between Asp119 CG

and Arg125 CZ, we can say the coupling is off. The 2D histogram between distance and

protonation state at pH 3 is shown in Figure 5-15B. As can be seen in the 2D histogram

contour plot, short distance conformation is indeed in the deprotonated state. The pKa of

shorter distance conformation is negative infinity. Although several snapshots possess

both protonated state and short distance, 2D histogram doesn't reveal them as a stable

conformation. So, the short distance conformation is purely coupled with deprotonated


196










form. We also obtain the pKa value of the longer distance conformation is 3.3 according

to Hill's plot.


0.14- pH=3 12-
pH=4 0
pH=4.5
0.12 1
100.10-
0,10
t I


o
,L 0.06- "
A 270.0
0.04- 6- m

002300
=*':,, -" ^ y --- ---- i -a -- i-- --- --- I --- --- I I
4 B 8 10 12 14 0 1 2 3 4
Distance (A) A Protonation State B


Figure 5-15. A) Probability distribution of Asp119 CG to Arg125 CZ distances. The
Asp119 CG to Arg125 CZ distance is used to distinguish conformations. B)
Coupling between conformations and protonation states.

5.5.7 Conformation-Protonation Equilibrium Model

Due to the coupling between conformation and protonation equilibrium, knowing

the pH effect on conformational equilibrium will be interesting and important. Again,

Asp119 is selected as the representative of our study. First, we want to show the

derivation and the analytical form of K12 as a function of pH values in a general case.

From now on, we will label conformation 1 in deprotonated form as 1d. The, 1p, 2d and

2p stand for conformation 1 in protonated form, conformation 2 in deprotonated form

and conformation 2 in protonated form, respectively. According to eq. 2 and 3, [1p] =

[Id].10(pKa,'-pH) and [2p] = [2d]O1(pKa,2-pH). We can substitute [1p] and [2p] in eq. 1 with

[1d] and [2d] so the conformational equilibrium constant will have the form:


12 d] 1+10(pKa, pH)
[2d] 1 + 10(pKa2 pH)


197









In Eq. 5-5, [1d]/[2d] is the equilibrium constant of conformation 1 and 2 in

deprotonated form and it is equal to the K12 at high pH where both conformations are in

the deprotonated form. So K12 has the final analytical formula:

1 +l 10(pKa,1pH)
K12 K12,h + (pKa-pH) (5-6)
1 +- 10(pKa,2-pH)

where K12,h stands for K12 at high pH. In our derivation, conformation 1 always has a

smaller pKa value than conformation 2. So the denominator always increases faster

than the numerator when pH values going down. Considering that K12,h is a constant,

then K12 is a sigmoid function. When pH is much greater than both pKa values, K12

becomes K12,h. When pH is much smaller than both pKa values, K12 reaches its lower

bound. In the case of Asp119, the pKa value is minus infinity for conformation 1 when

we use Asp119 CG and Arg125 CZ distance to distinguish two conformations. The

ratios of K12 and K12,h from both analytical derivations and actual simulations are plotted

in Figure 5-16. Close agreement between K12/K12,h plots generated from simulations

and conformation-protonation equilibrium model is seen in Figure 5-16A. The

agreement shows that the model could represent conformational equilibrium in our

constant-pH REMD simulations. So, further use of that model is possible. Different pKa,i

and pKa,2 values are also used in order to test how two pKa values affect shape and

inflection point of the sigmoid function. According to Figure 5-16B, 5-16C and 5-16D, if

the difference between pKa,i and pKa,2 is large (greater than 1 pH unit, approximately),

the inflection point will appear at a pH value that equals to pKa,2. pKa,i will affect the

inflection point only when the difference is small. If we view a K12/K12,h plot as a titration

curve and the inflection point is the pKa value, then the K12/K12,h plot yields a pKa value

equals to pKa,2 values, which is 3.3 in the case of Asp119.


198











--Actual Simulation
-*-Analytical


08

0.6

04


Solution pH


- pKa,1- minus infinity, pKa,2=3.3
-pKa,1= -0.5, pKa,2=3.3
- pKa,1= 1.0, pKa,2=3.3
- pKa,1= 2.0, pKa,2=3.3


0 2 4 6
Solution pH


- pKa,1= minus infinity, pKa,2=3.3
-pKa,1= minus infinity, pKa,2=2.0
pKa,1 = minus infinity, pKa,2=4.0
- pKa,1= minus irnn.r. rI a 2=6.0


- pKa,1=1.0, pKa,2=2.0
- pKa,.12.0, pKa,2=3.0
pKa,1=3.0, pKa,2=3.5
-pKa,1=3.5, pKa,2=4.0


Solution pH C Solution pH


Figure 5-16. K12/K12,h as a function of pH and its dependence on pKa,i and pKa,2.

Since the analytical form of K12, pKa,1 and pKa,2 are known and the sum of all


fractions is unity, we can figure out fractions of each species. The analytical expressions


of each species are:


[1d] =( K12 P-)) (5-7)
K12+ 0(pK pH
[1p] = ( K2 10pKa,1-pH
)p] =12 () (5-8)
K12 +1 1+1opKa,1-pH


[2d] =() (+OpI,2 (5-9)


12+1) 0pKapH(5-10)
S 1 10PKa,2-pH
G121+1 1+10pKa,2-pH (-0


199












In our study of Asp119, pKa,i is minus infinity which lead to [1 p] is equal to zero.


K12,h is calculated as the average of all [1d]/[2d], which results in a K12,h of 1.6. Another


K12,h of 1.8, which is the K12 at pH 5, is also tried. Then, fractions of each species from


both analytical formula and actual simulations are shown in Figure 5-17.


ld, K,,=1.8
2d, K,=1.8
10- -- K2p, K, =1.8
d, K 2=1.6
S 8. 2d, K,=1.6
S- 2p, K, =1.6

06-
w
0 04-

U.
02


00
0 1 2 3 4 5 6 7
Solution pH A



1.0 ---Analytical, K h=1.8
-*-Analytical, K1,2=1.6
Actual Simulations
0.8-


A-06-

04

LL

0.2-



2 3 4 5
Solution pH B


Figure 5-17. A) Fraction of each species as a function of pH titrationn curves) obtained
from equations based on conformation-protonation equilibrium. The effect of
K12,h is tested. B) Comparison of titration curves derived from actual
simulations and from the equilibrium equations.


Firstly, the fraction of 2p vs pH plots are almost identical for two K12,h values. This


means that although the fractions of 1d and 2d are affected, the sum of 1d and 2d is


200









not. Secondly, titration curves derived from analytical formula and actual simulations

agree with each other very well. The agreement among titration curves leads to similar

pKa values. Both analytical titration curves using different K12,h yield pKa values to be

between 2.8 and 2.9 with negligible difference and the actual simulation titration curve

gives a pKa value of 3.0. The analysis demonstrates that the equilibrium model could

represent protonation equilibrium in our simulations.

5.5.8 Theoretical NMR Titration Curves

Since the model can be used to simplify conformation-protonation equilibrium in

our constant-pH REMD simulations, it is interesting to know whether it has some

practical meanings. Reproducing experimental titration curves offers us a good

objective. So, quantum mechanical calculations of NMR chemical shift (5) are

performed and their results are demonstrated and discussed in this part. As we have

shown earlier, the dynamics of Asp119 generates two conformations indicating whether

the Asp119-Arg125 electrostatic interaction is "on" or "off". Our NMR calculations are

based on the representative structures of each conformation, in proper protonation

state. Due to the size of HEWL molecule, full quantum mechanical calculations are too

expensive. So our first trial is using Asp119 dipeptide. Chemical shifts of the 1d, 2p and

2d are obtained and the fractions of each species at different pH can be calculated

using eq. 7, 8 and 10. At each pH value, the theoretical chemical shift used to make a

titration curve is calculated as follows: = 81d -[1d] + 2d -[2d] + 2 -[2p]. The chemical

shifts of 1d, 2d and 2p are 2.17, 2.48, 3.03 ppm respectively and the theoretical NMR

titration curve is plotted in Figure 5-18. Compare theoretical titration curve with

experimental one, the trend is correctly reproduced. At low pH, the theoretical and


201










experimental chemical shifts agree well: 3.03 ppm versus 3.13 ppm. However, the

difference between calculated and experimental high pH chemical shifts is greater than

0.6 ppm. This makes our calculated (61ow pH-6high pH) is 0.75 ppm while the experimental

difference is only 0.21 ppm.


-- Full QM + Asp119 dipeptide
1 ----QM/MM + entire HEWL
3.2-


3.0 -
E

2.8
\
C)
2.6
E

2.4-


2.2
0 1 2 3 4 5 6 7
Solution pH


Figure 5-18. Theoretical NMR chemical shifts as a function of pH. It's plotted to see if
the conformation-protonation equilibrium model can reproduce experimental
titration curve based on NMR chemical shift measurements.

The problem at high pH could be that a dipeptide cannot accurately represent

Asp119 and its environment especially we have known there is a strong Asp119-Arg125

Coulomb interaction. So a set of QM/MM calculations was conducted using the entire

HEWL molecule. The new chemical shifts are 2.58, 2.69 and 3.25 ppm for 1d, 2d and

2p. Comparing chemical shifts based on dipeptide and the entire molecule, differences

of 2p and 2d are 0.22 ppm and 0.21 ppm. More importantly, both 2p chemical shifts are

similar to experimental low pH (each one shows the difference near 0.1 ppm). The

differences are small for 2p and 2d because there are no significant interactions for

Asp119 in conformation 2. Unlike 2p or 2d, the chemical shift of 1d is improved by 0.41


202









ppm, telling that using the whole HEWL molecule does change ld chemical shift a lot.

After applying QM/MM method on the entire HEWL, the calculated (i6ow pH-5high pH)

becomes 0.63 ppm. The theoretical titration curve using QM/MM technique is also

displayed in Figure 5-18. But no matter whether a dipeptide or the entire HEWL is used

in NMR calculations, the pKa values are around 2.9 as expected. NMR titration curves

yield the same pKa value as protonation (deprotonation) fraction vs pH does. The NMR

titration curve calculations validate the use of conformation-protonation equilibrium

model and confirm its applicability. This model can be used to simplify a lot analysis

involving further calculations.

5.6 Conclusions

In this chapter, constant-pH REMD simulations are performed to study the pKa of

hen egg white lysozyme. Three sets of constant-pH REMD simulations have been

performed: one set of simulations are conducted without restraining potential, while a

harmonic potential is put on the Ca atoms in the other two sets of REMD simulations.

The force constants of the two harmonic potentials are 1, and 0.1 kcal/mol-A2,

respectively, so that the effect of restraint strength on pKa prediction accuracy can be

studied.

In our constant-pH REMD simulations, the unrestrained ones are found to be

structurally instable. The Ca atom RMSD relative to crystal structure can be as high as

18 A. Due to the effect of restraining potential, HEWL in a restrained simulation is stable

and similar to the crystal structure, according to the Ca atom RMSD values. In the

restrained simulations with a force constant of 1 kcal/mol-A2, accurate pKa predictions

are achieved. The overall RMS errors between predicted and experimental pKa values

are 0.87 and 0.84, dependent of pKa calculation methods. Unfortunately, those two


203









RMS errors are not better than constant-pH MD results obtained by Mongan et al. The

advantage of incorporating REMD method is not observed. However, a plot showing

RMS error as a function of pH value yields the smallest RMS error at pH 4.5, at which

the crystal structure was resolved. Supported by the work of Mongan et al., we propose

that the further away from crystal pH value, the stronger the biasing effect from the

restraining potential. The biasing effect of conformational sampling will in turn affect pKa

predictions. As expected, reducing the strength of harmonic potential results in

improved pKa predictions. Likewise, the smallest pKa RMS error of 0.62 is obtained at

pH 4.5 in the weakly restrained constant-pH REMD simulations. An RMS error of 0.62 is

among the best pKa predictions generated from constant-pH simulations.

The pKa predictions of catalytic ionizable residues are of particular interest in the

case of HEWL. Constant-pH REMD simulations with stronger restraining potential failed

to identify proton donor under the criteria proposed by Nielsen and McCammon in 2003.

The weakly restrained constant-pH REMD simulations are able to predicted proton

donor and nucleophile, although the errors of predicted pKa values of Glu35 and Asp52

are among the largest in our simulations. Hydrogen-bonding is found to be responsible

for the large error of Asp52. The hydrogen-bonding of Asp52 with Asn44 and Asn59

over-stabilizes the deprotonated form of Asp52, causing the pKa value of Asp52 too

small. For Glu35, conformational sampling also plays a role in underestimating its pKa

value. However, other factors such as the use of implicit solvent may affect the pKa

prediction of Glu35 too.

In this work, we also focused on conformation and protonation equilibrium in

constant-pH REMD simulations. Correlations between protonation and side-chain


204









dihedral angles X, and X2 are studied. Other representation of conformations such as

whether an important electrostatic interaction is formed or not is also adopted. In both

cases, the coupling between conformation and protonation is observed. The effect of

conformation-protonation coupling is partially reflected by the comparison between

constant-pH and single structure FDPB algorithms. Constant-pH REMD yields better

pKa values are seen because more conformation space is visited.

The conformation-protonation equilibrium is further studied. Equilibrium constants

between conformations are derived in order to show how pH affects conformation

equilibrium. The conformational equilibrium constant is shown to be pH dependent and

it's a sigmoid function of pH values. The shape of the sigmoidal function is influenced by

pKa values of each conformation. Titration curves which are the means to obtain pKa

values are also derived from conformation-protonation equilibrium. All analytical results

are in good agreement with our simulations. In addition, we apply this conformation-

protonation equilibrium to reproduce experimental NMR titration curve by carrying out

full QM and QM/MM calculations. First, we showed the importance of protein

environment to chemical shift calculations. Calculation using isolated ionizable side

chain can only qualitatively reproduce experimental NMR titration curve. The error

mainly comes from the high pH end where an isolated side chain assumption fails. After

adding protein environment, our theoretical titration curve is greatly improved and good

agreement to experimental result is obtained. Our conformation-protonation equilibrium

model can be used to represent our simulations and will simplify further calculations.


205









LIST OF REFERENCES

(1) Bettelheim, F. A. Introduction to general, organic, and biochemistry; 8th ed.;
Thomson Brooks/Cole: Belmont, CA, 2007.

(2) Dey, A.; Verma, C. S.; Lane, D. P. Br. J. Cancer 2008, 98, 4-8.

(3) Vogelstein, B.; Lane, D.; Levine, A. J. Nature 2000, 408, 307-310.

(4) Matthew, J. B.; Gurd, F. R. N.; Garciamoreno, E. B.; Flanagan, M. A.; March, K.
L.; Shire, S. J. Crc Cr. Rev. Biochem. 1985, 18, 91-197.

(5) Bierzynski, A.; Kim, P. S.; Baldwin, R. L. Proc. Natl. Acad. Sci. U. S. A. 1982, 79,
2470-2474.

(6) Ferguson, N.; Schartau, P. J.; Sharpe, T. D.; Sato, S.; Fersht, A. R. J. Mol. Biol.
2004, 344, 295-301.

(7) Shoemaker, K. R.; Kim, P. S.; Brems, D. N.; Marqusee, S.; York, E. J.; Chaiken,
I. M.; Stewart, J. M.; Baldwin, R. L. Proc. Natl. Acad. Sci. U. S. A. 1985, 82,
2349-2353.

(8) Garcia-Mira, M. M.; Sadqi, M.; Fischer, N.; Sanchez-Ruiz, J. M.; Munoz, V.
Science 2002, 298, 2191-2195.

(9) Hunenberger, P. H.; Helms, V.; Narayana, N.; Taylor, S. S.; McCammon, J. A.
Biochemistry 1999, 38, 2358-2366.

(10) Demchuk, E.; Genick, U. K.; Woo, T. T.; Getzoff, E. D.; Bashford, D.
Biochemistry 2000, 39, 1100-1113.

(11) Dillet, V.; Dyson, H. J.; Bashford, D. Biochemistry 1998, 37, 10298-10306.

(12) Harris, T. K.; Turner, G. J. IUBMB Life 2002, 53, 85-98.

(13) Laidler, K. J. Chemical kinetics; 3rd ed.; Harper & Row: New York, 1987.

(14) Fersht, A. Structure and mechanism in protein science : a guide to enzyme
catalysis and protein folding; W.H. Freeman: New York, 1999.

(15) Simonson, T.; Carlsson, J.; Case, D. A. J. Am. Chem. Soc. 2004, 126, 4167-
4180.

(16) Lee, A. C.; Crippen, G. M. J. Chem. Inf Model. 2009, 49, 2013-2033.

(17) Langsetmo, K.; Fuchs, J. A.; Woodward, C. Biochemistry 1991, 30, 7603-7609.


206









(18) Garcia-Moreno, B.; Dwyer, J. J.; Gittis, A. G.; Lattman, E. E.; Spencer, D. S.;
Stites, W. E. Biophys. Chem. 1997, 64, 211-224.

(19) Garcia-Moreno, B.; Fitch, C.; Karp, D.; Gittis, A.; Lattman, E. Biophys. J. 2002,
82, 300a-300a.

(20) Tanford, C. Adv. Protein Chem. 1962, 17, 69-165.

(21) Dwyer, J. J.; Gittis, A. G.; Karp, D. A.; Lattman, E. E.; Spencer, D. S.; Stites, W.
E.; Garcia-Moreno, B. Biophys. J. 2000, 79, 1610-1620.

(22) Harms, M. J.; Castaneda, C. A.; Schlessman, J. L.; Sue, G. R.; Isom, D. G.;
Cannon, B. R.; Garcia-Moreno, B. J. Mol. Biol. 2009, 389, 34-47.

(23) Mehler, E. L.; Fuxreiter, M.; Simon, I.; Garcia-Moreno, E. B. Proteins: Struct.,
Funct., Genet. 2002, 48, 283-292.

(24) Anderson, D. E.; Becktel, W. J.; Dahlquist, F. W. Biochemistry 1990, 29, 2403-
2408.

(25) Dyson, H. J.; Jeng, M. F.; Tennant, L. L.; Slaby, I.; Lindell, M.; Cui, D. S.; Kuprin,
S.; Holmgren, A. Biochemistry 1997, 36, 2622-2636.

(26) Bashford, D.; Case, D. A.; Dalvit, C.; Tennant, L.; Wright, P. E. Biochemistry
1993, 32, 8045-8056.

(27) Wang, Y. X.; Freedberg, D. I.; Yamazaki, T.; Wingfield, P. T.; Stahl, S. J.;
Kaufman, J. D.; Kiso, Y.; Torchia, D. A. Biochemistry 1996, 35, 9945-9950.

(28) Dyson, H. J.; Tennant, L. L.; Holmgren, A. Biochemistry 1991, 30, 4262-4268.

(29) Jeng, M. F.; Dyson, H. J. Biochemistry 1996, 35, 1-6.

(30) Wilson, N. A.; Barbar, E.; Fuchs, J. A.; Woodward, C. Biochemistry 1995, 34,
8931-8939.

(31) Callis, P. R. Methods Enzymol. 1997, 278, 113-150.

(32) Callis, P. R.; Burgess, B. K. J. Phys. Chem. B 1997, 101, 9429-9432.

(33) Vivian, J. T.; Callis, P. R. Biophys. J. 2001, 80, 2093-2109.

(34) Inoue, M.; Yamada, H.; Yasukochi, T.; Kuroki, R.; Miki, T.; Horiuchi, T.; Imoto, T.
Biochemistry 1992, 31, 5545-5553.


207









(35) Kajander, T.; Kahn, P. C.; Passila, S. H.; Cohen, D. C.; Lehtio, L.; Adolfsen, W.;
Warwicker, J.; Schell, U.; Goldman, A. Structure 2000, 8, 1203-1214.

(36) Bartlett, G. J.; Porter, C. T.; Borkakoti, N.; Thornton, J. M. J. Mol. Biol. 2002, 324,
105-121.

(37) Jiang, Y. X.; Ruta, V.; Chen, J. Y.; Lee, A.; MacKinnon, R. Nature 2003, 423, 42-
48.

(38) Luecke, H.; Richter, H. T.; Lanyi, J. K. Science 1998, 280, 1934-1937.

(39) Bashford, D.; Case, D. A. Annu. Rev. Phys. Chem. 2000, 51, 129-152.

(40) Still, W. C.; Tempczyk, A.; Hawley, R. C.; Hendrickson, T. J. Am. Chem. Soc.
1990, 112, 6127-6129.

(41) Cramer, C. J. Essentials of computational chemistry : theories and models; J.
Wiley: West Sussex, England ; New York, 2002.

(42) Raha, K.; Merz, K. M. In Annual reports in computational chemistry; Spellmeyer,
D. C., Ed.; Elsevier: Amsterdam ; Boston, 2005; Vol. 1, p p113-130.

(43) Dixon, S. L.; Merz, K. M. J. Chem. Phys. 1996, 104, 6643-6649.

(44) Vreven, T.; Morokuma, K. In Annual Reports in Computational Chemistry;
Spellmeyer, D., Ed.; Elsevier: Amsterdam ; Boston, 2006; Vol. 2, p p35-51.

(45) Field, M. J.; Bash, P. A.; Karplus, M. J. Comput. Chem. 1990, 11, 700-733.

(46) Singh, U. C.; Kollman, P. A. J. Comput. Chem. 1986, 7, 718-730.

(47) Warshel, A.; Levitt, M. J. Mol. Biol. 1976, 103, 227-249.

(48) Kamerlin, S. C. L.; Haranczyk, M.; Warshel, A. J. Phys. Chem. B 2009, 113,
1253-1272.

(49) Monard, G.; Merz, K. M. Acc. Chem. Res. 1999, 32, 904-911.

(50) Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. J.
Chem. Phys. 1953, 21, 1087-1092.

(51) Wolynes, P. G.; Onuchic, J. N.; Thirumalai, D. Science 1995, 267, 1619-1620.

(52) Itoh, S. G.; Okumura, H.; Okamoto, Y. Mol. Simul. 2007, 33, 47-56.

(53) Mitsutake, A.; Sugita, Y.; Okamoto, Y. Biopolymers 2001, 60, 96-123.


208









(54) Berg, B. A.; Neuhaus, T. Phys. Lett. B 1991, 267, 249-253.

(55) Berg, B. A.; Neuhaus, T. Phys. Rev. Lett. 1992, 68, 9-12.

(56) Lyubartsev, A. P.; Martsinovski, A. A.; Shevkunov, S. V.; Vorontsovvelyaminov,
P. N. J. Chem. Phys. 1992, 96, 1776-1783.

(57) Marinari, E.; Parisi, G. Europhys. Lett. 1992, 19, 451-458.

(58) Hansmann, U. H. E. Chem. Phys. Lett. 1997, 281, 140-150.

(59) Swendsen, R. H.; Wang, J. S. Phys. Rev. Lett. 1986, 57, 2607-2609.

(60) Earl, D. J.; Deem, M. W. Phys. Chem. Chem. Phys. 2005, 7, 3910-3916.

(61) Fukunishi, H.; Watanabe, O.; Takada, S. J. Chem. Phys. 2002, 116, 9058-9067.

(62) Sugita, Y.; Okamoto, Y. Chem. Phys. Lett. 1999, 314, 141-151.

(63) Tanford, C.; Kirkwood, J. G. J. Am. Chem. Soc. 1957, 79, 5333-5339.

(64) Tanford, C.; Roxby, R. Biochemistry 1972, 11, 2192-2198.

(65) Bashford, D.; Karplus, M. Biochemistry 1990, 29, 10219-10225.

(66) Gilson, M. K. Proteins: Struct., Funct., Genet. 1993, 15, 266-282.

(67) Antosiewicz, J.; Mccammon, J. A.; Gilson, M. K. J. Mol. Biol. 1994, 238, 415-436.

(68) Antosiewicz, J.; McCammon, J. A.; Gilson, M. K. Biochemistry 1996, 35, 7819-
7833.

(69) Bashford, D.; Karplus, M. J. Phys. Chem. 1991, 95, 9556-9561.

(70) Yang, A. S.; Gunner, M. R.; Sampogna, R.; Sharp, K.; Honig, B. Proteins: Struct.,
Funct., Genet. 1993, 15, 252-265.

(71) Yang, A. S.; Honig, B. J. Mol. Biol. 1993, 231, 459-474.

(72) Madura, J. D.; Briggs, J. M.; Wade, R. C.; Davis, M. E.; Luty, B. A.; Ilin, A.;
Antosiewicz, J.; Gilson, M. K.; Bagheri, B.; Scott, L. R.; Mccammon, J. A.
Comput. Phys. Commun. 1995, 91, 57-95.

(73) Nicholls, A.; Honig, B. J. Comput. Chem. 1991, 12, 435-445.


209









(74) Beroza, P.; Fredkin, D. R.; Okamura, M. Y.; Feher, G. Proc. Natl. Acad. Sci. U. S.
A. 1991, 88, 5804-5808.

(75) Bone, S.; Pethig, R. J. Mol. Biol. 1985, 181, 323-326.

(76) Harvey, S. C.; Hoekstra, P. J. Phys. Chem. 1972, 76, 2987-&.

(77) Garcia-Moreno, B.; Fitch, C. A. Methods Enzymol. 2004, 380, 20-51.

(78) Simonson, T.; Brooks, C. L. J. Am. Chem. Soc. 1996, 118, 8452-8458.

(79) Mehler, E. L.; Eichele, G. Biochemistry 1984, 23, 3887-3891.

(80) Mehler, E. L.; Guarnieri, F. Biophys. J. 1999, 77, 3-22.

(81) Alexov, E. G.; Gunner, M. R. Biophys. J. 1997, 72, 2075-2093.

(82) Barth, P.; Alber, T.; Harbury, P. B. Proc. Natl. Acad. Sci. U. S. A. 2007, 104,
4898-4903.

(83) Georgescu, R. E.; Alexov, E. G.; Gunner, M. R. Biophys. J. 2002, 83, 1731-1748.

(84) Gunner, M. R.; Alexov, E.; Torres, E.; Lipovaca, S. J. Biol. Inorg. Chem. 1997, 2,
126-134.

(85) Livesay, D. R.; Jacobs, D. J.; Kanjanapangka, J.; Chea, E.; Cortez, H.; Garcia,
J.; Kidd, P.; Marquez, M. P.; Pande, S.; Yang, D. J. Chem. Theory Comput.
2006, 2, 927-938.

(86) You, T. J.; Bashford, D. Biophys. J. 1995, 69, 1721-1733.

(87) Kollman, P. Chem. Rev. 1993, 93, 2395-2417.

(88) Straatsma, T. P.; Mccammon, J. A. Annu. Rev. Phys. Chem. 1992, 43, 407-435.

(89) Warshel, A.; Sussman, F.; King, G. Biochemistry 1986, 25, 8368-8372.

(90) Russell, S. T.; Warshel, A. J. Mol. Biol. 1985, 185, 389-404.

(91) Jorgensen, W. L.; Briggs, J. M. J. Am. Chem. Soc. 1989, 111, 4190-4197.

(92) Merz, K. M. J. Am. Chem. Soc. 1991, 113, 3572-3575.

(93) Hu, H.; Yang, W. T. Annu. Rev. Phys. Chem. 2008, 59, 573-601.

(94) Li, G. H.; Zhang, X. D.; Cui, Q. J. Phys. Chem. B 2003, 107, 8643-8653.


210









(95) Riccardi, D.; Schaefer, P.; Cui, Q. J. Phys. Chem. B 2005, 109, 17715-17733.

(96) Bas, D. C.; Rogers, D. M.; Jensen, J. H. Proteins: Struct., Funct., Bioinf. 2008,
73, 765-783.

(97) Jensen, J. H.; Li, H.; Robertson, A. D.; Molina, P. A. J. Phys. Chem. A 2005, 109,
6634-6643.

(98) Li, H.; Hains, A. W.; Everts, J. E.; Robertson, A. D.; Jensen, J. H. J. Phys. Chem.
B 2002, 106, 3486-3494.

(99) Li, H.; Robertson, A. D.; Jensen, J. H. Proteins: Struct., Funct., Bioinf 2004, 55,
689-704.

(100) Li, H.; Robertson, A. D.; Jensen, J. H. Proteins: Struct., Funct., Bioinf 2005, 61,
704-721.

(101) Minikis, R. M.; Kairys, V.; Jensen, J. H. J. Phys. Chem. A 2001, 105, 3829-3837.

(102) Day, P. N.; Jensen, J. H.; Gordon, M. S.; Webb, S. P.; Stevens, W. J.; Krauss,
M.; Garmer, D.; Basch, H.; Cohen, D. J. Chem. Phys. 1996, 105, 1968-1986.

(103) Gordon, M. S.; Freitag, M. A.; Bandyopadhyay, P.; Jensen, J. H.; Kairys, V.;
Stevens, W. J. J. Phys. Chem. A 2001, 105, 293-307.

(104) Mongan, J.; Case, D. A. Curr. Opin. Struct. Biol. 2005, 15, 157-163.

(105) Baptista, A. M. J. Chem. Phys. 2002, 116, 7766-7768.

(106) Baptista, A. M.; Martel, P. J.; Petersen, S. B. Proteins: Struct., Funct., Genet.
1997, 27, 523-544.

(107) Borjesson, U.; Hunenberger, P. H. J. Chem. Phys. 2001, 114, 9706-9719.

(108) Borjesson, U.; Hunenberger, P. H. J. Phys. Chem. B 2004, 108, 13551-13559.

(109) Khandogin, J.; Brooks, C. L. Biophys. J. 2005, 89, 141-157.

(110) Khandogin, J.; Brooks, C. L. Biochemistry 2006, 45, 9363-9373.

(111) Khandogin, J.; Brooks, C. L. Proc. Natl. Acad. Sci. U. S. A. 2007, 104, 16880-
16885.

(112) Khandogin, J.; Chen, J. H.; Brooks, C. L. Proc. Natl. Acad. Sci. U. S. A. 2006,
103, 18546-18550.


211









(113) Khandogin, J.; Raleigh, D. P.; Brooks, C. L. J. Am. Chem. Soc. 2007, 129, 3056-
3057.

(114) Lee, M. S.; Salsbury, F. R.; Brooks, C. L. Proteins: Struct., Funct., Bioinf 2004,
56, 738-752.

(115) Mertz, J. E.; Pettitt, B. M. Int. J. Supercomp. Appl. 1994, 8, 47-53.

(116) Kong, X. J.; Brooks, C. L. J. Chem. Phys. 1996, 105, 2414-2423.

(117) Chen, J. H.; Brooks, C. L.; Khandogin, J. Curr. Opin. Struct. Biol. 2008, 18, 140-
148.

(118) Baptista, A. M.; Teixeira, V. H.; Soares, C. M. J. Chem. Phys. 2002, 117, 4184-
4200.

(119) Dlugosz, M.; Antosiewicz, J. M. Chem. Phys. 2004, 302, 161-170.

(120) Dlugosz, M.; Antosiewicz, J. M. J. Phys. Chem. B 2005, 109, 13777-13784.

(121) Dlugosz, M.; Antosiewicz, J. M. J. Phys.: Condens. Matter2005, 17, S1607-
S1616.

(122) Dlugosz, M.; Antosiewicz, J. M.; Robertson, A. D. Phys. Rev. E2004, 69,
021915.

(123) Machuqueiro, M.; Baptista, A. M. J. Phys. Chem. B 2006, 110, 2927-2933.

(124) Machuqueiro, M.; Baptista, A. M. Biophys. J. 2007, 92, 1836-1845.

(125) Machuqueiro, M.; Baptista, A. M. Proteins: Struct., Funct., Bioinf 2008, 72, 289-
298.

(126) Machuqueiro, M.; Baptista, A. M. J. Am. Chem. Soc. 2009, 131, 12586-12594.

(127) Mongan, J.; Case, D. A.; McCammon, J. A. J. Comput. Chem. 2004, 25, 2038-
2048.

(128) Walczak, A. M.; Antosiewicz, J. M. Phys. Rev. E 2002, 66, 051911.

(129) Williams, S. L.; de Oliveira, C. A. F.; McCammon, J. A. J. Chem. Theory Comput.
2010, 6, 560-568.

(130) Burgi, R.; Kollman, P. A.; van Gunsteren, W. F. Proteins: Struct., Funct., Genet.
2002, 47, 469-480.


212









(131) Meng, Y. L.; Roitberg, A. E. J. Chem. Theory Comput. 2010, 6, 1401-1412.

(132) Schaefer, M.; Karplus, M. J. Phys. Chem. 1996, 100, 1578-1599.

(133) Hamelberg, D.; Mongan, J.; McCammon, J. A. J. Chem. Phys. 2004, 120, 11919-
11929.

(134) Hamelberg, D.; Mongan, J.; McCammon, J. A. Protein Sci. 2004, 13, 76-76.

(135) Ponder, J. W.; Case, D. A. Adv. Protein Chem. 2003, 66, 27-85.

(136) Allinger, N. L.; Yuh, Y. H.; Lii, J. H. J. Am. Chem. Soc. 1989, 111, 8551-8566.

(137) Leach, A. R. Molecular modelling : principles and applications; 2nd ed.; Prentice
Hall: Harlow, England ; New York, 2001.

(138) MacKerell, A. D. In Annual reports in computational chemistry
Spellmeyer, D. C., Ed.; Elsevier: Amsterdam ; Boston, 2005; Vol. 1, p p91~102.

(139) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C.
Proteins: Struct., Funct., Bioinf 2006, 65, 712-725.

(140) MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.;
Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir,
L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.;
Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.;
Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M. J. Phys.
Chem. B 1998, 102, 3586-3616.

(141) Daura, X.; Mark, A. E.; van Gunsteren, W. F. J. Comput. Chem. 1998, 19, 535-
547.

(142) Jorgensen, W. L.; Tirado-Rives, J. J. Am. Chem. Soc. 1988, 110, 1657-1666.

(143) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D.
M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J. Am. Chem. Soc.
1995, 117, 5179-5197.

(144) Verlet, L. Phys. Rev. 1967, 159, 98.

(145) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. J. Comput. Phys. 1977, 23, 327-
341.

(146) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; Dinola, A.; Haak, J.
R. J. Chem. Phys. 1984, 81, 3684-3690.


213









(147) McQuarrie, D. A. Statistical thermodynamics; University Science Books: Mill
Valley, Calif., 1973.

(148) Nose, S. J. Chem. Phys. 1984, 81, 511-519.

(149) Berendsen, H. J. C.; Grigera, J. R.; Straatsma, T. P. J. Phys. Chem. 1987, 91,
6269-6271.

(150) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. J.
Chem. Phys. 1983, 79, 926-935.

(151) Mahoney, M. W.; Jorgensen, W. L. J. Chem. Phys. 2000, 112, 8910-8922.

(152) Allen, M. P.; Tildesley, D. J. Computer simulation of liquids; Clarendon Press;
Oxford University Press: Oxford [England] New York, 1987.

(153) Ewald, P. P. Annalen Der Physik 1921, 64, 253-287.

(154) Darden, T.; York, D.; Pedersen, L. J. Chem. Phys. 1993, 98, 10089-10092.

(155) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Chem. Phys. Lett. 1995, 246, 122-
129.

(156) Kirkwood, J. G. J. Chem. Phys. 1935, 3, 300-313.

(157) Straatsma, T. P.; Mccammon, J. A. J. Chem. Phys. 1991, 95, 1175-1188.

(158) Zwanzig, R. W. J. Chem. Phys. 1954, 22, 1420-1426.

(159) Bennett, C. H. J. Comput. Phys. 1976, 22, 245-268.

(160) Shirts, M. R.; Chodera, J. D. J. Chem. Phys. 2008, 129, 124105.

(161) Jorgensen, W. L.; Ravimohan, C. J. Chem. Phys. 1985, 83, 3050-3054.

(162) Hansmann, U. H. E.; Okamoto, Y. Nucl. Phys. B 1995, 914-916.

(163) Wang, F. G.; Landau, D. P. Phys. Rev. E2001, 64, 056101.

(164) Wang, F. G.; Landau, D. P. Phys. Rev. Lett. 2001, 86, 2050-2053.

(165) Falcioni, M.; Deem, M. W. J. Chem. Phys. 1999, 110, 1754-1766.

(166) Kofke, D. A. J. Chem. Phys. 2002, 117, 6911-6914.


214









(167) Liu, P.; Kim, B.; Friesner, R. A.; Berne, B. J. Proc. Natl. Acad. Sci. U. S. A. 2005,
102, 13749-13754.

(168) Li, H. Z.; Li, G. H.; Berg, B. A.; Yang, W. J. Chem. Phys. 2006, 125, 144902.

(169) Okur, A.; Roe, D. R.; Cui, G. L.; Hornak, V.; Simmerling, C. J. Chem. Theory
Comput. 2007, 3, 557-568.

(170) Roitberg, A. E.; Okur, A.; Simmerling, C. J. Phys. Chem. B 2007, 111, 2415-
2418.

(171) Rathore, N.; Chopra, M.; de Pablo, J. J. J. Chem. Phys. 2005, 122, 024111.

(172) Sanbonmatsu, K. Y.; Garcia, A. E. Proteins: Struct., Funct., Genet. 2002, 46,
225-234.

(173) Kone, A.; Kofke, D. A. J. Chem. Phys. 2005, 122, 206101.

(174) Trebst, S.; Troyer, M.; Hansmann, U. H. E. J. Chem. Phys. 2006, 124, 174903.

(175) Nadler, W.; Hansmann, U. H. E. Phys. Rev. E2007, 76, 065701.

(176) Nadler, W.; Hansmann, U. H. E. Phys. Rev. E2007, 75, 026109.

(177) Nadler, W.; Hansmann, U. H. E. J. Phys. Chem. B 2008, 112, 10386-10387.

(178) Opps, S. B.; Schofield, J. Phys. Rev. E2001, 6305, 056701.

(179) Zhang, W.; Wu, C.; Duan, Y. J. Chem. Phys. 2005, 123, 154105.

(180) Sindhikara, D.; Meng, Y. L.; Roitberg, A. E. J. Chem. Phys. 2008, 128, 024103.

(181) Abraham, M. J.; Gready, J. E. J. Chem. Theory Comput. 2008, 4, 1119-1128.

(182) Zhang, C.; Ma, J. P. J. Chem. Phys. 2008, 129, 134112.

(183) Rosta, E.; Buchete, N. V.; Hummer, G. J. Chem. Theory Comput. 2009, 5, 1393-
1399.

(184) Zhou, R. H.; Berne, B. J.; Germain, R. Proc. Natl. Acad. Sci. U. S. A. 2001, 98,
14931-14936.

(185) Lyman, E.; Ytreberg, F. M.; Zuckerman, D. M. Phys. Rev. Lett. 2006, 96, 028105.

(186) Liu, P.; Shi, Q.; Lyman, E.; Voth, G. A. J. Chem. Phys. 2008, 129, 114103.


215









(187) Liu, P.; Voth, G. A. J. Chem. Phys. 2007, 126, 045106.

(188) Okur, A.; Wickstrom, L.; Layten, M.; Geney, R.; Song, K.; Hornak, V.;
Simmerling, C. J. Chem. Theory Comput. 2006, 2, 420-433.

(189) Ballard, A. J.; Jarzynski, C. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 12224-
12229.

(190) Kamberaj, H.; van derVaart, A. J. Chem. Phys. 2009, 130, 074906.

(191) Nguyen, P. H. J. Chem. Phys. 2010, 132, 144109.

(192) Sugita, Y.; Okamoto, Y. Chem. Phys. Lett. 2000, 329, 261-270.

(193) Mitsutake, A.; Okamoto, Y. Chem. Phys. Lett. 2000, 332, 131-138.

(194) Mitsutake, A.; Okamoto, Y. J. Chem. Phys. 2004, 121, 2491-2504.

(195) Andrec, M.; Felts, A. K.; Gallicchio, E.; Levy, R. M. Proc. Natl. Acad. Sci. U. S. A.
2005, 102, 6801-6806.

(196) van der Spoel, D.; Seibert, M. M. Phys. Rev. Lett. 2006, 96, 238102.

(197) Yang, S. C.; Onuchic, J. N.; Garcia, A. E.; Levine, H. J. Mol. Biol. 2007, 372, 756-
763.

(198) Buchete, N. V.; Hummer, G. Phys. Rev. E2008, 77, 030902.

(199) Case, D. A.; Darden, T. A.; T.E. Cheatham, I.; Simmerling, C. L.; Wang, J.; Duke,
R. E.; Luo, R.; Crowley, M.; Walker, R. C.; Zhang, W.; Merz, K. M.; B.Wang;
Hayik, S.; Roitberg, A.; Seabra, G.; Kolossvary, I.; K.F.Wong; Paesani, F.;
Vanicek, J.; X.Wu; Brozell, S. R.; Steinbrecher, T.; Gohlke, H.; Yang, L.; Tan, C.;
Mongan, J.; Hornak, V.; Cui, G.; Mathews, D. H.; Seetin, M. G.; Sagui, C.; Babin,
V.; Kollman, P. A.; University of California, San Francisco: San Francisco, 2008.

(200) Onufriev, A.; Bashford, D.; Case, D. A. J. Phys. Chem. B 2000, 104, 3712-3720.

(201) Elber, R.; Roitberg, A.; Simmerling, C.; Goldstein, R.; Li, H. Y.; Verkhivker, G.;
Keasar, C.; Zhang, J.; Ulitsky, A. Comput. Phys. Commun. 1995, 91, 159-189.

(202) Dill, K. A.; Ozkan, S. B.; Shell, M. S.; Weikl, T. R. Annu. Rev. Biophys. 2008, 37,
289-316.

(203) Dobson, C. M. Nature 2003, 426, 884-890.


216









(204) Anfinsen, C. B.; Haber, E.; Sela, M.; White, F. H. Proc. Natl. Acad. Sci. U. S. A.
1961, 47, 1309-1314.

(205) Mayor, U.; Johnson, C. M.; Daggett, V.; Fersht, A. R. Proc. Natl. Acad. Sci. U. S.
A. 2000, 97, 13518-13522.

(206) Snow, C. D.; Nguyen, N.; Pande, V. S.; Gruebele, M. Nature 2002, 420, 102-106.

(207) Brooks, C. L. Acc. Chem. Res. 2002, 35, 447-454.

(208) Levinthal, C. J. Chim. Phys. Phys.-Chim. Biol. 1968, 65, 44-45.

(209) Gruebele, M. Annu. Rev. Phys. Chem. 1999, 50, 485-516.

(210) Kubelka, J.; Hofrichter, J.; Eaton, W. A. Curr. Opin. Struct. Biol. 2004, 14, 76-88.

(211) Snow, C. D.; Sorin, E. J.; Rhee, Y. M.; Pande, V. S. Annu. Rev. Biophys. Biomol.
Struct. 2005, 34, 43-69.

(212) Snow, C. D.; Qiu, L. L.; Du, D. G.; Gai, F.; Hagen, S. J.; Pande, V. S. Proc. Natl.
Acad. Sci. U. S. A. 2004, 101, 4077-4082.

(213) Zagrovic, B.; Sorin, E. J.; Pande, V. J. Mol. Biol. 2001, 313, 151-169.

(214) Jayachandran, G.; Vishal, V.; Pande, V. S. J. Chem. Phys. 2006, 124, 054118.

(215) Singhal, N.; Snow, C. D.; Pande, V. S. J. Chem. Phys. 2004, 121, 415-425.

(216) Swope, W. C.; Pitera, J. W.; Suits, F. J. Phys. Chem. B 2004, 108, 6571-6581.

(217) Swope, W. C.; Pitera, J. W.; Suits, F.; Pitman, M.; Eleftheriou, M.; Fitch, B. G.;
Germain, R. S.; Rayshubski, A.; Ward, T. J. C.; Zhestkov, Y.; Zhou, R. J. Phys.
Chem. B 2004, 108, 6582-6594.

(218) Daggett, V.; Levitt, M. J. Mol. Biol. 1993, 232, 600-619.

(219) Daggett, V.; Levitt, M. J. Cell. Biochem. 1993, 223-223.

(220) Daggett, V.; Levitt, M. Curr. Opin. Struct. Biol. 1994, 4, 291-295.

(221) Dadlez, M.; Bierzynski, A.; Godzik, A.; Sobocinska, M.; Kupryszewski, G.
Biophys. Chem. 1988, 31, 175-181.

(222) Baldwin, R. L. Biophys. Chem. 1995, 55, 127-135.

(223) Brown, J. E.; Klee, W. A. Biochemistry 1971, 10, 470-476.


217









(224) Fairman, R.; Shoemaker, K. R.; York, E. J.; Stewart, J. M.; Baldwin, R. L.
Biophys. Chem. 1990, 37, 107-119.

(225) Osterhout, J. J.; Baldwin, R. L.; York, E. J.; Stewart, J. M.; Dyson, H. J.; Wright,
P. E. Biochemistry 1989, 28, 7059-7064.

(226) Shoemaker, K. R.; Fairman, R.; Schultz, D. A.; Robertson, A. D.; York, E. J.;
Stewart, J. M.; Baldwin, R. L. Biopolymers 1990, 29, 1-11.

(227) Felts, A. K.; Harano, Y.; Gallicchio, E.; Levy, R. M. Proteins: Struct., Funct.,
Bioinf 2004, 56, 310-321.

(228) Hansmann, U. H. E.; Okamoto, Y. J. Phys. Chem. B 1998, 102, 653-656.

(229) Hansmann, U. H. E.; Okamoto, Y. J. Phys. Chem. B 1999, 103, 1595-1604.

(230) La Penna, G.; Mitsutake, A.; Masuya, M.; Okamoto, Y. Chem. Phys. Lett. 2003,
380, 609-619.

(231) Ohkubo, Y. Z.; Brooks, C. L. Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 13916-
13921.

(232) Schaefer, M.; Bartels, C.; Karplus, M. J. Mol. Biol. 1998, 284, 835-848.

(233) Sugita, Y.; Okamoto, Y. Biophys. J. 2005, 88, 3180-3190.

(234) Yoda, T.; Sugita, Y.; Okamoto, Y. Chem. Phys. 2004, 307, 269-283.

(235) Yoda, T.; Sugita, Y.; Okamoto, Y. Chem. Phys. Lett. 2004, 386, 460-467.

(236) Kabsch, W.; Sander, C. Biopolymers 1983, 22, 2577-2637.

(237) Johnson, W. C. Annu. Rev. Biophys. Biophys. Chem. 1988, 17, 145-166.

(238) Sreerama, N.; Woody, R. W. Methods Enzymol. 2004, 383, 318-351.

(239) Gratzer, W. B.; Doty, P.; Holzwarth, G. M. Proc. Natl. Acad. Sci. U. S. A. 1961,
47, 1785-1791.

(240) Manning, M. C.; Illangasekare, M.; Woody, R. W. Biophys. Chem. 1988, 31, 77-
86.

(241) Bayley, P. M.; Nielsen, E. B.; Schellma.Ja J. Phys. Chem. 1969, 73, 228-243.

(242) Clark, L. B. J. Am. Chem. Soc. 1995, 117, 7974-7986.


218









(243) Hirst, J. D. J. Chem. Phys. 1998, 109, 782-788.


(244) Woody, R. W.; Sreerama, N. J. Chem. Phys. 1999, 111, 2844-2845.

(245) Goux, W. J.; Hooker, T. M. J. Am. Chem. Soc. 1980, 102, 7080-7087.

(246) Ridley, J.; Zerner, M. Theor. Chim. Acta 1973, 32, 111-134.

(247) Wlodawer, A.; Svensson, L. A.; Sjolin, L.; Gilliland, G. L. Biochemistry 1988, 27,
2705-2717.

(248) Blake, C. C. F.; Koenig, D. F.; Mair, G. A.; North, A. C. T.; Phillips, D. C.; Sarma,
V. R. Nature 1965, 206, 757-761.

(249) Vocadlo, D. J.; Davies, G. J.; Laine, R.; Withers, S. G. Nature 2001, 412, 835-
838.

(250) Nielsen, J. E.; McCammon, J. A. Protein Sci. 2003, 12, 313-326.

(251) Bartik, K.; Redfield, C.; Dobson, C. M. Biophys. J. 1994, 66, 1180-1184.

(252) Tironi, I. G.; Sperb, R.; Smith, P. E.; Vangunsteren, W. F. J. Chem. Phys. 1995,
102, 5451-5459.

(253) Case, D. A.; Darden, T. A.; T.E. Cheatham, I.; Simmerling, C. L.; Wang, J.; Duke,
R. E.; R.Luo; Merz, K. M.; Pearlman, D. A.; Crowley, M.; Walker, R. C.; Zhang,
W.; Wang, B.; S.Hayik; Roitberg, A.; Seabra, G.; Wong, K. F.; Paesani, F.; Wu,
X.; Brozell, S.; Tsui, V.; H.Gohlke; Yang, L.; Tan, C.; Mongan, J.; Hornak, V.; Cui,
G.; Beroza, P.; Mathew, D. H.; C.Schafmeister; Ross, W. S.; Kollman, P. A.;
University of California, San Francisco: San Francisco, 2006.

(254) Frisch, M. J. T., G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.;
Cheeseman, J. R.; Montgomery, Jr., J. A.; Vreven, T.; Kudin, K. N.; Burant, J. C.;
Millam, J. M.; lyengar, S. S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.;
Scalmani, G.; Rega, N.; Petersson, G. A.; Nakatsuji, H.; Hada, M.; Ehara, M.;
Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao,
O.; Nakai, H.; Klene, M.; Li, X.; Knox, J. E.; Hratchian, H. P.; Cross, J. B.;
Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev,
O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Ayala, P. Y.;
Morokuma, K.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Zakrzewski, V. G.;
Dapprich, S.; Daniels, A. D.; Strain, M. C.; Farkas, O.; Malick, D. K.; Rabuck, A.
D.; Raghavachari, K.; Foresman, J. B.; Ortiz, J. V.; Cui, Q.; Baboul, A. G.;
Clifford, S.; Cioslowski, J.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.;
Komaromi, I.; Martin, R. L.; Fox, D. J.; Keith, T.; AI-Laham, M. A.; Peng, C. Y.;
Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong,
M. W.; Gonzalez, C.; and Pople, J. A.; Gaussian, Inc.: Wallingford CT, 2004.


219









(255) Ditchfie.R Mol. Phys. 1974, 27, 789-807.

(256) He, X.; Wang, B.; Merz, K. M. J. Phys. Chem. B 2009, 113, 10380-10388.

(257) Anandakrishnan, R.; Onufriev, A. J. Comput. Biol. 2008, 15, 165-184.

(258) Gordon, J. C.; Myers, J. B.; Folta, T.; Shoja, V.; Heath, L. S.; Onufriev, A. Nucleic
Acids Res. 2005, 33, 368-371.


220









BIOGRAPHICAL SKETCH

Yilin Meng was born in Jilin, Jilin Province, People's Republic of China. He went to

the Dalian University of Technology at Dalian, Liaoning Province and studied chemical

engineering. He graduated with a bachelor's degree in engineering in 2004. During his

college, Yilin has developed an interest in the computational chemistry, especially the

electronic structure theory and has worked in Dr. Ce Hao' group for a year.

In August 2004, Yilin came to the University of Florida and began his life as a

graduate student. His original plan was to keep studying the electronic structure theory.

However, he was impressed by the research of Dr. Adrian E. Roitberg. Later, he joined

the Roitberg group and started his career in the molecular modeling.


221





PAGE 1

1 CONSTANT pH REPLICA EXCHANGE MOLECULAR DYNAMICS STUDY OF PROTEIN STRUCTURE AND DYNAMICS By YILIN MENG A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2010

PAGE 2

2 2010 Yilin Meng

PAGE 3

3 To my family

PAGE 4

4 ACKNOWLEDGMENTS At the completion of my graduate study at the University of Florida, I would like to take great pleasure in acknowledging th e people who have supported me over these years. I primarily thank my advisor, Professor Adrian E. Roitberg. Throughout the years wor king in his group, I have learned a tremendous amount from him. His guidance and encouragement supported me to overcome the obstacles not only in research but also in my personal life. There is no way I would have achieve d my goal without his support and help. I am thank ful for the support and guidance of my committee members, Professor s Kenneth M. Merz Jr., Nicolas C. Polfer Ste ph en J. Hagen and Arthur S. Edison. I also would like to thank Professor s So Hirata, Joanna R. Long, Carlos L. Simmerling and Wei Yang for their guidance in my research. I am very grateful for the assistance and helpful discussions from my colleagues in the Roitberg group, especially Dr. Daniel Sindhikara, Dr. Gustavo Seabra, Dr. Lena Dolghih Dr. Seonah Kim, Jason Swails Danial Dashti, Billy Miller, Dwight McGee, and Sung Cho I appreciate all my friend s at the Quantum Theory Project the Department of Chemistry and Physics. I thank the source of funding that supported my graduate study My research was supported by National Institute of Health under Contract 1R01 AI073674. Computer resources and support were provided by the Large Allocations Resourc e Committee through grant TG MCA05S010 and the University of Florida High Performance Computing Center.

PAGE 5

5 I want to acknowledge my wife Xian who encouraged me and supported me to complete this work. Finally, I am very grateful for my whole family for their love and encouragement.

PAGE 6

6 TABLE OF CONTENTS page ACKNOWLEDGMENTS ................................ ................................ ................................ .. 4 LIST OF TABLES ................................ ................................ ................................ ............ 9 LIST OF F IGURES ................................ ................................ ................................ ........ 10 LIST OF ABBREVIATIONS ................................ ................................ ........................... 17 ABSTRACT ................................ ................................ ................................ ................... 19 CHAPTER 1 INTRODUCTION ................................ ................................ ................................ .... 21 1.1 Acid Base Equilibrium ................................ ................................ ....................... 21 1.2 Amino Acids and Proteins ................................ ................................ ................. 22 1.3 Ionizable Residues in Proteins and the Effect of pH on Proteins ...................... 25 1.4 Measuring p K a Values of Ionizable Residues ................................ ................... 29 1.5 Molecul ar Modeling ................................ ................................ ........................... 38 1.6 Potential Energy Surface ................................ ................................ .................. 39 1.7 Molecular Dynamics, Monte Carlo Methods and Ergodicity .............................. 41 1.8 Theoretical Protein Titration Curves and p K a Calculations Using Poisson Boltzmann Equation ................................ ................................ ............................ 44 1.9 Computing p K a Values by Free Energy Calculati ons ................................ ........ 48 1.10 p K a Prediction Using Empirical Methods ................................ ......................... 53 1.11 Constant pH Molecular Dynamics (Constant pH MD) Methods ...................... 53 2 THEORY AND METHODS IN MOLECULAR MODELING ................................ ...... 59 2.1 Potential Energy Functions and Classical Force Fields ................................ .... 59 2.1.1 Potential Energy Surface ................................ ................................ ......... 59 2.1.2 Force Field Models ................................ ................................ .................. 60 2.1.3 Protein Force Field Models ................................ ................................ ...... 63 2.2 Molecular Dynamics (MD) Method ................................ ................................ .... 64 2.2.1 MD Integrator ................................ ................................ .......................... 64 2. 2.2 Thermostats in MD Simulations ................................ ............................... 65 2.2.3 Pressure Control in MD Simulations ................................ ........................ 68 2.3 Monte Carlo (MC) Method ................................ ................................ ................ 70 2.3.1 Canonical Ensemble and Configuration Integral ................................ ..... 70 2.3.2 Markov Chain Monte Carlo (MCMC) ................................ ....................... 71 2.3.3 The Metropolis Monte Carlo Method ................................ ....................... 73 2.3.4 Ergodicity and the Ergodic Hypothesis ................................ .................... 74 2.4 Solvent Models ................................ ................................ ................................ 74 2.4.1 Explicit Solvent Model ................................ ................................ ............. 75

PAGE 7

7 2.4.2 The Poisson Boltzmann (PB) Implicit Solvent Model ............................... 77 2.4.3 The Generalized Born (GB) Implicit Solvent Model ................................ 79 2.5 p K a Calculation Methods ................................ ................................ ................... 80 2.5.1 The Co ntinuum Electrostatic (CE) Model ................................ ................ 80 2.5.2 Free Energy Calculation Methods ................................ ........................... 82 2.5.3 Constant pH MD Methods ................................ ................................ ....... 87 2.6 Advanced Sampling Methods ................................ ................................ ........... 94 2.6.1 The Multicanonical Algorithm (MUCA) ................................ ..................... 95 2.6.2 Parallel Tempering ................................ ................................ .................. 96 2.7 Replica Exchange Molecular Dynamics (REMD) Methods ............................... 97 2.7.1 Temperature REMD (T REMD) ................................ ............................... 99 2.7.2 Hamiltonian REMD (H REMD) ................................ .............................. 105 2.7.3 Technical Details in REMD Simulations ................................ ................ 105 3 CONSTANT pH REMD: METHOD AND IMPLEMENTATION .............................. 114 3.1 Introduction ................................ ................................ ................................ ..... 114 3.2 Theory and Methods ................................ ................................ ....................... 114 3.2.1 Constant pH REMD Algorithm in AMBER Simulation Suite .................. 114 3.2.2 Simulation Details ................................ ................................ .................. 118 3.2.3 Global Conformational Sampling Comparison Using Cluster Analysis .. 120 3.2.4 Local Conformational Sampling and Convergence to Final State ......... 122 3.3 Results and Discussion ................................ ................................ ................... 122 3.3.1 Reference Compounds ................................ ................................ .......... 122 3.3.2 Model peptide ADFDA ................................ ................................ ........... 124 3.3.3 Heptapeptide derived from OMTKY3 ................................ ..................... 128 3.4 Conclusions ................................ ................................ ................................ .... 136 4 CONSTANT pH REMD : STRUCTURE AND DYNAMICS OF THE C PEPTIDE OF RIBONUCLEASE A ................................ ................................ ........................ 137 4.1 Introduction ................................ ................................ ................................ ..... 137 4.2 Methods ................................ ................................ ................................ .......... 143 4.2.1 Simulation Details ................................ ................................ .................. 143 4.2.2 Cluster Analysis ................................ ................................ ..................... 144 4.2.3 Definition of the Secondary S tructure of Proteins (DSSP) Analysis ...... 145 4.2.4 Computation of the Mean Residue Ellipticity ................................ ......... 145 4.3 Results and Discussion ................................ ................................ ................... 150 4.3.1 Testing Structural Convergence ................................ ............................ 150 4.3.2 p K a Calculation and Convergence ................................ ......................... 151 4.3.3 The Mean Residue Ellipticity of the C peptide ................................ ....... 151 4.3.4 Helical Structures in the C peptide ................................ ........................ 153 4.3.5 The Two Dimensional Probability Densities ................................ .......... 157 4.3.6 Important Electrostatic Interactions: Lys1 Glu9 and Glu2 Arg10 ........... 160 4.3.7 Import ant Electrostatic Interactions: Phe8 His12 ................................ ... 164 4.3.8 Cluster Analysis Results ................................ ................................ ........ 167 4.4 Conclusions ................................ ................................ ................................ .... 168

PAGE 8

8 5 CONSTANT pH REMD: p K a CALCULATIONS OF HEN EGG WHITE LYSOZYME ................................ ................................ ................................ .......... 170 5.1 Introduction ................................ ................................ ................................ ..... 170 5.2 Simulation Details ................................ ................................ ........................... 174 5.3 Protein Conformational and Protonation State Equilibrium Model .................. 176 5.4 NMR Chemical Shift Calculation s ................................ ................................ ... 177 5.5 Results and Discussions ................................ ................................ ................. 178 5.5.1 Structural Stability and p K a Convergence ................................ .............. 178 5.5.2 p K a Predictions ................................ ................................ ...................... 182 5.5.3 Constant pH REMD Simulations with a Weaker Restraint .................... 184 5.5.4 Active S ite Ionizable Residue p K a Prediction: Asp52 ............................ 187 5.5.5 Active Site Ionizable Residue p K a Prediction: Glu35 ............................. 189 5.5.6 Correlation between Conformation and Protonation .............................. 193 5.5.7 Conformation Protonation Equilibrium Model ................................ ........ 197 5.5.8 Theoretical NMR Titration Cu rves ................................ ......................... 201 5.6 Conclusions ................................ ................................ ................................ .... 203 LIST OF REFERENCES ................................ ................................ ............................. 206 BIOGRAPHICAL SKETCH ................................ ................................ .......................... 221

PAGE 9

9 LIST OF TABLES Table page 1 1 Intrinsic p K a values of ionizable residues in proteins. 26 ................................ ...... 29 3 1 The REMD p K a predictions of reference compounds. ................................ ...... 123 3 2 p K a ........................... 125 3 3 Correlation coefficients between MD and REMD cluster populations. .............. 128 4 1 Correlation coefficients between two sets of cluster popu lations. ..................... 151 5 1 Simulation details of constant pH REMD runs ................................ .................. 175 5 2 Predicted p K a values and their RMS errors rel ative to experimental measurements from the restrained REMD simulations. ................................ ... 183 5 3 Predicted p K a values and their RMS errors relative to experimental measurements from weakly restrained R EMD simulations. ............................. 185 5 4 Distance between Glu35 carboxylic oxygen atoms and neighboring residue side chain atoms in 1AKI crystal structure. ................................ ....................... 190

PAGE 10

10 LIST O F FIGURES Figure page 1 1 A) Structure of an amino acid named alanine. An amino group ( NH2), a carboxylic acid group ( COOH), a side chain ( R, in this case, a methyl group) and a hydrogen atom are bonded to a central carbon atom (C ). B) Dihedral angles and of alanine dipeptide. ................................ .................... 23 1 2 A Ramachandran plot (a contour plot showing the probability densi ty of ( ) pairs) of tyrosine generated from the simulation of a heptapeptide which will be described later in chapter 3. In this figure, a left handed helix is also shown. ................................ ................................ ................................ ................ 25 1 3 A diagram showing the cartoon representation of an enzyme at low pH (acidic) and at around the optimal pH value. EH indicates the structure at low pH and E stands for the zwitterion form, which is the active species in our model. 13 ................................ ................................ ................................ .............. 26 1 4 The reaction schemes showing the enzyme reactions at which pH values are smaller than the optimal pH value. K s K K 1 and K 2 are equilibrium constants of corresponding reactions and k cat is the rate constant of the rate determining step. This model can be used to explain how pH value affects enzyme catalysis in the pH range that is larger than optimal pH. 13,14 ................. 27 1 5 A) An exampl the titration described in Figure 1 5A. The two plots are generated from constant pH MD simulations of an aspartic acid in a pentapeptide. ................... 30 1 6 13 C NMR titration curves of aspartate residues in HIV 1 protease/KNI 272 complex taken from Wang et al .,1996. 27 In this figure, Asp C chemical shifts are plotted as a function of pD. Asp25 and Asp125 do not change protonation states in this pD range. But isotope shift experiments show that with permission from Wang, Y. X.; Freedberg, D. I.; Yamazaki, T.; Wingfield, P. T.; Stahl, S. J.; Kaufman, J. D.; K iso, Y.; Torchia, D. A. Biochemistry 1996 35 9945 9950 ........................... 32 1 7 Thermodynamic cycle used to compute p K a shift. Both acid dissociation reacti ons occur in aqueous solution. A thermodynamic cycle is a series of thermodynamic processes that eventually returning to the initial state. A state function, such as reaction free energy in this case, is path independent and hence, unchanged through a cycl ic process. ................................ ...................... 49 1 8 7 and Figure 1 8, protein AH represents the ionizable residue in protein environment. AH represents the reference compound

PAGE 11

11 which is usually the ionizable residue with two termini capped. In practice, a proton does not disappear but instead becomes a dummy atom. The proton has its position and velocity. The bonded interactions involving the proton are still effective. However, there is no non bonded interact ion for that proton. The change in protonation state is reflected by changes of partial charges in the ionizable residue. ................................ ................................ .......................... 50 2 1 A diagram showing bond stretching coupled with angle be nding. A cross term calculating coupling energy is adopted when evaluating the total potential energy. ................................ ................................ ................................ 62 2 2 A diagrammatic description of TIP3P and TIP4P water models. A) TIP3P model The red circle is oxygen atom and the black circles are the hydrogen atoms. Experimental bond length and bond angle are adopted. B) TIP4P model. Oxygen and hydrogen atoms are labeled with same color as in the TIP3P model. TIP4P model also employs the exp erimental OH bond length and HOH bond angle. Clearly, the fourth site (green circle) which carries negative partial charge has been added to the TIP4P model. ............................ 77 3 1 Methods to perform exchange attempts. A) Only molecular structures are attempted to exchange. The protonation states are kept the same. B) Both molecular structures and protonation states are attempted to exchange. ........ 115 3 2 Titration curves of blocked aspartate amino acid from 100 ns MD at 300K and REMD runs. Agreement can be seen between MD and REMD simulations. ................................ ................................ ................................ ....... 123 3 3 Cumulative ave rage protonation fraction of aspartic acid reference compound vs Monte Carlo (MC) steps at pH=4. ................................ ................................ 124 3 4 The titration curves of the model peptide ADFDA at 300K from both MD and REMD s imulations. MD simulation time was 100 ns and 10 ns were chosen for each replica for REMD runs. ................................ ................................ ....... 125 3 5 Cumulative average protonation fraction of Asp2 in model peptide ADFDA vs Monte Carlo (MC) steps at pH=4. ................................ ................................ ..... 126 3 6 (Ramachandran plots) for Asp2 at pH 4 in ADFDA. Ramachandran plots at other solution pH values are similar. For Asp2, constant pH MD and REMD sampled the same local backbone conforma tional space. Phe3 and Asp4 Ramachandran plots also display the same trend. ................................ ........... 127 3 7 Cluster populations of ADFDA at 300K. A) MD vs REMD at pH 4. Trajectories from MD and REMD simulatio ns are combined first. By clustering the combined trajectory, the MD and REMD structural ensembles will populate the same clusters. The fraction of the conformational ensemble corresponding to each cluster (fractional population of each cluster) was

PAGE 12

12 calcul ated for MD and REMD simulation, respectively. Two sets of fractional population of clusters were generated, and hence plotted against each other. B) Two REMD runs from different starting structures at pH 4. Large correlation shown in Figure 3 7B suggests that the REMD runs are converged. Large correlations between two independent REMD runs are also observed at other solution pH values. Correlations between MD and REMD simulations can be found in Table 3 3. ................................ ............................. 128 3 8 A) Titration curves of Asp3 in the heptapeptide derived from protein OMTKY3. B) Titration curves of Lys5 and Tyr7 in the heptapeptide derived K a values of Asp3 are f ................................ ................................ .. 129 3 9 A) Cumulative average protonation fraction of Asp3 of the heptapeptide derived OMTKY3 vs MC steps. B) and C) is cumulative average protonation fracti on of Tyr7 and Lys5 in the heptapeptide vs MC steps, respectively. Clearly, faster convergence is achieved in contant pH REMD simulations. ..... 131 3 10 pH MD results. B) Constant pH REMD results. The two probability densities are almost identical, indicating that constant pH MD and REMD sample the same local conformation al space. All others also show very similar trend. ................ 133 3 11 The root mean robability density behaviors at other pH values also show that REMD runs converge to final distribution faster. ................................ ................................ ............................. 134 3 12 Cluster population at 300 K from constant pH MD and REMD simulations at pH=4. Cluster analysis is performed using the entire simulation. The populations in each cluster from the first and second half of the trajectory are compared and plotted. Ideal ly, a converged trajectory should yield a correlation coefficient to be 1. A) Constant pH MD. B) Constant pH REMD. Much higher correlation coefficient can be seen in constant pH REMD simulation, suggesting much better convergence is achieved by the constan t pH REMD run. ................................ ................................ ................................ .. 135 4 1 Cluster population at 300 K from constant pH REMD simulations at pH 2. A) Cluster analysis is performed on the trajectory initiated from fully extended structure The populations in each cluster from the first and second half of the trajectory are compared and plotted. B) Two REMD runs from different starting structures at pH 2. Correlation coefficients at other pH values can be found in Table 4 1. ................................ ................................ ............................ 150 4 2 Cumulative average fraction of protonation vs Monte Carlo (MC) steps. Only the two glutamate residues are shown here and the histidine residue is found

PAGE 13

13 to show the same trend. The pH values a re selected such that the overall average fraction of protonation is close to 0.5. ................................ ................. 152 4 3 Computed the mean residue ellipticity at 222 nm as a function of pH values. A bell shaped cur ve at 300 K is obtained with a maximum at pH 5. The effect of temperature on mean residue ellipticity at 222 nm is also demonstrated. .... 153 4 4 Helical Content as a function of residue n umber. ................................ ............. 154 4 5 A) Time series of C RMSDs vs the fully helical structure at pH 5. The first two residues at each end are not selected because the ends are very flexible. B) Probability densities of the C RMSDs. Clearly, the structural ensemble at pH 5 contains more structures similar to t he fully helical structure. C) Time series of C radius of gyration at pH 5. D) Probability density of the C radius of gyration. More compact structures are found at pH 5. ................................ ................................ ................................ ...................... 155 4 6 A) Probability densities of number of helical residues in the C peptide. B) Probability densities of the number of helical segments in the C peptide. A helical segment contains continuous helical residues. The probability of forming the second helical segment is very low at all three pH values, thus only the first helical segment is further studied. C) Probability densities of the starting position of a helical segment. D) Probability densities of the length of a helical segment (number of residues in a helica l segment). .......................... 156 4 7 2D probability density of helical starting position and helical length, pH = 2. .... 158 4 8 2D p robability density of helical starting position and helical length, pH=5. ...... 158 4 9 2D probability density of helical starting position and helical length, pH=8. ...... 159 4 10 2D probability density of helical length and C RMSD at pH = 2. ..................... 159 4 11 2D probability density of helical length and C RMSD a t pH = 5. ..................... 160 4 12 2D probability density of helical length and C RMSD at pH = 8. ..................... 160 4 13 A) Probability densi ty of Lys1 Glu9 distance (). The distance is the minimum distance between the side chain nitrogen atom of Lys1 and the side chain carboxylic oxygen atoms of Glu9. B) Probability density of Glu2 Arg10 distance (). The distance is the minimum distance betw een side chain carboxylic oxygen atoms of Glu2 and guanidinium nitrogen atoms of Arg10. ................................ ................................ ................................ ............... 162 4 14 Two dimensional probability density of Lys1 Glu9 and Glu2 Arg10 at pH 5. Apparently, Ly s1 Glu9 and Glu2 Arg10 salt bridges cannot be formed simultaneously. ................................ ................................ ................................ 162

PAGE 14

14 4 15 A) Two dimensional probability density of Glu2 Arg10 salt bridge formation and helical length at pH 5. Acc ording to the plot, the Glu2 Arg10 salt bridge can be found in four residue, six residue and non helical structures. B) Two dimensional probability density of Glu2 Arg10 salt bridge and the helix starting position at pH 5. If a helix begins from Thr3, it c annot have a Glu2 Arg10 salt bridge. Thus, one role of the Glu2 Arg10 salt bridge is to prevent helix formation from Thr3. ................................ ................................ ................. 163 4 16 A) Probability density of Phe8 backbone to His12 rin g distance. The distance is the minimum distance between Phe8 backbone carbonyl oxygen atom and His12 imidazole nitrogen atoms. B) Probability density of Phe8 ring to His12 ring distance. The distance is the minimum distance between Phe8 aromatic ring carb on atoms and His12 imidazole nitrogen atoms. ................................ .. 164 4 17 A) Two dimensional probability density of Glu2 Arg10 distance and Phe8 His12 backbone to ring distance at pH 5. B) Correlations be tween Glu2 Arg10 salt bridge and Phe8 His12 contact at pH 5. ................................ .......... 166 4 18 A) Two dimensional probability density of helical segment length and Phe8 His12 interaction. B) Two dimensional pr obability density of helical segment starting position and Phe8 His12 interaction. Phe8 His12 also stabilizes four residue and six residue structures. Helices begin at Lys7 and Phe8 His12 is coupled. Unlike Glu2 Arg10, Phe8 His12 stabilizes helices startin g from Thr3. ................................ ................................ ................................ ................. 167 4 19 A) Top 20 populated clusters and average helical percentage. B) Probability densities of the C RMSD vs the fully helical structure of the top 2 populated clusters. C) Helical Percentage as a function of residue number of the top 2 populated clusters. D) Probability density of the Glu2 Arg10 and Phe8 backbone His12 ring interactions in the second m ost populated cluster. ......... 169 5 1 Crystal structure of HEWL (PDB code 1AKI). Residues in red represent aspartate and residues in blue are glutamate. ................................ .................. 171 5 2 A simple schematic view of the conformation protonation equilibrium in a constant pH simulation. ................................ ................................ .................... 176 5 3 C RMSD vs crustal structure (PDB code: 1AKI). A) C RMSD vs 1AKI from REMD without restraint on C B) C RMSD vs 1AKI from REMD with restraint on C The restraint strength is 1 kcal/molA 2 ................................ .... 179 5 4 p K a prediction error as a function of time. The predicted p K a at a given time is a cumulative result. For each ionizable residue, the time series of its p K a error is generated at a pH where the average predicted p K a is closest to that pH va lue. In this way, we try to eliminate any bias toward the energetically favored state. A flat line is an indication of convergence. Glu35 is not shown here due to poor convergence. ................................ ................................ ......... 180

PAGE 15

15 5 5 A) p K a prediction convergence to its final value. Similarly, the p K a value at a given time is a cumulative average. A flat line having y value of 0 is expected when p K a calculation convergence is reached. The same pH values are chosen for each ionizab le residue as in Figure 5 4. B) Asp52 p K a prediction convergence to its final value at multiple pH values. The pH values are selected in such a way that the p K a calculated at this pH will be used to compute composite p K a ................................ ................................ ................... 181 5 6 RMS error between predicted and experimental p K a vs pH value. A minimum of p K a RMS error can be found near the pH at which 1AKI crystal structure is resolved. ................................ ................................ ................................ ........... 184 5 7 A) C RMSD of HEWL from weaker restraint REMD simulations. The RMSDs are larger than those with stronger restraints. When comparing RMSDs at different pH for simulations using weaker restraint, RMSDs are greater at pH 3 and 4 than those at pH 4.5. B) p K a predi ction deviation from final value at pH 4.5 from constant pH REMD with 0.1 kcal/mol 2 ................................ ...... 186 5 8 Asp52 in the crystal structure of 1AKI. Its neighbors that having strong electrostatic in teractions are also shown. ................................ ......................... 188 5 9 A) Time series of Asp52 carboxylic oxygen atom OD1 to Asn59 and Asn44 ND2 distances at pH 3 in the 1 kcal/mol 2 constant pH REMD run. B) Time series of Asp52 carboxylic oxygen atom OD2 to Asn59 and Asn44 ND2 distances under the same condition. Hydrogen bonds which are stabilizing deprotonated Asp52 are formed in a large extent even at a low pH. ................ 188 5 10 A) Time series of the Glu35 heavy atoms (excluding two carboxylic oxygen atoms) RMSD relative to crystal structure 1AKI. B) Probability distribution of the RMSD. The conformation centered at RMSD ~0.1 is labeled as conformation 1. The one centered at ~0.6 is named conformation 2. Apparently, an extra conformation (conformation 3) is visited by the weakly restrained REMD simulation. ................................ ................................ ............ 191 5 11 A) Representative Structure of conformation 1. B) Representative Structure of conformation 2. The structure ensemble is generated from REMD simulations with stronger restraining potential. The carboxylic group of Glu35 in conformation 2 is clearly pointing toward the amide group of Ala110. Deprotonated form of Glu35 tends to decrease the electrostatic energy. Furthermore, conformation 1 does not particularly favor the protonated Glu35. No significant stabilizing factor is found for the protonated Glu35. ........ 192 5 12 Representative Structure of conformation 3 from cluster analysis. Glu35 is in the hydrophobic region, consisting of Gln57, Trp108 and Ala110 Conformation 1 and 2 in the weakly restrained simulati ons are basically the same as those demonstrated in Figure 5 11. ................................ ................... 193

PAGE 16

16 5 13 A) Correlation between side chain dihedral angle 1 and protonation states. B) Correlation between side chain dihedral angle 2 and protonation states. ... 194 5 14 Minimal distance between Asp119 side chain carboxylic oxygen atoms (OD1 and OD2) and Arg125 guanidinium nitrogen atoms. Since guanidi nium group has three nitrogen atoms, the minimal distance is the shortest distance between Asp119 OD1 (or OD2) and those three nitrogen atoms. .................... 196 5 15 A) Probability distribution of A sp119 CG to Arg125 CZ distances. The Asp119 CG to Arg125 CZ distance is used to distinguish conformations. B) Coupling between conformations and protonation states. ................................ 197 5 16 K 12 / K 12,h as a function of pH and its dependence on p K a,1 and p K a,2 ............... 199 5 17 A) Fraction of each species as a function of pH (titration curves) obtained from equations based on conformation pro tonation equilibrium. The effect of 12 is tested. B) Comparison of titration curves derived from actual simulations and from the equilibrium equations. ................................ ............... 200 5 18 Theoretical NMR chemical shifts as a function of pH. It conformation protonation equilibrium model can reproduce experimental titration curve based on NMR chemical shift measurements. ........................... 202

PAGE 17

17 LIST OF ABBREVIATION S ACE Analytical Continuum Electrostatic BA R Bennett Acceptance Ratio CD Circular Dichroism CE Continuum Electrostatic CPHMD Continuous Constant pH Molecular Dynamics CPL Circularly Polarized Light DOF Degree of Freedom DOS Density of States DSSP Definition of the Secondary Structure of Proteins EA F Exchange Attempt Frequency EFP Effective Fragment Potential FEP Free Enery Perturbation FDPB Finite Differece Poisson Boltzmann GB Generalized Born HEWL Hen Egg White Lysozyme HH Henderson Hasselbach H REMD Hamiltonian Replica Exchange Molecular Dynamics LCPL Left Circularly Polarized Light MC Monte Carlo MCMC Markov Chain Monte Carlo MCCE Multiconfo rmation Continuum Electrostatic MD Molecular Dynamics MDFE Molecular Dynamics b ased Free Energy (calculation)

PAGE 18

18 MM Molecular Mechanics MUCA Multicanonical NMR N uclear Magnetic Resonance NPT Isothermal isobaric Ensemble NVE Microcanonical Ensemble NVT Canonical Ensemble PB Poisson Boltzmann PBC Periodic Boundary Condition PES Potential Energy Surface PDF Probability Distribution Function PMF Potential of the Mean Force QM Quantum Mechanics QM/MM Hybrid Quantum Mechanical Molecular Mechanical RCPL Right Circularly Polarized Light REM Replica Exchange Method REMD Replica Exchange Molecular Dynamics REX CPHMD Replica Exchange C ontinuous Constant pH Molecular Dynamics RF Radio Frequency RMSD Root Mean Square Deviation TI Thermodynamic Integration T REMD Temperature Replica Exchange Molecular Dynamics V REMD Viscosity Replica Exchange Molecular Dynamics

PAGE 19

19 Abstract of Dissertation Presented to the Graduate School of the U niversity of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy CONSTANT pH REPLICA EXCHANGE MOLECULAR DYNAMICS STUDY OF PROTEIN STRUCTURE AND DYNAMICS By Yilin Meng August 2010 Chair: Adrian E. Roi tberg Major: Chemistry Solution pH is a very important thermodynamic variable that affects protein structure, function and dynamics. Enormous effort has been made experimentally and computationally to understand the effect of pH on protein s One category of computational method to study the effect of pH is the constant pH molecular dynamics (constant pH MD) methods. C onstant pH MD employs dynamic protonation in simulations and correlate s protein conformation s and protonation state s Therefore, constant pH MD algorithms are able to predict p K a value of an ionizable residue as well as to study pH dependence directly. A replica exchange constant pH molecular dynamics (constant pH REMD) method is proposed and implemented to improve coupled protonation and conf ormational state sampling. By mixing conformational sampling at constant pH (with discrete protonation states) with a temperature ladder, this method avoids conformational trapping. Our method was tested on seven different b iological systems. The constant pH REMD not only predicted p K a correctly for model peptide s but also converged faster than constant pH MD Furthermore the constant pH REMD showed its advantage in the efficiency of conformational samplings. The advantage of utilizing constant pH REMD is clear.

PAGE 20

20 We have studied the effect of pH on the structure and dynamics of C peptide from ribonuclease A by constant pH REMD The mean residue ellipticity at 222 nm at each pH value is computed, as a direct comparison with experimental measurements. The C pe ptide conformational ensembles at pH 2, 5, and 8 are studied. The Glu2 Arg10 and Phe8 His12 interactions and their role s in the helix formation are also investigated Constant pH REMD method is applied to the study of hen egg white lysozyme (HEWL). p K a v alues are calculated and compared with experimental values. Factors that could affect p K a prediction such as hydrogen bond network and interaction between ionizable residues are discussed. Structural feature such as coupling between conformation and proton ation states is demonstrated in order to emphasize the importance of accurate sampling of the coupled conformations and protonation states.

PAGE 21

21 CHAPTER 1 INTRODUCTION 1.1 Acid Base Equilibrium Acids and bases are common in our daily lives. For example, vineg ar is acidic and ammonia is basic. According to the Bronsted Lowry definition, an acid is a chemical compound that can donate protons and a base is a chemical compound that can accept protons. A n acid can be converted to its conjugate base by transferring a proton to a base and a base is converted to its conjugate acid by accepting a proton. For simplicity, the conversion between an acid and its conjugate base can be described by the reaction: + + where HA is an acid, A is its conjugate base and + represents proton (in aqueous environment, + is hydronium ion 3 + ) There exists an equilibrium state between any acid base conjugate pair. At equilibrium, the concentration of each species is constant In an acid base reaction, an acid dissociation constant is used to describe this equilibrium. The acid dissociation constant has the definition of Eq. 1 1. = + (1 1) H ere K a is the acid dissociation constant and + and represent the activity of each species, respectively. In Eq. 1 1, the activity of each individual species (take as an example) can be e xpressed as: = [ ] (1 2) In Eq. 1 2, is the activity coefficient of [ ] is the concentration of and is the standard concentration which is 1 M. In an ideal solution, the activity coefficients are unity. T he concentration of each species is divided by standard

PAGE 22

22 concentration in order to make the acid dissociation constant dimension less. For simplicit y, the acid dissociation constant is expressed using the concentration of each species from now on. The K a indicates the strength of an acid: the stronger the acid is, the larger the K a is. The order of magnitude of K a can span over a broad range. Therefor e, a logarithmic (base 10) measure of the K a is more frequently adopted: = log 10 (1 3 ) Combining Eq. 1 1 and Eq. 1 3 we can express the p K a value as: = log 10 (1 4 ) Eq. 1 4 is the Henderson Hasselbalch (HH) equation. It allows one to solve directly for pH values instead of calculating the concentration of hydronium ions first. When = the HH equation becomes = Therefore, the p K a value of an acid is numerically equal to the pH value at which the acid and its conjugate base have the same concentration s The acid dissociation constant represents the thermodynamics of an acid dissociation reaction because the p K a value is proportional to the Gibbs free energy of the reaction. For simple compound s such as acetic acid, t emperature is the most important factor that affects its p K a value However, for complex molecules such as proteins and peptides, the effect of environment is also crucial and will be discussed in this dissertation. 1.2 Amino A cids and Proteins The goal of this dissertation is to study the acid base equilibrium in peptide and protein systems and its effect on peptide and protein conformations by constant pH REMD method. Thus, an introduction to peptide and protein, especially their structures

PAGE 23

23 will be helpful. Amino acids have the generic structure as shown in Figure 1 1 A Each amino acid consists of an amino g roup ( NH2), a carboxylic acid group ( COOH) and a distinctive side chain ( R). All three groups are connected to a carbon atom which is called carbon alpha (C ). There are twenty naturally occurring side chains and they can be divided into groups based on their physical or chemical properties. For example, one way to categorize the twenty side chains is based on their acid/base properties in aqueous solution T herefore, an aspartic acid is an acidic amino acid and a lysine is a basic amino acid. For an amino acid, its carboxylic group can react with the amine group of another amino acid. This condensation reaction forms a peptide bond which links the two amino a ci ds and yields a water molecule. As a consequence of th e condensation reaction, proteins are formed A protein is a string of amino acids connected by peptide bonds and folded into a globular structure A protein often consists of a minimum of 30 to 50 am ino acids. 1 Shorter chains of amino acids are often called peptides. Each amino acid in a protein or peptide is called a residue. The peptide bonds form the backbone of a protein. A B Figure 1 1. A) S tructure of an amino acid named alanine An amino group ( NH2), a carboxylic acid group ( COOH), a side chain ( R, in this case, a methyl group) and a hydrogen atom are bonded to a central carbon atom (C ). B) Dihedral angles and of alanine dipeptide.

PAGE 24

24 A protein usually has four levels of structure which are called primary structure, secondary structure, tertiary structure and quaternary structure. The primary structure is the sequence of amino acids. The folding of a protein is de termined by its primary structure. Next, the secondary structure (e.g. helix, strand, or loop) is the three dimensional structure of local segments of a protein. As mentioned earlier, proteins fold themselves into functional structures after they are formed. After folding, protein backbones often possess certain type s of fold or alignment. The term of secondary structure is used to describe the three dimensional structure s of such manners. The t wo most common secondary structures found in proteins strands. The local secondary structure of a particular residue in a protein can be described by a Ramachandran plot which is a two dimensional histogram (or probability distribution) of backbone dihedral angle pair ( ). As demonstrated in Figure 1 1B backbones can rotate around the N C and C C bonds, fo rming dihedral angles and Backbone conformations of a residue can be described by specifying ( ). Three main regions are populated in general in a Ramachandran plot, corresponding to the three main stable conformations a residue has: the right h helix region near ( = 57, = strand region near ( = 125, = 150 ) and the polyproline II region near ( = 75, =145). The most populated region indicates the most stable conformation of a residue. An example of Ramachandran plot is shown in Figure 1 2 Furthermore, the tertiary structure is the three dimensional positions of all atoms in a protein. The tertiary structures yield information about protein side chains, for example, salt bridges. Finally, the quaternary structure def ines the positions of all atoms

PAGE 25

25 in a protein containing multiple peptide chains for example, the hemoglobin tetramer It is the highest level of protein structures. Figure 1 2. A Ramachandran plot (a contour plot showing the probability density of ( ) pairs ) of tyrosine generated from the simulation of a heptapeptide which will be described later in chapter 3 In this f igure, a left handed helix is also shown. Proteins perform vital functions, which are important to our lives. Almost all cell a ctivities depend on proteins. For example, hemoglobin can transport oxygen molecules from lung to cells; 1 many chemical reactions occurring in living organisms are catalyzed by proteins c alled enzymes; and proteins are also involved in cell signaling. Mutations in the proteins, aggregation and misfolding of proteins can cause many diseases. For example, many cancers result from the mutations in the tumor suppressor p53. 2,3 Thus, understanding protein structures and functions is important. 1.3 Ionizable Residues in Proteins and t he Effect of pH on Proteins An ionizable residue in a protein is a residue with a side chain that can donate or accept prot on(s). T here are seven ionizable residues: ASP, GLU, HIS, CYS, TYR, LYS and ARG. Ionizable residues define the acid base properties of that protein.

PAGE 26

26 Consequently the s olution pH value becomes an important thermodynamic variable affecting protein structure dynamics folding mechanism and function 4 Many biological phenomena such as protein folding/misfolding, 5 8 substrate docking 9 and enzyme catalysis are pH dependent. 10 12 A good example of how pH value affects protein s is the pH dependence of enzyme kinetics. Most enzymes possess an optimal pH value, at which the reaction rate is largest Enzyme catalysis is pH dependent because the active sites of enzymes in general contain important acidic or basic residues. Only one form (acidic or basic) of the ionizable residue is catalytically active, thus the concentration of the catalytically active species will affect the kinetics. Consider a simple reaction model ( Figure 1 3 and Figure 1 4 ) to demons trate how pH value affects enzyme reaction rate In this model, only the zwitter ion form is active; n o intermediate exists f or the enzyme reaction and the protonation deprotonation steps are fast er than catalysis steps. Furthermore, the rate determining step does not depend on pH value. Figure 1 3. A diagram showing the cartoon representation of an enzyme at low pH (acidic) a nd at around the optimal pH value. EH indicates the structure at low pH and E stands for the zwitterion form, which is the active species in our model. 13

PAGE 27

27 Figure 1 4. The r eaction scheme s showin g the enzyme reactions at which pH values are smaller than the optimal pH value K s K K 1 and K 2 are equilibrium constants of corresponding reactions and k cat is the rate constant of the rate determining step. This model can be used to explain how pH va lue affect s enzyme catalysis in the pH range that is larger than optimal pH. 13,14 The equilibrium const ants shown in Figure 1 4 are not independent of each other s The relationship among them is given by: 2 = 1 (1 5 ) According to the above equation, if 1 = 2 then the substrate binding will not be affect ed by pH value of the so lution. If it is not the case, then the binding is pH dependent. After applying steady state approximatio n to the the reaction rate can be written as: = 0 + 1 + + / 2 + + / 1 (1 6 ) w here 0 is initial concentration of the enzyme and + is the concentration of hydronium ions. At low pH, increasing the concentration of hydronium ions (pH value decreases ) will decrease t he reaction rate. Th e same kind of model can also be applied to derive the effect of pH on reaction rate when the pH is higher than optimal Likewise, only the zwitterion form is catalytically active. The conclusion is that pH value too high or too low wil l lower the enzyme catalytic reaction rate

PAGE 28

28 Give n the importance of the solution pH, k nowing the p K a value of an ionizable residue in a protein is important because it will indicate the average protonation state of that ionizable residue at a certain pH v alue. However, the p K a value of an ionizable residue is highly affected by its protein environment. 15,16 Two major factors affect protein p K a values: one is the desolvation effect and the other is the electrostatic interaction. Other factors such as hydrogen bonding and structural rearrangement are also able to affect protein p K a values. An ionizable side chain in the interior of a protein can have a different p K a value from the isolated amino acid in solution, whi ch is caused by dehydration effect. 17 19 For example, Asp26 of the thioredoxin, which lies in a deep pocket of the protein, has a p K a value of 7.5 17 while the p K a value of a water exposed aspartic acid is 4.0. 20 The Garcia Moreno group has been employing site direct mutagenesis method to study the effect of desolvation 18,19,21 23 and will be described later in this chapter. Their research on the buried ionizable resid ues provide s a probe of the dielectric constant inside the protein, which is an important parameter for the p K a prediction on the basis of the Poisson Boltzmann equation. Electrostatic interactions such as salt bridges are also able to affect p K a values. F or example His31 and Asp70 form a salt bridge in the T4 lysozyme 24 Th e formation of this salt bridge shifts the p K a of Asp 70 to 0.5 and changes the p K a of His31 to 9.1. Interestingly, Asp26 in the thioredoxin has been shown to form a salt bridge with Lys57 when it is in the deprotonated form 25 The formation of a salt bridge should reduce the p K a value of Asp26. Therefore, the p K a value of 7.5 is the combined result of desolvation effect and electrostatic interaction.

PAGE 29

29 Each ionizable residue has its own intrinsi c p K a value. The intrinsic p K a value of an ionizable residue is defined as the p K a value measured when this residue is fully solvent exposed and is not interacting with any other groups, 20 for example, an aspartate residue with two termini blocked. This kind of dipeptide is often used as reference (or model) compound in the theoretical c alculation of protein p K a values. The intrinsic p K a values are reported in Table 1 1: Table 1 1. Intrinsic p K a values of ionizable residues in proteins. 26 Residue Name Intrinsic p K a value ASP 4.0 GLU 4.4 HIS 6.7 CYS 8.0 TYR 9.6 LYS 10.4 ARG 12.0 1.4 Measuring p K a Values of Ionizable Residues A general way to determining the p K a value of an acid experimentally is through titration. In experiments the pH values are measured by a pH meter as a function of the volume of base added to the solution. Therefore, a t itration curve will be obtained ( Figure 1 5 A shows an example of titration curve) and the p K a value is the pH value at which the deprotonate d and protonated species have the same concentrations. Another way of presenting a titration curve is by plotting the fraction of deprotonation (protonation) vs the pH value. A Hill plot (an example is shown in Figure1 4B ) which can be obtained by plottin g log([ A ]/[ HA ]) as a function of pH, is used to study titration behavior. After fitting to the modified HH equation: = + log t he x intercept is the p K a value and the slope ( ) is the Hill coefficient which reflects interactions between ionizable residues. The HH equation will be represented as a

PAGE 30

30 straight line in a Hill plot, with a slope of unity. If only one ionizable residue is present in the system of interest, or an ionizable residue does not couple with other ionizable residue(s) the HH equation should be reproduced A non zero slope reflects statistical error (random error) Intera cting ionizable residues will demonstrate non HH behavior and possess non When > 1 we say the proton binding is positively cooperative which means binding of the first proton will increase the binding affinity of the other one. When < 1 the binding of protons is negatively cooperative which means the binding of one proton will de crease the affinity of the other proton. A B Figure 1 5 A) An example of titration curve on the basis of the t itration described in Figure 1 5 A The two plots are generated from constant pH MD simulations of a n aspartic aci d in a pentapeptide. However, determining p K a value of protein ionizable residues by measuring solution pH as a function of the volume of base is difficult because there are multiple ionizable residues in a protein in general. An experimental technique th at is site specific is preferred. Nuclear Magnetic Resonance (NMR) is one of the most frequently employed spectroscopic methods in chemistry, physics and biological science. One application of the NMR method is to measure p K a values of individual ionizable residues. NMR

PAGE 31

31 spectroscopy measures the absorption of radio frequency (RF) radiation by a nucleus in magnetic field. Only a nucleus with a spin quantum number that equals half of an integer is able to generate NMR signal Furthermore, the absorption is af fected by the chemical environment around that nucleus. Electron density around a nucleus provide s a shielding effect to the external magnetic field for the nucleus. Thus different chemical environment (electron density) around a nucleus will affect its r esonance frequency, resulting in chemical shift. Changes in protonation state are able to result in changes in the chemical shift of the nuclei around the ionizable site (for example, C of Asp, C of Glu, and N and N of His). Subsequently at a given pH value, the equilibrium between the protonated and deprotonated species can yield a weighted average chemical shift, = + 1 + 10 ( ) (1 7 ) He re and are the chemical shift observed, chemical shift of the protonated species, the change in chemical shift s caused by titration, respectively, and n is the Eq. 1 7 the HH equation is implied. Therefore, chemical shifts will be measured at different pH values and a titration curve will be obtained. Figure 1 6 demonstrates a titration curve generated by NMR spectroscopy. However, in practice, o ne dimensional NMR spectra are often too complicated to be interpreted for proteins. Introducing a new spectrum dimension will allow the ability to simplify the spectra and yield more useful information. In two dimensional NMR spectroscopy, the sample is excited by one or more pulses in the so 1 and the signal is not recorded during time 1 Following the evolution time one or more pulses will be

PAGE 32

32 applied to the sample and the resulting signal will be measured as a function of a new time variable 2 1 H, 13 C and 15 N NMR are frequently employed in experiments to determine protein p K a values. 14 Proton NMR has shown to be particularly useful in studying histidine p K a values. It is also employed to study the acid base equilibrium of tyrosine residues. 13 C NMR experiments can be performed to determine the p K a values of lysine and aspartate. Figure 1 6 13 C NMR titration curves of aspartate residues in HIV 1 protease/KNI 272 complex taken from Wang et al ,1996. 27 In this figure, Asp C chemical shifts are plotted as a function of pD. Asp25 and Asp125 do not change protonation states in this pD range. But isotope shift experiments show that Asp25 is protonated and Asp125 is deprotonated in this pD range. Reprinted with permission from Wang, Y. X.; Freedberg, D. I.; Yamazaki, T.; Wingfield, P. T.; Stahl, S. J.; Kaufman, J. D.; Kiso, Y.; Torchia, D. A. Biochemistry 1996 35 9945 9950 One example of measuring the p K a value of an ionizable resid ue using NMR technique is the determination of the p K a value of Asp26 in Escherichia coli

PAGE 33

33 thioredoxin. 17,25,28 30 NMR method, especially the 2D NMR technique, has been intensively employed in the investigations of the p K a value of Asp26. Escherichia coli thioredoxin has two redox forms. The oxidized form has a disulfide bond linking Cys32 and Cys35, while the two cysteine residues are not bonded in the reduced form. H ence, the two cysteine residues are ionizable in the reduced form which makes the investigation s more complicated. Asp26 is located at the bottom of a hydrophobic cavity near the active site disulfide and is completely buried in the protein. In 1991, Dyson et al. investigated pH effect on the thioredoxi n in the vicinity of active site, using 2D NMR. 28 Both oxidized and reduced thioredoxin have been studied. C H and C H chemical shifts of Cys32 and Cys35, and NH, C H and C H chemical shifts of Asp26 as a function of pH value have been measured Those chemical shifts have been found to titrate with a p K a value of 7.5. Since the cysteine residues in the oxidized thioredoxin are not ionizable, they proposed that the apparent p K a is the p K a value of Asp26. In the same year, e xperiments performed by Langsetmo et al. measured electrophoretic mobility of the wild type and D26A mutation of the oxidized thioredoxin as a function of pH. A p K a of 7.5 has been obtained from their experiments. 17 I n 1995 Wilson et al. measur ed the chemical shifts of C H1, C H2 and C atoms of Cys32 and Cys35 using the reduced form of thioredoxin. 30 Both the wild type and D26A mutation have b een studied. Comparing the titration curves between the wild type and the D26A mutation, a titration showing p K a value > 9 has been found missing in the D26A thioredoxin experiment. Adopti ng that the cysteine residues in the reduced thioredoxin have p K a values of 7.1 and 7.9 derived from Raman spectroscopy, they concluded that Asp26 has an apparent p K a of greater than 9. However, their results were challenged by the

PAGE 34

34 p K a determinations of Cy s32 and Cys35 in the reduced form of thioredoxin. In 1995, Jeng et al. studied the titration behaviors of Cys32 and Cys35 in the reduced form of thioredoxin by 13 C NMR experiments. 29 Their p K a values were found to be 7.5 and 9.5. Their p K a values of Cys32 and Cys35 challenged the results obtained by Wilson et al In ord er to elucidate the p K a value of Asp26 in the reduced thioredoxin, Jeng and Dyson measured the p K a value of Asp26 in 1996 using 2D NMR 29 The 13 C chemical shift of the carboxylic group, which is bonded to titrating site, as well as the C H1 and C H2 proton chemical shifts was measured as a function of pH value. The authors believed that the pH effect on 13 C chemical shift of the carboxylic group should result from titration due to its close distance to the titrating site. The apparent p K a value obtained from their experiments has been shown between 7.3 and 7.5 which is the same as the p K a value of Asp26 in the oxidized form. Fluorescence spectroscopy can be utilized to determine p K a values as well Fluorescence is the emission of light by a substance when it is relaxing from electronic excited state ( 1 ) to electronic ground state ( 0 ) In fluorescence spectroscopy, the substance is first excited from 0 to one of many vibrational states of 1 by absorbing a photon. Following the excitation, r elaxation to the vibrational ground state 1 occurs through c ollisions with other molecules Once in the ground vibrational state of 1 t he substance will return to one of many vibrational state s of 0 by emitting a photon Since the substance can return to various vibrational states in the electronic ground state, a band of emission wavelength s will be observed. T he absorption and emission wavelength s are different (emission photon s have a larger wavelength ) and the

PAGE 35

35 difference in wavelength is called Stokes shift. The average time the su bstance stays in its electronic excited state is called the fluorescence lifetime. In biophysical chemistry, the tryptophan fluorescence is frequently employed to study the conformational changes in protein s In general, t ryptophan has a maximal absorptio n wavelength of 280 nm 31 and maximal emission wavelength o f 300~350 nm. 32,33 Changes in the environment of a tryptophan residue will affect the emission wavelength and/or intensity Furthermore, it has been noticed that tryptophan fluorescence is sensitive to the polarity of the local environment. One advantage of tryptophan fluorescence spectroscopy is that the chromophore is intrinsic; no change is made to the protein. If the change in protonation state of an ionizable residue affects the spectrum of a neighboring tr yptophan residue, which is the main fluorescent species in a protein, the n fluorescence spectroscopy can be employed to generate a titration curve. Therefore, the p K a value will be obtained. One example of determining p K a value by fluorescence spectroscopy is measuring the p K a of Glu35 in HEWL performed by the Imoto group. 34 The Trp108 is in van der Waals contact with G lu35. Changes in protonation state of Glu35 can induce a large shift in intensity of Trp108 fluorescence signal. Another way of obtaining a titration curve is the potentiometric method. The potentiometric titration measures pH value as a function of the vo lume of titrant added The volume of titrant added at each dosing can be used to calculate moles of hydrogen ion released from (or bound by) a peptide or protein and hence number of hydrogen ions released (or bound) per molecule Plotting number of hydrog en ions released (or bound) per molecule as a function of pH will generate a titration curve. By utilizing

PAGE 36

36 potentiometric titration, a titration curve of the entire peptide or protein can be obtained. The Garcia Moreno group has been utilizing the potentio metric method combined with other experimental techniques and protein p K a calculations, to investigate p K a values of ionizable residues buried deep in a protein. 18,19,21 23 As mentioned earlier in the last section protein environment can shift the p K a value of an ionizable residue. In nature, a small portion of the ionizable residues are buried in the deep pocket s of the protein, inaccessible to water. 22,35 Those buried io nizable residues are crucial to the protein functions such as catalysis, 12,36 and ion or electron transport. 37,38 Determining and understanding the p K a values of buried ion izable residues is important for biological research. The Garcia Moreno group performed site directed mutagenesis experiments mutating a nonpolar residue which is inaccessible to water to an ionizable residue. The p K a value of the mutated ionizable residu e is determined experimentally and predicted theoretically. By combining experimental and theoretical determination, the dielectric effect and electrostatic interactions will be elucidated. One example of the mutagenesis hyperstable variant the staphylococcal nuclease (SNase) to glutamate. 19,21 of SNase are called PHS and PHS/V66E. The PHS nuclease can be made by mutat ing three residues of the wild type SNase : P117G, H124L, and S128A. Val66 has been found in the core region of the SNase and inaccessible to aqueous environment. The potentiom etric titrations have been performed on both PHS and PHS/V66E. The difference bet ween the two titration curves represents the Glu66 titration plus other titrations affected by the mutation, although it is assumed that the latter effect is not

PAGE 37

37 significant. The difference in hydrogen ions ( ) bound to PHS and PHS/V66E was fitted t o the following equation, = 10 1 + 10 (1 8 ) where is the solution pH value, an d in this case is the p K a value of Glu66. The pH dependence of PHS and PHS/V66E stability was also demonstrat ed by the guanidine hydrochloride denaturation free energy profile s. The Trp140 fluorescence was recorded as a probe of the denaturation The difference in denaturation free energy profiles was also fitted nonlinearly to obtain the p K a value of Glu66. The p K a value of Glu66 has been determined to be 8.8 from potentiometric titration and 8.5 from the protein stability study. The p K a shift o f 4.4 (on the basis of the potentiometric measurements, and glutamate has an intrinsic p K a value of 4.4) is among the largest ones for acidic ionizable residues. Once the experimental p K a value is accurate obtained, a reverse p K a prediction can be perfor med to investigate the dielectric constant inside the protein, which is an important parameter in the continuum electrostatic model and will be explained later this chapter. In fact, the direct potentiometric measurements were first carried out by the Garc ia Moreno group on PHS and PHS/V66K. 18 A p K a value of 6.38 was found for Lys66, while the p K a value of lysine model compound is 10.4. R ecent site directed mutagenesis studies on PHS have extended to Leu38. 22 Mutations to aspartate, glutamate and lysine were conducted. Similar to their treatment on Val66 mutations potentiometric titration and p rotein denaturation experiments were conducted to determine p K a values by the Garcia Moreno group For the PHS/L38E,

PAGE 38

38 NMR technique was employed to facilitate Glu38 p K a measurement. PHS/L38K has shown a p K a value close to the intrinsic value of lysine. After mutation, lysine was found to adjust its side chain to let water molecules p enetrate. However, L38D and L38E have shown elevated p K a values. Both Asp38 and Glu38 were still inaccessible to water although structural rearrangement was also observed Their p K a values were further perturbed by electrostatic interactions with surface carboxylic groups. Their investigations have unveiled how conformational change s desolvation and electrostatic interactions affect p K a values. 1.5 Molecular Modeling Experimental techniques such as spectroscopy are fundamental to the study of protein s tructure and function. For example, NMR spectroscopy is frequently employed in biological science X ray crystallography can be applied to resolve protein structures and circular dichroism (CD) spectrometry is employed to determine the secondary structure of a protein. However, the advances in computational power combined with the leap in theory make experiments not the only way to understand biological molecules. Molecular modeling offers another way to investigate structures and properties of biological m olecules. It combines theories developed in the fields of physics, chemistry and biology with the computer resources to simul ate the behaviors of molecules. R esults from simulations are often compared to experimental observations in order to validate the m ethod and understand the behavior of biological molecules from an atomistic level.

PAGE 39

39 1.6 Potential Energy Surface Molecules possess more than one stable configuration in general. In principle, a ll possible molecular configurations need to be considered in or der to simulate a molecule correctly. A potential energy surface (PES), which is a surface defined by the potential energies of all possible configurations, can be utilized to fulfill this requirement. The local minima of a PES indicate stable conformation s of a molecule. There are multiple ways to ge nerate a PES. Quantum mechanical calculations offer the most accurate way to construct a PES. By solving the Schrodinger Equation, one can obtain energies and wave function of the molecule. In the field of chem istry, electronic structure theory utilizes quantum mechanics to describe the motion of electrons in the framework of Born Oppenheimer approximation The Born Oppenheimer approximation states that the electronic relaxation caused by nuclear motion is inst antaneous because of the huge difference in the masses of electrons and nuclei. Thus, electro nic motion and nuclear motion are decoupled. The eigenvalue of the electronic Schrodinger equation at each nuclear configuration is the potential energy of nuclei at that geometry Solving Schrodinger equation at different configurations will yield the PES of a molecule. However, the cost of electronic structure calculations is very expensive, which hinders the use of high level of theory when studying large biologi cal molecules. Due to the cost of electronic structure methods, an alternative way to describe a PES is to use a classical mechanical model. One of the commonly used algorithms is the all atom force field in which the PES is computed without solving the S chrodinger equation. In an all atom force field model, no electrons are present and each atom is represented by a single particle ( in contrast to the united atom force field model where a functional group is represented by a particle ) Atoms interact with each other via bonded

PAGE 40

40 and non bonded pote ntial energy terms. Equation 1 9 shows an example of all atom force field model that is frequently adopted in the simulations of proteins : = 1 2 ( 0 ) 2 + 1 2 ( 0 ) 2 + 2 1 + 3 n = 1 + 4 0 + 4 12 6 = + 1 = 1 ( 1 9 ) The first three summations are bonded terms and they represent interactions of bond stretching, valence angle bending and torsions respectively. In Eq. 1 9 bond stretching and a ngle bending are considered by a harmonic potential. The torsion term is expressed as Fourier series due to the periodic nature of a dihedral angle. The latter two summation term s are the non bonded interaction term s The two components in the double summ ation represent electrostatic interactions and van der Waals interactions respectively Electrostatic potential is represented by Coulomb interaction. and are partial charges on atom and respectively. i s the distance between the two atoms. In Eq. 1 9 van der Waals interaction is calculated by the Lennard Jones potential, in which is the well depth and is the distance when repulsive and attractive potentials are equal. Solvent effect is also considered when implicit solvent such as the Generalized Born (GB) model 39,40 is adopted (solvent models will be briefly d escribed in the next chapter). The cost of all atom force field model is low compared with ab initio methods because it utilizes pre defined parameters when calculating potential energies. The strategy of generating those parameters is via fitting to exper imental data and quantum mechanical calculations. One must notice that the parameters are often internally consistent which means parameters of different force fields are in general non transferrable. The all atom force field models are utilized much more frequently

PAGE 41

41 than the quantum mechanical methods when simulating large systems such as proteins. However, force fields such as Eq. 1 9 do not allow bond breaking or forming. Thus, they are not able to study reactions. Nowadays, linear scaling techniques in e lectronic structure theory are developed in order to fill the gap between force fields and the high accuracy ab initio methods. 41,42 One example of the linear scaling algorithm is the DivCon program developed by th e Merz group. 43 The balance between computational accuracy and cost is the main theme in the computational chemistry 44 One category of schemes attempting to achieve this balan ce is the so called hybrid quantum mechanical molecular mechanical (QM/MM) methods. 41,45 47 The basic idea of the QM/MM methods is that different regions of a system may play different roles. For example, if one wa nts to study an enzymatic reaction, the potential energy calculation involving the active site should be done by a quantum mechanical model because the classical force field is not able to describe bond forming/breaking. On the other hand, the bulk water ( assuming no water molecule participates in enzymatic reaction) and the protein environment of the enzyme can be represented by the force field in order to save simulation time. In the QM/MM methodology, different regions of a system are treated by di fferen t level of theory and interact with each other. The QM/MM approaches have become a key area in the simulation of proteins. 48,49 1.7 Molecular Dynamics, Monte Carlo Method s and Ergodicity Accurately simulating the b ehavior of a molecule requires more than knowing the PES. A molecule often has more than one minimum on the PES. Finding the correct probability distribution of molecular conformations is also important because the majority of experiments measure molecula r properties as averages over molecular

PAGE 42

42 structures. Sampling algorithm s such as m olecular d ynamics (MD) and the Metropolis Monte Carlo (MC) method are crucial to molecular modeling. For a system containing N number of particles, there are 6 N degree s of fre edom (DOF). Half of the DOF comes from coordinates and the other half represents the momentum of all particles. The 6 N dimensional space defined by those DOF is called the phase space. Both MD and MC methods sample the molecular phase space. Over time, the system will generate a trajectory in the phase space. MD utilizes the equation of motion to propagate a system in the phase space (The details of molecular dynamics will be presented in the next chapter). Each particle in the system has velocity and posit Eq. 1 10 ) is applied to control the dynamics: = = ( 1 10 ) The force on any particle in the system is given by the negative gradient of the potential energy. The equation of motion is usually sol ved numerically. By propagating the equation of motion the phase space will be explored and a probability distribution for DOFs will be obtained. Therefore, molecular properties are able to be c omputed by averaging over times: = lim 1 = 0 ( 1 1 1 ) In Eq. 1 1 1 A is the property of interest. t is the total simulation time. is the size of the sample taken during the entire simulation. The brac ket stands for taking average. is the value of A at time in the simulation. In contrast to MD, the Metropolis MC method (from now on, we will call the Metropolis MC method as MC method unless otherwise mentioned) does not utilize the

PAGE 43

43 e quation of motion. MC method samples the phase space through a Markov chain (the details of Monte Carlo method will be presented in the next chapter). In MC algorithm, a new state (for example, a new molecular configuration) is randomly selected and the tr ansition probability relationship between the current state and the new state is calculated by the detailed balance equation. Then a Metropolis criterion 50 is applied to accept or reject the transition to the new state. The Markov chain can be applied because the system is assumed to be at equilibrium. Likewise, after a sufficient number of transitions, the phase space will be explored and molecular properties can be comp uted by averaging over ensemble: = (1 1 2 ) Here is the value of A in state is the normalized probability density of state The MD and the MC methods represent two different ways of sampling phase space and computing average molecular properties. According to the ergodic hypothesis, the time avera ge is equal to the ensemble average: = = lim 1 = = 0 (1 1 3 ) The ergodic hypothesis is often assumed to be true in molecular simulations. Th is hypothesis makes MD and MC methods equivalent in sampling phase space If the system is ergodic, t he phase spaces generated by MD and MC should be the same because the phase space does not depend on sampling technique. The same behavior should also extend to any observable properties. C onformational sampling in a MD or MC simulation is essential in the study of complex systems such as polymers and proteins. One major concern is that the PES of

PAGE 44

44 a complex system is very rugged and contain s a lot of local energy minima. 51 Thus, kinetic trapping would occur as a result of the low rate of potential energy barrier crossing, especially when the barrier is high. In order t o overcome this kinetic trapping beha vior, generalized ensemble methods (advanced sampling methods) 52,53 are frequently employed in molecular simulations. Popular generalized ensemble methods include multicanonical algorithm, 54,55 simulated tempering method 56,57 parallel tempering method 58 60 and replica exchange molecular dynamics (REMD) method. 61,62 A more thorough description of MD MC and the advanced sampling methods will be presented in the next chapter. 1.8 Theoretical Protein Titration Curves and p K a Calculation s Using Poisson Boltzmann E quation Studying protein titrat ion curves theoretically has a long history. As early as 1957, Tanford and Kirkwood presented their study of protein titration curve. 63 In their model, proteins were considered to be low dielectric spheres with disc rete unit charges on ionizable residues. They proposed that the p K a value of an ionizable residue can be calculated from its intrinsic p K a value and pair wise electrostatic interactions with other ionizable residues. Calculating the pair wise electrostatic interactions involves using empirical parameters. A protein titration curve showing average charge as a function of pH value was plotted. The Tanford Kirkwood model was further extended and utilized to study lysozyme by Tanford and Roxby. 64 The equations used to generate a titration curve in the Tanford and Roxby paper were the same as those Tanford and Kirkwood used. However, they employed an iterative approach to generate titration curves and p K a values for all ionizable residues. In their approach, each ionizable residue was initially assigned a p K a value that is equal to its intrinsic value. At a given pH, the

PAGE 45

45 average charge on each site (representing fraction of deprotonation/protonation) can be computed. Those average charges were then employed to update p K a values. This process was repeated until self consistent average ch arge and p K a value of a site was obtained. Therefore, a titration curve can be produced by plotting average charge as a function of pH value. In 1990, Bashford and Karplus utilized the finite difference Poisson Boltzmann (FDPB) equation in the calculation of p K a values. 65 A detailed description of the FDPB method will be present in the next chapter. The p K a shift of an ionizable residue relative to a model compound is calculated (in their paper, i ntrinsic p K a is a quantity defined as the p K a value of an ionizable residue when other sites are neutral, that is no interactions between ionizable sites). Given a molecular configuration, three terms are calculated by FDPB equation for each ionizable sit e: the Born solvation free energy, the pair wise electrostatic interactions with non ionizable residues (represented by partial charges), and the pair wise electrostatic interactions between ionizable sites. Summing the three terms yields the electrostatic work of charging the ionizable side chain, and hence yields the p K a shift. A p rotein titration curve is represented by plotting fraction of protonation vs pH value. Considering a protein with N ionizable sites and each site can have two states (protonate d and deprotonated), there are 2 N possible macro states and each macro state can be represented by an N dimensional vector. Once the FDPB equation is solved, free energy differences of each vector relative to completely deprotonated are computed. Thus, the fraction of protonation of an ionizable site can be calculated by taking the Boltzmann weighted average of the 2 N macro states.

PAGE 46

46 The FDPB method forms the foundation of the continuum electrostatic (CE) models, which are frequently utilized when studying pr otein p K a values. 16,65 71 The FDPB method has been implemen ted into many modeling software packages such as UHBD 72 and DELPHI. 73 Many modifications have been done to improve its performance. In 1991 Beroza et al. employed the Metropolis MC method to sample 2 N numbers of protonation states instead of calculating the protonation fraction at a given pH value directly 74 After us ing MC sampling of protonation states, the number of ionizable residues included in the simulation can increase dramatically. S olving the F DPB equation requires the dielectric constant in a protein as an input parameter and the dielectric constant is very important because the electrostatic energy is inversely proportional to it. It is considered as the most important adjustable parameter in FDPB based p K a calculations. 16 Thus, o ne question arisen from utilizing FDPB method is how to choose dielectric constant for proteins. The values between 4 and 20 are typically adopted in the FDPB calculations. 67 Direct experimental determination of the interior dielectric constant is extremely difficult In practice, the protein dielectric constants are mea sured utilizing protein powders which will cause problem s in interpreting the resulting dielectric constants. 18,75,76 Research has been performed to find an optimal interior dielectric constant for protein p K a predictions. However, considering the difference in protein environment, no single dielectric constant can yield experimental p K a values for both internal and surface residues in a protein 77 In 1996, Simonson and Brooks studied charge screening ef fect and protein dielectric constant by MD simulations. 78 What they found was that protein dielectric constant can range from ~4 in th e interior of protein to a much higher value (~30) in the region near

PAGE 47

47 the surface. As mentioned in section 1.4, the Garcia Moreno group conducted site directed mutagenesis experiments in the deep pocket of a protein where water is inaccessible and measured the p K a value of mutated ionizable residue 18,19,21 23,77 Then, t he experimental p K a value was put back into FDPB equation in order to examine protein interior dielectric constant. The protein interior dielectric constants were found to be ~11 18 Mehler and his co worker employed a sigmoidal screened electrostatic interaction to treat the protein dielectric environment. 79,80 Their m ethod had been applied to Glu35 and Asp66 in hen egg white lysozyme and ha d obtained satisfactory results. 80 Another problem in the FDPB based p K a calculation is that the FDPB equation is often solved on the basis of one structure such as X ray crystal structure. The entropic effect is missing when a single structure is used. To improve the performan ce of the CE model in p K a calculations protein conformational sampling is also considered in order to incorpora te conformational flexibility into p K a calculations 81 86 In the 1 990s, You and Bashford developed an algorithm in which 36 side chain conformations of ionizable residues are adopted in the calculation of p K a values. 86 In 1997, Alex and Gunner proposed to use M onte Carlo method to sample 2 possible states instead of just 2 protonation states. 81 Here N is the number of ionizable residues and each one can have M possible conformations. Furthermore, each one of the K non ionizable residue possesses L number of possible conformations. The Gunner group further extends this algorithm to the so called multiconformation continuum electrostatic method (MCCE). 83 Recently, Barth et al. proposed a rotamer repacking technique combined with FDPB method and was given the name FDPB_MF. 82 In the FDPB_MF method, the

PAGE 48

48 conformation al space of side chain of ionizable residues was defined by a rotamer probability distribution. Each rotame r was given a weight and was interacting with other ionizable residues in a mean field scheme. 1.9 Computing p K a Values by Free Energy Calculations MD based free energy (MDFE) calculations 87,88 have also been em ployed to predict p K a values. MDFE calculations combine free energy calculation algorithms with MD propagations. MD propagation s sample phase space and generate a conformational ensemble. Free energy calculation methods calculate the free energy difference between two states on the basis of the phase space sampled by MD. F ree energy perturbation (FEP) and thermodynamic integration (TI) are two frequently employed free energy calculation methods and will be explained with more details in the next chapter. Fr ee energy calculation algorithms such as FEP and TI methods can be used to compute p K a because is associated with the free energy of reaction. Early p K a calculations u tiliz ing free energy calculations were conducted by the Warshel et al. 89,90 Jorgensen et al. 91 and Merz 92 with the FEP method and classical force fields. In the 1980s, Wars hel et al. proposed a protein dipole Langevin dipole (PDLD) model for the p K a calculations. 90 In the PDLD model, proteins were treated as particles hav ing partial charges and polarizable dipoles, while the solvent molecules nearby were viewed as Langevin dipoles. The bulk water that is far away from ionizable residues was still treated as dielectric continuum. Electrostatic interactions between charges a nd dipoles, and dipoles and dipoles were computed. Jorgensen et al. combined ab initio quantum mechanical calculations and classical FEP calculations in 1989. 91 Jorgensen et al. calculated the p K a difference between two acids, and The gas phase dissociation free energy of and were

PAGE 49

49 computed by quantum mechanical methods. The solvation free energy calculations were conduct ed using MC FEP method for the neutral molecules and the anions. One shortcoming of their calculations is that only small organic molecules were investigated due to the computational cost of quantum mechanical methods. In 1991, Merz performed classical FEP calculations for three glutamate residues in two proteins (HEWL and human carbonic anhydrease II) 92 The g lutamate dipeptide was utilized as a model compou nd to eliminate the gas phase dissociation free energy calculations. When MDFE calculations utilizing t he classical force field s are performed, quantum effect s such as bond forming/breaking cannot be simulated. Thus, the p K a shift of an ionizable residue relative to its intrinsic p K a value (p K a value of the reference compound which is defined in section 1.3 of this dissertation ) is computed by the free energy calcul ations A diagrammatic explanation of p K a shift calculation utilizing the MDFE method is dem onstrated in Figure 1 7 and Figure 1 8 Figure 1 7. Thermodynamic cycle used to compute p K a shift. Both acid dissociation reactions occur in aqueous solution. A thermodynamic cycle is a series of thermodynamic processes that eventually returning to the initial state A state function, such as reaction free energy in this case, is path independent and hence, unchanged through a cyclic process.

PAGE 50

50 Figure 1 8. In Figure 1 7 and Figure 1 8 protein AH represents the ionizable residue in protein environment AH represents the reference compound which is usually the ionizable residue with two termini capped. In practice, a proton does not disappear but instead becomes a dummy atom. The proton has its position and velocity. The bonded interactions involving the proton are still effective However, there is no non bonded interaction for that proton. The change in protonation state is reflected by changes of partial charges in the ionizable residue. Equations 1 1 4 to 1 20 explain how p K a values will be computed from free energy calculations using force fields : = 1 2 303 (1 1 4 ) = 1 2 303 (1 1 5 ) In Eq. 1 1 4 and 1 1 5 and are the acid dissociation reaction free energy of the ionizable residue in protein and the reference compound, respectively Therefore, the p K a shift between ioniza ble residue in protein environment and the reference compound can be calculated as = 1 2 303 According to the thermodynamic cycle shown in Figure1 6A, = 1 + 2 + 3 Here, 1 and 2 are the free energy difference between two protonated species, and between two deprotonated species, respectively.

PAGE 51

51 3 is equal to zero because the free energy difference between two protons that are in the same environment is zero. However, c alculating 1 and 2 directly utilizing MDFE calculations is not preferable because the difference between the reference compound and the protein system is very large. A sim ple way to determine the difference between 1 and 2 is needed Therefore, the thermodynamic cycle shown in Figure1 6B is employed. By utilizing that thermodynamic cycle, ( 1 + 2 ) can be expressed as ( ) where and are the free energy difference between the protonated and deprotonated ionizable residue in protein and the reference compound, respectively. and can be further expressed as: = + (1 16) A nd = + (1 17) In Eq. 1 16 and Eq. 1 17, t he MM in the subscripts stands for the free energ y difference s which are calculated by classical force fiel ds. The quantum mechanical contributions (labeled by QM in the subscripts) to the free energy difference of an ionizable residue in protein environment and its reference compound are assumed to be the same: ( ) = ( ) (1 1 8 ) Combining all derivations and assumption, the difference between two acid dissociation reaction free energies can be written as: = (1 19)

PAGE 52

52 Thus subtracting Eq. 1 1 5 from Eq. 1 1 4 yields: = + 1 2 303 (1 20 ) and are are computed by MDFE calculations (f or example, TI) A more detailed description of the MDFE methodology and how to compute and will be explained in the next chapter. An example of using classical force field MDFE calculation s to study p K a values is given by Simonson et al. 15 T he p K a values of Asp 2 0 (experimental p K a of 2, which is lower than the intrinsic Asp p K a value), Asp26 (experimental p K a of 7.5 ) in thioredonxin, and Asp14 (with an experimental p K a around 4) in ribonuclease A were evaluated by TI calculations The aspartate dipeptide was taken as the model compound; b oth explicit and implicit water models were used in their simulations. Proton di ssociation was represented by changes in the partial charges of carboxylic group only The f ree energy change caused by the disappearance of the proton van der Waals interaction was not considered because the van der Waals radius of the proton in aspartate is zero in the AMBER force field Correct protonation free energies have been obtained. Entropic and enthalpic effects are also correctly obtained. However, several problems have also been found with the MDFE based p K a calculations. For example, interacti ons between ionizable sites are not a ble to be incorporated directly. Furthermore their free energy difference s have shown dependence on the force fields and solvation models. Hybrid quantum mechanical/molecular mechanical (QM/MM) methods can be coupled w ith free energy calculation simulations. 48,93 Recently, the Cui group has

PAGE 53

53 conducted p K a calculations using FEP calculations coupled with SCC DFTB method. 94,95 A detailed de scription of QM/MM free energy calculations of p K a values can be found in a recent review by Kamerlin et al. 48 1.10 p K a Prediction Using Empirical Methods Empirical models are also employed to study protein p K a values. According to Lee and Crippen, 16 the seemingly most accepted empirical method is PROPKA which is developed by the Jensen group. 96 101 The PROPKA method involves using 30 parameters obtained from 314 residues in 44 proteins. QM calculations and the effective fragment potential (EFP) method, 102,103 which is a QM/MM method, are employed to generate those parameters. In the PROPKA method, a p K a value is K a values. Three types of perturbations are considered: th e hydrogen bonding, desolvation effect and charge charge interactions. A detailed description of the PROPKA method can be found in a review by Jensen et al. 97 1.11 Constant pH Molecular Dynamics (Co nstant pH MD) Methods Traditionally, MD simulations have been performed in a manner of constant protonation state. The protonation state of an ionizable residue is assigned before a MD simulation is started. Moreover, the protonation states are not allowed to change during MD propagations. Performing constant protonation state MD simulations requires knowing the p K a values of all ionizable residues beforehand. Not knowing the p K a value may result in wrong assignment of protonation state. In addition, if p K a values are near the solution pH values, constant protonation state MD simulations are not able to reflect this situation. More importantly, constant protonation state MD simulations cannot be employed to study the coupling between conformations and proton ation states. Thus,

PAGE 54

54 constant pH MD algorithms were developed in order to correlate protein conformation and protonation state. 104 The purpo se of constant pH MD is to describe protonation equilibrium correctly at a given pH value. Therefore, its applications include p K a predictions and studying pH effects. One category of constant pH MD methods uses a continuous protonation parameter. 105 115 Earlier models include a grand canonical MD algorithm developed by Mertz and Pettitt in 1994 115 and a method introduced by Baptista et al. in 1997. 106 In the Mertz and Pettitt model, protons are allowed to be exchanged between a titratable side chain and water molecules. Baptista et al. used a potential of mean force to treat protonation and conformation simultaneously. Later, Brjesson and Hnenberger developed a continuous protonation variable model in which the protonation fra ction is adjusted by weak coupling to a proton bath, using an explicit solvent. 107,108 More recently, the continuous protonation state model has been further developed by the Brooks group. 109 114 They developed a constant pH MD algorithm by the name of continuous constant pH molecular dynamics (CPHMD). In the CPHMD method, Lee et al. 114 applied dynamics 116 to the protonation coordinate and used th e Generalized Born (GB) 40,117 implicit solvent model. They chose a variable to control protonation fraction and introduced an artificial potential barrier between protonat ed and deprotonat ed states. The potentia l is a biasing potential to increase the residency time close to protonation/deprotonation states and it centered at half way of titration ( =1/2). The CPHMD method was then extended by incorporating improved GB model and REMD algorithm for better samplin g. The applications of CPHMD and replica exchange CPHMD included predicting p K a values of various proteins, 110,114 studying

PAGE 55

55 proton tautomerism 109 and pH dependent protein dynamics such as folding 112,113 and aggregation. 111 In addition to continuous protonation state models, discrete protonation state methods have also been developed to study pH depe ndence of protein structure and dynamics. 118 131 The discret e protonation state models utilize a hybrid molecular dynamics and Monte Carlo (hybrid MD/MC) method. Protein conformations are sampled by molecular dynamics and protonation states are sampled using a Monte Carlo scheme periodically during a MD simulation. A new protonation state is selected after a user defined number of MD steps and the free energy difference between the old and the new state is calculated. The Metropolis criterion is used to accept or reject the protonation change. Various solvent models and protonation state energy algorithm s were used in discrete protonation state constant pH MD simulations. Burgi et al. 130 presented their constant pH MD method using discrete protonation state model and applied it to hen egg white lysozyme (HEWL) The lysozyme was dissolved into explicit water. Short TI calculations (20 ps of dynamics) were carried out to provide classical free energy difference between old and new protonation states at each MC attempt. The MC move is evaluated based on the following free energy difference: = ln 10 + (1 21 ) In the above equation, is a parameter and represent the pH value of the solution, is the p K a value of the mod el compound (reference compound), and is the classical force field proton dissociation free energy given by TI for the protein and reference compound, respectively. One pitfall of the method

PAGE 56

56 developed by Burgi et al. is the choice of simulation time of TI. The 20 ps TI calculation represents neither single structure protonation free energy nor an average of the entire ensemble. The Baptista group proposed their constant pH MD method using the FDPB method to calculate pr otonation energies and their MD was done in explicit solvent. 118,123 126 The MD propagations are conducted at fixed protonation state s The MC moves in the protonation states are performed at fixed molecular config uration s The MD propagation is able to generate a conditional PDF of coordinates and momenta given protonation states while the MC sampling is able to yield a conditional PDF of protonation states given molecular configurations. Baptista et al. proved th at the hybrid MD and MC method is able to generate an ergodic Markov chain. 118 Hence, conditional probability distributions yielded by MD and MC are able to generate a joint probability distribution satisfying semigrand canonical ensemble. The work done by Baptista et al. provides the theore tical justification for combined MD and MC sampling in the discrete protonation state constant pH methods. In practice MD simulations are conducted in explicit water to sample conformational space. A new protonation state is selected and the free energy d ifference is calculated using the structure at that moment and the continuum electrostatic model. The MC transition is evaluated and if the move is accepted, a short MD run is performed to relax the solvent. After solvent relaxation, MD steps continue for solute and solvent. The Baptista group applied their constant pH MD method to the study of protonation conformation coupling effect, 123 the pH dependent conformation states of kyotorphin, 124 p K a predictions of the HEWL 125 and the redox titration of cytochrome c 3 126

PAGE 57

57 Walczak and Antosiewicz also employed the FD PB method to determine protonation energy but they used Langevin D ynamics to propagate coordinates between MC ste ps. 128 This method is f urther extended by Dlugosz and Antosiewicz. 119 122,128 The extended method combines conventional MD simulation using the ana lytical continuum electrostatic (ACE) 132 scheme to sample conformations with the FDPB method for the MC moves. Succinic acid 119 and a heptapeptide derived from ovomucoid third domain (OMTKY3 ) 122 have been studied by Dlugosz and Antosiewicz. This heptapeptide corresponds to residue s 26 32 of OMTKY3 and has the sequence of acetyl Ser Asp Asn Lys Thr Tyr Gly methylamine. Nuclear magnetic reso nance (NMR) experiments indicated the p K a of Asp is 3.6, 122 0.4 p K a unit lower than the value of blocked Asp dipeptide. In their studies, the conventional molecular dynamics (MD) simul ations were carried out to sample peptide conformations. Their method predicted the p K a to be 4.24. Mongan et al developed a method combining the GB model and the discrete protonation state model and implemented it into the AMBER simulation suite 127 In as solvation free energy calculations. Therefore, solvent models in conformational and protonation state sampling are consistent and the computational cost is small. More recently, the accelerated molecular dynamics (AMD) 133,134 method was combined with pH algorithm to enhance conformational sampling. 129 This model has been utilized to calculate p K a values of an enzyme and to explore the protonation conformation coupling. The continuous protonation state model developed by the

PAGE 58

58 Brooks group, the discrete protonation state model proposed by Baptista et al. and by Mongan et al. will be further explained in chapter 2.

PAGE 59

59 CHAPTER 2 T HEORY AND METHODS IN MOLECULAR MODELIN G Molecular M odeling or molecular simulation is a way to study molecules using theories developed in the fields of physics, chemistry and biology coupled with the computer resources With the development of computer p ower and parallel computation molecular modeling is more and more often involved in the research of biology, chemistry and physics. 42 Understanding the underlying theory and me thods of molecular modeling is necessary in order to perform simulations and analyze the data generated. In this chapter, the basic theory and methods of constant pH replica exchange molecular dynamics method and protein p K a calculations methods are descri bed. 2.1 Potential Energy Functions and Classical Force Fields 2.1.1 Potential Energy Surface Molecular modeling studies molecules which in general possess more than one configuration for a chemical formula in general. In principle, a ll possible molecul ar configurations need to be considered in order to simulate a molecule correctly. A potential energy surface (PES) which is a surface defined by the potential energies of all possible configurations, can be utili zed to fulfill this requirement The conce pt of PES is a result of the Born Oppenheimer approximation. The Born Oppenheimer approximation states that the electronic relaxation caused by nuclear motion is instantaneous because of the huge difference in the masses of electrons and nuclei. Thus, elec tro nic motion and nuclear motion are decoupled E lectronic energy which is computed at a fixed nuclear geometry (molecular structure), is the potential energy of nuclei at that structure L ocal minima on the PES indicate stable conformations of a

PAGE 60

60 molecule Quantum mechanic s forms the foundation of understanding the molecular behaviors and offers the most accurate way to construct a PES. Ideally, t he Schrodinger equation is solved for electronic energy at all possible nuclear configuration s and hence, yield s the PES of a molecule. 2.1.2 Force Field Models Although quantum mechanical calculations generate very accurate energies, performing a molecular simulation using quantum mechanical method is too time consuming even through the use of parallel computation especially for large systems such as polymers and proteins. F orce field (equivalent to molecular mechanics) models have been designed to solve this problem. Force field models ignore electrons and calculate the potential energy of a system based on nucle ar geometry only. Force field calculations are fast because the potential energy functions are simple and parameterized. In a force field model, the potential energy of a system has the following contributions in general: bond stretching (vibration), angl e bending, bond rotation (torsion), electrostatic interaction and the van der Waals interaction. The former three contributions are often called the bonded interactions and the last two bel ong to non bonded interactions. In many force field models such a s the AMBER force field, 135 bond stretching energy between atoms and is the second order truncation of the Taylor expansion of potential energy function about equilibrium distance and hence, can be formulated as a harmonic potential : = 1 2 2 (2 1)

PAGE 61

61 where is the force constant, is the distance between two atoms and is the equilibrium distance between the two atoms. One drawback of this function is tha t a bond cannot be broken and has infinite energy when two atoms are infinitely apart. Therefore, such a potential energy can be applied to bond stretching near equilibrium distance only. A simplest remedy is to include higher order Taylor expansion terms but this increase s the computation time. For example, expansions up to the fourth order are adopted in the general organic force field MM3. 136 This Taylor expansion strategy is also employed in deriving angle bending potential functions. Torsions (or dihedral angles) are periodic and hence, Fourier series is ad opted as tor sion potential energy function. On e example of t he formula of torsion potential energy is displayed in Eq. 1 9 The van der Waals interaction in a force field model should be able to reproduce the repulsi on and attraction between two particles having no permanent charges This attractive i nteraction is generally called dispersion. Quantum mechanics indicates that the dispersion energy is inversely proportional to the sixth power of the distance between two particles (say atoms) and (under the dipole dipole interaction approximation) : 137 = 6 (2 2) w here is a constant specific to and and is the distance between and Th ere is no theoretical derivation for the repulsive interaction. However, for computation al simplicity, the repulsive energy is taken to be inversely proportional to the twelfth power of the distance. A simple way to combine repulsive and attractive potenti als is just adding up the two potentials. Thus, van der Waals interaction is governed by the Lennard Jones potential shown in Eq. 1 9 Due to the fact that van der Waals

PAGE 62

62 interaction decays very fast as a function of inter particle distance, it is often cal led E simplest model of electrostatic interaction is the point charge model which is adopted in the AMBER force field. Partial charges are assigned to each atom and applied to calculating interaction energy. More complicated models such as calculating electrostatic energy through dipole moment dipole moment interaction have also been employed. 137 Bond, angle and torsion interactions are coupled. Thus, the coupling effects (cross terms) should be incorporated into force fields. Mathematically, cross terms are generated from multi dimensional Taylor expansions. For example, the angle bending accompanied by two bond stretching motions (shown in Figure 2 1) is formulated to be (as in MM3): = 1 2 + (2 3 ) Figure 2 1 A diagram showing bond stretching coupled with angle bending. A cross term calculating coupling energy is adopted when evaluating the total potential energy T he force field is simply a function and corresponding parameters. Thus, obtaining parameters is crucial for force field development. Given a potential energy function,

PAGE 63

63 parameters are required to reproduce experimental data or quantum mechanical calculation results as much as possible. 2.1.3 Protein Force Field Models Computer simulations of biological molecules often involve thousands of atoms or even more 138 especially when using explicit solvent models. M any simulations on proteins choose to use force field s to reduce computational cost Popular protein force fields include (but are not limited to) AMBER 99SB 139 CHARMM 22 140 GROMOS 96 141 and OPLS force fields. 142 In general, a simple potential energy function like Eq.1 9 is employed in the protein force fields. Protein force field parameters are in general optimized on the basis of small molecules Take the AMBER force field ( Eq. 1 9 ) as an example; there are bon ded and non bonded terms in it In the non bonded terms, the partial charges are fitted to quantum mechanical calculation using Hartree Fock/6 31G* level of theory in vacuum This level of theory typically overestimates dipole moment, and hence the resulti ng partial charges can satisfactorily approximate the condensed phase charge distribution. The Lennard Jone s parameters have been obtained from reproducing liquid properties following the work of Jorgensen et al 142 After the partial charges are assigned, the Lennard Jone s parameters are fitted to reproduce experimental data such as heat capacity, liquid density, and the heat of vaporization. The bond stretching and angle bending pa rameters are derived by fitting to structural and vibrational experimental data of small molecules that make up proteins. The bond and angle parameters should ensure that the geometries of simple protein fragments are close to experimental data. The torsion (dihedral angle) parameters can be obtained from quantum mechanical conformational energy calculations. Determining torsion parameters is often the la st step of force field parameter optimizations. Given

PAGE 64

64 the previous obtained individual energy term parameter sets, the torsion parameters are adjusted to best fit quantum mechanical conformational energies, for example, the Ramachandran plot of a model com pound. Detailed description of the protein force field parameter determinations can be found in the paper of Cornell et al. 143 MacKerell et al. 140 and Hornak et al 139 2.2 Molecular Dynamics (MD) Method 2.2 .1 MD Integrator As mentioned in the introduction, MD samples the phase space utilizing the equation of motion. A trajectory in th e phase space will be generated over time. The e rgodic hypothesis is assumed to be true, that is, the time average of any property at equilibrium is equivalent to the ensemble average. Thus, given a set of initial positions and momenta and a method to comp ute forces, a MD simulation can be applied to any system. For a simple system such as a harmonic oscillator moving along one axis, there exists an analytical solution of the trajectory (the coordinate and momentum as a function of time can be expressed ana the analytical solution of complex systems such as polymers or proteins. Therefore, numerical integrators are implemented to propagate positions and velocities of particles. One of the frequently used int egrator is the leap frog algorithm: 41,144 + = + + 1 2 (2 4 ) + 1 2 = 1 2 + ( ) (2 5 ) = ( ) = ( ) (2 6 ) Here, q and v stand for the position and velocity of a particle respectively; a (t), F (t) and U (t) represent the acceleration, the force an d the potential energy at time t

PAGE 65

65 is the time step used in MD simulation. One frequently employed potential energy function is the force field model introduced in the previous section. According to Eq. 2 4, 2 5 and 2 6 the leapfrog algorithm propag ates positions and velocities in a coupled way. The velocity at time t can be calculated by velocities at + 1 2 and 1 2 by the following equation: = 1 2 + 1 2 + 1 2 (2 7 ) One important issue in the MD propagation is choosing a proper time step that optimizes speed of propagation and accuracy of the simulation. A too small time step will waste simulation time in sampling the same conformation, whereas a too large time step can bring two atoms too close and hence cause instability of the trajectory. In general, a time step is a tenth of the period of fastest motion. In biological molecules, the fastest motion is the bond stretching and bonds with hydrog en atoms in particular. Thus, one way to increase time step without reducing accuracy is to remove the degree of freedom having highest frequency. One commonly employed algorithm to achieve this goal is the SHAKE algorithm. 145 When using the SHAKE algorithm to remove heavy atom to hydrogen DOF the heavy atom to hydrogen bond length is fixed T he fi xed bond lengths act as distance constraints between heavy and hydrogen atom s. Lagrangian multiplier s have been utilized to keep the bond lengths constant By employing the SHAKE algorithm, a large time step such as 2 fs could be used. Methods that can in tegrate the equation of motion more efficiently are popular area of research. 2.2 .2 Thermostats in MD Simulations Before describing thermostats in MD simulations, the concept of thermodynamic ensemble (statistical ensemble) should be introduced first. An e nsemble is a large

PAGE 66

66 amount of replicas of the system of interest (it may contain infinite number of replicas). All replicas in an ensemble are considered at once. Each replica represents the system in one possible state. Thermodynamic ensembles are characte rized by macroscopic thermodynamic properties. Several frequently employed thermodynamic ensembles are microcanonical ensemble (NVE ensemble), canonical ensemble (NVT ensemble), isothermal isobaric ensemble (NPT ensemble) and grand canonical ensemble. simulation conserve the total energy and represent a system in the microcanonical (NVE) ensemble where number of particles ( ) volume ( ) and total energy ( ) are constant However our system of interest is in the canonical (NVT) ensemble in which number of particles ( ) volume ( ) and temperature ( ) are constant. T h erefore maintaining a constant temperature in a MD simulation is necessary Any algorithm that can maintain c onstant temperature and approximate the NVT ensemble is called a thermostat. Popular thermostats include Berendsen thermostat, 146 Langevin dynamics 147 and Nose Hoover thermostat. 148 The Berendsen thermostat and Langevin dynamics are utilized in our MD simulation s and thus explained here. In a MD simulation, the temperature can be written as: = 1 3 2 2 = 1 (2 8) Here N is the number of particles, n is number of constrained degree of freedom, m i and are the mass and velocity of particle i Thus, tempera ture is a function of velocities of all particles. The simplest way to control temperature is to rescale velocity at each time step. However, this will cause discontinuity in the momentum trajectory in phase space.

PAGE 67

67 Berendsen et al introduced a weak coupl ing method to an external heat bath to MD simulations. The heat bath can add or remove heat from the system in order to maintain a constant temperature. T he rate of temperatu re change is governed by Eq. 2 9 : = 1 0 (2 9 ) w here 0 is the temperature of the bath and is the coupling time which indicates the time scale a system r elaxes to target value. By employing a coupling time, the MD propagation can avoid sudden change in velocities. Since temperature is computed from velocities of all the atoms, what t he Berendsen thermostat really does is to multiply all velocities with a s c aling factor (shown in E q. 2 10 ) in order to rescale the current temperature T to the target value T 0 = 1 + 0 1 1 / 2 (2 10 ) By rescaling velocities, the Berend sen thermostat controls the temperature in MD simulations. As mentioned before, the coupling time determines how tightly the system and the heat bath coupled together. A large means the coupling is weak. It takes long time for the system to relax from current temperature to target temperature. As the internal energy will be conserved and the microcanonical ensemble will be restored. If is small, the coupling between the system and the heat bath is strong and the velocity s caling factor is large. However, large velocity scaling factor will cause large disruption in the momentum part of the phase space trajectory. The larger the scaling factor is, the less natural the trajectory is.

PAGE 68

68 Langevin dynamics belongs to the category of stochastic thermostat. 137 It mimics motio n of MD method when using stochastic therm ostat becomes: = 1 + (2 11 ) In Eq. 2 11 and are the velocity, position and mass of particle i respectivel y, U is the potential energy, is the friction coefficient and A(t) is a random force at time t The amplitude of this force is determined by fluctuat ion dissipation theorem (Eq. 2 12 ). ( 1 ) 2 = 2 1 2 (2 12 ) 1 2 is the time correlation of A on particle i at time t 1 with A on particle j at time t 2 is the Boltzmann const ant, T is the temperature, is the Kronecker delta function and 1 2 is the Dirac delta function. Langevin dynamics can be used as thermostat because the equation of motion is temperature dependent via the random force term. 2.2 .3 Pre ssure Control in MD Simulations Most biological experiments are performed in a constant pressure and constant temperature situation (NPT ensemble) Therefore, pressure control techniques (barostats) should be used in simulations to maintain system pressure s and it is done by adjusting the system volumes. Since the number of particles is constant during a simulation, another application of maintaining pressure is to regulate system density which should be at certain appropriate value. A generally employed ba rostat is the Berendsen barostat. 146

PAGE 69

69 The pressure of a sys tem in a simulation is calculated using the virial theorem of Clausius and can be expressed as: = 1 1 3 = + 1 = 1 (2 13 ) In the above equation, P is pressure, N is the number of particles, and T is the temperature. and are the distance and interaction energy between atoms and respectively. Analogous to temperature control, the pressure can be maintained simply by rescaling volume at each time step although the system volume will be disrupt ed too much. Berendsen barostat was developed in order to smooth the change in volume. T he Berendsen barostat, in which the algorithm is the same as Berendsen thermostat, utilizes a pressure bath. The rate of pressure change is governed by following equati on: = 1 0 (2 14 ) where is the coupling constant and 0 is the pressure of the bath. The change in pressure is reflected by a djust ing system volume The coordinates of all particles in the system are scaled by a factor 1 / 3 and is formulated as: = 1 0 (2 15 ) The in the above equation is the isothermal compressibility. It represents the volume fluctu ation caused by pressure change: = 1 (2 16 )

PAGE 70

70 2.3 Monte Carlo (MC) Method 2.3 .1 Canonical Ensemble and Configuration Integral In statistical mechanics, an ensemble is a collection of a very large number of systems and each system is a replica (on a thermodynamic level) of a particular th ermodynamic s ystem of interest If the thermodynamic system of interest has a volume of V N number of particles and temperature T then an ensemble containing a very large number of such systems is called the canonical ensemble. The canonical ensemble is important bec ause it best represent s systems of interest in practice. Because each system of the canonical ensemble is not isolated, the energy of each system is not fixed. Thus, there is a probability of finding a system with energy and the probability distrib ution of systems in the canonical ensemble is the so called Boltzmann distribution (Eq. 2 17 ). = 1 / (2 17 ) Here Q is the part ition function and is essentially a normalization factor. is the quantum energy of a system. = / (2 18 ) In classical mecha nics, the Hamiltonian function H is employed to describe the total energy of a system and can be expressed as where p and q are momenta and positions respectively. In general, the Hamiltonian can be separate d into kinetic energy which depends onl y on momenta and potential energy which depends only on positions. In addition to using the Hamiltonian instead of quantum energy, the energy levels become continuous in the classical limit. Thus, the partition function will be wr itten as an integral.

PAGE 71

71 = ( ) (2 19 ) H ere = 1 / After integrating the kinetic energy term, the partition function has the form of Eq. 2 20 and is called conf iguration integral. = (2 20 ) Thus, the Boltzmann distribution in the clas sical limit is given by Eq. 2 21 : = 1 (2 21 ) 2.3.2 Markov Chain Monte Carlo (MCMC) The definition of Markov chain is crucial to the MCMC methods, so it will be explained first in this section. Conside r a stochastic process at discrete steps ( 1 2 ) for a system that has a set of states ( 1 2 ) with finite size. We define that the system is in state at step The conditional probability of = given that 1 is in state etc is : = | 1 = 2 = 1 = (2 22 ) A Markov process is defined in Eq. 2 22 with the property that the conditional probability of = only depends on its previous state 1 = : = | 1 = 2 = 1 = = = | 1 = (2 23 ) The corresponding sequence of st ates ( 1 2 ) is called a Markov chain. The conditional probability = | 1 = is essentially the transition probability from state to and is denoted as Based on the probability theor y, a transition probability has the properties 0 and = 1 Thus, t he probability of = can be written as: = = = | 1 = 1 = = 1 = (2 24 )

PAGE 72

72 A change in = with respect to step is governed by the master equation: = = = + = (2 25 ) At equilibrium (or under steady state approximation) it is clear that = should not change with steps This leads to: = = = (2 26 ) Since the Markov chain introduced above possesses discrete and finite number of states, the transition probability can be described as a matrix, which is called the transition matrix. The th element of the transiti on matrix represents The probability distribution can be represented by a row vector. Multplying a probability distribution with transition matrix will generate a new probability distribution. If a Markov chain is time homogeneous (the definitio n of time is essentially a step due to the stochastic nature of a Markov chain) the elements of transition matrix are constants (time independent). W hen a probability distribution vector is not changed by multiplying with the transition matrix, the distri bution is said to be stationary. At equilibrium, the elements of the transition matrix are independent of time. The equilibrium distribution is an eigenvector of the transition matrix with an eigenvalue of 1. Hence, multiplying equilibrium probability dist ribution with transition will not change it. Properties of a Markov chain include: a Markov chain is irreducible, if all states communicate with each other; a Markov chain is called aperiodic, if number of steps needed to move between two states is not per iodic ; it is positive recurrent, if the expectation value of the return time to a state is finite. Th e se properties are closely related to the ergodicity of a Markov chain.

PAGE 73

73 The MCMC methods are Monte Carlo sampling s from a probability distribution by employing a Markov chain whose equilibrium probability distribution is the intended probability distribution States sampled by Monte Carlo method form a Markov chain. The transitions in MCMC must satisf y the detailed balance equation: = = = (2 27 ) A Markov chain is said to be reversible when it satisfies the detailed balance equation 2.3 .3 The Metropolis Monte Carlo Method In 1953, Metropolis et al 50 proposed an algorithm to sample the phase space of a system at equilibrium by the MC method. According to the Metropolis algorithm, at configuration i a new configuration j is chosen, both configurations are weighted by Boltzmann distribution ( Eq. 2 21 ) and the detailed balance condition ( Eq. 2 27 ) is employed to evaluate the transitions (MC moves) between configurations, = ( ) (2 2 8 ) In the above equation, ( ) is the Boltzmann weight of configuration i and ( ) is the transition probability from configuration i to j Inserting Eq. 2 21 into Eq. 2 28 and r earranging Eq. 2 28 yields: ( ) ( ) = ( ) ( ) = ( ) = e (2 29 ) A nd the transition probability from configuration i t o j can be written as: = 1 (2 30 ) In practice, the new configuration is accepted if 0 However, if > 0 a random number between zero and one is generated and is compared with If the random number is less than or equal to then the new configuration is accepted. Otherwise,

PAGE 74

74 the current configuration is kept and is added to t he configuration ensemble. This accept/reject criterion is the so called Metropolis criterion. The MC sampling with the Metropolis criterion generates a Markov chain whose equilibrium PDF is the Boltzmann distribution. Compare the Metropolis MC with MD, MC method simulates a system in the canonical ensemble withou t controlling temperatures; the bottleneck of MC sampling is the potential energy difference while the bottlene ck of MD is the energy barrier. 2.3.4 Ergodicity and t he Ergodic Hypothesis In statistical mechanics, ergodic (adjective of ergodicity) is a wo rd used to describe a system which satisfies the ergodic hypothesis. T he ergodic hypothesis states that over a long period of time, the time average and the ensemble average of a property should be the same. In our simulations, the ergodic hypothesis is of ten assumed to be true. Ergodicity breaking (the ergodic hypothesis does not hold) often means that the system is trapped in a local region of the phase space. One example when the ergodic hypothesis does not hold is the spontaneous magnetization of a ferr omagnetic system below Curie temperature. The ensemble average of net magnetization is zero since spin up and spin down are degenerate states and the population of either states should be the same. However, a net magnetization exists when temperature is be low Curie temperature. Ergodicity is often discussed in a Markov chain. A Markov chain is called ergodic when all its states are irreducible, aperiodic and have positive recurrent. 2.4 Solvent Models Because proteins are stable and perform their functions in condense d phase, especially in aqueo us solution, representing the so lvation effect is of great importance. One frequently used solvent model in MD simulations is the water model. Two ways of representing aqueous solution are present here: the explicit a nd the implicit solvent

PAGE 75

75 models. As its name indicat es the explicit water model employs water molecules in the simulation and the implicit water model treats w ater as a dielectric continuum. 2.4 .1 Explicit Solvent Model Different types of water molecules s uch as SPC /E 149 TIP3P 150 and TIP4P 150 are developed. Water molecules parameters are fitted to bulk water properties such a s density heat of vaporization, and dipole moment. 15 0 The density of liquid water is an important physical quantity to check the water models. The density of liquid water shows a maximum at 4 C and water models should correctly reflect this. TIP3P failed to achieve that, while TIP4P and TIP5P 151 and their variants were able to repr oduce this trend. Take the TIP3P and TIP4P water models as examples. A simple diagrammatical description of TIP4P and TIP4P water models are shown in Figure2 2 The TIP3P water model has one oxygen atom and two hydrogen atoms. The geometry of TIP3P water i s the same as experimental geometry with OH bond length of 0.9572 and HOH angle of 104.52 Only oxygen a tom has a van der Waals radius. Thus, the van der Waals interactions only occur among oxygen atoms. Partial charges are placed on oxygen atom and hyd rogen atoms. The partial charge on the oxygen atom is 0.834 and the partial charge on each hydrogen atom is 0.417 where is the charge of an electron. When computing interactions (Coulomb interaction and Lennard Jone s interaction) between two TIP3P water molecule, there are 3 3=9 distances needed to be calculated. The TIP4P water model, as its name implies, has four sites. Similar to the TIP3P water model, experimental geometry (bond length and bond angle) is also adopted in the TIP4P model. The onl y atom in the TIP4P molecule having the van der Waals interaction is oxygen too. However, for the TIP4P model, the negative partial

PAGE 76

76 charge is located on the fourth site instead of being placed on the oxygen atom, as in the TIP3P model. The use of the fo urth site carrying negative charge is able to improve electrostatic properties of water such as dipole moment. The positive partial charges are still placed on hydrogen atoms. The new partial charges are 1.04 and 0.52 New Lennard Jone s potential para meters have also been employed for the TIP4P water model to achieve better fitting results. Computing the interactions between a pair of the TIP4P molecules requires knowing 9 distances for electrostatic interactions and 1 distance for the Lennard Jone s po tential Therefore, using TIP4P model in a simulation will be computationally more expensive than using TIP3P model. For a five site water model such as TIP5P, 17 distances are needed in order to calculate water water interactions. When simulating a mole cule with explicit water molecules, the periodic boundary condition (PBC) is utilized in order to mimic reality. 152 Otherwise, water molecules evaporate into vacu um. Ewald summation 153 or Particle Mesh Ewald (PME) summation 154 is employed to compute the long r ange electrostatics efficiently when the PBC is employed. One advantage of employing the expli cit water model is that the solvent solute interaction can be represented. For example, studying the hydrogen bonding between water molecules and proteins requires using the explicit water model However, it suffers from computational cost. CPU time is app roximately proportional to number of inter atomic interactions.

PAGE 77

77 A B Figure 2 2. A d iagrammatic description of TIP3P and TIP4P water models. A) TIP3P model. The red circle is oxygen atom and the black circles are the hydrogen atoms. Experimental bond length and bond angle are adopted. B) TIP4P model. Oxygen and hydrogen atoms are labeled with same color as in the TIP3P model. TIP4P model also employs the experimental OH bond length and HOH bond angle. Clearly, the fourth site (green circle) which carries negative partial charge has been added to the TIP4P model. 2.4 .2 The Poisson Boltzmann (PB) Implicit Solvent Model An alternative way of representing solvation effect is to reproduce the PES after a molecule is dissolved in solvent. The solution p hase potential energy of a molecule can be computed by adding solvation free energy to the gas phase potential energy. Given the correct solution phase PES, correct forces can be generated for the equation of motion. Thus, the key issue is finding the accu rate free energy of solvation. A dielectric continuum model can be employed to calculate free energy of solvation. In the dielectric contin uum model, the free energy (work) of assembling a charge distribution is expr essed as : = 1 2 (2 31 ) H ere r is the charge density of the molecule and r is the electrostatic potential. The Poisson Boltzmann model utili zes the Poisson Boltzmann equation to describe the electrostatic potential as a function of charge density. In practice, the

PAGE 78

78 linearized PB equation ( Eq. 2 32 ), which utilizes the first order truncation of Taylor series expansion of the hyperbolic sine, is often employed. = 4 + 2 (2 32 ) In the above equation is the dielectric constant, is a switching function which is zero when electrolyte is inaccessible an d otherwise one, and 2 is the Debye Hckel parameter. For simple cases such as spherical charge distributions, the solutions to PB equation are analytical and simple. Consider dissolving a sphere with charge and radius and the charge is uniform ly distributed on the surface. The charge density on the surface can be expressed as: = 4 2 (2 33 ) Here is any point on the surface. From outside of the sphere, the electrostatic potential at is calculated by : = (2 34 ) Integrating the right hand side of Eq. 2 31 from in finity to with Eq. 2 33 and Eq. 2 34 will yield = 2 2 The free energy of solvation is the difference between gas phase and solution phase free energies. Thus, it can be written as: = 1 2 1 1 2 (2 35 ) This is the so called Born equation and is the basis of the generalized Born (GB) method which will be introduced later.

PAGE 79

79 For complex systems such as proteins, there is no analytical solution t o the linearized PB equation. 73 Therefore, this equation is solved iteratively until self consistent is achieved for the charge density and electrostatic potential. 2.4 .3 The Generalized Born (GB) Implicit Solvent Model Solving the linearized PB equation is computationally expensive. An approximate method to the PB implicit solvent model is pr oposed as the GB method. 39,117 Using the GB implicit solvent can greatly shorten the simulation time which makes the GB frequently employed in molecular simulations. Similar to Eq. 2 35 t he free energy of solvati on in the GB method is given by : = 1 2 1 1 (2 36 ) Here and are charges on nuclei and i s calculated by: = 2 + 2 4 1 2 (2 37 ) H ere is the effective Born radius of charge and is the distance between the two charges. Another approximation in the GB method is the Coulomb field approximation. 40 This approximation estimates t he effective Born radius by integrating the energy density of a Coulomb field over the molecular volume. The integral is often evaluated numerically. One should notice that the G B theory involves two approximations to reproduce the PB results. The first approximation contains Eq. 2 36 and 2 37 The second one is the Coulomb field approximation. Further approximations are often introduced to reduce the time computing the effective Born radii in practice The p air wise approximation 155 is often applied. In this approximation, t he van der Waals radius

PAGE 80

80 of an at om and a function dependent on positions and the van der Waals radii of atom pairs are utilized to compute the effective Born radius 2.5 p K a Calculation Methods 2.5 .1 The Continuum Electrostatic (CE) Model The basic idea of the CE model is also given in F igure1 6. Since computing the p K a value of an ionizable residue in a protein directly is difficult (breaking a bond plus dissolving all species into water) a model compound is utilized and the p K a shift is calculated via the thermodynamic cycles shown in Figure 1 7 and Figure 1 8 Like the MDFE calculations, the CE model also compute s the p K a value of an ionizable residue relative to its intrinsic value (or model compound value according to the defi nition of Bashford and Karplus ; the definition of the intr insic p K a can be found in section 1.3 ). The p K a value of an ionizable residue is written as: = + 1 2 303 (2 38 ) In the above equation, is the intrinsic p K a value of an ionizable residue and can be found in Table 1 1. and is the free energy difference between protonated and deprotonated species for that ionizable residu e and its reference compound (the reference compound utilized in the CE model is an isolated ionizable residue with two ends capped and fully exposed to aqueous environment.) respectively. Eq. 2 38 is essentially the same as Eq. 1 20. The difference betwe en MDFE methods and the CE model is how the free energy difference s between the protonated and deprotonated species on the right hand side of Eq.1 20 are generated MDFE methods compute the two free energy differences via free energy calculation algorithms while the CE model calculates them via FDPB method. In this

PAGE 81

81 continuum electrostatic model, proteins are considered as low dielectric regions surrounded by high dielectric continuum representing water. Protonation is represented by adding a unit charge to the ionizable site. In the continuum electrostatic model, and are assumed to differ only in their electrostatic contribution s. This assumption will result in the cancellation of non electrostatic free energy contributions. Thus, calculating the electrostatic work of charging a site in the ionizable residue and in the reference compound from zero to unit charge is required. This electrostatic work can be further decomposed into three terms. For any ionizable site in a fixed protein structure, the electrostatic work consists of three terms: the Born solvation free energy ( ), the background free energy which is the interaction of the ionizable site with non ionizable charges ( ), and the interaction with other ionizable sites ( ). For the reference compound, only the first two terms exist. Thus, can be written as: = ( ) + ( ) + ( ) (2 39 ) A nd can be written as: = ( ) + ( ) (2 40 ) Li n earized PB equation (described in S ection 2.3.2) is solved for electrostatic potentials using finite difference method. For an i onizable site the Born sol vation is determined by Eq. 2 35 The background free ene rgy is calculated using Eq. 2 41 : = ( ) (2 41 ) H ere is non ionizable partial charge and ( ) stands for the electrostatic potential produced at by a unit charge place at The electrostatic interaction with

PAGE 82

82 other ionizable sites can also be evaluated by Eq. 2 41 except that charges on ionizable sites must be used. After computing all components on the right hand sides of Eq. 2 38 and Eq. 2 3 9 the p K a of ionizable residue will be obtained. To produce a titration curve, a protein containing N ionizable resi dues is considered here. Each ionizable residue has two states: protonated and deprotonated. Thus, there are 2 N numbers of macro states for that protein. Each macro state can be represented by a vector =( 1 2 ), whose elements is 0 or 1 accor ding to whether ionizable site is deprotonated or protonated. The free energy of relative to the vector whose components are all zero (this is equivalent to the free energy change when charging the non zero components in the vect or) is given by Eq. 2 42 : = = 1 + 1 2 ( 0 + ) ( 0 + ) = 1 = 1 (2 42 ) H ere = + ( ) for ionizable site is the electrostatic interaction between unit charges at ionizable site and and 0 is the charge of site when it is in the deproto nated state. Thus, which is the fraction of protonation of site can be written as ( Eq. 2 43 ): = 2 303 ( ) 2 303 ( ) (2 43 ) H ere = 1 / and ( ) is the number of non zero components in Summing up individual will generate a titration curve of the entire protein. 2.5 .2 Free Energy Calculation Method s As mentioned previously, the p K a value is prop ortional to the standard free energy of reaction. Therefore, free energy calculation methods can be employed to compute the p K a value of ionizable residue one is interested in. In this section, two frequently

PAGE 83

83 used free energy calculation methods: thermodyn amic integration (TI) 156,157 and free energy perturbation (FEP) 158 are described. Both TI and FEP belong to the so called or equilibrium method and can be employed to compute the free energy difference between two states. In other words, each transition sho uld be reversible. In the TI method, initial state A (having potential energy where is the molecular structure ) and final state B (having potential energy ( ) ) are connected by a reaction coordinate (this reaction coordinate doe s not necessarily have any physical significance) The simplest scheme of constructing the potential energy as a function of is: = 1 + (2 44 ) Slo wly transforming from zero to one converts state A to B ; the intermediate values of correspond to a mixed system without physical meaning. The Helmholtz free energy in the canonical ensemble (or the Gibbs free energy in the isothermal isobaric ensemble) is formulated as: = ln = ln (2 45 ) where is the partition function and is the configuration integral. From now on, our derivation wi ll focus on the canonical ensemble and the Helmholtz free energy but can be extended to isothermal isobaric ensemble and the Gibbs free energy in the same manner (this statement also holds when the free energy perturbation method is described later). Follo wing Eq. 2 45 the Helmholtz free energy as a function of is: = ln = / (2 4 6 ) Here, is the potential energy function and is molecular structure. T he free energy difference can be written as :

PAGE 84

84 = = 1 0 (2 4 7 ) Then, = ln = 1 (2 4 8 ) Plugging the explicit form of configuration integral into the derivative leads to: = / = / (2 4 9 ) / = / 1 / (2 50 ) Therefore, 1 = 1 / 1 / (2 51 ) Since the integration is over coordinate space, the configuration integral can be moved into the integral. Eq. 2 51 now becomes: = 1 = / (2 52 ) The first term in the integrand is the Boltzmann weight factor Rewriting Eq. 2 51 yields: = = (2 53 ) Thus, the final form of is: = 1 0 = 1 0 (2 54 ) In both Eq. 2 53 and 2 54 the bracket represents an ensemble average generated at In p K a calculations, state A (or B ) represents the protonated species and the other represents the deprotonated species. Each intermediate value correspond s to a mixed protonated and deprotonated state, without any physical meaning. When

PAGE 85

85 classical force fields are applied, the proton becomes a dummy atom in the deprotonated state but retains its position and velocity in the protein (or model compound). Furth ermore, state A and B only differ in char ge distribution s Dissociation f ree energy can be computed using methods of numerical integration (such as trapezoidal rule or Gaussian quadrature) to treat Eq. 2 54 As explained in the previous chapter, the quantu m mechanical contributions to the proton dissociation free energy are assumed to be the same for protein and the model compound. Therefore, subtracting dissociation free energy of model compound from that of protein will yield the p K a shift relative to the p K a value of the model compound. The FEP method which was initially introduced by Zwanzig in 1954, 158 is another frequently employed free energy calculation method. Consider two state s ( A and B ) with partition functions and respectively, and the Helmholtz free energy and respectively. The free energy difference from A to B can be expressed as: = = ln / (2 55 ) Suppose the configuration integrals are adopted instead of partition functions T he potential energy function of state A and B is and respectively where is the molecular structure. Thus, = ln / = ln ( ) / / (2 56 ) According to Zwanzig, can be written as the sum of and a perturbation term = + (2 57 ) = ln + / / (2 58 ) = ln / / (2 59 )

PAGE 86

86 The Boltzmann weight factor of state A has the form: = / / (2 60 ) Therefore, = ln / = ln / (2 61 ) The bracket with subscript A stands for the ensemble average performed on the structural ensemble generated from state A Substituting with Eq. 2 61 becomes: = ln / (2 62 ) In order to compute one simulation of state A is performed. Once a configuration is generated, the potential energy difference at configuration is computed. The ensemble average of / can be calculated easily and hence, is obtained According to Eq. 2 62 if the potential energy difference between the two state s (perturbation) is too large, the free energy difference given by FEP calcul a tion can be unreasonably large Thus, FEP calculations cannot accurately reflect the true free energy difference of large changes in Hamiltonian (basically, potential energy) Only similar Hamiltonians contributes to the free energy difference. In order t o compute the free energy difference between two very different systems (such as calculating free energy difference from benzene to toluene), intermediate systems mixing the two very different systems (end points) are adopted in such a way that the differe nces between neighbors can be treated as perturbations. To be specific, a coupling parameter can be adopted in the same fashion as TI. The sum of free energy difference between intermediate systems (each intermediate state has a specific coupling parameter ) will be the targeted free energy difference.

PAGE 87

87 In practice, computing (forward free energy difference) is equally easy (or hard) as computing (backward free energy difference) and one is exactly the opposite of the other in p rinciple. Evaluation of forward and backward free energy difference s provides an indication of convergence. The Bennett Acceptance Ratio (BAR) method 159,160 is a frequently employed scheme to reduce sampling bias a nd statistical error. In 1985, Jorgensen et al 161 calculations in order to reduce the computational cost. The double wide FEP can be explained by the following example. Suppose is to be computed. Instead of performing two MD simulations at and only one MD simulation at + 2 is conducted. The + 2 and + 2 are calculated then the objective free energy difference can be obtained. If configurations of each MD simulation are taken in order to compute the conventional FEP scheme requires 4 potential energy calculations, while doub le wide FEP only requires 3 2.5 .3 Constant pH MD M ethods As described in the previous chapter the constant pH MD methods want to describe protonation equilibrium correctly at a given pH value. The constant pH MD models sample protonation state spac e explicitly, along with the sampling of conformational space. In practice, two protonation state sampling schemes have bee n developed. One scheme utilizes a binary protonation state space: only the protonated and deprotonated states are defined. MC steps have been performed periodically during MD propagations, which sample the conformational space. At each MC step, a new

PAGE 88

88 protonation state is selected and the free energy difference between the old and new states is computed. The Metropolis criterion is the applied to evaluate the MC move. Since a binary protonation state space is adopted, this scheme is generally called the discrete protonation state model. The other scheme employs a continuous protonation state space. Not only the completely protonated and deprotonated species are defined, fractional protonation states also exist in the simulation. The MD propagations sample both conformational and protonation state space. The latter scheme is named continuous protonation state model. In this section, the CP HMD model developed by the Brooks group and two discrete protonation state constant pH MD methods developed by Baptsta et al and by Mongan et al. are described to provide a brief overview. In the CPHMD method, Lee et al 114 applied dynamics 116 to the protonation coordinate and used the Generalized Born (GB) implicit solvent model They chose a variable which is bound between 0 and 1, to control protonation fraction = 0 represents an ionizable residue in its protonated state, while = 1 corresponds to the deprotonated ionizable residue. Due to its continuous nature, = 0 and = 1 are rarely sampled. Thus an arbitrary value is adopted such that any value smaller than is defined to be protonated, while any is greater than 1 is set to be deprotonated. To ensure an unbounded reaction coordinate is practically used, a new coordinate is introduced and is propagated in a MD simulation. is expressed as: = 2 ( 2 63 ) A n artificial potential barrier betwe en the protonated and deprotonated states has been introduced The potential is a biasing potential to increase the residency time

PAGE 89

89 close to protonation/deprotonation states and it is centered at half way point of titration ( =1/2). The formula of this bia sing potential used by Lee et al. is = 4 1 2 2 ( 2 64 ) w here is an adjustable parameter controlling the height of the biasing potential. A valu e o f 1.25 kcal/mol is found enough to provide occupation time in the protonated and deprotonated states. The total potential of the system, which provides the forces for MD propagation, has the form: = + + + + + + + + + = 1 ( 2 65 ) Here, the fir st five terms are essentially defined by Eq.1 9 is the GB solvation free energy which will be explained in the next chapter. is the energy related to surface accessible areas. in Eq. 2 65 represents an ionizable residue. is a potential of the mean force (PMF) in the titration coordinate for a model compound. The shown in Eq. 1 1 7 can be represented by = 0 = 1 The in Eq. 2 65 is fi t to a two parameter parabolic function having the form = 2 2 = 2 303 2 which is the chemical potential of adding a fractional proto n to the solution at pH. The term + is essentially the quantum mechanical dissociation free energy of a fractional proton. The CPHMD method also assumes Eq. 1 1 8 is true. Another feature of the CPHMD method is using an extended Hamiltonian. A kinetic energy term of titration coordinate is employed in CPHMD:

PAGE 90

90 = 1 2 2 = 1 ( 2 66 ) The fic titious mass controls the speed of response of the protonation state c hange to the force on it. Baptista et al 118 proposed that MD simulations incorporating protonation state change is essentially a semigrand canonical ensemble. The joint PDF can be written as: , = , , (2 67 ) Here, is the momenta and coordinates of solute, respecti vely. and is the momenta and coordinates of solvent, respectively. is the vector containing protonation state information of each ionizable residue. The details of is explained in the continuum electrostatic model. is essential ly the number of protonated ionizable residues. is the chemical potential of protons and = 1 / The Hamiltonian contains quantum mechanical and classical force field terms. The quantum mechanical part in their model is assumed not to depend on coordinates and momenta. The introduction of dummy atom to replace the proton in a deprotonated residue makes kinetic energy only a function of momenta. Two conditional samplings have been considered by Baptista et al. : one is conformational sampling unde r a fixed protonation state, the other one is protonation state samping under a fixed structure. The PDF of conformations at fixed protonation state is: | = , , (2 68 ) w here is the classical Hamiltonian. Due to the fact that quantum mechanical Hamiltonian depends only on protonation state, which is fixed in conformational

PAGE 91

91 sampling, the quantum contribution is a constant and is canceled. The PDF of protonation states at fixed coordinates is given in Eq. 2 6 9 : | = 2 303 2 303 (2 69 ) w here is the free energy of a protonation state relative to the completely deproto nated state. In their model, FDPB based method is executed to calculate free energy difference. Combining the two conditional sampling s, one is able to generat e an ensemble satisfying Eq.2 6 7 In order to prove the above statement, one must show the Markov chain constructed by transition matrix and the two conditional probabilities satisfies the following condition, = lim (2 70 ) In the above equation, is the joint PDF as defined in Eq. 2 6 7 is a joint PDF depend on the same variables as and i s tra nsition matrix. Proving Eq. 2 70 holds means that one must prove the Markov chain defined by and is ergodic. In order to prove a Markov chain is ergodic, one needs to prove (a) the Markov chain is irreducible; (b) the chain needs to be aperiod ic; (c) the transition matrix elements are time independent; and (d) the limiting distribution should be stationary. The detailed proof is given by Baptista et al. in their 2002 paper. Their proof justified the discrete protonation state constant pH method which samples conformational space at fixed protoation state and samples protonation state at fixed structure. In 2004, Mongan et al 127 proposed a constant pH MD method and implemented in the AMBER suite This algorithm follows the scheme proposed by Baptista et al 118 but employs the GB model in both MD and MC. Given a protein with N titratable sites, the

PAGE 92

92 discrete protonation state model means protonation states of a protein are described by a vector =( x 1 x 2 x N ) where each x i is some integer representing the protonation state of titratable residue i In AMBER, five amino acids are designed to be titratable: aspartate, glutamate, histidine, lysine and tyrosine. For each titratable residue, diffe rent protonation states have different partial charges on the side chain. This model also includes syn and anti forms of protons for the aspartate and glutamate side chains as At each Monte Carlo step, a titratable site and a new protonation state for that site are chosen randomly and the transition free energy at this fixed configuration is used to evaluate the MC move. Considering a titratable site A in a protein environment, its protonated form is protA H and deprotonated form is protA The equilibrium between the two forms is governed by their free energy difference. This free energy difference is the ensemble average of different configurations. However, the free energy difference cannot be computed b y a molecular mechanics (MM) model since the transition between two forms deals with bond breaking/forming and solvation of a proton which involves quantum mechanical effects. The above problems can be solved by using a reference compound. The reference co mpound has the same titratable side chain as protA H but with known p K a value ( ). Following Mongan et al., we assume the transition free energy can be divided into the quantum mechanics (QM) part and the molecular mechanics (MM) part. We further assume that the quantum mechanical energy components are the same between the reference compound and the protA H. Since the p K a of the reference

PAGE 93

93 compound is known, its transition free energy from deprotonated form to protonated form at a given pH is: = ln 10 (2 71 ) So the QM component of the transition free energy can be expressed as: = (2 72 ) H er e is the molecular mechanics contribution to the free energy of protonation reaction for that reference compound. In practice, the QM component of the MM component. Since the approximation of the QM component of the transition free energy is: = (2 73 ) T hen the transition free ene rgy from protA to protA H can be calculated as: = ln 10 + (2 74 ) Here is the molecular mechanics contribution (electrostatic interactions in nature) to the free energy of the protein titratable site. Hence, by using a reference K a relative to the K a can also help cancelin g some error introduced by GB so lvation model through the use of In AMBER, a reference compound is a blocked dipeptide amino acid possessing titratable side chain (for example a cetyl Asp methylamine). Five reference compounds were constructed corresponding to five titratable residues. The values of for each reference compound are obtained from thermodynamic integration calculations at 300 K and set as internal parameters in AMBER. The is calculated by taking the difference between the potential energy

PAGE 94

94 with the charges of the current protonation state and the potential energy with the charges of the new protonation state. If the transition is accepted, MD steps are carried out to sample conformational space in the new protonation sta te. If the MC attempt is rejected, MD steps are also carried out with no change to the protonation state. 2.6 Advance d Sampling Methods Conformational sampling in a MD or MC simulation is essential in the study of complex systems such as polymers and prote ins. One major concern is that the PES of a complex system is very rugged and contains a lot of local energy minima. Thus, kinetic trapping would occur as a result of the low rate of potential energy barrier crossing, especially when the barrier is high. T o overcome this kinetic trapping behavior, generalized ensemble methods can be employed in molecular simulations. As its name implies, a generalized ensemble method differ from the canonical ensemble method in the weight factor of a state. T he weight facto r in the canonical ensemble is Boltzmann weight. However, a non Boltzmann weight factor can be used in a generalized ensemble method (This does not mean that Boltzmann factor is prohibited in a generalized ensemble method. In fact, parallel tempering which belong to the family of generalized ensemble method, does adopt Boltzmann factor.). By choosing a non Boltzmann weight factor, the system is able to perform a random walk in the potential energy space. Thus, potential energy barriers will be overcome easi ly and more conformations will be visited. Frequently utilized generalized ensemble algorithms include the multicanonical (MUCA) method and replica exchange molecular dynamics (REMD) met hod. In this section, the MUCA and parallel tempering will be introduc ed briefly. Due to the importance of REMD method to this dissertation, t he details of REMD method will be explained in the next section

PAGE 95

95 2.6.1 The Multicanonical Algorithm (MUCA) In canonical ensemble, the probability of visiting a state in the energy spac e is: / (2 75 ) Here, is the density of states (DOS), which means the number of states between and + / is the Boltzmann factor. As potential energy increases, the Boltzmann factor decreases but the DOS increases rapidly. A bell shaped probability distribution function (PDF) of can be observed. However, in the MUCA method, 54,55,137 the PDF is designed to be flat (a constant), although it still can be written in the form of Eq. 2 76 : = ( 2 76 ) whe re is the multicanonical weight factor and is DOS. The multicanonical weight factor needs to be inversely proportional to the DOS in order to generate a flat PDF. However, the DOS of a system is in general unknown, which makes the multicanonical weight a priori unknown. Generating correct distribution of is the central task of a MUCA simulation. In practice, short simulations are performed in order to determine the DOS in an iterative manner. Details of determining the DOS ca n be found in the paper of Okamoto and Hansmann published in 1995. 162 After the DOS is resolved, the canonical ensemble PDF will be ontained. Thus, the average of any quantity can be determined by Eq. 1 11 or Eq. 1 12 depending on either MD or MC simulation is performed. Another way to explore the DOS is by using the W ang Landau algorithm. 163,164 In the Wang Landau algorithm, the DOS is recorded by a histogram ( ) and initially set to unity for all its elements. Another histogram which is called visit histogram is also

PAGE 96

96 constructed with initial values set to zero. The visit histogram represents the number of visits to each energy level. Monte Carlo moves are m ade. Instead of being evaluated by the Metropolis criterion, they are evaluated by the DOS, = 1 ( 2 77 ) where is the tran sition probability from state to state Each time an energy level is visited, the corresponding element of the DOS histogram is updated by multiplying the current value with a modification coefficient that is greater than 1. The initial value of the modification coefficient is 0 = 2 71828 Every time a MC move is performed, the corresponding element of the visit histogram is also updated. The MC moves will continue until the visit histogram is flat. At this stage, the DOS are converged. In order to achieve a finer convergence, a second round of the above process will be performed. This time, the modification coefficient 1 in the second round is given by 1 = 0 The visit histogram is then reset to zero. This process will iterate un til a modification coefficient that is approximately 1 is achieved (in the paper of Wang and Landau, the final value of the modification coefficient is 1.00000001). By utilizing Wang Landau algorithm, the DOS will be obtained and a random walk in the poten tial energy space will be achieved. 2.6.2 Parallel Tempering In 1986, Swendsen and Wang firstly performed parallel tempering (replica exchange MC) simulations to investigate spin glass. 59 Multiple non interacting copies (replicas) of the system are simulated at different temperatures. At each temperature, MC simulation is conducted to sample the conformational space. Structures or temperatures of the two replicas are attempted t o be exchanged periodically. The

PAGE 97

97 detailed balance condition is applied and the weight factor of the state is the Boltzmann weight factor. The Metropolis criterion has been utilized to accept or reject the move. Hansmann et al 58 first utilized the parallel tempering algorithm in the study of a biomolecule (7 residue Ket enkephalin) Other application s of the parallel tempering algorithm include X ray s tructures determination performed by Falcioni and Deem. 165 A MC simulation at a high temperature accepts the transiti on attempts more often than doing that at a low temperature. Thus, the simulation at high temperatures tends to visit more conformations in conformational space. Exchanging structures with replicas at lower temperatures can help them avoid getting trapped in the conformational space. The acceptance ratio, which is the averaged fraction of successful exchange attempts, is an important issue in the parallel tempering method. It is correlated with temperature distribution of replicas. According to Kofke 166 the acceptance ratio is the area of overlap between the potential energy PDF at two temperatures Given the number of replicas, if the temperatures of the two replicas are too different, the overlap between the two potential energy PDFs will be small. Therefore, accepting an exchange attempt is unlikely, which makes parallel tempering simulation inefficient. However, if the temperatures of the two adjacent replicas are too close, the overlap between two PDFs will be large, and hence the acceptance ratio will be large. But the conformational space sampled by two adjacent replicas will be too close. More replic as than actually needed are utilized to achieve the same goal and hence computer resource is wasted 2.7 Replica Exchange Molecular Dynamics (REMD) Methods Due to the correlation between conformation and protonation sampling, correct sampling of protonati on states requires accurate sampling of protein conformations. Hence, generalized ensemble methods such as multicanonical algorithm and REMD

PAGE 98

98 should be used to avoid kinetic trapping which comes from low rates of barrier crossing in constan t temperature MD simulations. REMD has been applied to the continuous protonation state constant pH method (REX CPHMD) by Khandogin et al 110 113 They have performed REX CPHMD simulations to predict p K a values 110 and to explore pH dependent protein dynamics. 111 113 The REMD, which is the MD version of parallel tempering, have been developed by Sugita and Okamoto in 1999. 62 The theory of REMD is essentially the same as parallel tempering. In their method, tempe ratures are attempted to be exchanged. This leads to the unique part of REMD: the treatment of velocities after accepting an exchange attempt, because the velocities must reflect the temperature correctly. Sugita and Okamoto proposed to rescale the velocit ies in order to recover the desired temperature when temperatures are swapped. Similar to other generalized ensemble methods, REMD algorithm wants to make the system perform a random walk in either temperature or potential energy space, and hence avoid kin etic trapping. The advantage of REMD over other generalized ensemble method is that the weight factor is Boltzmann weight which is a priori known. This advantage makes REMD very frequently employed in the MD simulations of complex systems. The REMD algorit hm has been applied to studies of peptides, proteins, protein membrane system in order to describe free energy landscape, amyloid formation, structure prediction and binding. Many extended versions such as solute tempering REMD 167 and structure re servoir REMD 168 170 have been proposed to improve the performance of REMD algorithm. The REMD variants will be briefly explained later in this section.

PAGE 99

99 2.7 .1 Temperature REMD (T REMD) A thorough description of the T REMD algo rithm can be found in the original paper of Sugita and Okamoto. 62 In T REMD, N non interacting copies (replicas) of a system are simulated at N diffe rent temperatures (one each). Regular MD is performed and periodically an exchange of configurations between two (usually adjacent) temperatures is attempted. Suppose replica i at temperature T m and replica j at temperature T n are attempting to exchange; t he following satisfies the detailed balance condition: = ( ) ( 2 78 ) Here is the transition probability between two states i and j a nd P m ( i ) is the population of state i at temperature m (in R EMD assumed Boltzmann weighted). Since, / (2 79 ) w here i s the Hamiltonian of the state, represents the molecular structure, and stands for momentum. The Hamiltonian consists of kinetic energy ( K ) and potential energy ( U ) terms and can be written as: = + (2 80 ) In the original derivation of exchange probability, Sugita and Okamoto mentioned that exchanging two replicas (states) is equivalent to exchanging temperatures. T he momenta of each replica after e xchange attempt need to be rescaled: = / (2 81 ) = / (2 82 ) After inserting Eq. 2 7 9 and Eq. 2 80 into Eq. 2 78 the detailed balance equation becomes:

PAGE 100

100 + / + / = + / + / ( ) (2 83 ) According to Eq. 2 81 and Eq. 2 82 = / (2 84 ) = / (2 85 ) Therefore, kinetic energy contributions on both sides of Eq. 2 83 will be canceled out, leaving only potential energy terms contribute to exchange probability. ( ) = / / / / (2 86 ) Further manipulation of Eq. 2 86 yields: ( ) = 1 1 If the Metropolis criterion is applied, the exchange probability is obtained as: = 1 1 1 ( 2 87 ) If the exchange attempt between two replicas is accepted, the temperatures of the two replicas will be swapped and velocities r escaled to the new temperatures by multiplying all the old velocities by the square root of the new temperature to old temperature ratio: = (2 88 ) Here and are the new and old velocities, respectively. and are the temperatures after and before an exchange is accepted, respectively. The acceptance ratio is the average valu e of the exchange probabilities between two temperatures :

PAGE 101

101 = 1 1 1 (2 89 ) For a given system, the potential energy function is i ndependent of temperature but the potential energy PDF in a canonical ensemble depends on temperature. T he potential energy PDF can be considered as a Gaussian function ( to the second order truncation of the Taylor expansion of the PDF at the potential ene rgy value corresponding to maximum probability). The Gaussian is centered at mean potential energy of the system with a variance 2 = 2 where is the heat capacity. At this stage, the Gaussian function expression of the potential energy PDF is not adopted. It will be employed later in this section. T he potential energy PDF at temperature is curren tly written as: = 1 / (2 90 ) w here is the DOS and the exponential term is the Boltzmann weight factor as a function of potential ene rgy. Recall that in the probability theory, the average quantity can be expressed as: = (2 91 ) Extend Eq. 2 91 to the bivariate case and notice t hat the two PDF s are independent T he acceptance ratio can be rewritten as, = 1 1 1 + (2 92 ) Let a function to denote 1 1 1 = 1 / and = 1 / then, = 1 (2 93 )

PAGE 102

102 Without loss of generality, we can assume that > which means < Therefore, a nother way of writing Eq. 2 93 is = 1 when > and = when < Inserting into Eq. 2 92 will lead to: = 1 + (2 94 ) For simplicity, we denote as Inserting Eq. 2 90 into = 1 1 (2 95 ) Since and are independent, Eq. 2 95 can be rewritten as : = 1 1 (2 96 ) Simplifying Eq. 2 96 will formulate as: = 1 1 (2 97 ) Recall t hat a partition function is just a normalizing constant. and in Eq. 2 97 can switch their positions in the integrand. Thus Eq. 2 97 becomes: = (2 9 8 ) Inserting Eq. 2 9 8 into Eq. 2 94 = + (2 99 ) Each term on the right hand side of Eq. 2 9 9 can be interp reted as an overlap between two PDF s The sum is the entire overlap between two PDF s Therefore, the

PAGE 103

103 average exchange probability is just the overlap between potential energy PDF s at two temperatures. Next, let us consider the temperature distribution i n t he simplest case, in which the heat capacity is a constant. As mentioned earlier, a potential energy PDF of a canonical ensemble can be written as a Gaussian function, = 2 2 2 (2 100 ) w here is the average potential energy, is the probability density of finding at temperatu re and is the heat capacity. Since the PDF should be normalized, it is easy to find the relationship between and the standard deviation of the Gaussian function: = 1 / 2 2 (2 101 ) F or simplicity in the derivation of the acceptance ratio the Gaussian PDF at temperature will be written as Eq. 2 102 from now on : = 1 2 2 2 2 2 (2 102 ) Recall that one assumption to distribute temperatures is to maintain a random walk in temperature space. Hence, a constant acceptance ratio should be achieved fo r any two adjacent temperatures. As shown previously, the acceptance ratio is the overlap bet ween two potential energy PDFs. Consider two potential energy PDFs at temperatures < The PDF at will be to the left of the PDF at A fter finding the potential energy where the two Gaussian PDFs intersect, the overlap between two PDFs can be computed by integrating the left Gaussian PDF from

PAGE 104

104 to infinity and the Gaussian on the right from m inus infinity to and adding them up = 1 2 2 2 2 2 + 1 2 2 2 2 2 (2 103 ) Complementary error functions will be utilized and Eq. 2 103 will become, = 1 2 2 + 1 2 2 (2 104 ) According to Rathore et al 171 the acceptance ratio can be approximate to: 2 2 (2 105 ) w here = + / 2 For a geometric distribution of temperatures where = + = + 1 The average potential energy difference can be computed as, = = 1 (2 106 ) Thus, if the heat capacity does not change with t emperature, the temperature term in the numerator and denominator in Eq. 2 105 will be canceled w hich means the acceptance ratio will be a constant. Furthermore, Eq. 2 105 also signals the number of replicas needed to cover a temperature range as a functi on of system size. In order to have a non zero / 1 This leads to / + 1 1 Further simplifications lead to: (2 107 ) Since the heat capacity is where is the number of particles, the number of replicas to cover a temperature range is 1 / 2

PAGE 105

105 2.7 .2 Hamiltonian REMD (H REMD) Instead of preparing replicas with different temperatures, an other way to overcome potential energy barriers is simply changing the PES to reduce potential energy barriers. 61 And this is the basic idea of H REMD. In H REMD algorithm, replica s differ in their Hamiltonians but have the same temperature. Likewise, regular MD is performed and an exchange of configurations between two neighboring replicas is attempted periodically. Let us consider replica i with Hamiltonian H n and replica j with H amiltonian H m are attempting to exchange. By employing the detailed balance equation ( Eq. 2 7 8 ) and Boltzmann weight of a molecular structure, the transition probability can be written as: = 1 + ( ) (2 108 ) 2.7.3 Technical Details in REMD Simulations Temperature distributions have been explored in order to optimize the performance of REMD method. F or systems having constant heat capacity, a geometrical distribution of temperatures has been adopted. Sugita and Okamoto, 62 and Kofke 166 believed that the most efficient way to exploit REMD algorithm is letting each replica spend the same amount of simulation ti me at each temperature (a random walk in temperature space). In practice, this is achieved by producing the same acceptance ratio for each replica, given that each replica only attempts to exchange with its neighbors in temperature space. Under the conditi on that the system has a constant heat capacity, a geometrical distribution of temperatures ( / = ) is achieved. Sanbonmatsu and Garcia suggested an iterative method to distribute temperatures for replicas in 2002. 172 They have chosen the averaged values of potential energy as a function of temperature to maintain a random walk in the temperature space. In 2005,

PAGE 106

106 Rathore et al 171 suggested that an acceptance ratio of 0.2 yields the best performance, based on constant heat capacity assumption. They have chosen Go type model of prote in A and the Lennard Jones liquid to study the deviation of heat capacity relative to the final value as a function of acceptance ratio. A minimum of deviation at acceptance ratio around 0.2 has been observed. Kone and Kofke 173 have performed similar study fo r the parallel tempering simulations. They also considered a random walk model in temperature space through replica exchange moves. The acceptance ratio is given by: = 1 1 + 1 / 2 ( 2 109 ) where = 1 / 0 is the Boltzmann weight factor, and is the heat capacity which is assumed to be constant in their study. Without loss of generality, 0 is greater than 1 The mean square displacement of this random walk ( Eq. 2 1 10 ) has been maximized with respect to acceptance ratio. The maximum is shown near an acceptance ratio of 20%. 2 ln 2 ( 2 110 ) where 2 is the mean square displacement, and are shown in Eq. 2 10 9 Temperature distributions in parallel tempe ring simulation of villin headpiece subdomain HP 36 have been investigated by Trebst et al 174 HP 36 will undergo helix coil t ransition at high temperatures and hence, the heat capacity will not be held constant. The diffusion of a replica in temperature space has been introduced to judge e visit of the extreme temperature is the lowest. For each temperature two histograms and are recorded. The two histogram s keep the record of the number

PAGE 107

107 replicas traveling from the lowest to highest temperature can be calculated as: = + ( 2 111 ) The diffusivity is adopted and has the form: / ( 2 112 ) They have pointed out that the diffusivity is temperature dependent, a minimum of diffusivity has been observed around the temperature where heat capacity is at maximum. The plot showing diffusivity vs temperature indi cates that random walk is suppressed the most when phase transition occurs. The numbers of round trip between temperature extremes of each replica has been maximized to generate an optimal temperature distribution. More recently, Nadler and Hansmann 175 177 suggested that the optimal number of replicas between the lowest and highest temperatures in explicit solvent simulation has the following formula: = 1 + 0 594 ln / w here the is the heat capacity, and and is the highest and lowest temperature, respectively. They also proposed that the optimal temperature distribution can be formulated as: = 1 1 In addition to replica temperature distribution, exchange attempt frequency (EAF) is also an important issue in parallel tempering and REMD sampling efficiency. In 2001, Opps and Schofield 178 investigated the effect of EAF for parallel tempering. Two dimensional spin system and a polypeptide in vacuum have been selected to test the effect of EAF on the properties such as order parameter and radius of gyration of the polypeptide. They suggested that the most efficient scheme is to attempt after a few MC

PAGE 108

108 steps. The situation is more complicated in the case of REMD. In general, thermostats are used in MD pr opagations to maintain a canonical ensemble is satisfied. It is argued that exchanges in REMD should happen when system temperature stabilizes. 179 Attempting to exchange frequently may prevent the system from heat dissipation. This argument was supported by studies of a peptide Fs21 performed by Zhang et a l 179 They have suggested that 1 ps of exchange att empt interval is desirable for REMD. However, Sindhikara et al 180 have later shown that small exchange attempt interval (even as small as a few MD steps) does not affect heat dissipation, given that REMD exchange is done properly. Conformational sampling deviation relative to long simulation time reference calculation as a function of EAF has been investigated. They have pointed out that large EAF (small exchange attempt time interval) is preferred. Abraham and Gready 181 studied the effect of EAF based on a 23 residue peptide in explicit water. By examining the potential energy autocorrelation time, they argued that an exchange period below 1 ps is too short for replica exchange attempts to be independent, and hence reduce the tempering efficiency. However, the ir conclusion was not supported by an investigation of tempering efficiency performed by Zhang and Ma. 182 Zhang and Ma utilized the transition matrix and its correlation functions. The autocorrelation function of transition probability can be written as a function of eigenvalues of transition matrix. The decay time has been explored in order to understand the tempering efficiency. Zhang and Ma found that tempering efficiency increases m onotonically as EAF increases. Thermostat effects on the performance of REMD have also been explored. Earlier work has been done by the Garcia group. 172 They have studied i f the potential energy

PAGE 109

109 PDFs satisfy the Boltzmann distribution: ln 1 / 2 = 1 2 1 1 + where is the potential energy PDF at temperature and is a constant. They have found that Nose Ho over and the Anderson thermostats satisfy the above condition, while the Berendsen thermostat does not. Rosta et al 183 investigated the thermostat artifact in the REMD simulations in 2009. The current REMD exchange scheme assumes Boltzmann distribution (canonical ensemble) in the calculation of exchange probability. However, the Berendsen the rmostat cannot preserve the Boltzmann distribution. Thus REMD simulations of bulk water and protein folding are performed and the temperature is controlled by Berendsen thermostat and Langevin dynamics. They have studied the potential energy PDFs and therm al unfolding under the two thermostats. The Berendsen thermostat has been shown to produce a shift average potential energy and prolonged tails for potential energy PDF for bulk water, while no such effect has been seen when Langevin dynamics is employed. An increased probability of folding at low temperatures has been reported by Berendsen thermostat, whereas the probability of folding is decreased at high temperatures. The authors proposed that REMD simulations performed with thermostats that can generate a Boltzmann distribution, such as Langevin dynamics, Andersen and Nose Hoover thermostats. In a REMD simulation, the number of replicas needed to cover a temperature range scales as 1 / 2 where is the degree of freedom of the system. Given a large system, the number of replicas needed is large. For example, 64 replicas have been used in a REMD study of hairpin surrounded by explicit water molecules (4342 atoms in each re plica) to cover the temperature range from 270 K to 695 K. 184 A number of

PAGE 110

110 methods have been developed to redu ce the number of replicas needed in REMD simulations. In 2002, Fukunishi et al 61 proposed Hamiltonian REMD (H REMD). In the H REMD scheme, replicas differ in their Hamiltonians bu t have the same temperature. The exchange strategies in the paper of Fukunishi were to scale hydrophobic interactions and to scale van der Waals interactions. In 2005, Liu et al 167 published a method with the name replica exchange with solute tem pering. In the replica exchange with solute tempering algorithm, the protein water interactions and water water interactions are scaled such that the exchange probability does not depend on the number of explicit water molecules. The number of replicas in replica exchange with solute tempering simulation to cover the same temperature range is significantly reduced when comparing with original REMD algorithm. Lyman et al ., 185 and Liu and Voth later, 186,187 have developed resolution exchange schemes to improve the performance of REMD. Coarse grained models (low re solution) are employed to replace the role of high temperature replicas. The Simmerling group has contributed the hybrid explicit/implicit solvation model 188 in order to reduce the number of replicas needed in REMD simulations with explicit water molecules. Each replica is propagated in an explicit water box. At an exchange attempt, the solute and its solvation shell, which is calculated on the fly, are placed in dielectric continuum. Exchange probabilities are calc ulated based on the potential energies of the solute and the hybrid solvent. The usage of a hybrid solvent can shrink the number of replicas from 40 to 8, in a test case of polypeptide Ala 10 simulated at temperatures from 267 K to 571 K. Structural reservo ir techniques 168 170 have also been incorporated into REMD algorithm. High temperature MD simulations are performed first to generate a structural reservoir. Structures in the

PAGE 111

111 reservoir will be brought to replicas via exchanges. One advantage of using structural reservoir is that non Boltzmann weight factors can be chosen in the calculation of exchange probabilities. 170 Recently, Ballard and Jarzynski 189 proposed to use non equilibrium work sim ulations to accept exchange attempts. Kamberaj and van der Vaart 190 developed a new scheme to perform exchanges, in which the generalized canonical PDF have been employed to achieve a flat potential of the mean force in temperatur e space. The Wang Landau algorithm 163,164 has been adopted in order to estimate the DOS in temperature space and the round up time between extreme temperatures has been minimized. More recently, solvent viscosity h as been selected as a parameter in addition to temperature for REMD method. 191 This method is named V REMD and it is essentially a two dimensional REMD method. The motivation of choosing viscosi ty as a parameter is that the lower the viscosity, the faster a protein will diffuse, and sample the conformational space. In this algorithm, one replica is selected to have normal viscosity, others use reduced viscosities. The mass of solvent molecules is scaled by a factor of 2 when the viscosity is scaled by a factor of Changing the mass of solvent molecules does not affect the potential energy at an exchange attempt. Thus, the exchange probability of the V REMD is the same as conventional T REM D. The author applied V REMD to the study of trialanine, deca alanine, and a 16 residue hairpin peptide. By using the V REMD, replica numbers are reduced by a factor of 1.5 to 2. The replica exchange method (REM) can be coupled with other generalized en semble methods in order to enhance conformational sampling. The Okamoto group have coupled REM with MUCA and simulated tempering. The two new schemes are

PAGE 112

112 called multicanonical replica exchange method, 192 and replica exchange simulated tempering, 193,194 respectively. The details of coupled REM and generalized ensemble methods can be found in a review by Mitsutake et al 53 Due to its stochastic nature, the REMD algorithm has been employed to investigate thermodynamics rather than kinetics. 195 However, a properly designed scheme of analyzing the REMD trajectory in phase s pace can yield information about kinetics. In 2005, Levy and his coworkers 195 designed a kinetic network and used master equation to solve for the transition rate from REMD simulations. The structures at all temperatures are grouped into states based on their structural similarity (they selected a 42 dimensional Euclidean distance space based on C C distances, instead of clustering, to group their structure s). A state is denoted as a node and an edge stands for a transition between two nodes. A total of 800,000 nodes and 7.347 10 9 edges were obtained. The master equation has been utilized to d escribe the transitions between two states. Since they discretized the conformational space into states, the master equation is written in a matrix notation, = where is the transition matrix and is probability distri bution of states at time Instead of solving for eigenvalues of the transition matrix or solving the differential equation numerically, the authors actually simulated the path satisfying the master equation. Likewise, this Markov state model has been em ployed in the study of protein folding too. In 2006, van der Spoel and Seibert 196 studied protein folding rate based on Arrhenius equation. The folding mechanism in their investigation has been assumed to be two state. A binary folding indicator, which is the RMSD relative to the native state, has been adopted by the author s. Hence, the first order reaction rate equation has been

PAGE 113

113 set up. Then, the rate equation was integrated and averaged over all trajectories in order to generate an derived fraction of folded structures. A fitting parameter 2 which is equal to the dif ference between derived and actual fraction of folded structures, was minimized numerically with respective to energy barriers and pre exponential factors. In this manner, the Arrhenius reaction rate will be resolved from REMD simulations. Yang et al 197 proposed to use diffusion equation to extract kinetics from REMD simulation in 2007. The Fokker Planck equation has been employed to extract local drift velocity and diffusion coefficient from REMD simulations. Lang evin dynamics on the reaction coordinate is performed using drift velocity and diffusion coefficient. The free energy landscape will be reconstructed based on drift velocity and diffusion coefficient. In 2008, Buchete and Hummer 1 98 demonstrated that both local conformational transition rate as well as globally folding rates can be accurately extracted from REMD simulations, without any assumption in temperature dependence of the kinetics (Arrhenius and non Arrhenius). Similar to Levy and coworkers, Buchete and Hummer have also adopted the master equation operating on discretized space to describe transitions. Conditional probability of state at time given the initial state was computed by the master equation. The likeli hood of seeing number of transitions in a time interval has been maximized with respective to the natural log of transition rate constant (transition matrix elements) and the natural log of equilibrium population of state Thus, the rate const ants will be generated. A detailed description can be found in the paper of Buchete and Hummer.

PAGE 114

114 CHAPTER 3 CONSTANT pH REMD: METHOD AND IMPLEMENTATION 3.1 Introduction In this chapter, the constant pH REMD algorithm used in the AMBER simulation suite is presented and is employed to study model systems. We first tested our method based on five dipeptides and a model peptide having the sequence Ala Asp Phe Asp Ala (ADFDA). The two ends of model peptide ADFDA were not capped so the two ionizable side chains would have different electrostatic environment. The p K a values of the two Asp residues are expected to be different due to the difference in electrostatic environment. Then our constant pH REMD method is applied to a heptapeptide derived from OMTKY3, the same heptapeptide as Dlugosz and Antosiewicz studied in their paper. NMR experiments indicated the p K a of Asp is 3.6, 122 0.4 p K a unit lower than the value of blocked Asp dipeptide. Dlu gosz and Antosiewicz performed constant pH MD simulations and t heir method predicted the p K a to be 4.24. 122 Our purpose is to show that the REMD algorithm coupled with a discrete proto nation state description can greatly improve pH dependent protein conformation and protonation state sampling. 3.2 Theory and Methods 3.2.1 Constant pH REMD Algorithm in AMBER Simulation Suite In the case of constant pH molecular dynamics, the potential en ergy of the system depends not only on the protein structure but also on the protein protonation state. Reproduced in p art with permission from Meng, Y.; Roitberg, A.E. Constant pH Replica Exchange Molecular Dynamics in Biomolecules Using a Discrete Protonation Model, J. Chem. Theory. Comput. 2010, 6, 1401 1412. Copyright 2010 American Chemical Society.

PAGE 115

115 Likewise, when coupling REMD algorithm with constant pH MD, one can either attempt to exchange molecular structures only or swap both structures and pro tonation states at the same time. For simplicity, let us consider two replicas where replica 0 has temperature T 0 protein structure q 0 and protonation state n 0 while replica 1 has temperature T 1 structure q 1 and protonation state n 1 A diagrammatic desc ription of the two exchange algorithms is shown in Figure 3 1. Figure 3 1. Methods to perform exchange attempts. A) Only molecular structures are attempted to exchange. The protona tion states are kept the same. B) Both molecular structures and protonat ion states are attempted to exchange. The first way of performing an exchange attempt is that replica 0 tries to jump from state ( q 0 n 0 ) to state ( q 1 n 0 ) at temperature T 0 in one Monte Carlo step. Similarly, replica 1 attempts to transit from state ( q 1 n 1 ) to state ( q 0 n 1 ) at temperature T 1 Protonation states are kept at exchange attempts and only change during dynamics. Therefore, the detailed balance equation now becomes: (3 1 ) Here ( 0 0 0 1 1 1 0 1 0 1 0 1 ) is the transition probability of swapping structures. If Metropolis criterion is used, this exchange probability can be written as: 0 0 0 1 1 1 0 1 0 1 0 1 = 1 (3 2 )

PAGE 116

116 In Eq. 3 2, has the form: = 0 0 0 1 0 1 0 1 1 1 (3 3 ) H ere 0 = 1 / 0 1 = 1 / 1 and E is the potential energy. I f the protonation states of two adjacent replicas at an exchange attempt are the same, the exchange probability of our constant pH REMD will be equivalent to the conventional REMD exchange probability. How ever, if it is not the case, four potential energy terms are needed to calculate exchange probability. Under this circumstance, the constant pH REMD becomes a REMD algorithm that combines both temperature and Hamiltonian REMD algorithms. One possible conce rn of exchanging only structures would be the role of kinetic energy, especially when n 0 and n 1 are different. In the REMD algorithm developed by Sugita and Okamoto, the kinetic energy terms in the Boltzmann factors cancel each other on average through vel ocity rescaling ( Eq. 2 8 8 ). Only potential energies are required to compute exchange probabilities. There is a problem in canceling kinetic energy terms when the numbers of particles of two systems attempting to exchange are not the same. However, accordin g to the constant pH MD algorithm proposed by Mongan et al., a proton does not leave the molecule but becomes a dummy atom when an ionizable side chain is in deprotonated state. Furthermore, that dummy atom retains its position and velocity which are contr olled by molecular dynamics. Hence, the kinetic energy contributions to the Boltzmann weight will be cancelled out during exchange probability calculation, leaving only potential ene rgy useful for the calculation. The second possibility consists of exchang ing protonation states as well as molecular structures at REMD Monte Carlo moves. For instance, replica 0 attempts to

PAGE 117

117 move from state ( q 0 n 0 ) to state ( q 1 n 1 ) at temperatures T 0 in one MC move and replica 1 attempts to jump from state ( q 1 n 1 ) to state ( q 0 n 0 ) at temperature T 1 The detailed balance equation now can be written as: ( 3 4 ) This equation states that the exchange probability is the product of MC transition probabilities at temperature T 0 and T 1 If the protonati on states of two adjacent replicas are the same at an exchange attempt, the exchange probability of constant pH REMD becomes the exchange probability of conventional temperature based REMD. If n 0 and n 1 are different, then each MC transition is essentially the protonation state change step in constant pH MD plus a structural transition. For example, consider the MC transition at temperature T 0 0 0 0 0 1 1 = 1 1 ( 3 5 ) In Eq. 3 5 1 has the form: 1 = 0 1 0 0 0 + + 0 1 1 1 0 0 ( 3 6 ) The first term in 1 derives from the transition in configuration at fixed protonation state n 0 and the rest corresponds to protonation state change at fixed structure q 1 E elec r epresents the electrostatic component of potential energy. Similarly, the transition probability of MC jump at T 1 can be expressed as: 1 1 1 1 0 0 = 1 2 ( 3 7 ) And

PAGE 118

118 2 = 1 0 1 1 1 1 0 1 0 0 + 1 (3 8 ) Therefore, similar to Eq. 3 2 the exchange probability can be written as: 0 0 0 1 1 1 0 1 1 1 0 0 = 1 ( 3 9 ) And = + 0 1 1 1 0 1 0 1 0 0 + 0 1 ( 3 10 ) In Eq. 3 10 is the same quantity as in Eq. 3 3 The exchange probability calculation in the second method of coupling REMD and constant pH MD utilizes the same energy terms required by the first method since obtainin g electrostatic potential energies does not require extra energy calculations. The advantage of implementing the second exchanging protocol over the first one should not be significant because it is the conformational sampling at higher temperature that gr eatly improves conformational sampling at lower temperatures. Allowing protonation states to change at exchange attempts does not provide extra gains in conformational sampling. In addition, one can always choose to sample protonation state space during th e MD propagation. Therefore, only the first method of performing exchanges was implemented. 3.2.2 Simulation Details Constant pH REMD simulations were carried out first on five reference compounds: blocked A spartate, Glutamate, Histidine Lysine and Tyrosi ne to test our method and implementation. The experimental p K a values of those reference compounds are known and listed in Table 3 1. We later performed constant pH REMD

PAGE 119

119 simulations on a model peptide ADFDA (Ala Asp Phe Asp Ala, unblocked termini) and the heptapeptide derived from OMTKY3 (residue s 26 to 32 with blocked termini). Four replicas were used in the reference compounds and ADFDA REMD simulations. The temperatures were 240, 300, 370 and 460 K for all six molecules. The pH range for the study of aci dic side chains was sampled from 2.5 to 6 and the pH range of histidine from 5.5 to 8. The basic side chains were titrated from pH 9 to 12. An interval of 0.5 was chosen for all titrations. Eight replicas were chosen for the heptapeptide with a temper ature range from 250 to 480 K. 10 ns were used for each replica in all REMD simulations and an exchange was attempted every 2 ps. A MC move to change protonation state was attempted every 10 fs. A second set of REMD runs was done with the same overall cond itions but different initial structures in order to check simulation convergence. To compare conformational and protonation state sampling, 100 ns of constant pH MD simulations were carried out for aspartate reference compound and ADFDA at the same pH valu es as in the REMD runs. For the heptapeptide, one set of 10 ns constant pH MD simulations were done at all pH values simulated by REMD method. Constant pH REMD and MD simulations were done using the AMBER 10 molecular simulation suite 199 The AMBER ff99SB force field 139 was used in all the simulations. The SHAKE al gorithm 145 was used to constrain the bonds connecting hydrogen atoms with heavy atoms in all th e simulations which allowed use of a 2 fs time step. OBC Generalized Born implicit solvent model 200 was used to model water environment in all our calculations. The Berendsen thermostat, 146 with a relaxation time of 2 ps, was used to keep the replica temperatu re around their target values. Salt

PAGE 120

120 concentration (Debye Huckel based) was set at 0.1M. The cutoff for non bonded interaction and the Born radii was 30 3.2. 3 Global Conformational Sampling Comparison Using Cluster Analysis In our study, global conforma tional samplings have been compared utilizing cluster analysis. 169,188 group is called a cluster. A cluster analysis measures the similarity be tween two objects. In the cluster analysis we performed, protein backbone similarity (measured by backbone RMSD) is considered and the hierarchical agglomerative clustering algorithm is employed. Hierarchical algorithm basically creates a hierarchy of clus ters and a hierarchical algorithm can be agglomerative or divisive. The hierarchical agglomerative algorithm starts with considering every object as a cluster and combine s si milar clusters into one cluster, while the divisive algorithm starts with one clus ter containing all objects and divides it into more groups. In our work, the c luster analysis was done using the Moil View program. 201 The MD and REMD trajectories (having same number of frames) at 300 K and under the same sol ution pH value were first combined. The ptraj module of the AMBER package has been utilized to used to T he combined trajectory was clustered based on peptide backbone atoms root mean square deviations (RMSD s ). A cluster cutoff RMSD of 1.5 is chosen for both ADFDA and the heptapeptide during our analysis. By clustering the combined trajectory, the MD an d REMD conformational samplings will populate the same clusters. The fraction of the conformational ensemble corresponding to each cluster (fractional population of each cluster) was calculated for MD and REMD run respectively. Two

PAGE 121

121 sets of fractional popu lation of clusters were generated. One must note that the fractional population of each cluster from MD and REMD trajectory may not be the same Therefore, the correlation between the two sets of fractional population can be investigated by plotting one se t against the other and doing a linear fitting The Moil View program will generate a file pointing out which cluster a snapshot in the combined trajectory belongs to. Thus, t he fractional population of each cluster was obtained for MD and REMD simulation If the MD and REMD simulations produced the same structural ensemble, the fractional population of a cluster from MD simulation will be the same as that from REMD simulation Cluster population fraction from REMD simulation vs that from MD simulation was plotted ( see Figure 3 7 A) The correlation coefficient values which represent the correlations between MD and REMD cluster population were calculated at each solution pH value by doing linear regression. 169,188 A h igh correlation between MD and REMD cluster population indicates that the structure ensembles are similar to each other. This method provides a direct comparison of global conformational sampling between MD and REMD simulations. The same technique was use d when studying convergence of constant pH REMD and MD trajectories ( see Figure 3 7 B and Figure 3 1 2 ) When investigating convergence of conformational sampling, snapshots from two constant pH REMD simulations (or two constant pH MD simulations) were combi ned. The two constant pH simulations should have the same temperatures and solution pH values T hey only differ in initial structures. A high correlation coefficient indicates the two structural ensembles are similar and two conformational samplings are co nverged, while a poor

PAGE 122

122 correlation means the structural ensembles are different and the conformational sampling depends on initial condition. 3.2. 4 Local Conformational S ampling and Convergence to Final State In our study, the local conformational sampling was examined by comparing the probability distribution a re 10, which would lead to a 36 36 histogram. These two dimensional histograms were normalized into populations and the convergence was the root mean squared deviation (RMSD same algorithm men tioned earlier in this section. Essentially, we were computing the RMSD between two matrices. The RMSD between the cumulative probability density at time and the final probability density ( all configurations were utilized to compute final probability density ) is given by, = 2 36 = 1 / 36 36 36 = 1 (3 11) w here ( ) is the th element of the cumulative probability density of the pairs at time and is the corresp onding element in the final probability density matrix 3.3 Results and Discussion 3.3.1 Reference Compounds We first applied our constant pH REMD met hod to the reference compounds. Table 3 1 shows the p K a values predicted by REMD simulations (10 ns for e ach replica)

PAGE 123

123 as well as the reference p K a values. All our p K a values were calculated by fitting to the HH equation. Agreement between constant pH REMD predictions and the reference values can be seen. Table 3 1 The REMD p K a predictions of reference compou nds pKa Aspartate Glutamate Histidine Lysine Tyrosine REMD 3.97(0.01) 4.41(0.01) 6.40(0.03) 10.42(0.01) 9.61(0.01) Reference 4.0 4.4 6.5 10.4 9.6 The numbers in parenthesis are the standard errors. The pH titration curves of the same reference compoun ds showed agreement between MD (100 ns) and REMD simulations. Figure 3 2 demonstrates the REMD and MD titration curves of aspartic acid re ference compound as an example. Figure 3 2 Titration curves of blocked aspartate amino acid from 100 ns MD at 300K and REMD runs. Agreement can be seen between MD and REMD simulations. We further studied the convergence of protonation states sampling. REMD and MD protonation fraction (cumulative protonation fraction) were plotted with respect to MC attempts for asparta te reference compound at all pH values. Figure 3 3 demonstrated the protonated fraction versus time at pH 4 as one example. According to

PAGE 124

124 Figure 3 3 it suggests that although the final p K a predictions are the same between REMD and MD simulations, the proto nation state sampling during REMD s imulations clearly converge s faster than that in a MD run Figure 3 3 Cumulative average protonation fraction of a spartic acid reference compound vs Monte Carlo (MC) steps at pH=4. 3.3.2 Model peptide ADFDA The model peptide ADFDA (as zwitterion) was chosen as a more stringent test of our constant pH REMD method. The charged termini will provide different electrostatic environment for each titratable Asp residue and hence a correct constant pH REMD model should reflec t this difference between titration curves of the two Asp residues. The Asp2 residue is closer to the NH 3 + so the deprotonated state is favored and the p K a value of Asp2 residue should shift below 4.0 (which is the p K a value of the reference aspartic dipe ptide). The Asp4 residue is closer to the COO negative charge and hence the p K a value should shift above 4.0. The titration curves of the model peptide ADFDA from REMD simulations are shown in Figure 3 4 We can clearly see that Asp2 and Asp4 have differe nt titration

PAGE 125

125 curves from each other and from the reference compound. The p K a value and Hill coefficient for each Asp residue were obtained by fitting titration curves to a Hill plot. The results are shown in Table 3 2. The REMD p K a predictions reflect the difference between Asp2 and Asp4 due to different peptide electrostatic environments. We also displayed the MD titration curves of Asp2 and Asp4 in Figure 3 4 and listed the MD p K a predictions and corresponding Hill coefficients in Table 3 2. The titration curve of Asp2 residue only showed a small difference between MD and REMD simulation. But we can see differences in titration behaviors of Asp4 between MD and REMD calculations when solution pH is below 5. Interestingly, Lee et al. studied blocked Asp Asp peptide using CPHMD method, reporting different Hill coefficient for each of the two Asp residues. Figure 3 4 The titration curves of the model peptide ADFDA at 300K from both MD and REMD simulations. MD simulation time was 100 ns and 10 ns were chosen for each replica for REMD runs. Table 3 2. p K a predictions and Hill coefficients fitted from the Asp2 Asp4 p K a Hill Coefficient p K a Hill Coefficient REMD 3.74 0.87 4.38 0.67 MD 3.76 0.89 4.54 0.85

PAGE 126

126 Convergence rates of Asp2 titration beh avior were compared between REMD and MD calculations due to the fact that Asp2 titration curves are very close. The cumulative protonated fractions versus MC attempts at pH 4 are shown in Figure 3 5 Likewise, faster convergence in protonation state sampli ng can be seen for REMD simulation even though both REMD and MD calculations resulted in the same final protonated fraction. Clearly, our constant pH REMD method accelerates the convergence of sampling of protonation states. Figure 3 5 Cumulative averag e protonation fraction of Asp2 in model peptide ADFDA vs Monte Carlo (MC) steps at pH=4. In addition to protonation state sampling, we also evaluated the conformational sampling in constant pH MD and REMD each solution pH were studied. The regions in Ramachandran plots sampled by MD and

PAGE 127

127 REMD simulations are the same at all pH valu es. Ramachandran plots for residue Asp2 at pH 4 are shown in Figure 3 6 as an example. Figure 3 6 (Ramachandran plots) for Asp2 at pH 4 in ADFDA. Ramachandran plots at other solution pH valu es are similar. For Asp2, constant pH MD and REMD sampled the same local backbone conformational space. Phe3 and Asp4 Ramachandran plots also display the same trend. Since the Ramachandran plot only represented local conformational sampling, we also evalua ted global conformational sampling by clustering MD and REMD trajectories and comparing the cluster populations. The MD and REMD cluster population R 2 values are listed in Table 3 3. A plot of cluster populations from MD and REMD trajectories at solution p H of 4 is shown in Figure 3 7 A as an example. The large R 2 values indicate that the MD and REMD sampled the same conformational space and generated the same structure ensemble. The small size of ADFDA and simple structure of each residue make 100 ns long e nough for MD to sample the relevant conformations. We further studied the convergence of REMD simulations by comparing global conformation distribution between two REMD simulations starting from two different structures. Cluster populations of the two REMD simulations at solution pH 4 are

PAGE 128

128 displayed in Figure 3 7 B. The R 2 value is 0.959 at pH 4. This large correlation tells us that the two REMD simulations provide the same structure ensemble and hence the two simulations are converged. Table 3 3. Correlation coefficient s between MD and REMD cluster populations pH=2.5 pH=3 pH=3.5 pH=4 R 2 0.94 0.90 0.79 0.93 pH=4.5 pH=5 pH=5.5 pH=6 R 2 0.85 0.98 0.92 0.96 The R 2 values were calculated by linear regression. Figure 3 7 Cluster p opulations of ADFDA at 30 0K. A) MD v s REMD at pH 4. Trajectories from MD and REMD simulations are combined first. By clustering the combined trajectory, the MD and REMD structural ensemble s will populate the same clusters. The fraction of the conformational ensemble corresponding to each cluster (fractional population of each cluster) was calculated for MD and REMD simulation, respectively. Two sets of fractional population of clusters were generated, and hence plotted against each other. B) T wo REMD runs from different starting st ructures at pH 4. Lar ge correlation shown in Figure 3 7 B suggests that the REMD runs are converged. Large correlations between two independent REMD runs are also observed at other solution pH values. Correlations between MD and REMD sim ulations can be foun d in Table 3 3 3.3.3 Heptapeptide derived from OMTKY3 We first compared the protonation state sampling between constant pH REMD and MD simulations. Titration curves of Asp3, Lys5 and Tyr7 from two sets of

PAGE 129

129 simulations are plotted in Figure 3 8 A and 3 8 B. F or each titratable residue, titration curves generated by constant pH REMD and MD are close to each other. Since the p K a value of Asp3 in this heptapeptide is experimentally determined to be 3.6, it will be interesting to evaluate how our predicted values compare to the experimental result. The p K a Figure 3 8 C. The predicted p K a value is 3.7 for both REMD and MD simulations and they are in excellent agreement with the experimental p K a value. Following the same procedures, our predicted p K a values of Lys5 and Tyr7 from constant pH REMD and MD simulations were obtained. Not surprisingly, the REMD and MD schemes yielded essentially the same predicted p K a values for Lys5 and Tyr7. Figu re 3 8. A) T itration curves of Asp3 in the heptapeptide derived from protein OMTKY3. B) T itration curves of Lys5 and Tyr7 in the heptapeptid e derived from protein OMTKY3. K a values of ots.

PAGE 130

130 Figure 3 8 Continued

PAGE 131

131 Although the final p K a predictions are the same for constant pH REMD and MD simulations, constant pH REMD showed clear advantage in the convergence of protonation state sampling. Again, we chose the cumulative average proton ation fraction vs MC steps to reflect protonation state sampling convergence for all three titratable residues. Several representative plots are shown in Figure 3 9 The trend that constant pH REMD simulations produce faster convergence in protonation frac tion is universal. Therefore, it is very clear that constant pH REMD method is better than constant pH MD in protonation state sampling. Figure 3 9 A) Cumulative average protonation fraction of Asp3 of the heptapeptide derived OMTKY3 vs MC steps. B) an d C) is c umulative average protonation fraction of Tyr7 and Lys5 in the heptapeptide vs MC steps respectively. Clearly, faster convergence is achieved in contant pH REMD simulations.

PAGE 132

132 Figure 3 9 Continued

PAGE 133

133 Conformational sampling is an important issue in constant pH studies. We first looked at the conformational sampling on peptide backbones. We evaluated backbone conformational sampling through Ramachandran plots. Six residues (from Ser2 to Tyr7) are studied here. Not surprisingly, Ramachandran plots from constant pH REMD and MD simulations are very close, suggesting that the overall local conformational samplings are similar. The Ramachandran plots of Asp3 at pH 4 are shown in Figure 3 10 as examples. The only exception is Tyr7 in acidic pH values. Ty r7 can visit the left handed alpha helix conformation during constant pH REMD runs but is not able to do that in constant pH MD runs. In general, constant pH REMD and MD yielded the same Ramachandran plots for the heptapeptide. Figure 3 10 Dihedral ang lity densities of Asp3 at pH 4 A) C onstant pH MD results. B) Constant pH REMD results. The two probability densities are almost identical, indicating that constant pH MD and REMD sample the same local conformational space. All others also show very similar trend. pH REMD and MD are similar for Ser2 to Thr6. It is interesting to determine how fast each sampling scheme reaches the final distribution. We studied evolution of backbone conformational sampling based on cumulative data as what we did in the case of

PAGE 134

134 protonation state sampling convergence. As described in the METHOD section, the ation time was calculated. The smaller a RMSD is, the closer a probability distribution reaches to the final distribution. Deviations were calculated starting from the second nanosecond with time intervals incremented by 100 ps. The cumulative time depend ence RMSD of Asp3 and Lys5 are also shown in Figure 3 1 1 as examples. As seen in the figures, these curves decrease faster in constant pH REMD simulations. Figure 3 1 1 suggests that although the final are similar between constant pH REMD and MD simulations, the constant pH REMD simulation clearly reaches the final state faster. Figure 3 11 The root mean probabili behaviors at other pH values also show that REMD runs converg e to final distribution faster. Cluster analysis was also ap plied to study the convergence of conformation sampling in the heptapeptide. By comparing cluster populations between the first and

PAGE 135

135 second half of one trajectory, one could check the convergence of that simulation. The two halves of a structural ensemble s hould yield the same populations in each cluster if convergence is reached. For example, simulations at pH 4, both constant pH REMD and MD yield about 20 clusters and the correlations coefficients are calculated through a linear regression. Cluster populat ion plots and correlation coefficients are shown in Figure 3 1 2 A much higher correlation coefficient can be seen in constant pH REMD simulation, suggesting the two halves of the constant pH REMD simulation at pH 4 populate each cluster much more similarl y than the corresponding constant pH MD does. Hence, much better convergence is achieved by the constant pH REMD run. Figure 3 1 2 Cluster population at 300 K from constant pH MD and REMD simulations at pH=4. Cluster analysis is performed using the enti re simulation. The populations in each cluster from the first and second half of the trajectory are compared and plotted. Ideally, a converged trajectory should yield a co rrelation coefficient to be 1. A) Constant pH MD. B) C onstant pH REMD. Much higher co rrelation coefficient can be seen in constant pH REMD simulation, suggesting much better convergence is achieved by the constant pH REMD run.

PAGE 136

136 3.4 Conclusion s In our work, we have applied replica exchange molecular dynamics (REMD) algorithm to the discrete protonation state model developed by Mongan et al. in order to study pH dependent protein structure and dynamics. Seven small peptides were selected to test our constant pH REMD method. Constant pH molecular dynamics (MD) simulations were ran on the same p eptides for comparison. The constant REMD method results are encouraging. The constant REMD method can predict p K a values in agreement with literature and experimental results. Constant pH REMD method also displays advantage in convergence behaviors during protonation states and conformational sampling. The REMD algorithm has been proven beneficial to study pH dependent protein structures. Our future work will include studies of pH dependent protein dynamics and application of this constant pH REMD to lar ge proteins.

PAGE 137

137 CHAPTER 4 CONSTANT p H REMD: STRUCTURE AND DYNAMICS OF THE C PEPTIDE OF RIBONUCLEASE A 4.1 Introduction The p rotein and peptide folding problem 202 is an important aspect of protein science and biophysical chemistry 203 In 1961, Anfinsen studied the refolding of denatured ribonuclease (RNase). 204 He first increas ed the temperature of the protein and the protein lost its functional three dimensional shape (native state). When An finsen lowered the temperature, he found that the RNase was able to refold into its normal shape, without any other help. His experiment raised questions about protein folding. In general, people are interested in the thermodynamics (such as free energy la ndscape, folding pathway and interactions in a protein ), folding kinetics (such as how fast a protein folds), and native state prediction for a given sequence in protein folding. 202 Both experimental and theoretical approaches have been employed to understand protein f olding. 205,206 From now on, our introduction to protein folding will focus on computer simulations. In a protein folding simulation, the concept of free energy landscape always plays an important role. 202,207 Many questions can be answered once the free energy landscape is obtained. Levinthal, 208 in 1968, proposed that it is impossible for a protein to search all its conformations during folding process because the time taken to visit all conformations will be much longer than the folding time observed. His argument is well some well defined folding pathways. ing is the free energy landscape theory which provides a statistical view of the folding landscape 202,203,207 The folding process does not require chemical reaction like steps between specific

PAGE 138

138 states. Basically, a protein folds on a funnel shaped free energy landscape which is defined by the amino acid sequence of the protein. Folding process is a direct ed visit of conformations on a landscape in order to reach the native state, which is the most thermodynamicall y stable conformation. Changing temperature, adding denaturant to the protein solution, or changing solution pH value of the protein system is able to change the free energy landscape, and hence affect protein folding. The free energy landscape of a protei n is often rugged 51 and requires advanced sampling technique s such as REMD method to sample the con formational space. Due to the visual limitation, a free energy landscape is frequently projected onto one or two reaction coordinates. In practice, the free energy landscape is often projected onto several important reaction coordinates such as the radius of gyration of a protein, the number of backbone hydrogen bonds, and native contacts. Principal component analysis has also been carried out to generate the folding free energy landscape. T he relative free energy (potential of the mean force PMF ) can be c alculated by the following, = = ln / (4 1) w here is the relative PMF between state A and state B defined by reaction coordinate(s), and ( ) are t he prob ability density of find state A, and B along the reaction coordinate(s) respectively. Knowing the free energy landscape s can help people understan d folding mechanisms Transition states intermediates and folding pathway s can be obtained from a fo lding free energy landscape. For example w hen the free energy barrier between folded and unfolded state is disappeared, the folding is called downhill folding, in which the folding time is determined by diffusion rate on the free energy landscape.

PAGE 139

139 One exa mple of the protein fol ding free energy landscape studies is simulating the folding of C terminal haripin of protein G, performed by Zhou et al in 2001. 184 The OPLS AA force field SPC explicit water model, and REMD algorithm ha ve bee n employed in their simulation. T he free energy landscape has been projected onto s even different reaction coordinates such as ra dius of gyration, number of hydrogen bonds, and fraction of native contacts T wo dimensional free energy landscapes along those reaction coordinates were generated in order to elucidate the folding pathway Four different states were found in the folding l andscape, native state, unfolded state, and two intermediate states. Structural features of each state were also characterized The formation of hydrophobic core and hydrogen bonding in the folding process ha s been investigated. They have found that the hy drophobic core and hydrogen bonds formed almost simultane ously after initial collapse. Although not investigated in this chapter, protein folding kinetics is also an important aspect of protein folding. 209 One example of the folding kinetics study is seeking the speed of protein folding. 210 Computer simulations have been performed to elucidate folding kinetics. 211 The Pande group at Stanford University pioneered computer simulations of folding kinetics. 206,211 213 When studying protein folding kinetics, the Pande group conducted m ultiple independent MD simulations starting fr om different initial conditions. T he probability of the native state in the structure ensemble was computed after a pre defined simulation time. Assuming the folding mechanism is two state fold ing and follows the first order reaction kinetics, and the transition time is much shorter than staying time in either state, the probability of barrier crossing can be given by,

PAGE 140

140 = 1 (4 2) where is simulation time and is the folding rate. In the limit of 1 / Eq. 4 2 can be simplified to according to the Taylor expansion. The probability of barrier crossing can be computed by using the fraction of simulations that crossed the barrier. Other methods utilized to explore folding kinetics include Markov state models. 195,198,214 217 One example of predicting folding time is given by stud ying the C hairpin of protein G In their studies, Pande and co workers 213 utilized the OPLS AA force field and the GB implicit solvent model using water like viscosity via Langevin collision coefficient. A total simulation 2700 independent simulations, among which 8 completely folded trajectories were found. Thus, a folding time of 4.7 2, which is in rthermore, the folding free energy landscape has been generated and the folding pathway and folding intermediates etc have also been probed. Another area of protein folding simulation is to probe protein folding through the unfolding simulations. The un folding simulations adopt the assumption that folding processes follow the reverse pathways of unfolding processes. Both temperatures and denaturants can be employed to denature proteins. Levitt and Daggett have been performed unfolding simulations extensi vely. 218 220 The C peptide, residues 1 to 13 from the N terminus of RNase A, is a peptide well studied by experiments. 5,7,221 226 In 1971, Brown and Klee 223 first observed the presence of helix of C peptide through circular dichroism (CD) spec troscopy This peptide was

PAGE 141

141 further studied extensively by the Baldwin group. 5,7,222,224,226 CD spectroscopy showed that the C peptide demonstrated pH helix formation The mean residue ellipticity at 222 nm of the C peptide showed a bell shaped pH profile, having a maximum at pH value of 5. M utation experiments indicated that the Glu2 and His12 in the C peptide were crucial to the pH dependent helix formations. 5,7, 224,226 Maximal mean residue ellipticity occurred at pH 5 because both the glutamate and histidine residues are charged at that pH NMR experiments on an analog of the C peptide (RN 24) by the Wright group also confirmed the formation of complete and part ial helix. 225 Two side chain interactions were believed to stabilize the partial helix formation in the C peptide and its analogs in the mutation experiments and NMR studi es. 7,224 226 A salt bridge between Glu2 and Arg10 side chains was proposed to improve the helix formation as the pH values increased to 5. The interaction between Phe8 and His12 was also believed to improve helix f ormation as the pH values reduced to pH of 5. The folding and side chain interactions of C peptide and its analogs were also extensively studied by molecular simulations. 227 235 Schaefer et al 232 studied the helical conformations and folding thermodynamics. The Okamoto group 228 230,23 3 235 has performed thorough investigations of the C peptide using a multicanonical algorithm (MUCA) and the replica exchange method (REM) in both implicit solvent and explicit solvent They have studied s econdary structures of the C peptide, roles of Glu 2 and His12 in the C peptide helix coil transition, and dielectric effect in the implicit solvent. Ohkubo and Brooks 231 utilized REMD simulations with the GB model to explore the helix coil transition of short peptides including the C peptide. Conformational entropy as a function of temperature has been explored for the C peptide and its analogues

PAGE 142

142 (different chain length). The conformational entropy has been found to be proportional to chain length over a wide range of temperatures. Felts and co workers 227 carried out REMD simulations with the AGBNP implicit solvent model to study the folding free energy landscape of the C peptide The free energy landscape was proj ected onto radius of gyration and helical length. The possible interaction between Glu2 Arg10 was also explored. Dielectric effects of AGBNP solvation model on helical length and salt bridge has been investigated too. In 2005, Sugita and Okamoto 233 performed replica exchange multicanonical algorithm simulations in explicit solvent to explore the folding mechanism and side chain interactions such as Glu2 Arg10 and Phe8 His12. They constructed folding free energy landscape along the principal component axes. The correlations between Glu2 Arg10 and Phe8 His12 interactions and the C peptide conformations have been elucidated. They have found that the minimum free energy conformation possess both interactions. They have also suggested that the purpose of Glu2 Arg10 salt bridge is to prevent helix extending to N terminus of the C peptide and the Phe8 His12 stabilizes the alpha helix conformation toward the C terminus. More importantly, Khandogin et al 112 studied the pH dependent folding of th e C peptide with REX CPHMD. I mportant electrostatic interactions such as the Lys1 Glu9, Glu2 Arg10 and Phe8 His12 interactions were also investigated The C peptide has also been selected to test the effect of force fields on protein folding simulations and simulation convergence In 2004, Yoda et al 234,235 tested six commonly employed force fields (AMBER94, AMBER96, AMBER99, CHARMM22, OPLS AA/L, and GROMOS96) on the C peptide as well as the C terminal fragment from the B1 domain of the G peptide in explicit water using generalized ensemble

PAGE 143

143 method. M elting curves have been studied. S econdary structures of both peptides were also computed and compared with experimental data. AMBER99 and CHARMM22 were found showing best agreement for the C peptide. In this chapter we present a study of the C peptide using constant pH REMD method introduced in t he previous chapter The effect of pH on the folding of C peptide and the structural ensemble is studied. We compare directly with experimental measurements of helicity, namely the mean residue ellipticity at 222 nm. Important electrostatic interactions su ch as Glu2 Arg10 salt bridge and Phe8 His12 interaction are also examined. 4.2 Method s 4.2.1 Simulation Details The C peptide we simulated has the sequence: KETAAAKFERQHM. The N terminus of the C peptide (lysine) is charged while the C terminus (methionin e) is capped with an amide For our study, constant pH REMD simulations were performed starting from a completely extended structure at pH values 2, 3, 4, 5, 6.5 and 8. Eight replicas were chosen with a temperature range from 2 60 to 42 0 K. A simulation tim e of 44 ns were used for each replica in all REMD run s and an exchange was attempted every 2 ps. The structures obtained from the first 4 ns were discarded, resulting in a 40 ns of production time for each replica. Glu2, Lys7, Glu9 and His12 are selected t o be titratable. A MC move to change protonation state was attempted every 10 fs. A second set of REMD runs was done at pH values of 2, 5 and 8 starting from a fully helical initial structure in order to check simulation convergence. The three pH values ar e selected to represent low pH, pH where maximum helicity was observed experimentally and high pH, respectively.

PAGE 144

144 AMBER 10 molecular simulation suite 199 was used to simulate the C peptide. The AMBER ff99SB force field 139 was used in all the simulations. The SHAKE algorithm 145 was used in all the simulations which allowed use of a 2 fs time step. OBC Generalized Born implicit solvent model 200 was used to model water environment in all our calculations. The Berendsen thermostat, 146 with a relaxation time of 2 ps, was used to keep the replica temperature around their target values. Salt concentration (Debye Huckel based) was set at 0.1 M. The cutoff for non bonded int eracti on and the Born radii was 30 (this cutoff is longer than the peptide). 4.2.2 Cluster Analysis When studying the folding of C peptide, the roles of cluster analysis are two fold. One role is to compare structural ensembles and check convergence at p articular temperature and solution pH value, while the other is to analyze a single ensemble of structures to investigate protein structures and interactions. As described in the previous chapter, c luster analysis was done using the Moil View program 201 and the C RMSD has been chosen to measure structure similarity. When comparing conformational sampling two different ways of comparisons have been adopted. The first way is to compare the first and the second halve s of one trajectory. In this case, cluster analysis was performed on a single trajectory and the cluster information can be utilized to study folding thermodynamics and interactions in the C peptide. The second way is to c ompare the structural ensembles pr oduced by simulations starting from the fully extended and fully helical structures. In the second case, the two trajectories (having same number of frames) at 300 K and under the same solution pH value were first combined. Then the combined trajectory was clustered on the basis of peptide backbone atoms root mean square deviations (RMSD s ). The

PAGE 145

145 population fraction corresponding to each cluster was obtained for both trajectories Th e correlation coefficient, which represents the correlation between the clus ter population s of the two trajectories, was calculated at each solution pH value by doing linear regression. A high correlation indicates that the structure ensembles are close to each other. This method provides a direct comparison of global conformation al sampling between the two trajectories A cluster cutoff RMSD of 2.0 is chosen during our analysis. 4.2.3 Definition of the Secondary Structure of Proteins ( DSSP ) Analysis The secondary structures of the C peptide have been explored by DSSP algorithm 236 which is proposed by Kabsch and Sander. The DSSP algorithm identifies the secondary structure of a residue by hydrogen bond calculations. The calculation is based on electrostatic ener gy between backbone carbonyl group and amide group, = 1 2 1 + 1 1 1 332 / (4 3 ) In the above equation, 1 and 2 are the partia l charges on each atoms. If the electrostatic energy is below 0.5 kcal/mol, then a hydrogen bond will assigned to corresponding carbonyl and amide groups. The secondary structure of a r esidue is labeled by one letter: G for 3 10 helix, H for alpha helix, I for pi helix, B for antiparallel beta sheet b for parallel beta sheet, and T for turns. 4.2.4 Computation of the Mean Residue Elli p ticity CD spectroscopy is one of the most commonly used techniques to study protein secondary structures and folding 237 Chiral molecules absorb left circularly polarized light (LCPL) and right circularly polarized light (RCPL) differently. CD spectroscopy

PAGE 146

146 measures the difference in absorbance of LCPL and RCPL of a chiral molecule. It can provide information of protein secondary structures Electromagnetic waves contain oscillating electric and magnetic fields perpendicular to e ach other and to the propagating directions. A circularly polarized light (CPL) has a n electric field vector rotating along its propagation direction but maintains its magnitude. This is in contra st to linearly polarized light which has an electric field v ector oscillating in one plane but change its magnitude. When a LCPL is propagating toward an observer, the electric field vector rotates counterclockwise, while the RCPL rotates clockwise. When a circularly polarized light passes through chiral molecules, the difference in the absorption of LCPL and RCPL is given by: = (4 4 ) w here and is extinction coefficient of LCPL and RCPL, respectively and is wavelength. has the dimension s of ( ) 1 or 2 1 The extinction coefficient can be calculated by Beer Lambert law : = where is the absorbance, is the concentration, and is the width of the cuvette This difference gives CD spectroscopy. Ma ny CD instruments record signal in ellipticity, which is measured in degrees The ellipticity can be calculated as: = 32 98 = 32 98 where 32.98 has unit of degree. A more frequently adopted measurement of CD is the molar elli pticity [ ] 238 = 100 = 3298 (4 5 ) Here, the molar ellipticity has units of 2 1

PAGE 147

147 The integrated intensity of a CD band is called rotational stren gth. Theoretically, for a electronic transition from ground state (0) to excited state ( ), the rotational strength can be calculated as, = 0 0 (4 6 ) w here 0 and is the wavefunction of electronic ground and excited state, respectively ; and is the electronic transition and magnetic transition dipole moment operator, respectively; and stan ds for the imaginary part. Eq. 4 6 suggests that the frequently adopted units of rotational strength are Debye Bohr magnetons (DBM, 1 DBM= 9 274 10 39 3 where is the unit of energy) Eq. 4 6 is origin dependent because the ma gnetic transition dipole moment operator is origin depende nt. In order to avoid this origi n dependence, the dipole velocity formulation can be em ployed, = / 2 0 0 0 (4 7 ) Here, is the charge of an electron, is the mass of an electron, and 0 is the frequency of the transition. According to the paper of Sreerama and Woody 238 CD spectrum can be calculated as, assuming each CD band (CD transition) is a Gaussian function of wavelength, = 2 278 / (4 8 ) w here and is the CD, rotational strength, wavelength and half bandwidth (one half of the width at 1 of its maximum) of the th transition, respectively. In Eq. 4 8, the constant 2.278 has the dimensions of 1 2 1

PAGE 148

148 The far ultraviolet ( far UV with a wavelength smaller than 250 nm ) CD spectra of proteins can yield important information about the secondary structures of proteins. 238 In the far UV range, peptide bonds in a protein are the main chromophore s Thus, the C D spectra in the far UV range are reported on a residue basis (mean residue ellipticity). In a protein CD spectrum, a positive band at ~190 nm and two negative bands at 208 nm and 222 nm can be found for helix. 239 In particular, a strong negative band at 222 nm is a leading indication of the presence of helical structures S tructures sheet will show two bands in CD spectra: a positive band at ~198 nm and a negative band at ~215 nm. 240 Compu ting protein CD spectra using quantum mechanical methods combining with Eq. 4 7 is only possible in principle due to the size and complexity of protein structures. The matrix method 241 using pre determined parameters has been adopted to tackle this problem. In th e matrix method, a secular matrix is constructed based on transition energies and interactions between transitions A protein is considered as a set of independent chromophores. Each local transition energies and interactions between transitions in differe nt chromophores are utilized to construct the secular matrix. A transition on a local chromophore is represented by a charge distribution. The charge distributions as parameters, are determined from quantum mechanical wavefunctions or experiments or a com bination of both. 242 244 The off diagonal elements of the secular matrix, which represent the interactions between transitions in different chromophore, are further simplified by c harge charge (monopole monopole) e lectrostatic interaction 238 = / (4 9 )

PAGE 149

149 Here, is the electrostatic energy between transition on chromophore and transition on chromophore sums over the point charges of transition on chromophore and sums over the point charges of transition on chromophore and denotes for the distance between two charge s. Diagonalization of the secular matrix using a unitary transformation will yield the eigenvalues and eigenvectors corresponding to all transitions of the protein. Eigenvalues provide information about transition energies and the eigenvectors describe the mixing of local transitions. The rotational strength can be obtained from eigenvectors. In this work, the algorithm developed by the Woody group 238,244 was used to compute the mean residue elliptcity. Detailed des cription of their algorithm can be found in the paper of Sreerama and Woody. The peptide transitions (two transitions at 140 and 190 nm, respectively and one transition at 220 nm ) were computed using the Matrix method 241 in the origin independen t form 245 Transition charge distributions (monopole charges) are obtained from INDO/S 246 semi empir ical electronic structure calculations. Side chain transitions of phenylalanine, tyrosine and tryptophan were also included in the calculation s T helix formation can be characterized by two negative bands at 208 and 222 nm, and a positive band at 192 nm. Following the experiments performed by the Baldwin group, the mean residue ellipticity at 222 nm ([ ] 222 ) was calculated to generate the pH p rofile. In practice, protein structure in PDB format and yields the mean residue ellipticity and the rotational strength as a function of wavelength. Therefore, the ptraj module of the AMBER 10 package has been utilized to

PAGE 150

150 gene rate a protein structural ensemble in order to find out an ensemble average of the mean residue ellipticity at 222 nm. 4.3 Results and Discussion 4.3.1 Testing Structural Convergence Conformational sampling convergence is investigat ed utilizing cluster an alysis, as described earlier Two ways of checking conformational sampling of the simulations from the fully extended structure are utilized. One way is to compare the first and the second halves of the trajectory and the other way is to compare to the str uctural ensembles produced by simulations starting from a fully helical structure. The R 2 values of the cross clustering are listed in Table 4 1. Plots demonstrating the cluster population correlations from both ways at pH 2 are showed in Figure 4 1 as an example. The large R 2 values indicate that converged structural ensembles are achieved through 40 ns simulations. Figure 4 1. Cluster population at 300 K from constant pH REMD simulations at pH 2. A) Cluster analysis is performed on the trajectory init iated from fully extended structure. The populations in each cluster from the first and second half of the trajec tory are compared and plotted. B) Two REMD runs from different starting structures at pH 2. Correlation coefficients at other p H values can be found in Table 4 1.

PAGE 151

151 Table 4 1. Correlation coefficients between two sets of cluster populations. pH = 2 pH = 3 pH = 4 pH = 5 pH = 6.5 pH = 8 R 2 (E vs E) 0.90 0.92 0.90 0.94 0.93 0.85 R 2 (E vs H ) 0.95 ----0.88 --0.84 E vs E means compa ring the first and the second halves of the trajectories starting from the fully extended structure. E vs H stands for comparing structural ensemble given by simulations starting from fully extended and fully helical structures, respectively. 4.3.2 p K a Ca lculation and Convergence Four residues of the C peptide are titratable in our constant pH REMD simulations: Glu2, Lys7, Glu9 and His12. Lys7 is always protonated in the pH range of 2 to 8, as expected. Thus, only the data from glutamate and histidine resi dues are analyzed. For each glutamate and histidine residue, the fraction of deprotonation at each pH value is K a value. The p K a values are 3.1, 3.7 and 6.5 for Glu2, Glu9 and His12 respectively. The cumulative average fraction of protonation vs constant pH MC attempts is chosen to study the convergence of the p K a calculation. The cumulative average fraction of protonation represents the time evolution of the protonation state sampling. As shown in Fig ure 4 2 a stabilized fraction of protonation is achieved through 40 ns simulations. 4.3.3 The Mean Residue Ellipticity of the C peptide The mean residue ellipticity of the C peptide at each pH value and at 300 K was computed. The pH profile of the [ ] 222 ( Figure 4 3 ) is clearly a bell shaped curve, in agreement to the experimental pH profile of the [ ] 222 The maximum of our calculated

PAGE 152

152 [ ] 222 is at pH value of 5, with a numerical value of ~ 6400 degcm 2 dmol 1 However, the computed values of [ ] 222 at the ends (pH = 2, 3, and 8) suggest that the helix is more populated in the simulations than in experiments at those pH values. Figure 4 2. Cumulative average fraction of protonation vs Monte Carlo (MC) steps. Only the two glutamate residues are show n here and the histidine residue is found to show the same trend. The pH values are selected such that the overall average fraction of protonation is close to 0.5. As mentioned in the section 2.2.2 the protonation state model involves using parameters fit ted at 300 K, thus results obtained at temperatures other than 300 K should be viewed qualitatively, not quantitatively. C peptide at a temperature lower than 300 K shows a more negative [ ] 222 (more helical) while the [ ] 222 becomes less negative (less h elical) when the temperature is higher than 300 K. E xperiments showed that the pH profile becomes flat at high temperatures. 5 Our results also reflect the same trend: pH profile of the [ ] 222 at 420 K is flat and less negative than those at 300 K, while the pH profile at 280 K is still bell shaped and more negative.

PAGE 153

153 Figure 4 3. Computed the mean residue ellipticity at 222 nm as a function of pH values. A bell shaped curve at 300 K is ob tained with a maximum at pH 5. The effect of temperature on mean residue ellipticity at 222 nm is also demonstrated. 4.3.4 Helical Structure s in the C peptide In order to examine the helical conformations in different environments, constant pH REMD at pH v alues 2, 5, and 8 are selected to represent the pH range. The secondary structures of the C peptide were computed utilizing the DSSP algorithm. 236 Any residue which according to the DSSP algorithm belongs to the 3 10 helix conformation is called helical. The helical percentages of each residue are shown in Figure 4 4 The maximum helical percentage of a residue is ~ 55% at pH 2 and 5, and the maximum helical percentage is ~ 40% a t pH 8. The averaged helical percentage at pH 5 is around 30%, which is in good agreement with experiments (29 2%) Figure 4 4 suggests that the C peptide contains a lot of non helical structures, even at pH 5 where the helical content is maximal.

PAGE 154

154 Figure 4 4. Helical Content as a function of residue number. We calculated the C RMSD vs the fully folded structure (the fully helical structure has a C RMSD of 0.8 relative to the ribonuclease A X ray structure Thr3 to His12 are chosen to calculate C RMSD ) and the C radius of gyration ( R g ) of the C peptide. The time series and the probability density of RMSDs and R g are illustrated in Figure 4 5. According to Figure 4 5B, two conformations can be seen at all three pH values. The conformation with the smaller RMSD represents structures closer to the fully helical structure and t he structural ensemble at pH 5 possesses more such kind of structures than the other two structural ensembles. Figure 4 5D demonstrate the probability density of the R g and it suggests that the C peptide is more compact at pH 5 than at pH 2 and 8. The res ults of R g agree with the results of RMSDs because the helical structures are more compact.

PAGE 155

155 Figure 4 5. A) Time series of C RMSDs vs the fully helical structure at pH 5. The first two residues at each end are not selected because the ends are very f lexible. B) Probability densitie s of the C RMSDs. Clearly, the structural ensemble at pH 5 contains more structures s imilar to the fully helical structure. C) Time series of C radius of gyration at pH 5. D) Probability density of the C radius of gyratio n. More compact structures are found at pH 5. We further studied the details of the C peptide structural ensemble with respect to pH values. The studies of helical structure were on the basis of our DSSP results. We first show the probability density of to tal number of helical residues at pH 2, 5 and 8 in Figure 4 6 A As expected, simulations at pH 5 generated the smallest number of non helical structures and the percentage is ~ 25%. Simulation at pH 8 generated the most non helical structures and ~ 37% of the structural ensemble possesses no helical

PAGE 156

156 residue. For those structures possessing helical residues, structures having four helical residues are the most probable and structures containing three helical residues are also common at all three pH values. B esides, structures possessing six helical residues are also found Furthermore, simulation at pH 5 yielded more configurations possessing seven residue and longer helices. Thus, longer helical chains are formed more often at pH 5. Figure 4 6 A) Prob ability densitie s of number of helical residues in the C peptide. B) Probability densities of the number of helical segments in the C peptide. A helical segment contains continuous helical residues. The probability of forming the second helical segment is very low at all three pH values, thus only the first helical segment is further studied. C) Probability densities of the starting position of a helical segment. D) Probability densities of the length of a helical segment (number of residues in a helical se gment).

PAGE 157

157 Next, the number of helical segment s ( a helical segment contains continuous helical residues) is studied and shown in Figure 4 6B The number of helical segment ranges from zero to two at all three pH values. However, C peptide structures having tw o helical segments are really rare. The probability densities of having two helical segments at pH 2 and 8 are ~ 0.05, while that at pH 5 is ~ 0.1. Due to the small population of the second helical segment, the analysis of the helical length (number of hel ical residues in a segment) and the helix starting position (residue number of the amino acid initiating a helical segment) is focused on the first helical segment Figure 4 6C demonstrates the probability density of helix starting position in the C peptid e. The helix starting position is affected by pH. The most probable starting position is affected by solution pH. At pH 2, Lys7 is the most favorable position to start a helix but the most probable place to initiate a helix is Thr3 at pH 5 and 8. At pH 2 a nd 5, Thr3, Ala6 and Lys7 are favorable positions to start a helix, while Thr3 and Lys7 are the favorable place to start a helix at pH 8. However, the effect of solution pH on the helical segment length is not as significant as the effect on helix starting position. Figure 4 6D shows that the three residue or four residue helices are dominant at all three pH values. 4.3.5 The Two Dimensional Probability Densit ies T wo dimensional (2D) probability density can be employed to study the correlations between impo rtant variables. The peaks in the plots indicate the coupling between two variables and represent stable conformations. The more populated a region is, the more stable the corresponding conformation is. The 2D probability densities between helix starting p osition and helical length are illustrated in Figure s 4 7 to 4 9 Helices consisting of Thr3 Ala5, Lys7 Arg10 and Glu9 His12 are present at all

PAGE 158

158 three pH values, while the number of helical conformations is more at pH 5 and 8. At pH 2 and 5, the most probab le helix formation is the four residue helix starting from Lys7 (Lys7 Arg 10). The 2D probability densitie s reveal that the six residue (Lys7 His12) helix and the seven residue (Ala6 His12) helix are found stable at pH 5. At pH 8, Thr3 Ala5 becomes the most favorable helical formation. Lys7 Arg10 and Lys7 His12 are also favorable. At pH 8, a new seven resi due helix (Thr3 Glu9) is found. Figure 4 7 2D probability density of helical starting position and helical length, pH = 2. Figure 4 8 2D probability density of helical starting position and helical length, pH=5.

PAGE 159

159 Figure 4 9 2D probability density of helical starting position and helical length, pH=8. 2D probability densitie s correlating helical length and C RMSDs relative to fully helical structure are shown in Figure s 4 1 0 to 4 1 2 As expected, structures having long helices (helical length > 7) correspond to the conformations with RMSDs smaller than 2.2 and this region is more populated at pH 5. Interestingly, configurations possessing four resi due helix can also yield RMSDs smaller than 2.2 suggesting that structures having partial helix can be similar to the fully helical too. Figure 4 1 0 2D probability density of helical length and C RMSD at pH = 2.

PAGE 160

160 Figure 4 1 1 2D probability density of helical length and C RMSD at pH = 5. Figure 4 1 2 2D probability density of helical length and C RMSD at pH = 8. 4.3.6 Important Electrostatic Interactions: Lys1 Glu9 and Glu2 Arg10 The salt b ridge between Glu2 and Arg10 was found in the X ray structure of RNase A. 247 Amino acid subst itution experiments on the C peptide indicated this salt bridge is crucial to the increase in helical content when the pH value is increasing to pH

PAGE 161

161 5. 7,224 Proton NMR experiments done by Osterhout et al 225 suggested that this salt bridge stabilizes partial helix instead of complete helix. They proposed that the RN 24 structural ensemble contains three major conformations: unfolded, complete folded and partial helix with Glu2 Arg10 interaction. Hansmann et al 229 also proposed that the salt bridge stabilizes partial helix by performing multicanonical simulations. Felts et al 227 foun d that the salt bridge is only significantly found in the globular non helical C peptide structures. Sugita and Okamoto 233 studied the C peptide using multicanonical REM and explicit solvent. They found that Glu2 Arg10 salt bridge does not stabilize helix directly, but to stop the helix extending to the N terminus. In the REX CPHMD study performed by Khandogin et al. they found that Lys1 Glu9, instead of Glu2 Arg10, contributes to the helix formati on. Th e Lys1 Glu9 and Glu2 Arg10 interactions are studied in our work. Figure 4 1 3 A and 4 1 3 B show the probability density vs charge distance of the two interactions at pH 2, 5 and 8. At pH 2, neither Lys1 Glu9 nor Glu2 Arg10 salt bridge is formed consistent wi th mostly protonated glutamate At pH 5 and 8, Glu2 Arg10 salt bridge is clearly formed ( Figure 4 1 3 A) while the Lys1 Glu9 salt bridge is formed in a much less extent (Figure 4 1 3 B) Figure 4 1 4 shows the correlation between the two salt bridges at pH 5. C learly, the two salt bridges cannot be formed at the same time. The effect of Glu2 Arg10 salt bridge on helical structure formation can be refl ected by conditional probabilities The probabilities of finding helical residue(s) given that the Glu2 Arg10 sal t bridge is formed are calculated at pH 2, 5 and 8. The conditional probabilities are 0.64, 0.73 and 0.63, respectively. Although at pH 2, the probability of forming Glu2

PAGE 162

162 Arg10 salt bridge is low (~ 1%), the chance of having a helical structure is 63% once it is formed. Th i s clearly shows the stabilizing effect of Glu2 Arg10 on helix formation. Figure 4 1 3 A) Probability density of Lys1 Glu9 distance (). The distance is the minimum distance between the side chain nitrogen atom of Lys1 and the side chai n ca rboxylic oxygen atoms of Glu9. B) Probability density of Glu2 Arg10 distance (). The distance is the minimum distance between side chain carboxylic oxygen atoms of Glu2 and guanidinium nitrogen atoms of Arg10. Figure 4 1 4 Two dimensional probabilit y density of Lys1 Glu9 and Glu2 Arg10 at pH 5. Apparently, Lys1 Glu9 and Glu2 Arg10 salt bridges cannot be formed simultaneously.

PAGE 163

163 The correlation between Glu2 Arg10 salt bridge and helical length, and helix starting position are further studied. Figure 4 1 5 A shows that the Glu2 Arg10 salt bridge can be found in non helical configurations, four residue and six residue helices at pH 5. Moreover, in the six residue helix, the Glu2 Arg10 salt bridge is always formed The same pattern is obtained at pH 8, thus t he pH 8 results are not shown here. Figure 4 1 5 B shows the correlation between the salt bridge and helix starting position at pH 5. When a helix is initiated at Thr3, the salt bridge is not formed. When a helix begins at Ala4, Lys7 and residues behind Lys7 only the salt bridge is seen. However, in the non helical configurations and helices begin at Ala6, both states are found. Besides, Lys7 is the most probable place to initiate a helix when the salt bridge is formed. Again, no salt bridge is found when a helix starts at Thr3. Combining the correlations between Glu2 Arg10 and helical length, and Glu2 Arg10 and helix starting position, the salt bridge clearly has the effect that preventing forming helices near the N terminus and stabilizing partial helix nea r the C terminus (Lys7 Arg10 and Lys7 His12). A B Figure 4 15. A) Two dimensional probability density of Glu2 Arg10 salt bridge formation and helical length at pH 5. According to the plot, the Glu2 Arg10 salt bridge can be found in four residue, six resi due and non helical structures. B) Two dimensional probability density of Glu2 Arg10 salt bridge and the helix starting position at pH 5. If a helix begins from Thr3, it cannot have a Glu2 Arg10 salt bridge. Thus, one role of the Glu2 Arg10 salt bridge is to prevent helix formation from Thr3.

PAGE 164

164 4.3.7 Important Electrostatic Interactions: Phe8 His12 His12 is believed to be responsible for the decrease in helical content when solution pH values increase from 5 to 8. 226 His12 was found to interact with Phe8. 221 However, the nature of the Phe8 His12 interaction is not completely clear. A weak hydrogen bond between the charged side chain of His12 ( proton donor) and the aromatic ring of Phe8 ( proton acceptor) is supported by the configuration in RNase A X ray structure 247 and ion screening experiments 222,226 but is in contrast to proton NMR experiment s 221 A contact between the aromatic ring of His12 and backbone carbonyl oxygen of Phe8 has been proposed to explain the proton NMR results. Sugita and Okamoto studied the interaction between the aromatic ring of Phe8 and the charged ring of His12. 233 They observed the contact between two rings has been made and stabilize s helix near the C terminus. However, the REX CPHMD results showed that the interaction between backbone carbonyl oxygen of Phe8 and the charged side chain of His12 is responsible for the increased helical content at pH 5. 112 Figure 4 1 6 A) Probability density of Phe8 backbone to His12 ring distance. The distan ce is the minimum distance between Phe8 backbone carbonyl oxygen atom and His12 imidazole nitrogen atoms B) Probability density of Phe8 ring to His12 ring distance. The distance is the minimum distance between Phe8 aromatic ring carbon atoms and His12 imi dazole nitrogen atoms.

PAGE 165

165 Figure 4 16. Continued We also studied ring ring and backbone ring interactions between Phe8 and His12 at pH 2, 5 and 8. The ring ring interaction is represented by minimum distance between aromatic atoms in Phe8 and the two side c hain nitrogen atoms of His12. The backbone ring interaction is represented by minimum distance between backbone carbonyl oxygen atom of Phe8 and the two side chain nitrogen atoms of His12. Figure 4 1 6 A and 4 1 6 B show the probability densities of each dista nce at three pH values. We found that the backbone ring contact is made at all three pH values. However, forming such a contact at pH 8 is much less favorable than doing that at pH 5. Interestingly, Phe8 backbone and His12 ring close contact and Glu2 Arg10 salt bridge formation are coupled (Figure 4 17 ). The ring ring contact is observed at pH 5 but not at pH 8. At pH 2, the ring ring contact is formed but is much less probable. More importantly, the integrated probability of making a backbone ring contact is larger than the integrated probability of forming a ring ring contact at pH 2 and 5. In order to separate configurations making a contact from the rest, a cutoff distance of 4.0 and 5.0 is adopted, in the case of backbone ring and ring ring contact, respectively. The integrated

PAGE 166

166 probability (area under the curve) of making backbone ring contact and ring ring contact is 0.34 and 0.22, respectively, at pH 5. The integrated probability is 0.23 and 0.14, respectively, at pH 2. Thus, the Phe8 backbone His1 2 ring interaction is the major form of the contact. A B Figure 4 17 A) Two dimensional probability density of Glu2 Arg10 distance and Phe8 His12 backbone to ring distance at pH 5. B) Correlations between Glu2 Arg10 salt bridge and Phe8 His12 contact at pH 5. We further examine the correlation between the Phe8 backbone His12 ring contact and helical properties such as helical length and helix starting position. The backbone ring contact is found in the four residue and six residue helices at pH 2 and 5. At pH 8, it can be seen in the four residue he lix. The 2D probability densitie s are similar at the three pH values, thus only the plot at pH 5 is shown as an example ( Figure 4 18 A and 4 18 B). Similar to the Glu2 Arg10 salt bridge, Lys7 is the most favorabl e place to initiate a helix with a contact between Phe8 and His12. Thus, the Phe8 His12 backbone ring contact stabilizes the helix formation near the C terminus (Lys7 to Arg10 and Lys7 to His12). However, unlike the Glu2 Arg10 interaction, helix formation initiated from Thr3 is able to form a contact between Phe8 and His12. Phe8 His12 contact does not affect helix formation near the N terminus.

PAGE 167

167 Figure 4 18 A) Two dimensional probability density of helical segment leng th and Phe8 His12 interaction. B) Tw o dimensional probability density of helical segment starting position and Phe8 His12 interaction. Phe8 His12 also stabilizes four residue and six residue structures. Helices begin at Lys7 and Phe8 His12 is coupled. Unlike Glu2 Arg10, Phe8 His12 stabilizes helices starting from Thr3. 4.3.8 Cluster Analysis Results Cluster analysis is performed to find out significant conformations and to examine important electrostatic interactions. The structures at pH 5 are clustered because both Glu2 Arg10 and Phe8 His1 2 contacts are more probable than at pH 2 or 8 so that the contacts can be studies in clusters. The top 20 populated clusters and their average helical percentage is plotted in Figure 4 19 A. The most populated cluster shows the largest average helical cont ent and the second most populated cluster shows a much lower helical content (close to the lowest among 20 clusters). The most populated cluster corresponds to the conformation yielding small C RMSDs (< 2.2 ) relative to the fully helical structure (Figu re 4 19 B). Interestingly, the plot showing helical percentage vs the residue number (Figure 4 19 C) reveals that the second most populated cluster only shows helical structures between Lys7 and His12. Thus, helices are only formed near the C terminus. Figur e 4 19 D demonstrates the probability density

PAGE 168

168 of the Glu2 Arg10 and Phe8 His12 interactions. Compare with the corresponding probability densities on the basis of the entire structural ensemble, forming a contact between Glu2 Arg10, and Phe8 His12 is more pr obable in the structures belong to the second most populated cluster than in the entire structural ensemble. This is especially obvious for the Glu2 Arg10 interaction. Results obtained from the second most populated cluster confirm that Glu2 Arg10 and Phe8 His12 contacts, especially the Glu2 Arg10 contact, stabilize partial helix formation near the C terminus. 4.4 Conclusions In this chapter, we have studied the pH dependent helix formation of the C peptide of ribonuclease A using constant pH REMD simulatio ns. The mean residue ellipticity at 222 nm at each pH value is computed and utilized to gauge helical content. The pH profile clearly demonstrates a bell shaped curved with a maximal helicity at pH 5, in good agreement with experimental results. The pH eff ect on the C peptide structural ensembles is studied at three representative pH values: 2, 5 and 8, representing the two ends in the pH profile and the pH value yielding the maximum helical content. At pH 2, helices consisting of Thr3 Ala5, Lys7 Arg10 and Glu9 His12 are formed and the Lys7 Arg10 is the most stable one. At pH 5, additional six residue (Lys7 His12) and seven residue (Ala6 His12) helices are stable helices but the most probable helix is the same as that at pH 2. At pH 8, the most favorable hel ix switched to Thr3 Ala5. Lys7 His12 and a new seven residue helix (Thr3 Glu9) are also present. Glu2 Arg10 salt bridge formation and its role in the helix formation are studied. We find that the salt bridge is formed and is more probable at pH 5. The Glu2 Arg10 salt bridge is found to stabilize helix formation near the C terminus. The nature of Phe8 His12 interaction and its role in helix formation are also explored. Backbone carbonyl

PAGE 169

169 oxygen of Phe8 and side chain charge of His12 contact is the major form. The role of Phe8 and His12 contact is similar to that of the Glu2 Arg10 salt bridge. Results from cluster analysis on trajectory generated at pH 5 confirmed the effects of Glu2 Arg10 and Phe8 His12 interactions. Figure 4 19 A) Top 20 populated clus ters and average helical perc entage. B) Probability densities of the C RMSD vs the fully helical structure of the top 2 populated clusters. C) Helical Percentage as a function of residue number of the top 2 populated clusters. D) Probability density of th e Glu2 Arg10 and Phe8 backbone His12 ring interactions in the second most populated cluster.

PAGE 170

170 CHAPTER 5 CONSTANT pH REMD: p K a CALCULATIONS OF HEN EGG WHITE LYSOZYME 5 .1 Introduction Hen egg white lysozyme (HEWL shown in Figure 5 1 ) has been selected to tes t p K a prediction methods or constant pH methods for a long time 125 This protein is a 129 amino acids enzyme and is the first enzyme to have its three dimensional structure determined by X ray crystallography 248,249 Lysozyme can be found in the secretions such as tears and saliva The function of this enzyme is to catalyze the hydrolysis of a polysaccharide and the reaction has an optimal pH around 5. 125 By hydrolyzing polysaccharides, lysozyme can damage the cell walls of certain bacteria. HEWL is a monomeric single domain enzyme whose active site is situated in a cleft between two regions. Two residues are crucial to the catalysis, Glu35 and Asp52. During the hydrolysis, a covalent enzyme substrate intermediate is formed. 249 In this process, Glu35 acts as the proton donor and Asp52 becomes the nucleophile. 249 The starting point of the catalytic mechanism is the donation of a proton from Glu35 to the substrate. Then, Asp52 will attack the anomeric carbon of the substrate and form a covalent bond with the substrate. In the final step, the enzyme substrate c omplex is hydrolyzed by a water molecule and the initial protonation states of Glu35 and Asp52 are restored. HEWL has been a good test system of p K a prediction studies for several reasons. First, a ccurate predicting the p K a values of both ionizable residue s in active site can help people identify proton donor and nucleophile in HEWL according to a simple criterion proposed by Nielsen and McCammon in 2003. 250 They proposed that if catalytic mechanism involves two acidic residues, then the proton donor should have a p K a value of at least 5.0 and the p K a of nucleophile should b e at least 1.5 pH units lower

PAGE 171

171 than that of proton donor. Second the p K a values of HEWL acidic residues were determined by Bartik et al 251 using t wo dimensional proton NMR. It shows several ionizable resid ues having p K a values much different from their intrinsic p K a values. Furthermore, there are more than 100 PDB entries of the wild type HEWL structure the effect of structural variation can be tested for p K a calculation methods, especially for the FDPB me thod. 250 Thus, our constant pH REMD method will be tested on HEWL. Figure 5 1. Crystal structure of HEWL (PDB code 1AKI). Residues in red represent aspartate and residues in blue are glutamate. Various constant pH methods have been tested on HEWL. Burgi et al 130 utilized their constant pH method to predict p K a values of HEWL. The RMS error between predicted and experimental p K a values was determined to be from 2.8 to 3.8 pH units. In 2004, Lee et al 114 applied their CPHMD method to four proteins: turkey ovomocoid (PDB code 1OMT), bovine trypsin inhibitor (1 BPI), HEWL (193L) and ribonuclease A (7RSA). The overall p K a RMS error relative to experimental data was around 1 pH unit.

PAGE 172

172 For HEWL, the average absolute error of all ionizable residues (including the termini) was 1.6 pH units, while the average absolute e rror of p K a values of acidic ionizable residues relative to experimental data was 1.5 pH uni ts. However, the p K a values of Glu35 and Asp52 were both 5.8, indicating that CPHMD results were not able to predict proton donor and nucleophile. In the same year, Mongan et al 127 published their discrete protonation state constant pH MD method. HEWL was also selected as the test system. In the study of performed by Mongan et al ., four different crystal structures of HEWL w ere utilized (1AKI, 1LSA, 3LZT and 4LYT). The RMSD of p K a values of all ionizable residues relative to experimental results were 0.86, 0.77, 0.88 and 0.95 for 1AKI, 1LSA, 3LZT, and 4LYT, respectively In addition to p K a predictions, Mongan et al also st udied protonation conformation correlation. Principal component analysis of a trajectory was conducted and projected onto the first two (largest eigenvalues) eigenvectors and association between conformation and protonation was observed. In 2006, Khandogin and Brooks 110 utilized REX CPHMD method to predict p K a values of 10 proteins. The RMS error values between REX CP HMD and experimental p K a values ranged from 0.6 to slightly greater than 1 pH unit. For HEWL, the RMS error between predicted and experimental p K a values was 0.6 pH unit and the maximum absolute error is 1.0 pH unit. So far, their HEWL p K a prediction RMS e rror is the smallest among constant pH p K a calculations on HEWL. Machuqueiro and Baptista presented HEWL p K a predictions from their stochastic titration constant pH MD with explicit water model in 2008 125 The RMS error between predicted and experimental p K a values were 0.82, and 1.13 for generalized reaction field, 252 and PME 154 treatment of long range electrostatics, respectively. A comparative FDPB calculation (single crystal structure

PAGE 173

173 which is the same as that utilized in constant pH MD and a protein dielectric constant of 2) was also con ducted and the RMS error was found to be 2.76. Since the constant pH method proposed by Baptista requires FDPB calculation, the selection of dielectric constant inside the protein was crucial. Machuqueiro and Baptista performed constant pH MD utilizing thr ee different dielectric constants ( =2, 4, and 8) combined with PME treatment of long range electrostatics The p K a RMS error values were 1.13, 1.02, and 1.12 for = 2, 4, and 8, respectively. More recently, the constant pH MD proposed by Mongan et al 127 was coupled with accel erated molecular dynamics (AMD) 133,134 and tested on HEWL by Williams et al 129 C onstant pH AMD and MD simulations of 5 ns in length have been performed. Only acidic ionizable residues in HEWL were taken into consideration by constant pH scheme. RMS error values between predicted and experimental p K a values were calculated. The constant pH AMD yielded an overall RMS error value of 0.73, while the original constant pH MD p K a RMS error was 0.80. The p K a RMS error of aspartates were 0.75, and 1.46 from constant pH AMD, and MD, respectively. The p K a RMS error of glutamates were 0.85, and 1.04 from constant pH AMD, and MD, respectively. In general, recent works utilizing various constant pH schemes have achieved RMS error values in the range of 0.6~1.13 for HEWL. In this chapter we present a study of HEWL using constant pH REMD algorithm. Both structural restrained and unrestrained simulations were done. p K a values from constant pH REMD are compared with ex per imental values We also investigated the p K a convergence, effect of structural restraint and conformation protonation correlations

PAGE 174

174 5.2 Simulation Details Crystal structure 1AKI (PDB code) has been taken as HEWL starting structure in our study. Water mo lecules in the crystal structure were striped first. Only aspartate and glutamate residues were studied so there are nine ionizable residues selected. H ydrogen atoms were added by the LEaP module in the AMBER suite. The post processed crystal structure was then minimized and heated from 0 K to 300 K. The restart structure from the heating process was taken as the initial structure for our constant pH REMD simulations. In this chapter all REMD runs refer to constant pH REMD simulations for simplicity The p H range was from 2 to 6 in an increment of 0.5 pH unit. Two sets of REMD simulations were performed: the unrestrained ones (ntr=0 in AMBER) and the restrained ones (ntr=1 in AMBER) In each REMD run, an exchange of structures was attempted every 500 MD st eps. 1000 exchange attempts were intended to use for both sets. Thus Simulation time of each replica in each set is 1 ns. In the unrestrained REMD runs, we chose the highest temperature to be 320 K in the hope that HEWL will not unfold at all temperature s In the restrained REMD runs, C atoms from residue 3 to 126 were restrained by harmonic potential s. The restraining harmonic potential has the following form: = 1 2 2 where and are Cartesian coordinates at current time a nd Cartesian coordinates of the reference structure, respectively, is the force constant of the harmonic potential which determines the strength of a restraint. In our simulations, the reference coordinates are the initial C atoms coordinates. By putti ng restraining harmonic potential on C atoms, the secondary structure of HEWL will be preserved and the highest temperature will be

PAGE 175

175 increase to 420 K in order to achieve better side chain conformational sampling The force constant of the harmonic potenti als wa s 1.0 kcal/mol 2 ( setting restraint_wt=1 in AMBER ) Several other REMD simulations were done according to results from the two sets of REMD runs. The general goal of those simulations was to test what we proposed from the two previous sets. First, a nother 1 ns constant pH REMD simulation with restraint on C atoms was continued for all the pH values in order to check the p K a convergence of the restrained simulations. Likewise, 1000 exchange attempts were cond ucted in those 1 ns simulations and the restraint strength is still 1.0 kcal/mol 2 Second, a new se t of constant pH REMD simulations with restraint on C atoms was performed. The force constant adopted in the second set was 0.1 kcal/mol 2 so that the effect of restraint strength can be tested The details of constant pH REMD simulations can be found in Table 5 1 Table 5 1. Simulation de tails of constant pH REMD runs pH values R estrained or not Restraint Strength Number of Replicas Temperature (K) Simulation Time (ns) Exchange Attempts 2~6 No 0 4 280~ 3 20 1 1000 2~6 Yes 1 8 280~420 2 2000 3, 4, 4.5 Ye s 0.1 8 280~420 2 2000 The restraint strength was represented by the force constant of a harmonic potential The unit of force constant is kcal/mol 2 For the REMD simulation with 1 kcal/mol 2 restraint, it was actually performed in two st age s. Each sta ge lasted for 1 ns and the purpose of the second stage was to check the p K a convergence. All simulations were done using the AMBER 9 molecular simulation suite 253 with the AMBER ff99SB force fields. 139 The SHAKE algorithm 145 was used to allow a 2 fs time step. OBC Generalized Born implicit solvent model 200 was used to model water

PAGE 176

176 environment in all our calculations. The Berendsen thermostat, 146 with a relaxation time of 2 ps, was used to keep the replica temperature around their target values. Salt concentration (Debye Huckel based) was set at 0.1M. The cutoff for nonbonded interaction and the Born radii was 30 5 .3 Protein C onformational and P rotonation State E quilibrium Model Suppose an ionizable side chain has o nly two conformations in equilibrium and each conformer has its own equilibrium in protonation state. We can use 1p, 1d, 2p and 2d to label conformer 1 in protonated form, conformer 1 in deprotonated form, conformer 2 in protonated form and conformer 2 in deprotonated form respectively. The equilibrium among all species is demonstrated in Figure 5 2 Figure 5 2 A simple schematic view of the c onformation p rotonation equilibrium in a c onstant pH simulation. Then, 12 the equilibrium constant between conformation 1 and 2 is (5 1) In the above model, p K a,1 and p K a,2 represent protonation equilibrium within each conformation. They can be expressed as: (5 2)

PAGE 177

177 and (5 3) So, the p K a of that ionizable residue is (5 4) 5.4 NMR Chemical Shift Calculations Theoretical NMR chemical shift titration curve was generated. Due to the limitation of system size, full quantum mechanical NMR calculations were performed only on i onizable residue dipeptide (ionizable residue with two ends blocked). The structure of ionizable dipeptide was extracted from the representative structures (representing different side chain conformations) generated from cluster analysis. Proper protonatio n states were assigned for each structure. All full quantum mechanical NMR calculations were done in Gaussian03 software package 254 using B3LYP functional and 6 311++G** basis set. Isotropic magnetic shielding constants were computed in vacuum using GIAO method. 255 Tetramethylsilane (TMS) was used as reference in order to obtain the ch emical shift. Recently, Merz and co workers 256 developed an a utomated fragmentation quamtum mechanical/molecular mechanical (AF QM/MM) approach to study protein properties. They have applied their method to compute protein chemical shift of Trp Cage. In this AF QM/MM model, one residue and the atoms near it (less th an 4 ) are assigned to the QM region and the rest of a protein will be put into the MM region. During NMR calculations, all atoms in the MM region will be viewed as point charges.

PAGE 178

178 We applied this AF QM/MM method to 1AKI to calculate chemical shift as well Again, all AF QM/MM calculations were based on representative structures. 5 .5 Results and Discussions 5 .5 .1 Structural Stability and p K a Convergence Since changing protonation state during simulation will cause discontinuity in force and energy, structur al stability in our simulations is important We chose to use C atoms root mean square deviation (RMSD) vs 1AKI structure as our metric. Figure 5 3 A shows us the C RMSD vs time in unrestrained REMD runs. In Figure 5 3 A, HEWL is instable at all the pH sim ulated. The RMSD can reach a very high value (~ 18 ) during simulations. Even at pH=4 where C RMSD values are small relative to the rest, the C RMSD can still go beyond 3 p K a be used. Figure 5 3 B show s the RMSDs in the restrained REMD runs. Although the RMSD values are small and stable throughout 2 ns simulations, the restrained REMD simulations still reveal problems according to Figure 5 3 B Our simulations use 1AKI which is resolved at pH=4.5 as sta rting structure. As pH is moving away from 4.5, one may expect HEWL will adopt conformations a little bit different from 1AKI. So a bigger RMSD should be expected where the pH value is far away from 4.5. This behavior has been confirmed in the work of Mong an et al However, putt ing restraint on C atoms results in the same RMSDs in the entire pH range. This may have negative effect on p K a predictions at pH values far away from 4.5.

PAGE 179

179 A B Figure 5 3 C RMSD vs crustal structure (PDB code: 1AKI). A) C RMSD vs 1AKI from REMD without restraint on C B) C RMSD vs 1AKI f rom REMD with restraint on C The restraint strength is 1 kcal/molA 2 In order to check protonation state sampling convergence from the restrained REMD simulations, p K a prediction error (pre d icted value minus experimental value ) against time as well as time evolution of prediction deviation ( predicted p K a value at

PAGE 180

180 current time minus the final predicted p K a value) are followed and demonstrated in Figure 5 4 and 5 5 According to those plots, st abilizations in p K a predictions are seen change average p K a predictions and their errors relative to experimental values. In order to show convergence in protonation state sampling is reached in a wide range of pH, a representative plot of Asp52 p K a deviations are shown in Figur e 5 5 B Convergence is clearly seen over the pH range. Figure 5 4 p K a prediction error as a function of time. T he predicted p K a at a given time is a cumulative result. For each ionizable residue, the time series of its p K a error is generated at a pH where the average predicted p K a is closest to that pH value. In this way, we try to eliminate any bias toward the energetically favored state. A flat li ne is an indication of convergence. Glu35 is not shown here due to poor convergence.

PAGE 181

181 A B Figure 5 5 A ) p K a prediction convergence to its final value. Similarly, t he p K a value at a given time is a cumulative average A flat line having y value of 0 is expected when p K a calculation convergence is reached. The same pH values are chos en for each ionizable residue as in Figure 5 4 B ) Asp52 p K a prediction convergence to its final value at multiple pH values. The pH values are selected in such a way that the p K a calculated at this pH will be used to compute composite p K a

PAGE 182

182 5 .5 .2 p K a Predictions A popular way to study the accuracy of p K a prediction is to look at the p K a RMS error relative to experimentally measured p K a to generate p K a simulations. Mongan et al proposed a way to calculate p K a their constant pH MD paper. They called p K a values calcul ated in their way c omposite p K a values A composite p K a is an average of all p K a values having an absolute offset less than 2 pH units. Here an offset means the difference between predicted p K a and its corresponding pH values. Table 5 2 shows p K a values and the p K a RMS erro r values from the 2 ns restrained REMD runs. Composite p K a values, p K a va l ues their RMS error values relative to experimental measurements are also listed in Table 5 2. We used the same experimental p K a values as Mongan et al did to calculate p K a RMS error In our work, the p K a predictions from while utiliz ing composite p K a values produces a RMS error value of 0.87. According to con stant pH simulation literatures the RMS error values of HEWL p K a prediction are around 0.8 for acidic ionizable residues. So there is no significant improvement in p K a prediction from our simulations. However, as we mentioned in the structural stability discussion, putting a restraint on C atoms of a protein lowers the ability to adjust its conformations. The further a pH value is away from crystal pH, the more a structure ensemble is skewed from the correct one. Simulations performed around pH 4.5 are less affected by the restraint than s imulations done at pH values far away from 4.5. Since the less a structural ensemble is skewed, the less human error in p K a predictions. So one may expect smaller p K a RMS

PAGE 183

183 error relative to experimental values will be seen around pH 4.5. p K a prediction RMS error relative to experimental values are plotted against pH values in Figure 5 6 As expected, a minimum having RMS error of 0.74 at pH 4. 5 can be found. An RMS error of 0.74 is among the best published HEWL predictions. Table 5 2. Predicted p K a values a nd their RMS errors relative to experimental measurements from the restrained REMD simulations. Exp 251 pH 2 pH 2.5 pH 3 pH 3.5 pH 4 pH 4.5 pH 5 pH 6 Com p Hill Glu7 2.85 3.61 3.58 3.46 3.03 2.99 2.93 2.36 3.37 3.27 3.23 Asp18 2.66 1.59 1.54 1.51 1.61 1.91 2.35 2.5 3.69 1.63 1.4 Glu35 6.2 3.76 3.65 4.36 4.14 4.31 4.53 4.76 4.61 4.27 4.58 Asp48 2.5 1.88 1.98 2.14 2.34 2.6 2.45 1.96 2.9 2.23 2.01 Asp52 3.68 2.71 2.45 2.63 2.82 3.05 2.72 2.77 3.99 2.73 2.68 Asp66 2.0 2.5 2.69 2.86 2.92 3.12 2.72 3.09 4.04 2.8 2.73 Asp87 2.07 2.32 2.43 2.64 2.49 2.54 2.64 2.79 3.62 2.51 2.42 Asp101 4.09 4.52 4.4 4.14 4.03 3.79 3.55 3.44 3.96 3.89 3.85 Asp119 3.2 2.71 2.78 3.01 3.01 3.25 3.01 2.89 3.97 2.96 2.9 RMS Error 1.04 1.1 0.91 0.89 0.83 0.74 0.79 1.12 0.87 0.84 In this table, K a Comp stands for the composite p K a value of an ionizable residue (see Mongan s paper for definition) and Hill stands for the p K a value obtained f rom the Hill s plot. The force constant of the harmonic potential used here is 1 .0 kcal/mol 2

PAGE 184

184 Figure 5 6 RMS error between predicted and experimental p K a vs pH value A minimum of p K a RMS error can be fou nd near the pH at which 1AKI crystal structure is resolved. 5 .5 .3 Constant pH REMD Simulations with a W eaker R estraint Based on what have been found so far, we propose that reducing restraint strength on C atoms will yield better p K a predictions. This is because reducing restraint strength will increa se degree of freedom in conformation sampling. HEWL can relax its structure further, even at pH 4.5 Thus a more accurate structure ensemble can be produced. This, in turn, will improve p K a calculations. Constant pH REMD simulations with a weaker restraint (harmonic potential on C atoms) of 0.1 kcal/mol 2 were carri ed out at three different pH values to test our hypothesis. First, as shown in Figure 5 7 A, all three simulations generate larger C RMSDs relative to 1AKI than those simulations with stronger restraint do This means HEWL relaxes more when a weaker restraint is used. Besides, the C RMSD fluctuations in all three runs are bigger than those in the 1 kcal/mol 2 REMD runs. This means more conformational space is visited. Another

PAGE 185

185 interesting point in the weaker restrained REMD runs is that the C RMSDs at pH 3 and 4 are larger than tho s e at pH 4.5. Simulations at pH 3 and 4 do tend to sample conformations that are different from at pH 4.5. The p K a prediction result s are listed in Table 5 3. p K a prediction deviation from th e final value vs time at pH value of 4.5 is shown in Figure 5 7 B to demonstrate protonation state sampling convergence. According to Table 5 3, n early 0.1 pH unit improvement in the RMS error of predicted p K a values can be seen at each pH for the weakly re strained REMD runs. However, among all three RMS error values, the best one is still obtained at pH 4.5 indicating that restraint is still favoring simulations near pH 4.5. After reducing the restraint strength our best p K a RMS error relative to experimen tal values is 0.62. Table 5 3. Predicted p K a values and their RMS errors relative to experimental measurements from weakly restrained REMD simulations. pH=3 pH=4 pH=4.5 1 0.1 1 0.1 1 0.1 Glu7 3.46 3.71 2.99 3.38 2.93 3.34 Asp18 1.51 1.57 1.91 1.76 2. 35 2.23 Glu35 4.36 5.09 4.31 5.23 4.53 5.24 Asp48 2.14 2.27 2.6 2.48 2.45 2.71 Asp52 2.63 2.47 3.05 2.88 2.72 3.29 Asp66 2.86 2.63 3.12 2.66 2.72 2.93 Asp87 2.64 2.52 2.54 2.79 2.64 2.88 Asp101 4.14 3.82 3.79 3.77 3.55 3.54 Asp119 3.01 2.22 3.25 2.2 1 3.01 3.38 RMS E 0.91 0.84 0.83 0.72 0.74 0.62 In Table 5 3, the number 1 in the second row means the force constant of the restraining potential is 1 kcal/mol 2 while 0.1 stands for 0.1 kcal/mol 2 RMSE stands for RMS Error.

PAGE 186

186 A B Figure 5 7 A) C R MSD of HEWL from weaker restraint REMD simulations. The RMSDs are larger than those with stronger restraints. When comparing RMSDs at different pH for simulations using weaker restraint, RMSDs are greater at pH 3 and 4 than those at pH 4.5. B) p K a predicti on deviation from final value at pH 4.5 from constant pH REMD with 0.1 kcal/mol 2

PAGE 187

187 5 .5 .4 Acti ve Site I onizable R esidue p K a P rediction : Asp52 Accurate calculations of the p K a values of ionizable residue s in active site are important because their protonat ion state s are crucial in enzyme reactions. In the case of HEWL, Asp52 works as a nucleophile. This requires Asp52 to be deprotonated during reactions which has an optimal pH around 5. In both restrained REMD, Asp52 is indeed deprotonated around pH 5. Howe ver, the error of Asp52 relative to experimental value is about 1 pH unit. Mongan and co workers also had the same trend except that a bigger error was obtained in their simulations They claimed that Asp52 Asn46 hydrogen bond caused the very low predicted p K a of Asp52. 127 Asp52 and residues that strongly interact with it (three asparagine residues: Asn44, Asn46 and Asn59) in the crystal structure of 1AKI (hydrogen atoms are added and proper protonation state is ch osen at pH 4.5) are shown in Figure 5 8 We studied those interactions which are represented by atom to atom distances in our REMD simulations. We find that Asp52 is closer to Asn59 and Asn44 rather than to Asn46, indicating that Asp52 has stronger interac tions with Asn59 and Asn44 than with Asn46. Time series of Asp52 carboxylic oxygen atoms to Asn59 and Asn44 ND2 distances at pH 3 are shown in Figure 5 9 As can be seen from Figure 5 9 A and 5 9 B Asp52 and Asn44, A sn59 stay within hydrogen bonding distanc e for a long time at pH as low as 3 Furthermore, hydrogen bonding distances between Asp52 and Asn44 and between Asp52 and Asn59 are coupled. Two oxygen atoms in the carboxylic group of Asp52 are able to work as proton acceptors simultaneously. This means that the deprotonated form of Asp52 is over stabilized by hydrogen bonding even at low pH values

PAGE 188

188 Figure 5 8 Asp52 in the crystal structure of 1AKI Its neighbors that having st rong electrostatic interactions are also shown. A B Figure 5 9 A ) Time s eries of Asp52 carboxylic oxygen atom OD1 to Asn59 and Asn44 ND2 distances at pH 3 in the 1 kcal/mol 2 constant pH REMD run. B) Time series of Asp52 carboxylic oxygen atom OD2 to Asn59 and Asn44 ND2 distances under the same condition Hydrogen bonds which are stabilizing deprotonated Asp52 are formed in a large extent even at a low pH. Next, hydrogen bond analysis was conducted with PTRAJ module in the AMBER suite for both sets of restrained REMD simulations. Hydrogen bonds can be found between Asp52 and a ll three asparagines (Asn44, Asn46, and Asn59) in both sets. The occupation times of Asp52 Asn44 and Asp52 Asn59 hydrogen bonding are longer than

PAGE 189

189 that of Asp52 Asn46 hydrogen bonding. Furthermore, the Asp52 Asn44 and Asp52 Asn59 hydrogen bonding are couple d according to the distances demonstrated in Figure 5 9. Asp52 is protonated only when the entire carboxylic group is pointing away from Asn44 and Asn59. T he Asp52 Asn44 and Asp52 Asn59 hydrogen bonding, not the Asp52 Asn46 hydrogen bonding, is responsible for low predicted p K a value of Asp52. The hydrogen bond contents are similar in both strongly and weakly restrained REMD simulations This indicates that the hydrogen bonding effect on Asp52 in our simulations is too strong. R educing restraint strength do help the conformational sampling of Asp5 2. 5.5 .5 Active S ite I onizable R esidue p K a P rediction : Glu35 Glu35 is another problematic case in our study. In the 1 kcal/mol 2 largest single residue error: the error is almost 2 pH unit s Exc luding Glu35 will lower the p K a RMS error value by nearly 0.2 pH unit. In the 0.1 kcal/mol 2 runs, the p K a value of Glu35 is improved, having an error around 1 pH unit. This is the main reason that smaller p K a RMS error s relative to experimental data are found in all three 0.1 kcal/mol 2 REMD simulations. Although the p K a error of Glu35 in the weakly restrained REMD simulation is large the good news for weakly restrained REMD simulations is that Glu35 can be correctly identified as proton donor based on the criterion pr oposed by Nielsen and McCammon: Glu35 has a p K a value ~5.2 and the p K a difference between Asp52 and Glu35 is greater than 1.5 pH units The predicted p K a value of Glu35 was determined to be 5.32 in the study performed by Mongan et al They claimed that a similar hydrogen bonding effect as Asp52 demonstrated was responsible for the low predicted p K a value of Glu35. 127 However, h ydrogen bonding analysis of our data does not show any significant

PAGE 190

190 hydroge n bonding is formed by Glu35, which is in contrary to what Mongan et al. claimed In the 1AKI crystal structure Glu35 side chain is in the vicinity o f Gln57, Trp108 and Ala110 side chains. Several key distances between Glu35 carboxylic group and Gln57, T rp108 and Ala110 side chains in the crystal structure are listed in Table 5 4. According to Table 5 4, Glu35 is in a hydrophobic region except that a close distance between Glu35 OE2 atom and Ala110 backbone amide nitrogen atom. The hydrophobic effect is t he main reason of an elevated p K a value of Glu35. However, w hen the carboxylic group is pointing toward the Ala110 amide group the deprotonated form of Glu35 will be favored If such a conformation is stable throughout simulations, the predicted p K a value will be smaller than what it supposed to be. We think one reason of a low predicted p K a value is that Glu35 is stuck in conformations stabilizing deprotonated form. But the weakly restrained simulations allow Glu35 to relax structure further and visit con formations stabilizing protonation more frequently Table 5 4. Distance between Glu35 carboxylic oxygen atoms and neighboring residue side chain atoms in 1AKI crystal structure Glu35 OE1 Glu35 OE2 Gln57 CB 3.56 5.25 Gln57 CG 3.85 5.84 Trp108 CB 5.36 3.43 Trp108 CG 5.43 3.94 Trp108 CD1 4.65 3.67 Ala110 N 4.65 3.09 Ala110 CB 4.19 3.48 The unit of all distances in Table 5 4 is Glu35 heavy atom RMSD relative to 1AKI as well as cluster analysis on the basis of those RMSDs are chosen to study Glu35 conformational sampling. Distributions of

PAGE 191

191 heavy atom RMSD which are shown in Figure 5 1 0 show that 2 conformations are found in the strongly restrained simulations : one centered at RMSD ~0. 1 (we label that conformation as conformation 1) and the other centered at ~0.6 (it is labeled as conformation 2) However, an extra conformation (conformation 3) is visited by the weakly restrained REMD simulations. Cluster analysis is employed to separate those conformations. For conformation 2, t he carboxylic gro up of Glu35 points toward the Ala110 amide group in both sets of the restrained REMD runs (Figure 5 11) The carboxylic group in conformation 1 also points toward the Ala110 amide group, although in a lesser extent. However, conformation 3 (shown in the we akly restrained runs only) contains configurations in which Glu35 carboxylic group is pointing away from Ala110 amide group (Figure 5 12B) In this conformation the Glu35 side chain is in the hydrophobic region and the protonated species is favored. A too l ow percentage of conformation 3 i s responsible for the low predicted p K a value of Glu35. A Figure 5 1 0 A) Time series of the Glu35 heavy atoms (excluding two carboxylic oxygen atoms) RMSD relative to crystal structure 1AKI B) Probability distribution of the RMSD. The conformation centered at RMSD ~0.1 is labeled as conformation 1. The one centered at ~0.6 is named conformation 2. Apparently, an extra conformation (conformation 3) is visited by the weakly restrained REMD simulation.

PAGE 192

192 B Figure 5 10 Continued A B Figure 5 11. A) Representative Structure of conformation 1. B) Representative Structure of conformation 2. The s tructure ensemble is generated from REMD simulations with stronger restraining potential. The carboxylic group of Glu35 in co nformation 2 is clearly pointing toward the amide group of Ala110. Deprotonated form of Glu35 tends to de crease the electrostatic energy. Furthermore, conformation 1 does not particularly favor the protonated Glu35. No significant stabilizing factor is fou nd for the protonated Glu35.

PAGE 193

193 Figure 5 12. Representative Structure of conformation 3 from cluster analysis. Glu35 is in the hydrophobic region, consisting of Gln57, Trp108 and Ala110 Conformation 1 and 2 in the weakly restrained simulation s are basica lly the same as those demonstrated in Figure 5 11 Another possible reason of underestimating p K a value of Glu35 is the use of implicit solvent in constant pH MD and REMD simulations. Imoto et al. suggested that Glu35 and Asp52 were coupled by two water mo lecules through hydrogen bonding. Glu35 carboxylic group acted as a proton donor in the hydrogen bonding. Thus the protonated form of Glu35 was stabilized and contributed to the elevated p K a value. Two water molecules are indeed found between Glu35 and Asp 52 in the 1AKI crystal structure and they are within hydrogen bonding distances to Glu35 and Asp52. If the hypothesis is true the use of implicit solvent breaks this hydrogen bonding network. Thus a stabilizing factor of protonated Glu35 is missing. A c on stant pH algorithm employing explicit solvent is needed to study this effect. 5. 5 .6 Correlation betwe en C onformation and P rotonation As described earlier, one advantage of utiliz ing constant pH methods is that the conformational sampling and the protona tion state sampling are directly coupled. In this

PAGE 194

194 work, s ide chain dihedral angles are chosen to study conformation protonation coupling. Asp119 1 and 2 dihedral angles at pH 3 will be shown as representatives. Two dimensional histograms between d ihedral angles and protonation state s are displayed in Figure 5 1 3 A two dimensional ( 2D ) histogram is generated by putting bins in dihedral ang le and protonation state space ( As explained in the second chapter, considering s yn and anti configuration of p rotons will generate five protonation state s in the case of ionizable aspartate in AMBER. They can be labeled as 0, 1, 2, 3 and 4 in which state 0 stands for deprotonated state and the rest represent protonated species ). A B Figure 5 1 3 A) Correlation between side chain dihedral angle 1 and protonation states. B) Correlation between side chain dihedral angle 2 and protonation states. Our 2D histograms can show the correlations between dihedral angle distribution and protonation state distribut ion. Two conformations are obtained in 1 space: conformation 1 having 1 angle around 60 while conformation 2 having 1 angle around 170 In Figure 5 1 3 A, we can clearly see that conformation 1 is coupled with protonated form and most stru ctures in conformation 2 are in deprotonated state. According to Figure 5 1 3 B, similar behavior can be seen in 2 space too. Most

PAGE 195

195 deprotonated Asp119 are found having 2 near 40 and 140 while configurations showing 75 and 100 of 2 are p rotonated. A closer look at the 1AKI crystal structure reveals that side chains of Asp119 and Arg125 are close to each other (the carboxylic group of Asp119 and the guanidinium group of Arg125 are in hydrogen bond distance). Since Arg125 has a positive ch arge on its guanidinium group, it stabilizes the deprotonated Asp119 when two side chains are close to each other We calculated p K a of Asp119 in 1AKI using H++ (H++ is a web based FDPB server developed by Alexy Onufriev s group at Virginia Tech The FDPB equation is solved on the basis of only one protein structure ). 257,258 The calculated p K a of Asp119 using FDPB method is 1.1, 0.7 and 1.3 when the internal dielectric constant is set to be 2, 4, and 6, respectivel y. All three p K a values are much lower than experimental p K a value of 3.2. This behavior agrees with what we jus t explained: Asp119 Arg125 side chain coupling stabilizes the deprotonated form of Asp119. The single structure FDPB based p K a calculations yiel d such low p K a values because only one conformation is visited by Asp119. Therefore Asp119 must sample other conformations in order to yield accurate p K a predictions. Time evolution of distance between Asp119 and Arg125 side chain is shown in Figure 5 1 4 to reflect that conformations other than crystal conformation are visited in our constant pH REMD runs. In Figure 5 1 4 we can clearly see that the close contact between Asp119 and Arg125 side chains can be broken during our simulations. Allowing side chai n s to move will result in a p K a value of 3.0 in our simulations. The comparison between constant pH and single structure FDPB algorithm clearly demonstrates the importance of conformational sampling in p K a calculations.

PAGE 196

196 Figure 5 1 4 Minimal distance bet ween Asp119 side chain carboxylic oxygen atoms (OD1 and OD2) and Arg125 guanidinium nitrogen atoms. Since guanidinium group has three nitrogen atoms, the minimal distance is the shortest distance between Asp119 OD1 (or OD2) and those three nitrogen atoms. Therefore another way to look at conformations is combining both Asp119 and Arg125. Now distance s between Asp119 CG and Arg125 CZ atoms are selected to distinguish different conformations. Figure 5 1 5 A shows the CG CZ distance probability distribution. Th e probability distributions also reveal that two conformations exist. One conformation is centered at CG CZ distance of 4.2 which represents the Asp119 and Arg125 coupling is on. The other conformation is actually representing all structures not belongin g to the previous conformation. Based on the distance between Asp119 CG and Arg125 CZ, we can say the coupling is off. The 2D histogram between distance and protonation state at pH 3 is shown in Figure 5 1 5 B. As can be seen in the 2D histogram contour plot short distance conformation is indeed in the deprotonated state. The p K a of shorter distance conformation is negative infinity. Although several snapshots possess both protonated state and short distance, 2D histogram doesn t reveal them as a stable conf ormation. So, the short distance conformation is purely coupled with deprotonated

PAGE 197

197 form. W e also obtain the p K a value of the longer distance conformation is 3.3 according to Hill s plot. A B Figure 5 1 5 A) Probability distribution of Asp119 CG to Arg125 C Z distances. The Asp119 CG to Arg125 CZ distance is used to distinguish conformations. B) Coupling between conformations and protonation states. 5. 5 .7 Conformation P rotonation E quilibrium M odel Due to the coupling between conformation and protonation equil ibrium, knowing the pH effect on conformational equilibrium will be interesting and important. Again, Asp119 is selected as the representative of our study. First, we want to show the derivation and the analytical form of K 12 as a function of pH values in a general case. From now on, we will label conformation 1 in deprotonated form as 1d. The, 1p, 2d and 2p stand for conformation 1 in protonated form, conformation 2 in deprotonated form and conformation 2 in protonated form, respectively. According to eq. 2 and 3, [1p] = [1d] 10 (pKa,1 pH) and [2p] = [2d] 10 (pKa,2 pH) We can substitute [1p] and [2p] in eq. 1 with [1d] and [2d] so the conformational equilibrium constant will have the form: ( 5 5 )

PAGE 198

198 In Eq. 5 5 [1d]/[2d] is the equilibrium constant of conformation 1 and 2 in deprotonated form and it is equal to the K 12 at high pH where both conformations are in the deprotonated form. So K 12 has the final anal ytical formula: ( 5 6 ) where K 12,h stands for K 12 at high pH. In our derivation, conformation 1 always has a smaller p K a value than conformation 2. So the de nominator always increases faster than the numerator when pH values going down. Considering that K 12,h is a constant, then K 12 is a sigmoid function. When pH is much greater than both p K a values, K 12 becomes K 12,h When pH is much smaller than both p K a val ues, K 12 reaches its lower bound. In the case of Asp119, the p K a value is minus infinity for conformation 1 when we use Asp119 CG and Arg125 CZ distance to distinguish two conformations. The ratios of K 12 and K 12,h from both analytical derivations and actu al simulations are plotted in Figure 5 1 6 Close agreement between K 12 /K 12,h plots generated from simulations and conformation protonation equilibrium model is seen in Figure 5 1 6 A. The agreement shows that the model could represent conformational equilibr ium in our constant pH REMD simulations. So, further use of that model is possible. Different p K a 1 and p K a 2 values are also used in order to test how two p K a values affect shape and inflection point of the sigmoid function. According to Figure 5 1 6 B, 5 1 6 C and 5 1 6 D if the difference between p K a 1 and p K a 2 is large (greater than 1 pH unit, approximately), the inflection point will appear at a pH value that equals to p K a 2 p K a 1 will affect the inflection point only when the difference is small. If we v iew a K 12 / K 12,h plot as a titration curve and the inflection point is the p K a value, then the K 12 / K 12,h plot yields a p K a value equals to p K a 2 values, which is 3.3 in the case of Asp119.

PAGE 199

199 A B C D Figure 5 1 6 K 12 / K 12,h as a function of pH and its depend ence on p K a,1 and p K a,2 Since the analytical form of K12, pKa,1 and pKa,2 are known and the sum of all fractions is unity, we can figure out fractions of each species. The analytical expressions of each species are: 1 = 12 12 + 1 1 1 + 10 1 (5 7) 1 = 12 12 + 1 10 1 1 + 10 1 (5 8) 2 = 1 12 + 1 1 1 + 10 2 (5 9) 2 = 1 12 + 1 10 2 1 + 10 2 (5 10)

PAGE 200

200 In o ur study of Asp119, p K a 1 is minus infinity which lead to [1p] is equal to zero. K 12 h is calculated as the average of all [1d]/[2d], which results in a K12,h of 1.6. Anot her K 12 h of 1.8, which is the K 12 at pH 5, is also tried. Then, fractions of each species from both analytical formula and actual simulations are shown in Figure 5 1 7 A B Figure 5 1 7 A) Fraction of each species as a function of pH (titration curves) o btained from equations based on conformation protonation equilibrium. The effect of 12 is tested. B) Comparison of titration curves derived from actual simulations and from the equilibrium equations Firstly, the fraction of 2 vs pH plots are almo st identical for two K 12,h values. This means that although the fractions of 1d and 2d are affected, the sum of 1d and 2d is

PAGE 201

2 01 not. Secondly, titration curves derived from analytical formula and actual simulations agree with each other very well. The agreeme nt among titration curves leads to similar p K a values. Both analytical titration curves using different K 12,h yield p K a values to be between 2.8 and 2.9 with negligible difference and the actual simulation titration curve gives a p K a value of 3.0. The anal ysis demonstrates that the equilibrium model could represent protonation equilibrium in our simulations. 5. 5 .8 Theoretical NMR T itration C urves Since the model can be used to simplify conformation protonation equilibrium in our constant pH REMD simulati ons, it is interesting to know whether it has some practical meanings. Reproducing experimental titration curves offers us a good objective. So, quantum mechanical calculations of NMR chemical shift ( ) are performed and their results are demonstrated and discussed in this part. As we have shown earlier, the dynamics of Asp119 generates two conformations indicating whether the Asp119 Arg125 electrostatic interaction is on or off Our NMR calculatio ns are based on the representative structures of each conformation, in proper protonation state. Due to the size of HEWL molecule, full quantum mechanical calculations are too expensive. So our first trial is using Asp119 dipeptide. Chemical shifts of the 1d, 2p and 2d are obtained and the fractions of each species at different pH can be calculated using eq. 7, 8 and 10. At each pH value, the theoretical chemical shift used to make a titration curve is calculated as follows: The chem ical shifts of 1d, 2d and 2p are 2.17, 2.48, 3.03 ppm respectively and the theoretical NMR titration curve is plotted in Figure 5 1 8 Compare theoretical titration curve with experimental one, the trend is correctly reproduced. At low pH, the theoretical a nd

PAGE 202

202 experimental chemical shifts agree well: 3.03 ppm versus 3.13 ppm. However, the difference between calculated and experimental high pH chemical shifts is greater than 0.6 ppm. This makes our calculated ( low pH high pH ) is 0.75 ppm while the experimental difference is only 0.21 ppm. Figure 5 1 8 Theoretical NMR chemical shifts as a function of pH. It s plotted to see if the conformation protonation equilibrium model can reproduce experimental titration curve based on NMR chemical shift measurements. The problem at high pH could be that a dipeptide cannot accurately represent Asp119 and its environment especially we have known there is a strong Asp119 Arg125 Coulomb interaction. So a set of QM/MM calcula tions was conducted using the entire HEWL molecule. The new chemical shifts are 2.58, 2.69 and 3.25 ppm for 1d, 2d and 2p. Comparing chemical shifts based on dipeptide and the entire molecule, differences of 2p and 2d are 0.22 ppm and 0.21 ppm. More import antly, both 2p chemical shifts are similar to experimental low pH (each one shows the difference near 0.1 ppm). The differences are small for 2p and 2d because there are no significant interactions for Asp119 in conformation 2. Unlike 2p or 2d, the chemic al shift of 1d is improved by 0.41

PAGE 203

203 ppm, telling that using the whole HEWL molecule does change 1d chemical shift a lot. After applying QM/MM method on the entire HEWL, the calculated ( low pH high pH ) becomes 0.63 ppm. The theoretical titration curve usin g QM/MM technique is also displayed in Figure 5 1 8 But no matter whether a dipeptide or the entire HEWL is used in NMR calculations, the p K a values are around 2.9 as expected. NMR titration curves yield the same p K a value as protonation (deprotonation) fr action vs pH does. The NMR titration curve calculations validate the use of conformation protonation equilibrium model and confirm its applicability. This model can be used to simplify a lot analysis involving further calculations. 5 6 Conclusions In this chapter, constant pH REMD simulations are performed to study the p K a of hen egg white lysozyme Three sets of constant pH REMD simulations have been performed: one set of simulations are conducted without restraining potential, while a harmonic potential i s put on the C atoms in the other two sets of REMD simulations. The force constants of the two harmonic potentials are 1, and 0.1 kcal/mol 2 respectively so that the effect of restraint strength on p K a prediction accuracy can be studied In our constant pH REMD simu lations, the unrestrained ones are foun d to be structurally instable. The C atom RMSD relative to crystal structure can be as high as 18 Due to the effect of restraining potential, HEWL in a restrained simulation is stable and similar to the crystal st ructure, according to the C atom RMSD values. In the restrained simulations with a force constant of 1 kcal/mol 2 accurate p K a predictions are achieved. The overall RMS errors between predicted and experimental p K a values are 0.87 and 0.84, dependent of p K a calculation methods. Unfortunately, those two

PAGE 204

204 RMS errors are not better than constant pH MD results obtained by Mongan et al The advantage of incorporating REMD method is not observed. However, a plot showing RMS error as a function of pH value yield s the smallest RMS error at pH 4.5, at which the crystal structure was resolved. Supported by the work of Mongan et al., we propose that the further away from crystal pH value, the stronger the biasing effect from the restraining potential. The biasing eff ect of conformational sampling will in turn affect p K a predictions. As expected, r educing the strength of harmonic potential result s in improved p K a predictions Likewise, the smallest p K a RMS error of 0.62 is obtained at pH 4.5 in the weakly restrained co nstant pH REMD simulations An RMS error of 0.62 is among the best p K a predictions generat ed from constant pH simulations. The p K a predictions of catalytic ionizable residues are of particular interest in the case of HEWL. Constant pH REMD simulations with stronger restraining potential failed to identify proton donor under the criteria proposed by Nielsen and McCammon in 2003. The weakly restrained constant pH REMD simulations are able to predicted proton donor and nucleophile, although the errors of predi cted p K a values of Glu35 and Asp52 are among the largest in our simulations. Hydrogen bonding is found to be responsible for the large error of Asp52. The hydrogen bonding of Asp52 with Asn44 and Asn59 over stabilize s the deprotonated form of Asp52, causin g the p K a value of Asp52 too small. For Glu35, conformational sampling also plays a role in underestimating its p K a value. However, other factors such as the use of implicit solvent may affect the p K a prediction of Glu35 too. In this work, we also focus ed on conformation and protonation equilibrium in constant pH REMD simulations. Correlatio ns between protonation and side chain

PAGE 205

205 dihedral angles 1 and 2 are studied. Other representation of conformations such as whether an important electrostatic interaction is formed or not is also adopted. In both cases, the coupling between conformation and protonation is observed. The effect of conform ation protonation coupling is partially reflected by the comparison between constant pH and single structure FDPB algorithms. Constant pH REMD yields better p K a values are seen because more conformation space is visited. The conformation protonation equil ibrium is further studied. Equilibrium constants between conformations are derived in order to show how pH affects conformation equilibrium. The conformational equilibrium constant is shown to be pH dependent and it s a sigmoid function of pH values. The s hape of the sigmoid al function is influenced by p K a values of each conformation. Titration curves which are the means to obtain p K a values are also derived from conformation protonation equilibrium. All analytical results are in good agreement with our sim ulations. In addition, we apply this conformation protonation equilibrium to reproduce experimental NMR titration curve by carrying out full QM and QM/MM calculations. First, we showed the importance of protein environment to chemical shift calculations. C alculation using isolated ionizable side chain can only qualitatively reproduce experimental NMR titration curve. The error mainly comes from the high pH end where an isolated side chain assumption fails. After adding protein environment, our theoretical t itration curve is greatly improved and good agreement to experimental result is obtained. Our conformation protonation equilibrium model can be used to represent our simulations and will simplify further calculations.

PAGE 206

206 LIST OF REFERENCES (1) Bettelheim, F. A. Introduction to general, organic, and biochemistry ; 8th ed.; Thomson Brooks/Cole: Belmont, CA, 2007. (2) Dey, A.; Verma, C. S.; Lane, D. P. Br. J. Cancer 2008 98 4 8. (3) Vogelstein, B.; Lane, D.; Levine, A. J. Nature 2000 408 307 310. (4) Mat thew, J. B.; Gurd, F. R. N.; Garciamoreno, E. B.; Flanagan, M. A.; March, K. L.; Shire, S. J. Crc Cr. Rev. Biochem. 1985 18 91 197. (5) Bierzynski, A.; Kim, P. S.; Baldwin, R. L. Proc. Natl. Acad. Sci. U. S. A. 1982 79 2470 2474. (6) Ferguson, N.; Sc hartau, P. J.; Sharpe, T. D.; Sato, S.; Fersht, A. R. J. Mol. Biol. 2004 344 295 301. (7) Shoemaker, K. R.; Kim, P. S.; Brems, D. N.; Marqusee, S.; York, E. J.; Chaiken, I. M.; Stewart, J. M.; Baldwin, R. L. Proc. Natl. Acad. Sci. U. S. A. 1985 82 234 9 2353. (8) Garcia Mira, M. M.; Sadqi, M.; Fischer, N.; Sanchez Ruiz, J. M.; Munoz, V. Science 2002 298 2191 2195. (9) Hunenberger, P. H.; Helms, V.; Narayana, N.; Taylor, S. S.; McCammon, J. A. Biochemistry 1999 38 2358 2366. (10) Demchuk, E.; Geni ck, U. K.; Woo, T. T.; Getzoff, E. D.; Bashford, D. Biochemistry 2000 39 1100 1113. (11) Dillet, V.; Dyson, H. J.; Bashford, D. Biochemistry 1998 37 10298 10306. (12) Harris, T. K.; Turner, G. J. IUBMB Life 2002 53 85 98. (13) Laidler, K. J. Chemi cal kinetics ; 3rd ed.; Harper & Row: New York, 1987. (14) Fersht, A. Structure and mechanism in protein science : a guide to enzyme catalysis and protein folding ; W.H. Freeman: New York, 1999. (15) Simonson, T.; Carlsson, J.; Case, D. A. J. Am. Chem. Soc 2004 126 4167 4180. (16) Lee, A. C.; Crippen, G. M. J. Chem. Inf. Model. 2009 49 2013 2033. (17) Langsetmo, K.; Fuchs, J. A.; Woodward, C. Biochemistry 1991 30 7603 7609.

PAGE 207

207 (18) Garcia Moreno, B.; Dwyer, J. J.; Gittis, A. G.; Lattman, E. E.; Spenc er, D. S.; Stites, W. E. Biophys. Chem. 1997 64 211 224. (19) Garcia Moreno, B.; Fitch, C.; Karp, D.; Gittis, A.; Lattman, E. Biophys. J. 2002 82 300a 300a. (20) Tanford, C. Adv. Protein Chem. 1962 17 69 165. (21) Dwyer, J. J.; Gittis, A. G.; Karp D. A.; Lattman, E. E.; Spencer, D. S.; Stites, W. E.; Garcia Moreno, B. Biophys. J. 2000 79 1610 1620. (22) Harms, M. J.; Castaneda, C. A.; Schlessman, J. L.; Sue, G. R.; Isom, D. G.; Cannon, B. R.; Garcia Moreno, B. J. Mol. Biol. 2009 389 34 47. ( 23) Mehler, E. L.; Fuxreiter, M.; Simon, I.; Garcia Moreno, E. B. Proteins: Struct., Funct., Genet. 2002 48 283 292. (24) Anderson, D. E.; Becktel, W. J.; Dahlquist, F. W. Biochemistry 1990 29 2403 2408. (25) Dyson, H. J.; Jeng, M. F.; Tennant, L. L. ; Slaby, I.; Lindell, M.; Cui, D. S.; Kuprin, S.; Holmgren, A. Biochemistry 1997 36 2622 2636. (26) Bashford, D.; Case, D. A.; Dalvit, C.; Tennant, L.; Wright, P. E. Biochemistry 1993 32 8045 8056. (27) Wang, Y. X.; Freedberg, D. I.; Yamazaki, T.; Wi ngfield, P. T.; Stahl, S. J.; Kaufman, J. D.; Kiso, Y.; Torchia, D. A. Biochemistry 1996 35 9945 9950. (28) Dyson, H. J.; Tennant, L. L.; Holmgren, A. Biochemistry 1991 30 4262 4268. (29) Jeng, M. F.; Dyson, H. J. Biochemistry 1996 35 1 6. (30) Wi lson, N. A.; Barbar, E.; Fuchs, J. A.; Woodward, C. Biochemistry 1995 34 8931 8939. (31) Callis, P. R. Methods Enzymol. 1997 278 113 150. (32) Callis, P. R.; Burgess, B. K. J. Phys. Chem. B 1997 101 9429 9432. (33) Vivian, J. T.; Callis, P. R. Bio phys. J. 2001 80 2093 2109. (34) Inoue, M.; Yamada, H.; Yasukochi, T.; Kuroki, R.; Miki, T.; Horiuchi, T.; Imoto, T. Biochemistry 1992 31 5545 5553.

PAGE 208

208 (35) Kajander, T.; Kahn, P. C.; Passila, S. H.; Cohen, D. C.; Lehtio, L.; Adolfsen, W.; Warwicker, J. ; Schell, U.; Goldman, A. Structure 2000 8 1203 1214. (36) Bartlett, G. J.; Porter, C. T.; Borkakoti, N.; Thornton, J. M. J. Mol. Biol. 2002 324 105 121. (37) Jiang, Y. X.; Ruta, V.; Chen, J. Y.; Lee, A.; MacKinnon, R. Nature 2003 423 42 48. (38) Luecke, H.; Richter, H. T.; Lanyi, J. K. Science 1998 280 1934 1937. (39) Bashford, D.; Case, D. A. Annu. Rev. Phys. Chem. 2000 51 129 152. (40) Still, W. C.; Tempczyk, A.; Hawley, R. C.; Hendrickson, T. J. Am. Chem. Soc. 1990 112 6127 6129. (41) Cramer, C. J. Essentials of computational chemistry : theories and models ; J. Wiley: West Sussex, England ; New York, 2002. (42) Raha, K.; Merz, K. M. In Annual reports in computational chemistry ; Spellmeyer, D. C., Ed.; Elsevier: Amsterdam ; Boston, 2005 ; Vol. 1, p p113 130. (43) Dixon, S. L.; Merz, K. M. J. Chem. Phys. 1996 104 6643 6649. (44) Vreven, T.; Morokuma, K. In Annual Reports in Computational Chemistry ; Spellmeyer, D., Ed.; Elsevier: Amsterdam ; Boston, 2006; Vol. 2, p p35 51. (45) Field, M. J.; Bash, P. A.; Karplus, M. J. Comput. Chem. 1990 11 700 733. (46) Singh, U. C.; Kollman, P. A. J. Comput. Chem. 1986 7 718 730. (47) Warshel, A.; Levitt, M. J. Mol. Biol. 1976 103 227 249. (48) Kamerlin, S. C. L.; Haranczyk, M.; Warshel, A. J Phys. Chem. B 2009 113 1253 1272. (49) Monard, G.; Merz, K. M. Acc. Chem. Res. 1999 32 904 911. (50) Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. J. Chem. Phys. 1953 21 1087 1092. (51) Wolynes, P. G.; Onuchic, J. N.; Thirumalai, D. Science 1995 267 1619 1620. (52) Itoh, S. G.; Okumura, H.; Okamoto, Y. Mol. Simul. 2007 33 47 56. (53) Mitsutake, A.; Sugita, Y.; Okamoto, Y. Biopolymers 2001 60 96 123.

PAGE 209

209 (54) Berg, B. A.; Neuhaus, T. Phys. Lett. B 1991 267 2 49 253. (55) Berg, B. A.; Neuhaus, T. Phys. Rev. Lett. 1992 68 9 12. (56) Lyubartsev, A. P.; Martsinovski, A. A.; Shevkunov, S. V.; Vorontsovvelyaminov, P. N. J. Chem. Phys. 1992 96 1776 1783. (57) Marinari, E.; Parisi, G. Europhys. Lett. 1992 19 451 458. (58) Hansmann, U. H. E. Chem. Phys. Lett. 1997 281 140 150. (59) Swendsen, R. H.; Wang, J. S. Phys. Rev. Lett. 1986 57 2607 2609. (60) Earl, D. J.; Deem, M. W. Phys. Chem. Chem. Phys. 2005 7 3910 3916. (61) Fukunishi, H.; Watanabe, O.; T akada, S. J. Chem. Phys. 2002 116 9058 9067. (62) Sugita, Y.; Okamoto, Y. Chem. Phys. Lett. 1999 314 141 151. (63) Tanford, C.; Kirkwood, J. G. J. Am. Chem. Soc. 1957 79 5333 5339. (64) Tanford, C.; Roxby, R. Biochemistry 1972 11 2192 2198. (65 ) Bashford, D.; Karplus, M. Biochemistry 1990 29 10219 10225. (66) Gilson, M. K. Proteins: Struct., Funct., Genet. 1993 15 266 282. (67) Antosiewicz, J.; Mccammon, J. A.; Gilson, M. K. J. Mol. Biol. 1994 238 415 436. (68) Antosiewicz, J.; McCammon J. A.; Gilson, M. K. Biochemistry 1996 35 7819 7833. (69) Bashford, D.; Karplus, M. J. Phys. Chem. 1991 95 9556 9561. (70) Yang, A. S.; Gunner, M. R.; Sampogna, R.; Sharp, K.; Honig, B. Proteins: Struct., Funct., Genet. 1993 15 252 265. (71) Yan g, A. S.; Honig, B. J. Mol. Biol. 1993 231 459 474. (72) Madura, J. D.; Briggs, J. M.; Wade, R. C.; Davis, M. E.; Luty, B. A.; Ilin, A.; Antosiewicz, J.; Gilson, M. K.; Bagheri, B.; Scott, L. R.; Mccammon, J. A. Comput. Phys. Commun. 1995 91 57 95. ( 73) Nicholls, A.; Honig, B. J. Comput. Chem. 1991 12 435 445.

PAGE 210

210 (74) Beroza, P.; Fredkin, D. R.; Okamura, M. Y.; Feher, G. Proc. Natl. Acad. Sci. U. S. A. 1991 88 5804 5808. (75) Bone, S.; Pethig, R. J. Mol. Biol. 1985 181 323 326. (76) Harvey, S. C .; Hoekstra, P. J. Phys. Chem. 1972 76 2987 &. (77) Garcia Moreno, B.; Fitch, C. A. Methods Enzymol. 2004 380 20 51. (78) Simonson, T.; Brooks, C. L. J. Am. Chem. Soc. 1996 118 8452 8458. (79) Mehler, E. L.; Eichele, G. Biochemistry 1984 23 3887 3891. (80) Mehler, E. L.; Guarnieri, F. Biophys. J. 1999 77 3 22. (81) Alexov, E. G.; Gunner, M. R. Biophys. J. 1997 72 2075 2093. (82) Barth, P.; Alber, T.; Harbury, P. B. Proc. Natl. Acad. Sci. U. S. A. 2007 104 4898 4903. (83) Georgescu, R. E .; Alexov, E. G.; Gunner, M. R. Biophys. J. 2002 83 1731 1748. (84) Gunner, M. R.; Alexov, E.; Torres, E.; Lipovaca, S. J. Biol. Inorg. Chem. 1997 2 126 134. (85) Livesay, D. R.; Jacobs, D. J.; Kanjanapangka, J.; Chea, E.; Cortez, H.; Garcia, J.; Kid d, P.; Marquez, M. P.; Pande, S.; Yang, D. J. Chem. Theory Comput. 2006 2 927 938. (86) You, T. J.; Bashford, D. Biophys. J. 1995 69 1721 1733. (87) Kollman, P. Chem. Rev. 1993 93 2395 2417. (88) Straatsma, T. P.; Mccammon, J. A. Annu. Rev. Phys. Chem. 1992 43 407 435. (89) Warshel, A.; Sussman, F.; King, G. Biochemistry 1986 25 8368 8372. (90) Russell, S. T.; Warshel, A. J. Mol. Biol. 1985 185 389 404. (91) Jorgensen, W. L.; Briggs, J. M. J. Am. Chem. Soc. 1989 111 4190 4197. (92) Merz K. M. J. Am. Chem. Soc. 1991 113 3572 3575. (93) Hu, H.; Yang, W. T. Annu. Rev. Phys. Chem. 2008 59 573 601. (94) Li, G. H.; Zhang, X. D.; Cui, Q. J. Phys. Chem. B 2003 107 8643 8653.

PAGE 211

211 (95) Riccardi, D.; Schaefer, P.; Cui, Q. J. Phys. Chem. B 2005 109 17715 17733. (96) Bas, D. C.; Rogers, D. M.; Jensen, J. H. Proteins: Struct., Funct., Bioinf. 2008 73 765 783. (97) Jensen, J. H.; Li, H.; Robertson, A. D.; Molina, P. A. J. Phys. Chem. A 2005 109 6634 6643. (98) Li, H.; Hains, A. W.; Everts, J. E.; Robertson, A. D.; Jensen, J. H. J. Phys. Chem. B 2002 106 3486 3494. (99) Li, H.; Robertson, A. D.; Jensen, J. H. Proteins: Struct., Funct., Bioinf. 2004 55 689 704. (100) Li, H.; Robertson, A. D.; Jensen, J. H. Proteins: Struct., Funct., Bio inf. 2005 61 704 721. (101) Minikis, R. M.; Kairys, V.; Jensen, J. H. J. Phys. Chem. A 2001 105 3829 3837. (102) Day, P. N.; Jensen, J. H.; Gordon, M. S.; Webb, S. P.; Stevens, W. J.; Krauss, M.; Garmer, D.; Basch, H.; Cohen, D. J. Chem. Phys. 1996 105 1968 1986. (103) Gordon, M. S.; Freitag, M. A.; Bandyopadhyay, P.; Jensen, J. H.; Kairys, V.; Stevens, W. J. J. Phys. Chem. A 2001 105 293 307. (104) Mongan, J.; Case, D. A. Curr. Opin. Struct. Biol. 2005 15 157 163. (105) Baptista, A. M. J. Ch em. Phys. 2002 116 7766 7768. (106) Baptista, A. M.; Martel, P. J.; Petersen, S. B. Proteins: Struct., Funct., Genet. 1997 27 523 544. (107) Borjesson, U.; Hunenberger, P. H. J. Chem. Phys. 2001 114 9706 9719. (108) Borjesson, U.; Hunenberger, P. H. J. Phys. Chem. B 2004 108 13551 13559. (109) Khandogin, J.; Brooks, C. L. Biophys. J. 2005 89 141 157. (110) Khandogin, J.; Brooks, C. L. Biochemistry 2006 45 9363 9373. (111) Khandogin, J.; Brooks, C. L. Proc. Natl. Acad. Sci. U. S. A. 2007 1 04 16880 16885. (112) Khandogin, J.; Chen, J. H.; Brooks, C. L. Proc. Natl. Acad. Sci. U. S. A. 2006 103 18546 18550.

PAGE 212

212 (113) Khandogin, J.; Raleigh, D. P.; Brooks, C. L. J. Am. Chem. Soc. 2007 129 3056 3057. (114) Lee, M. S.; Salsbury, F. R.; Brooks, C. L. Proteins: Struct., Funct., Bioinf. 2004 56 738 752. (115) Mertz, J. E.; Pettitt, B. M. Int. J. Supercomp. Appl. 1994 8 47 53. (116) Kong, X. J.; Brooks, C. L. J. Chem. Phys. 1996 105 2414 2423. (117) Chen, J. H.; Brooks, C. L.; Khandogin, J Curr. Opin. Struct. Biol. 2008 18 140 148. (118) Baptista, A. M.; Teixeira, V. H.; Soares, C. M. J. Chem. Phys. 2002 117 4184 4200. (119) Dlugosz, M.; Antosiewicz, J. M. Chem. Phys. 2004 302 161 170. (120) Dlugosz, M.; Antosiewicz, J. M. J. Phys Chem. B 2005 109 13777 13784. (121) Dlugosz, M.; Antosiewicz, J. M. J. Phys.: Condens. Matter 2005 17 S1607 S1616. (122) Dlugosz, M.; Antosiewicz, J. M.; Robertson, A. D. Phys. Rev. E 2004 69 021915. (123) Machuqueiro, M.; Baptista, A. M. J. Phy s. Chem. B 2006 110 2927 2933. (124) Machuqueiro, M.; Baptista, A. M. Biophys. J. 2007 92 1836 1845. (125) Machuqueiro, M.; Baptista, A. M. Proteins: Struct., Funct., Bioinf. 2008 72 289 298. (126) Machuqueiro, M.; Baptista, A. M. J. Am. Chem. Soc 2009 131 12586 12594. (127) Mongan, J.; Case, D. A.; McCammon, J. A. J. Comput. Chem. 2004 25 2038 2048. (128) Walczak, A. M.; Antosiewicz, J. M. Phys. Rev. E 2002 66 051911. (129) Williams, S. L.; de Oliveira, C. A. F.; McCammon, J. A. J. Chem. Theory Comput. 2010 6 560 568. (130) Burgi, R.; Kollman, P. A.; van Gunsteren, W. F. Proteins: Struct., Funct., Genet. 2002 47 469 480.

PAGE 213

213 (131) Meng, Y. L.; Roitberg, A. E. J. Chem. Theory Comput. 2010 6 1401 1412. (132) Schaefer, M.; Karplus, M. J Phys. Chem. 1996 100 1578 1599. (133) Hamelberg, D.; Mongan, J.; McCammon, J. A. J. Chem. Phys. 2004 120 11919 11929. (134) Hamelberg, D.; Mongan, J.; McCammon, J. A. Protein Sci. 2004 13 76 76. (135) Ponder, J. W.; Case, D. A. Adv. Protein Chem 2003 66 27 85. (136) Allinger, N. L.; Yuh, Y. H.; Lii, J. H. J. Am. Chem. Soc. 1989 111 8551 8566. (137) Leach, A. R. Molecular modelling : principles and applications ; 2nd ed.; Prentice Hall: Harlow, England ; New York, 2001. (138) MacKerell, A. D. In Annual reports in computational chemistry Spellmeyer, D. C., Ed.; Elsevier: Amsterdam ; Boston, 2005; Vol. 1, p p91~102. (139) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Proteins: Struct., Funct., Bioinf. 2006 65 712 725. (140) MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz Kuczera, J.; Yin, D.; Karplus, M. J. Phys. Chem. B 1998 102 3586 3616. (141) Daura, X.; Mark, A. E.; van Gunsteren, W. F. J. Comput Chem. 1998 19 535 547. (142) Jorgensen, W. L.; Tirado Rives, J. J. Am. Chem. Soc. 1988 110 1657 1666. (143) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J. Am. Chem. Soc. 1995 117 5179 5197. (144) Verlet, L. Phys. Rev. 1967 159 98. (145) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. J. Comput. Phys. 1977 23 327 341. (146) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; Din ola, A.; Haak, J. R. J. Chem. Phys. 1984 81 3684 3690.

PAGE 214

214 (147) McQuarrie, D. A. Statistical thermodynamics ; University Science Books: Mill Valley, Calif., 1973. (148) Nose, S. J. Chem. Phys. 1984 81 511 519. (149) Berendsen, H. J. C.; Grigera, J. R.; S traatsma, T. P. J. Phys. Chem. 1987 91 6269 6271. (150) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. J. Chem. Phys. 1983 79 926 935. (151) Mahoney, M. W.; Jorgensen, W. L. J. Chem. Phys. 2000 112 8910 8922. (152) Allen, M. P.; Tildesley, D. J. Computer simulation of liquids ; Clarendon Press ; Oxford University Press: Oxford [England] New York, 1987. (153) Ewald, P. P. Annalen Der Physik 1921 64 253 287. (154) Darden, T.; York, D.; Pedersen, L. J. Chem. Phys. 19 93 98 10089 10092. (155) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Chem. Phys. Lett. 1995 246 122 129. (156) Kirkwood, J. G. J. Chem. Phys. 1935 3 300 313. (157) Straatsma, T. P.; Mccammon, J. A. J. Chem. Phys. 1991 95 1175 1188. (158) Zwan zig, R. W. J. Chem. Phys. 1954 22 1420 1426. (159) Bennett, C. H. J. Comput. Phys. 1976 22 245 268. (160) Shirts, M. R.; Chodera, J. D. J. Chem. Phys. 2008 129 124105. (161) Jorgensen, W. L.; Ravimohan, C. J. Chem. Phys. 1985 83 3050 3054. (162 ) Hansmann, U. H. E.; Okamoto, Y. Nucl. Phys. B 1995 914 916. (163) Wang, F. G.; Landau, D. P. Phys. Rev. E 2001 64 056101. (164) Wang, F. G.; Landau, D. P. Phys. Rev. Lett. 2001 86 2050 2053. (165) Falcioni, M.; Deem, M. W. J. Chem. Phys. 1999 11 0 1754 1766. (166) Kofke, D. A. J. Chem. Phys. 2002 117 6911 6914.

PAGE 215

215 (167) Liu, P.; Kim, B.; Friesner, R. A.; Berne, B. J. Proc. Natl. Acad. Sci. U. S. A. 2005 102 13749 13754. (168) Li, H. Z.; Li, G. H.; Berg, B. A.; Yang, W. J. Chem. Phys. 2006 12 5 144902. (169) Okur, A.; Roe, D. R.; Cui, G. L.; Hornak, V.; Simmerling, C. J. Chem. Theory Comput. 2007 3 557 568. (170) Roitberg, A. E.; Okur, A.; Simmerling, C. J. Phys. Chem. B 2007 111 2415 2418. (171) Rathore, N.; Chopra, M.; de Pablo, J. J. J. Chem. Phys. 2005 122 024111. (172) Sanbonmatsu, K. Y.; Garcia, A. E. Proteins: Struct., Funct., Genet. 2002 46 225 234. (173) Kone, A.; Kofke, D. A. J. Chem. Phys. 2005 122 206101. (174) Trebst, S.; Troyer, M.; Hansmann, U. H. E. J. Chem. Phys 2006 124 174903. (175) Nadler, W.; Hansmann, U. H. E. Phys. Rev. E 2007 76 065701. (176) Nadler, W.; Hansmann, U. H. E. Phys. Rev. E 2007 75 026109. (177) Nadler, W.; Hansmann, U. H. E. J. Phys. Chem. B 2008 112 10386 10387. (178) Opps, S. B. ; Schofield, J. Phys. Rev. E 2001 6305 056701. (179) Zhang, W.; Wu, C.; Duan, Y. J. Chem. Phys. 2005 123 154105. (180) Sindhikara, D.; Meng, Y. L.; Roitberg, A. E. J. Chem. Phys. 2008 128 024103. (181) Abraham, M. J.; Gready, J. E. J. Chem. Theory Comput. 2008 4 1119 1128. (182) Zhang, C.; Ma, J. P. J. Chem. Phys. 2008 129 134112. (183) Rosta, E.; Buchete, N. V.; Hummer, G. J. Chem. Theory Comput. 2009 5 1393 1399. (184) Zhou, R. H.; Berne, B. J.; Germain, R. Proc. Natl. Acad. Sci. U. S. A 2001 98 14931 14936. (185) Lyman, E.; Ytreberg, F. M.; Zuckerman, D. M. Phys. Rev. Lett. 2006 96 028105. (186) Liu, P.; Shi, Q.; Lyman, E.; Voth, G. A. J. Chem. Phys. 2008 129 114103.

PAGE 216

216 (187) Liu, P.; Voth, G. A. J. Chem. Phys. 2007 126 045106. (188) Okur, A.; Wickstrom, L.; Layten, M.; Geney, R.; Song, K.; Hornak, V.; Simmerling, C. J. Chem. Theory Comput. 2006 2 420 433. (189) Ballard, A. J.; Jarzynski, C. Proc. Natl. Acad. Sci. U. S. A. 2009 106 12224 12229. (190) Kamberaj, H.; van der Vaart, A. J. Chem. Phys. 2009 130 074906. (191) Nguyen, P. H. J. Chem. Phys. 2010 132 144109. (192) Sugita, Y.; Okamoto, Y. Chem. Phys. Lett. 2000 329 261 270. (193) Mitsutake, A.; Okamoto, Y. Chem. Phys. Lett. 2000 33 2 131 138. (194) Mitsutake, A.; Okamoto, Y. J. Chem. Phys. 2004 121 2491 2504. (195) Andrec, M.; Felts, A. K.; Gallicchio, E.; Levy, R. M. Proc. Natl. Acad. Sci. U. S. A. 2005 102 6801 6806. (196) van der Spoel, D.; Seibert, M. M. Phys. Rev. Lett. 2006 96 238102. (197) Yang, S. C.; Onuchic, J. N.; Garcia, A. E.; Levine, H. J. Mol. Biol. 2007 372 756 763. (198) Buchete, N. V.; Hummer, G. Phys. Rev. E 2008 77 030902. (199) Case, D. A.; Darden, T. A.; T.E. Cheatham, I.; Simmerling, C. L.; Wang J.; Duke, R. E.; Luo, R.; Crowley, M.; Walker, R. C.; Zhang, W.; Merz, K. M.; B.Wang; Hayik, S.; Roitberg, A.; Seabra, G.; Kolossvry, I.; K.F.Wong; Paesani, F.; Vanicek, J.; X.Wu; Brozell, S. R.; Steinbrecher, T.; Gohlke, H.; Yang, L.; Tan, C.; Mongan, J.; Hornak, V.; Cui, G.; Mathews, D. H.; Seetin, M. G.; Sagui, C.; Babin, V.; Kollman, P. A.; University of California, San Francisco: San Francisco, 2008. (200) Onufriev, A.; Bashford, D.; Case, D. A. J. Phys. Chem. B 2000 104 3712 3720. (201) Elber, R.; Roitberg, A.; Simmerling, C.; Goldstein, R.; Li, H. Y.; Verkhivker, G.; Keasar, C.; Zhang, J.; Ulitsky, A. Comput. Phys. Commun. 1995 91 159 189. (202) Dill, K. A.; Ozkan, S. B.; Shell, M. S.; Weikl, T. R. Annu. Rev. Biophys. 2008 37 289 316. (20 3) Dobson, C. M. Nature 2003 426 884 890.

PAGE 217

217 (204) Anfinsen, C. B.; Haber, E.; Sela, M.; White, F. H. Proc. Natl. Acad. Sci. U. S. A. 1961 47 1309 1314. (205) Mayor, U.; Johnson, C. M.; Daggett, V.; Fersht, A. R. Proc. Natl. Acad. Sci. U. S. A. 2000 97 13518 13522. (206) Snow, C. D.; Nguyen, N.; Pande, V. S.; Gruebele, M. Nature 2002 420 102 106. (207) Brooks, C. L. Acc. Chem. Res. 2002 35 447 454. (208) Levinthal, C. J. Chim. Phys. Phys. Chim. Biol. 1968 65 44 45. (209) Gruebele, M. Annu. Re v. Phys. Chem. 1999 50 485 516. (210) Kubelka, J.; Hofrichter, J.; Eaton, W. A. Curr. Opin. Struct. Biol. 2004 14 76 88. (211) Snow, C. D.; Sorin, E. J.; Rhee, Y. M.; Pande, V. S. Annu. Rev. Biophys. Biomol. Struct. 2005 34 43 69. (212) Snow, C. D .; Qiu, L. L.; Du, D. G.; Gai, F.; Hagen, S. J.; Pande, V. S. Proc. Natl. Acad. Sci. U. S. A. 2004 101 4077 4082. (213) Zagrovic, B.; Sorin, E. J.; Pande, V. J. Mol. Biol. 2001 313 151 169. (214) Jayachandran, G.; Vishal, V.; Pande, V. S. J. Chem. Ph ys. 2006 124 054118. (215) Singhal, N.; Snow, C. D.; Pande, V. S. J. Chem. Phys. 2004 121 415 425. (216) Swope, W. C.; Pitera, J. W.; Suits, F. J. Phys. Chem. B 2004 108 6571 6581. (217) Swope, W. C.; Pitera, J. W.; Suits, F.; Pitman, M.; Elefther iou, M.; Fitch, B. G.; Germain, R. S.; Rayshubski, A.; Ward, T. J. C.; Zhestkov, Y.; Zhou, R. J. Phys. Chem. B 2004 108 6582 6594. (218) Daggett, V.; Levitt, M. J. Mol. Biol. 1993 232 600 619. (219) Daggett, V.; Levitt, M. J. Cell. Biochem. 1993 223 223. (220) Daggett, V.; Levitt, M. Curr. Opin. Struct. Biol. 1994 4 291 295. (221) Dadlez, M.; Bierzynski, A.; Godzik, A.; Sobocinska, M.; Kupryszewski, G. Biophys. Chem. 1988 31 175 181. (222) Baldwin, R. L. Biophys. Chem. 1995 55 127 135. (223 ) Brown, J. E.; Klee, W. A. Biochemistry 1971 10 470 476.

PAGE 218

218 (224) Fairman, R.; Shoemaker, K. R.; York, E. J.; Stewart, J. M.; Baldwin, R. L. Biophys. Chem. 1990 37 107 119. (225) Osterhout, J. J.; Baldwin, R. L.; York, E. J.; Stewart, J. M.; Dyson, H. J .; Wright, P. E. Biochemistry 1989 28 7059 7064. (226) Shoemaker, K. R.; Fairman, R.; Schultz, D. A.; Robertson, A. D.; York, E. J.; Stewart, J. M.; Baldwin, R. L. Biopolymers 1990 29 1 11. (227) Felts, A. K.; Harano, Y.; Gallicchio, E.; Levy, R. M. Proteins: Struct., Funct., Bioinf. 2004 56 310 321. (228) Hansmann, U. H. E.; Okamoto, Y. J. Phys. Chem. B 1998 102 653 656. (229) Hansmann, U. H. E.; Okamoto, Y. J. Phys. Chem. B 1999 103 1595 1604. (230) La Penna, G.; Mitsutake, A.; Masuya, M.; Okamoto, Y. Chem. Phys. Lett. 2003 380 609 619. (231) Ohkubo, Y. Z.; Brooks, C. L. Proc. Natl. Acad. Sci. U. S. A. 2003 100 13916 13921. (232) Schaefer, M.; Bartels, C.; Karplus, M. J. Mol. Biol. 1998 284 835 848. (233) Sugita, Y.; Okamoto, Y. Bio phys. J. 2005 88 3180 3190. (234) Yoda, T.; Sugita, Y.; Okamoto, Y. Chem. Phys. 2004 307 269 283. (235) Yoda, T.; Sugita, Y.; Okamoto, Y. Chem. Phys. Lett. 2004 386 460 467. (236) Kabsch, W.; Sander, C. Biopolymers 1983 22 2577 2637. (237) John son, W. C. Annu. Rev. Biophys. Biophys. Chem. 1988 17 145 166. (238) Sreerama, N.; Woody, R. W. Methods Enzymol. 2004 383 318 351. (239) Gratzer, W. B.; Doty, P.; Holzwarth, G. M. Proc. Natl. Acad. Sci. U. S. A. 1961 47 1785 1791. (240) Manning, M C.; Illangasekare, M.; Woody, R. W. Biophys. Chem. 1988 31 77 86. (241) Bayley, P. M.; Nielsen, E. B.; Schellma.Ja J. Phys. Chem. 1969 73 228 243. (242) Clark, L. B. J. Am. Chem. Soc. 1995 117 7974 7986.

PAGE 219

219 (243) Hirst, J. D. J. Chem. Phys. 1998 1 09 782 788. (244) Woody, R. W.; Sreerama, N. J. Chem. Phys. 1999 111 2844 2845. (245) Goux, W. J.; Hooker, T. M. J. Am. Chem. Soc. 1980 102 7080 7087. (246) Ridley, J.; Zerner, M. Theor. Chim. Acta 1973 32 111 134. (247) Wlodawer, A.; Svensson, L. A.; Sjolin, L.; Gilliland, G. L. Biochemistry 1988 27 2705 2717. (248) Blake, C. C. F.; Koenig, D. F.; Mair, G. A.; North, A. C. T.; Phillips, D. C.; Sarma, V. R. Nature 1965 206 757 761. (249) Vocadlo, D. J.; Davies, G. J.; Laine, R.; Withers, S. G. Nature 2001 412 835 838. (250) Nielsen, J. E.; McCammon, J. A. Protein Sci. 2003 12 313 326. (251) Bartik, K.; Redfield, C.; Dobson, C. M. Biophys. J. 1994 66 1180 1184. (252) Tironi, I. G.; Sperb, R.; Smith, P. E.; Vangunsteren, W. F. J. Chem Phys. 1995 102 5451 5459. (253) Case, D. A.; Darden, T. A.; T.E. Cheatham, I.; Simmerling, C. L.; Wang, J.; Duke, R. E.; R.Luo; Merz, K. M.; Pearlman, D. A.; Crowley, M.; Walker, R. C.; Zhang, W.; Wang, B.; S.Hayik; Roitberg, A.; Seabra, G.; Wong, K. F.; Paesani, F.; Wu, X.; Brozell, S.; Tsui, V.; H.Gohlke; Yang, L.; Tan, C.; Mongan, J.; Hornak, V.; Cui, G.; Beroza, P.; Mathew, D. H.; C.Schafmeister; Ross, W. S.; Kollman, P. A.; University of California, San Francisco: San Francisco 2006. (254) Frisc h, M. J. T., G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Montgomery, Jr., J. A.; Vreven, T.; Kudin, K. N.; Burant, J. C.; Millam, J. M.; Iyengar, S. S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.; Scalmani, G.; Rega, N.; Pe tersson, G. A.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Klene, M.; Li, X.; Knox, J. E.; Hratchian, H. P.; Cross, J. B.; Bakken, V.; Adamo, C.; Jaramillo, J.; Gompe rts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Ayala, P. Y.; Morokuma, K.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Zakrzewski, V. G.; Dapprich, S.; Daniels, A. D.; Strain, M. C.; Farkas, O.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Ortiz, J. V.; Cui, Q.; Baboul, A. G.; Clifford, S.; Cioslowski, J.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Martin, R. L.; Fox, D. J.; Keith, T.; Al Laham, M. A.; Peng, C. Y .; Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M. W.; Gonzalez, C.; and Pople, J. A.; Gaussian, Inc.: Wallingford CT, 2004.

PAGE 220

220 (255) Ditchfie.R Mol. Phys. 1974 27 789 807. (256) He, X.; Wang, B.; Merz, K. M. J. Phys. Chem B 2009 113 10380 10388. (257) Anandakrishnan, R.; Onufriev, A. J. Comput. Biol. 2008 15 165 184. (258) Gordon, J. C.; Myers, J. B.; Folta, T.; Shoja, V.; Heath, L. S.; Onufriev, A. Nucleic Acids Res. 2005 33 368 371.

PAGE 221

221 BIOGRAPHICAL SKE TCH the Dalian University of Technology at Dalian, Liaoning Province and studied chemical his college, Yilin has developed an interest in the computational chemistry, especially the In August 2004, Yilin came to the University of Florida and began his life as a graduat e student. His original plan was to keep studying the electronic structure theory. However, he was impressed by the research of Dr. Adrian E. Roitberg. Later, he joined the Roitberg group and started his career in the molecular modeling.