Structural Characterization of Macromolecular Assemblages

Permanent Link: http://ufdc.ufl.edu/UFE0013110/00001

Material Information

Title: Structural Characterization of Macromolecular Assemblages
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0013110:00001

Full Text







Copyright 2005


Robbie J. Reutzel

To Jeanette Chun and Anne Lagos.


My deepest thanks and respect go to Dr. Robert McKenna. Without his guidance,

insight and patience throughout my young career in science, this dissertation would not

have been possible. Dr. McKenna has not only taught me much technical skill, but he

has also been an excellent role model to me. I would also like to extend my deepest

gratitude to Dr. Mavis McKenna, whom I've worked with and learned a great deal from

on many aspects of my project during my time in the McKenna lab. I would also like to

thank the members of my supervisory committee, Drs. Ben Dunn, Brian Cain, Stephen

Hagen, Michael Bubb and Nicole Horenstein for useful advice and assistance with my

scientific work.

I would never have been able to make it where I am today without the advice and

example of my parents Robert and Connie, my brother Daniel and my sister Meghan. No

matter how much stress I am under, their calming voices have always lent the support

and love needed to get through the most difficult times. All I can say is thank you from

the bottom of my heart.

I would like to thank the past and present members of the McKenna lab, especially

Drs. Lakshmanan Govindasamy, Hyun-Joo Nam, and David Duda. Nathan and Holly

Bryant have always been there for me, and I thank them for their support and friendship.

I would like to thank Zoe Fisher for her friendship and for her technical expertise in the

laboratory. I would also like to thank Craig Yoshioka for being a roommate as well as a

workmate and a good friend. I would like to thank the students I have mentored,

especially Rose Mikulski and Sarah Shah, for teaching me as much as I taught them. I

would also like to thank everyone in the Biochemistry Department, especially Pat Jones

and Brad Moore for keeping my schedule in order for the last five years.


ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT


1 INTRODUCTION

Molecular Basis of Disease
Survey of Macromolecular Interactions
Large, Symmetric Oligomers Built From Identical Subunits
Genome Coding Efficiency
Error Control
Regulation
Structure of Viruses
Quasi-equivalence
Spontaneous Biological Self-Assembly
Equilibrium Assembly
Nucleation-Elongation Assembly

2 OVERVIEW OF EXPERIMENTAL TECHNIQUES

Macromolecular Structure Determination by X-ray Crystallography
Crystal Preparation
Crystal Twinning
Diffraction Data Collection and Data Processing
Cryocrystallography
Derivatization of Macromolecular Crystals for Heavy Atom Structure Determination
Structure Determination by Isomorphous Replacement, Anomalous Dispersion and Molecular Replacement
Model Refinement, Validation and Interpretation
Cryo-Electron Microscopy Structure Determination
Sample Preparation
Data Collection and Digitization
Divide and Conquer: The Marriage of X-Ray Crystallography and Cryo-Electron Microscopy

MEDIATED BY AN ANTI-PARALLEL G-ACTIN DIMER

Introduction
Materials and Methods
Purification and Crystal Preparation
Data Collection and Processing
Structure Determination and Refinement
Structure Analysis
Results
Space-group Assignment
Structure Solution
Crystal Lattice Rearrangements
Structure of G-actin
Crystal Contacts
Latrunculin, ATP, and Divalent Cation Interactions
Discussion
Crystal Lattice Reordering
Ligand and Prosthetic Group Binding
Implications of the Anti-parallel Dimer on Meshwork Formation and Branching
"Sidetrack" Polymerization Mechanism Involving the Anti-parallel Dimer

ASSEMBLY

Introduction
Materials and Methods
Purification and Sample Preparation
Results
Capsid Structures of Single and Geminate Particles
Discussion

AND BIOLOGICAL SELF-ASSEMBLY

Introduction
Directed Assembly: Small Spherical Viruses
F-Actin Assembly: Nucleation-Elongation and "Sidetrack" Polymerization
Materials and Methods
Calculation of Buried Surface Area, Association Energy and Solvation Energy
Results
Calculation of Buried Surface Area, Association Energy and Solvation Energy
Correlation of Uncoating With Buried Surface Area in AAVs
Prediction of Dynamic and Directed Assembly Systems
Discussion
Structural Intermediates of MSV Single and Geminate Capsids
Structural Intermediates of F-actin
Biophysical Basis for AAV Transduction Efficiency

LIST OF REFERENCES

BIOGRAPHICAL SKETCH


Table

3-1 Data Processing Assuming P4, P422 and P222 Laue Groups

3-2 Data Collection, Processing, and Model Refinement of Three Crystal Forms

3-3 ATP and Latrunculin Binding Sites of Actin in the P4₃2₁2 Crystal System

5-1 Calculation of Buried Surface Area for Representative Macromolecular Complexes

5-2 Calculation of Buried Surface Area for Putative F-actin Polymerization Intermediates

5-3 VIPER Analysis of Representative Capsids


Figure

1-1 Hemoglobin and Sickle-Cell Disease

1-2 Electron Tomographic Reconstruction of a Dictyostelium discoideum Motile Cell

1-3 Protein Structural Transition at the Root of Prion Disease

1-4 Structure of TMV

1-5 T=3 Cowpea Chlorotic Mottle Virus (CCMV)

2-1 Flowchart of Structure Solution Using X-ray Crystallographic Analysis

2-2 Bar Graph Illustration of the Growth of the PDB From 1972-2005

2-3 Bragg's Formalism for Constructive Interference From Reflecting Planes in a Crystal Lattice

2-4 Growth of CryoEM Depositions to the PDB

2-5 Flowchart of Steps Taken During Structure Determination by CryoEM

3-1 Crystals and Diffraction Pattern of Actin LD Complexed With Latrunculin A and Polylysine

3-2 X-ray Diffraction Oscillation Photograph Showing Changes Due to Dehydration

3-3 Self-Rotation Function Stereographic Projections of the P4 Reduced Data Set for (A) the κ = 90° section and (B) the κ = 180° section

3-4 Estimation of the Twinning Fraction (α) of the Reduced P4 Data Set

3-5 Actin Packing in the P2₁2₁2₁ and P4₃ Crystal Lattices

3-6 Crystal Lattice Packing Diagrams of the (A) P2₁2₁2₁, (B) P4₃2₁2, and (C) P4₃ Crystal Forms

3-7 Actin Anti-Parallel Dimer Superimposition

3-8 Close-Up View of the (A) P2₁2₁2₁ and (B) P4₃ Actin Anti-Parallel Dimer

3-9 Structure of the Actin Monomer Complexed With ATP and Latrunculin (LAR) in the P4₃2₁2 Crystal System

3-10 Anti-Parallel Dimer (Lower dimer, LD) Type IIA Interaction

3-11 Actin Nucleation, Polymerization, Branching, and Cross-Linking

4-1 MSV-CP Refolding and Purification from Inclusion Bodies

4-2 Negative-Stain Electron Micrograph of Oligomeric Species Found in a Typical MSV Preparation

4-3 Geminate and Single MSV Particles Negative-Stained and Frozen in Vitreous Ice

4-4 Views of Geminate and Single MSV Capsids Down Symmetry Axes

4-5 Geminate Particle Capsid Topology

4-6 Close-up Views of the Different Capsomers from the Low (25 Å) and High (9.3 Å) Resolution Reconstructions

4-7 The Two Pseudo-Three Fold Symmetry Axes in the Geminate Particle

4-8 View of Connecting Density Between the Two Heads of the Geminate Capsid

4-9 Close-Up View of the Icosahedral Symmetry Axes in the Isometric T=1 Single Particle

4-10 Helixhunter Analysis Results

4-11 Fit of the Pentamer Model Into the Density for the Apical, Peripentonal and Equatorial Capsomers of the Geminate Particle

4-12 Fit of the Pentameric Capsomer Within the Density from the T=1 Isometric Particle's 5-fold Axis

4-13 Close-Up Comparative View of the Apical Capsomer from the Geminate Particle and the Capsomer From the T=1 Isometric Particle

4-14 DNase I Treatment of Geminate Particles

4-15 DNase I at the Equator

5-1 Actin Treadmilling

5-2 Ribbon Diagrams of Four Representative Oligomeric Protein Assemblies

5-3 Actin Intermediates Implicated in the Assembly Mechanism

5-4 Biophysical Survey of Spherical Virus Capsids Ranging From T=1 to T=13

5-5 Assembly Mediated by the F and G Protein Pentamers of ΦX174

5-6 Nucleic Acid Mediated MSV Geminate Particle Assembly

5-7 Buried Surface Area Used to Correlate Viral Uncoating with Free Energy of Association

5-8 Calculation of the Free Energy of Association of Protein Complexes

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy



Robbie J. Reutzel

December 2005

Chair: Robert McKenna
Major Department: Biochemistry and Molecular Biology

Self-assembly is a process used by many macromolecular systems to build large

highly-symmetric oligomers. Biological polyhedra, such as virus particles, are known to

follow a directed assembly pathway, proceeding through an obligate order of oligomeric states

toward a final structure of minimal energy. Maize streak virus, the type member of the

geminivirus family, is a virus known to form two distinct capsid morphologies. The

mature infectious capsid is composed of 110 copies of a 244 amino acid coat protein

arranged in two joined pseudo T=1 heads. This particle is known to package the full

complement of viral ssDNA, and its assembly is thought to be genome-dependent.

A T=1 icosahedral particle is also known to form, but it packages only fragmented

sub-genomic DNA. The structures of both the single and geminate particles have been

solved by cryoEM to 9.7 and 9.3 Å resolution, respectively. Interpretation of the volume

data for the geminate particle suggests a possible structural role for the viral ssDNA

genome and this is further verified by disruption of mature particles by treatment with

DNase I. Analysis by buried surface area calculations suggests that MSV assembles from

pentameric intermediates.

Helical macromolecular complexes, such as actin filaments (F-actin), follow a

distinct, dynamic, self-assembly pathway. In vitro, the first detectable species after

initiation of F-actin polymerization is an anti-parallel G-actin dimer (held together by a

disulfide bond at Cys-374) that does not exist in mature F-actin. This species has been

crystallized and its structure solved in three different space groups. The dynamics

illustrated by these different crystal forms are postulated to be examples of the types of

orientation changes that occur to actin during polymerization. An assembly mechanism

utilizing the structural intermediates trapped in these crystal lattices is described and


The structure solutions of both the MSV capsids and the F-actin

polymerization intermediates have allowed possible modes of assembly to be modeled.

In addition, the calculation of buried surface area is suggested as a method to lend

structural insight toward the viral uncoating process, the prediction of biological

assembly intermediates, and the stability of certain assemblages.


Molecular Basis of Disease

At the beginning of the 20th century, there was little knowledge regarding the

molecular cause of diseases. The crystal structure of hemoglobin (Hb) was a landmark

discovery that helped to usher in the advent of macromolecular crystallography as a vital

analytical technique while highlighting the effects of structure and specific amino acid

mutations on the complementary nature of protein-protein interaction sites (Perutz et al.,

1968). The structure determination of Hb and the accompanying atomic modeling was of

paramount importance to understanding the correlation between certain amino acid

substitutions and respiratory diseases (Perutz and Lehman, 1968). The elucidation of the

structure of Hb, composed of two αβ dimers, illustrated the similarity with myoglobin

(Mb), and suggested strategies of common protein evolution (Kendrew et al., 1961).

Consequently, it was found that the best-known Hb mutation, E6V, confers a gain

of function on the Hb monomer. This change allows polymerization into

filaments, which causes a "sickling" of the red blood cells (Figure 1-1). This single

amino acid substitution, creating a complementary hydrophobic binding surface between

Hb monomers, causes sickle-cell anemia, which can eventually lead to organ failure. It did

not take long for this discovery to be proven a paradigm across biology: protein-protein

interaction defects cause disease and/or malfunction of the pathway and macromolecules





Figure 1-1. Hemoglobin and Sickle-Cell Disease. (A) Normal hemoglobin. (B)
Hemoglobin with the Glu-to-Val mutation at position 6, leading to aggregation.
(C) Hemoglobin fibril formed from the interaction of monomeric Hb units
as shown in panel B (Adapted from Noguchi, 2005).

Since Perutz's early work, many diseases have been characterized by

macromolecular misrecognition at the atomic level. Many forms of cancer have been

found to be the result of constitutively activated signal transduction pathways that are

largely governed by the association and chemical regulation of many different proteins

(Weber and Gioeli, 2004). The same can be said for the

regulation and normal function of the immune system (Blaes et al., 2005). These events

Figure 1-2. Electron Tomographic Reconstruction of a Dictyostelium discoideum Motile
Cell. The actin filaments are shown in orange-red, the ribosomes and other
assemblages in green and cell membranes in blue. The protein
concentration is estimated at 50-400 mg/ml. Adapted from Medalia et al.,
2002 (Chebotareva et al., 2004; Zimmerman and Trach, 1991; Ellis and
Minton, 2003).

must occur in the context of a cell possessing a protein concentration of 50-400 mg/ml

(Chebotareva et al., 2004). This solute concentration dictates that 40% of the cellular

volume is excluded, which decreases the volume available for biochemical reactions

(Zimmerman and Trach, 1991). These factors affect the kinetic rates of many cellular

activities. Cryo-electron tomographic reconstructions illustrate this crowding effect, showing a

large network of filamentous actin (F-actin) and other cellular organelles such as

ribosomes (Medalia et al., 2002) (Figure 1-2).
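
The concentration range quoted in this paragraph can be put in rough perspective with a short calculation. The sketch below is an illustration only, not a calculation from this work: it converts a protein concentration into the fraction of solution volume physically occupied by protein, assuming a partial specific volume of 0.73 ml/g (a typical textbook value for globular proteins, not a number taken from the cited studies). The function name `occupied_volume_fraction` is hypothetical. The occupied fraction is only a lower bound on the volume excluded to another macromolecule, because the center of a finite-sized molecule cannot approach a neighbor more closely than its own radius.

```python
# Partial specific volume of a typical globular protein, in ml per gram.
# ASSUMPTION: textbook value, not taken from this dissertation.
PARTIAL_SPECIFIC_VOLUME_ML_PER_G = 0.73


def occupied_volume_fraction(conc_mg_per_ml: float,
                             v_bar_ml_per_g: float = PARTIAL_SPECIFIC_VOLUME_ML_PER_G) -> float:
    """Return the fraction of solution volume physically occupied by protein."""
    conc_g_per_ml = conc_mg_per_ml / 1000.0  # mg/ml -> g/ml
    return conc_g_per_ml * v_bar_ml_per_g    # (g/ml) * (ml/g) is dimensionless


if __name__ == "__main__":
    # Lower and upper ends of the cellular protein concentration range in the text.
    for conc in (50.0, 400.0):
        phi = occupied_volume_fraction(conc)
        print(f"{conc:5.0f} mg/ml -> occupied volume fraction {phi:.3f}")
```

Under this assumption, the 50-400 mg/ml range corresponds to roughly 4-29% of the volume being occupied by protein itself, so an excluded volume larger than the occupied volume is plausible at the upper end of the range.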

Perhaps the most intriguing example of protein interactions mediating illness is the

case of prion diseases (Prusiner et al., 1998). Diseases such as Creutzfeldt-Jakob disease

(CJD), scrapie and bovine spongiform encephalopathy (BSE) are characterized by the

structural transition of a predominantly α-helical, normal cellular protein (PrPC) to one

containing a large percentage of β-pleated sheet (PrPSc) with no change in the amino acid

sequence of the polypeptide chain (Nguyen et al., 1995). The PrPSc protein is capable of

converting normal PrPC proteins to its own deadly conformation (Inouye and Kirschner,

1997). Once this transition has occurred, the PrPSc proteins are capable of forming

β-crystallites that eventually assemble into ~100 Å-wide rods, large enough to be visualized

by negative-stain electron microscopy (Nguyen et al., 1995) (Figure 1-3). It is this

formation of insoluble, protease-resistant fibrils, through non-native protein-protein

interactions, that causes extensive damage to the brain and nervous system leading

eventually to dementia and death (Mastrianni et al., 1996). Although not entirely

understood, this mechanism, by which a protein can "infect" other normal cellular

polypeptides by influencing their tertiary structural interactions, highlights the

importance of protein interactions in disease states.

In a process somewhat analogous to prion disease, amyloid plaque (AP) and

neurofibrillary tangle (NFT) formation from non-native protein interactions leads to the

symptoms associated with Alzheimer's disease (Selkoe, 2001). NFT formation is caused by

hyperphosphorylation of the microtubule-associated protein Tau by mitogen-activated

protein kinases (MAPKs), leading to unusual self-associations and formation of NFTs

(Rapoport and Ferreira, 2000). Few diseases are directly linked to faulty recognition of

complementary surfaces in the cellular architecture built from F-actin and

microtubules, presumably because the cell is not viable without correct assembly of

these components. Indeed, the cytoskeletal constituent F-actin, in a

concerted effort with microtubules, is needed for such elementary cellular processes as

mitosis and cell migration during development (Lenart et al., 2005; Baas et al., 2005).


Figure 1-3. Protein Structural Transition at the Root of Prion Disease. (A) PrPC, the
normal cellular form, changes to the protease-resistant PrPSc (scrapie) form, which
eventually leads to (B) fibril formation. These fibrils can cause diseases
such as CJD and BSE. Adapted from Sharon, 2005.

The biomedical relevance of many assemblages is apparent when surveying the

natural macromolecular complexes forming the cytoskeleton, signal transduction

pathways and biological polyhedra such as virus capsids. Incidentally, this phenomenon

of identical primary structures leading to different tertiary structures is a normal

occurrence in virus shell proteins, which can routinely assume different conformations

within a viral capsid (Harrison et al., 1978). Many of these cellular structures can

assemble spontaneously, in favorable conditions, in a process analogous to assembly line

production of automobiles (Crane et al., 1950). This property, among others, has

spawned great interest in the fields of biotechnology and nanoscience (Zhang and

Glotzer, 2004).

By studying the ways that nature has constructed its simplest machines,

nanoscientists can create their own structures, such as DNA-binding nanotubes for

biotechnology, that, like their natural counterparts, assemble spontaneously and with

high fidelity (Audette et al., 2004). Examples in this sector of research include electronic

circuitry based on binding of gold and nickel beads to icosahedral viruses at specific

distances along the capsid surface (Chatterji et al., 2005; Blum et al., 2004). By drawing on the

already well-documented characteristics of biological assemblages, such as amino-acid

sequence and spatial relation of subunits in a polymer, researchers can use intricate

decorations of metal particles, which impart different electrostatic properties to the

complex, to develop new technologies.

It is not hard to imagine a burgeoning field of molecular therapies utilizing "smart"

nanomachines to deliver drugs specifically to diseased cells and offer new and

better-targeted therapies for cancer. In some ways, gene therapy research, using spherical

carrier molecules such as the adeno-associated virus (AAV) serotypes, is already taking

advantage of specific viral tropisms for targeted therapeutic delivery (Burger et al.,

2004). Biotechnology will continue to grow in the future and knowledge of the

interactions in macromolecular assemblages will help foster the development of

molecular medicine and nanoscience.

In all of the examples mentioned above, a greater degree of understanding of the

mechanisms involved in disease and biotechnology applications has been attained

through structural analyses (Inouye and Kirschner, 1997; Rossmann et al., 1985; Kabsch

et al., 1990; Yonekura et al., 2003; Kerfeld et al., 2005; Chen et al., 2000; Fotin et al.,

2004a, 2004b). These studies have mainly relied on a combination of X-ray

crystallography (XRC) and cryo-electron microscopy (CryoEM) data. Since these

structure determination methods have become commonplace and are routinely used to

enhance the description of biological associations, an overview of both approaches is

presented in Chapter 2 with the aim of clarifying these technical processes.

Survey of Macromolecular Interactions

In an analysis of the protein-protein interactions found in the yeast Saccharomyces

cerevisiae it was estimated that close to 1,000 oligomeric complexes containing two to

many proteins are likely formed (Uetz et al., 2000). Since this first study of the

"interactome", several groups have analyzed the proteomes of Helicobacter pylori,

Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens (Rain et al., 2001;

Li et al., 2004; Giot et al., 2003; Stelzl et al., 2005).

More focused studies on the interactome for specific diseases and signaling

networks have also been conducted for Huntington's disease and the transforming growth

factor-β (TGF-β) signal transduction cascade (Goehler et al., 2004; Colland et al., 2004).

It is astounding that, with the limited volume available for the recognition of associating

proteins, such a vast number of complexes can be formed correctly in the volume not

excluded by macromolecules. A survey of E. coli proteins in the SWISS-PROT protein

sequence database including membrane-bound, structural and soluble proteins highlights

the importance of protein complex formation by further illustrating that many proteins

function in complexes (Bairoch et al., 2005).

This analysis further illustrates that dimers and tetramers are formed preferentially,

comprising 38 and 21% of all entries, respectively (Goodsell and Olson, 2000). These

studies, while probing specific organisms' genomes, do not take into account other

assemblies, such as viruses and carboxysomes, that exist elsewhere in biology and therefore

vastly underestimate the number of homooligomeric associations that are known to form.

Nevertheless, this interaction information has uncovered thousands of protein complexes

across species and has integrated data for known regulatory pathways and disease

cascades while describing some functions of previously uncharacterized polypeptide

species (Stelzl et al., 2005).

There has also been much work in determining the types and characteristics of the

intermolecular forces used to hold these assemblies together. For a stable

macromolecular assembly to form, the forces used in subunit association must be larger

than those working to dissociate the complex (Crane et al., 1950). While this may seem

obvious, probing the ways that cells overcome this seemingly simple problem can yield

great insight. A measure of the forces holding a complex together can be obtained from

buried surface area calculations on atomic-level XRC data, an approach that has been studied

extensively (Lee and Richards, 1971; Chothia and Janin, 1975; Eisenberg and

McLachlan, 1986). Biophysical techniques such as calorimetry can also be used to

measure the energy associated with subunit interactions (Horton and Lewis, 1992).
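
As a rough illustration of how a buried surface area calculation yields an energy estimate, the sketch below computes the surface buried on complex formation from solvent-accessible surface areas (ASA) and scales it by a per-area hydrophobic free energy. It is a minimal sketch under stated assumptions, not the procedure used in this work: the coefficient of about 25 cal/mol per Å² is an assumed value in the spirit of Chothia and Janin (1975), the function names are hypothetical, and the ASA numbers are invented for illustration. The result covers only the hydrophobic term and ignores the entropic cost of association.

```python
# Hydrophobic free energy gained per square angstrom of surface removed from
# solvent, in kcal/mol. ASSUMPTION: about 25 cal/mol per A^2; other
# parameterizations (e.g. atomic solvation parameters) give different values.
KCAL_PER_MOL_PER_A2 = 0.025


def buried_surface_area(asa_a: float, asa_b: float, asa_complex: float) -> float:
    """Total surface area (A^2) buried when subunits A and B form a complex."""
    return asa_a + asa_b - asa_complex


def hydrophobic_free_energy(bsa: float,
                            coeff: float = KCAL_PER_MOL_PER_A2) -> float:
    """Estimated hydrophobic contribution to association (kcal/mol, negative = favorable)."""
    return -coeff * bsa


if __name__ == "__main__":
    # Hypothetical ASA values (A^2) for two isolated subunits and their complex.
    bsa = buried_surface_area(asa_a=6000.0, asa_b=5500.0, asa_complex=10000.0)
    dg = hydrophobic_free_energy(bsa)
    print(f"buried surface area: {bsa:.0f} A^2")
    print(f"hydrophobic free energy estimate: {dg:.1f} kcal/mol")
```

With these invented inputs, 1500 Å² is buried and the hydrophobic term is about -37.5 kcal/mol. In practice the ASA values would come from a surface calculation on atomic coordinates.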

Macromolecular assemblages are usually held together by many weak forces, such

as hydrogen bonding, the hydrophobic effect (burial of surface area), electrostatic interactions,

and van der Waals interactions (Johnson, 1996; Jones and Thornton, 1996; Hummer et

al., 1995). In this way, complex formation is much the same as protein folding. Because

the strength contributed by each type of interaction is not great, many thousands of these

interactions may be used in oligomerization (Zlotnick et al., 2000). This is possible

because the free-energy contributions of the individual interactions add, so their effects

on binding are in effect multiplied, and an equilibrium association constant (Ka) can be

calculated when the contribution from each is determined (Albeck

and Schreiber, 1999).
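
The link between many weak interactions and a single association constant follows from the standard relation ΔG = -RT ln Ka. The sketch below is illustrative only, with hypothetical function names and invented per-interaction energies: it sums the individual contributions and converts the total to Ka, showing that adding free energies is equivalent to multiplying the corresponding equilibrium constants.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3  # gas constant R in kcal/(mol K)


def association_constant(delta_g_kcal_per_mol: float,
                         temperature_k: float = 298.15) -> float:
    """Convert a free energy of association into Ka using dG = -RT ln Ka."""
    return math.exp(-delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


def total_free_energy(contributions: list[float]) -> float:
    """Sum per-interaction free energies (kcal/mol), assuming they are additive."""
    return sum(contributions)


if __name__ == "__main__":
    # Hypothetical weak interactions, each worth -0.5 kcal/mol.
    weak = [-0.5] * 20
    dg_total = total_free_energy(weak)
    ka_total = association_constant(dg_total)
    # Multiplying the per-interaction constants gives the same Ka.
    ka_product = math.prod(association_constant(dg) for dg in weak)
    print(f"total dG = {dg_total:.1f} kcal/mol, Ka = {ka_total:.3e}")
    print(f"product of individual constants = {ka_product:.3e}")
```

Strict additivity is itself an assumption. Cooperative or mutually exclusive interactions would require a more detailed treatment.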

This ability of macromolecules to use multiple weak interactions in complex

formation is far from a liability as it allows the specificity that is derived from the fitting

of complementary surfaces of interacting entities (Zielenkiewicz and Rabczenko, 1984).

The recognition between complementary surfaces is important for several reasons. First,

the surface area that is buried between interacting proteins excludes water between

subunits, as required for the hydrophobic effect. Second, the

repulsive forces that arise from even a small number of non-matching

surface contacts are strong enough to prevent complete assembly. This means that a large

number of interactions must be present to offset this effect (Pace et al., 1996). Third,

atoms possessing the ability to form electrostatic interactions and hydrogen bonds must

be oriented in a cooperative conformation in order for the bonds to form (Morozov et al.,

2004). All of these prerequisite conditions are achieved through the efficient

recognition of complementary molecular surfaces.

To summarize, the types of interactions that exist in most macromolecular

assemblages are weak, but find strength in numbers. Instead of using a few covalent

interactions, the thousands of non-covalent associations are utilized for the purposes of

regulation, complementarity, and flexibility (Harrison et al., 1978; Speir et al., 1995).

This dynamic aspect of macromolecular recognition plays a key role in many

spontaneous self-assembly reactions (Crane, 1950). The exquisite specificity

demonstrated by these interactions, first illustrated by Perutz's study of Hb, underscores

the importance of these associations and the disease states that follow when they


Large, Symmetric Oligomers Built From Identical Subunits

As more detailed structural knowledge of macromolecular assemblages is revealed

primarily by cryoEM and XRC methods as well as other biophysical techniques, two

trends are noticed. 1) As proteins have evolved to perform more complex tasks,

evolution has dictated the construction of elaborate protein machines from multiple

identical subunits instead of creating a single, long protein chain. 2) The evolution of

macromolecular assemblages has preferentially stressed the building of the larger

complexes using point group, helical and planar symmetry operators (Steven et al., 2005).

This phenomenon is thermodynamically driven; the most stable associations of multiple

subunits are those that share multiple, equivalent interactions, necessarily leading to

symmetric assemblies (Blundell and Srinivasan, 1996). Viruses, cytoskeletal filaments,

and ring-shaped DNA encircling machines all highlight this basic trend. This allows the

organism/virus several advantages such as genomic coding efficiency, error control in

translation, and the ability to finely control recognition and interaction (Goodsell and

Olson, 2000).

The building of large structures also allows enzymes to use allosteric mechanisms,

provides greater stability against denaturation, and allows a greater degree of buried

surface (Hatley et al., 2003). This is not to downplay the invaluable role that hetero-

oligomeric interactions represent. Those complexes formed from two or more non-

identical proteins tend to be prevalent in transient types of interactions. This can be seen

in many signal transduction cascades that typically involve a post-translational

modification, such as phosphorylation, in a quick and dynamic type of association. To

maintain focus, the majority of this review will concentrate on homo-oligomeric

macromolecular assemblages.

Genome Coding Efficiency

Early investigators quickly realized that the formation of macromolecular

assemblages built from identical subunits eliminates the need to specify separate parts

(Crick and Watson, 1956). For example, tobacco mosaic virus (TMV), a helical, rod-

shaped virus, consists of 2130 protein subunits each 158 amino acids long and a single-

stranded RNA molecule of 6390 nucleotides (Namba and Stubbs, 1986)(Figure 1-4).

Over a million nucleotides of RNA would be necessary if a separate gene for each viral

coat protein (CP) was needed. This would necessarily be 150-times longer than the entire

viral RNA genome.
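
The arithmetic behind this comparison can be checked with a few lines of code. The sketch below uses only the TMV figures quoted above (2130 subunits, 158 residues each, a 6390-nucleotide genome) and assumes three nucleotides per residue, ignoring stop codons and untranslated regions.

```python
SUBUNITS = 2130          # coat protein copies per TMV virion
RESIDUES_PER_CP = 158    # amino acids per coat protein
GENOME_NT = 6390         # nucleotides in the TMV RNA genome
NT_PER_CODON = 3         # assumption: stop codon and UTRs ignored

# Coding cost of a single coat protein gene.
one_gene_nt = RESIDUES_PER_CP * NT_PER_CODON
gene_fraction = one_gene_nt / GENOME_NT

# Coding cost if every subunit needed its own gene.
separate_genes_nt = SUBUNITS * one_gene_nt
fold_over_genome = separate_genes_nt / GENOME_NT

print(f"one CP gene: {one_gene_nt} nt ({gene_fraction:.1%} of genome)")
print(f"separate genes: {separate_genes_nt} nt "
      f"({fold_over_genome:.0f}x the genome)")
```

Under these assumptions a separate gene per subunit would require just over a million nucleotides, roughly 150 to 160 times the length of the actual genome.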

The virus conserves its nucleic acid by using a single copy of the coat protein gene (less

than 10% of its genome) to make the 2130 identical copies of protein that assemble into

the virus coat. In this way, nature has developed a clever and redundant method to

conserve genome and decrease the size of the protein compartment necessary (Crane,

1950). A caveat to this observation should be added with regard to larger organisms

where sizeable stretches of non-coding nucleic acid are often found (Wolynes, 1996).


Figure 1-4. Structure of TMV. (A) Mature TMV virion composed of 17 subunits per
ring of protein. The capsid is 300 x 15 nm when mature. (B) TMV-CP
shown in blue with RNA genome in yellow. Assembly of TMV occurs in a
genome dependent fashion. (C) End-on view of TMV capsid without RNA.
The hollow cavity is used to store the RNA. Pictures adapted from (Sforza,

Error Control

Translation of RNA to a protein chain by the ribosome is known to have an error

rate approaching 1 in 2000 amino acids (Goodsell and Olson, 2000). Using TMV as an

example, and given this error rate, the chance that any given amino acid of CP is correct

approaches 99.95%. This means that ~90% of TMV subunits will be made correctly. Of

the ~10% of CP translated incorrectly, some subunits will be assembly-competent while

others will be unable to assemble correctly.

If one polypeptide was used to house this RNA genome, it would necessarily

contain over 300,000 amino acids and would virtually never be efficiently translated

correctly (Culver, 2002). This inherent control over what subunits are able to recognize

complementary surfaces naturally allows the efficient sorting of good building blocks

from bad, and the destruction of those proteins unable to assemble productively (Ferrell

et al., 2000). This allows a selective advantage to the virus, whereby only productive

binding of subunits to assembling capsids will occur.
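
The error-control argument can be made concrete with a short calculation. The sketch below assumes that translation errors are independent and occur at the quoted rate of 1 in 2000 residues; the function name is illustrative only.

```python
def fraction_error_free(n_residues, error_rate=1 / 2000):
    """Probability that a chain of n_residues contains no translation error,
    assuming independent errors at a fixed per-residue rate."""
    return (1.0 - error_rate) ** n_residues


CP_LENGTH = 158                       # residues in one TMV coat protein
SINGLE_CHAIN_LENGTH = 2130 * 158      # one hypothetical chain replacing all subunits

cp_ok = fraction_error_free(CP_LENGTH)
single_chain_ok = fraction_error_free(SINGLE_CHAIN_LENGTH)

print(f"error-free coat protein subunits: {cp_ok:.1%}")
print(f"error-free single giant chain:    {single_chain_ok:.1e}")
```

Roughly nine in ten subunits come out error-free, whereas a single chain of more than 300,000 residues would essentially never be translated without a mistake.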


Self-assembly also allows exquisite regulation of the state of the complex.

Crowding inside cells allows the intracellular concentrations of proteins in different areas

to be controlled (Metlina, 2004). The cell can also regulate the concentrations of ions

such as Ca2+, as well as the pH and salt concentrations, at different localities (Carlier, 1991;

Pollard, 1990; Zlotnick, 1994). Since many assembly reactions are very sensitive to the

concentration of free subunits in the vicinity as well as ion concentrations and pH, this

adds another level of regulation.

There are also many classes of proteins that can act as regulatory molecules for

assemblages. There are at least a hundred actin binding proteins (ABPs) that bind to

either filamentous actin (F-actin) or globular actin (G-actin) and modulate the assembly/disassembly

balance depending upon cellular need. The same can be said for microtubule associated proteins

(MAPs). Another complex form of regulation used to control assembly reactions

is post-translational modification, such as phosphorylation and glycosylation. It is

well-documented that intermediate filaments are readily phosphorylated during mitosis in

order to promote disassembly (Chou et al., 2003).

Structure of Viruses

The capsids of most characterized viruses consist of a protein coat comprised of

many subunits, with or without a lipid envelope, surrounding a core of nucleic acid (RNA

or DNA). The viral capsid exists to protect the genome from the outside environment

while identifying cellular receptors to interact with, eventually leading to infection and

uncoating (Thomas et al., 2004). Viruses have adopted several different strategies in

structure. These include: helical non-enveloped (TMV), helical enveloped

(Influenzavirus), non-enveloped icosahedral (Parvoviruses, Poliovirus,

Polyomavirus/Papillomavirus), and enveloped icosahedral (Herpesvirus, HIV, Semliki

Forest Virus (SFV)). These classes of virus structure all exhibit high degrees of

symmetry and share many common features during their lifecycles. Of course, this is just

a small number of examples and these viruses also differ in many aspects. Because of

this great degree of symmetry, viruses epitomize the efficiency of nature in preferentially

evolving to the most stable and adaptable assemblages.

The knowledge of virus structures has co-evolved with technological advances in

the biophysical techniques used to study them. While the preparation and preliminary

XRC analysis of spherical virus crystals was recorded in 1938, it was not until 1978 that

the first structure of a virus at high-resolution was presented (Bawden and Pirie, 1938;

Bernal et al., 1938; Harrison et al., 1978). This structure, of tomato bushy stunt virus

(TBSV), was followed several years later by the structures of the first human viruses

(human rhinovirus 14 (HRV 14) and poliovirus) (Rossmann et al., 1985; Hogle et al.,

1985). While the field of virology awaited the high level of detail attainable by XRC,

many discoveries regarding the structure and symmetry of viruses (mainly by X-ray fiber

diffraction and negative-stain electron microscopy) were made.

The first recorded structural characterization of a virus started with an X-ray fiber

diffraction study on tobacco mosaic virus (TMV) (Bernal and Fankuchen, 1937). This

X-ray photograph was the beginning of measurements toward establishing the size,

shape and substructure symmetry of this helical, rod-shaped virus while inadvertently

laying the basis for the now well-established concept of self-assembly from many,

identical viral subunits (Bernal and Fankuchen, 1941; Harris and Knight, 1952; Cochran

et al., 1952; Franklin and Klug, 1955).

Remarkably, without a wealth of structural, biochemical or physical evidence, it

was hypothesized and later proven that all small viruses are built up of protein subunits

packed together in a regular manner (Crick and Watson, 1956). In an elaboration of this

basic concept, the reasons cited for the likelihood of this model included the conservation

of viral genome if only one structural protein is used (Crick and Watson, 1956). Despite

the experimental limitations of studying virus samples using X-ray methods at this time

(limitations on computing power, lack of intensity of X-ray sources etc.) a plethora of

physicochemical knowledge was still gleaned from analysis of raw diffraction data and

the information it gave on virus substructure and unit cell packing (Klug and Caspar,

1960; Caspar, 1956; Klug et al., 1957).

Using the method of isomorphous replacement developed by Perutz in his group's

study of hemoglobin, Holmes and Franklin (1958) determined the number of subunits in

one helical repeating unit of TMV. The infectivity of TMV RNA (ribonucleic acid) had

already been observed as well as the "coding" of the coat protein by the genome (Gierer

and Schramm, 1956). This gradually led to the uncovering of the limited number of ways

that a virus particle, confined to only rotational and translational symmetry elements,

could build itself (Klug and Caspar, 1960).


In connection with the structure of helical viruses, Caspar and Klug (1962) realized

that the use of the same contact points over and over again in packing subunits leads to a

symmetrical particle. However, in the case of spherical viruses, out of all the types of

symmetry possible for a structure of limited extent, only the cubic point groups were

likely to lead to an isometric structure (Crick and Watson, 1956; Caspar and Klug, 1962).

This is because the identical arrangement of subunits of any shape on the surface of a

sphere in this fashion will contain the same specific bond sites (Klug et al., 1957; Caspar

and Klug, 1962).

Experimental evidence obtained by X-ray diffraction and electron microscopy

confirmed the cubic nature of the symmetry and showed that particles of both TBSV and

turnip yellow mosaic virus (TYMV) belonged to the icosahedral point group (Caspar,

1956; Klug et al., 1957). Soon after, poliovirus was also found to contain an icosahedral

substructure (Finch and Klug, 1959). In the words of Caspar and Klug (1962) "the

advantage of icosahedral symmetry over the other types was that it allows the use of the

greatest possible number, namely 60, of identical asymmetric units to build a spherical

framework in which they are also identically packed." This would also allow the

smallest feasible subunits to economize and protect the genome.

These experimental findings still lacked the answer to "how" these subunits packed

and how many subunits comprised each particle. By modeling the coat protein of a

spherical virus on the physical attributes of the TMV coat protein, Caspar and Klug

(1962) showed that 60 of these units packed in accordance with icosahedral (532 point

group) symmetry would lead to a protein shell of diameter ~150-250 Å. This diameter

was in agreement with the experimental data for the smallest known viruses at the time

(Loeb and Zinder, 1961; Kassanis and Nixon, 1960). While this model helped advance

the understanding of virus symmetry and construction, there were still paradoxes. It had

already been shown that viruses quite often comprised more than 60 subunits

and had diameters much larger than the first simple model predicted (Finch and Klug, 1960).

In an elaboration of their first model, Caspar and Klug (1962) showed that if 60n

subunits are used to decorate the surface of a sphere, only 60 units in each of n unique

chemical environments could be equivalently related. In mathematical terms, an

icosahedron contains 20 equilateral triangular faces, and therefore 20T facets, where T

is the triangulation number given by the rule $T = Pf^2$, where $P = h^2 + hk + k^2$ for all

pairs of integers h and k having no common factor and f is any integer, so that the

number of (quasi-equivalent) subunits is 60T (Caspar and Klug, 1962). This principle

corresponds to clusters of 12 pentamers plus 10(T-1) hexamers, 20T trimers, or 30T dimers on the

spherical virus surface. As an illustration of the viral asymmetric unit and quasi-equivalence,

cowpea chlorotic mottle virus, a T=3 virion, is shown (Figure 1-5).
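
The triangulation rule can be enumerated directly. The sketch below generates allowed T numbers from coprime pairs (h, k) and any integer f, and reports the subunit count 60T together with the usual clustering into 12 pentamers and 10(T-1) hexamers. The function names and search limits are illustrative choices, not part of the original theory.

```python
from math import gcd


def triangulation_numbers(max_index=3, max_f=2):
    """Allowed T = P * f**2 with P = h**2 + h*k + k**2, h and k coprime."""
    allowed = set()
    for h in range(1, max_index + 1):
        for k in range(0, h + 1):        # k <= h avoids mirror-image duplicates
            if gcd(h, k) != 1:           # h and k must share no common factor
                continue
            p = h * h + h * k + k * k
            for f in range(1, max_f + 1):
                allowed.add(p * f * f)
    return sorted(allowed)


def capsid_composition(t):
    """Subunit and capsomer counts for a capsid of triangulation number t."""
    return {
        "subunits": 60 * t,
        "pentamers": 12,
        "hexamers": 10 * (t - 1),
    }


print(triangulation_numbers())
print(capsid_composition(3))   # e.g. a T=3 capsid such as CCMV
```

For T=3 this gives 180 subunits arranged as 12 pentamers and 20 hexamers, matching the three quasi-equivalent coat protein environments described for CCMV.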

Figure 1-5. T=3 Cowpea Chlorotic Mottle Virus (CCMV). The surface lattice can be
viewed as a tiling of pentameric and hexameric units with delineations for
each viral asymmetric unit. There are three coat proteins in three quasi-equivalent
environments constructing the virion shell.

This fundamental theorem was shown to be correct for the first time in the X-ray

crystallographic structure of TBSV (Harrison et al., 1978). It is important to remember

that while quasi-equivalence appears to govern the structural morphology of spherical

viruses, this is not necessarily the case. Rather, quasi-equivalence is merely a

consequence of virus capsids assembling into the most energetically favorable formation,

to perform their primary tasks of genome protection and further cell infection. In fact,

the structures of L-A virus, Simian Virus 40 (SV40) and polyoma virus, while adhering

to Caspar and Klug's correctly hypothesized triangulation numbers, are not correctly

predicted by the theory of quasi-equivalence (Twarock, 2004). There are differences

such as the incorrect predicted absolute configuration of the particles and the number of

subunits used to build them.

The principles of quasi-equivalence are also illustrated in other areas of biology.

Examples of this theme include the hexagonal arrays of connexon molecules at the gap

junctions of certain cell types. The membrane protein bacteriorhodopsin is also known to

form hexagonal sheets in the plane of the lipid bi-layer (Kamihira et al., 2005). These

structures are readily visible by negative-stain electron microscopy and are akin to the

packing of marbles in a box. Clathrin coats on the surface of a membrane can also be

seen as a hexagonal array, and must form pentameric units, a necessary step for the

invagination of the membrane leading to a closed polyhedral structure (Fotin et al.,


While the development of assemblages based on many, medium-sized building

blocks adhering to point group and helical symmetry seems an underlying theme in

biology, it is important to realize that this is merely a consequence of several omnipresent

factors. Indeed, while less efficient, a system that could perform the same set of tasks

using a few large, asymmetric subunits could have evolved. An example of this

evolutionary end-point is the sarcomere length defining protein titin found in muscle. It

is composed of thousands of amino acids in a single polypeptide chain, but exists as an

example of an evolutionary roadblock (Labeit and Kolmerer, 1995).

Clearly, nature tends to obey the laws of thermodynamics leading toward the most

efficient building processes of molecular machines. Examples are found throughout

biology. The cytoskeleton, virus capsids, and protein production and destruction

machinery (ribosome, proteasome, GroEL/ES complex) are functional illustrations of the

evolution of subunit-constructed assemblages (Lowe et al., 1995; Braig et al., 1994; Gao,


Spontaneous Biological Self-Assembly

As stated above, oligomeric proteins consisting of many, medium-sized building

blocks tend toward spherical polyhedra, helical, linear polymers, and sometimes planar

point groups such as hexagonal arrays (Blundell and Srinivasan, 1996). This concept of

self-assembly is one of the central edicts of biology. In vitro analysis of true self-

assembly from purified components of viruses, bacterial flagella, ribosomes, and

cytoskeletal filaments has revealed the general properties of these processes. For

example, large biological structures, such as the mitotic spindle, are constructed from

molecules that assemble by defined pathways without the aid of templates (Bloom,

2005). The characteristics of the constituent molecules decide the assembly mechanism

and ultrastructural morphology of the final complex.

For most assemblages, multiple, weak but highly specific noncovalent associations

hold together the building blocks, comprised of proteins, lipids and nucleic acid (Chothia

and Janin, 1975; Horton and Lewis, 1992). This same strategy is used even by the largest

cellular entities, including chromosomes, nuclear pore complexes, transcription initiation

machinery, vesicle fusion assemblies, and intercellular junctions. As stated earlier, the

ability of subunit molecules to assemble spontaneously into the complicated structures

required for cellular function greatly increases the amount of information stored in the


Analysis of reaction kinetics suggests that biological self-assembly can be

generally broken into two categories: 1) equilibrium assembly: examples include

spherical viruses and other biological polyhedra (Zlotnick, 1994) and 2) nucleation-

elongation assembly: including the cytoskeletal filaments actin and microtubules as well

as bacterial flagella.

Equilibrium Assembly

Since the reconstitution of TMV from purified protein and RNA components,

viruses have been used as models to study many aspects of the assembly of

macromolecular structures (Fraenkel-Conrat and Williams, 1955; Chiu et al., 1997).

Because of the extensive literature available regarding virus assembly and the lack of the

same type of information for other biological polyhedra, most of this discussion will

focus on viruses.

Virus assembly pathways are good examples of directed self-assembly using an

obligate order of states (Zlotnick, 1994). While not all viruses have the same structural

intermediates, they do follow the same basic trend toward a structure of minimal free

energy. Viruses can adopt different shapes and sizes, but invariably their major viral coat

protein (CP) forms a capsid around the infectious genomic nucleic acid, which could be

ssDNA, ssRNA, dsDNA, or dsRNA. In the assembly and maturation of many viruses,

recognition and encapsidation of the genomic nucleic acid by the CP is necessary for

infectious capsid formation. This CP is usually capable of performing other functions

such as receptor recognition during infection. Equilibrium assembly generally, but not

necessarily, lacks a true nucleation event. Also, unlike linear polymer assembly, in which

subunits can be added indefinitely and the size range of the assembly is large, a closed capsid has a defined size.

Viral capsids are very rarely found half assembled, and most CP are found either

within complete capsids or as disassembled subunits (Zlotnick, 1994). The end subunits

in a linear polymer are in equivalent environments, while subunits comprising the protein

shell of a virus are not necessarily in the same environment. The

experimental conditions chosen, including changes in pH, genomic material available,

mutations made to the subunits used, and salt concentration, can result in different forms

of capsid being formed (Fox et al., 1996). An example of this is seen in the virus-like

particles (VLP) of human papillomavirus 16 (HPV16) formed from an N-terminal

truncation mutant (Chen et al., 2000). In this instance, the normally T=7 capsids are

replaced by T=1 VLPs. The equilibrium assembly model assumes that free subunits are

in equilibrium with fully formed capsids and is able to account for this situation

(Zlotnick, 1994).
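
A minimal numerical sketch can show why an equilibrium between free subunits and complete capsids leaves very little material in between. The example below is a deliberately simplified all-or-none model, not the full treatment of Zlotnick (1994): n subunits are assumed to be in direct equilibrium with one capsid, and the equilibrium constant is expressed through a hypothetical pseudo-critical concentration c_star, so that [capsid] = c_star * (c_free / c_star)**n. All concentrations are in arbitrary units.

```python
import math


def free_subunit_concentration(c_total, n=60, c_star=1.0):
    """Solve c_free + n*[capsid] = c_total by bisection (all-or-none model)."""
    lo, hi = 0.0, c_total
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid > 0.0:
            # Work in log space and cap the exponent to avoid overflow.
            log_term = n * math.log(mid / c_star)
            in_capsids = n * c_star * math.exp(min(log_term, 700.0))
        else:
            in_capsids = 0.0
        if mid + in_capsids > c_total:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def fraction_in_capsids(c_total, n=60, c_star=1.0):
    """Fraction of all subunits that reside in complete capsids."""
    return 1.0 - free_subunit_concentration(c_total, n, c_star) / c_total


for c_total in (0.1, 0.9, 1.5, 10.0):
    print(f"c_total = {c_total:5.1f}  "
          f"fraction assembled = {fraction_in_capsids(c_total):.3f}")
```

Below the pseudo-critical concentration almost all subunits remain free, and above it the free pool stays close to c_star while the excess is found in capsids, which is consistent with the observation that coat protein is found either as subunits or as complete shells.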

While spherical virus capsids are the most readily obvious and well-studied forms

of self-assembled polyhedra there are many other examples in biology. Carboxysomes,

responsible for carbon fixation in cyanobacteria and certain chemoautotrophs, are

regularly shaped polyhedra likely built from pentameric and hexameric building blocks

(Cannon et al., 2001; Kerfeld et al., 2005). The isolation of these five and six subunit

containing species has invited the speculation that these organelles structurally hold true

to icosahedral symmetry and may be evidence of the evolution of virus capsids (Cannon

et al., 2001).

Brodsky (1988) showed the pH dependent assembly of clathrin polyhedral lattices

from purified heavy chain trimers each with an attached light chain. This basket-like

structure is important in movement of proteins to and from the plasma membrane and

between certain intracellular organelles. Structural characterization by high-resolution

cryoEM, with fitting of the heavy chain crystal structure, shows three main categories of

polyhedron morphology composed of hexameric and pentameric blocks (Fotin et al.,

2004a; Fotin et al., 2004b). Spontaneous assembly of these intracellular entities has also

been demonstrated by change of pH and ionic strength (Ybe et al., 1998; Brodsky, 1988).

Although not well characterized, an assembly pathway similar to small, spherical viruses

seems likely. Other closed protein assemblies such as GroEL, composed of 14 subunits

arranged in D7 point group symmetry, deserve mention as oligomers that may also follow

the equilibrium pathway.

Nucleation-Elongation Assembly

The characteristics of nucleation-elongation assembly include a great dependence

upon both temperature and concentration of subunits as well as a lag-phase between

nucleation initiation and rapid elongation of the polymer (Jackson and Berkowitz, 1980).

Unlike the equilibrium mechanism, the nucleation event is generally necessary for

polymerization and as stated previously, many different sized polymers may exist at one

time, each with equivalent subunits at either polymer end. This nucleation event is the

rate-limiting step of polymerization and the nuclei consist of small subunit aggregates,

sometimes referred to as "seeds" (Desai and Mitchison 1997; Carlier, 1991).
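
The lag phase that follows from a rate-limiting nucleation step can be illustrated with a toy kinetic model. The sketch below assumes a simple two-equation scheme: new polymers appear at a rate proportional to the monomer concentration raised to the nucleus size, and polymer mass grows at a rate proportional to monomer concentration times the number of polymers. The rate constants, the trimeric nucleus, and the arbitrary units are all hypothetical, and monomer consumed by nucleation itself is neglected.

```python
def simulate_polymerization(c_total=1.0, nucleus_size=3, k_nuc=1e-4,
                            k_elong=1.0, dt=0.05, t_end=1000.0):
    """Forward-Euler integration of a minimal nucleation-elongation scheme."""
    steps = int(round(t_end / dt))
    polymer_number = 0.0   # concentration of polymer ends that can elongate
    polymer_mass = 0.0     # concentration of subunits incorporated in polymer
    times, masses = [0.0], [0.0]
    for i in range(1, steps + 1):
        monomer = c_total - polymer_mass
        d_number = k_nuc * monomer ** nucleus_size      # slow nucleation
        d_mass = k_elong * monomer * polymer_number     # fast elongation
        polymer_number += d_number * dt
        polymer_mass += d_mass * dt
        times.append(i * dt)
        masses.append(polymer_mass)
    return times, masses


times, masses = simulate_polymerization()
for t_query in (10, 50, 100, 300, 1000):
    index = int(round(t_query / 0.05))
    print(f"t = {t_query:5d}  polymerized fraction = {masses[index]:.3f}")
```

Very little polymer forms at early times because few nuclei exist; once nuclei accumulate, elongation dominates and the curve rises steeply before levelling off as monomer is depleted.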

While an oversimplification, the subunits comprising the middle of the polymer can

be treated as structurally equivalent for the purposes of this discussion. Much of the

knowledge amassed regarding this mechanism has been collected from two model

systems: F-actin and microtubule polymerization. Unlike most examples given for

equilibrium assembly, those complexes following the nucleation-elongation pathway

have functions that dictate the ability to assemble and disassemble readily due to

sometimes subtle changes in cellular conditions (Jackson and Berkowitz, 1980; Egelman

and Orlova, 1995). There has been much controversy over the size of the nuclei that are

needed during nucleation-elongation assembly. This is especially true of F-actin

polymerization, although many think the nuclei are trimeric (Bubb et al., 2002; Sept and

McCammon, 2001). Finding the nucleating species is compounded by the fact that

proteins such as the Arp2/3 complex and γ-tubulin oligomers are known to nucleate

assembly of F-actin and microtubules, respectively, in vivo (Oegema et al., 1999).

F-actin and microtubules both require a rate-limiting nucleation event to start

polymerization, and the kinetics of elongation are similar. F-actin polymerization will

continue until a steady-state or critical concentration is reached. At this state, G-actin

subunits are adding to the + end of the filament while those at the - end of the filament

are dissociating, in a process known as "treadmilling" (Carlier, 1991; Pollard, 1990). The

microtubule system follows a process termed "dynamic instability" where the tubulin

polymer is switching between states of steady elongation and rapid shortening, as well as

treadmilling (Baas et al., 2005; Rodionov and Borisy, 1997). This tends to make

polymers of approximately the same length in vitro, as subunits add and fall off

constantly. Other examples of helically shaped oligomers that follow the basic properties

of nucleation-elongation assembly are present in both eukaryotic and prokaryotic


The kinetic characterization of the DNA-segregating prokaryotic actin homolog

ParM suggests an evolutionary mechanism for the development of the microtubule

system in higher organisms (Garner et al., 2004). Although the structure of this

oligomer's building blocks is more similar to actin, its kinetics follow those of

microtubules, even demonstrating the phenomenon of dynamic instability, a mechanism

previously thought to exist only in tubulin polymerization. Another member of this

prokaryotic family of polymeric protomers is MinD. Though not well-studied, this

assembly also follows the general characteristics of the nucleation-elongation pathway

(Suefuji et al., 2002). Collagen, a triple helical, structural protein comprising one quarter

of all protein in the human body, is another example of a protein oligomer following the

nucleation-elongation pathway. Like many biological polymers, spontaneous self-

assembly due to changes in pH has been well documented (Silver, 1981).

One of the most complex molecular machines studied to date is the bacterial

flagellar assembly. One of the greatest triumphs of structural biology in recent history is

the structure solution of the flagellar filament by cryoEM to 4 Å resolution (Yonekura et

al., 2003). The self-assembly properties of this protomer have been well-documented for

some time but the single-particle image reconstruction allowed tracing of the protein

main-chain and visualization of the contacts involved in flagella function (Metlina,



Macromolecular Structure Determination by X-ray Crystallography

X-ray crystallographic (XRC) analysis of ordered arrays of protein molecules

arranged in three dimensional crystals has allowed the high-resolution structure

determination of many proteins, nucleic acids, and large macromolecular assemblages. A

flowchart outlining this method is shown in Figure 2-1. The level of detail attainable by

XRC, although dependent upon the quality of the crystals grown, now allows the highly

accurate placement of hydrogen bonds and water molecules, the positions of which are

essential to biological function (Kang et al., 2004).

A historical perspective of structural biological achievements in the 20th century

follows closely with the advent and technological advancements made in the field of

biological crystallography. The number of crystal structures of macromolecules,

including viruses, protein-nucleic acid complexes, and other large macromolecular

machines, has increased steadily over the past several decades and has grown to

thousands (Berman et al., 2000) (Figure 2-2). A step-wise description of the techniques

involved in crystallographic structure determination, on both a practical and theoretical

level, will be presented here.

[Flowchart labels: Crystals; Data collection; Diffraction; Heavy Atom; Electron Density]

Figure 2-1. Flowchart of Structure Solution Using X-ray Crystallographic Analysis. (A)
The steps leading to structure determination include crystallization, data
collection, and data processing. Once the structure has been determined, it
is time to move to (B) model refinement, validation and electron density
interpretation. The blue box and arrow denote an iterative refinement loop
that should continue until convergence.


Figure 2-2. Bar Graph Illustration of the Growth of the PDB From 1972-2005. The
growth of the database is expected to increase as structural biology uses
more high-throughput methodologies for structure determination. (Berman,
2000; www.rcsb.org/pdb)

Crystal Preparation

The first step in macromolecular structure determination is the production of highly

purified sample (95-99% homogeneous). This is a crucial process, and often can mean

the difference between diffraction-quality crystals and amorphous precipitate. There are

many methods of crystallization. The most common approach for in-house

crystallization is the hanging-drop vapor diffusion set-up with the sparse matrix screening

methodology using commercially available precipitant compounds (McPherson, 1982).

These sample kits contain hundreds of previously successful crystallization

conditions varying salt and polyethylene glycol (PEG) concentrations as well as pH

ranges. This approach relies on an equilibration between a crystallization drop

containing protein solution mixed with precipitant and a more highly concentrated well

solution, causing water molecules to slowly diffuse from the drop to the well forming

crystals. There are many strategies that can be employed when screening crystallization

conditions and it is important to realize that amorphous aggregation does not mean failure

(Bergfors, 2003). Aggregation in these drops, if the protein is not completely denatured,

can be used in an attempt at microseeding by running a cat's hair through it and then

putting these "seeds" into a drop containing lower levels of precipitant (salt, or PEG)

(Fitzgerald and Madson, 1986; Stura and Wilson, 1990).
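
The endpoint of the vapor diffusion equilibration described earlier in this section can be estimated with simple bookkeeping. The sketch below assumes an idealized case that is not spelled out in the text: the drop is made by mixing protein stock with well solution, the reservoir is so large that its precipitant concentration does not change, only water moves between drop and well, and equilibrium is reached when the precipitant concentration in the drop equals that of the well. The volumes and concentrations used are hypothetical.

```python
def vapor_diffusion_endpoint(protein_stock, well_precipitant,
                             v_protein, v_well_added):
    """Idealized start and end states of a hanging drop.

    protein_stock     protein concentration of the stock (e.g. mg/ml)
    well_precipitant  precipitant concentration in the well (any unit)
    v_protein         volume of protein stock in the drop (e.g. microliters)
    v_well_added      volume of well solution added to the drop
    """
    v_start = v_protein + v_well_added
    precipitant_start = well_precipitant * v_well_added / v_start
    protein_start = protein_stock * v_protein / v_start
    # Water leaves until drop precipitant matches the well; solutes stay put.
    v_end = v_start * precipitant_start / well_precipitant
    protein_end = protein_start * v_start / v_end
    return {
        "precipitant_start": precipitant_start,
        "protein_start": protein_start,
        "volume_end": v_end,
        "protein_end": protein_end,
    }


# Hypothetical drop: 1 volume of 10 mg/ml protein plus 1 volume of well solution.
result = vapor_diffusion_endpoint(10.0, 2.0, 1.0, 1.0)
for key, value in result.items():
    print(f"{key:18s} {value:.2f}")
```

In this idealized 1:1 drop, both protein and precipitant start at half their stock concentrations and slowly double as the drop shrinks to half its volume, which is why supersaturation is approached gradually rather than abruptly.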

The idea of this seeding is that typically, nucleation occurs at precipitant

concentrations that are higher than those that are optimal for crystal growth. Since

seeding is not always successful, if a large number of crystallization drops appear to be

precipitating, screening should be further continued using lower concentrations of

protein. These screenings can be attempted either in the cold-room at 4 °C or at room

temperature. The reason for this is that thermal fluctuations can have a large effect on

diffusion rates, equilibration, and sample aggregation and in some cases can be the

difference between successful and unsuccessful screening (Dock-Bregeon et al., 1988).

Alternatively, the batch method (Chayen et al., 1990) of crystallization can be

attempted both in-house and at microgravity (DeLucas et al., 1994). This method does

not rely upon equilibration of the protein solution against the precipitant. Instead, the

drop containing the precipitant solution and the protein is under a thin layer of oil. This

method allows a probing of the nucleation conditions necessary for crystal growth and

can be used with protein/precipitant drops of less than 1 µl.

High-throughput setup conditions can be tested by sending specimen samples to the

Hauptman-Woodward Institute (Buffalo, NY) where the protein is screened by more than

1500 different crystallization conditions using ~600 µl of sample (Luft et al., 2001). The

use of this technique in conjunction with a microgravity environment has been shown to

decrease convection within the crystallization drop, which can prevent the incorporation of

disoriented molecules into the lattice and lead to larger, less mosaic crystals (Chernov et

al., 2003). It is important to remember that crystallization is only one of several possible

outcomes of molecular association. It is also quite likely that nonspecific

aggregation or precipitation of the supersaturated solution will lead to the favorable lower

energy state by a reduction of solute concentration (Weber, 1991).

Crystal nucleation. The distinction between macromolecular crystal nucleation

and growth deserves separate analysis. Only recently has the

nucleation stage of protein crystallization been regarded as important, and as such,

specific information on nucleation kinetics is not abundant (Garcia-Ruiz, 2003).

Although far from a well-understood process, it has long been known that

macromolecular crystals grow from solutions as growth units freely moving toward and

attaching to an open surface. Still, it is difficult to imagine this scenario at the beginning

of a crystallization event (Feher and Kam, 1985). Although it is not very difficult to

imagine the process occurring, the kinetics associated with this nucleation event are very

difficult to study and to describe.

While controversial, it can be said that the consensus opinion is one where small

clusters of molecules are first formed whose cohesion forces are stronger than those

working to dissolve the cluster (Muhlig et al., 2001). At this point, the formation is still

too small to be called a nucleus, as growth units are still known to leave this site of

clustering (Berry, 1995 PhD thesis). The formation of a critical size defined by the ratio

of surface area to volume of the aggregate is a more clearly defined description of a

nucleation event (Feher and Kam, 1985).

The next step in describing the kinetics of nucleation has been studied using a

variety of experimental techniques, including: optical microscopy, atomic force

microscopy, dynamic light scattering, small-angle X-ray and neutron scattering, and

calorimetry (Galkin and Vekilov, 2000; Yau and Vekilov, 2000; Juarez-Martinez et al.,

2002). From these data, two interpretations have been put forth. One hypothesis follows

the classical nucleation pathway whereby formation of the nucleus occurs by the addition

of either monomers or oligomers but as single growth units (Malkin and McPherson,


Another opinion cites the formation of small, polymeric clusters following the

characteristics of diffusion-limited-aggregation processes. These clusters then rearrange

to form ordered tetrameric formations held together mainly through hydrophobic

interactions (Igarashi et al., 1999). This contention highlights the lack of a truly

understood mechanism for the initiation of the crystallization processes. This area is still

highly studied and it is hoped that a better understanding of crystal nucleation will lead to

insights and detection of protein aggregation in human diseases such as sickle-cell

anemia and cataracts (Galkin et al., 2002; Pande et al., 2001).

Crystal growth. A paradox and difficulty in going from nucleation events to

crystal growth is that the concentration of protein necessary for nucleation appears to be

much greater than that which favors crystal growth. Also, when protein crystallization is

compared to that of non-biological molecules, it is noticed that the kinetic coefficients are

much lower (Chernov et al., 2003). Once a nucleation event has been initiated, the

addition of molecules to the cluster will ultimately determine the overall symmetry of the

lattice produced. Crystal faces will grow until "poisoned" by impurities at the growing

surfaces (Hu et al., 2001). The sheer size of many biomacromolecules dictates a slower

overall velocity of crystal growth and makes the orientation of the molecule a much

bigger factor than for small molecule crystallization.

The growth of crystals relies on a combination of the nature of the growing crystal

surface and the diffusion-limited aggregation of the supersaturated solution. A great

paradox of biochemistry is the observation that the greater the solubility of a protein

solution the greater the number of diffusional associations experienced by the molecules

and the easier crystallization becomes. Likewise, the addition of molecules to the rough

features of a crystal face takes less energy than addition to a smooth area where an

additional nucleation is required. Rough surfaces can be further delineated into kinked

and stepped crystal faces, with growth occurring the quickest along kinked planes where

single molecules may add to the lattice (Malkin et al., 1996). Regardless of the growth

face of the crystal considered, the likelihood of imperfect protein molecules and

impurities adding to the surface will increase as polymerization continues. This will

ultimately cause severe lattice defects and stop the growth of the lattice (Sato et al.,

1992). Of course, in a closed system, with conditions adjusted for crystal growth the

lattice would grow indefinitely, without defects, mosaicity or a size limit.

The formation of a protein crystal can be likened to the process of protein folding.

In both cases, the molecules form interactions that must be stronger than those working to

dissociate them while striving to reach the lowest overall energy state. This makes de

novo prediction of conditions to grow biomolecular crystals extremely difficult even for

small proteins grown in multiple crystallization conditions (Durbin and Feher, 1990).

These difficulties continue to make crystallization the rate-limiting step in structure

determination by XRC. This has also dictated that random sets of crystallization

screening conditions appear to be more efficient than those predicted by the use of logical

models. For obvious reasons, this does not suggest that in-depth research on

biocrystallization should be abandoned (McPherson, 1991). An understanding of

macromolecular crystallization would shed light on many biological processes such as

enzymatic reactions and self-assembly (Crane, 1950). Nevertheless, once adequate

crystallization conditions have been elucidated, the data collection, processing, and

structure determination can proceed.

X-ray diffraction analysis of macromolecular crystals. X-ray diffraction

methods are indirect techniques used to derive an initial model from electron density

images computed from experimentally measured amplitudes and derived phases. This

model is then iteratively refined to better fit the diffraction data until the differences

between the calculated and observed data are minimized (Chiu et al., 1997). The atoms

comprising macromolecular crystals are covalently bonded to each other at a length of

1-2 Å (0.1-0.2 nm) and share other interactions (hydrogen bonds and polar bonds) on the

order of 2.5-3.5 Å (Blow, 2002).

In order to "see" these atoms, light of a wavelength of only a few Å must be

used. This is the reason for the use of X-rays in most diffraction experiments. Waves of

this length have both electric and magnetic components and both of these field vectors

will vary sinusoidally as the wave is propagated. Synchrotron sources (Cornell High

Energy Synchrotron Source (CHESS), Advanced Photon Source (APS), Brookhaven

National Laboratory (BNL), etc.) and rotating anode generators, the two most common

X-ray sources for XRC analysis of macromolecular crystals, typically produce X-rays in the

range of 0.5-1.6 Å. The need for specimens arranged periodically in a crystalline array

arises from the need for the amplification of diffraction signal during an experiment.

There are several types of scattering that may occur during XRC analysis.

Coherent scattering is the type of interaction, between radiation and atom, which

contributes to the measured signal from sample scattering (Rossmann and Arnold, 2003).

This occurs when an X-ray photon's oscillating electric field causes an electron in an atomic

orbital within the molecule comprising the crystal to absorb the X-ray's energy and

vibrate. This electron will then emit an X-ray, of the same wavelength and energy, in a

random direction. This scattered wave is then collected on an X-ray sensitive detector

positioned behind the sample. In a quantum mechanical sense, this process can be

described as the direct elastic scattering of a photon by an electron.

The other major type of scattering that will take place during a diffraction

experiment is incoherent scattering. Typically, this occurs when the X-ray photon,

instead of being absorbed, causes transitions within the atom leading to the emission of

X-rays of lower energy. This is the main cause of radiation damage to biological crystals

by causing ionization of side chains and an overall decrease in the intensity of scattering

(Sliz et al., 2003). Surprisingly, a very small percentage of X-ray radiation is actually

scattered coherently making data collection an inefficient but tractable procedure.

The diffraction limit of most biomolecular crystals is on the order of 1-2 Å,

although there are exceptions (Duda et al., 2000; Teeter et al., 1993; Rypniewski et al.,

2001). Crystals of inorganic molecules will typically reach resolutions much higher than

their biomolecular counterparts, allowing the building of extremely accurate atomic

models. The interaction energy between macromolecules and small molecules is on the

same absolute scale: about ten hydrogen bonds, four salt bridges, or 400 Å² hydrophobic

buried surface (Haas and Drenth, 1995). The difference lies in the sheer size difference

of proteins and macromolecular complexes, making them more sensitive to distorting

forces. Also, because acquisition of highly-purified sample is much more difficult with

biomolecules, the incorporation of impurities within the crystal lattice is far more

probable than for small molecules (Caylor et al., 1999). The orientations of the

macromolecules within a crystal may also be slightly different because interaction

energies will not change greatly between biomolecules (Hwa and Frey, 1991). The

dynamic properties of amino acids will also lead to flexible regions of the molecule

adversely affecting the resolution limit of the diffraction pattern (Reutzel et al., 2004).

An analysis of the crystal lattice reveals the presence of a repeating unit, known as

the unit cell, consisting of the basis vectors a, b and c and angles α, β, and γ (Haas and

Drenth, 1995). In an ideal crystal, the contents of every unit cell are identical in three

dimensional space, have the smallest possible volume, and form equivalent contacts

throughout the lattice (Blow, 2002). The symmetry operations that exist within the crystal

lattice constructed by the macromolecule or complex of interest can be defined as an

operation, which when applied, results in an identical structure to the original and can

consist of both rotational and translational components. This is referred to as the Laue

symmetry of the crystal and this symmetry is reflected in the intensities of the diffraction

pattern. In small molecule crystallography, both inversion and mirror symmetry elements

can also be a part of the unit cell transformations in lattice construction. Due to the

chirality of amino acids, proteins, and sugars, biomolecular crystals lack these symmetry

elements. The lattice rotational operators are angles of 60°, 90°, 120°, 180°, and 360°

corresponding to 6, 4, 3, 2, and 1-fold (identity) rotational axes of symmetry. Crystal

symmetry can also contain screw axes, a combination of a rotation and translation within

the unit cell. Combinations of these different symmetry operations give the crystal its

space group, consisting of rotational only or rotational and translational components.

Mathematical theory has proven that 230 different combinations of symmetry elements

can occur. Biological macromolecules can crystallize in one of 65 enantiomorphic space

groups (Hahn, 1995). Within the crystal there may also exist a rotational symmetry

element not specified by the crystallographic operators. This is defined as non-

crystallographic symmetry (NCS) and is exemplified in the crystal structures of spherical

virus particles. These operators can be used for averaging of data points once an initial

structural model has been built, and may aid in refinement of large biological complexes.
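
The restriction of lattice rotations to 6-, 4-, 3-, 2-, and 1-fold axes can be checked numerically. The sketch below relies on a standard result that is not derived in the text and is introduced here as an assumption: a rotation that maps a lattice onto itself has an integer matrix in the lattice basis, so its trace, 1 + 2cos(2π/n), must be an integer. The function name and the upper limit of 12 are illustrative choices only.

```python
import math

def lattice_compatible_folds(max_fold=12, tol=1e-9):
    """Return the n-fold rotations (n <= max_fold) that can map a lattice onto itself.

    Assumption (standard result, not stated in the text): in a lattice basis a
    rotation matrix has integer entries, so its trace, 1 + 2*cos(2*pi/n),
    must be an integer.
    """
    allowed = []
    for n in range(1, max_fold + 1):
        trace = 1.0 + 2.0 * math.cos(2.0 * math.pi / n)
        if abs(trace - round(trace)) < tol:
            allowed.append(n)
    return allowed

folds = lattice_compatible_folds()
angles = [360 // n for n in folds]  # rotation angle in degrees for each allowed axis
for n, angle in zip(folds, angles):
    print(f"{n}-fold axis: rotation by {angle} degrees")
```

Five-fold and higher-than-six-fold axes fail the integer-trace test, which is why such symmetry can appear in a crystal only as non-crystallographic symmetry.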

Crystal Twinning

As the speed and scope of macromolecular structure determination by XRC

increases, the need for high-quality, suitable crystals has become even more important

(Chandra et al., 1999). There are many complications that can arise even when crystals

have been successfully grown. These factors, such as high mosaic spread and lack of

large crystal size, pale in comparison to the phenomenon of crystal twinning as an

obstacle in structure determination. There are several types and causes of

macromolecular crystal twinning. The hallmark of twinning is the detection of two or

more crystalline domains, each at different orientations, in the same crystal specimen.

This will cause their diffraction lattices to overlap either completely or partially (Redinbo

and Yeates, 1993; Yeates and Fam, 1999). The operation that will align the crystal axes

of each domain when applied, is called the twinning operator, and its solution is essential

to detwinning the data set.

If it is only possible to align two crystal axes, the twinning is referred to as epitaxial

or non-merohedral. In this case, two interpenetrating lattices can be observed within the

diffraction pattern and it is usually possible to integrate only one's unique data (Liang et

al., 1996). The case where the diffraction spots, and therefore the lattices, completely

overlap is particularly difficult to diagnose because the defect is not obvious from the

diffraction pattern. This type of twinning is known as merohedral twinning. This can be

further divided into two classes. Class I, where the twin domains each have a different

volume, and thus contribute unevenly to the diffraction pattern, is referred to as partial

merohedral twinning. In class II merohedral twinning each twin domain has very close to

the same volume and each domain will contribute the same amount to the diffraction

spots. This is known as "perfect" twinning and will cause the twin lattice symmetry to be

greater than the true Laue symmetry. The twin operator, in this case, has added an extra

relationship in the lattice symmetry that is not part of the Laue symmetry of the single

crystal (Yeates, 1997).

The most important aspect of twinning is in the detection. Without this crucial

piece of information, it is not uncommon for a dataset to remain unsolved for years and

some experts contend that a large percentage of failed structure solutions can be

attributed to undetected merohedral twinning (Yeates, 1997). Even with the plethora of

detection methods that now exist, the best approach is to find an untwinned crystal if at

all possible. The methods of twin detection rely largely on the intensity statistics of the

collected data sets, although the first procedure described here does not (Parsons, 2003).

The first, and easiest detection method is to look at the packing density within the

asymmetric unit. If the volume of the asymmetric unit is too small to accommodate the molecule,

there is a good chance that the crystal is perfectly merohedrally twinned and the space

group has been assigned incorrectly (Chandra et al., 1999). When analyzing the intensity

statistics of an untwinned crystal, the centric and acentric intensities and amplitudes

follow a predictable pattern, which breaks down in the case of merohedral twinning

(Wilson, 1949). In the twinned case, the value of each measured intensity is:

I_{obs}(h_1) = (1 - \alpha) I(h_1) + \alpha I(h_2)    (2-1)

I_{obs}(h_2) = \alpha I(h_1) + (1 - \alpha) I(h_2)    (2-2)

where I_obs denotes the observed intensity of a measured reflection, α is the twin fraction

of the crystal, and h_1 and h_2 are the twin-related reflections contributed by each twin domain to the observed

measurement. Similar techniques, such as the Britton plot (Britton, 1972), can be used to

derive the value of α and once this has been achieved, structure refinement can be carried

out in the normal manner.
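
Once the twin fraction is known, Equations 2-1 and 2-2 can be inverted algebraically to recover the true intensities of a pair of twin-related reflections. The following is a minimal sketch of that inversion, not the procedure of any particular program; the function names and the example intensities are hypothetical. The inversion divides by (1 - 2α), so it breaks down as the twin fraction approaches 0.5 (perfect twinning).

```python
def twin_intensities(i1, i2, alpha):
    """Apply Equations 2-1 and 2-2: mix two true intensities with twin fraction alpha."""
    obs1 = (1.0 - alpha) * i1 + alpha * i2
    obs2 = alpha * i1 + (1.0 - alpha) * i2
    return obs1, obs2

def detwin_intensities(obs1, obs2, alpha, min_denominator=1e-6):
    """Invert Equations 2-1 and 2-2 to recover the true intensities.

    Raises ValueError near alpha = 0.5, where the two equations become
    identical and the true intensities cannot be separated.
    """
    denominator = 1.0 - 2.0 * alpha
    if abs(denominator) < min_denominator:
        raise ValueError("twin fraction too close to 0.5 to detwin")
    i1 = ((1.0 - alpha) * obs1 - alpha * obs2) / denominator
    i2 = ((1.0 - alpha) * obs2 - alpha * obs1) / denominator
    return i1, i2

# Hypothetical twin-related pair with true intensities 100 and 40, twin fraction 0.3
obs1, obs2 = twin_intensities(100.0, 40.0, 0.3)
rec1, rec2 = detwin_intensities(obs1, obs2, 0.3)
print(obs1, obs2, rec1, rec2)
```

Because measurement errors are amplified by the factor 1/(1 - 2α), detwinned intensities become increasingly unreliable as the twin fraction grows.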

Diffraction Data Collection and Data Processing

Diffraction from a macromolecular crystal is best thought of as coherently scattered

waves from thousands of planes that dissect each unit cell in an equivalent manner. It is

obvious that the unit cell of a macromolecular crystal consists of many electrons, and that

each plane of the crystal contains information about all the electrons comprising the unit

cell. It is important to realize that since crystals are three-dimensional their diffraction

patterns are also. Since the detection of scattered X-rays takes place on a two-

dimensional image plate or CCD, it is not hard to understand that many images must be

recorded, with the crystal oscillated slightly for each, in order to fully sample diffraction

space (Chiu et al., 1997). Also, because of the lattice function of the crystal, the

diffraction pattern is discontinuous. Millions of molecules of the protein or complex

contribute to the coherent scattering and the spacing between the points sampled in the

pattern will be inversely related to the spacing between lattice points in the crystal (Chiu

et al., 1997). Consequently, analysis of the spacing between intensity peaks reveals

information about the overall space group symmetry and unit cell dimensions.

The diffraction pattern of the crystal represents reciprocal space and indices (h, k, l)

are assigned to each spot corresponding to the plane within the crystal from which they

were scattered. The number of electrons effectively contributing to the scattering is

known as the structure factor (F) because it is dependent upon the atomic distribution in

the unit cell and scattering direction (Rossmann and Arnold, 2003). This sum will also

take into account the phase differences that exist between the scattered waves. For n

atoms within the unit cell,
F(S) = \sum_{j=1}^{n} f_j \cos(2\pi r_j \cdot S) + i \sum_{j=1}^{n} f_j \sin(2\pi r_j \cdot S)    (2-3)

where f_j is the atomic scattering factor of atom j, r_j is the position vector of atom j, and S is a vector perpendicular to the plane

reflecting the incident beam at an angle θ, and S=S(hkl) (Drenth, 1999). Atomic

scattering factors are proportional to the number of electrons within the atom and are

expressed in terms of the scattering by a single electron. This value decreases with

increasing scattering angle (sin θ/λ, resolution). This scattering can be thought of as a

constructive interference pattern of the X-rays from planes within the crystal. The nature

of this interference was first described by Sir W.H. Bragg and his son Sir W.L. Bragg in

1913 and earned them the Nobel Prize in physics in 1915. The diffraction conditions,

called Bragg's Law, are given by the simple equation:

n\lambda = 2d \sin\theta    (2-4)

where n is any integer, λ is the wavelength of the incident X-ray, d is the distance between

scattering planes, and θ is the scattering angle. A diagrammatic description of Bragg's

formalism is shown in Figure 2-3. Constructive scattering will therefore occur only when

the path difference between waves reflected from adjacent planes, 2d sin θ, equals a whole

number of wavelengths. When the conditions of Bragg's law are

fulfilled, all unit cells will scatter in phase and the amplitude of the wave scattered by the

crystal will be proportional to the structure factor amplitude F. As stated earlier, every

point in the diffraction pattern represents the amplitude of a scattered wave from a set of

planes that intersects each unit cell within the crystal equivalently. Likewise, information

for every atom within the cell is present in the measured spot intensity.
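
The structure factor sum in Equation 2-3 can be illustrated with a short calculation. The sketch below makes two simplifying assumptions that are not part of the text: atomic positions are given as fractional coordinates, so that the dot product of the position vector with S(hkl) reduces to hx + ky + lz, and each scattering factor is treated as a constant rather than falling off with scattering angle. The two-atom body-centered arrangement is a hypothetical example.

```python
import math

def structure_factor(atoms, hkl):
    """Evaluate F(hkl) as a sum of cosine (real) and sine (imaginary) terms.

    atoms: list of (f, (x, y, z)) with f a constant scattering factor
           (an assumption) and x, y, z fractional coordinates.
    """
    h, k, l = hkl
    total = 0j
    for f, (x, y, z) in atoms:
        phase = 2.0 * math.pi * (h * x + k * y + l * z)
        total += f * complex(math.cos(phase), math.sin(phase))
    return total

# Hypothetical cell: two identical atoms (f = 6) at the origin and the body center
atoms = [(6.0, (0.0, 0.0, 0.0)), (6.0, (0.5, 0.5, 0.5))]
f_100 = structure_factor(atoms, (1, 0, 0))  # waves from the two atoms cancel
f_110 = structure_factor(atoms, (1, 1, 0))  # waves from the two atoms add in phase
print(abs(f_100), abs(f_110))
```

The example shows how every atom in the cell contributes to every reflection: moving either atom changes the amplitude and phase of each F(hkl).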

Despite being able to measure the intensity of each scattered wave from the crystal

to very high accuracy, there is still a crucially important piece of information missing. In

order to describe a wave function fully, both the amplitude and the phase (α) must be

used in calculations. The information collected during a scattering experiment represents

the intensity (I) of the scattered wave.

The amplitude can be indirectly calculated from this, and it is found that |F| ∝ √I. The

phase information is lost because unlike an electron microscope, where diffracted

electrons can be refocused to recreate an image of the specimen, it is not possible to

adequately refocus the scattered X-rays (Chiu et al., 1997). This missing phase

information must be found either by structure determination by molecular replacement or

by the heavy-atom techniques isomorphous replacement (SIR, MIR) and anomalous

dispersion (SAD, MAD) or a combination of the two (SIRAS, MIRAS).

[Figure 2-3 diagram: parallel reflecting lattice planes separated by a distance d; path difference 2d sin θ.]

Figure 2-3. Bragg's Formalism for Constructive Interference From Reflecting Planes in
a Crystal Lattice. The Bragg conditions of diffraction state that an X-ray
will be constructively reflected when nλ = 2d sin θ.
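
Bragg's law (Equation 2-4) can be rearranged to give either the plane spacing probed at a given angle or the angle at which a given spacing diffracts. The sketch below is a minimal illustration; the function names are hypothetical, and the wavelength of 1.5418 Å (copper K-alpha, as produced by a typical rotating anode) is an assumed example value rather than one taken from the text.

```python
import math

def bragg_spacing(wavelength, theta_deg, order=1):
    """Plane spacing d (same units as wavelength) from n*lambda = 2*d*sin(theta)."""
    return order * wavelength / (2.0 * math.sin(math.radians(theta_deg)))

def bragg_angle(wavelength, spacing, order=1):
    """Angle theta in degrees at which planes of the given spacing diffract."""
    sine = order * wavelength / (2.0 * spacing)
    if sine > 1.0:
        # No solution: spacings below lambda/2 cannot satisfy Bragg's law
        raise ValueError("spacing is smaller than half the wavelength")
    return math.degrees(math.asin(sine))

wavelength = 1.5418  # Angstroms; assumed copper K-alpha example
theta = bragg_angle(wavelength, 2.0)        # angle for planes 2.0 Angstroms apart
spacing = bragg_spacing(wavelength, theta)  # round trip back to the spacing
print(theta, spacing)
```

The ValueError branch reflects the physical limit that the smallest spacing observable at a given wavelength is λ/2, reached at θ = 90°.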

From a practical standpoint, the best orientation of the crystal with regards to the

observed diffraction pattern is first found by collecting images at 0°, 90°, 180°, and 270°

rotations. This aids the quality of data collected and should be a normal procedure to

make sure that apparent low-quality diffraction is not due merely to poor crystal

screening. Depending on the initial measurement of mosaic spread, an oscillation angle

greater than the mosaicity should be used to optimize the number of full reflections

recorded. Alternatively, thin slicing of the data (oscillation angles of 0.1-0.5°) can be

used to collect more accurate partial reflections and increase the separation between

diffraction spots (Pflugrath et al., 1999).

Mosaic spread is a measurement of how well the unit cells within a crystal have

packed together to form the crystal morphology. In macromolecular crystals, lattice

packing is usually not perfect leading to a measurable value for mosaic spread. The

coherence of the X-ray beam also plays a role in the degree of mosaic spread of a

biological crystal, and images collected at synchrotron sources will have a lower

mosaicity because of this (Rossmann and Arnold, 2003).

Regardless of the type of crystal being tested, the space group symmetry of the

lattice dictates the amount of oscillation data that should be collected. According to

Friedel's law, 180° of data is the most that will theoretically have to be collected if no

other symmetry exists within the crystal system. This is because, assuming the

contribution from anomalous scattering is negligible, |F(hkl)| = |F(-h-k-l)|. Higher

symmetry space groups allow less data to be collected because reciprocal space can be

adequately sampled with a smaller number of measurements. This is simply due to the

occurrence of more equivalent reflections due to lattice symmetry.
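
Friedel's law and its breakdown under anomalous scattering can be demonstrated with a small numerical sketch. The assumptions here are illustrative and not taken from the text: atoms are given in fractional coordinates, scattering factors are constants, and an anomalous scatterer is modeled by giving its scattering factor an imaginary component. All coordinates and scattering factor values are hypothetical.

```python
import cmath
import math

def structure_factor(atoms, hkl):
    """F(hkl) for atoms given as (scattering factor, fractional (x, y, z))."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for f, (x, y, z) in atoms)

def friedel_difference(atoms, hkl):
    """Return |F(h k l)| - |F(-h -k -l)| for one reflection."""
    h, k, l = hkl
    plus = abs(structure_factor(atoms, (h, k, l)))
    minus = abs(structure_factor(atoms, (-h, -k, -l)))
    return plus - minus

# Hypothetical three-atom cell with purely real scattering factors
light_only = [(6.0, (0.10, 0.20, 0.30)),
              (7.0, (0.40, 0.15, 0.65)),
              (8.0, (0.70, 0.55, 0.25))]
# Same cell with the third atom replaced by an anomalous scatterer
# (the imaginary component 4.0 is an assumed, illustrative value)
with_anomalous = light_only[:2] + [(30.0 + 4.0j, (0.70, 0.55, 0.25))]

print(friedel_difference(light_only, (1, 2, 3)))      # zero within rounding error
print(friedel_difference(with_anomalous, (1, 2, 3)))  # non-zero
```

With real scattering factors, F(-h-k-l) is the complex conjugate of F(hkl), so the amplitudes match and 180° of data suffices. The imaginary component breaks this relationship.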

Once enough data has been collected to adequately sample reciprocal space (95-

100% of possible reflections measured), the initial diffraction images can be used for

space group determination, followed by processing of the complete set of diffraction

images, reduction of data to only unique reflections and scaling of the data for intensity

normalization and calculation of Rsym. The Rsym value measures how well the collected

data fits the imposed space group symmetry elements. The collection of high quality data

is of utmost importance for structure determination as well as refinement of the

macromolecular model. The calculation of the Rfactor and Rfree values (two determinants

of crystal structure quality) will all be dependent upon the initial measurements of the
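
The agreement between symmetry-related measurements can be sketched with the commonly used definition of Rsym: the summed absolute deviation of each measurement from the mean of its symmetry-equivalent group, divided by the summed intensity. This definition is assumed here because the text does not give a formula, and the reflection indices and intensities below are hypothetical.

```python
def r_sym(groups):
    """Compute Rsym from groups of symmetry-equivalent intensity measurements.

    groups: dict mapping a unique (h, k, l) index to a list of intensities
            measured for that reflection and its symmetry mates.
    Reflections measured only once carry no agreement information and are skipped.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:
            continue
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(value - mean) for value in intensities)
        denominator += sum(intensities)
    if denominator == 0.0:
        raise ValueError("no reflections with multiple measurements")
    return numerator / denominator

# Hypothetical merged data: three measurements of one reflection, two of another
measurements = {(1, 0, 0): [100.0, 110.0, 90.0],
                (0, 1, 0): [50.0, 50.0],
                (0, 0, 1): [75.0]}
print(r_sym(measurements))
```

A value that rises sharply when a higher-symmetry space group is imposed suggests that the assumed symmetry elements are not present in the data.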



A noted effect of exposure of biomolecular crystals to X-rays is the production of

free radicals and ionization of amino acid side chains that ultimately leads to degradation

of diffraction due to radiation damage. There are many consequences to this, the most

common being an increase in unit cell dimensions and mosaic spread of the measured

reflections. These effects can be overcome by cooling macromolecular crystals to liquid

N2 or He temperatures, which has now become a common procedure (Parkin and Hope,

1998). In fact, synchrotron data collection at the highest intensity beamlines would not

be possible without the use of a cryojet.

Before attempting cryogenic data collection, a suitable cryoprotectant solution must

be found. Typically this involves the addition of up to 30% glycerol to the mother liquor

of the crystal in question. Other methods involve the use of polyethylene glycol (PEG)

additives and in some cases very high concentrations of salts (1-2 M) (Petsko, 1975;

Hope, 1988). The reason for taking these measures is to protect the crystal against ice

formation and radiation damage while preserving the highest resolution diffraction.

Cryoprotectants prevent the formation of crystalline ice on the specimen that will result

in large diffraction rings and crystal damage (Garman and Mitchell, 1996). Since its

inception in 1985, cryocrystallography has become an integral part of data collection and

will continue to be so with the use of newer more intense synchrotron radiation data

collection facilities.

Derivatization of Macromolecular Crystals for Heavy Atom Structure

For proteins that share no sequence homology with any known folds, crystals must

be derivatized with atoms having a larger number of electrons than the normal carbon,

hydrogen, nitrogen or oxygen making up biomolecules. The need for incorporating

heavy atoms into the crystal lattice is to use their atomic positions for phase information.

This is possible because heavy atoms have much higher atomic weights than the atoms

typically comprising a normal protein (C, N, O, H, S) making them easy to locate by

Patterson search methods.

These crystals should be soaked in ~1-10 mM heavy atom solutions containing

salts of metals such as gold, silver, mercury, platinum, samarium, uranium etc. or ~1 M

solutions of bromide or iodide salts (Green et al., 1954; Dauter et al., 1999). Metals such

as mercury (Hg) can be used because they are known to react with free sulfhydryl groups

present at non-disulfide bonded cysteine residues (Petsko et al., 1978). The reason for

the soaking of a heavy atom into the crystal lattice is for phasing by isomorphous or

anomalous dispersion techniques (Green et al., 1954; Hendrickson et al., 1991). If there

is no homology model for molecular replacement phasing (Rossmann, 1990) the location

of heavy atoms within the lattice can be used for initial protein phases.

For crystals containing heavy atoms, great care must be taken to collect data of

high redundancy (>20) and completeness (>99%). This is because these techniques require

highly accurate data and as much sampling of reciprocal space as possible. Molecular

biological techniques such as engineering proteins into selenomethionine derivatives for

use in anomalous dispersion experiments have quickly become the preferred method of

novel fold structure determination (Hendrickson et al., 1991). If an anomalous dispersion

experiment is to be attempted, a synchrotron beamline with a fluorescence detector and

adjustable wavelength X-ray source needs to be used for tuning to the absorption edge of

the incorporated metals. The anomalous signal is created because, unlike normal atomic

scattering, the energy and phase of the partially absorbed X-ray photon now has an

additional imaginary component. It is this small difference in scattering (5-8%) that is

used for structure determination by this technique. In general, an exposure time that

allows high-resolution data collection and the fewest overloaded pixels will be used.

For anomalous dispersion, the inverse-beam protocol should be used to measure

Friedel mates as close together in the data collection as possible. Friedel mates are those

reflections 180° from each other (h k l, -h -k -l) and it is the intensity difference between

these reflections that is used to locate the heavy atoms in anomalous dispersion


Structure Determination by Isomorphous Replacement, Anomalous Dispersion and
Molecular Replacement

Novel structure determination by isomorphous replacement was first utilized by

Perutz's group in 1954 (Green et al., 1954). It exploits the fact that the scattering factors

of metals such as gold, silver, and mercury are much larger than those of the atoms

making up the protein crystal (C, N, O, and S). This is because the number of electrons

contributing to the diffraction is much larger for metal atoms. Utilizing the difference in

scattering ability, the positions of the heavy atoms within the crystal lattice can be found

with only the intensity data using Patterson search techniques (Rossmann and Arnold,


For Patterson techniques, it is assumed that the difference in the structure factors of

a native protein with an incorporated heavy atom and those of the native protein are

caused by the heavy atom derivatization alone. In other words, the unit cells of the native

protein and derivative crystals are identical (isomorphous) and there is no difference in

the folding or relative orientation of the protein molecule between data sets. This allows

the generalization that F_PH - F_H = F_P. So, by calculating the difference of the structure

factors for reflections with large changes between the heavy atom derivative and the

native protein, the locations of heavy atoms can be elucidated. This is typically done by

calculating Patterson maps, which give vector lengths that correspond to the distances

between atoms within the unit cell.
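
The content of a Patterson map can be sketched by listing every interatomic vector in a small model cell together with a peak weight. Two simplifications are assumed here and are not taken from the text: the atomic number Z stands in for the scattering factor, and peaks are treated as points rather than as overlapping density. The atom labels and coordinates are hypothetical.

```python
from itertools import permutations

def patterson_peaks(atoms):
    """List interatomic vectors with weights, strongest first.

    atoms: list of (label, Z, (x, y, z)) with fractional coordinates.
    Each ordered pair gives one vector (tail to head), wrapped into the
    unit cell, weighted by the product of the two atomic numbers.
    """
    peaks = []
    for (label_a, z_a, r_a), (label_b, z_b, r_b) in permutations(atoms, 2):
        vector = tuple(round((b - a) % 1.0, 6) for a, b in zip(r_a, r_b))
        peaks.append((z_a * z_b, vector, f"{label_a}->{label_b}"))
    return sorted(peaks, key=lambda peak: peak[0], reverse=True)

# Hypothetical cell: two mercury sites among two light atoms
atoms = [("Hg1", 80, (0.10, 0.20, 0.30)),
         ("Hg2", 80, (0.40, 0.60, 0.80)),
         ("C", 6, (0.25, 0.25, 0.25)),
         ("N", 7, (0.75, 0.10, 0.50))]
peaks = patterson_peaks(atoms)
for weight, vector, pair in peaks[:4]:
    print(weight, vector, pair)
```

The heavy atom to heavy atom vectors dominate the list by a wide margin, which is why heavy-atom positions can be read from a Patterson map even when the many light-atom vectors cannot be resolved.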

The peaks at these vector lengths are the product of the scattering factors for each

atom lying at the head and tail of the vector. It is obvious that vectors corresponding to

distances between heavy atoms will yield much higher peaks than those corresponding to

vectors between the lighter atoms within the protein. Once a Patterson map is solved for

the location of the heavy atom, single isomorphous replacement (SIR) phases can be

calculated (Rossmann, 1960) using Fourier syntheses. These SIR phases are not enough to

solve the phase ambiguity, however, since the phase distribution is bi-modal. These

phases can be used as coefficients in difference Fourier syntheses with amplitude data

from a second derivative to locate its heavy-atom sites (Blow and Rossmann, 1961). These new

phases are then used to break the phase ambiguity and calculate multiple isomorphous

replacement phases (MIR) that can be further improved with more derivatives to finally

yield an interpretable electron density map.
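
The bimodal nature of SIR phases follows from the vector relationship between the native, derivative, and heavy-atom structure factors. Treating them as complex numbers with F_PH = F_P + F_H, the law of cosines gives two equally valid solutions for the native phase. The sketch below illustrates this under the assumption of perfect isomorphism and error-free amplitudes; the function name and all numerical values are hypothetical.

```python
import cmath
import math

def sir_phase_candidates(amp_p, amp_ph, amp_h, phase_h):
    """Return the two native-protein phases (radians) consistent with SIR data.

    amp_p, amp_ph: measured native and derivative amplitudes.
    amp_h, phase_h: heavy-atom amplitude and phase, calculated from the
                    heavy-atom positions.
    From |F_PH|^2 = |F_P|^2 + |F_H|^2 + 2|F_P||F_H|cos(phase_p - phase_h).
    """
    cosine = (amp_ph ** 2 - amp_p ** 2 - amp_h ** 2) / (2.0 * amp_p * amp_h)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding just outside [-1, 1]
    delta = math.acos(cosine)
    return phase_h + delta, phase_h - delta

# Hypothetical reflection: construct a derivative from known native and heavy-atom terms
true_phase_p = 0.7
f_p = cmath.rect(10.0, true_phase_p)
f_h = cmath.rect(3.0, 2.0)
f_ph = f_p + f_h

candidates = sir_phase_candidates(abs(f_p), abs(f_ph), abs(f_h), 2.0)
print(candidates)  # one candidate equals the true phase, the other is its mirror about phase_h
```

A second derivative with a different heavy-atom phase yields another pair of candidates, and the value common to both pairs resolves the ambiguity.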

Anomalous dispersion techniques such as MAD (multiple wavelength anomalous

dispersion) and SAD (single wavelength anomalous dispersion) can be attempted in

much the same fashion as MIR (Hendrickson et al., 1991). The MAD technique relies on

the difference in the scattering factor (f) of a particular heavy atom when the wavelength

of X-ray is changed. The difference between isomorphous replacement and anomalous

dispersion lies in the fact that only a single crystal is being used in anomalous dispersion

and the differences input into Patterson maps are those caused by differences in scattering

of a heavy atom around its absorption edge. This difference in intensity arises

from the absorption of energy by the anomalously scattering atom, which corresponds

to a retardation in phase of Friedel reflections (Bijvoet, 1954). The difference in the

structure factors of these reflections is the anomalous contribution of the heavy atom to

the scattering factor. This allows the calculation of an anomalous difference Patterson

map using the coefficients |F(h k l)| - |F(-h -k -l)| to give the interatomic distances of anomalously

scattering atoms.
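
The origin of the anomalous (Bijvoet) difference can be shown with a small structure-factor calculation. The sketch below is illustrative only: the atomic positions, the scattering factors and the helper names (structure_factor, bijvoet_difference) are hypothetical, and the imaginary term added to the third atom simply stands in for the anomalous component of a heavy atom near its absorption edge.

```python
import numpy as np

def structure_factor(hkl, positions, factors):
    """F(hkl) = sum_j f_j * exp(2*pi*i * h.x_j), with fractional coordinates x_j."""
    phase_angles = 2.0 * np.pi * (positions @ np.asarray(hkl, dtype=float))
    return np.sum(np.asarray(factors) * np.exp(1j * phase_angles))

positions = np.array([[0.10, 0.20, 0.30],    # light atom
                      [0.40, 0.15, 0.70],    # light atom
                      [0.25, 0.60, 0.05]])   # heavy (anomalous) atom

normal = [6.0, 7.0, 30.0]                    # purely real scattering factors
anomalous = [6.0, 7.0, 30.0 - 2.0 + 4.0j]    # heavy atom with real and imaginary corrections

hkl = (1, 2, 3)
minus_hkl = (-1, -2, -3)

def bijvoet_difference(factors):
    """|F(hkl)| - |F(-h-k-l)| for the toy structure above."""
    f_plus = structure_factor(hkl, positions, factors)
    f_minus = structure_factor(minus_hkl, positions, factors)
    return abs(f_plus) - abs(f_minus)

print(bijvoet_difference(normal))      # zero within rounding: Friedel's law holds
print(bijvoet_difference(anomalous))   # non-zero: the anomalous signal
```

Only the atom carrying the imaginary term contributes to the difference, which is why a Patterson map computed from such differences reports vectors between anomalous scatterers.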

There are many programs designed to solve Patterson functions for the location of

heavy atoms. These algorithms differ slightly, and it is difficult to predict

which will work best for a given case. The normalization of the native and

derivative data, to put intensity measurements on a consistent scale, is extremely

important in preparing data for Patterson maps. The program FHSCAL and the scaling

utility from the Crystallography and NMR System (CNS) both perform

these data normalization calculations and should be used before Patterson maps are

calculated (Brunger et al., 1998; Kraut et al., 1962; Wilson, 1949).
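
The idea of placing two data sets on a common scale can be sketched with a single least-squares scale factor. This is a deliberately simplified illustration and is not the procedure implemented in FHSCAL or CNS, which also handle resolution-dependent effects. The function name least_squares_scale and the amplitudes are hypothetical.

```python
import numpy as np

def least_squares_scale(f_native, f_derivative):
    """Scale k that minimizes sum((F_native - k * F_derivative)**2)."""
    f_native = np.asarray(f_native, dtype=float)
    f_derivative = np.asarray(f_derivative, dtype=float)
    return np.sum(f_native * f_derivative) / np.sum(f_derivative ** 2)

# Hypothetical amplitudes: the derivative was recorded on a weaker arbitrary scale.
f_native = np.array([120.0, 85.0, 40.0, 230.0])
f_derivative = f_native / 1.7

k = least_squares_scale(f_native, f_derivative)
f_derivative_scaled = k * f_derivative
print(k)
```

In a real derivative the scaled amplitudes would still differ from the native ones, and it is those residual differences that feed the difference Patterson map.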

The programs CNS, CCP4 (Collaborative Computational Project, Number 4, 1994),

SHELXS, SnB and/or SOLVE have been developed for automated Patterson map

interpretation and heavy-atom location to calculate isomorphous and anomalous

difference Patterson maps (Schneider and Sheldrick, 2002; Miller et al., 1994; Brunger et

al., 1998; Terwilliger and Berendzen, 1999). Once an interpretable electron density map

has been calculated, manual model building can be conducted in graphics programs such

as O, COOT or Xtalview (Jones, 1978; McRee, 1999; Emsley and Cowtan, 2004).

Molecular replacement (MR) exploits the similarity of the tertiary fold of many

proteins and macromolecular complexes for structure determination. A clue to the degree

of structural homology that exists between two protein molecules is a pair-wise BLAST

search. Generally, if two proteins share ~25% sequence identity their three-dimensional

structures will be superimposable to within a few Angstrom units. The proof of this

principle lies in the early roots of protein crystallographic analysis. Sperm whale

myoglobin shares a large degree of identity with horse deoxyhemoglobin and, likewise,

with human hemoglobin. In hindsight, utilization of this information would have allowed

the structure determination of both hemoglobin forms using myoglobin in a

straightforward and less tedious manner (Kendrew et al., 1958; Perutz, 1968).

Structure determination using MR consists of two separate operations, a rotation

search and a translation operation. Both operations rely upon the Patterson function to

derive a structure solution. The Patterson function records the interatomic distances

within a molecule and is not dependent upon any choice of origin. It is this property of

the Patterson function that allows its use in the rotational search procedure (Rossmann,

1990). Essentially, the Patterson functions of both a molecule of unknown and known

structure are compared and the best fit of intramolecular vectors is found by

superimposing the vectors from the functions of both molecules while simultaneously

keeping the origin fixed. This will allow the orientation of the molecule of unknown

structure to be found.
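
The origin independence of the Patterson function, which is what makes the rotation search possible, can be checked on a one-dimensional toy density. The sketch below assumes a periodic grid of point "atoms"; the function name patterson and the atom weights are hypothetical. It does not implement a rotation function.

```python
import numpy as np

def patterson(density):
    """Autocorrelation of a periodic density: inverse FFT of |F|^2 (phases discarded)."""
    amplitudes_sq = np.abs(np.fft.fftn(density)) ** 2
    return np.fft.ifftn(amplitudes_sq).real

grid = np.zeros(64)
grid[5], grid[12], grid[30] = 6.0, 8.0, 20.0   # three "atoms" weighted by electron count

p_original = patterson(grid)
p_shifted = patterson(np.roll(grid, 17))       # same molecule, different origin

print(np.allclose(p_original, p_shifted))      # the map does not depend on the origin
print(p_original[18], p_original[7])           # peak heights scale with the product of weights
```

The peak at a separation of 18 grid points (weights 8 and 20) is much stronger than the one at 7 grid points (weights 6 and 8), which mirrors the statement that vectors between heavy atoms dominate a Patterson map.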

Once the orientation of the new molecule has been determined, the origin of the

molecule must be found. This can be achieved by placing the origin of the molecule of

known structure in the determined orientation at points within the unit cell and using the

calculated Patterson function, comparing it to the Patterson function of the unknown

molecule at these intervals (Blow, 2002; Rossmann, 1990). Consideration of the unit cell

packing must be taken into account in this step. It is important to understand that since

both the rotation and translation functions utilize Patterson search techniques, there is no

phase information utilized in this structure determination procedure. There are many

programs with slightly different algorithms that have been developed to implement this

process, such as CNS and CCP4, and MR has become the fastest and most direct method of

macromolecular structure determination (Brunger et al., 1998; Navaza, 1994).

Model Refinement, Validation and Interpretation

Once an initial model has been constructed, iterative refinement can be carried out

either with the CNS package using rigid-body, simulated annealing, B-factor refinement

and conjugate-gradient energy minimization or with the CCP4 program Refmac5 which

does a similar refinement but also allows the option of TLS (translation, libration, and

screw-rotation) refinement of individual protein subdomains (Murshudov et al., 1997).

These algorithms will refine the model based upon accepted bond lengths and angles that

have been found in all proteins solved to date and will continually compare the measured

F's with the F's calculated from the model. The goal of model refinement, whether by

least-squares or maximum-likelihood statistical treatment, is to improve the

agreement between the observed and calculated data.
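
Agreement between observed and calculated data is conventionally summarized by the crystallographic R-factor, which is not named explicitly in the paragraph above. The sketch below is a minimal illustration; the function name r_factor, the simple sum-based scale factor and the amplitudes are assumptions made for the example.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - k|Fc| |) / sum(|Fo|), with k placing Fc on the scale of Fo."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    scale = f_obs.sum() / f_calc.sum()
    return np.abs(f_obs - scale * f_calc).sum() / f_obs.sum()

# Hypothetical amplitudes for three reflections.
f_obs = [100.0, 50.0, 25.0]
f_calc = [90.0, 55.0, 30.0]
print(r_factor(f_obs, f_calc))
```

A value of zero would mean perfect agreement; each refinement cycle aims to lower this number without violating the geometric restraints.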

Rigid body refinement involves the movement of a protein molecule or domain to

better fit the electron density, but does not take into account the conformation of the

amino acid side-chains. This is generally a preliminary step in refinement, and facilitates

a finer treatment of the model in subsequent steps.

Simulated annealing utilizes a heating simulation on the molecule and is useful for

exploring a greater volume of parameter space to find better models and break out of the

multiple shallow energy minima that exist (Rossmann and Arnold, 2003; Brunger et al.,

1987; Kirkpatrick et al., 1983). Individual B-factor refinement refines the B-factor of

each atom in the model, allowing for a better estimate of the movement and dynamics of

the molecule. This, combined with conjugate-gradient energy minimization, will apply

restraints on bond lengths and angles to bring the molecule to the most energetically

favorable conformation. Once these steps have been finished, maps are calculated using

these improved phases, and the agreement of the model with the maps is examined manually

and the model adjusted accordingly. These refinement steps are part of an iterative refinement loop

and are repeated several times to attain the best model.

Structure validation of the refined model is an important final step in structure

determination and can be carried out with a program such as PROCHECK (Laskowski et al.,

1993). Validation is used to make sure that bond lengths and angles are in accordance

with acceptable values. Once validated, the electron density can be interpreted to

elucidate the biological implications the structure contains. Once structure refinement

and validation are complete, final models are submitted to the PDB (Protein Data Bank)

for public access (Figure 2-2).

Cryo-Electron Microscopy Structure Determination

Sample Preparation

Cryo-electron microscopy (cryoEM), as its name implies, involves the collection of

transmission electron microscopy data at temperatures low enough to discourage ice

crystal formation (Thuman-Commike and Chiu, 2000). The technique is currently

growing at a rapid rate as the steps of data collection and structure determination become

more automated. This high-throughput approach, similar to what XRC has adopted, has

translated to an increase in the number of depositions to the PDB for cryoEM models

(Jiang and Ludtke, 2005) (Figure 2-4). A flow chart of the steps involved in the process is

depicted in Figure 2-5.

Unlike negative-stain EM, cryoEM typically does not utilize electron dense

compounds, such as uranyl acetate, to create the contrast necessary for particle

visualization (Figure 2-5C) (Baker et al., 1996). Instead the EM grid is rapidly plunged

in liquid ethane to allow preservation of the specimen's three-dimensional structure in

vitreous ice (Dubochet et al., 1982). This offers several advantages such as avoiding

specimen drying, allowing data collection in solution conditions, and the ability to

observe biochemical reactions, such as virus particle maturation, under native conditions

(Steven et al., 2005; Rader et al., 2005). This also means that the technique relies on the

contrast derived almost purely from the defocus of the microscope, as the solution the

sample is frozen in is usually comprised of the same atoms (C, H, N, and O) as the

specimen itself (Figure 2-5C) (Jiang and Ludtke, 2005).

Specimen preparation and freezing are far from trivial, and great care must

be taken to avoid thick ice formation and particle aggregation so that the highest quality

data can be collected. Once a sample-specific freezing protocol has been developed, data

collection can proceed (Figure 2-5).

Data Collection and Digitization

Once specimen grids have been successfully frozen, data collection can take

place on either a liquid nitrogen or helium cooled microscope (Orlova and Saibil, 2004).

It is important for the investigator to set a goal for the structure analysis. If the overall

morphology of a virus or other macromolecular complex is needed, a resolution of ~20 Å

should suffice. On the other hand, if the localization of individual domains of proteins is

the goal of the study, a resolution better than 10 Å will be necessary (Bottcher et al.,

1997; Conway et al., 1997; Trus et al., 1997; Thuman-Commike and Chiu, 2000). The

resolution desired for the final reconstruction will dictate aspects of the data collection

procedure such as defocus values used and number of micrographs and particles boxed.






[Figure 2-4 plot; horizontal axis bins: 1980-1984, 1985-1989, 1990-1994, 1995-1999, 2000-2004.]

Figure 2-4. Growth of CryoEM Depositions to the PDB. The growth of cryoEM can be
measured by the number of macromolecular models being deposited
into the PDB on a yearly basis. The trend of the graph suggests that many
models will be deposited in the future. Adapted from Jiang and Ludtke,

While liquid nitrogen cooling combined with a highly coherent electron beam from

a field emission gun (FEG) has produced some very desirable results, liquid helium

cooled specimen stages have proven more effective at minimizing particle movement,

and have allowed the use of lower dosages of electrons, greatly decreasing the amount of

radiation damage through imaging (Hewat and Neumann, 2002; Orlova and Saibil, 2004).

Typically, low magnification is used because biological samples can only withstand an

electron dose of 10-15 electrons/Å² without incurring serious radiation damage. For

high-resolution data acquisition this corresponds to a magnification of 50,000-80,000×. Even

with this supposedly "low" dose, significant damage can still occur.

Figure 2-5. Flowchart of Steps Taken During Structure Determination by CryoEM. (A)
The data must first be digitized, and the particles selected. Once this is done
the experimenter must decide if contrast transfer function (CTF) correction
will be applied, and a preliminary model must be generated for (B)
refinement. The blue box illustrates the iterative nature of refinement,
particularly in particle classification and alignment. The loop stops when
the structure determination converges. (C) Real data used to illustrate data
collection, projection classification and class averaging, and 3D
reconstruction.

The defocus values utilized for the data collection process will determine the

amount of contrast present in the image and the resolution extent. Close-to-focus

micrographs will preserve the high-resolution data present in the sample but have low

contrast, which impairs the experimenter's ability to discern the particles. The defocus level is

also the largest determinant of the contrast transfer function (CTF) correction, which will

be discussed in a later section (Ludtke et al., 1999). Because of the need to visually

identify the particles present on micrographs, the technique of focal pair data collection

was developed (Zhou et al., 1998; Fuller, 1987). This technique takes both a close-to-focus and a

far-from-focus image of the same area of the microscope grid, so that the

particle coordinates on the close-to-focus micrograph can be taken from the particles boxed on

the far-from-focus micrograph.

The method of data acquisition, either by charge coupled device (CCD) or

photographic film, must be chosen before the experimental setup depending on what

level of detail is necessary for the structural analysis. While more sensitive, photographic

film requires scanning of the micrographs to get them into electronic form (Sander et al.,

2005). This requires a great deal of time and effort on the part of the experimenter and

the necessity of the process should be determined based on the level of structural detail

required to answer the biological question. The quality of the micrographs used in the

structure analysis is also crucial. Micrographs with excessive charging, ice

contamination, and astigmatism must be discarded (Thuman-Commike and Chiu, 2001).

An initial cursory visual inspection of micrographs should be done followed by

quantitative assessment of the defocus parameters by optical diffraction. Scanning must

also be done with a step size equal to 1/2 the desired resolution (Ludtke, 1999). Once in

electronic form, computer image processing can begin.
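
The scanning requirement can be turned into a short calculation. The sketch below assumes that the step size at the specimen is half the target resolution and that the step on the film is that value multiplied by the magnification. The function name scanner_step_microns and the example values are hypothetical.

```python
def scanner_step_microns(target_resolution_A, magnification, fraction=0.5):
    """Scanner step on the film (in microns) for a desired resolution (in Angstroms).

    fraction is the step size at the specimen as a fraction of the target
    resolution; 0.5 corresponds to the sampling limit described in the text.
    """
    step_at_specimen_A = fraction * target_resolution_A
    return step_at_specimen_A * magnification * 1.0e-4   # 1 Angstrom = 1e-4 micron

# Example: a 10 Angstrom target at a magnification of 50,000.
print(scanner_step_microns(10.0, 50000))
```

For this example the film would need to be scanned at a step of about 25 microns. A finer step than this limit leaves a safety margin at the cost of larger image files.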

Particle selection. The next step in single particle structure determination is

referred to as particle "boxing" where hundreds to thousands of particle views are picked

from several to many micrographs (Thuman-Commike and Chiu, 2000). The goal of the

particle selection process is to only choose those views that appear to be lacking

specimen damage and contamination and that represent as many novel two-dimensional

orientations as possible. This can be the most intensive and time-consuming aspect of

single particle analysis, especially if particles are picked by eye.

Several attempts have been made to develop automatic particle selection methods,

with mixed results (van Heel, 1981; Frank and Wagenknecht, 1984; Saad et al., 1998).

These protocols have enjoyed some success with particles possessing high symmetry

such as spherical viruses, but are not as robust when applied to less spherical and

symmetric objects. After particles have been selected, an initial 3D model must be

obtained either from the selected particles or an existing model. Then the particles must

be aligned so that all particles are positioned similarly. This is most often done by

comparing a reference against each particle image to force the particle into a position that

resembles the reference (Frank and Radermacher, 1992). Great care must be taken both

to discard any particles that seem damaged, and to assess the average power spectrum of

groups of boxed particles, for determination of actual defocus levels and clear indications

of CTF zeros (Zhou, 1996).

Contrast transfer function and envelope correction. The contrast transfer

function (CTF) is a phenomenon in EM that distorts the generated images. This makes

them less than "true" projections of the specimen being imaged (Ludtke et al., 1999).

This distortion is caused by the objective-lens focal setting and by the spherical

aberration present in all electromagnetic lenses. It varies with the accelerating voltage

and defocus level used during data collection and, if not corrected, will cause severe mass

displacements and phase errors (Erickson and Klug, 1970; Hanszen, 1971). The CTF

also includes amplitude contrast, which occurs whenever an object scatters or absorbs illumination, resulting in

partial loss of the incident beam (Rossmann and Arnold, 2003). For biological specimens,

depending on their mass, the amplitude contrast generally has a value between 5 and

15%. A mathematical description of the CTF is given by

\[ \mathrm{CTF}(\nu) = \{(1 - F_{amp}^{2})^{1/2} \sin[\chi(\nu)] + F_{amp} \cos[\chi(\nu)]\} \qquad (2\text{-}5) \]

where $\chi(\nu) = \pi\lambda\nu^{2}(\Delta f - 0.5 C_{s}\lambda^{2}\nu^{2})$, $\nu$ is the spatial frequency (in Å$^{-1}$), $F_{amp}$ is the

fraction of amplitude contrast, $\lambda$ is the electron wavelength (in Å), given by

$\lambda = 12.3/(V + 0.000000978 V^{2})^{1/2}$ (0.037, 0.025, and 0.020 Å for 100, 200 and 300 keV

electrons, respectively), $V$ is the voltage (in volts), $\Delta f$ is the underfocus (in Å) and $C_{s}$ is

the spherical aberration of the objective lens of the microscope (in Å) (Rossmann and

Arnold, 2003; Baker et al., 1999). The reader should notice the overall dependence of the

CTF on resolution, wavelength, defocus, and spherical aberration.
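
Equation 2-5 and the wavelength expression can be evaluated directly. The sketch below takes the amplitude term as (1 - F_amp^2)^(1/2) and uses the overall sign as written above; sign conventions differ between software packages, so this is an assumption. The function names, the default amplitude contrast of 7% and the microscope settings in the example are hypothetical.

```python
import numpy as np

def electron_wavelength_A(voltage_V):
    """Relativistic electron wavelength in Angstroms for an accelerating voltage in volts."""
    return 12.3 / np.sqrt(voltage_V + 0.000000978 * voltage_V ** 2)

def ctf(nu, defocus_A, cs_A, voltage_V, f_amp=0.07):
    """Contrast transfer function of Equation 2-5.

    nu        : spatial frequency in 1/Angstrom (scalar or array)
    defocus_A : underfocus in Angstroms
    cs_A      : spherical aberration in Angstroms
    f_amp     : fraction of amplitude contrast (assumed value)
    """
    lam = electron_wavelength_A(voltage_V)
    chi = np.pi * lam * nu ** 2 * (defocus_A - 0.5 * cs_A * lam ** 2 * nu ** 2)
    return np.sqrt(1.0 - f_amp ** 2) * np.sin(chi) + f_amp * np.cos(chi)

# Wavelengths at the three voltages quoted in the text.
for kilovolts in (100, 200, 300):
    print(kilovolts, round(float(electron_wavelength_A(kilovolts * 1.0e3)), 3))

# Hypothetical settings: 200 kV, 2 micron underfocus, Cs = 2 mm.
nu = np.linspace(0.0, 0.1, 6)
print(ctf(nu, defocus_A=2.0e4, cs_A=2.0e7, voltage_V=200.0e3))
```

Plotting the function over a finer frequency grid shows the oscillations and zero crossings whose positions move with defocus, which is why the defocus of every micrograph has to be determined before correction.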

The CTF is further affected by both the envelope function and the noise level of the

micrographs. The envelope function is mainly affected by the electron beam coherence,

specimen drift, and astigmatism (Zhou et al., 1996; Ludtke et al., 1999). The measured

data (M) are most accurately described as a combination of

these factors:

\[ M(s, \theta, \phi) = C(s)E(s)F(s, \theta, \phi) + N(s, \theta, \phi) \qquad (2\text{-}6) \]

where $C(s)$ is the CTF, $E(s)$ is the envelope function, $F(s, \theta, \phi)$ is the structure factor of the specimen, and $N(s, \theta, \phi)$ represents random

noise. Because of the dependence of the CTF on the defocus level of the micrograph and

the ability to accurately measure and model the noise level, it is possible to correct the

data for these attenuating parameters and get a 3D reconstruction that is close to a true

depiction of the single particle being imaged (Ludtke et al., 1999; Saad et al., 2001).

Also, by utilizing either a true one dimensional structure factor of the specimen obtained

by small angle X-ray scattering (SAXS), or an "artificial" structure factor from several

micrographs at different defocus levels, the CTF and noise for each micrograph can be

corrected for by substituting the value $F(s, \theta, \phi)$ into Equation 2-6 (Ludtke et al., 1999).

Once these parameters are correctly determined they can be applied to the data during the

reconstruction process during alignment of 2D images to generate class averages (Saad et

al., 2001; Ludtke et al., 1999). Additionally, better data is now attainable by using FEG

electron sources rather than tungsten filament sources (Mancini et al., 1997; Zhou and

Chiu, 1993; Zemlin, 1994; Rossmann and Arnold, 2003). These sources have the

advantage of ensuring high phase contrast at high resolution even in highly defocused

images, higher resolution, better beam penetration, and fewer deleterious effects on the

specimen, such as charging.
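
The image model of Equation 2-6 and the idea of correcting for the CTF and envelope can be sketched in one dimension. The example below uses toy forms for C(s), E(s) and F(s) and a generic Wiener-type filter; it is not the correction scheme of Ludtke et al. (1999), and every function form, name and parameter value is an assumption chosen for illustration.

```python
import numpy as np

s = np.linspace(0.0, 0.1, 200)                        # spatial frequency, 1/Angstrom
ctf_values = np.sin(np.pi * 0.025 * 2.0e4 * s ** 2)   # toy C(s): phase contrast only
envelope = np.exp(-100.0 * s ** 2)                    # toy E(s): Gaussian fall-off
true_f = 1.0 / (1.0 + (s / 0.02) ** 2)                # toy structure factor F(s)

rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(s.size)            # N(s)
measured = ctf_values * envelope * true_f + noise     # Equation 2-6

def wiener_correct(measured, ctf_values, envelope, noise_to_signal=0.01):
    """Undo C(s)E(s) where it is strong and damp the result where it is near zero."""
    transfer = ctf_values * envelope
    return measured * transfer / (transfer ** 2 + noise_to_signal)

corrected = wiener_correct(measured, ctf_values, envelope)
```

Near the zeros of the CTF no information is transferred, so the filter returns values close to zero there. This is the reason data from micrographs recorded at several defocus levels are combined.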

Initial three-dimensional model acquisition. For obtaining symmetry

information about the particle of interest, negative-stain EM can be utilized as a high-

contrast method to detect symmetry based on reciprocal space or common-line methods

(Rossmann and Tao, 1999; Ludtke et al., 1999; DeRosier et al., 1969). Also, particles

from far-from-focus micrographs can be utilized to determine symmetry information as

well as biochemical data regarding size, shape, and stoichiometry of macromolecular

specimens. The algorithms used for single particle analysis are now becoming

sophisticated enough to overcome many deficiencies in the initial model. However, the

accuracy of this model will dictate the amount of time necessary for refinement to

converge (Ludtke et al., 1999).

Model refinement. Once particles have been boxed from many micrographs,

aligned similarly, and corrected for CTF artifacts, several strategies exist for continuing

on in the reconstruction process. The initial 3D model is first used to generate a series of

projections at a specified angular increment. The relative orientation of each particle

must be determined either with respect to a reference orientation of the original three-

dimensional object from selected particle data and obtained symmetry information, or

from the model of a homologous protein or macromolecular assembly (Baker and Cheng,

1996; Ludtke et al., 1999).

Once each particle is aligned to each reference image, a dot product is computed in

order to calculate the goodness of fit of the particle to the projection. Each particle is

then assigned an orientation depending on the fit with each model projection. The

particles in each orientation are then averaged together to create class averages for use in

the next cycle of refinement (Ludtke et al., 1999). This creation of class averages and

their use in iterative refinement is carried out in the program EMAN (electron micrograph

analysis). Once particle orientations have been obtained, classification of different

particles based on orientation will proceed. After computation of class-averages, the final

step of 3D structure computation is undertaken.
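
The classification step described above (a dot product against each model projection, followed by averaging within each class) can be sketched as follows. This is a simplified illustration that assumes the particles are already aligned; it is not the EMAN implementation, and the function names and test images are hypothetical.

```python
import numpy as np

def normalize(image):
    """Subtract the mean and scale to unit length so dot products are comparable."""
    centered = image - image.mean()
    return centered / np.linalg.norm(centered)

def classify_and_average(particles, projections):
    """Assign each particle to its best-matching projection and average each class."""
    particles = np.asarray(particles, dtype=float)
    refs = np.array([normalize(p).ravel() for p in projections])
    parts = np.array([normalize(p).ravel() for p in particles])
    scores = parts @ refs.T                      # dot product of every particle with every projection
    labels = scores.argmax(axis=1)               # orientation class with the best fit
    averages = {int(k): particles[labels == k].mean(axis=0) for k in np.unique(labels)}
    return labels, averages

# Two toy "projections" and six noisy particles derived from them.
rng = np.random.default_rng(1)
proj_a = np.zeros((8, 8)); proj_a[3:5, :] = 1.0   # horizontal bar
proj_b = np.zeros((8, 8)); proj_b[:, 3:5] = 1.0   # vertical bar
projections = np.array([proj_a, proj_b])
truth = np.array([0, 1, 0, 1, 1, 0])
particles = projections[truth] + 0.3 * rng.standard_normal((6, 8, 8))

labels, averages = classify_and_average(particles, projections)
print(labels)
```

Averaging within a class raises the signal-to-noise ratio, which is what allows the class averages to serve as better references in the next refinement cycle.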

Once each particle is assigned to a class average, the particles are then averaged

and re-assigned a set of Euler angles (orientation). The next step is to combine all of the

individual particle data to compute the three-dimensional reconstruction using a method

referred to as weighted back-projection. In this method, the real-space 3D density is

calculated from a series of two-dimensional Fourier space images using a series of

transformations (Thuman-Commike, 2001; Zampighi et al., 2005).

This reconstruction is performed to a specified resolution and, in an iterative

process, the model is refined to better fit the particle data. The accuracy and quality of

the reconstruction relies largely upon a great number of different particle orientations and

the accuracy of the alignment, and can easily be biased by the type of reference image

used. The resolution of the 3D density map should then be assessed.

Typically, this is done by splitting the data used for the reconstruction into two sets (even

and odd numbered particles) and performing two independent structure determinations on

them. Fourier shell correlation (FSC) can then be used to assess the resolution and

quality of the model:

\[ \mathrm{FSC} = \frac{\sum F_{1} F_{2}^{*}}{\sqrt{\sum |F_{1}|^{2} \sum |F_{2}|^{2}}} \qquad (2\text{-}7) \]

where $F_{1}$ and $F_{2}$ are the structure factors for the two maps, $|F_{1}|$ and $|F_{2}|$ are the structure

amplitudes, $F_{2}^{*}$ is the complex conjugate of $F_{2}$, and the sums are taken separately over

all structure factors within a given resolution shell (Baker et al., 1999).
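
The Fourier shell correlation of Equation 2-7 can be computed with a few lines of array code. The sketch below assumes the denominator is the square root of the product of the two summed squared amplitudes, and uses equally spaced shells up to the Nyquist frequency. The function name, the shell layout and the synthetic half maps are assumptions made for the example.

```python
import numpy as np

def fourier_shell_correlation(map_1, map_2, n_shells=None):
    """Return shell upper edges (cycles per voxel) and the FSC in each shell."""
    f1 = np.fft.fftn(map_1)
    f2 = np.fft.fftn(map_2)
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in map_1.shape], indexing="ij")
    radius = np.sqrt(sum(f ** 2 for f in freqs))
    n_shells = n_shells or min(map_1.shape) // 2
    edges = np.linspace(0.0, 0.5, n_shells + 1)
    shell = np.digitize(radius, edges) - 1         # shell index of every Fourier voxel
    fsc = np.zeros(n_shells)
    for k in range(n_shells):
        mask = shell == k
        numerator = np.sum(f1[mask] * np.conj(f2[mask])).real
        denominator = np.sqrt(np.sum(np.abs(f1[mask]) ** 2) * np.sum(np.abs(f2[mask]) ** 2))
        fsc[k] = numerator / denominator if denominator > 0 else 0.0
    return edges[1:], fsc

# Two synthetic "half maps": a common signal plus independent noise.
rng = np.random.default_rng(0)
signal = rng.standard_normal((16, 16, 16))
half_1 = signal + 0.5 * rng.standard_normal(signal.shape)
half_2 = signal + 0.5 * rng.standard_normal(signal.shape)

shell_edges, fsc = fourier_shell_correlation(half_1, half_2)
print(fsc)
```

For real reconstructions the curve falls from near 1 at low resolution toward 0 at high resolution, and the resolution is quoted as the point at which it drops below a chosen threshold.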

Overcoming the difficulties of single particle analysis through advances in

algorithms and basic equipment used has allowed the high-resolution (better than 10 Å)

determination of many types and sizes of macromolecular complexes. In the past year, the

extension of the technique to studies of both extremely large (>6000 Å diameter) and

small (<300 kDa) samples has been achieved (Xiao et al., 2005; Cheng et al., 2004).

While no formal method of model validation exists for the judgement of model

accuracy, subjective methods do exist for determining resolution and model precision.

The observation of secondary-structural elements such as helices and strands at

resolutions of 8-12 and 6-7 Å, respectively, is a good indication of structure quality and

reliability. While these resolution estimation methods are subjective, they are being

perfected and tested to increase their effectiveness. In the near future it is expected that

the tracing of a complete polypeptide chain by the cryoEM technique will be possible,

and with it will come a new era of structural biology.

Divide and Conquer: The Marriage of X-Ray Crystallography and Cryo-Electron Microscopy

As mentioned earlier, the structural characterization of macromolecules and

macromolecular complexes is a useful component in understanding biological processes.

While traditionally XRC has been the method of choice for high-resolution structural data

acquisition, cryoEM has recently been used to attain close to this level of detail (Ludtke

et al., 2004; Yonekura et al., 2003). Even with these advances, a great deal of computing

power and intensive work is necessary to achieve results like these. With this in mind,

the "divide and conquer" approach, whereby a cryoEM model of a large macromolecular

complex is first generated at low to medium resolution followed by fitting of crystal

structures of the individual constituent molecules, has proven particularly fruitful (Baker

and Johnson, 1996; Rossmann et al., 2001). The fitting can also be done using a

homology model of a protein in the complex. This fitting technique has the advantage of

artificially increasing the resolution of the single particle reconstruction and has been


successful in many cases (Yonekura et al., 2003; Rossmann et al., 2005; Fotin et al.,

2004a,b). Automated programs that search electron density maps for

secondary structural elements are also now in use. As the study of larger, more complex

macromolecular assemblages increases in the future, the prevalence of this technique will

increase as it becomes essential for structural biology.



Actin is a ubiquitously expressed protein of 375 amino acids encoded by a large,

highly conserved gene family. Some single cell organisms like yeast have a single actin

gene whereas humans have six actin genes and some plants have as many as 60

(Hennessey et al., 1993). Sequence comparisons of actin from amoebas and vertebrates

display remarkable conservation, sharing 80% sequence identity (Gallwitz and Sures,

1980). Actin is the most abundant cytosolic protein in eukaryotic cells, making up 10%

by weight of the total cell protein in muscle and 1-5% of the total weight in non-muscle

cells. Actin exists in the cell as a globular monomer termed G-actin and as a filamentous,

helical polymer termed F-actin.

Mature F-actin is an array of actin monomers in the form of two-start, right-handed

helices (Holmes et al., 1990). The pitch is ~72 nm over ~13 protomers (Sase et al.,

1997). In vivo each actin monomer contains a divalent cation (Mg²⁺ or Ca²⁺) complexed

with either ATP or ADP, but experimental conditions can also yield an actin molecule

without bound nucleotide (De La Cruz and Pollard, 1995; De La Cruz et al., 2000;

Steinmetz et al., 1997). The nucleation, elongation, and branching of actin filaments

have been shown to be the driving force in many forms of pathogenic cell motility as well

as pinocytosis and organelle movement (Ploubidou and Way, 2001; Taunton, 2001).

The rapidity with which these constructive events occur emphasizes the efficiency

and dynamic nature of the nucleation mechanism, actin-actin contacts, and the actin

monomer itself.

The addition of millimolar concentrations of salts to G-actin solutions induces F-

actin polymerization. The investigation of F-actin formation and assembly, with and

without different divalent cations, has revealed variation in polymerization speed and

assembly dynamics (Steinmetz et al., 1997). Such an array of F-actin polymerization

properties may lend itself to multiple mechanisms of filament building. The classical

pathway involving three steps, (I) monomer activation, (II) nucleation, and (III)

elongation (Carlier, 1991; Pollard, 1990) may be only one component of a variety of

possibilities of filament assembly dictated by different cellular needs. Millonig et al.,

(1988) showed that even under a wide variety of polymerizing conditions, the first

detectable step in polymerization is the anti-parallel dimerization of G-actin.

This dimer, often referred to as the lower dimer (LD), due to its observed migration

on native gels, has been suggested as the filament building block in highly motile cells

because of its propensity toward forming a highly branched and "ragged" filament

meshwork (Steinmetz et al., 1997). This alternative "sidetrack" polymerization pathway

in which two G-actin monomers form the transient LD that is not seen in F-actin, may act

as both an intermediate actin assembly catalyst and as a "bundling factor" between two

nearby actin filaments (Pederson and Aebi, 2002).

Further evidence of filament elongation by the LD is provided by the stabilization of

this dimer by the actin binding proteins gelsolin and actobindin, and by the small molecule

swinholide A (Bubb et al., 1994, 1995; Hesterkamp et al., 1993). Also, polylysine has

been shown to induce F-actin formation in vivo, and when G-actin is covalently cross-

linked during this induced polymerization LD actin accumulates, suggesting a nucleating

role for this dimer in filament formation (Bubb et al., 2002).

Historically, the major problem with studying the structures of polymerization

intermediates is that actin readily assembles into filaments under condensation conditions

(Kabsch et al., 1990). This has led to crystal structures of actin mainly being studied as

complexes (Dawson et al., 2003; Kabsch et al., 1990; Morton et al., 2000; Otterbein et

al., 2002; Schutt et al., 1993). This variety of interactions shows the dynamic nature of

the molecule and its promiscuous ability to complex with many other proteins. Since the

publication of the Holmes' filament model (Holmes et al., 1990), and the wide

acceptance of this model as the structure of F-actin, a clear mechanism for the nucleation,

elongation, and branching of this cytoskeletal element has eluded investigators.

We provide structural evidence for a proposed filament assembly mechanism

utilizing an actin trimer intermediate, incorporating the LD to build the classic parallel

dimer configuration of F-actin (Holmes et al., 1990). This model is derived from the

observed structural changes in the LD and trimer contacts that occur when orthorhombic

P212121 crystals (Bubb et al., 2002) are converted, by condensation lattice shrinkage, to

two new tetragonal crystal forms: a tetragonal P43212 form induced by slow solvent

evaporation of the orthorhombic crystal at room temperature, and a twinned P43 form

induced by the quick dip of an orthorhombic crystal into cryoprotectant and subsequent

flash-freezing prior to data collection. These induced crystal lattice changes produce

conformational rearrangements of the actin molecules that may be likened to an actin

filament-forming precursor event. The dynamic nature of these observed actin dimer and

trimer interactions is discussed in terms of structural occurrences that may mimic events that

occur during F-actin nucleation and assembly in vivo.

Materials and Methods

Purification and Crystal Preparation

Rabbit skeletal muscle acetone powder was prepared from frozen muscle (Pel-

Freez, Rogers, AR) in Buffer G (5.0 mM Tris, 0.2 mM ATP, 0.2 mM dithiothreitol, 0.1

mM CaCl2, and 0.01% sodium azide, pH 7.8). The procedure followed the method of

Feuer et al., (1948). Approximately 0.5 kg of hind and back rabbit muscle was chilled on

ice for 20 min. before mincing. Myosin was extracted from the minced muscle in three

volumes of GS buffer pH 6.5 (0.1 M NaH2PO4, 0.05 M Na2HPO4, 0.3 M NaCl, 1 mM

MgCl2, 1 mM Na4P2O7, and 0.1 mM PMSF), by stirring at 4 °C for 15 min.

This solution was then centrifuged at 1000 x g for 25 min at 4 °C; the pellets were

then resuspended in 10 volumes of cold 0.4% NaHCO3, 0.1 mM CaCl2, stirred at 4 °C for

15 min and rapidly filtered through autoclaved cheesecloth. The residue was then

resuspended in 1 volume carbonate buffer (10 mM NaHCO3, 10 mM Na2CO3, 0.1 mM

CaCl2) and stirred at 4 °C for 10 min. This mixture was diluted with 10 vol. cold ddH2O

and filtered through cheesecloth. The residual material was then extracted with 2

volumes cold acetone at room temperature for 10 min., and filtered through

cheesecloth, discarding the filtrate. This was repeated until the filtrate was visibly clear

and free from lipids. The resultant powder was spread out onto filter paper and allowed

to dry overnight in a fume hood, then stored at -20 °C.

Acetone powder was extracted at 0 °C for 30 min. with 200 ml of buffer G (5.0 mM

Tris, 0.2 mM ATP, 0.2 mM dithiothreitol, 0.1 mM CaCl2, and 0.01% sodium azide, pH

7.8) and filtered through cheesecloth. The residue was then washed with 100 ml of

Buffer G and the supernatant fluids were combined and centrifuged at 10,000 x g for 1

hour. KCl (0.6 M) and MgCl2 (2 mM) were then added to induce the actin to polymerize and

this solution was centrifuged for 3 hours at 80,000 x g. This F-actin pellet was then

resuspended in 30 ml buffer G and dialyzed for 3 days, changing the buffer G every 24

hours (Spudich and Watt, 1971). This solution was then centrifuged at 80,000 x g for 3

hours yielding pure ATP G-actin.

This ATP G-actin solution was then complexed with equimolar latrunculin A and

120 µM polylysine for crystallization of the anti-parallel dimer (Yarmola et al., 2000).

Crystals of the polylysine-actin-latrunculin A-ATP complex were grown at 277 K using

the hanging-drop vapor-diffusion method (McPherson, 1982). A 10 pl crystallization

droplet was formed by mixing a solution of equimolar components of actin and

latrunculin A at a concentration of 240 pM with 120 pM polylysine in crystallization

buffer [1.3 M(NH4)2S04, 3.0 mMMg C12,60 mMimidazole pH 6.7]. The droplet was

suspended over 1 ml of the crystallization buffer. Useful diffraction-quality crystals

appeared after 2-4 weeks (Figure 3-1A-B)(Bubb et al., 2002)

Data Collection and Processing

Tetragonal P43212 crystal form. The P43212 crystal form X-ray diffraction data

were collected "in-house" using an R-AXIS IV++ image plate system with Osmic

mirrors, 0.3 mm collimator, and a Rigaku HU-H3R CU rotating anode operating at 50 kV

and 100 mA. Each image was collected with a crystal-to-detector distance of 150 mm

and 1.0° oscillation angle with an exposure time of 3 min per image. The initial 30

frames of diffraction data showed diffraction to 3.2 Å resolution and conformed to

the previously reported lattice symmetry and dimensions of the orthorhombic P212121

crystals, with unit cell parameters a = 101.5, b = 103.1 and c = 126.9 Å (Figure 3-1 A-B).

Due to equipment failure, the crystal data collection was interrupted, and the crystal was

left suspended in a capillary tube for six days while an engineer from Molecular Structure

Corporation (MSC) repaired the malfunction.

Upon re-exposure to X-rays, the next 120 images showed much-improved

diffraction to 2.3 Å resolution, compared to the first 30 images collected. More

unexpected was that the cell dimensions of these 120 images were not consistent with the

original orthorhombic form, and conformed to the P422 Laue group symmetry with unit

cell parameters a = 100.9 and c = 103.9 Å (Figure 3-2).


Figure 3-1. Crystals and Diffraction Pattern of Actin LD Complexed With Latrunculin
A and Polylysine. (A) The largest crystals appeared after two weeks and
had dimensions of ~150 x 100 µm. (B) Diffraction spots can be measured
to 3.5 Å resolution. The crystals adhere to space group P212121 with unit
cell dimensions a = 101.5, b = 103.1 and c = 126.9 Å.

Tetragonal P43 crystal form. The X-ray diffraction data for the P43 crystal form

were collected on beamline A1 at the Cornell High Energy Synchrotron Source (CHESS).

The crystal was cryoprotected with 30% PEG 400 and flash-frozen prior to data

collection. The data were collected at 100 K with an ADSC Quantum-4 CCD (Charge

Coupled Device) detector using a 0.2 mm collimator. A total of 150 images were

collected with a crystal-to-detector distance of 150 mm, a 1.0° oscillation angle, and a

wavelength of 0.9290 Å. The crystals diffracted to 3.0 Å resolution and belong to the

Laue group P4, with unit cell dimensions of a = 101.5 and c = 104.2 Å.

Figure 3-2. X-ray Diffraction Oscillation Photograph Showing Changes Due to
Dehydration. Split image comparison between the (A) Orthorhombic
P212121 and (B) Tetragonal P43212 crystal system diffraction quality. Note
for the orthorhombic P212121 data the Bragg reflections have a mean mosaic
spread of 0.9° and extend to 3.5 Å resolution, whereas the tetragonal P43212
data have a mean mosaic spread of 0.4° and extend to 2.3 Å resolution.
These diffraction images were obtained from the same crystal with a 6-day
delay between data collections. Both data sets were collected at room
temperature using an R-AXIS IV++ image plate system with Osmic mirrors
and a Rigaku HU-H3R CU rotating anode operating at 50 kV and 100 mA.
The crystal-to-detector distance was 150 mm with a 1.0° oscillation angle,
and an exposure time of 3 min per image.

All images for both the P43212 and P43 data sets were indexed, integrated and

scaled with the HKL suite of programs DENZO and SCALEPACK (Otwinowski and

Minor, 1997).

Structure Determination and Refinement

The structure determination of the tetragonal P43212 and P43 crystal forms was

performed using the G-actin monomer (Bubb et al., 2002; 1LCU) as a search model for

molecular replacement (Rossmann, 1990).

Tetragonal P43212 crystal form. The P43212 crystal structure was initially phased

with the CNS software package (Brunger et al., 1998) using standard cross-rotation and

translational searches. After one cycle of refinement in CNS (rigid body, annealing,

minimize, and B-individual) further refinement was carried out in the CCP4 program Refmac5

(rigid body, simulated annealing, restrained individual B-factor refinement, conjugate

gradient minimization, and bulk solvent correction) (Murshudov et al., 1997).

After 5 rounds of iterative refinement, the latrunculin, ATP, and Mg2+ ions were

placed into electron density using a Fourier Fo - Fc electron density map. Refinement was

completed by incorporating TLS (translation, libration and screw-rotation) refinement

using the four actin subdomains as pseudo rigid bodies in the CCP4 program Refmac5.

Because of the unusual observed crystal rearrangement, the P43212 data were also

reduced and the structure solved in both the P212121 and P43 space groups to similar

statistics and actin-actin contacts. Therefore the data were assigned to the P43212 space

group based on its higher Laue symmetry group.

Tetragonal P43 crystal form. The P43 data were found to be twinned

and this hemihedral twinning event was characterized using the intensity statistics and

distributions obtained from the TRUNCATE, DETWIN (Collaborative Computational

Project, Number 4, 1994 (CCP4)) and CNS programs (Brunger et al., 1998). The self-

rotation function calculations were performed on data between 10 and 4 A resolution

using the POLARRFN program (CCP4). The structure of actin in the P43 crystal form

was solved entirely using the twinning procedures and was refined with simulated-

annealing, individual B-factor and energy-minimization procedures using the CNS

software interspersed with rounds of manual model building with graphics program O

(Jones, 1978). The quality of the final refined structure was validated with the

PROCHECK and CNS programs (Laskowski et al., 1993; Brunger et al., 1998).

Structure Analysis

Least-squares superimpositions of the actin structures determined in the P43212,

P43, and P212121 crystal forms and the LD in F-actin were performed using the programs

O and Xtalview (Jones, 1978; McRee, 1999). To investigate the possible role of the LD

in filament branching, least-squares superimposition between each monomer of the LD

and one monomer of each of two identical models of the Holmes filament was

performed, effectively "linking" two F-actins (Holmes et al., 1990; 1ATN).

For all three crystal systems, ligand and prosthetic group interactions were

analyzed and defined using the program LIGPLOT (Wallace et al., 1995). Figures were

made with the software XtalView, BOBSCRIPT, Raster3D, and PyMol (DeLano, 2002;

Esnouf, 1999; McRee, 1999; Merritt and Bacon, 1997).

The atomic coordinates and structure factors for the actin structures in the crystal

forms P43212 and P43 have been deposited with the Protein Data Bank with accession

numbers 1RDW and 1RFQ, respectively.


Space-group Assignment

Tetragonal P43 crystal system. Initially, because of the previously recorded

diffraction data collected for actin crystals grown as described by Bubb et al. (2002), the

cryo-collected diffraction data were indexed in the orthorhombic Laue group P222, with

unit-cell parameters a = 101.5, b = 101.5, c = 127.0 Å for crystals obtained using the same

crystallization conditions but for which the data were collected at room temperature

(Bubb et al., 2002). Also observed, but less dramatic, was that the unit-cell parameters a

and b were now identical for this new crystal form. The observed Bragg diffraction

appeared to be ordered and all reflections were predicted based on unit-cell parameters.

Furthermore, the frozen crystal diffracted X-rays to higher resolution (3.0 A resolution)

compared with those for which data were collected at room temperature (3.5 A

resolution). This may have been more a consequence of using synchrotron radiation for

data collection compared with an in-house source, rather than the change in crystal form.

Therefore, because of the possible ambiguity in Laue group assignment, the

diffraction images were re-indexed using experimental parameters derived from

processing the data set as P1 and integrated and scaled as the tetragonal P4, P422, and

the orthorhombic P222 Laue groups. This resulted in Rsym values of 0.081, 0.141 and

0.129, respectively. The predicted reflections for all three crystal system settings fitted

the observed data equally well.

Table 3-1 gives the data-processing statistics for all three Laue group assignments.

The significantly lower Rsym value indicated that the crystal system might be P4 and that

the observed pseudo-twofold crystallographic symmetry may be caused by a non-

crystallographic symmetry (NCS) dimer (LD-actin) in the crystal lattice.

Table 3-1. Data Processing Assuming P4, P422 and P222 Laue Groups

                      P4                      P422                    P222
Resolution    No.    Rsym   Comp      No.    Rsym   Comp      No.    Rsym   Comp
(Å)           Refls         (%)       Refls         (%)       Refls         (%)
50.0-6.46     2166   0.049  99.6      1253   0.1    99.5      2246   0.089  95.5
6.46-5.13     2131   0.058  99.9      1161   0.123  99.9      2177   0.11   97.5
5.13-4.48     2112   0.068  99.8      1144   0.132  100       2168   0.12   98.1
4.48-4.07     2110   0.08   100       1128   0.153  99.9      2148   0.145  98.5
4.07-3.78     2079   0.096  99.9      1111   0.164  100       2136   0.153  98.8
3.78-3.56     2139   0.119  99.9      1135   0.18   100       2154   0.17   99
3.56-3.38     2075   0.166  99.8      1101   0.23   100       2138   0.221  99.1
3.38-3.23     2095   0.213  100       1104   0.274  100       2123   0.266  99.2
3.23-3.11     2117   0.38   100       1115   0.435  100       2136   0.362  99.1
3.11-3.00     2071   0.408  99.8      1089   0.477  100       2133   0.525  99.1
Overall       21085  0.081  99.8      11341  0.141  99.9      21559  0.129  98.4

$R_{sym} = \sum |I_{obs} - I_{avg}| / \sum I_{obs}$, where the summation is over all reflections.
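
The Rsym definition in the footnote of Table 3-1 can be illustrated with a minimal sketch. The function name `r_sym` and the toy intensities are hypothetical and are not taken from the actin data; the sketch only assumes that observations have already been grouped by symmetry-equivalent reflection, as a scaling program such as SCALEPACK would do internally.

```python
def r_sym(groups):
    """Rsym = sum |I_obs - I_avg| / sum I_obs.

    `groups` is a list of lists; each inner list holds the repeated or
    symmetry-equivalent intensity measurements of one unique reflection.
    """
    numerator = 0.0
    denominator = 0.0
    for observations in groups:
        mean_intensity = sum(observations) / len(observations)
        numerator += sum(abs(i - mean_intensity) for i in observations)
        denominator += sum(observations)
    return numerator / denominator


# Hypothetical measurements (arbitrary units), for illustration only.
toy_groups = [[100.0, 110.0], [50.0, 50.0], [20.0, 26.0, 23.0]]
print(f"Rsym = {r_sym(toy_groups):.4f}")
```

Merging the same measurements under a higher Laue symmetry places more observations in each group, so Rsym rises when the extra symmetry is not truly obeyed, which is the comparison made between P4, P422 and P222 above.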

Matthews coefficients VM (Matthews, 1968) were also calculated for the three

crystal systems, assuming the molecular weight of G-actin to be 42 kDa. The VM value

for all three Laue groups was ~3.6 Å3/Da, assuming one G-actin molecule per asymmetric unit

for the crystal system P422 and two G-actin molecules per asymmetric unit for the crystal

systems P222 and P4. Therefore, neither the orthorhombic nor the tetragonal crystal

systems could be eliminated based on these calculations.

Self-rotation function calculations (Rossmann and Blow, 1963) for kappa (κ) set at

90° and 180° were performed using data between 10 and 3 Å resolution. Figure 3-3 shows

self-rotation function results for the P4 crystal system; similar results were observed for

the other two crystal systems (data not shown). For κ = 90° a single strong peak parallel

to the c axis (characteristic of a fourfold crystallographic axis) was observed, whereas for

κ = 180° twofold rotation axes were observed parallel to the a and b axes and to the

diagonal between them. The interpretation of the κ = 90° and 180° rotation-function

results taken together for all three crystal systems gave the impression that the crystal

system had a crystallographic fourfold symmetry axis parallel to the c axis with only

pseudo-422 symmetry, the actin-actin NCS dimer twofold being masked by the pseudo-

422 crystal symmetry, as the NCS twofold was diagonal to the a and b crystallographic

axes. The assignment of Laue group P4 was also in agreement with the observed Rsym

values (Table 3-1). The space group was therefore assigned as P41 or P43 based on

possible packing arrangements and inspection of the 00l reflections, which showed a

clear indication of a screw axis along the c-axis.


Figure 3-3. Self-Rotation Function Stereographic Projections of the P4 Reduced Data
Set for (A) the κ = 90° section and (B) the κ = 180° section. Calculated for data
between 10 and 3 Å resolution using the POLARRFN program (CCP4).

Structure Solution

Tetragonal P43212 crystal form. The structure determination of actin in the

tetragonal P43212 crystal form was achieved using a G-actin monomer (PDB accession

code: 1LCU, Bubb et al., 2002) as the molecular replacement search model using the CNS

software package (Brunger et al., 1998). The cross-rotation search, using data between

15 and 4 Å, yielded one unique solution (θ1 = 382.2, θ2 = 12.9, and θ3 = 315.2°), 7.65σ

above mean, consistent with one actin monomer within the crystallographic asymmetric

unit. Using this orientation matrix, the translation searches were calculated assuming the

possible space groups (P422, P4212, P4122, P41212, P4222, P42212, P4322, and P43212).

The space group P43212 gave a single clear solution (Tx = 1.7, Ty = 51.0, and Tz = -3.6 Å)

with a correlation coefficient of 0.687 and a packing value of 0.448. This model

was oriented and positioned and refined using rigid body, simulated annealing and B-

factor refinement methods within the CNS and Refmac programs using all data to 2.3 A

resolution (Brunger et al., 1998; Murshudov et al., 1997).

The final refined crystallographic Rfactor and Rfree values were 16.2 and 22.4%,

respectively (Table 3-2). The refined model of G-actin comprised 365 amino acid

residues in complex with latrunculin, ATP, two Mg2+ ions and 112 water molecules. No

electron density could be assigned to residues 41-50 (DNase I loop) and residues 221-236

had conflicting overlapped cylindrical electron density along a 2-fold crystallographic

contact where an α-helix has been reported in all actin structures solved. Therefore, this

helical region was modeled with half occupancy (0.5) given the ambiguity in the electron

density map.

Tetragonal P43 crystal form. The structure of actin in the P43 crystal form was

initially determined with the assumption that the space group was either P41 or P43, but

without the consideration that the data might be twinned. This procedure used standard

molecular-replacement protocols (Rossmann, 1990) using a G-actin monomer (PDB

accession code: 1LCU, Bubb et al., 2002) as the search model in the CNS program

package (Brunger et al., 1998). The results below are only given for the P43 space group,

as it was evident that there were no solutions assuming P41 as the space group. Cross-

rotation function calculations carried out with data between 30 and 4 Å resolution yielded

two unambiguous (NCS) solutions (solution A, θ1 = 357.3, θ2 = 10.9, and θ3 = 3.4°;

solution B, θ1 = 154.5, θ2 = 167.2, and θ3 = 151.9°).

Table 3-2. Data Collection, Processing, and Model Refinement of Three Crystal Forms

Space Group                          P212121 (a)            P43212          P43
Dimensions (Å)                       101.5, 103.1, 126.9    100.9, 103.9    101.5, 104.2
Wavelength (Å)                       1.5418                 1.5418          0.929
Resolution limit (Å)                 3.5                    2.3             3.0
Rsym (b) (%)                         11.5                   9.3             8.1
Unique Reflections                   15407                  22951           21085
Completeness (%)                     88.8                   93.9            99.8
Matthews coefficient (Å3/Da)         3.95                   3.25            3.6
Solvent Content (%)                  70.6                   60.7            64.5

Model Refinement
Rfactor (c)                          0.193                  0.162           0.193
Rfree (d)                            0.266                  0.224           0.261
Number of actin residues/monomers    375/2                  365/1           365/2
Number of ATPs                       2                      1               2
Number of Latrunculins               2                      1               2
Number of ions                       2                      2               2
Number of water molecules            70                     112             44
Average B-value (Å2)                 62.4                   38.4            63.4
Rmsd Bonds (Å)                       0.01                   0.03            0.01
Rmsd Angles (°)                      1.4                    2.9             1.5

a Previously reported actin dimer (Bubb et al., 2002).
b $R_{sym} = \sum |I_{obs} - I_{avg}| / \sum I_{obs}$, where the summation is over all reflections.
c $R_{factor} = \sum |F_{obs} - F_{calc}| / \sum F_{obs}$.
d For calculation of Rfree, 5% of reflections were reserved.
e Numbers are for the contents of the crystallographic asymmetric unit.

A translation search orienting the G-actin monomer using solution A

gave a position of Tx = -26.0, Ty = 40.1 and Tz = 85.9 Å. The combined correlation

coefficient for the two NCS-related monomers was 0.55, with a packing value of 0.46.

This solution was refined with cycles of simulated-annealing, individual B-factor and

energy-minimization procedures in CNS (Brunger et al., 1998) interspersed with rounds

of manual model building using the program O (Jones, 1978), but the Rwork and Rfree

values did not converge to less than 32 and 36%, respectively. These results called for a

re-examination of the assigned space group and data processing and the possibility that

the diffraction data were twinned.

Detection of twinning. Generally, two distinct types of twinning effects can occur

in crystals, either during crystal growth (probably not the case in this situation), where the

crystal is composed of two completely different species orientations, or when some sort

of stress is applied to a crystal and the twinning is the product of a polymorphic

transformation. An example of this induced deformation twinning is caused during the

flash-freezing of a crystal to cryogenic temperatures (the most likely event that occurred

in this case) (Yeates, 1997). During data processing, no split reflections that would

suggest epitaxial twinning were observed. This indicated that if the data were twinned it

would be a case of hemihedral twinning (a type of merohedral twinning), in which the

diffraction patterns of the twin domains are completely superimposed on each other.

The first sign of possible hemihedral twinning was the observation that the

diffraction data could be indexed, integrated and scaled with good overall Rsym values in

the P4 Laue group, but only reasonably well (especially in the lower resolution shells) in

the P422 and P222 Laue groups. Table 3-1 shows the resolution-dependence of Rsym.

The overall Rsym does not give a clear indication that the data are twinned, whereas the

Rsym values in the lower resolution shells (50.0-4.0 Å resolution) were nearly a factor of

two larger for the data scaled as P422 or P222 compared with P4. This is a useful

indicator that the data are twinned, as the apparently higher symmetry may arise from the

twin operator and therefore introduce symmetry that is not part of the Laue symmetry of

the space group. Another way to observe signs of hemihedral twinning in a data set is to

plot cumulative intensity distribution.

Figure 3-4 A shows the calculated cumulative distribution of intensity for the

reduced P4 data using the software TRUNCATE (CCP4). The plot profile for the acentric

reflections is sigmoidal in shape compared with the plot of the theoretical standard curve

distribution of untwinned data (Stanley, 1972; Yeates, 1997; Breyer et al., 1999). This

plot profile for the reduced P4 data indicates that there are far fewer weak acentric

reflections in the data set than are generally observed for untwinned data.
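
The deficit of weak reflections can be made concrete with the theoretical cumulative distributions N(z) for acentric data. The sketch below assumes ideal Wilson statistics (exponentially distributed untwinned intensities normalized to a mean of 1) and a twinned intensity that is the α-weighted sum of two independent untwinned intensities; the function names are illustrative and do not correspond to any TRUNCATE routine.

```python
import math


def n_untwinned(z):
    """Fraction of acentric reflections with I/<I> <= z for untwinned data."""
    return 1.0 - math.exp(-z)


def n_perfect_twin(z):
    """Same fraction for a perfect hemihedral twin (alpha = 0.5)."""
    return 1.0 - (1.0 + 2.0 * z) * math.exp(-2.0 * z)


def n_partial_twin(z, alpha):
    """Same fraction for a partial twin with 0 < alpha < 0.5."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie strictly between 0 and 0.5")
    major, minor = 1.0 - alpha, alpha
    weighted = major * math.exp(-z / major) - minor * math.exp(-z / minor)
    return 1.0 - weighted / (1.0 - 2.0 * alpha)


for z in (0.1, 0.2, 0.5, 1.0):
    print(f"z = {z:.1f}  untwinned = {n_untwinned(z):.3f}  "
          f"alpha 0.37 = {n_partial_twin(z, 0.37):.3f}  "
          f"perfect twin = {n_perfect_twin(z):.3f}")
```

At small z the twinned curves lie well below the untwinned curve, which is what gives the acentric N(z) plot its sigmoidal shape.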

A comparison of the cumulative intensity distribution of the reduced P4 data set to

standard untwinned data showed that there was a high degree of twinning, with an

estimated twinning fraction (α) of 0.37. This observed distribution added further

supporting evidence that the data were indeed twinned. To fully assess the twin fraction

of the observed data, two additional twinning tests were employed using the programs

CNS (Brunger et al., 1998) and DETWIN (CCP4).

For acentric perfectly twinned data, the expected ratio of the average squared

intensity to the square of the average intensity, $\langle I^2 \rangle / \langle I \rangle^2$, is 1.5, whereas for untwinned

data this value is 2.0. For the data set reduced as P4 this value was 1.69. In a similar

way, the Wilson ratio $\langle |E| \rangle^2 / \langle |E|^2 \rangle$, where E is the normalized structure factor, is expected

to give a value of 0.885 for perfectly twinned data and 0.785 for untwinned data. For the

P4 data set this value was 0.857. Both these ratios therefore gave a strong indication that

the crystal was partially hemihedrally twinned (Yeates, 1988).
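
The limiting values of 2.0 and 1.5 for the second-moment ratio can be checked numerically. This is a sketch under stated assumptions, not the calculation performed by CNS or TRUNCATE: untwinned acentric intensities are drawn from an exponential (Wilson) distribution, and a twinned intensity is modelled as (1 - α)·J1 + α·J2 for two independent untwinned intensities J1 and J2. The helper names and the sample size are arbitrary.

```python
import random


def second_moment_ratio(intensities):
    """Return <I^2> / <I>^2 for a list of intensities."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)


def simulate_intensities(alpha, n=100_000, seed=0):
    """Simulate acentric intensities for twin fraction alpha (0 to 0.5)."""
    rng = random.Random(seed)
    simulated = []
    for _ in range(n):
        j1 = rng.expovariate(1.0)  # untwinned intensity of reflection h
        j2 = rng.expovariate(1.0)  # untwinned intensity of its twin mate
        simulated.append((1.0 - alpha) * j1 + alpha * j2)
    return simulated


for alpha in (0.0, 0.25, 0.5):
    ratio = second_moment_ratio(simulate_intensities(alpha))
    print(f"alpha = {alpha:.2f}  <I^2>/<I>^2 = {ratio:.2f}")
```

Intermediate twin fractions give ratios between the two limits, so an observed value between 1.5 and 2.0 is consistent with partial twinning.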

The twin fraction (α) was also determined using two other methods. A plot (Fig. 3-4)

of the cumulative distribution of H, where $H = |I_{twin,h1} - I_{twin,h2}| / (I_{twin,h1} + I_{twin,h2})$

and $I_{twin,h1}$ and $I_{twin,h2}$ are the observed intensities of two acentric reflections related

by a twin law (in this case, the point-group P4 relationship where h,k,l is related to h,-k,-l),

allows α to be estimated directly from the relationships $\langle H \rangle = 1/2 - \alpha$ and

$\langle H^2 \rangle = (1 - 2\alpha)^2/3$ (Yeates, 1988).
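
The two moment relations of Yeates (1988), <H> = 1/2 - α and <H²> = (1 - 2α)²/3, can be turned into a short estimator. The following is a minimal sketch on simulated, error-free data rather than the DETWIN implementation; the function names, the simulated twin fraction of 0.37 and the sample size are assumptions made for illustration.

```python
import random


def h_values(pairs):
    """H = |I1 - I2| / (I1 + I2) for each pair of twin-related intensities."""
    return [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0.0]


def alpha_from_mean_h(h):
    """Invert <H> = 1/2 - alpha."""
    return 0.5 - sum(h) / len(h)


def alpha_from_mean_h2(h):
    """Invert <H^2> = (1 - 2*alpha)^2 / 3."""
    mean_h2 = sum(x * x for x in h) / len(h)
    return 0.5 * (1.0 - (3.0 * mean_h2) ** 0.5)


def simulate_twin_pairs(alpha, n=50_000, seed=1):
    """Simulate twin-related acentric intensity pairs for twin fraction alpha."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        j1, j2 = rng.expovariate(1.0), rng.expovariate(1.0)
        pairs.append(((1.0 - alpha) * j1 + alpha * j2,
                      alpha * j1 + (1.0 - alpha) * j2))
    return pairs


h = h_values(simulate_twin_pairs(0.37))
print(f"alpha from <H>   = {alpha_from_mean_h(h):.3f}")
print(f"alpha from <H^2> = {alpha_from_mean_h2(h):.3f}")
```

With real data, measurement errors and weak reflections bias H, so programs typically restrict the calculation to a resolution range and an intensity cutoff; none of that is modelled here.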

Similarly, a plot (Fig. 3-4C) of the number of negative reflections accumulated

after detwinning a data set against α (Britton plot; Britton, 1972) can be used to

extrapolate the value of α for a data set. The mean value of α was determined using the

software DETWIN (CCP4) and was directly calculated from the H and the Britton plots to

be 0.37 and 0.39, respectively. Based on the determined values for the twin fraction, the

P4 data were assigned a mean twinning fraction of 0.376, applying the twin operator

(h, -k, -l).
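
The logic of the Britton plot can be sketched in a few lines: twin-related intensities are detwinned with a trial α, and the number of negative detwinned intensities is counted. The example below uses simulated, error-free data with an assumed true twin fraction of 0.37; the function names are hypothetical and the sketch is not the DETWIN algorithm itself.

```python
import random


def detwin(i1, i2, alpha):
    """Recover untwinned intensities from a twin-related pair (alpha < 0.5)."""
    scale = 1.0 - 2.0 * alpha
    j1 = ((1.0 - alpha) * i1 - alpha * i2) / scale
    j2 = ((1.0 - alpha) * i2 - alpha * i1) / scale
    return j1, j2


def count_negatives(pairs, alpha):
    """Number of negative intensities after detwinning with a trial alpha."""
    negatives = 0
    for i1, i2 in pairs:
        j1, j2 = detwin(i1, i2, alpha)
        negatives += (j1 < 0.0) + (j2 < 0.0)
    return negatives


def simulate_twin_pairs(alpha, n=20_000, seed=2):
    """Simulate twin-related acentric intensity pairs for twin fraction alpha."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        j1, j2 = rng.expovariate(1.0), rng.expovariate(1.0)
        pairs.append(((1.0 - alpha) * j1 + alpha * j2,
                      alpha * j1 + (1.0 - alpha) * j2))
    return pairs


pairs = simulate_twin_pairs(0.37)
for trial in (0.30, 0.35, 0.40, 0.45):
    print(f"trial alpha = {trial:.2f}  negatives = {count_negatives(pairs, trial)}")
```

In this idealized case the count stays at zero until the trial value exceeds the true twin fraction and then rises; extrapolating the rising part back to zero gives the estimate of α.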

Application of twinning fraction and operator. The previously described P43

structure solution (which converged to an Rwork of 32% without taking into account that

the data were twinned) was further refined within the CNS software package using the

twinning fraction and the twinning operator (h,-k,-l). The self-rotation function was re-

interpreted and the two twofold symmetry operators seen along the a and c axes were

found to be a function of the twin operator (Figure 3-3). After one cycle each of rigid-

body, individual B-factor and energy-minimization refinement (allowing the two NCS-

related actin molecules to be refined independently of each other) the twinned Rwork and

Rfree values decreased to 25.2 and 26.3%, respectively. The improvement of these

parameters was concomitant with significant improvement in the quality of the calculated

electron-density maps.

On inspection of 2Fo-Fc and Fo-Fc electron-density maps, some actin side chains

were repositioned and the ATP and latrunculin A molecules and an Mg2+ ion were

identified and positioned using the molecular-graphics program O (Jones, 1978). The

ATP and the latrunculin A molecular geometries were obtained from the HIC-UP

database (Kleywegt and Jones, 1998). Refinement and model building continued for

several more cycles until convergence of the Rwork and Rfree values. In the final stage of

refinement, solvent molecules were included in positive electron density at the 1.5σ level

using the automatic water-picking protocol in the CNS (Brunger et al., 1998) program.

The final refined structure contained the LD-actin, each monomer consisting of 365

amino acid residues (ten amino acids in the DNase I loop were disordered), ATP, latrunculin A,

a Mg2+ ion and 44 ordered water molecules, with a final R factor of 0.193 and Rfree of

0.261 (Table 3-2).


Crystal Lattice Rearrangements

Tetragonal P43212 crystal form. This crystal was initially the orthorhombic

P212121 form, with unit cell dimensions a = 101.5, b = 103.1, and c = 126.9 A, consistent

with that previously reported (Bubb et al., 2002). The crystal lattice rearrangement to the

P43212 form, with unit cell dimensions a = 100.9 and c = 103.9 A was only observed

after the 6 day delay in data collection. This lattice rearrangement, presumably due to

water loss, was the most prominent feature observed in the continuation of data


collection. This effect improved the resolution of diffraction from 3.5 to 2.3 Å and

decreased the mosaic spread of the Bragg reflections from 0.9° to 0.4°.

Figure 3-4. Estimation of the Twinning Fraction (α) of the Reduced P4 Data Set. (A)
Cumulative intensity N(z) distribution plot, where N(z) is the percentage of
reflections with I/<I> less than or equal to z. Shown is the distribution for
centric and acentric (Itwinned) reflections for the P4 reduced data set and (Iuntwinned)
the theoretical distribution of untwinned data. (B) Estimation of α by
plotting the cumulative fractional intensity difference of acentric twin-related
intensities (Yeates, 1988). (C) Estimation of α by Britton plots (Britton,
1972; Fisher and Sweet, 1980). The number of negative intensities after
detwinning is plotted as a function of the assumed twinning fraction. An
overestimation of α will increase the number of negative intensities and
the actual value of α is extrapolated from this increase (dotted lines). Figure
(A) was generated using the program TRUNCATE and (B) and (C) using

The complete data set contained 22 951 unique reflections (93.9% complete),

resulting in an Rsym of 9.3%. Table 3-2 gives a summary of the data collection statistics.

Figure 3-2 shows the characteristics of the measured Bragg reflections between the

orthorhombic P212121 and tetragonal P43212 systems. The diffraction pattern of the

orthorhombic crystal showed a large solvent ring and poor diffusely scattering data,

whereas the reflection peak profiles for the tetragonal P43212 crystal exhibited sharp

boundaries (Figure 3-2).

The largest variation in the unit cell dimensions between the orthorhombic and this

crystal form was a decrease of the c-axis dimension from 126.9 to 103.9 Å (18%

reduction). Also, the solvent fraction and consequently the Matthews coefficient

(Matthews, 1968) changed from 70.6% and 3.95 Å3/Da for the orthorhombic form (with

two monomers per asymmetric unit) to 60.7% and 3.25 Å3/Da for the tetragonal form

(with one monomer per asymmetric unit) (Table 3-2). This was consistent with an

overall volume decrease of 2.7 x 10^5 Å3 from the orthorhombic to this tetragonal form.
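
The quoted cell-volume decrease and c-axis reduction follow directly from the unit cell parameters, as the short calculation below shows. It is a sketch that assumes all cell angles are 90° (true for both lattices), the 42 kDa molecular weight for G-actin stated earlier, and four symmetry-equivalent positions in P212121; the function names are illustrative only.

```python
def cell_volume(a, b, c):
    """Volume of a unit cell with all angles equal to 90 degrees (Å^3)."""
    return a * b * c


def matthews_coefficient(volume, n_molecules, molecular_weight):
    """V_M = cell volume / (molecules per cell x molecular weight), in Å^3/Da."""
    return volume / (n_molecules * molecular_weight)


orthorhombic = cell_volume(101.5, 103.1, 126.9)   # P212121 cell
tetragonal = cell_volume(100.9, 100.9, 103.9)     # P43212 cell (a = b)

volume_decrease = orthorhombic - tetragonal
c_axis_reduction = (126.9 - 103.9) / 126.9 * 100.0

# P212121: 4 asymmetric units per cell, 2 monomers each, 42 kDa per monomer.
vm_orthorhombic = matthews_coefficient(orthorhombic, 4 * 2, 42_000.0)

print(f"volume decrease  = {volume_decrease:.3g} Å^3")
print(f"c-axis reduction = {c_axis_reduction:.1f} %")
print(f"V_M (P212121)    = {vm_orthorhombic:.2f} Å^3/Da")
```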

Tetragonal P43 crystal form. The most significant difference between the

orthorhombic P212121 and the tetragonal P43 crystal systems was a shrinkage of the c-axis

by ~23 Å (Fig 3-5). The shortening of this unit-cell parameter and the space-group

transformation were presumably mediated by dehydration owing to exposure of the

crystal to cryoprotectant. The solvent content in this crystal form was 64.5%, down from

70.6% in the orthorhombic system. Dehydration events when subjecting crystals to high

concentrations of glycerol have been documented and indeed have been used to gain

higher resolution diffraction (Heras et al., 2003). Figure 3-5 shows the packing

arrangement of the actin molecules in both crystal forms. The lattice transformation is

shown from the pseudo-fourfold axis of the orthorhombic system (Figure 3-5 A) to the

crystallographic fourfold axis of the tetragonal form (Figure 3-5 B).

Figure 3-5. Actin Packing in the P212121 and P43 Crystal Lattices. Crystal lattice
packing diagrams viewed (A and B) down the c-axes and (C and D) down
the a-axes of the P212121 and P43 crystal forms, respectively. Shown are four
actin anti-parallel dimers (LD-actin). The LD-actin molecules are depicted
as either orange and yellow or blue and cyan pairs. (A and B) Note the
pseudo 2-fold and 4-fold symmetry in the P212121 and P43 crystal forms
(crystallographic and pseudo symmetry shown in black solid and gray
dotted lines, respectively). Also shown is the direction of the non-
crystallographic symmetry axis within one of the LD-actin molecules. (C
and D) The open ellipsoids show the relative compression of the c-axis in
the P43 compared to the P212121 crystal form. The LD-actin molecules slide
over each other, decreasing the c-axis by ~23 Å.

While the molecular packing looks similar down the c axis, Figures 3-5 and 3-6

show the extent of the lattice collapse along the c axis within the tetragonal crystal and

the new interactions that have been formed. The major consequence of the c-axis

collapse is the formation of a new actin-dimer interface in the P43 crystal lattice between

residues 221-236 of symmetry-related actin monomers (Figure 3-5 D). This new crystal

interface involves extensive hydrogen-bonding interactions that may stabilize the P43

crystal lattice and could be the possible cause of the increased resolution of diffraction of

the tetragonal crystal form (3.0 Å) compared with that of the orthorhombic crystal form

(3.5 Å).


dimer (Lower dimer, LD, orange and blue) and actin monomer (red)
forming a trimer, shown in relation to other symmetry related actin
molecules (gray). Note the collapse of the solvent channel in the z-direction
of the (B) P43212 and (C) P43 crystal forms, compared to the (A) P212121
crystal form. Also indicated by open black circles are the crystal contacts
(type IIa: LD, IIb: secondary anti-parallel dimer, and III: trimer contact) as
described in the text.

The observed lattice rearrangements were either caused by or were the cause of a

more subtle difference between the LD-actins that exist in both crystal forms. Bubb et

al. (2002) showed that within the orthorhombic crystal system, the so-called DNase I

loop (residues 41-52) from subdomain II of the G-actin monomer makes contacts with a

symmetry-related LD-actin interface. The residues making this interaction are well

ordered and form a trimer between the LD-actin and another actin monomer,

held together mainly by electrostatic interactions. This interaction occurs with only one

monomer of each of the LD-actins, the other monomer's DNase I loop being disordered.

An examination of the actin-molecule structures in the tetragonal crystal form shows that

the DNase I loop is not ordered in either LD-actin monomer and, as such, there are no clear

contacts forming a trimer.

Least-squares superimposition in the program O (Jones, 1978) of one monomer of

the LD-actin from the orthorhombic crystal onto the corresponding monomer of this

dimer from the tetragonal crystal revealed an r.m.s. deviation of ~1.5 Å for the Cα atoms

(treating each monomer as a rigid body). This superimposition of the LD-actins from

both crystal forms also highlighted the relative differences in rotation and translation

between the actin monomers that form this actin antiparallel dimer (Fig. 3-7). The

transformation that describes this difference in the actin-actin dimer formations is:

$$
\begin{pmatrix}
 0.9582 & -0.1012 & 0.2678 \\
 0.0476 &  0.9787 & 0.1995 \\
-0.2823 &  0.1784 & 0.9426
\end{pmatrix}
+
\begin{pmatrix}
-1.2489 \\
-7.2288 \\
 4.7836
\end{pmatrix}
$$
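
The size of the rigid-body movement encoded in the rotation matrix and translation vector quoted above can be recovered with a short calculation. The sketch assumes the usual convention x' = R·x + t with coordinates in Å, which the text does not state explicitly; the function names are illustrative.

```python
import math

# Rotation matrix and translation vector as quoted in the text.
ROTATION = [
    [0.9582, -0.1012, 0.2678],
    [0.0476, 0.9787, 0.1995],
    [-0.2823, 0.1784, 0.9426],
]
TRANSLATION = [-1.2489, -7.2288, 4.7836]


def rotation_angle_deg(matrix):
    """Rotation angle from the trace: cos(theta) = (trace - 1) / 2."""
    trace = matrix[0][0] + matrix[1][1] + matrix[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def transform(point, matrix=ROTATION, shift=TRANSLATION):
    """Apply x' = R.x + t to a 3D point (assumed convention)."""
    return [
        sum(matrix[row][col] * point[col] for col in range(3)) + shift[row]
        for row in range(3)
    ]


translation_length = math.sqrt(sum(component ** 2 for component in TRANSLATION))
print(f"rotation angle     = {rotation_angle_deg(ROTATION):.1f} degrees")
print(f"translation length = {translation_length:.1f} Å")
```

The rotation angle obtained this way is close to 20°, in line with the relative rotation of the upper monomer noted in the legend of Figure 3-7.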

This movement is attributed to the flexibility around the disulfide bond and the

subsequent cleavage of this bridge between the monomers that form the LD-actin (Figure

3-7). It is still unclear as to what has caused the shearing of this disulfide bond that forms

the LD interaction. Studies of radiation damage caused by synchrotron data collection

have shown that with intense X-ray beams highly specific features such as cleavage of

disulfide bonds, loss of density for some ionizable side chains and an overall increase in

atomic B factors can occur (Burmeister, 2000; Leiros et al., 2001). Figure 3-8 (A and B)

shows the two LD-actin interfaces found in the orthorhombic and tetragonal space groups.



Figure 3-7. Actin Anti-Parallel Dimer Superimposition. Structural overlay of the LD-
actin in the P43 (cyan) and P212121 (blue) crystal forms. Viewed (A) about
the non-crystallographic 2-fold and (B) after a 90° rotation. Only one actin
monomer (bottom monomer in A and B) of each LD-actin molecule is
superimposed (r.m.s.d. 1.5 Å). Note the relative ~20° clockwise rotation
within the LD-actin pairs (upper monomer in B) between the P212121 and
P43 crystal forms. Close-up stereo view of the LD-actin interface in the (C)
P212121 and (D) P43 crystal forms, as depicted in (A). Note the break of the
disulfide bond in the P43 crystal form.

The disulfide bridge between the monomers in the LD-actin for the orthorhombic form is

2.03 Å (Figure 3-8 A), whereas in the twinned-tetragonal form the cysteine residues are

6.5 Å apart (Figure 3-8 B). This breakage of the disulfide link appears to have allowed

the displacement of one monomer within the LD-actin in the tetragonal crystal form.



Figure 3-8. Close-Up View of the (A) P212121 and (B) P43 Actin Anti-Parallel Dimer
Interface, depicted as viewed in the open rectangle drawn in Fig. 3-5(A).
Residues Lys373, Cys374 and Phe375 are shown as a ball-and-stick
representation. Electron density (red) is shown as a 2Fo - Fc map contoured
at 2σ for residue Cys374. Note the break of the disulfide bond in the P43
crystal form.

Structure of G-actin

Tetragonal P43212 and P43 crystal forms compared to the orthorhombic

P212121 form. The tertiary structure of G-actin for both the P43212 and P43 crystal forms

was as previously described in the literature (Bubb et al., 2002; Kabsch et al., 1990;

Morton et al., 2000). A structural superimposition of the five independent actin

monomers from the three space groups gave a mean root mean square (rms) deviation of

1.3 Å. The G-actin structure consists of four subdomains (subdomain 1: residues 1-32,

70-144, and 338-372; subdomain 2: residues 33-69; subdomain 3: residues 145-180 and

270-337; and subdomain 4: residues 181-269) (Figure 3-9 A). In the P43212 and P43

structures the DNase I loop (residues 41-50) was disordered.
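
The subdomain boundaries listed above can be written as a small lookup table, which makes it easy to check where a given residue falls. This is a minimal sketch: the residue ranges are taken from the text, while the names `SUBDOMAINS` and `subdomain_of` are hypothetical and introduced only for illustration.

```python
# Inclusive residue ranges for the four G-actin subdomains, as listed in the text.
SUBDOMAINS = {
    1: [(1, 32), (70, 144), (338, 372)],
    2: [(33, 69)],
    3: [(145, 180), (270, 337)],
    4: [(181, 269)],
}


def subdomain_of(residue):
    """Return the subdomain number for a residue, or None if it is outside all ranges."""
    for number, ranges in SUBDOMAINS.items():
        if any(start <= residue <= end for start, end in ranges):
            return number
    return None


# The DNase I loop (residues 41-50) lies entirely within subdomain 2.
dnase_loop = {subdomain_of(r) for r in range(41, 51)}
print("DNase I loop subdomains:", sorted(dnase_loop))

# Residues beyond 372 (for example Cys374) fall outside the listed ranges.
print("Cys374 subdomain:", subdomain_of(374))
```

Returning `None` for unassigned residues, rather than raising an error, keeps the C-terminal residues that are not covered by the listed ranges easy to flag.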

This loop has previously been shown to be the most conformationally

variable region of actin (Bubb et al., 2002; Kabsch et al., 1990; Otterbein et al., 2001).

The observed average B factor of the actin structure in the P43212 crystal form is 38.4 Å²,

roughly 60% of that of actin in the P212121 crystal form (62.4 Å²), indicating a greater degree of

structural order in the P43212 crystal form (Table 3-2). The P43 crystal form, on the other

hand, has a high overall B factor of 63.4 Å², possibly indicative of ionization damage

caused by synchrotron data collection (Leiros et al., 2001).
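
To give the B factors quoted above a physical scale, an isotropic B factor can be converted to a root-mean-square atomic displacement through the standard relation B = 8π²⟨u²⟩. The sketch below applies that relation to the three average values from the text. The displacements it prints are derived illustrations under the isotropic assumption, not results reported in this work, and the function names are hypothetical.

```python
import math

# Average B factors (Å^2) quoted in the text for the three crystal forms.
B_FACTORS = {"P43212": 38.4, "P212121": 62.4, "P43": 63.4}


def rms_displacement(b_factor):
    """RMS displacement (Å) implied by an isotropic B factor, from B = 8*pi^2*<u^2>."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


def b_ratio(form_a, form_b):
    """Ratio of the average B factors of two crystal forms."""
    return B_FACTORS[form_a] / B_FACTORS[form_b]


for form, b in B_FACTORS.items():
    print(f"{form}: B = {b:.1f} Å^2, rms displacement = {rms_displacement(b):.2f} Å")

print(f"P43212 / P212121 B-factor ratio: {b_ratio('P43212', 'P212121'):.2f}")
```

Because the displacement scales with the square root of B, differences between crystal forms look smaller in Å than they do in Å².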

Crystal Contacts

Comparison of the crystal packing of the actin molecules in the three

crystal forms reveals distinct differences. Three major actin-actin

interactions were compared. These include the previously reported LD

interaction involving the disulfide bond between Cys374 of symmetry-related monomers

(Figure 3-10, Type IIA) (Bubb et al., 2002); a newly created dimer contact in the

tetragonal forms between symmetry-related helices of subdomain 4 (residues 221-236)

(Figure 3-9, Type IIB); and a trimer interaction between the DNase I loop of an actin

molecule bound into the cleft created by the LD formation (Figure 3-9, Type III) are also


The LD (Type IIA) actin-actin contact is generated by an NCS operator in the P212121 and P43

crystal forms, whereas it is generated by a crystallographic twofold in the P43212 crystal form (Figures

3-10 A-C).
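
Whether a twofold is crystallographic or non-crystallographic, its rotational part is a 180° rotation, so applying it twice returns every atom to its starting position. The sketch below illustrates this property only. The axis direction and atom position are hypothetical, the axis is assumed to pass through the origin, and the translational component that a real symmetry operator may carry is ignored.

```python
def twofold_matrix(axis):
    """3x3 matrix for a 180° rotation about `axis`, using R = 2*n*n^T - I."""
    norm = sum(c * c for c in axis) ** 0.5
    n = [c / norm for c in axis]
    return [
        [2.0 * n[i] * n[j] - (1.0 if i == j else 0.0) for j in range(3)]
        for i in range(3)
    ]


def apply(matrix, point):
    """Rotate a point (x, y, z) by a 3x3 matrix."""
    return tuple(sum(matrix[i][j] * point[j] for j in range(3)) for i in range(3))


# Hypothetical twofold axis direction and atom position (Å), for illustration only.
axis = (1.0, 1.0, 0.0)
atom = (3.0, -1.0, 2.5)

R = twofold_matrix(axis)
mate = apply(R, atom)   # position of the twofold-related atom
back = apply(R, mate)   # applying the twofold again recovers the original atom

print("symmetry mate:", mate)
print("after second application:", back)
```

In practice the distinction drawn in the text is whether such an operator belongs to the space group (crystallographic) or relates only the molecules within one asymmetric unit (NCS). This sketch does not test for that.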