Group Title: BMC Structural Biology
Title: A Model of tripeptidyl-peptidase I (CLN2), a ubiquitous and highly conserved member of the sedolisin family of serine-carboxyl peptidases
Full Citation
Permanent Link:
 Material Information
Title: A Model of tripeptidyl-peptidase I (CLN2), a ubiquitous and highly conserved member of the sedolisin family of serine-carboxyl peptidases
Physical Description: Book
Language: English
Creator: Wlodawer, Alexander
Durell, Stewart
Li, Mi
Oyama, Hiroshi
Oda, Kohei
Dunn, Ben
Publisher: BMC Structural Biology
Publication Date: 2003
Abstract: BACKGROUND: Tripeptidyl-peptidase I, also known as CLN2, is a member of the family of sedolisins (serine-carboxyl peptidases). In humans, defects in expression of this enzyme lead to a fatal neurodegenerative disease, classical late-infantile neuronal ceroid lipofuscinosis. Similar enzymes have been found in the genomic sequences of several species, but neither systematic analyses of their distribution nor modeling of their structures have been previously attempted. RESULTS: We have analyzed the presence of orthologs of human CLN2 in the genomic sequences of a number of eukaryotic species. Enzymes with sequences sharing over 80% identity have been found in the genomes of macaque, mouse, rat, dog, and cow. Closely related, although clearly distinct, enzymes are present in fish (fugu and zebrafish), as well as in frogs (Xenopus tropicalis). A three-dimensional model of human CLN2 was built based mainly on the homology with Pseudomonas sp. 101 sedolisin. CONCLUSION: CLN2 is very highly conserved and widely distributed among higher organisms and may play an important role in their life cycles. The model presented here indicates a very open and accessible active site that is almost completely conserved among all known CLN2 enzymes. This result is somewhat surprising for a tripeptidase, where the presence of a more constrained binding pocket was anticipated. This structural model should be useful in the search for the physiological substrates of these enzymes and in the design of more specific inhibitors of CLN2.
General Note: Start page 8
General Note: M3: 10.1186/1472-6807-3-8
 Record Information
Bibliographic ID: UF00100053
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: Open Access:
Resource Identifier: issn - 1472-6807




BMC Structural Biology

Research article

A model of tripeptidyl-peptidase I (CLN2), a ubiquitous and highly
conserved member of the sedolisin family of serine-carboxyl
peptidases
Alexander Wlodawer*1, Stewart R Durell2, Mi Li1,3, Hiroshi Oyama4,
Kohei Oda4 and Ben M Dunn5

Address: 1Protein Structure Section, Macromolecular Crystallography Laboratory, National Cancer Institute at Frederick, Frederick, MD 21702,
USA, 2Laboratory of Experimental and Computational Biology, National Cancer Institute, Bethesda, MD 20892, USA, 3Basic Research Program,
SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA, 4Department of Applied Biology, Faculty of Textile Science,
Kyoto Institute of Technology, Sakyo-ku, Kyoto 606-8585, Japan and 5Department of Biochemistry and Molecular Biology, University of Florida,
Gainesville, Florida 32610, USA
Email: Alexander Wlodawer*; Stewart R Durell; Mi Li;
Hiroshi Oyama; Kohei Oda; Ben M Dunn
* Corresponding author

Published: 11 November 2003
BMC Structural Biology 2003, 3:8

Received: 23 July 2003
Accepted: 11 November 2003

This article is available from:
2003 Wlodawer et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all
media for any purpose, provided this notice is preserved along with the article's original URL.

Background: Tripeptidyl-peptidase I, also known as CLN2, is a member of the family of sedolisins
(serine-carboxyl peptidases). In humans, defects in expression of this enzyme lead to a fatal
neurodegenerative disease, classical late-infantile neuronal ceroid lipofuscinosis. Similar enzymes
have been found in the genomic sequences of several species, but neither systematic analyses of
their distribution nor modeling of their structures have been previously attempted.
Results: We have analyzed the presence of orthologs of human CLN2 in the genomic sequences
of a number of eukaryotic species. Enzymes with sequences sharing over 80% identity have been
found in the genomes of macaque, mouse, rat, dog, and cow. Closely related, although clearly
distinct, enzymes are present in fish (fugu and zebrafish), as well as in frogs (Xenopus tropicalis). A
three-dimensional model of human CLN2 was built based mainly on the homology with
Pseudomonas sp. 101 sedolisin.
Conclusion: CLN2 is very highly conserved and widely distributed among higher organisms and
may play an important role in their life cycles. The model presented here indicates a very open and
accessible active site that is almost completely conserved among all known CLN2 enzymes. This
result is somewhat surprising for a tripeptidase, where the presence of a more constrained binding
pocket was anticipated. This structural model should be useful in the search for the physiological
substrates of these enzymes and in the design of more specific inhibitors of CLN2.

Although the existence of tripeptidyl-peptidase I (TPP-I)
was first noted over 40 years ago [1], the structural and
mechanistic basis of its activity has been largely misunderstood
until quite recently. The situation changed after it
was shown that TPP-I is identical to an independently
characterized enzyme named CLN2. It was also demon-
strated that mutations leading to abolishment of the

Page 1 of 10
(page number not for citation purposes)



enzymatic activity of CLN2 were the direct cause of a fatal
inherited neurodegenerative disease, classical late-infan-
tile neuronal ceroid lipofuscinosis [2]. This important
observation was followed by the identification of CLN2 as
a serine peptidase [3,4], without, however, specifying its
structural fold and the details of the catalytic site. More
accurate placement of CLN2 within the context of a family
of related enzymes became possible only after high-reso-
lution crystal structures of two bacterial enzymes with a
limited sequence similarity to CLN2, sedolisin and kuma-
molisin, became available [5-7]. These structures defined
a novel family of enzymes, now called sedolisins or ser-
ine-carboxyl peptidases, that is characterized by the utili-
zation of a fully conserved catalytic triad (Ser, Glu, Asp)
and by the presence of an Asp in the oxyanion hole [8].
Sedolisin and its several variants (e.g., kumamolisin,
aorsin [9], and physarolisin [10]) have now been found in
archaea, bacteria, fungi and amoebae, whereas the higher
organisms seem to contain only variants of CLN2 [8]. The
physiological role of sedolisins in the lower organisms has
not yet been elucidated.

Despite the potential medical importance of CLN2 and
related enzymes, no systematic studies of their genomic
distribution have been published to date. There are also
no published reports of the crystallization of this enzyme.
In the absence of an experimental structure obtained by
crystallography or NMR it is sometimes necessary to resort
to molecular modeling in order to provide a structural
basis for the explanation of the biological properties of an
enzyme, and, in particular, to initiate design of its inhibi-
tors. Examples of such very successful and useful mode-
ling efforts are provided by HIV protease [11], or very
recently by the peptidase from a coronavirus involved in
the severe acute respiratory syndrome [12], among others.
We have now applied the tools of molecular homology
modeling to predicting a structure of CLN2 that could be
used as a basis for a search for the biological substrates of
this family of enzymes and for the design of specific inhibitors.

Results and discussion
Sequence comparisons of the CLN2-like enzymes
Mammalian enzymes homologous to human CLN2 [2,4]
form a subfamily of sedolisins with highly conserved
sequences (Figure 1). These enzymes are expressed with a
prosegment consisting of 195 residues that is cleaved off
during maturation, yielding the active catalytic domain.
Complete sequences are available for CLN2 from six spe-
cies in which it has been found so far (human, macaque,
dog, mouse, rat, and cow). The full-length enzymes con-
sist of 563 amino acids arranged in a single polypeptide
chain containing both the prosegment and the catalytic
domain, with the exception of mouse CLN2 that has a single
deletion in the prosegment. The overall sequence identity
for these enzymes is 81%, whereas the similarity is
92%. A pairwise comparison of the human and mouse
enzymes yields 88% identity and 94% similarity, consid-
erably higher than the median 78.5% identity reported for
all identified mouse-human orthologs [13]. Thus, mam-
malian CLN2 appears to be a highly conserved enzyme.
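
The identity and similarity percentages quoted above can be illustrated with a minimal sketch. It assumes two sequences that are already aligned (gaps written as "-") and a deliberately simple grouping of similar residues. The grouping, the function name, and the toy sequences are illustrative assumptions, not the scoring scheme used to obtain the values reported here.

```python
# Hypothetical grouping of residues treated as "similar" (an assumption for
# illustration only; real comparisons normally use a substitution matrix).
SIMILAR_GROUPS = ["ST", "DE", "NQ", "KRH", "ILVM", "FYW", "AG"]


def percent_identity_similarity(seq_a, seq_b):
    """Return (identity %, similarity %) for two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns in which either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned residue pairs")
    identical = sum(a == b for a, b in pairs)
    similar = sum(
        a == b or any(a in group and b in group for group in SIMILAR_GROUPS)
        for a, b in pairs
    )
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)


# Toy example: the signature motif SGTSAS against a hypothetical variant.
identity, similarity = percent_identity_similarity("SGTSAS", "SGSSAT")
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

Similarity is always at least as high as identity under this definition, which is consistent with the paired values quoted in the text (for example 88% identity and 94% similarity for human and mouse).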

In addition to the mammals, CLN2-like enzymes are also
found in fugu (puffer fish; unannotated record
SINFRUP00000077297 in the NCBI fugu sequence database)
and zebrafish (contig wz4596.2 in the zebrafish EST database).
Only a fragment of the sequence of the latter enzyme
agreed with the former. However, a few comparatively
minor modifications can bring the zebrafish sequence
into good agreement with that of the fugu CLN2. These
modifications include a deletion of a single nucleotide
from a run of three, as well as three insertions of repeated
nucleotide pairs (Figure 2). It must be stressed that these
modifications are speculative and may lead to prediction
of several incorrect amino acids; however, they bring the
two sequences into good global agreement (69% identi-
ties and 83% similarities).
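
Why a single inserted or deleted nucleotide matters so much can be shown with a short sketch: losing one base shifts the reading frame, so every downstream codon is read differently. The code below uses the standard genetic code and the final gene row of Figure 2 as it appears in the extracted text; the choice of which base to delete is a hypothetical illustration and does not reproduce the actual corrections made to the zebrafish contig.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame 0; trailing bases that do not fill a codon are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Last gene row of Figure 2 (corrected zebrafish sequence).
corrected = "gattatcaggttgatgctgttagagcgtatctgaagagtgttcagtctctt"
print(translate(corrected))  # DYQVDAVRAYLKSVQSL

# Hypothetical single-base deletion after the third codon: the first three
# residues are unchanged, everything downstream is read in a new frame.
shifted = corrected[:9] + corrected[10:]
print(translate(shifted))
```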

The available amino acid sequence of the fugu CLN2 ana-
log, named by us sedolisin-TPP [8], is also in good agree-
ment with the sequences of the mammalian orthologs
(Figure 3). The only major difference in the translated
amino acid sequence compared to the mammalian and
zebrafish enzymes is in the amino terminus of the
propeptide region that is shorter by 30 amino acids (not
shown). It is very likely that this represents a fault in the
assembled sequence rather than a real variation, since the
current coding frame is not initiated with a methionine,
and a few extra residues are present in the full genomic
sequence available from the fugu sequencing consortium.

CLN2 is present not only in fish, but also in amphibians,
in particular in Xenopus tropicalis (a species of frog). A
partial sequence of its sedolisin-TPP (AL594774) found in
the EST database spans the middle part of
the catalytic domain, without reaching the part of the
active site closer to the N terminus that contains the aspar-
tic and glutamic acids that belong to the catalytic triad.
However, the sequenced part of the enzyme shows 75%
identity with the fugu sedolisin-TPP, and 69% identity
with human CLN2 (Figure 3). Sequence similarity to bac-
terial or fungal sedolisins is much lower, indicating that
the enzyme found in frogs might also share the functional
properties of the CLN2 subfamily.


Figure 1
Sequence comparisons of mammalian CLN2-like enzymes. These sequences correspond to the complete enzymes,
including the prosegment. Residues forming the active site are shown in yellow on red background, other conserved residues
identified as important for the stability of the enzyme are marked with yellow background, residues identical in at least 5 of the
structures are green, and residues similar in their character are shown in magenta. The maturation cleavage point generating
the N terminus of the active enzyme is marked with black triangles.


gene aaagaaagaaatacctttaggcccaagttttgctgcttccagtccgtatgtg
orig K E R N T F R P K F C C F QS V C A

gene ccaactgtgggggggacttcttttaaaaccccttcaactc:cacctatgag
orig N C GG D F F N P F N S P M R

gene gtacagatt taatcaccggagggg-, cttaattatgtgtttccaatgccg
orig Y R L I T G G A L C V S N A G

gene gattatcaggttgatgctgttagagcgtatctgaagagtgttcagtctctt
orig L S G C C S V S E E C S V S
415D Y Q V D A V R A Y L K S V Q S L
Figure 2
Corrected gene sequence of the zebrafish CLN2. This putative sequence shows the manual corrections that bring it into
alignment with the sequence of the fugu enzyme. Inserted nucleotides are marked in green and a deleted one in red.

The discovery of highly conserved CLN2-like enzymes not
only in mammals but also in two fish species and one of
frogs may indicate that these peptidases are universally
present in the vertebrates, and that their important role
identified in humans [2] and mice [14] might be a more
general feature.

Modeling of the structure of human CLN2
The medical importance of CLN2 and the lack of a crystal
structure inspired attempts at protein modeling. The first
such model assumed that this enzyme is membrane-
bound, with the sequence 271-294 (numbering corre-
sponding to the mature enzyme, Figure 1) forming the
putative transmembrane anchor [15]. However, in view of
the subsequently obtained structures of the fully water-
soluble sedolisins, this model was clearly incorrect.
Exploiting the sequence similarity between CLN2, sedoli-
sin, and kumamolisin (Figure 4), we have now used the
experimentally obtained structures of the latter two
enzymes to form a new, homology-derived model of
human CLN2. The primary basis of the homology model
was the structure of a complex of sedolisin with a cova-
lently-bound inhibitor, pseudo-iodotyrostatin. Although
it has not been directly shown that this compound can
inhibit human CLN2, other similar peptides with an alde-
hyde functionality on their "C-termini" are weak, but
detectable, inhibitors of this enzyme (Oda, unpublished).
It is thus likely that pseudo-iodotyrostatin or a similar
inhibitor might work for CLN2 as well, although the
actual contacts between the inhibitor and the enzyme that
are seen in the model have to be treated with caution.

Another reason for the modeling of a pseudo-iodotyrosta-
tin complex is that CLN2 is a tripeptidase, and that this
inhibitor represents the only experimental structure of a
tripeptide analog bound to sedolisin.

The r.m.s. deviation between the corresponding Cα coordinates
of the model of CLN2 (Figure 5) and the experimental
structure of sedolisin is about 1.75 Å, not much
larger than the experimental difference between sedolisin
and kumamolisin. Interestingly, the Cys327 and Cys342
residues in the model were found to be ideally positioned
to form a disulfide bond even though this was not part of
the design strategy. That this bond likely occurs in the real
protein is suggested by the fact that these two cysteines are
strictly conserved in all known animal species of CLN2
(Figures 1 and 3), although they are absent in all known
sequences of bacterial sedolisins. Thus, if this disulfide
were experimentally found to exist in CLN2 it would pro-
vide support for the correctness of the model.
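
The r.m.s. deviation quoted above is the root of the mean squared distance between paired atoms. A minimal sketch follows, assuming the two coordinate sets are already optimally superimposed and listed in the same residue order; the function name and the toy coordinates are hypothetical, and the superposition step itself (for example a Kabsch fit) is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom displaced by exactly 1 Å along z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
print(rmsd(model, template))
```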

Comparisons of the substrate binding pockets of CLN2 and
Since the principal known activity of CLN2 is that of a
tripeptidase, it is expected that three substrate-binding
pockets, S1 through S3 (using the nomenclature of
Schechter and Berger [16]), should be discernible. Resi-
dues P1-P3 of the inhibitor that should occupy these
pockets are shown in Figure 6. All the available structures
of the complexes of either sedolisin or kumamolisin with
inhibitors contain either a tyrosine or a phenylalanine
occupying the S1 pocket. Parts of this pocket are fully


Figure 3
Sequences of the catalytic domains of CLN2. Complete sequences are shown for CLN2 from human, fugu, and
zebrafish, together with the partial sequence of putative CLN2 in Xenopus tropicalis. Residues identical in all four enzymes are
colored green and those similar are colored magenta. Active site residues are marked as in Figure 1.

conserved among different sedolisins, whereas other parts
of it differ. The right-hand side of the pocket (in the view
used in Figure 6) is made of the main chain including res-
idues 164-165 (unless otherwise indicated, the number-
ing refers to the sequence of the mature human CLN2).
This part of the main chain is held in place through a
hydrogen-bonded interaction with the side chain of
Thr279, part of the signature sequence SGTSAS surround-
ing the catalytic Ser280 and its equivalents in the other
enzymes. Asp165 itself is also conserved since that residue
provides the lining of the oxyanion hole, so it can be
safely assumed that this part of the S1 pocket is virtually
identical. No side chains point into the pocket from there,
though, so its importance is limited to providing a steric
barrier and excluding solvent. Another wall of the pocket

is made up of the main chain of residues 130-132 that
flank the conserved Gly131. Again, this part of the chain
does not provide any specific interactions with the P1 res-
idue of the substrate.

Considerable differences are seen, however, at the bottom
of the pocket, where the side chains of Asp179 in kumamolisin
and the equivalent Ser190 in sedolisin make
hydrogen bonds to the P1 tyrosine of the inhibitor (if
present). The equivalent residue in CLN2 is Thr182, but it
is very unlikely that it can assume an orientation that
would allow it to make a hydrogen bond to a P1 tyrosine.
Other polar residues in the vicinity are Glu175 in sedolisin
and the corresponding Asp169 in kumamolisin. However,
the residue found here in CLN2 is Cys170, much less


Figure 4
Sequence alignment of bacterial and mammalian enzymes. Alignment of the sequences of sedolisin, kumamolisin, and
human CLN2 used in the construction of the model of the latter enzyme. The color scheme is the same as in Figure 2.

Figure 5
A homology-derived model of human CLN2. Ribbon diagram of the Cα trace of CLN2, with the segments that were
modeled based on the highly conserved core of sedolisin and kumamolisin (r.m.s. deviation of 1 Å) colored in red. Side chains
of the residues that were found to be mutated in the genes of families of patients with late-infantile neuronal ceroid lipofuscinosis
[17] are marked in ball-and-stick.

polar than its counterparts. There is also no equivalent to
the polar interaction between Glu175 and Glu171 in
sedolisin, since the equivalent of the latter residue in
CLN2 is Ser166, much smaller and pointing away.

The side of the S1 pocket that is created by the very flexible
side chain of Arg179 in sedolisin contains only a much
smaller Ser174 in CLN2, and thus is much more open in
the latter protein. This part of the pocket, with the main
chain of the protein quite distant from the substrate, is

indeed not well conserved in these proteins, with kuma-
molisin missing it entirely due to a deletion in the corre-
sponding sequence position. In summary, the S1 pocket
in CLN2 has less polar character than the equivalent
pocket in the related proteins, and is lacking direct polar
anchors for any side chains that might be present in the

The S2 pocket in CLN2 is also quite open and accessible
to solvent. It is most likely larger than the equivalent


Figure 6
A model of the active site of human CLN2. The enzyme is shown in complex with pseudo-iodotyrostatin, a good inhibitor
of the sedolisin family of peptidases. Only selected residues of the enzyme are explicitly shown on the background consisting
of the molecular surface. The stick model of the inhibitor is colored gold and the P1-P3 residues are labeled in black. Similar
views have been previously published for the experimentally-determined structures of sedolisin and kumamolisin [8]. The fig-
ure was prepared using the program DINO

pockets in either sedolisin or kumamolisin, since these are
limited by Trp81 in the former and Trp129 in the latter
(these residues originate from different parts of the back-
bone in the two enzymes and are not topologically
related). Tyr130, an equivalent of the latter residue in
CLN2, is unlikely to come into direct contact with the P2
residue of the substrate due to its greater distance (almost
4 Å for the closest atoms).

An important remaining puzzle is that the predicted struc-
ture of CLN2 does not show any clear limitations of the S3
pocket that could explain the tripeptidase activity of this
enzyme. The location of the P3 side chain of the substrate
is ambiguous, since it could point in either one of two
directions by exchange with the N-terminal amine. The
only negatively charged residue of CLN2 that is found in
this vicinity is Asp132. Although in the current model the
distance between its carboxylate and the nitrogen of the
N-terminus of the modeled substrate is about 6 Å, these
two groups could be brought into hydrogen-bonding
range by some allowed changes in the torsion angles of
the protein. Such a conformational change would involve
breaking of the hydrogen bond between Asp132 and
Ser139. However, this latter interaction is not likely to be
structurally crucial since the serine is not absolutely con-
served in all CLN2-like enzymes.
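
The distance argument above can be made concrete with a small check of whether two atoms lie within hydrogen-bonding range. This is only a sketch: the 3.5 Å donor-acceptor cutoff is a commonly used heuristic assumed here, and the coordinates are hypothetical placeholders chosen to be 6 Å apart, not values taken from the model.

```python
import math

# Assumed heuristic cutoff for a donor-acceptor hydrogen bond, in Å.
HBOND_CUTOFF = 3.5


def within_hbond_range(atom_a, atom_b, cutoff=HBOND_CUTOFF):
    """Return (distance, True/False) for two (x, y, z) positions given in Å."""
    distance = math.dist(atom_a, atom_b)
    return distance, distance <= cutoff


# Hypothetical positions: an Asp carboxylate oxygen and the substrate
# N-terminal nitrogen placed 6 Å apart, as estimated for the current model.
asp_oxygen = (0.0, 0.0, 0.0)
n_terminal_nitrogen = (6.0, 0.0, 0.0)
print(within_hbond_range(asp_oxygen, n_terminal_nitrogen))
```

Under this cutoff the two groups would have to approach by roughly 2.5 Å before a direct hydrogen bond could form, which is the scale of the conformational change discussed in the text.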

Location of mutations implicated in disease
Location of mutations found through a genetic survey of
families of classic late-infantile neuronal ceroid lipofuscinosis
patients has been described previously [17]. Most
such mutations result in expression of either truncated
enzyme or in incorrect intron-exon splices. However,
some of the mutations lead to single amino acid
substitutions in the mature enzyme. Such mutations
include I92N, E148K, C170R and C170Y, V190D, G194E,
Q227H, R252H, A259E, and S280L (Figure 5). Only the
role of the latter mutation is completely clear, since it
replaces the catalytic serine of CLN2 with a side chain that
cannot support its enzymatic activity. No other residues
appear to be located in the immediate vicinity of the substrate.
Residues Val190, Gly194, and Arg252 are very
highly conserved not only in CLN2 but also in other sed-
olisins and must play an important structural role. The
reasons why the remaining mutations would lead to the
loss of enzymatic activity are much less clear, but the wide
distribution of these mutations in the structure supports
the conclusion that any modifications to CLN2 that
would abolish or impair its function could lead to the
development of the disease.
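
Because the substitutions above are numbered from the start of the mature enzyme, comparing them with sequence records of the full-length precursor requires an offset. The sketch below assumes that mature residue 1 immediately follows the 195-residue prosegment described earlier, so the offset is a constant 195 for the human enzyme; the function name is hypothetical.

```python
import re

# Length of the prosegment stated in the text for the mammalian enzymes.
PROSEGMENT_LENGTH = 195


def to_precursor_numbering(mutation, offset=PROSEGMENT_LENGTH):
    """Convert a substitution such as 'S280L' from mature to precursor numbering."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {mutation!r}")
    wild_type, position, replacement = match.groups()
    return f"{wild_type}{int(position) + offset}{replacement}"


for mutation in ["E148K", "C170R", "S280L"]:
    print(mutation, "->", to_precursor_numbering(mutation))
```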

Substrates and inhibitors of CLN2
Little is known at this time about biologically-relevant
substrates of CLN2. Various defects that include trunca-
tions and single-site mutations in CLN2 have been found
in the genes of patients that display symptoms of late-
infantile neuronal ceroid lipofuscinosis [17]. One of the
symptoms of the disease is the accumulation of an
autofluorescent material, ceroid-lipofuscin, in lysosomal
storage bodies in various cell types, primarily in the nervous
system. Since a major component of such bodies
appears to be intact subunit c of mitochondrial ATP syn-
thase, this protein has been implicated as a potential bio-
logical target of the protease. It has been shown recently
that CLN2 can indeed degrade this subunit on its N termi-
nus [18], but the unambiguous proof that this is indeed
the most important target is still lacking. CLN2 is capable
of processing a number of different angiotensin-derived
peptides [19], with the efficiency of cleavage dependent
on the length of such peptides. The most efficiently proc-
essed peptide consisted of 14 amino acids, with the
tripeptide Asp-Arg-Val removed from its N terminus. The
model of CLN2 presented here can easily accommodate
this peptide on the P side of the substrate-binding site,
although the exact mode of binding of the long P' portion
of the substrate remains obscure. The observation that an
analogous peptide acetylated on its N terminus cannot be
processed supports the postulate that the interactions of
the N-terminal amino group with the side chain of
Asp132 may be the most important feature defining the
tripeptidase specificity of CLN2. A number of different
tripeptides can be serially processed from glucagon, with
their sequences varying widely [20]. Again, however, all of
these tripeptides can be easily accommodated in the sub-
strate-binding site of the CLN2 model. Other potentially
biologically relevant substrates include cholecystokinin
and possibly other neuropeptides [21].
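
The serial removal of tripeptides from a free N terminus can be pictured with an idealized sketch. It ignores sequence preferences and length-dependent efficiency, assumes cleavage stops once three or fewer residues remain, and treats a blocked (for example acetylated) N terminus as not processed at all. Angiotensin I (DRVYIHPFHL) is used purely for illustration; it is not the 14-residue peptide from the cited study, and the function name is hypothetical.

```python
def release_tripeptides(peptide, free_n_terminus=True):
    """Return (released tripeptides, remaining fragment) for an idealized tripeptidase."""
    if not free_n_terminus:
        # A blocked N terminus is modeled as completely resistant.
        return [], peptide
    released = []
    remaining = peptide
    while len(remaining) > 3:
        released.append(remaining[:3])  # residues occupying P3-P2-P1
        remaining = remaining[3:]
    return released, remaining


ANGIOTENSIN_I = "DRVYIHPFHL"
print(release_tripeptides(ANGIOTENSIN_I))
print(release_tripeptides(ANGIOTENSIN_I, free_n_terminus=False))
```

The first tripeptide released in this toy run is Asp-Arg-Val, matching the N-terminal fragment mentioned in the text for the angiotensin-derived substrate.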

An intriguing property of CLN2 is its reported ability to
cleave collagen-related peptides [22]. The tripeptides
resulting from such processing include Gly-Pro-Met, Gly-
Pro-Arg, and Gly-Pro-Ala. It has been recently reported
that kumamolisin, and particularly a closely related pro-
tein from Alicyclobacillus sendaiensis (kumamolisin-As) can
efficiently cleave not only collagen-related peptides, but
also native type I collagen [23]. With the substrate-bind-
ing site of CLN2 resembling that of kumamolisin more
than sedolisin (the latter enzyme has low, if any, colla-
genase activity), the potential collagen-processing role of
CLN2 might warrant further investigation.

Since the catalytic machinery of CLN2 matches closely
that of sedolisin, kumamolisin, and other members of the
family of serine-carboxyl peptidases, the enzymatic mech-
anism of all these enzymes is most likely the same. Design
of inhibitors specific for CLN2 should incorporate the fea-
tures that have been proven to be important for the
related enzymes, such as the placement of an aldehyde
functionality capable of making covalent interactions
with the catalytic serine, or the utilization of chloromethyl
ketone for the same purpose. Since the few inhibitors that
have been successfully used in the studies of sedolisins are
either longer than tripeptides or contain blocking groups
on their N termini, new tripeptide-based inhibitors with

free N termini are now being synthesized (Oda, unpub-
lished). It will be necessary to test the binding properties
of different substrates in order to determine the most
promising peptide sequences. Analysis of the model of
CLN2 suggests that the size of the S1 subsite is much larger
than in either sedolisin or kumamolisin, and thus the use
of a large P1 group might be indicated. Of course, the
availability of an experimental crystal structure will make
the design of inhibitors easier and we are continuing our
efforts to crystallize CLN2 from different sources.

Homology modeling
Three-dimensional, atomic-scale models of CLN2 were
developed by exploiting the sequence similarity to the
sedolisin and kumamolisin proteins (r.m.s. deviation of
1.0 Å for 273 pairs of Cα atoms in the core of the
enzymes). Presently, these two enzymes are the only
members of the newly-defined sedolisin/serine-carboxyl
peptidase family [8] for which the crystal structures have
been published [5-7]. The actual Protein Data Bank [24] entries used in the modeling were
1GA4.pdb and 1GT9.pdb for sedolisin and kumamolisin,
respectively.

The first step was to form a global, multiple sequence
alignment between all known members of the sedolisin
family. Studies have shown that incorporating the specific
patterns of amino acid residue-type variation and conser-
vation among a family of homologous proteins provides
superior results over simple, pair-wise sequence align-
ment [25]. Sequence files representing the different
subfamilies were extracted from the non-redundant Gen-
Bank database [26] using sedolisin, kumamolisin, and the
human CLN2 sequences as queries to the web-based version
of the BLAST program [27]. Initial multiple sequence
alignments were formed with the ClustalX computer pro-
gram [28]. As is expected for a family of proteins, highly-
conserved segments were found aligned to the crystal
structure-identified core regions of the sedolisin and
kumamolisin sequences. Subsequently, the sequences
were divided into two groups: those closer to sedolisin
than kumamolisin and vice versa. The alignment of these
two groups was then manually set by the observed struc-
tural alignment of the sedolisin and kumamolisin
proteins. Finally, some additional adjustment was
required to correct the few places where highly conserved
residues of the core regions were slightly out of alignment
among different subfamilies of sequences.
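
Checking whether highly conserved residues line up across subfamilies amounts to inspecting alignment columns. A minimal sketch is given below: it reports the columns in which every sequence carries the same residue and no gap. The three toy rows are hypothetical and merely embed the SGTSAS signature motif mentioned earlier; they are not taken from the alignment used for the model.

```python
def fully_conserved_columns(alignment):
    """Return 0-based indices of columns with one identical residue and no gaps."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    conserved = []
    for index in range(length):
        column = {row[index] for row in alignment}
        if len(column) == 1 and "-" not in column:
            conserved.append(index)
    return conserved


# Hypothetical aligned rows containing the SGTSAS motif.
toy_alignment = [
    "ASGTSAS",
    "GSGTSAA",
    "-SGTSAS",
]
print(fully_conserved_columns(toy_alignment))
```

A conserved residue that is out of register in one subfamily would show up as two neighboring mixed columns instead of one conserved column, which is the kind of discrepancy the manual adjustment addresses.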

The model of human CLN2 was built using the structure
of sedolisin complexed with the inhibitor pseudo-iodotyrostatin
[5,6] as a template. The reason for this choice is that
while different protein models were generally comparable,
the chosen inhibitor was most compatible with the
tripeptidase character of CLN2. With the correspondence
of residues specified in the alignments, atomic coordi-
nates were transferred to the target sequence by a variety
of methods, including the homology modeling modules
of the Look/GeneMine [29] and DeepView [30] computer
program packages. For the core and active site of the pro-
tein, coordinates for identical residues were simply trans-
ferred unchanged; whereas, special care was required to
position the side chains of residues differing from the
template. This was first accomplished automatically by
the two computer packages, then manually adjusted in
the Quanta molecular modeling package (Accelrys, Inc.)
to better mimic the templates and optimize the interactions
with surrounding residues. A similar two-step approach
was used to model the insertions and deletions in the
variable loop regions of the protein, where it was
necessary to create new backbone as well as side-chain
coordinates for the models. It should be noted that, for
obvious reasons, the conformation of poorly conserved
loop regions is generally the least accurate aspect of a
homology model. Fortunately, these problematic loops
will not significantly affect the active site of the model,
since only two of them impinge on the boundary of this
highly conserved, functional region.
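
The bookkeeping behind this coordinate transfer can be sketched in Python. This is only an illustration of how an alignment determines what is done with each residue; it does not reproduce the Look/GeneMine or DeepView algorithms. The function name plan_coordinate_transfer, the action labels, and the toy alignment are hypothetical.

```python
def plan_coordinate_transfer(template_aln: str, target_aln: str):
    """Decide, column by column, how each target residue obtains coordinates.

    Returns (plan, deleted): plan is a list of
    (target_index, template_index_or_None, action) tuples and deleted lists
    template residues that have no counterpart in the target.
    Residue indices are 1-based and counted separately in each sequence.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    plan, deleted = [], []
    t_idx = 0  # running residue number in the template
    q_idx = 0  # running residue number in the target
    for tmpl, targ in zip(template_aln, target_aln):
        if tmpl != "-":
            t_idx += 1
        if targ != "-":
            q_idx += 1
        if tmpl == "-" and targ == "-":
            continue  # empty column, nothing to do
        if targ == "-":
            deleted.append(t_idx)  # deletion in the target
        elif tmpl == "-":
            plan.append((q_idx, None, "build_new"))  # insertion: new backbone needed
        elif tmpl == targ:
            plan.append((q_idx, t_idx, "copy_residue"))  # identical: copy unchanged
        else:
            plan.append((q_idx, t_idx, "rebuild_side_chain"))  # substitution
    return plan, deleted


# Toy alignment, invented for illustration (template on top, target below).
plan, deleted = plan_coordinate_transfer("GDS-GTSAA", "GDTPG-SAA")
for entry in plan:
    print(entry)
print("template residues without a target counterpart:", deleted)
```

Residues flagged as build_new or adjacent to deleted template positions correspond to the loop regions whose conformation is least reliable in the final model.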

Refinement and analysis of the model
The model was finished by performing energy minimization
in vacuo with the computer program CHARMM [31].
This refined the structure by bringing the covalent geome-
try and non-bonded interactions into agreement with
experimentally observed and calculated values. Such
optimizations included adjusting bond lengths, bond angles
(three atoms) and dihedral angles (four atoms), as well as eliminating
atomic overlap and forming salt-bridges and hydrogen
bonds. Because the potential energy functions currently
used to describe atomic-scale models are not sufficiently
comprehensive and accurate, the final energy of the
model was not used as an indicator of the quality of the
structure. The final quality of the structure was analyzed
with the computer program PROCHECK [32]. The
structure was deposited at the PDB under accession code
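
The principle of energy minimization can be shown with a deliberately simplified Python sketch. It applies steepest descent to harmonic bond terms only and is not the CHARMM force field or its minimizers. The force constant, equilibrium bond length, step size, and starting coordinates are invented for the example.

```python
import math


def bond_energy_and_gradient(coords, bonds):
    """Harmonic bond energy E = sum k (r - r0)^2 and its Cartesian gradient."""
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, k, r0 in bonds:
        d = [coords[i][a] - coords[j][a] for a in range(3)]
        r = math.sqrt(sum(c * c for c in d))
        energy += k * (r - r0) ** 2
        # dE/dr = 2k(r - r0); the chain rule gives the gradient on each atom.
        scale = 2.0 * k * (r - r0) / r
        for a in range(3):
            grad[i][a] += scale * d[a]
            grad[j][a] -= scale * d[a]
    return energy, grad


def steepest_descent(coords, bonds, step=0.0005, n_steps=200):
    """Move all atoms a small fixed step down the energy gradient, repeatedly."""
    coords = [list(p) for p in coords]
    for _ in range(n_steps):
        _, grad = bond_energy_and_gradient(coords, bonds)
        for p, g in zip(coords, grad):
            for a in range(3):
                p[a] -= step * g[a]
    return coords


# Three atoms with one stretched and one compressed bond (hypothetical values).
start = [(0.0, 0.0, 0.0), (1.8, 0.0, 0.0), (1.8, 1.2, 0.0)]
bonds = [(0, 1, 300.0, 1.5), (1, 2, 300.0, 1.5)]  # (atom i, atom j, k, r0)

e_start, _ = bond_energy_and_gradient(start, bonds)
final = steepest_descent(start, bonds)
e_final, _ = bond_energy_and_gradient(final, bonds)
print(e_start, e_final)
```

A real force field adds angle, dihedral, van der Waals, and electrostatic terms to the same scheme, which is why the minimized energy of a model reflects the quality of the energy function as much as that of the structure.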

Authors' contributions
AW initiated this project and analyzed the genomic distri-
bution of this family of enzymes. SRD contributed the
modeling of the three-dimensional structure of CLN2. ML
analyzed the model and compared it to the crystal struc-
tures of sedolisins. HO, KO and BMD contributed their
experience gained from studies of serine-carboxyl pepti-
dases and the design of their inhibitors, aimed at analysis
of substrate-enzyme interactions and enzyme specificity.
All authors read and approved the final manuscript.

Acknowledgements
Extensive discussions with Dr. A. Barrett (Wellcome Trust Sanger Institute,
Hinxton, UK) are gratefully acknowledged. Modeling efforts were facilitated
by Dr. Robert Jernigan (Iowa State University). This work was supported
in part by a Grant-in-Aid for Scientific Research (B), 15380072, from the
Ministry of Education, Culture, Sports, Science and Technology of Japan (to
K.O.); by NIH grants DK18865 and A13921 I (to B.M.D.); and in part with
Federal funds from the National Cancer Institute, National Institutes of
Health, under contract No. NO I-CO-12400. The content of this publica-
tion does not necessarily reflect the views or policies of the Department of
Health and Human Services, nor does the mention of trade names, com-
mercial products or organizations imply endorsement by the U. S.

References
1. Ellis S: Studies on the serial extraction of pituitary proteins.
Endocrinology 1961, 69:554-570.
2. Sleat DE, Donnelly RJ, Lackland H, Liu CG, Sohar I, Pullarkat RK and
Lobel P: Association of mutations in a lysosomal protein with
classical late-infantile neuronal ceroid lipofuscinosis. Science
1997, 277:1802-1805.
3. Rawlings ND and Barrett AJ: Tripeptidyl-peptidase I is appar-
ently the CLN2 protein absent in classical late-infantile neu-
ronal ceroid lipofuscinosis. Biochim Biophys Acta 1999,
4. Lin L, Sohar I, Lackland H and Lobel P: The human CLN2
protein/tripeptidyl-peptidase I is a serine protease that
autoactivates at acidic pH. J Biol Chem 2001, 276:2249-2255.
5. Wlodawer A, Li M, Dauter Z, Gustchina A, Uchida K, Oyama H, Dunn
BM and Oda K: Carboxyl proteinase from Pseudomonas
defines a novel family of subtilisin-like enzymes. Nature Struct
Biol 2001, 8:442-446.
6. Wlodawer A, Li M, Gustchina A, Dauter Z, Uchida K, Oyama H,
Goldfarb NE, Dunn BM and Oda K: Inhibitor complexes of the
Pseudomonas serine-carboxyl proteinase. Biochemistry 2001,
7. Comellas-Bigler M, Fuentes-Prior P, Maskos K, Huber R, Oyama H,
Uchida K, Dunn BM, Oda K and Bode W: The 1.4 Å crystal
structure of kumamolysin: a thermostable serine-carboxyl-type
proteinase. Structure 2002, 10:865-876.
8. Wlodawer A, Li M, Gustchina A, Oyama H, Dunn BM and Oda K:
Structural and enzymatic properties of the sedolisin family
of serine-carboxyl peptidases. Acta Biochim Polon 2003,
9. Lee BR, Furukawa M, Yamashita K, Kanasugi Y, Kawabata C, Hirano
K, Ando K and Ichishima E: Aorsin, a novel serine proteinase
with trypsin-like specificity at acidic pH. Biochem J 2003,
10. Nishii W, Ueki T, Miyashita R, Kojima M, Kim YT, Sasaki N,
Murakami-Murofushi K and Takahashi K: Structural and enzy-
matic characterization of physarolisin (formerly physa-
ropepsin) proves that it is a unique serine-carboxyl
proteinase. Biochem Biophys Res Commun 2003, 301:1023-1029.
11. Weber IT, Miller M, Jaskólski M, Leis J, Skalka AM and Wlodawer A:
Molecular modeling of the HIV-1 protease and its substrate
binding site. Science 1989, 243:928-931.
12. Anand K, Ziebuhr J, Wadhwani P, Mesters JR and Hilgenfeld R:
Coronavirus main proteinase (3CLpro) structure: basis for
design of anti-SARS drugs. Science 2003, 300:1763-1767.
13. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal
P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE,
Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B,
Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown
SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S,
Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins
FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V,
Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitza-
kis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn
DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A,
Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey
TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt
L, Grafham D, Graves TA, Green ED, Gregory S, Guigo R, Guyer M,
Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A,
Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I,
Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK,
Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby
A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T,
Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S,
Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH,
McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD,
Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E,
Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash
WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O'Connor
MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin
KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC,
Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM,
Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J,
Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, Sharpe T,
Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith
DR, Spencer B, Stabenau A, Stange-Thomann N, Sugnet C, Suyama M,
Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C,
Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M,
Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K,
Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson
RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM,
Zody MC and Lander ES: Initial sequencing and comparative
analysis of the mouse genome. Nature 2002, 420:520-562.
14. Katz ML and Johnson GS: Mouse gene knockout models for the
CLN2 and CLN3 forms of ceroid lipofuscinosis. Eur J Paediatr
Neurol 2001, 5 Suppl A:109-114.
15. Orry A and Wallace BA: A proposed model for the
late-infantile neuronal ceroid lipofuscinosis (Batten Disease) protein
CLN2. Protein Pept Lett 1999, 6:1-5.
16. Schechter I and Berger A: On the size of the active site in pro-
teases. I. Papain. Biochem Biophys Res Commun 1967, 27:157-162.
17. Sleat DE, Gin RM, Sohar I, Wisniewski K, Sklower-Brooks S, Pullarkat
RK, Palmer DN, Lerner TJ, Boustany RM, Uldall P, Siakotos AN, Don-
nelly RJ and Lobel P: Mutational analysis of the defective pro-
tease in classic late-infantile neuronal ceroid lipofuscinosis, a
neurodegenerative lysosomal storage disorder. Am J Hum
Genet 1999, 64:1511-1523.
18. Ezaki J, Takeda-Ezaki M and Kominami E: Tripeptidyl peptidase I,
the late infantile neuronal ceroid lipofuscinosis gene prod-
uct, initiates the lysosomal degradation of subunit c of ATP
synthase. J Biochem (Tokyo) 2000, 128:509-516.
19. Warburton MJ and Bernardini F: The specificity of lysosomal
tripeptidyl peptidase-I determined by its action on
angiotensin-II analogues. FEBS Lett 2001, 500:145-148.
20. Vines D and Warburton MJ: Purification and characterisation of
a tripeptidyl aminopeptidase I from rat spleen. Biochim Biophys
Acta 1998, 1384:233-242.
21. Bernardini F and Warburton MJ: Lysosomal degradation of
cholecystokinin-(29-33)-amide in mouse brain is dependent
on tripeptidyl peptidase-I: implications for the degradation
and storage of peptides in classical late-infantile neuronal
ceroid lipofuscinosis. Biochem J 2002, 366:521-529.
22. McDonald JK, Hoisington AR and Eisenhauer DA: Partial purifica-
tion and characterization of an ovarian tripeptidyl peptidase:
a lysosomal exopeptidase that sequentially releases colla-
gen-related (Gly-Pro-X) triplets. Biochem Biophys Res Commun
1985, 126:63-71.
23. Tsuruoka N, Nakayama T, Ashida M, Hemmi H, Nakao M, Minakata
H, Oyama H, Oda K and Nishino T: Collagenolytic serine-car-
boxyl proteinase from Alicyclobacillus sendaiensis strain
NTAP-1: Purification, characterization, gene cloning, and
heterologous expression. Appl Environ Microbiol 2003, 69:162-169.
24. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H,
Shindyalov IN and Bourne PE: The Protein Data Bank. Nucleic
Acids Res 2000, 28:235-242.
25. Park J, Karplus K, Barrett C, Hughey R, Haussler D, Hubbard T and
Chothia C: Sequence comparisons using multiple sequences
detect three times as many remote homologues as pairwise
methods. J Mol Biol 1998, 284:1201-1210.
26. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Rapp BA and
Wheeler DL: GenBank. Nucleic Acids Res 2002, 30:17-20.
27. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and
Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation
of protein database search programs. Nucleic Acids Res 1997,

28. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F and Higgins DG:
The CLUSTAL_X windows interface: flexible strategies for
multiple sequence alignment aided by quality analysis tools.
Nucleic Acids Res 1997, 25:4876-4882.
29. Lee C and Irizarry K: The GeneMine System for
genome/proteome annotation and collaborative data mining. IBM Systems
Journal 2001, 40:592-603.
30. Guex N and Peitsch MC: SWISS-MODEL and the Swiss-Pdb-
Viewer: an environment for comparative protein modeling.
Electrophoresis 1997, 18:2714-2723.
31. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S
and Karplus M: CHARMM: A program for macromolecular
energy, minimization, and dynamics calculations. J Comp Chem
1983, 4:187-217.
32. Laskowski RA, MacArthur MW, Moss DS and Thornton JM:
PROCHECK: program to check the stereochemical quality of
protein structures. J Appl Crystallogr 1993, 26:283-291.
