UFIR
xml version 1.0 encoding utf-8 standalone no
mets ID sort-mets_mets OBJID sword-mets LABEL DSpace SWORD Item PROFILE METS SIP Profile xmlns http://www.loc.gov/METS/
xmlns:xlink http://www.w3.org/1999/xlink xmlns:xsi http://www.w3.org/2001/XMLSchema-instance
xsi:schemaLocation http://www.loc.gov/standards/mets/mets.xsd
metsHdr CREATEDATE 2013-01-11T20:09:07
agent ROLE CUSTODIAN TYPE ORGANIZATION
name BioMed Central
dmdSec sword-mets-dmd-1 GROUPID sword-mets-dmd-1_group-1
mdWrap SWAP Metadata MDTYPE OTHER OTHERMDTYPE EPDCX MIMETYPE text/xml
xmlData
epdcx:descriptionSet xmlns:epdcx http://purl.org/eprint/epdcx/2006-11-16/ xmlns:MIOJAVI
http://purl.org/eprint/epdcx/xsd/2006-11-16/epdcx.xsd
epdcx:description epdcx:resourceId sword-mets-epdcx-1
epdcx:statement epdcx:propertyURI http://purl.org/dc/elements/1.1/type epdcx:valueURI http://purl.org/eprint/entityType/ScholarlyWork
http://purl.org/dc/elements/1.1/title
epdcx:valueString Chaperonin genes on the rise: new divergent classes and intense duplication in human and other vertebrate genomes
http://purl.org/dc/terms/abstract
Abstract
Background
Chaperonin proteins are well known for the critical role they play in protein folding and in disease. However, the recent identification of three diverged chaperonin paralogs associated with the human Bardet-Biedl and McKusick-Kaufman Syndromes (BBS and MKKS, respectively) indicates that the eukaryotic chaperonin-gene family is larger and more differentiated than previously thought. The availability of complete genome sequences makes possible a definitive characterization of the complete set of chaperonin sequences in human and other species.
Results
We identified fifty-four chaperonin-like sequences in the human genome and similar numbers in the genomes of the model organisms mouse and rat. In mammal genomes we identified, besides the well-known CCT chaperonin genes and the three genes associated with the MKKS and BBS pathological conditions, a newly-defined class of chaperonin genes named CCT8L, represented in human by the two sequences CCT8L1 and CCT8L2. Comparative analyses from several vertebrate genomes established the monophyletic origin of chaperonin-like MKKS and BBS genes from the CCT8 lineage. The CCT8L gene originated from a later duplication also in the CCT8 lineage at the onset of mammal evolution and duplicated in primate genomes. The functionality of CCT8L genes in different species was confirmed by evolutionary analyses and in human by expression data. Detailed sequence analysis and structural predictions of MKKS, BBS and CCT8L proteins strongly suggested that they conserve a typical chaperonin-like core structure but that they are unlikely to form a CCT-like oligomeric complex. The characterization of many newly-discovered chaperonin pseudogenes uncovered the intense duplication activity of eukaryotic chaperonin genes.
Conclusions
In vertebrates, chaperonin genes, driven by intense duplication processes, have diversified into multiple classes and functionalities that extend beyond their well-known protein-folding role as part of the typical oligomeric chaperonin complex, emphasizing previous observations on the involvement of individual CCT monomers in microtubule elongation. The functional characterization of newly identified chaperonin genes will be a challenge for future experimental analyses.
http://purl.org/dc/elements/1.1/creator
Mukherjee, Krishanu
Conway de Macario, Everly
Macario, Alberto JL
Brocchieri, Luciano
http://purl.org/eprint/terms/isExpressedAs epdcx:valueRef sword-mets-expr-1
http://purl.org/eprint/entityType/Expression
http://purl.org/dc/elements/1.1/language epdcx:vesURI http://purl.org/dc/terms/RFC3066
en
http://purl.org/eprint/terms/Type
http://purl.org/eprint/type/JournalArticle
http://purl.org/dc/terms/available
epdcx:sesURI http://purl.org/dc/terms/W3CDTF 2010-03-01
http://purl.org/dc/elements/1.1/publisher
BioMed Central Ltd
http://purl.org/eprint/terms/status http://purl.org/eprint/terms/Status
http://purl.org/eprint/status/PeerReviewed
http://purl.org/eprint/terms/copyrightHolder
Krishanu Mukherjee et al.; licensee BioMed Central Ltd.
http://purl.org/dc/terms/license
http://creativecommons.org/licenses/by/2.0
http://purl.org/dc/terms/accessRights http://purl.org/eprint/terms/AccessRights
http://purl.org/eprint/accessRights/OpenAccess
http://purl.org/eprint/terms/bibliographicCitation
BMC Evolutionary Biology. 2010 Mar 01;10(1):64
http://purl.org/dc/elements/1.1/identifier
http://purl.org/dc/terms/URI http://dx.doi.org/10.1186/1471-2148-10-64
fileSec
fileGrp sword-mets-fgrp-1 USE CONTENT
file sword-mets-fgid-0 sword-mets-file-1
FLocat LOCTYPE URL xlink:href 1471-2148-10-64.xml
sword-mets-fgid-1 sword-mets-file-2 application/pdf
1471-2148-10-64.pdf
sword-mets-fgid-3 sword-mets-file-3
1471-2148-10-64-S16.PDF
sword-mets-fgid-4 sword-mets-file-4
1471-2148-10-64-S22.PDF
sword-mets-fgid-5 sword-mets-file-5
1471-2148-10-64-S7.PDF
sword-mets-fgid-6 sword-mets-file-6 application/msword
1471-2148-10-64-S9.DOC
sword-mets-fgid-7 sword-mets-file-7
1471-2148-10-64-S12.PDF
sword-mets-fgid-8 sword-mets-file-8
1471-2148-10-64-S13.PDF
sword-mets-fgid-9 sword-mets-file-9
1471-2148-10-64-S18.PDF
sword-mets-fgid-10 sword-mets-file-10
1471-2148-10-64-S19.PDF
sword-mets-fgid-11 sword-mets-file-11
1471-2148-10-64-S17.PDF
sword-mets-fgid-12 sword-mets-file-12
1471-2148-10-64-S5.PDF
sword-mets-fgid-13 sword-mets-file-13
1471-2148-10-64-S6.PDF
sword-mets-fgid-14 sword-mets-file-14
1471-2148-10-64-S20.PDF
sword-mets-fgid-15 sword-mets-file-15
1471-2148-10-64-S2.DOC
sword-mets-fgid-16 sword-mets-file-16
1471-2148-10-64-S15.PDF
sword-mets-fgid-17 sword-mets-file-17
1471-2148-10-64-S3.PDF
sword-mets-fgid-18 sword-mets-file-18
1471-2148-10-64-S11.PDF
sword-mets-fgid-19 sword-mets-file-19
1471-2148-10-64-S14.DOC
sword-mets-fgid-20 sword-mets-file-20
1471-2148-10-64-S21.DOC
sword-mets-fgid-21 sword-mets-file-21
1471-2148-10-64-S1.DOC
sword-mets-fgid-22 sword-mets-file-22
1471-2148-10-64-S8.PDF
sword-mets-fgid-23 sword-mets-file-23
1471-2148-10-64-S10.DOC
sword-mets-fgid-24 sword-mets-file-24
1471-2148-10-64-S4.PDF
structMap sword-mets-struct-1 structure LOGICAL
div sword-mets-div-1 DMDID Object
sword-mets-div-2 File
fptr FILEID
sword-mets-div-3
sword-mets-div-4
sword-mets-div-5
sword-mets-div-6
sword-mets-div-7
sword-mets-div-8
sword-mets-div-9
sword-mets-div-10
sword-mets-div-11
sword-mets-div-12
sword-mets-div-13
sword-mets-div-14
sword-mets-div-15
sword-mets-div-16
sword-mets-div-17
sword-mets-div-18
sword-mets-div-19
sword-mets-div-20
sword-mets-div-21
sword-mets-div-22
sword-mets-div-23
sword-mets-div-24
sword-mets-div-25



Supplementary figure S9. ML trees of individual CCT monomer families including human pseudogenes (in red font). See Legends for Figure S5 and for Figure 2 for species abbreviations. The scale bar represents the indicated number of substitutions per position for a unit branch length.
[Tree panels: Human CCT8 pseudogenes (includes Hs_CCT81P; scale bar 0.02). Human CCT5 pseudogenes (includes Hs_CCT5-1P, Hs_CCT5-2P, Hs_CCT5-3P; scale bar 0.05). Human CCT7 pseudogenes (includes Hs_CCT71P, Hs_CCT72P; scale bar 0.02).]



Supplementary figure S10. Secondary-structure predictions (line "jnet") from the alignment of archaeal thermosome-subunit sequences, aligned to the secondary-structure description (line "1A6D.A Sec Str description") of the thermosome alpha-subunit of Thermoplasma acidophilum (sequence 1A6D.A), as provided in the PDB entry 1a6d, chain A. Included in the alignment are also each of the two thermosome-subunit sequences from the Crenarchaeota Aeropyrum pernix (AERPE), Staphylothermus marinus (STAMA) and Hyperthermus butylicus (HYPBU). Red cylinders represent alpha helices. Beta strands are represented as yellow arrows. Helices described in the thermosome PDB structure are labelled alphabetically and strands are labelled numerically, following the assignments shown in Figure 6. Vertical lines indicate boundaries between tertiary-structure domains (apical, intermediate and equatorial). Predictions obtained using JPRED-3. Histidine residues are colored in red, proline residues in blue, cysteine residues in yellow and aliphatic residues I, L and V are boxed in the JPRED-3 output. Position-specific confidence in the secondary structure assignments is scored in line "conf" from 0 (lowest confidence) to 9 (highest confidence). A score S indicates a posterior probability (p) range such that S/10 ≤ p < (S+1)/10.
[Alignment figure, four pages, alignment positions 1 to 540. Rows: conf, jnet, HYPBU1, STAMA1, AERPE1, HYPBU2, STAMA2, AERPE2, 1A6D.A, 1A6D.A Sec Str description. Domain labels: N-terminal equatorial domain, N-terminal intermediate domain, apical domain, C-terminal intermediate domain, C-terminal equatorial domain. Helices labelled A to Q; strands labelled 1 to 22.]
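The confidence scale described in the caption can be made concrete with a short sketch. It assumes the probability range is meant as S/10 ≤ p < (S+1)/10; the function names are illustrative and are not part of JPRED-3.

```python
def conf_to_probability_range(score: int) -> tuple[float, float]:
    """Map one 'conf' digit (0-9) to the half-open range [S/10, (S+1)/10)."""
    if not 0 <= score <= 9:
        raise ValueError("confidence score must be a single digit from 0 to 9")
    return score / 10, (score + 1) / 10


def parse_conf_line(digits: str) -> list[tuple[float, float]]:
    """Convert a run of confidence digits into per-position probability ranges.

    Non-digit characters (for example spaces left by text extraction) are skipped.
    """
    return [conf_to_probability_range(int(ch)) for ch in digits if ch.isdigit()]


if __name__ == "__main__":
    # Hypothetical short run of confidence digits, for illustration only.
    for low, high in parse_conf_line("9984"):
        print(f"{low:.1f} <= p < {high:.1f}")
```

Under this reading, a position scored 9 carries a posterior probability of at least 0.9 for the predicted state, and a position scored 0 less than 0.1.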



RESEARCH ARTICLE Open Access
Chaperonin genes on the rise: new divergent classes and intense duplication in human and other vertebrate genomes
Krishanu Mukherjee 1,2, Everly Conway de Macario 3, Alberto JL Macario 3*, Luciano Brocchieri 1,2*
*Correspondence: macarioster@gmail.com; lucianob@ufl.edu
1 Department of Molecular Genetics and Microbiology, University of Florida, College of Medicine, 1660 SW Archer Road, Gainesville, FL 32610, USA
3 University of Maryland, Columbus Center, 701 East Pratt Street, Baltimore, MD 21202, USA
Mukherjee et al. BMC Evolutionary Biology 2010, 10:64 http://www.biomedcentral.com/1471-2148/10/64
© 2010 Mukherjee et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract
Background: Chaperonin proteins are well known for the critical role they play in protein folding and in disease. However, the recent identification of three diverged chaperonin paralogs associated with the human Bardet-Biedl and McKusick-Kaufman Syndromes (BBS and MKKS, respectively) indicates that the eukaryotic chaperonin-gene family is larger and more differentiated than previously thought. The availability of complete genome sequences makes possible a definitive characterization of the complete set of chaperonin sequences in human and other species.
Results: We identified fifty-four chaperonin-like sequences in the human genome and similar numbers in the genomes of the model organisms mouse and rat. In mammal genomes we identified, besides the well-known CCT chaperonin genes and the three genes associated with the MKKS and BBS pathological conditions, a newly-defined class of chaperonin genes named CCT8L, represented in human by the two sequences CCT8L1 and CCT8L2. Comparative analyses from several vertebrate genomes established the monophyletic origin of chaperonin-like MKKS and BBS genes from the CCT8 lineage. The CCT8L gene originated from a later duplication also in the CCT8 lineage at the onset of mammal evolution and duplicated in primate genomes. The functionality of CCT8L genes in different species was confirmed by evolutionary analyses and in human by expression data. Detailed sequence analysis and structural predictions of MKKS, BBS and CCT8L proteins strongly suggested that they conserve a typical chaperonin-like core structure but that they are unlikely to form a CCT-like oligomeric complex. The characterization of many newly-discovered chaperonin pseudogenes uncovered the intense duplication activity of eukaryotic chaperonin genes.
Conclusions: In vertebrates, chaperonin genes, driven by intense duplication processes, have diversified into multiple classes and functionalities that extend beyond their well-known protein-folding role as part of the typical oligomeric chaperonin complex, emphasizing previous observations on the involvement of individual CCT monomers in microtubule elongation. The functional characterization of newly identified chaperonin genes will be a challenge for future experimental analyses.

Background
Hsp60-like chaperonin proteins are well known for their role in assisting protein folding and in protecting cells from the deleterious effects of stress [1-5]. The eukaryotic cell expresses representatives of two distinct groups of chaperonin genes that are otherwise typical of bacteria (Group I) or archaea (Group II). In eukaryotes, Group I chaperonins are mostly expressed in mitochondria and chloroplasts, and Group II chaperonins are found in the eukaryotic cytosol [1,6-10]. Chaperonin proteins form typical multi-subunit double-ringed structures collectively called “chaperonins” [9-13]. The Group I chaperonins are typically formed by the products of a single gene (groEL in bacteria; hsp60/cpn60 in mitochondria) assembled into a 14-subunit double-ringed structure in bacteria and into a double or single-ringed structure in mitochondria [14]. Eukaryotic Group II
chaperonin proteins assemble in a similar double-ringed oligomeric structure, called TRiC or CCT complex [15], composed of 16 subunits that in human are encoded by nine distinct genes (tcp1/cct1, cct2-5, cct6A-B, cct7-8) [8-10]. The CCT complex is mostly known for its role in folding the cytoskeleton proteins actin and tubulin [7,16] and mutations in individual CCT subunits lead to defects in the functioning of the cytoskeleton and mitosis arrest [17].

As for other chaperones, the malfunctioning of chaperonin proteins has been associated with various human pathological conditions, the chaperonopathies [18-20]. In this respect, besides the canonical cct and cpn60 genes described above, three divergent hsp60-like genes have been more recently identified [21-23] in association with pathological conditions. One gene, MKKS [21], was named for its association with the developmental disease McKusick-Kaufman Syndrome and was soon after also identified as BBS6 [24] for its association with the Bardet-Biedl Syndrome (BBS), another developmental condition involving cilium-related dysfunction [25]. More recently two other hsp60-like BBS genes, named BBS10 [22] and BBS12 [23], have been identified among fourteen genes (BBS1 to BBS14) so far associated with BBS. The protein products of MKKS/BBS6, BBS10 and BBS12 localize to the basal body of cilia and to the centrosome [26-28]. We will hereafter refer to the MKKS/BBS6 gene as MKKS, and collectively to the three hsp60-like BBS genes as the “BBS genes”. The identification of these genes provides new perspectives on the spectrum of functionalities of Hsp60-like proteins in eukaryotes and on their role in development.

The recognition of chaperonopathies has increased the importance of elucidating the entire set of chaperone genes present in the human genome [19]. The work reported here was conceived to: a) identify all Hsp60-like sequences encoded in the human and other genomes including all diverged chaperonin genes; b) reconstruct the evolutionary origins and relations of diverged chaperonin genes; c) distinguish with bioinformatics methods functional genes from pseudogenes; d) characterize structural properties of the corresponding proteins. We mostly devoted our attention to the characterization of the evolutionary history and structural properties of newly or recently identified sequences, referring the reader to the vast amount of published literature for information on functional/structural properties and the evolutionary history of mitochondrial Cpn60 or CCT-complex proteins.

Exhaustive searches of hsp60-like sequences were carried out in human and other genomes following and extending our “chaperonomics” methodological protocol [29]. The extensive analysis of the genomes of human and other vertebrate species led to the identification and characterization of many previously unknown sequences and to the discovery of a new, mammal-specific class of chaperonin proteins. Classification, evolutionary analysis and structural characterization of diverged chaperonin-like sequences should provide valuable information for future studies on the functional roles of these proteins.

Results
Chaperonin sequences in the human genome
To identify all human hsp60-like sequences we queried the human genome using the nine human CCT subunit and mitochondrial Cpn60 sequences. Analogous extensive searches were performed in the mouse and rat genomes using corresponding queries. In the human genome, we found a total of 54 sequences with significant similarity to Hsp60 proteins (Tables 1 and 2). Fifteen sequences had a NCBI Entrez [30] gene descriptor assigned. Nine of these corresponded to the canonical CCT-subunit sequences and one, HSPD1, encoded the mitochondrial Cpn60 protein. Three sequences corresponded to the BBS genes MKKS, BBS10 and BBS12. We recovered two additional uncharacterized sequences designated in the NCBI Entrez Gene database as CCT8L1 and CCT8L2. Besides these complete Hsp60-like sequences, a sequence domain conserved across eukaryote species with highest similarity to the apical domain of the CCT3 protein has also been reported in PIKFYVE [31], a kinase belonging to the Fab1p protein family involved in corneal pathological conditions [32].

In addition, we identified 39 other human hsp60 sequences that did not correspond to a gene descriptor in the NCBI Entrez Gene database (Table 2). All of these sequences contained in-frame stop codons or frame-shifts, suggesting that they were most likely pseudogenes. Thirty-five of these had not been described in the Pseudogene.org pseudogene database [33] and 33 were not listed in the Ensembl database [34], and are here annotated and classified for the first time. In analogous searches of the complete genomes of mouse and rat, we identified in each genome 14 chaperonin genes (nine for the canonical CCT monomers, one for the mitochondrial Cpn60, three BBS genes and one CCT8L gene), 38 pseudogenes in mouse and 61 pseudogenes in rat (see additional file 1: Table S1, for mouse sequences, and additional file 2: Table S2, for rat sequences).

Evolutionary origins of human BBS and CCT8L genes
A maximum-likelihood (ML) phylogenetic tree of human chaperonin-like proteins (Figure 1a) indicated that Hsp60-like BBS proteins are monophyletic (bootstrap support 86%) and that their common ancestor derived from a duplication event in the CCT8 lineage
(bootstrap support 88%). The tree also showed that the unique ancestor of the two closely related genes CCT8L1 and CCT8L2 also originated in the CCT8 lineage from a more recent duplication event (bootstrap support 75%). The relation of BBS and CCT8L proteins with the CCT8 chaperonin subunit was confirmed with strong conditional probability support (0.99) by Bayesian tree construction (Figure 1b).

Although the association of BBS and CCT8L proteins with the CCT lineage was robustly supported, the high divergence of these sequences could produce clustering in the trees due to long-branch attraction. To address this concern, we built independent ML trees for each BBS or CCT8L sequence adding them separately to the tree of CCT subunits. All individual trees confirmed with strong bootstrap support the association of each BBS or CCT8L lineage with the CCT8 lineage (see additional file 3: Figure S1, additional file 4: Figure S2, additional file 5: Figure S3 and additional file 6: Figure S4). A ML evolutionary tree including hsp60-gene homologs found in the genomes of eighteen other vertebrate species, including representatives of several mammals, chicken, frogs, and fish, also confirmed the origin of BBS and CCT8L genes from the CCT8 lineage (see additional file 7: Figure S5).

We did not find CCT8L genes in the genomes of chicken, Xenopus laevis, or Danio rerio, representatives respectively of the reptile/bird, amphibian and fish lineages. However, among mammals we identified orthologs of CCT8L genes in genomes not only of placental mammals (Eutheria), but also of the marsupial opossum (Metatheria) and of the egg-laying platypus (Prototheria), suggesting that the CCT8L gene class originated at the onset of mammal evolution. All CCT8L gene orthologs were intron-less, indicating that their ancestor originated from a retro-transposition event. Two copies of CCT8L sequences were found in human and chimp and one CCT8L gene in all other genomes examined, including those from the other primate rhesus monkey (Macaca mulatta) and gray mouse lemur (Microcebus murinus) (Figure 2), suggesting that a duplication of the CCT8L gene occurred in Hominoidea after their separation from old world monkeys. However, the lone gene copy of CCT8L identified in rhesus monkey clustered with CCT8L1 in evolutionary trees (Figure 2), suggesting an earlier duplication of the gene and successive loss of the CCT8L2 copy from the genome of rhesus monkey. Close inspection of protein alignments revealed that the rhesus monkey CCT8L sequence included an anomalously diverged segment of about 50 amino acids of uncertain alignment. Excluding this segment from the analysis we obtained a different and more robustly supported tree topology (75% vs. 20% bootstrap value, see additional file 8: Figure S6, panels a and b), consistent with a later duplication of the CCT8L gene in Hominoidea. The tree also indicated that the removed segment was alone responsible for the overall higher evolutionary rate predicted for this sequence (see additional file 8: Figure S6).
Table 1 The human hsp60 genes
Name(1) | Alternative names | Start(2) | End(3) | Str(4) | Chr(5) Loc(6) | IF(7) | Exons(8) | aa(9)
CCT1(10) | TCP1, CCTa, CCTα, TCP-1α | 160,119,520 | 160,130,731 | - | 6q25.3 | 2 | 12, 7 | 556, 401
CCT2 | CCTβ, TCP-1β | 68,266,317 | 68,280,052 | + | 12q15 | 1 | 14 | 535
CCT3 | CCTγ, TCP-1γ | 154,545,617 | 154,572,307 | - | 1q23.1 | 3 | 13, 13, 12 | 545, 544, 507
CCT4 | CCTδ, TCPD, TCP-1δ | 61,950,076 | 61,969,146 | - | 2p15 | 1 | 13 | 539
CCT5 | CCTε, TCP1E, TCP-1ε | 10,303,453 | 10,317,892 | + | 5p15.2 | 1 | 11 | 541
CCT6A | CCTζ, CCTζ-1, TCP-1ζ, CCT6, Cctz, HTR3, TCP20, TCPZ, TTCP20 | 56,087,036 | 56,098,269 | + | 7p11.2 | 2 | 14, 13 | 531, 486
CCT6B | CCTζ-2, TCP-1ζ-2, Cctz2, TSA303, Tcp20 | 30,279,183 | 30,312,525 | - | 17q12 | 1 | 14 | 530
CCT7 | CCTη, TCP-1η, Ccth, NIP7-1 | 73,320,279 | 73,333,494 | + | 2p13.2 | 2 | 12, 7 | 543, 339
CCT8 | CCTθ, TCP-1θ, Cctq | 29,350,670 | 29,367,782 | - | 21q21.3 | 1 | 15 | 548
CCT8L1 | LOC155100 | 151,773,495 | 151,775,165 | + | 7q36.1 | 1 | 1 | 557
CCT8L2 | GROL, CESK1 | 15,451,770 | 15,453,440 | - | 22q11.1 | 1 | 1 | 557
MKKS | BBS6 | 10,333,898 | 10,342,162 | - | 20p12.2 | 2 | 4, 4 | 570, 570
BBS10 | C12orf58, FLJ23560 | 75,263,727 | 75,266,269 | - | 12q21.2 | 1 | 2 | 723
BBS12 | C4orf24, FLJ35630, FLJ41559 | 123,882,498 | 123,884,627 | + | 4q27 | 1 | 1 | 710
HSPD1 | GROEL, HSP60, SPG13, CPN60, HuCHA60 | 198,060,018 | 198,071,817 | - | 2q33.1 | 2 | 11, 11 | 573, 573
(PIKFYVE)(11) | CFD, FAB1, PIP5K, PIP5K3 | 209,182,591 | 209,190,094 | + | 2q34 | 1 | 5 | 224
(1) Official NCBI Entrez gene database name; (2) Start and (3) End of coding region; (4) Strand: “+” indicates sequenced strand, “-” indicates complementary strand; (5) Chromosome; (6) Chromosome location; (7) Number of isoforms; (8) Number of exons. Multiple numbers indicate the number of exons in each isoform; (9) Total amino acids; (10) The official Entrez name is TCP1. CCT1 improves consistency with other subunit gene names. (11) Fab1_TCP sequence domain of PIKFYVE kinase, most similar to the apical domain of CCT3. Features refer to the domain portion of the gene/protein.
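The coordinates in Table 1 allow a quick consistency check on the intron-less genes: for a single-exon gene the coding-region span, divided by three, should equal the protein length. The sketch below uses values read from Table 1 and assumes the Start and End coordinates are inclusive; the function names are illustrative.

```python
# (gene, start, end, exons, amino acids) as read from Table 1.
GENES = [
    ("CCT8L1", 151_773_495, 151_775_165, 1, 557),
    ("CCT8L2", 15_451_770, 15_453_440, 1, 557),
    ("BBS12", 123_882_498, 123_884_627, 1, 710),
    ("BBS10", 75_263_727, 75_266_269, 2, 723),
]


def span_in_codons(start: int, end: int) -> float:
    """Length of the genomic span in codons, assuming inclusive coordinates."""
    return (end - start + 1) / 3


def span_matches_protein(start: int, end: int, amino_acids: int) -> bool:
    """True when the span is exactly as long as the protein it encodes."""
    return span_in_codons(start, end) == amino_acids


if __name__ == "__main__":
    for name, start, end, exons, aa in GENES:
        codons = span_in_codons(start, end)
        print(f"{name}: exons={exons} span={codons:.1f} codons, protein={aa} aa, "
              f"match={span_matches_protein(start, end, aa)}")
```

Under the inclusive-coordinate assumption, the three single-exon genes (CCT8L1, CCT8L2, BBS12) match exactly, while the two-exon BBS10 spans more codons than its protein length, as expected when an intron lies inside the coding region.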

Table 2 The human hsp60 pseudogenes
Name(1) | Start(2) | End(3) | Str(4) | Chr(5) Loc(6) | Ex(7) | P/D(8) | Ka/Ks(9) | LRT(10), FS(11), SC(12), aa(13) (digits run together in the source)
CCT1-1P | 19,986,638 | 19,987,216 | + | 12p12.2 | 1 | P | 0.75 | 0.1652190
CCT1-2P | 41,621,756 | 41,623,646 | - | 5p13.1 | 2? | D | 1.21 | 0.1575512
CCT1-3P(14) | 42,801,030 | 42,802,033 | + | 7p14.1 | 3 | D | 0.68 | 2.1011367
CCT3-1P | 16,177,578 | 16,178,178 | + | 8p22 | 1 | P | 1.02 | 0.022159
CCT4-1P | 64,177,578 | 64,409,590 | + | Xq12 | 3 | D | 0.65 | 1.7603512
CCT4-2P | 140,344,301 | 140,345,787 | - | 7q34 | 4 | D | 0.82 | 1.24210278
CCT5-1P(14,15) | 78,382,086 | 78,382,680 | + | 13q31.1 | 1 | P | 0.81 | 0.2034549
CCT5-2P(15) | 78,382,866 | 78,382,967 | - | 13q31.1 | 1 | ? | - | --134
CCT5-3P | 114,876,388 | 114,877,290 | + | 5q22.3 | 1 | P | 0.25 | 2.1263201
CCT6-1P | 14,692,965 | 14,693,954 | - | 5p15.2 | 1 | P | 1.00 | 0.022330
CCT6-2P | 109,013,584 | 109,014,117 | - | 11q22.3 | 1 | P | 0.90 | 0.002178
CCT6-3P(16) | 64,162,812 | 64,171,325 | + | 7q11.21 | 8 | D | 0.43 | 3.0654289
CCT6-4P | 191,915,332 | 191,916,879 | + | 3q28 | 1 | P | 0.57 | 6.90 ** 94292
CCT6-5P(14,16) | 64,853,564 | 64,865,440 | + | 7q11.21 | 10 | D | 0.84 | 0.34126399
CCT7-1P(14) | 92,251,627 | 92,307,366 | - | 5q15 | 1 | P | 0.45 | 1.8835145
CCT7-2P | 150,242,815 | 150,243,240 | + | 6q25.1 | 1 | P | 0.87 | 0.1034552
CCT8-1P(14) | 145,141,482 | 145,143,137 | - | 1q21.1 | 1 | P | 1.14 | 0.1023561
HSPD1-1P(14) | 135,744,902 | 135,745,039 | - | 5q31.1 | 1 | P | 1.46 | 0.270048
HSPD1-2P(14,17) | 21,919,402 | 21,920,175 | - | 5p14.3 | 1 | P | 0.90 | 0.1011264
HSPD1-3P | 43,602,029 | 43,602,280 | - | 20q13.12 | 4 | D | - | -0184
HSPD1-4P | 88,065,673 | 88,066,269 | + | 6q15 | 1 | P | 0.55 | 1.0845199
HSPD1-5P(14) | 55,191,053 | 55,192,769 | + | 12q13.2 | 1 | P | 0.56 | 3.0821499
HSPD1-6P(14) | 36,783,612 | 36,785,195 | - | 3p22.3 | 1 | P | 0.59 | 2.4622443
HSPD1-7P(18) | 7,263,938 | 7,265,475 | + | 8p23.1 | 1 | P | 1.12 | 0.1854396
HSPD1-8P | 145,986,418 | 145,987,946 | + | 4q31.21 | 1 | P | 0.63 | 2.3243458
HSPD1-9P(18) | 7,785,932 | 7,787,502 | - | 8p23.1 | 1 | P | 0.91 | 0.0853416
HSPD1-10P | 8,058,884 | 8,082,857 | + | 12p13.31 | 2? | D | 0.78 | 0.9411307
HSPD1-11P | 95,130,459 | 95,132,169 | + | 5q15 | 5 | D | 0.74 | 2.4466375
HSPD1-12P | 78,321,372 | 78,323,341 | + | 13q31.1 | 1 | P | 0.62 | 4.98 * 54410
HSPD1-13P | 153,068,626 | 153,068,943 | + | 6q25.2 | 1 | P | 0.54 | 2.8412108
HSPD1-14P(14) | 37,465,288 | 37,466,827 | - | 13q13.3 | 4 | D | 0.68 | 3.363361
HSPD1-15P | 19,269,394 | 19,270,353 | + | 5p14.3 | 4 | D | 0.74 | 1.2444241
HSPD1-16P | 105,082,802 | 105,083,755 | + | 11q22.3 | 2? | D | 0.51 | 6.24 * 54199
HSPD1-17P | 34,077,070 | 34,078,293 | + | 1p35.1 | 3 | D | 0.74 | 1.4822217
HSPD1-18P | 56,105,684 | 56,108,736 | + | 20q13.32 | 5 | D | 0.48 | 10.84 ** 32299
HSPD1-19P | 50,318,868 | 50,319,008 | + | 10q11.23 | 1 | ? | 2.42 | 0.720047
HSPD1-20P | 78,924,341 | 78,924,478 | - | 12q21.31 | 1 | ? | 0.40 | 3.080046
HSPD1-21P | 60,994,430 | 60,994,876 | - | 5q12.1 | 6 | D | - | -06155
HSPD1-22P | 29,181,851 | 29,183,334 | - | 21q21.3 | 2? | D | 0.69 | 1.434344
(1) Pseudogene names follow the HUGO nomenclature. They are composed of the name of the parental gene followed by a unique number identifier and the suffix “P” (Pseudogene); (2) Start and (3) End positions of the pseudogene on the chromosome; (4) Strand; (5) Chromosome; (6) Location on the chromosome; (7) Number of exons. A question mark indicates gene fragments with uncertain numbers of exons; (8) Processed (P), duplicated (D) or undetermined (?); (9) Ratio of nonsynonymous vs. synonymous substitution rates; (10) Likelihood Ratio Test (LRT) values. Values different from 1.0 with probability p < 0.01 (**) or p < 0.05 (*) are shown in bold-face; (11) Number of Frame-Shifts recognized in the coding region of the pseudogene; (12) Number of in-frame Stop Codons recognized in the coding region of the pseudogene; (13) Length in amino acids of pseudo-translation of the recognized pseudogene sequence; (14) Ten pseudogenes previously reported in the Ensembl (roman), Pseudogene.org (italics) or NCBI (bold) databases: CCT1-3P = OTTHUMG00000033751; CCT5-1P = Human.chr13.mb78; CCT6-5P = ENSP00000275603, Human.chr7.mb64; CCT7-1P = ENST00000399032; CCT8-1P = Human.chr1.mb145; HSPD1-1P = ENSG00000162241, Human.chr5.mb135; HSPD1-2P = ENSP00000328369; HSPD1-5P = LOC644745; HSPD1-6P = LOC645548; HSPD1-14P = OTTHUMG00000016753; (15,16,18) Tandemly duplicated; (17) Previously identified as Hsp60s2 (Hsp60 short form 2).

Differentiation rate of BBS and CCT8L proteins

The branch lengths of the trees shown in Figure 1 indicate that BBS and CCT8L proteins have differentiated at much higher rates than CCT subunits. We applied a newly-developed, unbiased measure of differentiation called “B-index” (see Methods) to calculate differentiation of MKKS, BBS10 and BBS12 proteins from their respective last ancestor common to Actinopterygii (ray-finned fishes) and Sarcopterygii (including tetrapods), determined by rooting the trees with CCT8 proteins from corresponding fish and tetrapod species. Similarly, we calculated differentiation of CCT8L proteins from a eutherian ancestor, rooting their tree with corresponding sets of CCT8 proteins (see footnotes of Table 3 and legend for Figure 2 for species represented in each tree). We estimated for the MKKS family an average evolutionary distance from their root of almost 0.7 substitutions per site, corresponding to a 6-fold higher rate of differentiation compared to the number of substitutions estimated in CCT8 proteins over the same period of time. For BBS10 and BBS12, we calculated a distance of about 1.0-1.2 substitutions per site, corresponding to a substitution rate about 8-10 times higher than in CCT8.
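
The fold differences quoted here follow directly from the B-index values listed in Table 3. As a minimal sketch (the dictionary layout and the function name are illustrative and not part of the original analysis), the ratio of each family's B-index to that of CCT8 over the same species set can be recomputed as follows:

```python
# B-index values (average substitutions per site from the common ancestor),
# copied from Table 3 as (family value, matching CCT8 value).
B_INDEX = {
    "MKKS": (0.6976, 0.1146),
    "BBS10": (1.1079, 0.1373),
    "BBS12": (1.0284, 0.0992),
    "CCT8L": (0.3196, 0.0227),
}


def fold_over_cct8(family: str) -> float:
    """Return B_B / B_C: differentiation of a family relative to CCT8."""
    b_family, b_cct8 = B_INDEX[family]
    return b_family / b_cct8


for name in B_INDEX:
    print(f"{name}: {fold_over_cct8(name):.1f}-fold")
```

With the Table 3 values this gives roughly 6-fold for MKKS, 8- to 10-fold for BBS10 and BBS12, and about 14-fold for CCT8L, in line with the B_B/B_C row of the table.
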
Finally, for the mammal-specific family of CCT8L proteins, we estimated an evolutionary distance from their mammal root of about 0.3 substitutions per site. The smaller divergence of CCT8L proteins compared to BBS proteins reflects the more recent origin of the CCT8L gene. However, when scaled to the evolution of CCT8 sequences over the same periods of time, the substitution rate of CCT8L proteins was about 14-15 times higher than in CCT8 and 1.4-2.3 times higher than in BBS proteins.

Figure 1 Evolutionary trees of CCT proteins. (a) Maximum-likelihood evolutionary tree of all human chaperonin-like proteins, including CCT monomers, MKKS, BBS10, BBS12 and the two members, CCT8L1 and CCT8L2, of the newly defined CCT8L class. Numbers associated with each branch indicate bootstrap support from 100 replicates. Tree rooted by the archaeal thermosome alpha subunit of Sulfolobus solfataricus (Ss_ThsA). (b) Bayesian evolutionary tree of the same sequences shown in (a). The numbers assigned to each branch indicate posterior probabilities. Tree rooted by the thermosome alpha subunit of Thermoplasma acidophilum (Ta_ThsA). The scale bars represent the indicated number of substitutions per position for a unit branch length.

Functional constraints in the evolution of CCT8L genes

We tested functionality of CCT8L genes from several species estimating ratios of non-synonymous and synonymous substitution rates (Ka/Ks) along their respective lineages (see Methods). The results of this analysis are shown in Table 4, which indicates the gene(s) analyzed (foreground), the two genes used to identify foreground and background branches, the estimated Ka/Ks values
and their significance. The evolutionary lineages for which Ka/Ks values were evaluated correspond to the branch numbers identified in the overall tree topology shown in Figure 3. In this tree are represented the “molecular tree” of mammal phylogenetic relations [35], the gene duplication event involving the CCT8L gene family in primates as inferred by this analysis, and the pre-mammal separations of the CCT7, CCT8 and CCT8L families of paralogs. This topology is in agreement with the evolutionary tree of CCT8L genes (Figure 2) with the only exception of the weakly supported position of the CCT8L sequence from rhesus monkey (see above).

Figure 2 Evolutionary tree of CCT8L sequences. ML tree of CCT8L sequences from various mammal genomes. The homolog of human CCT8L1 in chimp (Ptr) is characterized as pseudogene and is shown in bold-italics font. Species abbreviations: Bt, Bos taurus (cow); Cf, Canis lupus familiaris (dog); Dn, Dasypus novemcinctus (nine-banded armadillo); Dr, Danio rerio (zebrafish); Ec, Equus caballus (horse); Ga, Gasterosteus aculeatus (stickleback, fish); Gg, Gallus gallus domesticus (chicken); Hs, Homo sapiens (human); La, Loxodonta africana (African bush elephant); Md, Monodelphis domestica (South American gray short-tailed opossum, marsupial); Mm, Mus musculus (mouse); Mmu, Macaca mulatta (rhesus monkey); Mmur, Microcebus murinus (gray mouse lemur); Oa, Ornithorhynchus anatinus (platypus); Ol, Oryzias latipes (the medaka or Japanese killifish); Pp, Pongo pygmaeus (northwest Bornean orangutan); Ptr, Pan troglodytes (chimpanzee); Rn, Rattus norvegicus (rat); Tn, Tetraodon nigroviridis (spotted green pufferfish); Tr, Takifugu rubripes (Japanese pufferfish); Xl, Xenopus laevis (African clawed frog, amphibian); Xt, Xenopus tropicalis (western clawed frog, amphibian). The scale bar represents the indicated number of substitutions per position for a unit branch length.

Table 3 Divergence of BBS and CCT8L proteins relative to CCT8 proteins

 | MKKS | BBS10 | BBS12 | CCT8L^1
No. species^2 | 14 | 11 | 11 | 5
Size (W_B)^3 | 5.7770 | 4.6020 | 5.8949 | 3.3859
B-index (B_B)^4 | 0.6976 | 1.1079 | 1.0284 | 0.3196
Unbiased pair-wise distance (B_B × 2) | 1.3952 | 2.2159 | 2.0568 | 0.6393
L_B (B_B × W_B)^5 | 4.0300 | 5.0987 | 6.0623 | 1.0822
Average Dij (D_B)^6 | 2.0951 | 2.9660 | 3.2858 | 0.8017
CCT8^7
Size (W_C) | 5.5202 | 4.5503 | 4.3647 | 3.1100
B-index (B_C) | 0.1146 | 0.1373 | 0.0992 | 0.0227
Unbiased pair-wise distance (B_C × 2) | 0.2291 | 0.2747 | 0.1983 | 0.0454
L_C (B_C × W_C) | 0.6324 | 0.6250 | 0.4328 | 0.0706
Average Dij (D_C) | 0.3394 | 0.3687 | 0.2709 | 0.0545
B_B/B_C | 6.0873 | 8.0692 | 10.3669 | 14.0793
L_B/L_C | 6.3725 | 8.1579 | 14.0072 | 15.3286
D_B/D_C | 6.1730 | 8.0445 | 12.1292 | 14.7101
W_B/W_C | 1.0465 | 1.0114 | 1.3506 | 1.0887

^1 Only the human CCT8L2 branch was included in the tree. The CCT8L1 branch had equivalent length; ^2 Chaperonin-BBS sequences used in the trees were from the following species (see the legend for Figure 2 for a complete list of abbreviations and species names). MKKS: Bt, Cf, Dr, Ec, Ga, Gg, Hs, Md, Mm, Mmu, Ol, Rn, Tr, Xt; BBS10: Bt, Cf, Dr, Ec, Ga, Hs, Md, Mm, Ol, Rn, Tr; BBS12: Bt, Cf, Dr, Ec, Ga, Gg, Hs, Mm, Ol, Rn, Tr, Xl; CCT8L: Bt, Cf, Hs, Mm, Rn. ^3 Size is the average number of sequences contained in a cluster over evolutionary time (see Methods); ^4 The B-index measures the average substitutions per site (evolutionary distance) of the sequences within a cluster from their common ancestor; ^5 L is the length of the tree (sum of the lengths of all branches); ^6 Average Dij is the average pair-wise evolutionary distance of the sequences; ^7 Estimates for CCT8 were computed over corresponding species represented by the sets of MKKS, BBS10, BBS12 or CCT8L proteins (see footnote 2, above).

The highly significant constraints in non-synonymous substitution rates (Ka/Ks < 1.0) estimated in the overall evolution of the CCT8L family (Table 4, foreground genes: “All CCT8L1/2”) indicated that the CCT8L sequences are genes generally expressing functional proteins. In evaluating Ka/Ks ratios for individual CCT8L gene lineages (Table 4), significantly constrained evolution (Ka/Ks < 1.0) was detected for branches leading to most sequences, including those of murids, lemur, cow, dog, elephant, marsupial, and to the human CCT8L1 and CCT8L2 group along the hominoid lineage. Constrained evolution was also estimated for the CCT8L genes of armadillo and rhesus monkey, and for human CCT8L1 and human and chimp CCT8L2 after divergence of human and chimp, although in these cases Ka/Ks values did not reach significance. In the cases of the human and chimp CCT8L1 and
CCT8L2 genes, the lack of significance can be related to the loss of power of the test since few mutations accumulated after separation of these sequences (see additional file 9: Table S3). In the case of rhesus monkey CCT8L, we found that its relatively high estimate of Ka/Ks (= 0.73) was due to the previously mentioned 50-amino-acid diverged region within this sequence. After removing this region we estimated Ka/Ks = 0.55. Only for the lineage of chimp CCT8L1 we estimated Ka/Ks ≈ 1, consistent with differentiation of a non-functional sequence. Since this sequence was also characterized by an internal stop codon and a frame-shift, all evidence strongly suggests that chimp CCT8L1 is a pseudogene.

Table 4 Ka/Ks substitution ratios in CCT8L genes evolution

Foreground genes^1 | Background genes^1 | Foreground Ka/Ks^2 | LRT (p)^3 | Foreground branches^4
All CCT8L1/2 | Human CCT8, Human CCT7 | 0.29 | 205.06 (<0.001) | 1 to 25
Human CCT8L1 | Chimp CCT8L1, Human CCT8L2 | 0.58 | 0.88 | 1
Chimp CCT8L1 | Human CCT8L1, Human CCT8L2 | 1.02 | 0.00 | 2
Human CCT8L2 | Chimp CCT8L2, Human CCT8L1 | 0.48 | 1.24 |
Chimp CCT8L2 | Human CCT8L1, Human CCT8L2 | 0.39 | 1.85 |
Human CCT8L2 | Human CCT8L1, Rhesus CCT8L | 0.42 | 4.02 (<0.05) | 4+6
Human CCT8L1 | Human CCT8L2, Rhesus CCT8L | 0.29 | 5.72 (<0.05) | 1+3
Mouse and Rat CCT8L | Cow CCT8L, Human CCT8L2 | 0.38 | 31.14 (<0.001) | 12+13+14
Mouse CCT8L | Rat CCT8L, Human CCT8L2 | 0.64 | 1.21 | 12
Rat CCT8L | Mouse CCT8L, Human CCT8L2 | 0.49 | 5.91 (<0.05) | 13
Rhesus CCT8L | Lemur CCT8L, Human CCT8L2 | 0.73 (0.55)^5 | 1.91 (1.22)^5 | 8
Lemur CCT8L | Human CCT8L2, Mouse CCT8L | 0.29 | 36.82 (<0.001) | 10
Dog CCT8L | Cow CCT8L, Human CCT8L2 | 0.31 | 12.07 (<0.001) | 16
Cow CCT8L | Dog CCT8L, Human CCT8L2 | 0.13 | 113.78 (<0.001) | 17
Armadillo CCT8L | Elephant CCT8L, Human CCT8L2 | 0.36 | 0.57 | 20
Elephant CCT8L | Marsupial CCT8L, Human CCT8L2 | 0.29 | 14.96 (<0.001) | 21
Marsupial CCT8L | Elephant CCT8L, Human CCT8L2 | 0.31 | 62.63 (<0.001) | 23+24

^1 See text for the definition and meaning of Foreground and Background species; ^2 Ka/Ks is the estimated ratio of non-synonymous and synonymous substitution rates; ^3 LRT, Likelihood Ratio Test results for estimated Ka/Ks vs. Ka/Ks = 1.0 (see Methods). Probabilities (p) not shown signify p > 0.05; ^4 Foreground-branch numbers correspond to the numbering in the schematic tree shown in Figure 3. ^5 Values in parenthesis were obtained after removing an unusually diverged region from rhesus CCT8L (see text).

Figure 3 Evolutionary relations of CCT8L genes. Schematic representation of evolutionary relations of CCT8L genes from different eukaryotic species rooted by CCT8 and CCT7 sequences. The numbers associated with each branch identify the branches for which branch-specific Ka/Ks values are evaluated (Table 4).

To assess the functionality of human CCT8L sequences we investigated their expression profiles in comparison to those of human CCT monomers and BBS genes (see additional file 10: Table S4). Expression of CCT8L2 was confirmed by fifteen ESTs mostly identified from the testis, whereas only one EST identified as a CCT8L1 transcript has been so far reported (NCBI UniGene database, November 20, 2009). Querying the NCBI GEO microarray database, we found 542 expression-profile records identifying expression of CCT8L2, and none identifying expression of CCT8L1 (as of November 20, 2009). It must be noted, however, that CCT8L2 and CCT8L1 have similarity of 97.3% at the DNA level. Similarly to CCT8L2, another mammal-specific chaperonin gene, CCT6B, is also expressed almost exclusively in the testis, from which 160 ESTs have been reported versus an average of 4.4 ESTs (from 0 to 10 per tissue) found in all other tissues.

Pseudogenes

We identified in the human genome 39 sequences with significant similarity to CCT or HSPD1 genes that either were short fragments or were characterized by in-frame stop codons or frame-shifts. Based on their corruption, we classified these sequences as pseudogenes (Table 2). Similarly, searching the mouse and rat genomes we identified 38 and 61 pseudogenes, respectively (see additional file 1: Table S1 and additional file 2: Table S2). Most of these sequences have not been previously reported and are here systematically annotated and classified for the first time.
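
The significance levels attached to the LRT values in Tables 2 and 4 can be illustrated with a short calculation. The sketch below assumes that the LRT statistic is compared against a chi-square distribution with one degree of freedom, a common choice when a single Ka/Ks parameter is constrained to 1.0. This excerpt does not state which distribution was used, so this is an assumption, and the function names are illustrative.

```python
import math


def chi2_sf_1df(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 d.f. the survival function has the closed form erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


def significance_label(stat: float) -> str:
    """Map an LRT statistic to the thresholds shown in Table 4 (assumed 1 d.f.)."""
    p = chi2_sf_1df(stat)
    if p < 0.001:
        return "<0.001"
    if p < 0.05:
        return "<0.05"
    return "n.s."


# LRT values taken from Table 4.
for gene, lrt in [("Cow CCT8L", 113.78), ("Rat CCT8L", 5.91), ("Mouse CCT8L", 1.21)]:
    print(gene, lrt, significance_label(lrt))
```

Under this assumption the three example rows reproduce the labels given in Table 4: the cow lineage falls below 0.001, the rat lineage below 0.05, and the mouse lineage is not significant.
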
Based on phylogenetic-tree reconstructions (see additional file 11: Figure S7) or on similarity for the most corrupted sequences, we identified the association of 17 pseudogenes from human, 16 from mouse and 29 from rat with one of the nine CCT genes. None of the pseudogenes were related to MKKS, BBS10, BBS12 or CCT8L. To estimate the time of origin of the pseudogenes, we constructed trees using their translated sequences and chaperonin subunits from various vertebrate species (see additional file 12: Figure S8, and additional file 13: Figure S9). The trees indicated that all recognizable human CCT pseudogenes originated in the mammal lineage after separation from the reptile/bird lineage.

Of particular interest were the evolutionary relations of CCT6 genes and pseudogenes. Two CCT6 gene copies (CCT6A and CCT6B) were found, besides placental mammals, also in platypus and in opossum (see additional file 11: Figure S7), suggesting that the duplication of the CCT6 gene occurred in mammal evolution before separation of Theria (marsupial and placental mammals) and Prototheria (monotremes). We constructed an evolutionary tree of mammal CCT6 genes and pseudogenes (Figure 4) rooted by the corresponding gene sequences from chicken and frog (the diverged sequence Oa_con2651 from platypus was excluded from this tree to avoid long-branch attraction). Surprisingly, all recognizable human, mouse, and rat pseudogenes belonging to the CCT6 class branched in the tree from the CCT6A lineage after separation of the platypus, marsupial and placental mammal lineages.

Twenty-two pseudogenes in human (Table 2), and 22 and 32 pseudogenes in mouse and rat, respectively (see additional file 1: Table S1 and additional file 2: Table S2), associated with the mitochondrial HSPD1 gene (Group I cpn60). Evolutionary trees incorporating all pseudogenes from different vertebrate species were uninformative due to the presence among the pseudogenes of highly corrupted sequences, resulting in extensive long-branch attraction (not shown). An ML tree built using only translations of the most conserved pseudogenes (Figure 5) showed weakly supported but consistent association of the human pseudogenes with HSPD1 from primates, whereas pseudogenes from mouse and rat all associated with murid Hspd1 sequences, also indicating their relatively recent origin.

Ka/Ks ratio in the evolution of putative pseudogene sequences

Our characterization of many hsp60 sequences as pseudogenes was based on the presence of signs of corruption in the sequence (in-frame stop codons and frame-shifts). However, in-frame stop codons and frame-shifts may correspond to truncated proteins that are still functional. For example, although human HSPD1-5P and HSPD1-6P sequences contain signs of sequence corruption, EST data indicate that these sequences are expressed and possibly functional (see additional file 14: Table S5). To confirm our characterization, we estimated Ka/Ks ratios in trees that identified the pseudogene-sequence lineage (branch) including as out-group its parental gene and the orthologous gene sequence from chicken (see Methods). The results of these analyses (Table 2) showed in most cases Ka/Ks values not significantly different from 1.0, as expected in the differentiation of pseudogene sequences not constrained by
coding of functional amino acids. Significant differences in mutation rate were estimated in the case of four sequences. These sequences, however, contained multiple in-frame stop codons and frame-shifts (Table 2).

Structural features of BBS and CCT8L proteins

Because of their high sequence divergence, it is unclear whether BBS and CCT8L Hsp60-like proteins conserve the typical fold of chaperonin subunits and their ability to assemble into typical oligomeric chaperonin complexes. Chaperonin monomers are characterized by three structural domains (apical, intermediate and equatorial) with distinct functional roles and it was relevant to investigate whether BBS and CCT8L proteins conserve each of the domains typical of chaperonins.

Figure 4 Evolutionary tree of vertebrate CCT6 proteins. ML tree of CCT6 proteins from mammals, chicken, and frog (in roman font) and translated sequences of the related pseudogenes from human, mouse, and rat (in bold-italics font). Only one copy of CCT6 was found in chicken and frog. Two copies, CCT6A and CCT6B, were found in all mammals examined, including marsupial (Md) and platypus (Oa). The CCT6 sequences from chicken (Gg) and from the two amphibians Xenopus laevis (Xl) and Xenopus tropicalis (Xt) were used to root the tree. All human, mouse, and rat pseudogenes clustered with the CCT6A sequences. Numbers next to branches indicate percent bootstrap values. Only bootstrap values >30% are shown. For all species abbreviations see the legend for Figure 2. The scale bar represents the indicated number of substitutions per position for a unit branch length.

Experimental models of eukaryotic Group II
chaperonins are not available but their structural properties can be inferred by comparison with their closest relative, the archaeal thermosome. To infer tertiary structure conservation in BBS and CCT8L proteins we predicted the secondary structure for each family from alignments of multiple sequences, excluding structure and sequence information from other families. The results of these predictions are schematically represented in Figure 6a, in relation to the secondary structure description of the PDB structure 1a6d chain A of the thermosome subunit ThsA from Thermoplasma acidophilum [36] (see additional file 15: Figure S10, additional file 16: Figure S11, additional file 17: Figure S12, additional file 18: Figure S13, additional file 19: Figure S14, and additional file 20: Figure S15 for detailed representations of multiple alignments, secondary structure predictions and alignments to the secondary-structure elements of ThsA).

Figure 5 Evolutionary tree of vertebrate mitochondrial Cpn60. ML tree of mitochondrial Cpn60 proteins from mammals, chicken, and frog (in roman font) and translated sequences of the related pseudogenes from human, mouse, and rat (in bold-italics font). Highly degraded pseudogenes for which only fragments could be detected were not considered. Human pseudogenes clustered with primate Cpn60 sequences whereas mouse and rat pseudogenes clustered with rodent counterparts, indicating independent evolution of these pseudogenes in these species. For all species abbreviations see legend for Figure 2. The scale bar represents the indicated number of substitutions per position for a unit branch length.

Figure 6 Secondary structure predictions of chaperonin proteins. (a) Secondary structure predictions of Thermoplasma acidophilum thermosome alpha subunit ThsA (line Ta_ThsA), human CCTs, mammal CCT8Ls and vertebrate BBSs (lines MKKS, BBS10 and BBS12) compared to the secondary structure description of ThsA (top line 1a6d) determined from its crystal structure (PDB code 1a6d, chain A). Helices are represented as red boxes, beta-strands as yellow boxes and loops as black lines. Secondary structure elements in 1a6d are labeled in succession with numbers (strands) or letters (helices). The first 16 N-terminal residues of ThsA, predicted to contain a strand, are not included in the 1a6d crystal structure (top line). Secondary structure elements in all proteins recognized as homologous to the thermosome chain elements by sequence similarity and positional equivalence are vertically aligned. Blue circles indicate the position of sequence insertions in CCT8L and BBS sequences. (b) The three-dimensional fold of the secondary structure elements in the thermosome structure 1a6d chain A. Red cylinders represent helices and yellow arrows represent strands. Labels (i.e., letters and numbers) correspond to those in panel “a”. Elements not predicted in some of the BBS and CCT8L sequences are labeled in gray. The positions of the ATP binding and hydrolysis sites are highlighted in green.

In Figure 6a, the secondary structure
description of ThsA is shown (line “1a6d”) in relation to the position of the equatorial, intermediate, and apical domains. The position of these elements in the tertiary structure of ThsA is represented in Figure 6b. Results of a blind test of the performance of the method on the corresponding ThsA sequence are also shown (Figure 6a, line “Ta_ThsA”). In this test most strand and helix elements (all “core” helices) described in the crystal structure were correctly predicted by the method, increasing our confidence in the reliability of other predictions. As expected, extensive conservation of predicted secondary-structure elements was also obtained from the alignment of human CCT sequences (Figure 6a, line “CCT”) with only few discrepancies involving mostly short beta strands (4, 5, 18, and 21) and one short helix (P) exposed at the external surface of the archaeal thermosome complex. Secondary-structure predictions for mammal CCT8L and for vertebrate MKKS, BBS10 or BBS12 sequences were also largely consistent with the secondary-structure description of thermosome proteins. In the equatorial domain, CCT8L and BBS structure predictions corresponded to the mostly alpha-helical composition of this region. Variations were more obvious in BBS12 and involved mostly terminal elements of helices (most notably helices P and Q) and exposed beta-strands (strands 19-21). In the intermediate domain the core helical-bundle elements (helices F, G, and K) as well as the extensive beta-sheet composition of this region were predicted in all BBS and CCT8L proteins. Exceptions were, in all sequences, the two short strands 5 and 6, which are part of an external elongated loop in the thermosome structure, and, in BBS12, the N-terminal part of helix K, which in the thermosome protrudes towards the central cavity covering the ATP hydrolysis site (Figure 6b). The apical domain is formed in the thermosome by a 4-strand anti-parallel beta-sheet (strands 9, 10, 15, and 16) with strand 10 extending into a second parallel beta-sheet (strands 10, 12, 13, and 14). The two sheets are flanked by a helix (J) and are surmounted by a structure composed of two contacting helices (H and I) and an extended loop including strand 11. All helices and most strands of the apical domain were recognized in BBS sequences. Most obvious differences were observed in BBS12 proteins, where the long apical helix H was predicted to be shortened, and in CCT8L, where helix I and strand 11 were not predicted.

Differentiation of monomer-monomer interaction regions in BBS and CCT8L proteins

To investigate the potential of CCT8L and BBS proteins to establish intra-ring and inter-ring monomer-monomer contacts, we investigated the relative conservation of predicted contact positions in CCT, BBS and CCT8L sequences. We identified potential contact positions in these families based on homology to the positions involved in inter-monomer contacts in the crystal structure of the T. acidophilum thermosome complex (PDB code 1a6d). After identifying all contact positions in CCT monomers, we distinguished among them those that conserved similar amino acid types across the nine monomers. We counted how many amino acid types observed in all or in conserved contact positions of CCT monomers were also observed in the T. acidophilum ThsA sequence, in human CCT8Ls or in human BBS sequences (Table 5). A complete list of all and conserved positions considered and of the residue types observed in these positions in all sequences can be found in additional file 21: Table S6. ThsA and CCT subunits conserve 89% similarity in monomer-monomer contact positions, which is substantially higher than the average similarity (62%-66%) of all homologous positions between the two families. The higher similarity of monomer-monomer contact regions is consistent with functional conservation of these positions between the two families. In contrast, the high rate of differentiation, in comparison to global average differentiation, shown in putative monomer-monomer contact positions in BBS or CCT8L sequences (Table 5) suggests a loss of capability to associate into a typical CCT-like oligomeric complex. This result is consistent with the presence in BBS proteins of inserted elements (Figure 6) that would interfere with formation of the complex [22,23].

Conservation of ATP-binding and hydrolysis residues in BBS and CCT8L proteins

We compared conservation in CCT, BBS and CCT8L sequences of the ATP-binding and ATP-hydrolysis motifs typical of chaperonins of Group II (Figure 7).
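
Figure 7 presents these motifs as profile logos, in which the height of each stack reflects the information content of the position. As a rough sketch of that quantity, the code below assumes the common definition for a 20-letter amino acid alphabet (log2(20) minus the Shannon entropy of the column) with no small-sample correction. The alignment columns are made up for illustration and are not taken from the paper's alignments, and the function names are hypothetical.

```python
import math
from collections import Counter

MAX_BITS = math.log2(20)  # maximum information for a 20-letter alphabet


def column_information(column: str) -> float:
    """Information content (bits) of one alignment column, gaps ignored."""
    residues = [r for r in column if r != "-"]
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return MAX_BITS - entropy


def letter_heights(column: str) -> dict:
    """Height of each letter: its frequency times the column information."""
    residues = [r for r in column if r != "-"]
    info = column_information(column)
    return {aa: info * n / len(residues) for aa, n in Counter(residues).items()}


# Hypothetical columns: one fully conserved, one split between two residues.
print(round(column_information("GGGGGGGG"), 2))
print(round(column_information("DDDDEEEE"), 2))
```

A fully conserved column reaches the maximum of log2(20) bits, and a column split evenly between two residues loses exactly one bit, which is why invariant positions such as the Gly-Pro dipeptide stand out in a logo.
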

Table 5 Conservation of monomer-monomer contact residues relative to CCT subunits^1

Protein | MM | CMM | RR | Global^2
ThsA | 78 (83.9) | 16 (94.1) | 13 (86.7) | 62.0-66.4
BBS12 | 37 (39.8) | 7 (41.2) | 6 (40.0) | 35.5-38.0
BBS10 | 45 (48.4) | 8 (47.1) | 7 (46.7) | 34.3-35.6
MKKS | 42 (45.2) | 8 (47.1) | 7 (46.7) | 48.8-51.6
CCT8L | 54 (58.1) | 8 (47.1) | 7 (46.7) | 53.4-61.1

^1 Conservation of archaeal ThsA and human BBS and CCT8L sequences relative to human CCT monomers. Sequence-positions are considered conserved if they are occupied by residue-types appearing in the homologous position in any of the human CCT sequences. Ninety-three intra-ring contact positions and 15 inter-ring contact positions were identified from the thermosome structure (1a6d). Contact positions were defined by a distance of their side-chain heavy atoms of at most 4.0 Å from any heavy atom of the nearby monomer in the thermosome structure. For each protein family, the table indicates the number and percentage (in parenthesis) of positions conserved among: all 93 intra-ring contact positions (MM); seventeen intra-ring contact positions conserved among human CCT monomers (CMM); all 15 inter-ring contact positions, none of which were conserved among CCT monomers (RR). ^2 Global indicates the range of similarities (percent values) of each sequence to human CCT-subunit proteins within all aligned positions.
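
The conservation rule in footnote 1 of Table 5 can be written out explicitly. In this sketch a position counts as conserved when the query residue appears at the homologous position in any of the human CCT sequences. The toy alignment columns and the function names are hypothetical and only illustrate the counting, not the actual contact positions.

```python
def count_conserved(query: str, cct_columns: list) -> int:
    """Count positions where the query residue occurs in any CCT sequence.

    query       -- residues of the query protein at the contact positions
    cct_columns -- for each position, the residues seen in the CCT monomers
    """
    return sum(1 for res, col in zip(query, cct_columns) if res in col)


def percent(conserved: int, total: int) -> float:
    """Percentage of conserved positions, rounded to one decimal as in Table 5."""
    return round(100.0 * conserved / total, 1)


# Hypothetical example: four contact positions and the residues observed
# at each of them across the CCT monomers.
cct_columns = ["DE", "KR", "AILV", "G"]
print(count_conserved("DKWG", cct_columns))  # D, K and G match; W does not

# The percentages reported for ThsA follow from the raw counts in Table 5.
print(percent(78, 93), percent(16, 17), percent(13, 15))
```

Applied to the counts in Table 5, the same percentage step returns 83.9, 94.1 and 86.7 for ThsA over the 93 MM, 17 CMM and 15 RR positions.
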

Although there is considerable variation among BBS and CCT8L sequences at some of the ATP-binding positions, we observed complete conservation of the crucial ATP-binding dipeptide Gly-Pro, suggesting that these otherwise divergent proteins conserve ATP-binding ability. In the ATP-hydrolysis sites, substantial loss of conservation has been reported in MKKS [27] and in BBS12 [23]. In the CCT8L, MKKS and BBS10 families, unusual substitutions are observed in phosphate-binding positions and within the catalytic triad, where only Asp is conserved in MKKS. The effect that these mutations may have on the hydrolytic activity in these protein families is unclear. The high level of differentiation of this region in BBS12 (where the ATP-hydrolysis motif is not recognizable) strongly suggests that BBS12 has lost hydrolytic activity.

Conservation of substrate-binding positions

Three positions crucial in determining substrate-specificity of CCT monomers have been identified in the distal region of helix I in the apical domain [37]. We analyzed conservation at these positions across vertebrate species in all Group II chaperonin families and in the Fab1_TCP domain across vertebrate orthologs of the PIKFYVE protein kinase (Table 6). These positions are strikingly conserved within each CCT monomer type (with the exception of CCT6B) across species and are characteristically different between monomer types.
They are mostly conserved also in the Fab1_TCP domain across vertebrate sequences. In contrast, in BBS and, particularly, in CCT8L sequences, the homologous positions are significantly more differentiated.

Figure 7 Profile logos of ATP-binding and ATP-hydrolysis sites in chaperonin proteins. Sequence profiles of ATP/ADP-binding and ATP-hydrolysis sites for CCTs, CCT8L and BBS (MKKS, BBS10 and BBS12) proteins from the multiple sequence alignments of sequences obtained from the species listed in the legend for Figure 2. Letters indicate the amino acid types observed at each position. The height of each stack of symbols in each position is proportional to the information content at that position and the height of each letter within the stack is proportional to the frequency of the corresponding residue at that position. Residues involved in direct contacts with base, ribose or phosphate groups, as determined by homology to the known thermosome structures, are indicated.

Table 6 Conservation of potential substrate-binding residue positions^1

Family | i^2 | i+1^2 | i+4^2 | Description
CCT1 | K | Y | DE | Lys/Tyr/Acidic
CCT2 | Q | L | A(GQ)^3 | Gln/Leu/Ala
CCT3 | H | Y | KR | His/Tyr/Basic
CCT4 | H | F | K | His/Phe/Lys
CCT5 | H | L | Q | His/Leu/Gln
CCT6A | D | A | K | Asp/Ala/Lys
CCT6B | DE | AILMSV | K(R) | Acidic/Medium-Small/Lys
CCT7 | Q | Y | D(Y) | Gln/Tyr/Asp
CCT8 | H | Y | K | His/Tyr/Lys
CCT8L | DILPT | HLQR | KNRY | Variable/Variable/Polar-Basic
MKKS | Q(H) | FY(H) | DEMQST | Gln/Aromatic/Medium-Small
BBS10 | Y(AFQS) | CLY(W) | LMQV | Tyr/Variable/Variable
BBS12 | E(KLQ) | KR(HQ) | HNR(ASD) | Glu/Basic/Polar-Basic
Fab1_TCP^4 | D(EN) | I(LMV) | Q | Asp/Ile/Gln

^1 Conservation evaluated among sequences in vertebrate genomes. ^2 Potential substrate binding positions, corresponding to yeast CCT1 positions 308, 309 and 312 (i = 308) [37]. ^3 Rare substitutions are listed in parenthesis. ^4 Fab1_TCP domain of vertebrate PIKFYVE orthologs.

Discussion

We identified the full complement of chaperonin hsp60 genes and pseudogenes encoded in the human genome and, for comparison, in the genomes of the model organisms mouse and rat. We delimited the set of hsp60 genes encoded in the human genome to: a) nine canonical cct genes (CCT1 to CCT8 including CCT6A and CCT6B) involved in formation of the CCT complex; b) the cpn60 gene (HSPD1) of mitochondrial origin; c) the three highly diverged hsp60-like BBS genes MKKS, BBS10 and BBS12; and d) a newly characterized class of genes, CCT8L, represented in human by CCT8L1 and CCT8L2. We also identified a plethora of pseudogene sequences, many of which had not been previously
reported. The comparative analyses of these families of functional genes and of their pseudogenes revealed their evolutionary history and relationships.

In contrast to the uncertainty of the duplication pattern of canonical CCT subunits (our results and [38,39]) the origin of Hsp60-like BBS and CCT8L proteins was unambiguously identified by phylogenetic tree reconstructions. Our analyses indicated that hsp60-like BBS genes originated monophyletically from a gene duplication event in the CCT8 gene lineage. In addition, we determined that the CCT8L family also originated in the CCT8 lineage, from a more recent retrotransposition event. The presence of this gene family in placental mammals, marsupials and monotremes but not in reptiles/birds or other vertebrate species, indicates that this family originated at the onset of mammal evolution, before divergence of Theria and Prototheria. Presence of two highly similar CCT8L genes (CCT8L1 and CCT8L2) in the genomes of human and chimp and of a single copy in other mammal genomes, including rhesus monkey, suggests that the duplication of this gene occurred in the ape lineage (Hominoidea) after its divergence from the old-world monkeys (Cercopithecidae).

Multiple lines of evidence gathered in this work indicate that CCT8L sequences (and at least one of the two paralogs in Hominoidea) are functional genes: (i) reduced rates of non-synonymous mutation were estimated along their lineages, as expected for functionally-constrained protein-coding genes; (ii) pseudogenes as ancient or more recent than the CCT8L genes were heavily degenerated and no pseudogenes pre-dating mammal evolution could be identified. In contrast, although CCT8L sequences originated early in mammal evolution, they did not show signs of degeneration (with the exception of the chimp CCT8L1 ortholog); (iii) multiple EST and microarray data have been collected for CCT8L2, mostly from testis, and one EST for CCT8L1 has been reported from placental tissue (as per the UniGene EST and GEO expression data, November 23, 2009). These features taken together are strong evidence that at least CCT8L2 in Hominoidea and the lone CCT8L gene in other mammal lineages encode functional proteins. The sparse expression of CCT8L1 in human and the presence of one in-frame stop codon and one frame-shift in its orthologous sequence from chimp raise doubts about the functionality of this sequence.
Numeroussequencesassociatedwith cct or cpn60 genesfoundinthehuman,mouseorratgenomeswere classifiedaspseudogenesbasedonthepresenceofinternalstopcodons,frame-shiftsandnon-significantdifferenceinsynonymousandnon-synonymousmutation rates.Amongthem,thesequencesHSPD1-5Pand HSPD1-6PappeartobeexpressedbasedonEST analysis(seeadditionalfile14:TableS5)andmayrepresentinstancesofexpressedpseudogenes[40].Ageneral explosionofpseudogenegenerationinthehumanand muridlineagesaftertheyseparatedfromthecarnivore lineagehasbeenreported[41].Ouranalysisofchaperoninpseudogenesisconsistentwiththisobservation, althoughtheirrelativelyhig hrateofdegenerationsuggeststhatpseudogenesgeneratedbeforetheoriginof mammalsmayhavedegradedbeyondrecognition.The intenseduplicationofchaperoninsequenceswitnessed bythemanypseudogenesidentifiedinthehumanand muridgenomes,verylikelyp rovidedopportunitiesfor multipleparalogy,resultingintheproliferationofchaperoninclassesinthevertebrateandmammallineages. AlthoughtheHsp60-likeBBSandCCT8Lprotein familieshaveconsiderablydifferentiatedfromthecanonicalCCTsubunitsandwithinthemselves,ouranalyses indicatedthattheystillconservetheoverallthreedomainstructuretypicalofCCTproteins.Structureand sequencevariationspredictedfortheirapicaldomains mayreflectdistinctivesubstratespecificities.Inparticular,lackofconservationatpositionscrucialinproviding substrate-specificitytoCCTmonomers[37]suggests thatBBSandCCT8Lproteinsmayinteractwiththeir substrate(s)indifferentregionsascomparedwiththe canonicalCCTsubunits.Sequencedifferentiationpatternsandacquisitionofinsertedelementsincorrespondencetopotentialmonomer-monomercontactregions suggestedthatBBSandCCT8LproteinsdonotassembleinaCCT-likecomplex.Thispredictionissupported byexperimentalevidenceshowingthatMKKSlocalizes asafreemonomeratthepericentriolarmaterialofcentrosomes[27].Inthisrespect,itisalsointerestingto observethatamongBBSandCCT8Lsequencesthe ATP-hydrolysismotif “ Gly-Asp-Gly-Thr ” ,remarkably 
conserved among canonical chaperonins [42], has differentiated in MKKS and in BBS12 [23,27]. This condition may indicate that these families have lost the hydrolytic activity necessary for the functionality of the chaperonin complex [43-52]. It has been shown for the archaeal thermosome complex that mutation of the ATP-hydrolysis-motif Asp residue prevents hydrolysis and productive protein folding [49] and that some CCT subunits, among which CCT8, dissociate in vitro from the complex in conditions that prevent hydrolysis of ATP [53].

Functionalities independent from formation of the complex have also been reported for canonical CCT subunits. TCP1 monomers not in complex confer enhanced salt tolerance in plants [54]. Individual CCT subunits have been reported to associate in vitro with cytoskeleton structures, selectively binding to microtubule filaments [55] or to actin polymerizing filaments [56]. The localization of Hsp60-like BBS proteins at the cilium basal body and at the centrosome [26-28]
suggests that they may also interact and associate with, for example, cytoskeleton structures in promoting the correct development of cilia [28,57]. The multiple lines of structural and experimental evidence that BBS and CCT8L proteins do not form a canonical CCT-like complex provide a strong indication that eukaryotic Group II chaperonin-protein functionalities extend beyond those of the typical oligomeric complex.

Conclusions
Chaperonin proteins are key players in ensuring and preserving cell and organism functionality under normal and stressful conditions and their biological and medical importance is undeniable. The recent discovery of hsp60 genes directly implicated in specific pathological conditions, the chaperonopathies, extends our understanding of the roles of chaperonin proteins in cellular processes and enhances awareness of their importance in pathology [18-20]. Here, we have provided a comprehensive, unifying framework encompassing all members of the extended hsp60 family of genes and pseudogenes. This unifying framework contributes to our understanding of the evolutionary history of the extended hsp60 family and widens our perspectives on the multiple roles that chaperonin proteins have acquired in vertebrates. Our findings highlight how differentiation of the chaperonin protein family in mammals has been facilitated by intense processes of gene duplication. The roles, mechanisms of action, and involvement in pathogenesis of individual chaperonin molecules beyond those typical of their canonical oligomeric complexes constitute aspects of chaperonin physiology particularly promising for future experimental testing.

Methods
Identification of chaperonin genes in eukaryotic genomes
Searches of genes for Hsp60-like proteins were exhaustively performed using TBLASTN [58] at Ensembl [34] and BLAT [59] at UCSC [60] on the genome sequences of human (NCBI Assembly 36, Genebuild Ensembl Dec 2006), mouse (NCBI Assembly m37, Genebuild Ensembl Apr 2007) and rat (Assembly RGSC 3.4, Genebuild Ensembl Feb 2006). We used the nine canonical human CCT proteins and the Cpn60 protein (mitochondrial Hsp60) as queries. We recursively queried the genomes with the sequences recovered from previous searches
until no other Hsp60 sequences were detected. We used both search engines also to recover the full list of annotated hsp60-like genes in several other mammal genomes and in chicken. Sequences from frog (Xenopus sp.) were retrieved from the NCBI nr (non-redundant) database using PSI-BLAST [61] with Cpn60 and the individual CCT subunits as queries. To recover complete hsp60 gene and pseudogene sequences, after the TBLASTN searches the genomic sequences from approximately 2,000 nt upstream to 2,000 nt downstream of the hit-regions were excised and the hsp60 sequences were extracted using the homology-based gene prediction method implemented in FGENESH+ [62] at the Softberry website [63]. For pseudogenes, when FGENESH+ failed to recognize the complete sequence due to in-frame stop codons or frameshifts in the sequence, the coding region was manually reconstructed, aligning the three-frame-translations of the genomic sequence to the query sequence with the multiple protein alignment program ITERALIGN [64]. The Pseudogene.org [33,65] database and Ensembl [34], Entrez [30] and HUGO [66] annotations were consulted for the presence of annotated human pseudogenes, as recorded in our tables of results.

Multiple sequence alignment and secondary structure prediction
Multiple sequence alignments were obtained using MUSCLE [67], which in previous analyses [68,69] performed well when aligning divergent sequences. Alignments were manually adjusted as needed. Predictions of secondary structure for each protein family were performed from their multiple alignment using the Jnet algorithm as implemented in the JPRED-3 secondary structure prediction server [70,71].

Evolutionary tree reconstructions
To infer phylogenetic relationships, evolutionary trees were obtained using the maximum-likelihood (ML) tree-building procedure implemented in PHYML [72] using the default JTT substitution model and 100 bootstrap resampling replicates (each ML tree reconstruction being quite time consuming). Selected trees were compared with those obtained with the Bayesian approach implemented in MrBayes 3.1 [73] using the WAG substitution model and 10,000 iterations for the MCMC process. Conditional probabilities were estimated sampling the MCMC process every 10 iterations after 2,500
burn-in iterations (sample size 750).

Estimates of evolutionary divergence of sequence families
We obtained rates of divergence among families of sequences using a newly developed estimator, called "B-index". The B-index is an unbiased estimator of the average divergence of a family of sequences from its last common ancestor (root) that takes into consideration the correlations among sequences determined by their phylogenetic tree. Briefly, given a rooted tree, a terminal branch of length $d_i$ of the original tree is considered a "cluster" of size $w_i = 1$ and length $d = d_i$. Each fork structure comprising two terminal branches (clusters) of lengths $d_1$ and $d_2$ and sizes $w_1$ and $w_2$ bifurcating from a stem-branch of length $d_s$ is considered in turn. The average length $d$ of each fork-structure is computed as $d = (d_1 + d_2)/2 + d_s$ and the average size $w$ of the structure is defined as $w = [2(d_1 + d_2)/2 + 1 \cdot d_s] / [(d_1 + d_2)/2 + d_s] = (d_1 + d_2 + d_s)/d$. Each fork-structure is progressively replaced by a corresponding cluster of length $d$ and size $w$. The procedure is repeated merging bifurcating clusters of lengths $d_1$ and $d_2$ and sizes $w_1$ and $w_2$ connected to a stem-branch of length $d_s$ into a larger cluster of average length $d = (w_1 d_1 + w_2 d_2)/(w_1 + w_2) + d_s$ and average size $w = (d_1 w_1 + d_2 w_2 + d_s)/d$, until the tree is reduced to two clusters connected to the root ($d_s = 0$). The global average differentiation $D$ ("B-index") and size $W$ can finally be computed as $D = (w_1 d_1 + w_2 d_2)/(w_1 + w_2)$ and $W = w_1 + w_2$. It can be shown that $DW = L$ is the length of the tree (sum of all branch lengths). If two sequence families $A$ and $B$ are sampled from the same set of species and $W_A = W_B$, then $D_B / D_A = L_B / L_A$ and the relative rate of differentiation of the two families of sequences can be estimated by the ratio of their tree lengths. The B-index has several advantages compared to the most commonly used average pair-wise sequence-similarity measure: (i) it takes into account the correlation among sequences imposed by the topology of the evolutionary tree; (ii) in contrast to average pairwise similarity, its expectations are invariant over the number and phylogenetic relations of sequences sampled from a cluster with the same common ancestor and evolutionary model; and (iii) with the B-index, the average differentiation rate of a protein family relative to a reference family sharing the same evolutionary relations (e.g., sampled from the same set of species) is simply estimated by the ratio of the lengths of the evolutionary trees of the two families.

Estimates of ratios of non-synonymous vs. synonymous mutation rate (Ka/Ks)
Classification of hsp60 sequences as functional genes or pseudogenes was supported by the absence or presence of in-frame stop codons and frame-shifts, and by estimating non-synonymous vs.
synonymous mutation-rate ratios (Ka/Ks) along relevant branches of evolutionary trees. Estimates were obtained using the maximum-likelihood branch-specific model implemented in PAML4 [74]. In the case of pseudogenes, Ka/Ks values are expected not to significantly differ from 1 (absence of positive or negative selection at the protein level) whereas protein-coding genes, whose evolution is dominated by negative or positive selection, are expected to be characterized, respectively, by Ka/Ks < 1 or Ka/Ks > 1. Briefly, we applied the PAML4 "branch-specific model" creating an evolutionary tree including the sequences whose evolutionary lineage was tested, the appropriate sister sequence (in the case of pseudogenes, the gene sequence from whose lineage the pseudogene originated) and an out-group sequence. The tree branch(es) to be tested are designated as "foreground" and other branches as "background." Using the branch-specific model the Ka/Ks ratio is estimated for the foreground branch(es) and an analogous ratio is estimated for the background branches. The likelihood $L_1$ generated using this evolutionary model is compared to the likelihood $L_0$ of a null model where Ka/Ks for foreground branches is fixed to 1.0. In the Log-likelihood Ratio Test (LRT) the significance of the likelihood differences between the model with free estimate of Ka/Ks and the null model is estimated by the quantity $2 \ln(L_1/L_0)$, which approximates a $\chi^2$ distribution.

Data availability
All relevant gene and pseudogene information, including start and end positions, chromosomal location, strand, number of exons, GenBank accession number for functional genes, and Ensembl or Pseudogene.org ID for pseudogenes, can be found in additional file 22: Table S7. Newly annotated sequences have been approved and deposited in the Human Genome Organization (HUGO) database [66].

Additional file 1: Table S1. Mouse hsp60 genes and pseudogenes. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S1.DOC]
Additional file 2: Table S2. The rat hsp60 genes and pseudogenes.
[http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S2.DOC]
Additional file 3: Figure S1. Phylogenetic tree of human CCT1-8 and CCT8L proteins. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S3.PDF]
Additional file 4: Figure S2. Phylogenetic tree of human CCT1-8 and MKKS proteins. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S4.PDF]
Additional file 5: Figure S3. Phylogenetic tree of human CCT1-8 and BBS10 proteins. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S5.PDF]
Additional file 6: Figure S4. Phylogenetic tree of human CCT1-8 and BBS12 proteins. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S6.PDF]
Additional file 7: Figure S5. Phylogenetic tree of vertebrate CCT1-8, MKKS, BBS10, BBS12 and CCT8L proteins. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S7.PDF]
Additional file 8: Figure S6. Phylogenetic trees of CCT8L protein sequences from primates (a, b) and partial alignment showing a divergent region in the sequence from rhesus monkey (c). [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S8.PDF]
Additional file 9: Table S3. Codon-base specific counts of mutation events along human and chimp CCT8L evolutionary branches. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S9.DOC]
Additional file 10: Table S4. Expression pattern (EST counts) of the human CCT and BBS genes from the UniGene database. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S10.DOC]
Additional file 11: Figure S7. Evolutionary tree of vertebrate CCT1-8 and CCT8L proteins including associated human pseudogenes. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S11.PDF]
Additional file 12: Figure S8. Evolutionary trees of individual CCT1, CCT3 and CCT4 proteins from vertebrates including associated human pseudogenes. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S12.PDF]
Additional file 13: Figure S9. Evolutionary trees of individual CCT5, CCT7 and CCT8 proteins from vertebrates including associated human pseudogenes. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S13.PDF]
Additional file 14: Table S5. Expression pattern of the human cpn60 gene (HSPD1) and pseudogenes from the UniGene database. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S14.DOC]
Additional file 15: Figure S10. Alignment and secondary-structure prediction of archaeal thermosome sequences. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S15.PDF]
Additional file 16: Table S11. Alignment and secondary-structure prediction of human CCT1-8 protein sequences.
[http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S16.PDF]
Additional file 17: Table S12. Alignment and secondary-structure prediction of vertebrate CCT8L protein sequences. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S17.PDF]
Additional file 18: Table S13. Alignment and secondary-structure prediction of vertebrate MKKS protein sequences. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S18.PDF]
Additional file 19: Table S14. Alignment and secondary-structure prediction of vertebrate BBS10 protein sequences. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S19.PDF]
Additional file 20: Table S15. Alignment and secondary-structure prediction of vertebrate BBS12 protein sequences. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S20.PDF]
Additional file 22: Table S7. Database and sequence information on all hsp60-like sequences identified in the human, mouse and rat genomes. [http://www.biomedcentral.com/content/supplementary/1471-2148-1064-S22.PDF]

Abbreviations
BBS: Bardet-Biedl Syndrome; CCT: Chaperonin Containing TCP1; ML: Maximum-Likelihood; MKKS: McKusick-Kaufman Syndrome; TRiC: TCP1 Ring Complex.

Acknowledgements
The authors thank an anonymous reviewer for providing valuable information. AJLM and ECdeM thank Wesley Harlow for his help in the initial stages of this work and the San Francisco Foundation for support. LB and KM thank Mr. Steve Oden and Ms. Shaina R. Wallach for critical proofreading of the manuscript. LB thanks the University of Florida Genetics Institute for financial support.

Author details
1 Department of Molecular Genetics and Microbiology, University of Florida, College of Medicine, 1660 SW Archer Road, Gainesville, FL 32610, USA. 2 Genetics Institute, University of Florida, Cancer and Genetics Research Complex, 2033 Mowry Road, Gainesville, FL 32610, USA. 3 University of Maryland, Columbus Center, 701 East Pratt Street, Baltimore, MD 21202, USA.
Authors' contributions
KM participated in research and methodological approach design, carried out all searches and most data analyses, wrote drafts of the manuscript and participated in its refinement, compiled all tables and produced most figures; ECdeM and AJLM envisioned the research project, started data collection and participated in research design and in manuscript preparation; LB participated in research design and methodological approach, produced differentiation and mutation-accumulation estimates and analyses and participated in writing the manuscript. All authors read and approved the final manuscript.

Received: 13 August 2009 Accepted: 1 March 2010 Published: 1 March 2010

References
1. Hartl FU, Hayer-Hartl M: Molecular chaperones in the cytosol: from nascent chain to folded protein. Science 2002, 295:1852-1858.
2. Frydman J: Folding of newly translated proteins in vivo: the role of molecular chaperones. Annu Rev Biochem 2001, 70:603-647.
3. Sigler PB, Xu Z, Rye HS, Burston SG, Fenton WA, Horwich AL: Structure and function in GroEL-mediated protein folding. Annu Rev Biochem 1998, 67:581-608.
4. Bukau B, Horwich AL: The Hsp70 and Hsp60 chaperone machines. Cell 1998, 92:351-366.
5. Hemmingsen SM, Woolford C, van der Vies SM, Tilly K, Dennis DT, Georgopoulos CP, Hendrix RW, Ellis RJ: Homologous plant and bacterial proteins chaperone oligomeric protein assembly. Nature 1988, 333:330-334.
6. Trent JD, Nimmesgern E, Wall JS, Hartl FU, Horwich AL: A molecular chaperone from a thermophilic archaebacterium is related to the eukaryotic protein t-complex polypeptide-1. Nature 1991, 354:490-493.
7. Kubota H, Hynes G, Willison K: The chaperonin containing t-complex polypeptide 1 (TCP-1). Multisubunit machinery assisting in protein folding and assembly in the eukaryotic cytosol. Eur J Biochem 1995, 230:3-16.
8. Macario AJL, Malz M, Conway de Macario E: Evolution of assisted protein folding: the distribution of the main chaperoning systems within the phylogenetic domain archaea. Front Biosci 2004, 9:1318-1332.
9. Carrascosa JL, Llorca O, Valpuesta JM: Structural comparison of prokaryotic and eukaryotic chaperonins. Micron 2001, 32:43-50.
10. Large AT, Lund PA: Archaeal chaperonins. Front Biosci 2009, 14:1304-1324.
11. Ranson NA, Clare DK, Farr GW, Houldershaw D, Horwich AL, Saibil HR: Allosteric signaling of ATP hydrolysis in GroEL-GroES complexes. Nat Struct Mol Biol 2006, 13:147-152.
12. Ranson NA, Dunster NJ, Burston SG, Clarke AR: Chaperonins can catalyse the reversal of early aggregation steps when a protein misfolds. J Mol Biol 1995, 250:581-586.
13. Ranson NA, White HE, Saibil HR: Chaperonins. Biochem J 1998, 333(Pt 2):233-242.
14. Levy-Rimler G, Bell RE, Ben-Tal N, Azem A: Type I chaperonins: not all are created equal. FEBS Lett 2002, 529:1-5.
15. Frydman J, Nimmesgern E, Erdjument-Bromage H, Wall JS, Tempst P, Hartl FU: Function in protein folding of TRiC, a cytosolic ring complex containing TCP-1 and structurally related subunits. EMBO J 1992, 11:4767-4778.
16. Kubota H, Hynes G, Carne A, Ashworth A, Willison K: Identification of six Tcp-1-related genes encoding divergent subunits of the TCP-1-containing chaperonin. Curr Biol 1994, 4:89-99.
17. Stoldt V, Rademacher F, Kehren V, Ernst JF, Pearce DA, Sherman F: Review: the Cct eukaryotic chaperonin subunits of Saccharomyces cerevisiae and other yeasts. Yeast 1996, 12:523-529.
18. Cappello F, Conway de Macario E, Marasa L, Zummo G, Macario AJL: Hsp60 expression, new locations, functions and perspectives for cancer diagnosis and therapy. Cancer Biol Ther 2008, 7:801-809.
19. Macario AJL, Conway de Macario E: Chaperonopathies by defect, excess, or mistake. Ann N Y Acad Sci 2007, 1113:178-191.
20. Macario AJL, Conway de Macario E: Sick chaperones, cellular stress, and disease. N Engl J Med 2005, 353:1489-1501.
21. Stone DL, Slavotinek A, Bouffard GG, Banerjee-Basu S, Baxevanis AD, Barr M, Biesecker LG: Mutation of a gene encoding a putative chaperonin causes McKusick-Kaufman syndrome. Nat Genet 2000, 25:79-82.
22. Stoetzel C, Laurier V, Davis EE, Muller J, Rix S, Badano JL, Leitch CC, Salem N, Chouery E, Corbani S, et al: BBS10 encodes a vertebrate-specific chaperonin-like protein and is a major BBS locus. Nat Genet 2006, 38:521-524.
23. Stoetzel C, Muller J, Laurier V, Davis EE, Zaghloul NA, Vicaire S, Jacquelin C, Plewniak F, Leitch CC, Sarda P, et al: Identification of a novel BBS gene (BBS12) highlights the major role of a vertebrate-specific branch of chaperonin-related proteins in Bardet-Biedl syndrome. Am J Hum Genet 2007, 80:1-11.
24. Katsanis N, Beales PL, Woods MO, Lewis RA, Green JS, Parfrey PS, Ansley SJ, Davidson WS, Lupski JR: Mutations in MKKS cause obesity, retinal dystrophy and renal malformations associated with Bardet-Biedl syndrome. Nat Genet 2000, 26:67-70.
25. Blacque OE, Leroux MR: Bardet-Biedl syndrome: an emerging pathomechanism of intracellular transport. Cell Mol Life Sci 2006, 63:2145-2161.
26. Hirayama S, Yamazaki Y, Kitamura A, Oda Y, Morito D, Okawa K, Kimura H, Cyr DM, Kubota H, Nagata K: MKKS is a centrosome-shuttling protein degraded by disease-causing mutations via CHIP-mediated ubiquitination. Mol Biol Cell 2008, 19:899-911.
27. Kim JC, Ou YY, Badano JL, Esmail MA, Leitch CC, Fiedrich E, Beales PL, Archibald JM, Katsanis N, Rattner JB, et al: MKKS/BBS6, a divergent chaperonin-like protein linked to the obesity disorder Bardet-Biedl syndrome, is a novel centrosomal component required for cytokinesis. J Cell Sci 2005, 118:1007-1020.
28. Marion V, Stoetzel C, Schlicht D, Messaddeq N, Koch M, Flori E, Danse JM, Mandel JL, Dollfus H: Transient ciliogenesis involving Bardet-Biedl syndrome proteins is a fundamental characteristic of adipogenic differentiation. Proc Natl Acad Sci USA 2009, 106:1820-1825.
29. Brocchieri L, Conway de Macario E, Macario AJL: Chaperonomics, a new tool to study ageing and associated diseases. Mech Ageing Dev 2007, 128:125-136.
30. Entrez Gene. [http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene].
31. Shisheva A, Sbrissa D, Ikonomov O: Cloning, characterization, and expression of a novel Zn2+-binding FYVE finger-containing phosphoinositide kinase in insulin-sensitive cells. Mol Cell Biol 1999, 19:623-634.
32. Li S, Tiab L, Jiao X, Munier FL, Zografos L, Frueh BE, Sergeev Y, Smith J, Rubin B, Meallet MA, et al: Mutations in PIP5K3 are associated with Francois-Neetens mouchetee fleck corneal dystrophy. Am J Hum Genet 2005, 77:54-63.
33. Karro JE, Yan Y, Zheng D, Zhang Z, Carriero N, Cayting P, Harrison P, Gerstein M: Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation. Nucleic Acids Res 2007, 35:D55-60.
34. Ensembl. [http://www.ensembl.org/index.html].
35. Springer MS, Stanhope MJ, Madsen O, de Jong WW: Molecules consolidate the placental mammal tree. Trends Ecol Evol 2004, 19:430-438.
36. Ditzel L, Lowe J, Stock D, Stetter KO, Huber H, Huber R, Steinbacher S: Crystal structure of the thermosome, the archaeal chaperonin and homolog of CCT. Cell 1998, 93:125-138.
37. Spiess C, Miller EJ, McClellan AJ, Frydman J: Identification of the TRiC/CCT substrate binding sites uncovers the function of subunit diversity in eukaryotic chaperonins. Mol Cell 2006, 24:25-37.
38. Fares MA, Wolfe KH: Positive selection and subfunctionalization of duplicated CCT chaperonin subunits. Mol Biol Evol 2003, 20:1588-1597.
39. Archibald JM, Logsdon JM Jr, Doolittle WF: Origin and evolution of eukaryotic chaperonins: phylogenetic evidence for ancient duplications in CCT genes. Mol Biol Evol 2000, 17:1456-1466.
40. Harrison PM, Zheng D, Zhang Z, Carriero N, Gerstein M: Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability. Nucleic Acids Res 2005, 33:2374-2383.
41. Yu Z, Morais D, Ivanga M, Harrison PM: Analysis of the role of retrotransposition in gene evolution in vertebrates. BMC Bioinformatics 2007, 8:308.
42. Brocchieri L, Karlin S: Conservation among HSP60 sequences in relation to structure, function, and evolution. Protein Sci 2000, 9:476-486.
43. Bigotti MG, Bellamy SR, Clarke AR: The asymmetric ATPase cycle of the thermosome: elucidation of the binding, hydrolysis and product-release steps. J Mol Biol 2006, 362:835-843.
44. Bigotti MG, Clarke AR: Cooperativity in the thermosome. J Mol Biol 2005, 348:13-26.
45. Cliff MJ, Kad NM, Hay N, Lund PA, Webb MR, Burston SG, Clarke AR: A kinetic analysis of the nucleotide-induced allosteric transitions of GroEL. J Mol Biol 1999, 293:667-684.
46. Jackson GS, Staniforth RA, Halsall DJ, Atkinson T, Holbrook JJ, Clarke AR, Burston SG: Binding and hydrolysis of nucleotides in the chaperonin catalytic cycle: implications for the mechanism of assisted protein folding. Biochemistry 1993, 32:2554-2563.
47. Kafri G, Horovitz A: Transient kinetic analysis of ATP-induced allosteric transitions in the eukaryotic chaperonin containing TCP-1. J Mol Biol 2003, 326:981-987.
48. Kafri G, Willison KR, Horovitz A: Nested allosteric interactions in the cytoplasmic chaperonin containing TCP-1. Protein Sci 2001, 10:445-449.
49. Kanzaki T, Iizuka R, Takahashi K, Maki K, Masuda R, Sahlan M, Yebenes H, Valpuesta JM, Oka T, Furutani M, et al: Sequential action of ATP-dependent subunit conformational change and interaction between helical protrusions in the closure of the built-in lid of group II chaperonins. J Biol Chem 2008, 283:34773-34784.
50. Staniforth RA, Burston SG, Atkinson T, Clarke AR: Affinity of chaperonin-60 for a protein substrate and its modulation by nucleotides and chaperonin-10. Biochem J 1994, 300(Pt 3):651-658.
51. Todd MJ, Viitanen PV, Lorimer GH: Dynamics of the chaperonin ATPase cycle: implications for facilitated protein folding. Science 1994, 265:659-666.
52. Yifrach O, Horovitz A: Coupling between protein folding and allostery in the GroE chaperonin system. Proc Natl Acad Sci USA 2000, 97:1521-1524.
53. Roobol A, Grantham J, Whitaker HC, Carden MJ: Disassembly of the cytosolic chaperonin in mammalian cell extracts at intracellular levels of K+ and ATP. J Biol Chem 1999, 274:19220-19227.
54. Yamada A, Sekiguchi M, Mimura T, Ozeki Y: The role of plant CCT alpha in salt- and osmotic-stress tolerance. Plant Cell Physiol 2002, 43:1043-1048.
55. Roobol A, Sahyoun ZP, Carden MJ: Selected subunits of the cytosolic chaperonin associate with microtubules assembled in vitro. J Biol Chem 1999, 274:2408-2415.
56. Grantham J, Ruddock LW, Roobol A, Carden MJ: Eukaryotic chaperonin containing T-complex polypeptide 1 interacts with filamentous actin and reduces the initial rate of actin polymerization in vitro. Cell Stress Chaperones 2002, 7:235-242.
57. Shah AS, Farmen SL, Moninger TO, Businga TR, Andrews MP, Bugge K, Searby CC, Nishimura D, Brogden KA, Kline JN, et al: Loss of Bardet-Biedl syndrome proteins alters the morphology and function of motile cilia in airway epithelia. Proc Natl Acad Sci USA 2008, 105:3380-3385.
58. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402.
59. Kent WJ: BLAT - the BLAST-like alignment tool. Genome Res 2002, 12:656-664.
60. BLAT Search Genome. [http://genome.ucsc.edu/cgi-bin/hgBlat].
61. Altschul SF, Koonin EV: Iterated profile searches with PSI-BLAST - a tool for discovery in protein databases. Trends Biochem Sci 1998, 23:444-447.
62. Salamov AA, Solovyev VV: Ab initio gene finding in Drosophila genomic DNA. Genome Res 2000, 10:516-522.
63. Softberry. [http://www.softberry.com].
64. Brocchieri L, Karlin S: Asymmetric-iterated multiple alignment of protein sequences. J Mol Biol 1998, 276:249-264.
65. Pseudogene.org. [http://www.pseudogene.org/].
66. HUGO Gene Nomenclature Committee. [http://www.genenames.org].
67. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32:1792-1797.
68. Mukherjee K, Bürglin TR: MEKHLA, a novel domain with similarity to PAS domains, is fused to plant homeodomain-leucine zipper III proteins. Plant Physiol 2006, 140:1142-1150.
69. Mukherjee K, Bürglin TR: Comprehensive analysis of animal TALE homeobox genes: new conserved motifs and cases of accelerated evolution. J Mol Evol 2007, 65:137-153.
70. Cole C, Barber JD, Barton GJ: The Jpred 3 secondary structure prediction server. Nucleic Acids Res 2008, 36:W197-201.
71. Jpred 3. A Secondary Structure Prediction Server. [http://www.compbio.dundee.ac.uk/www-jpred/].
72. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 2003, 52:696-704.
73. Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 2003, 19:1572-1574.
74. Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 2007, 24:1586-1591.

doi:10.1186/1471-2148-10-64
Cite this article as: Mukherjee et al.: Chaperonin genes on the rise: new divergent classes and intense duplication in human and other vertebrate genomes. BMC Evolutionary Biology 2010 10:64.
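As a supplementary illustration, the B-index recursion described in Methods (Estimates of evolutionary divergence of sequence families) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the tree is assumed to be rooted and strictly bifurcating with positive branch lengths, and the nested-tuple tree encoding, the function names and the example branch lengths are all hypothetical.

```python
from typing import Tuple

# Hypothetical encoding: a node is (stem_length, children).
# A leaf has an empty child list; the root has stem_length 0.
Node = Tuple[float, list]

def collapse(node: Node) -> Tuple[float, float]:
    """Reduce the subtree below `node` to one cluster (length d, size w)."""
    stem, children = node
    if not children:
        # A terminal branch is a cluster of size 1 and length d_i.
        return stem, 1.0
    if len(children) != 2:
        raise ValueError("this sketch handles bifurcating trees only")
    d1, w1 = collapse(children[0])
    d2, w2 = collapse(children[1])
    weighted = w1 * d1 + w2 * d2
    # d = (w1*d1 + w2*d2)/(w1 + w2) + ds ; w = (d1*w1 + d2*w2 + ds)/d
    d = weighted / (w1 + w2) + stem
    w = (weighted + stem) / d
    return d, w

def b_index(root: Node) -> Tuple[float, float]:
    """Return (D, W) for a rooted tree whose root stem length is 0."""
    return collapse(root)

def tree_length(node: Node) -> float:
    """Sum of all branch lengths, used to check that D * W = L."""
    stem, children = node
    return stem + sum(tree_length(child) for child in children)

# Made-up three-leaf example: ((A:0.1, B:0.3):0.2, C:0.4)
root = (0.0, [(0.2, [(0.1, []), (0.3, [])]), (0.4, [])])
D, W = b_index(root)
print(f"D = {D:.3f}, W = {W:.3f}, D*W = {D * W:.3f}, L = {tree_length(root):.3f}")
```

Because each merge preserves the product of cluster length and cluster size as the total branch length of the merged subtree, the final product D * W reproduces the tree length L stated in the text.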



[Figure: multiple sequence alignment of CCT8L protein sequences (La_CCT8L, Dn_CCT8L, Bt_618575, Cf_482811, Ptr_470129, Hs_CCT8L2, Hs_CCT8L1, Ec_CCT8L, Rn_125233, Mm_Gm443, Md_024812), shown in blocks of 40 alignment positions starting at position 1, with a row of per-column single-digit scores and a secondary-structure description from 1A6D.A. Labelled regions: N-terminal equatorial domain; N-terminal intermediate domain.]
A V V L G E V N E K S V D Q A N Y C G I M V I Q A K S R K E I V L A A V G I N V V V V W G Q I N E I C L L H A D R H D I M V V Q A K S R R Q L V 290 300 310 320 La_CCT8L Dn_CCT8L Bt_618575 Cf_482811 Ptr_470129 Hs_CCT8L2 Hs_CCT8L1 Ec_CCT8L Rn_125233 Mm_Gm443 Md_024812 8 8 8 7 4 3 7 6 3 4 6 6 6 7 6 6 7 7 8 8 7 6 4 0 0 6 7 8 7 7 7 4 3 7 1 3 5 5 5 4 Y L S E V L G T P L L S Y L V P P L V P G K C L R V Y G Q E L G E S L A V V F E Y L S E V L H T P L M P R L L P P L V P G K C Q R V Y G Q E L G E A L A V V F E Y L S E V L R T P L T P Y L V P P L E P G K C Q R V Y G Q E L G E A L A V V F E Y L S E V L G T P L M P Y L I P P L K P G K C Q R V Y Q Q D L G E G M A V V F E Y L S E V L D T P L L P R L L P P Q R P G K C Q R V Y R Q E L G D G L A V V F E Y L S E V L D T P L L P R L L P P Q R P G K C Q R V Y R Q E L G D G L A V V F E Y L S E V L D T P L L P R L L P P Q R P G K C Q R V Y R Q E L G D G L A V V F E Y L S E V L G T P L M P Y L L P P L K P G K C H R V Y R Q E L G E G L A V V F E Y L S D K L G V P L L N R I L P P L E P G K C H K V Y R M E F G E S A L I M F E Y L S E K L G T P L L G R V L P P L E P G K C H K V Y R K E F G D T A V V M F E Q L S H V M G I T L L P F L I P P I V A G E C V K V Y T K E M A L G L A V V F E 330 340 350 360 1A6D .A Sec Str description 11 1A6D .A Sec Str description H 12 I 13 J 1A6D .A Sec Str description 14 15 16 APICAL DOMAIN

PAGE 4

La_CCT8L Dn_CCT8L Bt_618575 Cf_482811 Ptr_470129 Hs_CCT8L2 Hs_CCT8L1 Ec_CCT8L Rn_125233 Mm_Gm443 Md_024812 3 6 7 8 8 7 4 4 8 8 8 8 7 4 5 8 8 8 7 6 6 0 1 8 9 9 9 9 9 9 9 9 9 9 9 8 5 3 6 7 W D C P S S P A L T L I L R G A T T E G L R G A E Q A A Y H G I D A Y S Q L C Q W V C P D T P A L T L V L R G P T T E G L R A A E Q A A Y H G I D A Y F Q L C Q W E S T G T P A L T L V L R G A T A H G L R G A E Q A A Y S G I D A Y F Q L C Q W E C P A T P A L T I A L R G A T A E G L K S A E Q A A Y H G I D A Y F Q L C Q W E C T G T P A L T V V L R G A T T Q G L R S A E Q A V Y H G I D A Y F Q L C Q W E C T G T P A L T V V L R G A T T Q G L R S A E Q A V Y H G I D A Y F Q L C Q W E C T G T P A L T V V L R G A T T Q G L R S A E Q A V Y H S I D A Y F Q P C Q W E C L G T P A L T L V L R G P T T E G L R G V E Q A A Y H G I D A Y F Q L C Q W E R E I A P F L S V V L R G P T I Q G L R G A E Q A V Y Y G I D A F S Q L C Q W E H E I A P F L S V V L R G P T I Q G L R V A E Q A V Y Y G I D A F S Q L C Q W N Y L F A P A I T V I L R G S T D E G L K G A E Q A V R H A I N T Y T Q L S Q 370 380 390 400 La_CCT8L Dn_CCT8L Bt_618575 Cf_482811 Ptr_470129 Hs_CCT8L2 Hs_CCT8L1 Ec_CCT8L Rn_125233 Mm_Gm443 Md_024812 7 7 5 2 5 7 6 6 4 7 8 9 9 9 9 9 9 9 9 8 7 4 3 7 7 8 8 8 7 7 6 6 4 6 8 9 9 9 9 9 D P R L L P G A G A T E M A L A K I L S E K G R K L D G P E G P A F L A F A Q A D P R L L P G A G A T E M A L A K I L S E K G R K S E G P N G P A F L A F A R A D P R L L P G A G A T E M A L A K M L S E T G T K L E G P N G P T F L A F A Q A D P R L L P G A G A T E M A L A K I L S E K G S R L E G L N G P A F L A F A Q A D P R L I P G A G A T E M A L A K M L S D K G S R L E G P S G P A F L A F A W A D P R L I P G A G A T E M A L A K M L S D K G S R L E G P S G P A F L A F A W A D P R L I P G A G A T E M A L A K M L S D K G S R L E G P N G P A F L A F A R A D P R L L P G A G A T E M A L V K I L S D K G R K L E G P R G P A F L A F A H A D P R L L P G A G A T E M A L A R M L V D K G S R L D G P N G L A F Q A F A Q A D P R L L 
P G A G A T E M A L A K M L V D K G S R L S G P N G L A F Q A F A Q A D P R L L P G A G A T E L A L A K E L S E L G S Q L E G P N G P G I L A F A R A 410 420 430 440 La_CCT8L Dn_CCT8L Bt_618575 Cf_482811 Ptr_470129 Hs_CCT8L2 Hs_CCT8L1 Ec_CCT8L Rn_125233 Mm_Gm443 Md_024812 9 9 9 9 9 9 9 8 7 4 4 8 8 8 5 6 7 7 7 7 6 4 0 4 5 4 2 3 6 8 5 6 8 9 9 6 4 8 8 4 L R S L P E T L A Q N A G L A V A G V M A E M Y G A H Q A G N F L I G V G E E G L K S L P E T L A E N A G L A V S E V M A E M H G A H Q T G N S L I G V G E E G L Q S L P E T L A E N A G L A V P H V M A E M N G A H Q A G N F L I G V G V E G L Q S L P E T L A E N A G L V V S E V M A E M K G A H Q A G N L L V G V G V E G L K Y L P K T L A E N A G L A V S D V M A E M S G V H Q G G N L L M G V G A E G L K Y L P K T L A E N A G L A V S D V M A E M S G V H Q G G N L L M G V G T E G L K Y L P K T L A E N A G L A V S D V V A E M S G V H Q G G N L L M G V G A E G L R S L P E T L A E N A G L A V S E V M A E M H G A H Q A G N F L T G V G T E G L S S L P K T L A E N A G L A A Q S V L A E M S G Y H Q A G N F V I G V G T D G L S S L P K T L A E N A G L A A Q S V M A E L S G F H Q A G N F F V G V G T D G L R S L P V I L A E N A G V P G A D I L A Q L Q G H H Q A G N T V M G V G D E G 450 460 470 480 1A6D .A Sec Str description 17 K 1A6D .A Sec Str description C-TERMINAL INTERMEDIA TE DOMAIN 18 L 1A6D .A Sec Str description M N O 19 C-TERMINAL EQUA T ORIAL DOMAIN

PAGE 5

La_CCT8L Dn_CCT8L Bt_618575 Cf_482811 Ptr_470129 Hs_CCT8L2 Hs_CCT8L1 Ec_CCT8L Rn_125233 Mm_Gm443 Md_024812 688721574576511578899999999871541462454 3 I M N VA Q E Q V W D T L M AKA QG L R VAA D VV L Q L V T V D EVVVA K L I N VA Q EEV W D T L I AKA QG L R VVA D VV L Q LL T V N E II VA R II N VA Q EEV W D T L I AKA QG L R AVA D VV QQ LL T V DD I V L A K II N V T Q E G V W D T L VAKA QG L R AVA D VA L Q L V N V D E II VA K II N VA Q E G V W D T L I VKA QG F R AVAEVV L Q L V T V D E I VVA K II N VA Q E G V W D T L I VKA QG F R AVAEVV L Q L V T V D E I VVA K II N VA Q E G V W D T L I VKA QG F R AVAEVV L Q L V T V D E I VVA K II N VVEEEV W D T L T AKA QG L R AAA D VA L Q L V T I D E II VA K L V N VA Q E G I W D I L R T KA QG L Q AV T G L V QQ L V T V D Q II VA R L V N V T H E G I W D I L R T KA QG L Q AVAE L V QQ L V T V D Q II VA R T I N A T Q E G V F D P L T VK T QG I R VAA D VV L E L V T V DD I L I A K 490 500 510 520 La_CCT8L Dn_CCT8L Bt_618575 Cf_482811 Ptr_470129 Hs_CCT8L2 Hs_CCT8L1 Ec_CCT8L Rn_125233 Mm_Gm443 Md_024812 677777777777777777766777789 9 K K I P T Q K RD L D P D SKK T KK C P S P S M T K K S P A C P Q D P N P E P KK R SA R P L K S P M C QQ D S N P V P KKAKE C L S P V N N K S P T H QQ I W N P D SKK T K K H PP P M L N N K S P T H Q E I W N P D SKK T K K H PP P V L N N K S P T H QQ I W N P D SKK T KK R PP P V M N N K S P K I Q E D L S P Y P KKAK E H P S P VK N K T P R Y R L I P Q SA Q N A N T S S P L KY E K T P L Y R Q I T D P T L N AKVS S P L KY V KS G E F I N P K DRD SES DN L K Q H P T S L NN N 530 540 1A6D.A Sec Str description 2 0 P 21 Q 22 Supplementary figure S12. Alignment and secondary-structure predictions of CCT8 L sequences compared to PDB secondary-structure description of 1a6d. See Legend for Supplementary figure S10 for symbols and Legend for Figure 2 for species abbreviations.



[Figure: multiple-sequence alignment of MKKS sequences, alignment columns 1 to 570 and beyond, shown with a per-column row of single-digit values (0-9) and the 1A6D.A secondary-structure description. Domain labels: N-terminal equatorial domain, N-terminal intermediate domain, apical domain, C-terminal intermediate domain, C-terminal equatorial domain. The sequence labels and the residue-level alignment are not recoverable from the text extraction.]

Supplementary figure S13. Alignment and secondary-structure predictions of MKKS sequences compared to PDB secondary-structure description of 1a6d. See Legend for Supplementary figure S10 for symbols and Legend for Figure 2 for species abbreviations.



[Figure: multiple-sequence alignment shown with a per-column row of single-digit values (0-9) and the 1A6D.A secondary-structure description, beginning at alignment column 1. Domain label on these pages: N-terminal equatorial domain. The sequence labels and the residue-level alignment are not recoverable from the text extraction.]
Q A L L T F Q T Q I L D G I M D Q Y L S R H F L S S S A K E R T L C R S S L I S Q A LV T F Q T Q I L D G I M D Q Y L S R H F L S S S A K E R T L C R S S L I S R A L L R F Q T Q T L G C I V D Q H L S R H Y L S S S A E G R T L C R R S L I S Q A L Q T F Q T Q T L G C I V D R S L S R H Y L S S S T E G R K L C R H S L I S Q A I L T F Q A H I L D Y I M T Q Y L R K H F L S F S G E E K K V C R S S L L S N L I M T F Q T E I L E N I I V K Q L S P H F E S F L K E T N T L S G G T I 130 140 150 160 E 4 F1A6D.A Sec Str description 4 6 8 9 9 9 9 9 9 8 5 5 7 3 5 8 8 8 9 8 9 8 5 4 8 8 8 8 4 5 8 8 8 7 5 2 6 5 3 4 L P Y I S F Q F L S D N F P A L H T P V S G F P V S S S R L I E G Q V I H R D F S P S S S L H F I R D N L A A L H T P V S G F P I S F S R L V E G Q V I H R D F Q P S S L L Q F L T D N F P T L H T R V S G F P F T C S R L V E G Q V I H R D F Q S L S S L Q F L T D N F P V M H T C V S G F P F T C S R L V E G Q V I H R D F G F E E V F E L V G D C F V E L N V G V T G L P V S D S R I V A G L V L H R D F R F E E V L E L V D D C F V E L K V G V T G L P V S D S R I I A G L V L Q R D F G S E E V L D L V D D Y F V E L S V G V T G L P V S D S R I T P G L V L P R D F G I G V F E L V D D Y F V E L N V G V T G L P V S D S R I I A G L V L Q K D F G I G V F E L V D D H F V E L N V G V T G L P V S D S R I I A G L V L Q K D F G I G V F E L V D D H F V E L N V G V T G L P V S D S R I I A G L V L Q K D F G I DV F E L V D D Y F V E L N V G V T G L P I S D S R I I A G L V L Q K D F G I E V F E F L D N C F V E L N V G V T G L P V S E S R I V D G L V L P R D F G V E V F E L L D H C F A E L N V G V T G L P V S D S R I I D G L V L P R D F G K D E M I D L V D E Y F L E L C T A V T G L P V S S S K I I S G F V I H R D F C I D D V V H T V N T C F S E L H T E V A G E P I E N S R I L S G I V L H R Q F 210 220 230 240 1A6D.A Sec Str description6 7 8 9 N-TERMINAL INTERMEDIATE DOMAIN

PAGE 3

5 6 7 6 6 6 7 7 7 8 7 4 4 8 8 8 7 4 3 7 7 7 7 6 6 7 7 7 7 7 3 4 6 6 4 4 7 7 7 4 A T P C L P S K Q Q P V K V V V F T D Y M Q P I L L S T G D V L K L T P E S S A A P C P R A D L Q P V K A V V L T G Y L Q P K L L R A G E V L E L S E E R N A S F C P P N V S D L P V K A V I F T G G L E P E I L T S G E V L E L C G Q T S A T P C P P T G S D L P V R A V V F T G D L E P E I L T A G E V L D L C G Q T N S G Y C P A D G D I R I A L V T E T I Q P I F S T S G S E F I L N S E A Q S V Y C P A D G D I R I V I V T E T I Q P L F S I S G S E F T L N S E A Q S V Y R P A D G D I R I V I V T E T V Q P L F S T S G S E F I L R S E A Q S V Y C P A D G D I R M V I V T E T V Q P L F S T S G S E F I L N S E A Q S V Y R P A D G D M R M V I V T E T I Q P L F S T S G S E F I L N S E A Q S V Y R P A D G D M R M V I V T E T I Q P L F S T S G S E F I L N S E A Q S V Y C P A D G D I R M V I V T E T I QP L F S T S G S E F I L N S E A Q S V Y C P A D G G I R M V I V T E I L Q P L F S S S S S E F V L D S E T Q S M Y C P A D G D I R M V I V T E I L Q P Q F S S A G S E F V L N S E T Q S V Y C P A D G D M R I I I V T E P I Q P A L S S S G S E L L I N S E A Q A V Y C P A E G E I R A V I I T D Q I H Q Y L S A S D V E F A V C S D A Q 250 260 270 280 10 10 1A6D.A Sec Str description11 7 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 8 5 4 9 7 2 7 7 7 6 2 1 6 5 7 8 9 9 9 9 9 I V H F S A W T E R S L E S V I A E L Q S L G V S L L L S A V K Q S D A A V A L I M D F S A W G E R S L E C V L A N L Q S L G V S V L L S A V K Q S A A V L A L I V D F K A W A E R S L S C V F A N L Q R L G I S L L L S A V K Q S P A V L S L I V K F K A W A E R S L R C I F A N L Q R L D I S L L L S A V K Q S P A V L S L F R A S Q F W I M E R T K A I M K H L Q S Q N V K L L L S S V K Q P D S V I F Y F Q T S Q F W I T E R T K A I V K H L Q N Q N V K L L L S T V K Q P D L V I Y C F Q T S R F W I M E R T K A I M K H L Q S Q N V K L L L S S V K Q P D S V I Y Y F Q T S Q F W I M E K T Q A I M K H L H S Q N V K L L I S S V K Q P D L V I Y Y F Q T S Q F W I M E 
K T K A I M K H L H S Q N V K L L I S S V K Q P D L V I Y Y F Q T S Q F W I M E K T K A I M K H L H S Q N V K L L I S S V K Q P D L V S Y YF Q T S Q F W I M E K T K A I M K H L R S Q N V K L L L S S V K Q P D L V I Y Y F Q A S Q S W I M D R T K T V M N H L R S H N V K L L L S S V K Q P D L V T Y C F Q A S Q C W I T D R T K T V M N H L R G Q N V K L L L T S V K Q P D L V I Y C F Q A S H I W I T E K T K T I M K D L Q S K G I K L L L S S V K Q Q E T V I Y Y L H L S Q M Y L R Q R T E K L M K H F Q N M E V R L I L S T I K Q P E I V L F Y 290 300 310 320 1A6D.A Sec Str descriptionH 12 I 9 8 4 5 8 7 4 7 7 5 4 7 5 1 1 8 9 9 9 9 9 9 8 5 2 7 8 7 4 1 4 6 6 6 7 8 7 7 6 3 A T Q A H M S L V E C V S E E E L S L F V Q L S G V T P V S R G C T I Q L E H A T Q A E M S I V E C V S E D E L S L F Q R L S G A T P V R D C R V I E P E H A A Q A N I C I V E C V S E D E L T L F A Q L S R T Q P V S D C Q I I G P S N A A Q A N I C V V E C V S K D D L A L F T Q L S Q A Q A V S D C Q I I G S K N A R L N D I S V V E C L S S E E V S L I Q R V I D L S P C V Q A S S R C E I S N A G L N G I S V V E C L S S E E L S L I Q R V I G L A P F V E A S S Q Y E L S H S G L A G I S V V E C L S P E E V S L I R R I T G L S P L A H A S S Q D E I C N A G V N G I S V V E C L S S E E V S L I R R I I G L S P F V Q A F S Q C E I P N A G V N G I S V V E C L S S E E V S L I R R I I G L S P F V Q A F S Q C E I P N A G V N G I S V V E C L S S E E V S L I R R I I G L S P F V Q A F S Q C E I P N A G V NG I S V V E C L S S E E V S L I R R I I G L S P F V Q A F S Q C E I S N A R L N S I S V V E C L S S E E V S L V Q R I T G L S P C V E L A S Q C H I A D A R L N S I S V V E C L S A E E V S L V Q R I T G L S P C V E V A S Q C E I S D A K Q N G I S V V E Y I P S E E I S L L C R I I S L S P F M W A T S V C H I S D A K Q N G I S V V D C L P T E E I E L V C L I T G V S P L S G D E F S G Q P L D 330 340 350 360 1A6D.A Sec Str description 13 J 14 APICAL DOMAIN

PAGE 4

6 7 7 7 7 7 6 3 1 2 3 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 T C F Y L K V Y S N L V V P S I E L E K H F P C S T P K V T P M D T Y Q T N E C P S N L K V C S N L I V P S V K L E Q N V L C S T P K V A V T D S Y Q T D E C T S Y S D L V V P S V E L E T C I I P C S A P K V T P T G T Y Q T D E C A A Y L K V H S N L V I S D V E L E T Y I P Y S T P T L T P T D T F Q T V E G T S Y L K V H S N L V I P D V E L E T Y I P Y S T P T L T P T D R F Q T V E G T S Y L K V H S N L V I P D V E L E T Y I P Y S T P T L T P T D T F Q T V E G T S Y L K V H S N L V I S D V E L E T Y I P Y S T P T D T F E T D E G A S R L E V Y P R L G A S D T E L I T C K L W S A H K E T S I D P S Q T N E C P S H L E V Y S G L G A S D T E L R A G K P W S A H K K T P I A P S Q T D E C T C H L R D A P D L V L S S M E S K N D I P S T T L T Q T P K I K E D F E S D P S D S V C E C N L H R I S F NQ K L E S I K C G V H S R I I N G D I V 450 460 470 480 3 6 5 4 3 2 3 6 5 4 3 4 7 8 7 4 4 8 8 8 8 5 3 7 8 8 8 7 4 5 7 7 7 4 4 7 7 7 5 7 V A A L S F C R P I L L G G H R Y V Q V A F C E K V N P C S L V I C G P G E G Q V A T L T F C R P I L L G A H R Y V H V A F H D T V K P C S L V L C G L G E G Q V A T L T F C R P I H L G P H R Y V H L A F Q D M V K L W N L V I C G P G E G Q I A T L T F C Q P I H L G A H R Y V H L D F P D T V K P Y N L V I C G P G E G Q T A V V K F C K P L V L R S K R Y V H L G L I S S F A P H C L V L C G P V Q G L T A L A K F C K P L I L R S K R F V H L G L M S S F A P H C I A L C A P V Q G L T V L V K F C K P L T F R S K R Y V H L G V I S S F I P H C V V L C G P V Q G L T A L V K F C K P L I L R S K R Y V H L G L I S A F I P H S I V L C G P L Q G L T A L V K F C K P L I L R S K R Y V H L G L I S A F I P H S I V L C G P V Q G L T A L V K F C K P L I L R S K R Y V H L G L I S A F I P H S I V L C G P V H G LT A L V K F C K P L I L R S K R Y V H L G L I G T F I P H C V V L C G P V Q G L S A L V R F C K P L I V R S G R Y A H L G L V S A F V P H C V V L C G P V L G L S T L V K F C K P L I L R 
S K R Y V H L G L I S A F I P H S M V L C G P V L G L T A L V T F C R P I L L N S R R Y V H L G L L S S F I P H C L I L C G P V Q G L S F L I T C C Q P I L L G P R K Y V Q L T F S A T F L P H S V V I C G P V K G L 370 380 390 400 1A6D.A Sec Str description15 16 17 C-TERMINAL 8 9 9 9 9 9 9 9 9 9 9 9 9 9 8 7 5 0 4 2 5 6 6 6 5 3 0 0 3 5 6 7 7 7 7 7 7 7 7 6 T D Q Y I C A I R D A V C M L M T W E P K H T T N K P Y T D Q F A C A L K D A M R M L L T W D P I G I T A A P T D Q Y A S A I L D A I L M L L A W Q P L A N T S A K A T E K T L N D E A S T D Q Y A S A I H D A L H M L L A C R P L A I T A A K A V E Q H K D A L H G A F K M L R L F K H L D L N Y L T Q A S D Q N Y T S S P Q T V E Q H E N A L H G A F K M L R A F K D L D L N C V T Q T S D Q S C T S G P Q T V E Q H R D A L H G A F K M L R L L K D L D L N Y L I Q T K V V N G S I Q R Q T I E Q H E D A L H G A F K M L R L F K D L D L S Y I T Q T N D Q N G T S S L Q T I E Q H E D A L H G A L K M L R L F K D L D L N Y M T Q T N D Q N G T S S L Q T I E Q H E D A L H G A L K M L R L F K D L D L N Y M T Q T N D Q N G T S S L Q T I E Q H E D A L H G A F K M L R L F K D L D L N Y M T Q T S D Q N G T S S L QT V Q Q H E S A F H G A F K M L R L F T D L D L N C M I Q T K E R R N P S P L Q T V E Q H E R A F H G A F K M L R L F T D L D L N Y I I Q T K Q Q C N P S P L Q T T D Q H I R A F H G A F K V L R V F K V L Y L S Y K V Q R S N Q N E T S N S Q S T E Q L V S A I H G A F K M L Q L F Q P V V T N W I H T E K N E G Y C K A R Q K 410 420 430 440 1A6D.A Sec Str description K INTERMEDIATE DOMAIN

PAGE 5

77766777776411466568999999987437888876 6 4 G L K K C V F E P G C V I P A G G T F E F L L S R A L P N P S S D T N G F K Q C V L E P G C V L P A G G T F E LL L N N A L L Q H G S S C S T N K G P T H P L AV S E P G C V I P A G G T F E F L L T R A L L Q Q R H R N S S D I G P S C H AA A W E P G C V I P A G G T F E F L L T S A L R Q Q G H K H S S D M D R S Y C F S S I P A G S V L P V G G N F E I L L H F Y L L N YA K I C Q Q S E Q SYYSS S I P A G C V L P V G G N F E I L L H Y Y L L N YAK K C Q Q S E Q G H C F S T I P A G C V L P V G G N F E I L L H Y Y L L H YAK K C Q Q S E Q S Y L SS S M P A G C V L P V G G N F E I L L H Y Y L L N YAK K C H Q P E Q S Y L SS S M P A G C V L P V G G N F E I L L H Y Y L L N YAK K C H Q S E Q S Y L SS S M P A G C V L P V G G N F E I L L H Y Y L L N YAK K C H Q S E Q S Y L SS S M P A G C V L P V G G N F E I L L H Y Y L L N YAK K C HQ S E Q A H C SS A V P A G C V L P V G G H F E I L M H F Y L L N YA K Q C R Q S D Q G Y C S S T V P A G C V L P V G G S F E I L M SY Y L L SYA K Q C R Q S D Q S Y C T S F I Q A G S V L P V G G H F E I L L H Y Y L L D Y S R Q C Q K P E L Q A N S P L E N T G S V L P GG G T F E M L L H Y Y L Q S F A K Q C Q D A E 490 500 510 520 1A6D.A Sec Str description 18 L 68999999999999999998736510467877646899 9 9 L P SV S Q L L A N A L L S V P R Q I Y S H S P K L F L Q A Q T R V P AV S Q L L AK A L L S V P R Q I Y Y H S P R R L L Q T Q T R T SVV S Q L L A D A L L T V P R Q I Y S C S Q R H F L H T Q D K T SEV S D L L A D A L L T V P R H I Y S H Q Q R Q F L Q T Q E R E A T V G V I V A N A L L G I P K I L H KSK K G N Y S F P Q M Y V R Q T M V S R I I A N A I L G V P K I L Y KSK K G N Y S F P Q VY V R E T M V S M I I A N A L L G V P K I L Y KSK K G N C S F P Q I Y V R E T M V S M I I A N A L L G I P K V L Y KS K T G KY S F P H T Y I R E T M V S M I I A N A L L G I P K V L Y KS K T G KY S F P H T Y I R E T M V S M I I A N A L L G I P K V L Y KS K T G KY S F P H T Y I R E T M V S M I I A N A L L G I P R V L Y KS K T G KY S F P H I Y I 
R D A V I S M L I A N A L L G V P K I L Y K P K K G K D S F P H I Y T R E T V I S M L I A DA L L G I P K I L Y K P K K G K D S F P H I Y M R V S M I S S L I A N A L L S L P K T L Y KA K R G SK S F C H V Y L R L A L V C SV V G N A L L N I P R N I Y KA K NR N V C F P L K H V R 530 540 550 560 1A6D.A Sec Str description M N O 99974377777777776777766641478458862547 8 8 I M S L V Q N P S N P F S L G L K E P M V D L G L ES V T C K Y Q L L L S L T K N L S I A H N K I L T Q S H R E D G L M S G S G L ESV S C K S Q V I L N F I K S H S H P F S L L S M D D L DC C V I E E L G L ESV S C K Y Q L I L N F I K T H S H P F S L VS V G D L G C C F VE E L G L ES V T C K C Q L A L H A L Q S G S R Q T G L ESV V G K Y Q L A L Q A L Q T I S N Q T G L ESV A G K Y Q L T L H A L Q T VS S Q T G L ESV A G K Y Q L A V H A L Q T VS S Q T G L ES V T G K Y Q L A V H A L Q T VS S Q T G L ES V M G K Y Q L A V H A L Q T VS S Q T G L ES V M G K Y Q L A V H A L Q T VS S Q T G L ES V T G K Y Q L A L R A L Q T V S G Q S G F ESV A G K Y Q L S L H A L Q A V S G Q S G F ESV A G K Y Q L T M H A L Q A A G S P A G L ESV A C K Y Q L F I S A L N N E T S Q L G L ES A T C K Y Q L 570 580 590 600 1A6D.A Sec Str description 19 20 P 21 Q C-TERMINAL EQUA T ORIAL DOMAINSupplementary figure S14. Alignment and secondary-structure predictions of BBS10 sequences compared to PDB secondary-structure descripti o n of 1a6d. See Legend for Supplementary figure S10 for symbols and Legend for Figure 2 for species abbreviations.




[Figure: multiple sequence alignment of BBS12 sequences, alignment positions 1-530, with per-column single-digit values and the 1A6D.A secondary-structure description (elements labelled A-Q and 1-22). Panels: N-terminal equatorial domain; N-terminal intermediate domain; apical domain; C-terminal intermediate domain; C-terminal equatorial domain.]

Supplementary figure S15. Alignment and secondary-structure predictions of BBS12 sequences compared to PDB secondary-structure description of 1a6d. See Legend for Supplementary figure S10 for symbols and Legend for Figure 2 for species abbreviations. A gapped alignment corresponding to helix F of 1A6D.A was automatically excluded by the prediction tool.




[Figure: ML tree with tips thsa, Hs_CCT5, Hs_CCT4, Hs_CCT3, Hs_CCT6B, Hs_CCT6A, Hs_CCT7, Hs_CCT2, Hs_CCT1, Hs_CCT8 and Hs_BBS12; node support values shown on branches; scale bar 0.2.]

Figure S4. ML tree of CCT proteins with BBS12, excluding other BBS proteins and CCT8L proteins. "thsa" is the alpha subunit of the Thermoplasma acidophilum thermosome. The scale bar represents the indicated number of substitutions per position for a unit branch length.



PAGE 1

Figure S5. ML tree of the CCT proteins from vertebrate lineages (mammals, birds/reptiles, amphibians, fish) showing the relations of the BBS and CCT8L classes with CCT8. Species abbreviations: Bt, Bos taurus; Cf, Canis familiaris; Dr, Danio rerio; Ec, Equus caballus; Gg, Gallus gallus; Hs, Homo sapiens; Md, Monodelphis domestica; Mm, Mus musculus; Mmu, Macaca mulatta; Mmur, Microcebus murinus; Oa, Ornithorhynchus anatinus; Ptr, Pan troglodytes; Rn, Rattus norvegicus; Xl, Xenopus laevis; Xt, Xenopus tropicalis; METJA, Methanocaldococcus jannaschii (Euryarchaeota); METMA, Methanosarcina mazei (Euryarchaeota); AERPE, Aeropyrum pernix (Crenarchaeota). The scale bar represents the indicated number of substitutions per position for a unit branch length.
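The tip labels in these trees combine a species prefix from the legend above with a gene or accession name (for example Hs_CCT8L2). The following is a minimal Python sketch, assuming every label has the form prefix_name, that expands the prefix to the species name. The names SPECIES and split_label are hypothetical helpers introduced here for illustration and are not part of the original analysis; the archaeal codes (METJA, METMA, AERPE) are left out because the legend does not show how they appear in labels.

```python
# Species prefixes copied from the Figure S5 legend (vertebrates only).
SPECIES = {
    "Bt": "Bos taurus",
    "Cf": "Canis familiaris",
    "Dr": "Danio rerio",
    "Ec": "Equus caballus",
    "Gg": "Gallus gallus",
    "Hs": "Homo sapiens",
    "Md": "Monodelphis domestica",
    "Mm": "Mus musculus",
    "Mmu": "Macaca mulatta",
    "Mmur": "Microcebus murinus",
    "Oa": "Ornithorhynchus anatinus",
    "Ptr": "Pan troglodytes",
    "Rn": "Rattus norvegicus",
    "Xl": "Xenopus laevis",
    "Xt": "Xenopus tropicalis",
}


def split_label(label):
    """Split a tip label such as 'Hs_CCT8L2' into (species name, sequence name).

    Assumption: the species prefix is everything before the first underscore.
    """
    prefix, sep, name = label.partition("_")
    if not sep:
        raise ValueError(f"label has no species prefix: {label!r}")
    if prefix not in SPECIES:
        raise KeyError(f"unknown species prefix: {prefix!r}")
    return SPECIES[prefix], name


if __name__ == "__main__":
    for tip in ("Hs_CCT8L2", "Mmur_CCT8L", "Ptr_470129"):
        print(tip, "->", split_label(tip))
```

Because the whole prefix before the underscore is matched, the overlapping abbreviations Mm, Mmu and Mmur are not confused with one another.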




Figure S6. Evolutionary trees of primate CCT8L sequences obtained (a) from the alignment of the complete sequence; (b) after removing a 50 amino acid region that is highly diverged in the sequence from rhesus monkey (Mmu). (c) Alignment of CCT8L sequence segments from different primate species, showing in yellow the diverged positions 132-181 of the rhesus monkey sequence. See legend to Figure 2 for species name abbreviations.
[Tree images (a) and (b) not reproduced. Tip labels in both trees: Ptr_Chr7P, Hs_CCT8L1, Ptr_470129, Hs_CCT8L2, Mmu_E7597, Mmur_CCT8L. Scale bar: 0.02 in both trees.]
Alignment segment shown in panel (c):
Mmur_CCT8L GAGLPRPQLREAYAAAMAEVLST-LPTLAIRS-------LG-PL-ED-PSWALYSVMNTHTLSHTHYLTKLVAHACWAAREL
Mmu_E7597 KAGLPR-RTREA THS-TAEYCHTALPGHPI-SGAFGRSVLGSPFCDEYPHPVLHG-------PP----DQAGGPC CWAIKEL
Hs_CCT8L2 KAGLPRPQLREAYATATAEVLAT-LPSLAIQS-------LG-PL-ED-PSWALHSVMNTHTLSPMDHLTKLVAHACWAIKEL
Ptr_470129 KAGLPRPQLLEAYATATAEVLAT-LPSLAIQS-------LG-PL-ED-PSWALHSVMNTHTLSPMDHLTKLVAHACWAIKEL
Hs_CCT8L1 KFGLPRPQLREAYATATAEVLAT-LPSLAIQS-------LG-PL-ED-PSWALHSVMNTHTLPPMNHLTKLVAHACWAIKEL
Ptr_Chr7P KAGLPRPQLREAYSTATAEVLAT-LPSLAIQS-------LG-PL-ED-PSWALHSVMNTYTLPPMDHLTKLVAHAC-AIKEL




Supplementary figure S7. Summary ML tree of all human pseudogenes (in red font) with CCT and CCT8L from vertebrate species. Pseudogene sequences of CCT8L from chimp and rhesus monkey are also shown in red font. See legends for Figure S5 and for Figure 2 for species abbreviations. The scale bar represents the indicated number of substitutions per position for a unit branch length.
[Tree image not reproduced. Clade labels: CCT8L, CCT8, CCT3, CCT5, CCT4, CCT1, CCT2, CCT7, CCT6. Scale bar: 0.2.]




Supplementary figure S8. ML trees of individual CCT monomer families including human pseudogenes (in red font). See legends for Figure S5 and for Figure 2 for species abbreviations. The scale bar represents the indicated number of substitutions per position for a unit branch length.
[Tree images not reproduced. Panels: Human CCT4 pseudogenes (scale bar 0.1); Human CCT1 pseudogenes (scale bar 0.02); Human CCT3 pseudogenes (scale bar 0.1).]




Supplementary figure S11. Alignment and secondary-structure predictions of human CCT sequences compared to the PDB secondary-structure description of 1a6d. For an explanation of abbreviations and symbols see the legend for Supplementary figure S10.
[Alignment image not reproduced. The figure spans four pages and alignment columns 1-540, with the following domain labels: N-terminal equatorial domain; N-terminal intermediate domain; apical domain; C-terminal intermediate domain; C-terminal equatorial domain.]




Table S7
Columns: sequence name; alternate names; species names; protein accession number; refseq number; ENSEMBL ID; classification; Chr. location; strand; Exons; start; end; remarks; sequence
CCT1 3P Hs OTTHUMG00000033751 CCTalpha pseudogene 3 vert 7 p14.1 + 2 42,801,030 42,802,033 Not identified in pseudogene.org MERPLSVFGDRSTGEAICSQNVMAAALIANTVKSSLGPVGLVKMLVDDIGDVSITNGAKV LCELADLQDKEVGDGTTSVVIIAAELLKNEDELVKQKIHPTSVISGYGLACKEAVRYINE NLIVNTDKVGRDCLINAAKTSTSSKIIGINGDFFANMVVDAVFAIKYTDIRGQPCYPVNS VNILKAHGKSQTESMLISGYALNCVVGSQ GIPKRIVNAKIACLDFGLQKTKMKLGIQVVI TDPEKLDQIRQRESDITKERIQKILVTGASIILTTGGIDDMCLKYFVEAGATAVRRVFKR DLKRIPKASGATILSTLANLKVLEVLARTIRQEKEIKGIQLGKEKVKLYLFADDMIVYLE N
CCT1 1P Hs CCTalpha pseudogene 1 vert 12 p12.2 + 1 19986638 19987216 Not identified in pseudogene.org LKNTKAHTSASIILRGANDIMCGEMERSLHDALCVVKRVLESKSMVPGGGAVEAALSIYLENYAT SMGSQEEIAEFARSLLVIPNTLAVNAAQDSTDLVTKFRAFHNEAQVNPEHKNLKWIGLDLSNGER GDNKQAGMFEPTIVKVKSLKFATEAAITILRIDDLTKLHPESKDDKHGGYEDAVHSGALND
CCT1 2P Hs CCT1alpha pseudogene 2 vert 5 p13.1 2 41621756 41623646 Not identified in pseudogene.org MEGPLSVFGDQDRSTGEVVRSQNVMAASIANTVKSSLGPVGLDKMLVDDVGDVTITNDGAAILRL PEVEHPAAEVLCELADLQDKEVGDGTTSIEALRYIRNLIIYTDELGRDCLINSTETWMSSKIIGI NGDLFANIVADTVLVTTYTDIRGQPRYPANSVNILKAHARSQIESMLISGYALSCVVGSQGMPKI IVNAEI ALNSSLLDVQVVITDPEKLDQIRQKESDIIKERIPKILATDANIILIAGGIDDMCLKYF VEVGAMAIRRALKRDLKCIAETSGGTVLSTLANLKGEETFETAVSGQVEEVVQERICDDELILIK TSKACASASIILHGANDFVCDEMERSLHDALCIVKKVLESKSMMLVGDAVEAAFSIYPENCETSM ESQEQLAIAESARPLPVIPLTVNAAQNSTDLVASLRAFHNEAQVNPESKNLKIGLDLRNGKPRH N KQAGVFEPTTVKVKSLKFATEAAVTILQIDDLIKLYPESKDVKHGGYEDAAHTGALD
TCP1 CCT1, CCTa, TCP 1 alpha Hs ENST00000321394 CCTalpha (real gene) vert 6 q25.3 12 160,119,520 160,130,731 tcp1 = Tailless Complex Polypeptide 1 MEGPLSVFGDRSTGETIRSQNVMAAASIANIVKSSLGPVGLDK MLVDDIGDVTITNDGAT ILKLLEVEHPAAKVLCELADLQDKEVGDGTTSVVIIAAELLKNADELVKQKIHPTSVISG YRLACKEAVRYINENLIVNTDELGRDCLINAAKTSMSSKIIGINGDFFANMVVDAVLAIK YTDIRGQPRYPVNSVNILKAHGRSQMESMLISGYALNCVVGSQGMPKRIVNAKIACLDFS
LQKTKMKLGVQVVITDPEKLDQIRQRESDITKERIQKILATGANVILTTGGIDDM CLKYF VEAGAMAVRRVLKRDLKRIAKASGATILSTLANLEGEETFEAAMLGQAEEVVQERICDDE LILIKNTKARTSASIILRGANDFMCDEMERSLHDALCVVKRVLESKSVVPGGGAVEAALS IYLENYATSMGSREQLAIAEFARSLLVIPNTLAVNAAQDSTDLVAKLRAFHNEAQVNPER KNLKWIGLDLSNGKPRDNKQAGVFEPTIVKVKSLKFATEAAITILRIDDLIKLHPESKDD KHGSYE DAVHSGALND CCT8L2 GROL, CESK1 Hs NP_055221 NP_055221.1 ENST00000359963 CCTtheta_L1/L2 (real gene) 22 q11.1 1 15,451,770 15,453,440 EST available MDSTVPSALELPQRLALNPRESPRSPEEEEPHLLSSLAAVQTLASVIRPCYGPHGRQKFL VTMKGETVCTGCATAILRALELEHPAAWLLREAGQTQAENSGDGTAF VVLLTEALLEQAE QLLKAGLPRPQLREAYATATAEVLATLPSLAIQSLGPLEDPSWALHSVMNTHTLSPMDHL TKLVAHACWAIKELDGSFKPERVGVCALPGGTLEDSCLLPGLAISGKLCGQMATVLSGAR VALFACPFGPAHPNAPATARLSSPADLAQFSKGSDQLLEKQVGQLAAAGINVAVVLGEVD EETLTLADKYGIVVIQARSWMEIIYLSEVLDTPLLPRLLPPQRPGKCQRVYRQELGDGL A VVFEWECTGTPALTVVLRGATTQGLRSAEQAVYHGIDAYFQLCQDPRLIPGAGATEMALA KMLSDKGSRLEGPSGPAFLAFAWALKYLPKTLAENAGLAVSDVMAEMSGVHQGGNLLMGV GTEGIINVAQEGVWDTLIVKAQGFRAVAEVVLQLVTVDEIVVAKKSPTHQEIWNPDSKKT KKHPPPVETKKILGLNN CCT8 1P chr1.mb145, Hs TCP1 theta pseudogene 4 ve rt 1 q21.1 1 145141482 145143137 Identified in pseudogene.org, processed MALQVPKAPGFAQMLKEGAKHFSELEEAVYRNIQACKELAQTTRTAYGRN GMKKMVINYLEKLFVTNDAATILRELEVQHPAAKMTVMASHMQEQEVGDG TNIVLVFAGALLELAEELLRIGLSVSEVIEGYEIACRKAHEILPNLVRCS AKNLRDVDEVSSLLRTSVMCKQYGNEVF LAKLIVQACVSIFPDSGHFKVD NIRVCKILGCGITSSSVLHGMVFKKETEGDVTSVKDAKIAVYSCPFDGM ITETKGTVLIKTDEELMNLSKGEENLMDAVKAIADTGANVVVTGGKVA DMALHYANKYNMMLVKLNSKWDVRLCKTVGATALPRLTPPVLEEMGHD SVYLSEVGDTQVVVFKHEKEDGIISTIVLQGSTDNLMDDIERAVDDGVNT FKVLTRDKRLVPGGGATEIELAKQITSYGETCPG LEQYAIKKFAEAFEAI PRALAENSGENSGVKANEVISKLYAVPQEGNKNVGLDTEAVVPAVTDM LEAGVLDTYLGKHWSIKLAANAAVTVLRVGQVIMAKPDGGPKPPSGKKDW DDDQND CCT5 1P Human.chr13.mb78 Hs CCTepsilon pseudogene 1 vert 13 q31.1 + 1 78382086 78382680 Not identified in pseudogene.org ASMGTLA FD*YGPPFLIIKDQDRKSRLMGLEALKSHIMVANAVAHATRTSLGPKGLHKMV VDKDGDVTVTNDGVTILSVMDVSHQIAQFDGGNV*VSG**NWRWNHRCGCPGWCLVRRSP 
GVARPKYSSNQNSRWL*AGCLLCCGTPGQDQ*QHPC*HKRHQTPDSDCKNHAGLQSGQQL LPTNG*DCCECHPPLTDMEQRGVDFELIKVESKVGGSLEDNKRIKGVIVDKYFSHPQMPK KVEDAKIVILTCPFELPKP KAKHKLDVTSVEDYKALQKYEKEKF*EMIQQIKETGANLEI CQWGFDDEANHLLLQNNLPAVC*VGGPEIELIAITTGGQIITKFSELMARKLGFAGLVQE ISFWTTKDKMLVIEQCKNSRAITIFMRRRNKMIIEEAKQSLHGALCVIWNLIRDNHVVYG GGAAEISCALPVSQEVDKCPTLEQYAMRALANTLEVIPKALSENGGMNPIQSMTKV*ASQ VKEMNLALGIDCLHKGTNYMKQ*RVTETSIG KKATDISCNTNG*NDFEE*QHS*DWRI*R CCT5 TCP1E, TCP1 epsilon,KIAA0098 Hs NP_036205 NM_012073.3 ENST00000280326 CCTepsilon (real gene) vert 5 p15.2 + 11 10,303,453 10,317,892 MASMGTLAFDEYGRPFLIIKDQDRKSRLMGLEALKSHIMAAKAVANTMRTSLGPNGLDKM MVDKDGDVTVTNDGATILSMMDVDH QIAKLMVELSKSQDDEIGDGTTGVVVLAGALLEEA EQLLDRGIHPIRIADGYEQAARVAIEHLDKISDSVLVDIKDTEPLIQTAKTTLGSKVVNS CHRQMAEIAVNAVLTVADMERRDVDFELIKVEGKVGGRLEDTKLIKGVIVDKDFSHPQMP KKVEDAKIAILTCPFEPPKPKTKHKLDVTSVEDYKALQKYEKEKFEEMIQQIKETGANLA ICQWGFDDEANHLLLQNNLPAVRWVGGPEIELIAIAT GGRIVPRFSELTAEKLGFAGLVQ EISFGTTKDKMLVIEQCKNSRAVTIFIRGGNKMIIEEAKRSLHDALCVIRNLIRDNRVVY GGGAAEISCALAVSQEADKCPTLEQYAMRAFADALEVIPMALSENSGMNPIQTMTEVRAR QVKEMNPALGIDCLHKGTNDMKQQHVIETLIGKKQQISLATQMVRMILKIDDIRKPGESE E CCT7 TCP1 eta Hs NP_006420 NM_006429.2 ENST000 00258091 CCTeta (real gene) vert 2 p13.2 + 11 733,320,279 73,333,494 MPTPVILLKEGTDSSQGIPQLVSNISACQVIAEAVRTTLGPRGMDKLIVDGRGKATISND GATILKLLDVVHPAAKTLVDIAKSQDAEVGDGTTSVTLLAAEFLKQVKPYVEEGLHPQII IRAFRTATQLAVNKIKEIAVTVKKADKVEQRKLLEKCAMTALSSKLISQQKAFFAKMVVD AVM MLDDLLQLKMIGIKKVQGGALEDSQLVAGVAFKKTFSYAGFEMQPKKYHNPKIALLN VELELKAEKDNAEIRVHTVEDYQAIVDAEWNILYDKLEKIHHSGAKVVLSKLPIGDVATQ YFADRDMFCAGRVPEEDLKRTMMACGGSIQTSVNALSADVLGRCQVFEETQIGGERYNFF TGCPKAKTCTFILRGGAEQFMEETERSLHDAIMIVRRAIKNDSVVAGGGAIEMELSKYLR DYSRTIPGKQQLLIG AYAKALEIIPRQLCDNAGFDATNILNKLRARHAQGGTWYGVDINN EDIADNFEAFVWEPAMVRINALTAASEAACLIVSVDETIKNPRSTVDAPTAAGRGRGRGR PH


CCT8L1 LOC155100 Hs NP_001025037 NM_001029866.1 ENST00000021776 CCTtheta_L1/L2(real gene) 7 q36.1 + 1 151,773,495 151,775,165 MDSTVPSALELPQRLALN PRESPRSPEEEEPHLLSSLAAVQTLANVIRPCYGPHGRQKFL VTMKGETVCTGCATAILRALELEHPAAWLLREAAQTQAENSGDGTAFVVLLTEALLEQAE QLLKFGLPRPQLREAYATATAEVLATLPSLAIQSLGPLEDPSWALHSVMNTHTLPPMNHL TKLVAHACWAIKELDGSFKPERVGVCTLHGGTLEDSCLLQGLAISGKLCGQMAAVLSGAR VALFACPFGPAHPNAPATACLSSPADLAQF SKGSDQLLEKQVGQLAAAGINVAVVLGEVD EETLTLADKYGIVVIQARSRMEIIYLSEVLDTPLLPRLLPPQRPGKCQRVYRQELGDGLA VVFEWECTGTPALTVVLRGATTQGLRSAEQAVYHSIDAYFQPCQDPRLIPGAGATEMALA KMLSDKGSRLEGPNGPAFLAFARALKYLPKTLAENAGLAVSDVVAEMSGVHQGGNLLMGV GAEGIINVAQEGVWDTLIVKAQGFRAVAEVVLQLVTVDEIVV AKKSPTHQQIWNPDSKKT KKRPPPVEKKKILGMNN CCT5 3P Hs CCTepsilon pseudogene 3 vert 5 q22.3 + 2 114876388 114877290 Not identified in pseudogene.org SHL*VPKKVEDAKIAILTCPFEPPKPKTKHKLDVTSIDHKALHKYEKEKFEEMIQQIKETGANLA ICQWGFDD*ANHLFLQNNLPVVRWVGGLEIELIAISTRGRIV PGSQSSRPRCWPMALSENSGMNP IQTTTKVRARQVKEMNPALGTDCLHKGTNDMKRQHVIEILIGKKQQTSFATQMVRMILKIDDIHK PGESEE CCT4 TCPD, TCP 1 delta Hs ENSP00000233836 CCTdelta (real gene) vert 2 p15 13 61,950,076 61,969,146 MPENVAPRSGATAGAAGGRGKGAYQDRDKPAQIRFSNISAAKAVADAIRTSLGP KGMDKM IQDGKGDVTITNDGATILKQMQVLHPAARMLVELSKAQDIEAGDGTTSVVIIAGSLLDSC TKLLQKGIHPTIISESFQKALEKGIEILTDMSRPVELSDRETLLNSATTSLNSKVVSQYS SLLSPMSVNAVMKVIDPATATSVDLRDIKIVKKLGGTIDDCELVEGLVLTQKVSNSGITR VEKAKIGLIQFCLSAPKTDMDNQIVVSDYAQMDRVLREERAYILNLVKQIKKTGCNVLLI QKSIL RDALSDLALHFLNKMKIMVIKDIEREDIEFICKTIGTKPVAHIDQFTADMLGSAE LAEEVNLNGSGKLLKITGCASPGKTVTIVVRGSNKLVIEEAERSIHDALCVIRCLVKKRA LIAGGGAPEIELALRLTEYSRTLSGMESYCVRAFADAMEVIPSTLAENAGLNPISTVTEL RNRHAQGEKTAGINVRKGGISNILEELVVQPLLVSVSALTLATETVRSILKIDDVVNTR HSPD1 2P Hsp60s2 Hs AAK60261 ENSP00000328369 GroEL pseudogene 2 5 p14.3 1 21919402 21920175 Not recorded in pseudogene.org database, processed, retrotransposed. 
MAIATGGAVFGEEGLTLNLEDVQPHDLGKVGEVIVTKDDAMLLKGKGDKAQLEKRIQEII GQLDVTTSEYEKEKLNEWLAKLSDGVVVLKFGGTSDVEVNEKKDRVT DALNATRAAVEGG IVLGGGFALLRCIPALDSLTPANEDQKIGMEIIKRTLKIPAMTTATNAGVEGSLIVEKIM QNSSEVGYDAMVGDFMNMVEKGIIDPTKLVRTALLDAAGVASLLTTAEVVVTEIPKEEKD PGMGAMGGMGGGMGGGMF CCT4 1P chrX.mb64, TCPD human Hs ENSG00000115484 CCTdelta pseudogene 1 vert X q12 + 3 64407520 64409590 Identified in pseudogene.org AGGYRKCSYKDREKSFQINLGNIYMAKEVANALRTSLQPXXXXXXXX XXXGDMTITYDAVTIVKQMGLHPAARILAVLSKAQDIEAGDGTTSV VIIAGSLLYSYNKLLQKRIHLAIISESFKALGNGIKIITDMSHMEVND KETFNSTTNLLNSMLIIQHSSLISPMSVNTVIK MDLATATTVDLRDIKIVKLGRTIDCSELVK GIITQKVENSGIARVEKA KIGLIQFCFSAPKTEMDNQIVSDYTHVEQVLKERA GDNILLTQKSLLIGALSDLELHFLNKMKIMVIKDIEKEGIEFICKTIGNK PVAYIDFTTNMLGAAPLAKGVNLNSSGKLLKTAGCAGPGKA VIVVVHD SNKLMIEKAKCSIYDALCFISYLVKNRALIIRYGAPEIRVALKLPEYS ILRGIESYCIYDFTDDMEVTSFTLAKNTGLNHISSLAEIGNQYLQEVNTV GITVQKGDILNILEEIIAQPLLVSIGVLTLATESLGILKLDDMANT CCT3 TCP1 gamma Hs NP_005989 NM_005998.3 ENSG00000163468 CCTgamma (real gene) vert 1 q23.1 13 154,545,617 154,572,307 MMGHRPVLVLSQNTKRESGRKVQSGNINAAKTIADIIRTCLGPKSMMKMLLDPMGGIVMT NDGNAILREIQVQHPAAKSMIEISRT QDEEVGDGTTSVIILAGEMLSVAEHFLEQQMHPT VVISAYRKALDDMISTLKKISIPVDISDSDMMLNIINSSITTKAISRWSSLACNIALDAV KMVQFEENGRKEIDIKKYARVEKIPGGIIEDSCVLRGVMINKDVTHPRMRRYIKNPRIVL LDSSLEYKKGESQTDIEITREEDFTRILQMEEEYIQQLCEDIIQLKPDVVITEKGISDLA QHYLMRANITAIRRVRKTDNNRIARACGARIVSRPEEL REDDVGTGAGLLEIKKIGDEYF TFITDCKDPKACTILLRGASKEILSEVERNLQDAMQVCRNVLLDPQLVPGGGASEMAVAH ALTEKSKAMTGVEQWPYRAVAQALEVIPRTLIQNCGASTIRLLTSLRAKHTQENCETWGV NGETGTLVDMKELGIWEPLAVKLQTYKTAVETAVLLLRIDDIVSGHKKKGDDQSRQGGAP DAGQE CCT6 1P Hs TCP1 zeta pseudogene 1 5 p15 .2 1 14692965 14693954 Not identified in pseudogene.org KDGNVLFHEMQIQHTTASLIAKVATAQDDITGDSTTSNVLIIEKLLKQEDLYISEGLHL RIITEGFEAAKEKALRFLEEVKIRKEMDRETLINVARTSLHTKVHAELADALTEAVVDS ILAIKRQDEPIDLFMVVIMEMKHKSETDTSLIRGLVLDHGAWHPDMKKRVEDVYILKCN VSLEYEKT*VNSGFFYK TAEMREKLIKAERKFIEDKS*KIELKRKVCGDSDKGFVAINQ EGIDPLSLDALAKEHIVTLHRAKRRNIEGLTLASGEVALNSFENLNPDCLGHAGLVYKY 
ILGEEKFTFIEKCNNTRSVTLLNKGPNKYTLTQLQ CCT2 TCP1 beta Hs NP_006422 NM_006431.2 ENSP00000299300 CCTbeta (real gene) 12 q15 + 14 68,266,317 68,280,052 MASLS LAPVNIFKAGADEERAETARLTSFIGAIAIGDLVKSTLGPKGMDKILLSSGRDAS LMVTNDGATILKNIGVDNPAAKVLVDMSRVQDDEVGDGTTSVTVLAAELLREAESLIAKK IHPQTIIAGWREATKAAREALLSSAVDHGSDEVKFRQDLMNIAGTTLSSKLLTHHKDHFT KLAVEAVLRLKGSGNLEAIHIIKKLGGSLADSYLDEGFLLDKKIGVNQPKRIENAKILIA NTGMDTDKIKIFGSRVR VDSTAKVAEIEHAEKEKMKEKVERILKHGINCFINRQLIYNYP EQLFGAAGVMAIEHADFAGVERLALVTGGEIASTFDHPELVKLGSCKLIEEVMIGEDKLI HFSGVALGEACTIVLRGATQQILDEAERSLHDALCVLAQTVKDSRTVYGGGCSEMLMAHA VTQLANRTPGKEAVAMESYAKALRMLPTIIADNAGYDSADLVAQLRAAHSEGNTTAGLDM REGTIGDMAILGITESFQVKRQVLLSAAE AAEVILRVDNIIKAAPRKRVPDHHPC CCT71P Hs ENST00000399032 CCTeta pseudogene 1 vert 5 q15 1 92251627 92307366 Not identified in pseudogene.org AKTSMDIAKYQDAKVGDSTTSVTLLAAEFLKQVKPYVEE GLHLKIIIQALRTAIQLAVNDNKEITVTMKKTEKVEQKKLLGEVCHPALSSKLISQQKAF FAKMVVDAVMM LDGLLQLKMIGIKKIQGGALEDSHLLPGVSFK CCT8 TCP1 theta Hs NP_006576 NM_006585.2 ENST00000389159 CCTtheta (real gene) vert 21 q21.3 15 29,350,670 29,367,782 MALHVPKAPGFAQMLKEGAKHFSGLEEAVYRNIQACKELAQTTRTAYGPNGMNKMVINHLEKLFVTNDAA TILRELEVQHPAAKMIVMASHMQEQEVGDGT NFVLVFAGALLELAEELLRIGLSVSEVIEGYEIACRKAH EILPNLVCCSAKNLRDIDEVSSLLRTSIMSKQYGNEVFLAKLIAQACVSIFPDSGHFNVDNIRVCKILGS GISSSSVLHGMVFKKETEGDVTSVKDAKIAVYSCPFDGMITETKGTVLIKTAEELMNFSKGEENLMDAQV KAIADTGANVVVTGGKVADMALHYANKYNIMLVRLNSKWDLRRLCKTVGATALPRLTPPVLEEMGHCDSV YLS EVGDTQVVVFKHEKEDGAISTIVLRGSTDNLMDDIERAVDDGVNTFKVLTRDKRLVPGGGATEIELA KQITSYGETCPGLEQYAIKKFAEAFEAIPRALAENSGVKANEVISKLYAVHQEGNKNVGLDIEAEVPAVK DMLEAGILDTYLGKYWAIKLATNAAVTVLRVDQIIMAKPAGGPKPPSGKKDWDDDQND


CCT6 4P Hs Tcp1 zeta pseudogene 4 3 q28 + 5 191915332 191916879 Not identified in pseudogene.org GPSAGGFGAQHQCGAGATGHTEDQPGTQGRHEDACFCAGHIKLTKDSHVPLHKMQIHEM QMLLNAFLIAKVATAQDDITGDGMTYNVLIIRELLKQANLYISEGLHPRIITEGFEVER PVLDHGARHPDVKKRVEDAYILKCNVLAKESIAALHRAKSRNMERLTLACGRVALIYFY DLNPDCLGLVVLVHEYTLRRSSPLFSNFQVL VPWKWQWQKTSVKGSAQLGVQAFTDALL NYSQGEPVVAAEIGVWDNYCVKKQLLHSCTVMATNILLVDEIMKAINRMSSLKG CCT6 2P Hs Tcp1 zeta pseudogene 2 11 q22.3 1 109013584 109014117 Not identified in pseudogene.org NSFDDLTPDYVGYAGLLCEYTFGEEKFTFIEKYNYPSSVTLLAKEPNKYTHTQIKD AVRDGLRA VK CCT6A TCP1 zeta, CCT6, Cctz, HTR3, TCP20, TCPZ, TTCP20 Hs NP_001753 NM_001762.3 ENST00000275603 CCTzeta (real gene) 7 p11.2 + 14 56,087,036 56,098,269 MAAVKTLNPKAEVARAQAALAVNISAARGLQDVLRTNLGPKGTMKMLVSGAGDIKLTKDG NVLLHEMQIQHPTASLIAKVATAQDDITGDGTTSNVLI IGELLKQADLYISEGLHPRIIT EGFEAAKEKALQFLEEVKVSREMDRETLIDVARTSLRTKVHAELADVLTEAVVDSILAIK KQDEPIDLFMIEIMEMKHKSETDTSLIRGLVLDHGARHPDMKKRVEDAYILTCNVSLEYE KTEVNSGFFYKSAEEREKLVKAERKFIEDRVKKIIELKRKVCGDSDKGFVVINQKGIDPF SLDALSKEGIVALRRAKRRNMERLTLACGGVALNSFDDLSPDCLGHAGLV YEYTLGEEKF TFIEKCNNPRSVTLLIKGPNKHTLTQIKDAVRDGLRAVKNAIDDGCVVPGAGAVEVAMAE ALIKHKPSVKGRAQLGVQAFADALLIIPKVLAQNSGFDLQETLVKIQAEHSESGQLVGVD LNTGEPMVAAEVGVWDNYCVKKQLLHSCTVIATNILLVDEIMRAGMSSLKG CCT6 3P Hs Tcp1 zeta pseudogene 3 7 q11.21 + 8 64162812 64171325 N ot identified in pseudogene.org TCPZ3P & TCPZ5P are next to each other on chr. 
7 MLLHFFSIFPLNHLKQIQHPTASLIAKVATAQDDITGDGTTSNVLIIGELLKQADLYISE GLHPRIITEGFEAVKEKALHFLEEVKVSREMDKETLKDVARASLCTKVHAELADVLTEAV VGSILAIKRKDEPIDLFMILCSFKEGIVALHRAKRRNMERLTLACGGVALNSF EDLSPDC LGHAGLVYEYTLIKDAVRDGLRAVKNAVDDGCVVPGAGAVEVAMAEALNKYKLSVKGKAQ LGVQAFADALLVIPKQQKQALWDNDCVKKQLLHSCTDCHQHSLG CCT6B Cctz2, TSA303, Tcp20) Hs Q92526 Q92526.4 ENST00000314144 CCTzeta (real gene) 17 q12 14 30,279,183 30,312,525 MAAIKAVNSKAEVARARAALAVN ICAARGLQDVLRTNLGPKGTMKMLVSGAGDIKLTKDG NVLLDEMQIQHPTASLIAKVATAQDDVTGDGTTSNVLIIGELLKQADLYISEGLHPRIIA EGFEAAKIKALEVLEEVKVTKEMKRKILLDVARTSLQTKVHAELADVLTEVVVDSVLAVR RPGYPIDLFMVEIMEMKHKLGTDTKLIQGLVLDHGARHPDMKKRVEDAFILICNVSLEYE KTEVNSGFFYKTAEEKEKLVKAERKFIEDRVQKII DLKDKVCAQSNKGFVVINQKGIDPF SLDSLAKHGIVALRRAKRRNMERLSLACGGMAVNSFEDLTVDCLGHAGLVYEYTLGEEKF TFIEECVNPCSVTLLVKGPNKHTLTQVKDAIRDGLRAIKNAIEDGCMVPGAGAIEVAMAE ALVTYKNSIKGRARLGVQAFADALLIIPKVLAQNAGYDPQETLVKVQAEHVESKQLVGVD LNTGEPMVAADAGVWDNYCVKKQLLHSCTVIATNILLVDEIMRAGMS SLK HSPD1 5P Hs GroEL pseudogene 5 12 q13.2 + 2 55191053 55192769 Not identified in pseudogene.org MLRLPTVFRQMRLVSRVLAPHLTGAYAKDVKFGADARALMLQGIDLLADAVALTMGPKGR RTVIIEQSWGSPKVTKDGVTVAKSIDLKDKYKNIGAKLVQDVANNTNEETGDGPTTATVL ARSIAKEGIEKISKGANPVEIRRGVMLVVD AVIAELKKQSKPVTTPEEIAQVATISANGD KEIGNISDAMKKVGRKGVITVKDGKTLNDELEIIEGIKFDRGYISPYFINTSKGQKCEFQ DAYVLLSEKKISSVQSVVPALEIANAHRKPLVIIAEDVDGEALSTLVLNRLKVGLQDVAI KAPGFGDNRKNQLKDMAIATGGAVFGEEGLTLNLEDVQPHDLGNVGEVIVTKDDATLLKE KGDKAQIEKRIQEIIEQLDVTTSEYEKEKLNERLAKLSDVVA VLKVGGTSDVEVNEKKDR VRMPLMLQELLLKKALYWEGVVRTALLDAAGVTSLLTTAEVVVTEIPKEEKDPGMGAMGR MGGGMGSGMF HSPD1 6P Hs GroEL pseudogene 6 3 P22.3 4 36783612 36785195 Not identified in pseudogene.org VDLLVDAVAITMGPKGRTVIIEQSWGSPKVTKDGVTVAKSIDLKNKYKNIEAKLVQDVAN NTNEE AGDGPTTATVLACSIAKEGFEKISKGANPVEIRRGVMLAVDAVIAEPKKQSKPVT TPEEIAQVATISASGDKEIGHILSDAVKKVGRKGVITVKDGKTLNDGLEIIESLKFDRGY VSPYFINTSKGQKCEFQEAYVLLSEKKISSVQSIVPALETASAHRKPLVIITEDSRLQGF VTKNQLKDMAVTTVGAVFGEEGLILHFEDVQPHDLGKVGEVIVTKDDAMLLKGKDGIAMP KVGGTSDVEANEKKDRV 
TDALNATRAAVEEGIVLGGAMTIAKNAGVEGSSIVEKIMQRSS EVGYDATRVGDFVNMVGKGIIDPRKVVRTALLDAAGVASLLTTAEVVVTEIPKEEKDPGM GAMGGMGCGMGGGML HSPD1 7P Hs GroEL pseudogene 7 8 p23.1 + 6 7263938 7265475 Not identified in pseudogene.org MGPKGRTVIIEHSWGSPKVTKDGVTDAKSIDLKDKYKS IGAKLVQDVANNTDEETGGWHY HCCCTGMLYFQIRLPEAVDAVIAELKKQSKPVTKPEEIAQVATISANGDKEIGNIISDAM KKFGRKGIITKCEFQDAYVLLHEKKISSVQSIVTALEIANAYCKPLVIIAGDIDGEALTT LILNRLKVGLQVVAVKAPGFGDNRKNQLKDTVIATGGEVGEVTVIKDYAMLLKGKGNKSQ IEKCVQEIIDQSDVTTSEYEKEKVSGETFRWSSCAEGGCALLRCIPALDS FTPANEDKII GIEIIKRTLKIPAMTIAKNAVGYDTMLGDVVNMVEKDIIDPTKVVRTASLDAAGMASLLT TAAVVVTEIPKEGNSPGMGAMCGMGGGLF HSPD1 22P Hs GroEL pseudogene 22 21 q21.3 3 29181851 29183334 Not identified in pseudogene.org VDLLLTAVAITMGPKGKTVITEQNWGSPKVTKDGVTVARSIDLKDKYKNIAA KLVQDVAN NTNEEAGDGTTTATVAKEVFKKFSKGANPVEIKRSVMLAVDAVIAELKKQSKPVTTPEEI AQVATISANGDKEIGNLISDAMKKVRRKGFTTDIFLHTLLIHQKVGNVNSKKISSVQSIV PDLEIANAHRKLLVIIAENVDGETLSTLILNRRKLGLQVVAVKAPGFGDNRKNHLKGMTI ATGGAVFGEEELTSNLEDVQPHDLGEVGELDVTTSEYEKEKLNEQLAKLSDGVAVLKVSG TSH VEVNEKKNRVTDALNATRAAVEEGIVLGGGCTLLR HSPD1 4P Hs GroEL pseudogene 4 6 q15 + 1 88065673 88066269 Not identified in pseudogene.org MTTPGEIAQVATISANGDKEIVNIISDANKFGRKGVLTVKDGKTLNDELEIIKGIKFD*GYISPY FINTSKG*KCEFQDAYVVLNEMKISSVQTTVPAL*IANPHCKSLVIIAEDIDREARS TFIFNRLK VGLQIIAVKAPDFGDNRKNQREDTAIATGGLVFSEEGLALNFEAI*PRNLEKVGEVILTKYDAML LKGK HSPD1 8P Hs GroEL pseudogene 8 4 q31.21 + 4 145986418 145987946 Not identified in pseudogene.org CLSLNASRQSWGSPKVTKDSVTVAKSIDLKDKYKNTGAKLVQDVANNTNEEAVDGTTTVT ALARSIAKEGFEK ISKGANPVEIRRGVMLAVDAIIAEPKKQSKPVTTPEEIARVATISAN GDKEIGNIISDAMKKVGSKGIITVNNGKSYISPYLINTSKGQKCEFQDAYVLLSEKKISS VQSIAPALEIANAYFKDPGFGDNRNNQLKDMAIATGGAVFAEEGLTLNLEDVQPHDLGKV GEVIVTKDDAMLLKGKGDKAQIEKCVQEIIEQLDVTTIAVLKVGGTSDAEVNEKQDRVTD ALNATRAAVEEGIVLRGGRALLRCI PALDSLTPVNEDHNIGIEIIKKTLKFPAMTIAKNA GVEVSLIVEKIMQSSSEVGYDAMGRDFVNMVEKGIIDTTKFVRTALLDASGVASLLTTAE VLVTEIPKEEKDPGMGAMDGMGGGMGGGMF HSPD1 9P Hs GroEL pseudogene 9 8 p23.1 5 7785932 7787502 Not identified in 
pseudogene.org, HSPD9P & HSPD7P are next to each other on chr. 8 CRLFVDAVAITMGPKGRTVIIEHSWGSPKVTKDGVTDAKSIDLKDKYKSIGAKLVQDKVS KGANPVEIKRGVMLAVDAVIAELKKQSKPVTKPEEIAQVATISANGDKEIGNIISDAMKK GYISPYFLNTSKGEKCEFQDAYVLLHEKKISSVQSIVPALEIANAYCKPLVIIAEDIDGE ALTTLILNRLKVGLQVVAVKAPGFGDNRKNQLKDTPRDVGEVGEVTVIKDDAMLL KGKGN KSQIEKCVQEIIDQSDVTTSEYEKEKLNGETFRWSSCAEGGCALLRCIPALDSFTPANED KIIGIEIIKRTLKIPAMTIAKNAGVDGFLIVEKIMQSSSEVGYDTMLGDVVNMVEKDIID PTKVVRTALLDAAGMASLLTTAAVVVTKIPKEGNSPGMGAMCGMGGGLF HSPD1 10P Hs GroEL pseudogene 10 12 p13.31 + 4 8058884 8082857 Not identified in pseudogene.org MLLPLPTVFPQMRLLSRVLAPHLTRAYAKDVKFGADARALMLQGVDLLADAVAVTMGPKG RTVIIEQSWGSPQVTKDAKLVQDIANNTNEEAGNGTTSATVLARSIAKEGFKKISKGANP VEIRKGVMLAVDAVIAELRKQSKFVTTPEEIAQVATTSANGDKEIGNIISNAMKKVGRKG VITVKDGKTLNDELEIIESGWARCLMPSIVPALEIVNAHRKPLVIIAEDV DGEALSTFIL NRLNVGLQVVAVKALLVTIEPAEEPGERYGYCYWWKPGYLIVTGRHTAGCLPGSVKPSRL R


HSPD1 11P Hs GroEL pseudogene 11 5 q15 + 5 95130459 95132169 Not identified in pseudogene.org MKILGLPSLSADETRLLGSSSHLGLCQTCKIWCRCLSLNSSRCRPFSRCRSCYSGAEGKN YVASNTNEEAGDGTTTATVLA LPMVKKVFEKITRGANPVKIRKGMMLAIDALIAELKKQS KPMTTLEEIIQFAIISANRDKSATSFLKQLKAGPQVIAVKAPGLLTIERTSLKIWLLLLV VQCLEKGVGLITQEKLERSLGPKVMLCSLKEMVTSLELKIVFKKSLSIAVLKVGVTSDVK TNEKKDRVTDALNAIRAATEEDIILGWGCALLQFIPALDSLTPTNKNKKLLQQLCSVPQK LVMSYGWGFCEYGGRGIIGPTKDVRTALLDAAW VASPLITAEVTEIPKEEKDPGMGAVGG MGGGISDS HSPD1 12P Hs GroEL pseudogene 12 13 q31.1 + 6 78321372 78323341 Not identified in pseudogene.org LVSRVLKPHLPQADAKDVNFGADAQNLMLQDGVTVAKSIDLKDKYKNIGAKLVQNVANNT NKEAGDGITTATILACSTAKKALRRLAKVLIQWKSEVWRQKKIGSIISGAMKKVG RKVVL TVKDGKTLNDELEITEGGRAWLHFSMLINTSKGHKCEFQDAYVLLSEKNISGVQSIVPAL EIASAYLKPLVTIAEDIDEETLSTLILNRLKVGLQVVAAKVSGFGDNGASLKIWLLLLLD IITSGYEKKKLNEYLAKLSDGVAVLMVGETSDVQVKDKKDRFTDVLNATRAAIEDGIVLL QCIPALDSLTPTNDDFFKCVEGSLIAEKIMQDSSEVGYDAMIGDSVNMVEKGIIDSTNIV RTALLH AAEVASLLPTAEVVVIEIPKKENPGMSAIGGMGDGLF HSPD1 13P Hs GroEL pseudogene 13 6 q25.2 + 1 153068626 153068943 Not identified in pseudogene.org MEIIKRTLKIPAMTIAKNASVERSLIVEKIMQSSSEVGYDAMVRDFVNMVEKAIIDPTKI VRAALSDAAGVASPLTTAEIVVTKIPKKREGPWNGCNGWNGRWYER HSPD1 GROEL, HSP60, SPG13, CPN60, HuCHA60 Hs NP_002147 NM_002156.4 ENSG00000144381 GroEL (real gene) 2 q33.1 11 198,060,018 198,071,817 MLRLPTVFRQMRPVSRVLAPHLTRAYAKDVKFGADARALMLQGVDLLADAVAVTMGPKGR TVIIEQSWGSPKVTKDGVTVAKSIDLKDKYKNIGAKLVQDVANNTNEEAGDGTTTATVLA RSIAKEGFEKISKGANPVEIRRGVMLAVDAVIAELKKQSKPVTTPEEIAQVATISANGDK EIGNIISDAMKKVGRKGVITVKASDGKTLNDELEIIEGMKFDRGYISPYFINTSKGQKCE FQDAYVLLSEKKISSIQSIVPALEIANAHRKPLVIIAEDVDGEALSTLVLNRLKVGLQVV AVKAPGFGDNRKNQLKDMAIATGGAVFGEEGLTLNLEDVQPHDLGKVGEVIVTKDDAMLL KGKGDKAQIEKR IQEIIEQLDVTTSEYEKEKLNERLAKLSDGVAVLKVGGTSDVEVNEKK DRVTDALNATRAAVEEGIVLGGGCALLRCIPALDSLTPANEDQKIGIEIIKRTLKIPAMT IAKNAGVEGSLIVEKIMQSSSEVGYDAMAGDFVNMVEKGIIDPTKVVRTALLDAAGVASL LTTAEVVVTEIPKEEKDPGMGAMGGMGGGMGGGMF HSPD1 14P RP11 206L1.2 Hs OTTHUMG00000016753 G roEL pseudogene 14 13 q13.3 4 37465288 
37466827 Not identified in pseudogene.org,Processed pseudogene, vega transcript available, annotated in ENSEMBL. MRTVSRVLAPYLTWAYAKDVKCGIDAQALMLQGAELLADAVAITMGSKGRAVIIQQSCGD PKVTKDGVTVEKSIDLKDEYNSIGVKLVQDVANNTSEEAGD DTTTVLFWHALLPRMASRR LVKVLIYWKWGESIVPALEVANAHHKTLVIIVEDIDGEALSTFVLNRLKNQLKDLAIATG GAVFGDEGLALNLEYVQPRDLGKVGEVIVTKDYAMLLKGKGDKSQIEKKIFKKSWSKRQS YRSLNAITAVEEGIVLGRGCALLQYVPALYSLTPASEDQNIGIEIKRIFKIAAMTIAKNA GIEGSFIVKKITQSSIEVGYDAMVRDFVNMVGKGIIDPTKFLKTVLLDAFGWP LC HSPD1 15P Hs GroEL pseudogene 15 5 p14.3 + 4 19269394 19270353 Not identified in pseudogene.org MEPKRRIVIIEESWGRPKVTKDGVIVAHSINLEDKYENIGAKPVQDVSNNRIEEVEDCTT TATVLACPIVKEREKLAKVLIQRKSGDRERNWQHISYTVNNAGRKGVITVNNGKTLNDEL EIMEGMKFDQGYISPYFINTSKVYFTVNNM VKPLVITAENVLNSIKNKLKDKTTAISGEV FEQEGFTLNLEVVEPHDLEEIGEVIATEDDAMLLKGNYGKAQIEKCIQDTQPVRCHN HSPD1 16P Hs GroEL pseudogene 16 11 q22.3 + 3 105082802 105083755 Not identified in pseudogene.org MNWELLKALSLIKVLFSHALYTSKGQRCEFQDTYILFNKKKICSIQFVVLALAIADAHRK PL VIITEDDYGEALSTFILNRLKVCLQIIAVRTPGFDDNRKNQLNDIAIATGVIEQLNTT IHKKESQNEHLGKLLDGIPVLKVHGTGNVKMNEKKELQNSSEVDYEAMFGDSPNVVEKKI IDLAKAVRLLYHVLL HSPD1 17P Hs GroEL pseudogene 17 1 p35.1 + 3 34077070 34078293 Not identified in pseudogene.org MEGLVEISKGVNPVETRKG VMLCADTVITKLKKQPKPVTTPEEISPFTQCLQMETNKSAI SFLMIKVGIWATVVKTPGFGDTERPNKAIAIMAWCLEESLTLKLEDVQFHDLGKVGAVIV TKDDAMFLEGKAIAKNSGVEGSPTDSRENYAKLPRSWLCTTLRDFLKMVEKIIINSTKVV KSALLDAASVDSQLTTAKLRPLEFLKKRRIWHG HSPD1 18P Hs GroEL pseudogene 18 20 q13.32 + 5 561056 84 56108736 Not identified in pseudogene.org SFTKQGQTNTQPLTPTPKTLRPPTVLHQTRPASIPWPSSALDLHQRLKIWCLSEPNASRC DSVAVAMGLNRRKIIIKQSGSRQTQSNKRCAELVQDTAFGTNEEAGMAAPGSVDKKSWQC VVQWAPGEGVMEATHAVPAELRKQPVTTEEGKSLKNLLRLQMETKRKLHSIISGARRKGT FSPPYVIDTSKSSKYEFQDACTHQSLEE RYIPDLGINREEPCSASRLLQGGACDIWDWLQ AEQRSQYISAEAERSTSHGLEPREEERISQPCPLLCICPVQASTASHLDQQQRP HSPD1 1P Human.chr5.mb135, Q96RI3 Hs ENSG00000162241 GroEL pseudogene 1 5 q31.1 1 135744902 135745039 Identified in pseudogene.org MVEKGISDPTKVVRPALLDAAGVASLLTAAQV 
AVTQIPKEETGLGMD HSPD1 3P Hs GroEL pseudogene 3 20 q13.12 4 43602029 43602280 Not identified in pseudogene.org MLPLPTVLCQMELMPR*LAPHLLRAYTKDIKLGTDAQALMLQGVTFNPML*L*Q QGKTENRVGEVPK**KKLFTVAKSNDLKDK HSPD1 19P Hs GroEL pseudogene 19 10 q11.23 + 1 50318868 50319008 Not identified in pseudogene.org MVGTTILDADGMASLLTTAEAVVTEIPKEEKNPGMGGMGGGMRGGMF HSPD1 2P Hs CCTeta pseudogene 2 vert 6 q25.1 + 1 150242815 150243240 Not identified in pseudogene.org LKEGTDSSQGIPQLVNNLSTCQVIAKAVRATLGACGMDKLIVDGGGKA TISND VATILKLLHVVHLAAKTLVDIAKSQDTEVGDGTISVTLLAAEFLKQVKLQVKDG LHPQLNIRAVCTATQLVVNTIKEIAVTVKKADKVE HSPD1 20P Hs GroEL pseudogene 20 12 q21.31 1 78924341 78924478 Not identified in pseudogene.org IIKPIMIVRTAPTDIAGVASLLSTAEFAITVISKENKKFAMDRMGV HSPD1 21P Hs GroEL pseudogene 21 5 q12.1 6 60994430 60994876 Not identified in pseudogene.org RPETLKVVPAAFEGGFFLRGGWAWFQRIPIWDS*TATNKDQKTDTEIVL *NIQFKMPAVTLTRADG*EGSLTAENIV*SPSEVSCEALRGDLWRTG ERMVGRGTGIIVTIMVGRAAASLPETEKKVSEDPKEKDPGTGRLG GGGMGGSEF CCT4 2P Hs CCTdelta pseudogene 2 7 q34 4 140344301 140345787 Not identified in pseudogene.org IEAGDSTASVGTIAGSLIGSWAKLLQKGIHPTITSKSSQKSLEKGIEILSNISQPVELND RETLLNSATSSLNSQVVFQYSSSASSDECKCSDESDPATGTTVNLRDINIQSCVILYPIV KQYMTAVKKLGGKIDDCELVEGLILTQKVVNSGITQFEKAK IGHIQFCLSAPNRFGNQVA VSDNNIEKEDVEFICKTIGTKPVARTDQVTLTCWVLLNAMEVIPTTLAENAGFESHFYSN RTKKSTCLTRKKLQALVSKGQYFHFGKLVAQLL CCT3 1P Hs CCTgamma pseudogene 1 vert 8 p22 + 2 16177578 16178178 Not identified in pseudogene.org MLKIKKIGDEDFAFISKCKDPKVRTILLQGNGKETL SKMDHNIQDAVQVYCNFLFDPHLV PGDGASEMAVIHVLTEKSKVMTGMNNGHTGLLPGLPSTPRRTMRPGEPLAVKLQTSRAAM ETVVLLLWIDVITTGHKRKGKEQGQQGKTTGASKE CCT6 5P Human.chr7.mb64 Hs ENSP00000275603 TCP1 zeta pseudogene 5 7 q11.21 + 10 64853564 64865440 Identified in pseudogene.org MAAA KTLNPKAKVAGAQAALAFNISGARGLQDVLRTNLGPKGTIKILVSGAGDIKLTKDT GVDDQPWPTCALLHEMQIQHPTASLIAKVATAQDDITGDGTTSNVLIIGELLKQADLYIS EAGVQWHLGSLQRLSPGFKQFPASASSVARITGMRHHARLILYLSQRRGFSTCEGKALHF LEEVKVSREMDKETLKDVARASLCTKVHAELADVLTEAVVGSILAIKRKDEPIDLFMILE 
VNSGVFYKSAEREKLI KAERKFIEDRVKTNNRTEKESLYPDCLRHAGLVYEYTLKCNNPR SVTLLIKGPNKPTLRSKMQLCGSGAGAVEVAMAEALNKYKLSVKGKAQLGVQAFADVLLV IPKQQKQALWDNDCVKKQLLHSCTGCHQHSLG


MKKS McKusick Kaufman syndrome protein, BBS6 Hs ENST00000347364 MKKS (real gene) 20 p12.2 4 10,333,898 10,342,1 62 MSRLEAKKPSLCKSEPLTTERVRTTLSVLKRIVTSCYGPSGRLKQLHNGFGGYVCTTSQS SALLSHLLVTHPILKILTASIQNHVSSFSDCGLFTAILCCNLIENVQRLGLTPTTVIRLN KHLLSLCISYLKSETCGCRIPVDFSSTQILLCLVRSILTSKPACMLTRKETEHVSALILR AFLLTIPENAEGHIILGKSLIVPLKGQRVIDSTVLPGILIEMSEVQLMRLLPIKKSTALK VALFCTTL SGDTSDTGEGTVVVSYGVSLENAVLDQLLNLGRQLISDHVDLVLCQKVIHPS LKQFLNMHRIIAIDRIGVTLMEPLTKMTGTQPIGSLGSICPNSYGSVKDVCTAKFGSKHF FHLIPNEATICSLLLCNRNDTAWDELKLTCQTALHVLQLTLKEPWALLGGGCTETHLAAY IRHKTHNDPESILKDDECTQTELQLIAEAFCSALESVVGSLEHDGGEILTDMKYGHLWSV QADSPCVANWPDLLSQCGCG LYNSQEELNWSFLRSTRRPFVPQSCLPHEAVGSASNLTLD CLTAKLSGLQVAVETANLILDLSYVIEDKN BBS10 Bardet Biedl syndrome 10 Hs NP_078961 NM_024685.3 ENST00000313898 BBS10 (real gene) 12 q21.2 2 75,263,727 75,266,269 MLSSMAAAGSVKAALQVAEVLEAIVSCCVGPEGRQVLCTKPTGEVLLSRNGGRLLEA LHL EHPIARMIVDCVSSHLKKTGDGAKTFIIFLCHLLRGLHAITDREKDPLMCENIQTHGRHW KNCSRWKFISQALLTFQTQILDGIMDQYLSRHFLSIFSSAKERTLCRSSLELLLEAYFCG RVGRNNHKFISQLMCDYFFKCMTCKSGIGVFELVDDHFVELNVGVTGLPVSDSRIIAGLV LQKDFSVYRPADGDMRMVIVTETIQPLFSTSGSEFILNSEAQFQTSQFWIMEKTKAIMKH LHSQNVKL LISSVKQPDLVSYYAGVNGISVVECLSSEEVSLIRRIIGLSPFVPPQAFSQC EIPNTALVKFCKPLILRSKRYVHLGLISTCAFIPHSIVLCGPVHGLIEQHEDALHGALKM LRQLFKDLDLNYMTQTNDQNGTSSLFIYKNSGESYQAPDPGNGSIQRPYQDTVAENKDAL EKTQTYLKVHSNLVIPDVELETYIPYSTPTLTPTDTFQTVETLTCLSLERNRLTDYYEPL LKNNSTAYSTRGNRIEISYE NLQVTNITRKGSMLPVSCKLPNMGTSQSYLSSSMPAGCVL PVGGNFEILLHYYLLNYAKKCHQSEETMVSMIIANALLGIPKVLYKSKTGKYSFPHTYIR AVHALQTNQPLVSSQTGLESVMGKYQLLTSVLQCLTKILTIDMVITVKRHPQKVHNQDSE DEL BBS12 C4orf24, FLJ35630, FLJ41559 Hs NP_689831 NM_152618.2 ENSG00000181004 BBS12 (real g ene) 4 q27 + 1 123,882,498 123,884,627 MVMACRVVNKRRHMGLQQLSSFAETGRTFLGPLKSSKFIIDEECHESVLISSTVRLLESL DLTSAVGQLLNEAVQAQNNTYRTGISTLLFLVGAWSSAVEECLHLGVPISIIVSVMSEGL NFCSEEVVSLHVPVHNIFDCMDSTKTFSQLETFSVSLCPFLQVPSDTDLIEELHGLKDVA SQTLTISNLSGRPLKSYELFKPQTKVEADNNTS RTLKNSLLADTCCRQSILIHSRHFNRT DNTEGVSKPDGFQEHVTATHKTYRCNDLVELAVGLSHGDHSSMKLVEEAVQLQYQNACVQ 
QGNCTKPFMFDISRIFTCCLPGLPETSSCVCPGYITVVSVSNNPVIKELQNQPVRIVLIE GDLTENYRHLGFNKSANIKTVLDSMRLQEDSSEELWANHVLQVLIQFKVNLVLVQGNVSE RLIEKCINSKRLVIGSVNGSVMQAFAEAAGAVQVAYITQVNEDCV GDGVCVTFWRSSPLD VVDRNNRIAILLKTEGINLVTAVLTNPVTAQMQIKEDRFWTCAYRLYYALKEEKVFLGGG AVEFLCLSCLHILAEQSLKKENHACSGWLHNTSSWLASSLAIYRPTVLKFLANGWQKYLS TLLYNTANYSSEFEASTYIQHHLQNATDSGSPSSYILNEYSKLNSRIFNSDISNKLEQIP RVYDVVTPKIEAWRRALDLVLLVLQTDSEIITGHGHTQINSQELTGFLFL CCT5 2P Hs CCTepsilon pseudogene 2 vert 13 q31.1 1 78382866 78382967 VTSAEDYKALQKYEKEKF*EMIQQIKETGANLAI Mkks Mm EDL28392 CH466519.1 MKKS (real gene) 2 qF3 4 136700005 136706971 Mus musculus (house mouse), repredicted in fgeneplus. MSRLEAKKPSLCKTEPLT SEKVRSTLSVLKGVIASCYGPSGRLKQLHNGLGGCVYTTSQS SALLRNLSVTHPVLKILTSSVQNHVSCFSDCGLFTAILCCNLIENIQRLDLTPATAIKLN KYLLSLCTSYLKSEACSCRIPVDFRSTHTFLSLVHSILTSKPACMLTRKETDHIGALILK AFLLTIPESTEERMVLGKSIIVPLKGQRVTDSTVLPGLLIEASEVQLRRLLPTQKASGLR VALFCTSLSGDFSNAGEGVVVAHYQVSLEN AVLEQLLNLGRRLVTDHVDLVLCQKVIHPS LKQFFSERHVMAIDRVGVTLMESLSKVTGATPIGSLNPIVSTTYGSVKDVCSARFGSKHF FHLLPNEATVCTLLLCSRNDTAWEELKLTCQTAMHVLQLTIKEPWVLLGGGCTETHLAAY VRHKVHHEAEAIVRDDGCTQAKLHVAAEAFCSALESVAGSLEHDGGEILIDTKYGHLWSC QADSASVGNWSDTLSRCGCGLYNSQEELSWSVLRSTYHPFAP QTCLPQAALGSASNLTVD CFTAKLSGLQVAVETANLILDLSYVIEDKN Bbs10 Mm XP_125817 BBS10 (real gene) 10 qD1 + 4 110735779 110738219 MASQGSVTAALRVAEVLESIANRCVGPEGGQVLCTKPTGEVLLSRDGGCLLEALHLEHPLARMIVACVSS HLKKTGDGAKTFIIFLCHLLRGLHAIGEKGKDSFTSENIQSHERHWKNCCQWKSISQALQTF QTQTLGCI VDRSLSRHYLSVFSSSTEGRKLCRHSLELLLEAYFCGRVGRNNHRFISQLMCDYVFKCMACESGVEVFEL LDHCFAELNVGVTGLPVSDSRIIDGLVLPRDFSMYCPADGDIRMVIVTEILQPQFSSAGSEFVLNSETQF QASQCWITDRTKTVMNHLRGQNVKLLLTSVKQPDLVIYCARLNSISVVECLSAEEVSLVQRITGLSPCVL PEVASQCEISDSTLVKFCKPLILRSKRYVHLGLI STCAFIPHSMVLCGPVLGLVEQHERAFHGAFKMLRQ LFTDLDLNYIIQTKQQCNPSPLAYDNSRERNHSPETDKYQDIVAKSKNKLETQTHLEVYSGLGASDTELR AGKPWSAHKKTPIAPSQTDEMLKCLPPERSGIIDNCDLSIENHSTGNPTAEDTGTEISFEHLQVSDNAGK GYTLPVMRKSLDTCTCQGYCSSTVPAGCVLPVGGSFEILMSYYLLSYAKQCRQSDETVISMLIADALLGI PKILYK 
PKKGKDSFPHIYMRSLHALQASQPMVSGQSGFESVAGKYQLLTSVLQCLMKILTIDLIINIKRQ PQKTADQESEDEF Gm443 Mm NP_941023 XP_144240 NM_198621.2 CCTtheta_L1/L2 (real gene) 5 qA3 + 1 25022107 25023792 EST available MAVSPPSRATQTKVQSDLELPQRLKPGLEKTPESQGEEPSYILRATAAAQTLASIIRSCYGPFG RQKFLV TAKGETVCTGHAAAILKALDLEHPAAQFVQELAQTQVENAGDGTVFVVLLTEALLEQAHYLLWAGLTPTQ LREAFATATAEVLTALPSLAIRSLGPLEDPSWALYSVISTHTLSNSDYLTKLVAQACWVSREPNGSFKPE SIVVCVLQGGKLTDSRIFPGVAIAGKLCGQKTEVLGDARVALFNCPFGPTNPFTLATPRLSNPEELLRFR KQTEQVEKEIAQLAIMDINVAVVLGEVNEKSVDQAN YCGIMVIQAKSRKEIVYLSEKLGTPLLGRVLPPL EPGKCHKVYRKEFGDTAVVMFEWEHEIAPFLSVVLRGPTIQGLRVAEQAVYYGIDAFSQLCQDPRLLPGA GATEMALAKMLVDKGSRLSGPNGLAFQAFAQALSSLPKTLAENAGLAAQSVMAELSGFHQAGNFFVGVGT DGLVNVTHEGIWDILRTKAQGLQAVAELVQQLVTVDQIIVARKTPLYRQITDPTLNAKVSSPLRAKFFGK YV Bbs1 2 Mm NP_001008502 NM_001008502.2 BBS12 (real gene) 3 qB + 1 37217982 37220105 MEMACRVINRRRHVGLQQLLSFAQTGRSFLGPVKATKFITDAECHESVLISSTVRLLEGL DLTCAVGHLLNEAVQAQNNTYKIGTSTLLFLVGAWSRAVEDCLHLGIPTTVIVSVMSEGL NSCIEAVVSLQVPIHNVFDHMDNTSTVYKLETVNATLCPFLQDPSGSGLLQEK RDFKDAT SPLLSTYSLSGRHAESPKFFKPQNNLETEKNTLQVLKNNLYTDSFCKKSALAHSRHFNRT DNSHWISRHDGFLEQLESTPKVLRCNDFGELAVGLSHGDHSSMALAKAAVRLQWQSLCLQ QANWMAPFMFDISRLLTCCIPGLPETFSRVGLGYVTFVTMSSITLIKELQDQPFRVILIE GDLTESYRHLGFNKSVNIKTKLDSGELSEDSAEELWTNHVLQVLIQFNVTLILVQGSVSE HLTE KCMHSKRLVIGAVNGSVLQAFAEATRAVPVAYVTQVNEDCVGSGVSVTFWMSPHDI NRSNRIAILLTAEGINLITAVLTSPASAQMETKEDRFWSCVYRLYHALKEEKVFLGGGAV EFLCLSHLQILAEQSLNRGNHACLGWLPDSSSWMASSLSVYRPTVLKSLAGGWHEFLSAI MCNTATHPSAVEARTFIQQHVQNAIDSGSPSSYILSEYSKLSSGVFHSGISDNLELVPRV YDTVTPKIEAWRRALD VVLLVLQTDSEIITGLVHTEMNSQELDGVLFL Cct6a Mm NP_033968 CCTzeta (real gene) 5 F + 13 130293315 130321693 MAAVKTLNPKAEVARAQAALAVNISAARGLQDVLRTNLGPKGTMKMLVSGAGDIKLTKDGNVLLHEMQIQ HPTASLIAKVATAQDDITGDGTTSNVLIIGELLKQADLYISEGLHPRIITEGFEAAKEKALQFLEQVKVS KEMDRETL IDVARTSLRTKVHAELADVLTEAVVDSILAIRKKDEPIDLFMVEIMEMKHKSETDTSLIRGL VLDHGARHPDMKKRVENAYILTCNVSLEYEKTEVNSGFFYKSAEEREKLVKAERKFIEDRVKKIIELKKK VCGDSDKGFVVINQKGIDPFSLDALAKEGIVALRRAKRRNMERLTLACGGIALNSFDDLNPDCLGHAGLV 
YEYTLGEEKFTFIEKCNNPRSVTLLVKGPNKHTLTQIKDAIRDGLRAVKNA IDDGCVVPGAGAVEVALAE ALIKYKPSVKGRAQLGVQAFADALLIIPKVLAQNSGFDLQETLVKVQAEHSESGQLVGVDLSTGEPMVAA EMGVWDNYCVKKQLLHSCTVIATNILLVDEIMRAGMSSLKG


Cct6b Mm Q61390 Q61390 CCTzeta (real gene) 11 B5 14 82532867 82577729 MAAIKIANPGAEVTRSQAALAVNICAARGLQDVLRPTLGPKGALKML VSGAGDIKLTKDGNVLLHEMQIQ HPTASIIAKVAAAQDHVTGDGTTSNVLIIGELLKQADLYISEGLHPRIITEGFDVAKTKALEVLDEIKVQ KEMKREILLDVARTSLQTKVHAELADILTEAVVDSVLAIRRPGVPIDLFMVEIVEMRHKSETDTQLIRGL VLDHGARHPRMRKQVRDAYILTCNVSLEYEKTEVSSGFFYKTVEEKEKLVKAERKFIEDRVQKIIDLKQK VCAESNKGFVVINQKGIDP VSLEMLAKHNIVALRRAKRRNLERLTLACGGLAVNSFEGLSEECLGHAGLV FEYALGEEKFTFIEDCVNPLSVTLLVKGPNKHTLIQIKDALRDGLRAVKNAIEDGCVVPGAGAVEVAIAE ALVNYKHRVQGRVRLGIQAFADALLIIPKVLAQNSGYDLQETLIKIQTKHAESKELLGIDLNTGEPMAAA EAGIWDNYCVKKHLLHSCTVIATNILLVDEIMRAGMSSLRD Cct1 c cpn, CCT, Cc t1, Ccta, p63, Tcp 1, Tp63, TRic Mm P11984 ENSMUST00000089024 CCTalpha (real gene) vert 17 A2 13,109,331 13,117,933 MEGPLSVFGDRSTGEAIRSQNVMAAASIANIVKSSLGPVGLDKMLVDDIGDVTITNDGAT ILKLLEVEHPAAKVLCELADLQDKEVGDGTTSVVIIAAELLKNADELVKQKIHPTSVISG YRLACKEAVRYINE NLIINADELGRDCLTNTAKTSMSSKIIGINGDYFANMVVDAVLAVK YTDARGQPRYPINSVNILKAHGRSQIESMLINGYALNCVVGSQGMPKRIVNAKIACLDFS LQKTKMKLGVQVVITDPEKLDQIRQRESDITKERIQKILATGANVILTTGGIDDMCLKYF VEAGAMAVRRVLKRDLKCVAKASGATILSTLANLEGEETFEVTMLGQAEEVVQERICDDE LILIKNTKARTSASIILRGANDFMCD EMERSLHDALCVVKRVLESKSVVPGGGAVEAALS IYLENYATSMGSREQLAIAEFARSLLVIPNTLAVNAAQDSTDLVAKLRAFHNEAQVNPER KNLKWIGLDLVHGKPRDNKQAGVFEPTIVKVKSLKFATEAAITILRIDDLIKLHPECKDD KHGSYENAVHSGALDD Cct8 Mm NP_033970 CCTtheta (real gene) vert 16 qC3.3 15 87484236 87496033 M ALHVPKAPGFAQMLKDGAKHFSGLEEAVYRNIQACKELAQTTRTAYGPNGMNKMVINRLEKLFVTNDAA TILRELEVQHPAAKMIVMASHMQEQEVGDGTNFVLVFAGALLELAEELLRIGLSVSEVISGYEIACKKAH EILPELVCCSAKNLRDVDEVSSLLRTSIMSKQYGSETFLAKLIAQACVSIFPDSGNFNVDNIRVCKILGS GIYSSSVLHGMVFKKETEGDVTSVKDAKIAVYSCPFDGMITETK GTVLIKTAEELMNFSKGEENLMDAQV KAIAGTGANVIVTGGKVADIALHYANKYNIMLVRLNSKWDLRRLCKTVGATALPKLTPPVQEEMGHCDSV YLSEVGDTQVVVFKHEKEDGAISTIVLRGSTDNLMDDIERAVDDGVNTFKVLTRDKRLVPGGGATEIELA KQITSYGETCPGLEQYAIKKFAEAFEAIPRALAENSGVKANEVISKLYSVHQEGNKNVGLDIEAEVPAVK DMLEASILDTYLGKYW AIKLATNAAVTVLRVDQIIMAKPAGGPKPPSGKKDWDDDQND Cct1P Mm CCTalpha pseudogene 4 qC1 3 67963157 
67964324 Not identified at pseudogene.org or, at ENSEMBL MEDPLSLFGDCSSGEVICSHNVMAAASIVNIGKTSLGSVGLDKILVDDIGDITITNYDET ILMLIIIAELLKNSDELVKQKINPTSVIRGYCRACKEAVCYI DENLIINTDKLGRDCLTN AAKTSMSSKIIGINDDFFANMVVDTVLVAKYTDVRGQPLYPVNSVNVRKACGRSQIESML INAYALSCVVGYPGMPKLITLRNWTKLDRIRYHQGENSEDPETGVHVILTTSGIDDMFLK YFVESGAMSVRRVLKRDLTHMAKASRASILSLLANLEDEETFETTMLGQVKEVVQERICD DELILIKNTKAHTCASVISCVLKQSTLYMVIFVW Cct71P Mm CCTe ta pseudogene 1 vert X A1.3 + 5 12863162 12881543 Not identified at pseudogene.org or, at ENSEMBL FYDKDSSGWSSSKDKDAYSSFGSRGDSRGKSSFFGDRGSGSRGRRDNLWIRTQALVSNIS ACQVVAEAVRTTLGHCGMDKLIMDGRGKATISNDGATILKLLEIGDLSLKSSFSLKLPHP ILPPRQDRGVHLTTPYTKWGLPLDVCDTRAKWQGTV SLSARYLLNQLSGGWLYLVPHVWC HLTRPAGLVIRGSRGDLLLRTKWGGGRERRTPSGRNNTETSLISIKVSLAKTCTIILCSD TEQFMEETERSLHDAIMIVRRAIKNDSVVAGGGAIEMELSKYLWDYSRTIPGKQQLLNGA YAKALEIIPRQLCDNAGGMWYGVDINNEHIAGNFQAFVWEPAMVRINALTAASEAACLIV SMDKSIKNSHSTVDPSAPTAGCGRGQAHFH Cct7 Mm P80313 P80313.1 CCTeta (real gene) vert 6 D1 + 11 85409109 85418268 MMPTPVILLKEGTDSSQGIPQLVSNISACQVIAEAVRTTLGPRGMDKLIVDGRGKATISN DGATILKLLDVVHPAAKTLVDIAKSQDAEVGDGTTSVTLLAAEFLKQVKPYVEEGLHPQI IIRAFRTATQLAVNKIKEIAVTVKKQDKVEQRKMLEKCAMTALSSKLISQQKVFFAKMVV DAVMMLDELL QLKMIGIKKVQGGALEESQLVAGVAFKKTFSYAGFEMQPKKYKNPKIALL NVELELKAEKDNAEIRVHTVEDYQAIVDAEWNILYDKLEKIHQSGAKVILSKLPIGDVAT QYFADRDMFCAGRVPEEDLKRTMMACGGSIQTSVNALVPDVLGHCQVFEETQIGGERYNF FTGCPKAKTCTIILRGGAEQFMEETERSLHDAIMIVRRAIKNDSVVAGGGAIEMELSKYL RDYSRTIPGKQQLLIGAYAKAL EIIPRQLCDNAGFDATNILNKLRARHAQGGMWYGVDIN NENIADNFQAFVWEPAMVRINALTAASEAACLIVSVDETIKNPRSTVDPPAPSAGRGRGQ ARFH Cct12P Mm CCTalpha pseudogene 17 qA1 + 2 13129036 13136541 Not identified at pseudogene.org or, at ENSEMBL LSRIGLDLINGKPRDRHAGAFEPTIVKVKSLKRHRKVDL REFKVNLIYKTIL RLYGETLTQKQRKTKKTKNNDQRKLHPESKDDKHGSYENAVHSGALDD Cct72P Mm CCTeta pseudogene 2 18 E4 + 5 87401390 87403113 Not identified at pseudogene.org or, at ENSEMBL MSDIRACQVIAEKERITLGRHGMEKLTVDGPGIATNSNDGATVLKLLDVVHLATKTFVDI SKSQDAEVGDGNTSMTLLVVE FLKQLTVNRIREIAVTVKKQNKIGQKRMLEKCAMTTLSS 
KLISRQKAFFAKMAFDAVAMFDALLQPKYPGVNPSIASIALLNVELELKPEKDNAEIRVH TVEGYQAIDAKVILSKLLIGDVAIQYFADRNKFCAVGVPEEDLKRMMKACGGSIQIIVDA LIPQVLGCCLVFEEIQIEGERYNFFTTGPKDKTRTIIFCGGAEQFMEETKRVDINNEDIA GNFQVFVREPAMVHINALTTAFEAACLIVSMDE TI Cct3 Mm ENSMUSP00000001452 CCTgamma (real gene) vert 3 qF1 + 13 88103257 88125467 MMGHRPVLVLSQNTKRESGRKVQSGNINAAKTIADIIRTCLGPKSMMKMLLDPMGGIVMT NDGNAILREIQVQHPAAKSMIEISRTQDEEVGDGTTSVIILAGEMLSVAEHFLEQQMHPT VVISAYRMALDDMISTLKKISTPVDVNNREMMLSIINSSITTKV ISRWSSLACNIALDAV KTVQFEENGRKEIDIKKYARVEKIPGGIIEDSCVLRGVMINKDVTHPRMRRYIKNPRIVL LDSSLEYKKGESQTDIEITREEDFTRILQMEEEYIHQLCEDIIQLKPDVVITEKGISDLA QHYLMRANVTAIRRVRKTDNNRIARACGARIVSRPEELREDDVGTGAGLLEIKKIGDEYF TFITDCKDPKACTILLRGASKEILSEVERNLQDAMQVCRNVLLDPQLVPGGGASEM AVAH ALTEKSKAMTGVEQWPYRAVAQALEVIPRTLIQNCGASTIRLLTSLRAKHTQESCETWGV NGETGTLVDMKELGIWEPLAVKLQTYKTAVETAVLLLRIDDIVSGHKKKGDDQNRQTGAP DAGQE Cct31P Mm TCP1 pseudogene 11 qC 5 90721084 90722575 Not identified at pseudogene.org or, at ENSEMBL MLLDPMGGIVMTNDGN AILGEIQVQHPAVKSMIEISRTQDEEHFLEQQMHPTVVISAYRM TLGDMINTLKKISTPVDVNNYEMMLNIINSSITTNKKNDIKKYSRVEKIPGGITEDSCIL HGVIINKDVTHPVMSCYIKNPQIVLLNSSLEYKKVSQTDIEITRKEDFTRILQMKDECIQ QLCEDIIQHRILFCDDLVITEKGISDLAQHYLIWVNVTATHRVWKTDNNHFARACSKACT ILLRGASKEILSEVEHNLQDATQVCRNV QLDPQLVPGCGASEIAVAHDLTEKSKAMTGVE QWPYRAVAQALEAMPRTWIQNCGASTICLLTSLRAKHTQEKQGIWEPLAVELQTYKTAVE TEVLHLWIDDIICGHKKKGDDQNRQTRAPDADQE Cct4 Mm NP_033967 NM_009837.1 CCTdelta (real gene) vert 11 qA3.3 + 13 22890754 22902933 MPENVASRSGAPTAGPGSRGKSAYQDRDKPAQIRF SNISAAKAVADAIRTSLGPKGMDKM IQDGKGDVTITNDGATILKQMQVLHPAARMLVELSKAQDIEAGDGTTSVVIIAGSLLDSC TKLLQKGIHPTIISESFQKALEKGLEILTDMSRPVQLSDRETLLNSATTSLNSKVVSQYS SLLSPMSVNAVMKVIDPATATSVDLRDIKIVKKLGGTIDDCELVEGLVLTQKVANSGITR VEKAKIGLIQFCLSAPKTDMDNQIVVSDYAQMDRVLREERAYILNLV KQIKKTGCNVLLI QKSILRDALSDLALHFLNKMKIMVVKDVEREDIEFICKTIGTKPVAHIDQFTADMLGSAE LAEEVSLNGSGKLFKITGCTSPGKTVTIVVRGSNKLVIEEAERSIHDALCVIRCLVKKRA LIAGGGAPEIELALRLTEYSRTLSGMESYCVRAFADAMEVIPSTLAENAGLNPISTVTEL RNRHAQGEKTTGINVRKGGISNILEEMVVQPLLVSVSALTLATETVRSILKIDDVVNTR


Cct2 Mm NP_031662 NM_007636.2 CCTbeta (real gene) 10 qD2 14 116490071 116500106 MASLSLAPVNIFKAGADEERAETARLSSFIGAIAIGDLVKSTLGPKGMDKILLSSGRDAA LMVTNDGATILKNIGVDNPAAKVLVDMSRVQDDEVGDGTTSVTVLAAELLREAESLIAKK IHPQTIISGWREATKAAREALLSSAVDHGSDEARFWQDLMNIAGTTL SSKLLTHHKDHFT KLAVEAVLRLKGSGNLEAIHVIKKLGGSLADSYLDEGFLLDKKIGVNQPKRIENAKILIA NTGMDTDKIKIFGSRVRVDSTAKVAEIEHAEKEKMKEKVERILKHGINCFINRQLIYNYP EQLFGAAGVMAIEHADFAGVERLALVTGGEIASTFDHPELVKLGSCKLIEEVMIGEDKLI HFSGVALGEACTIVLRGATQQILDEAERSLHDALCVLAQTVKDPRTVYGGGCSEMLMAH A VTQLANRTPGKEAVAMESFAKALRMLPTIIADNAGYDSADLVAQLRAAHSEGHITAGLDM KEGTIGDMAVLGITESFQVKRQVLLSAAEAAEVILRVDNIIKAAPRKRVPDHHPC Cct5 Mm NP_031663 NM_007637.2 ENSMUST00000022842 CCTepsilon (real gene) vert 15 qB3.2 11 31520686 31531460 MASVGTLAFDEYGRPFLIIKDQDRK SRLMGLEALKSHIMAAKAVANTMRTSLGPNGLDKM MVDKDGDVTITNDGATILSMMDVDHQIAKLMVELSKSQDDEIGDGTTGVVVLAGALLEEA EQLLDRGIHPIRIADGYEQAARIAIQHLDKISDKVLVDINNPEPLIQTAKTTLGSKVINS CHRQMAEIAVNAVLTVADMERRDVDFELIKVEGKVGGRLEDTKLIKGVIVDKDFSHPQMP KKVVDAKIAILTCPFEPPKPKTKHKLDVMSVEDYKAL QKYEKEKFEEMIKQIKETGANLA ICQWGFDDEANHLLLQNGLPAVRWVGGPEIELIAIATGGRIVPRFSELTSEKLGFAGVVQ EISFGTTKDKMLVIEKCKNSRAVTIFIRGGNKMIIEEAKRSLHDALCVIRNLIRDNRVVY GGGAAEISCALAVSQEADKCPTLEQYAMRAFADALEVIPMALSENSGMNPIQTMTEVRAR QVKESNPALGIDCLHKGSNDMQYQHVIETLIGKKQQISLATQMVRMILK IDDIRKPGESE E Cct32P Mm CCTgamma pseudogene vert 4 B3 + 3 52913471 52915552 Not identified at pseudogene.org or, at ENSEMBL MMGHHLVLVLSLNTKRESGRKVQSGNINAAKTIADIIRTCLGPKSMMKMLWDPMGDIQMH PTVVINAYRMALDDMISTLEKISTPVDVNNHEMMLNIINSSITTKVISRWSSLACTIALD VVKTV QFEKKNRKKIDTKKYTRYPPELVSLAAYVAEDGLVSHRWEERPRGIANFICLSRG ERQVQVGVEEYIQQLCEDIIQLKPDVVITEKGISDLAQHYLMRANVTAICRVWKTDNNHI ARACRARIVDQSKELREDDVGTRAGLLEIKKIGDEYFTFITYYKDPKACTSLLRGASKEI LSEVERNLQDVMQVCHNGLLDPQLVPGGGASEMAVADALREKSKAMTGVEQWPYRAVAQA VEVIPQTLIQNCGTSTI HLLSSLRAEHTQESCKTWGAKDETGTLMDMKELGIWETLAVKL QTYKTAVETEVLHLWIDDIASGHKKKGDDQNWQTSAPDAGQE Cct41P Mm CCTdelta pseudogene 1 7 qA3 3 26227193 26228371 Not identified at pseudogene.org 
RGVCDVASMWRSESNLRESVLFCQDSCTKLVQKGRHPTIISESFQKASEKDLEILPDMSQ PVPLSDRDT LLNSASTSLKSKVVSQYSSLLSPKVANSGTTGVEKAKIGHVPFCSSAPKTD MDNQIVVSDYAQMDQVLREERAYILNLVKHVKKTGCDVLLIQRSILRDALSDLALHFLIT GEEAIRSTPTGLNPISTVTELRNRHAQGENARCSNVQKAVLINILEEMVIQPLLVSVSAL TLATEIAEHPEN Cct33P Mm CCTgamma pseudogene vert 6 qE3 1 113303406 113303687 Not identified at pseudogene.org MVLVQGQAYPIEIRDECTTFIPEHRDPEACTTLLRGAIEERLLEEECNLAGCHASVSRIL LDPQLVLVVEPQRWLCPCLDRKIQGHDWYGTMAI Cct6A1P Mm TCP1 zeta pseudogene 1 8 qB1.3 + 3 52361244 52362111 Not identified at pseudogene.org MECADVLIEAVVNSLEAFKVSD KPIVIFIFGIMHVKHKYKTYMRLITRFIFVSLECEKTE VGSEFYYKREADKEKLLHGVETIEVTIKEFMIKYKSNLKARVQLRVLAFADVLHIIHKVL AQGSGFDILNMLRSKLNTQNQISLGKKGLREMTVKQDKTRHKKSR Cct6A2P Mm TCP1 zeta pseudogene 2 14 qE2.1 2 97306133 97306902 Not identified at pseudogene.org LHVFVL DHAAWYPDTEEGVEHSYICTCNVSLEYEQTDASPGVQSFVHAWLSNLKVRAQKF SFNLQTLKFKVNIHNQINL Cct6A3P Mm TCP1 zeta pseudogene 3 18 qE3 + 1 79544122 79545468 Not identified at pseudogene.org GQLVCVDLNLGEPRIAAEICIWSNSSLKKQLLHPCTVIASNTLKRAGILSL Hspd1 Mm P63038 P63038.1 1 qC1.2 11 55135135 55143783 MLRLPTVLRQMRPVSRALAPHLTRAYAKDVKFGADARALMLQGVDLLADAVAVTMGPKGRTVIIEQSWGS PKVTKDGVTVAKSIDLKDKYKNIGAKLVQDVANNTNEEAGDGTTTATVLARSIAKEGFEKISKGANPVEI RRGVMLAVDAVIAELKKQSKPVTTPEEIAQVATISANGDKDIGNIISDAMKKVGRKGVITVKDGKTLNDE LEIIEGMKF DRGYISPYFINTSKGQKCEFQDAYVLLSEKKISSVQSIVPALEIANAHRKPLVIIAEDVDG EALSTLVLNRLKVGLQVVAVKAPGFGDNRKNQLKDMAIATGGAVFGEEGLNLNLEDVQAHDLGKVGEVIV TKDDAMLLKGKGDKAHIEKRIQEITEQLDITTSEYEKEKLNERLAKLSDGVAVLKVGGTSDVEVNEKKDR VTDALNATRAAVEEGIVLGGGCALLRCIPALDSLKPANEDQKIGIEIIKRAL KIPAMTIAKNAGVEGSLI VEKILQSSSEVGYDAMLGDFVNMVEKGIIDPTKVVRTALLDAAGVASLLTTAEAVVTEIPKEEKDPGMGA MGGMGGGMGGGMF Hspd1P Mm ENSMUSG00000058809 GROEL pseudogene 1 11 qA5 + 1 41312239 41313954 Identified as pseudogene at ENSEMBL, not at pseudogene.org MLRLPTVLRQMR PVSRALAPHLTRAYAKDVKFGADARALMLQAVDLLADAVAVTMGPKGR TVIIEQSWGSPKVTKDGVTVAKSIDLKDKYKNIGAKLVQDVANNTNEEAGDGTTTSTVLA RSIAKEGFEKISKGANPVEIRRGVMLAVDAVIAELKKQSKPVTTPEEIAQVATISANGDK 
DIGNIISDAMKKVGRKGVITVKDGKTLNDELEIIEGMKFDRGYISPYFINTSKGQKCEFQ DAYVLLSEKKISSVQSIVPTLEIA NAHRKPLVIIAEDVDGEALSTMVLNRLKVGLQVVAV KAPGFGDNRKNQLKDMAIATGGAVFGEEGLNLNLEDVQAHDLGKVGEVIVTKDDAMLLKG KGDKAHIEKRIQEITEQLDITTSEYEKEKLNERLAKLSDGVAVLKVGGTSDVEVNEKKDR VTDALNATRAAVEEGIVLGGGCALLRCIPALDSLKPANEDQKIGIEIIKRALKIPAMTIA KNAGVEGSLIVEKILQSSSEVGYDARLGDFVNMVEK GIIDPTKVVRTALLDAAGVASLLT TAEAVVTEIPKEEKDPGMGAMGGMGG Hspd2P Mm GROEL pseudogene 2 4 qC7 1 105808716 105810383 Not identified at pseudogene.org MRPVSPELALHLTQAYAKDGKFDVDAGALMLQGVDLLADAVAVTMGPKGRAVIIEQNWGS PKVKNGVTVTKSIDLKDKYKNIRAKLVQDVANNTNEEAGDGTTAT VLACSIANEGFEKIS KGANSVEIQREIAQVATISANGGKEIGDINTDAKTKVGRKGVITGKDGKSLNDELEIIEG MKLDREYISLYFINTSKGQKCEFQGAYVLLSEKKISSDWFTVPALEIANAHQKPLVIITE DVDGEALSTLVLNRPKVGPQVVAVKAPGFGDNRKSQLKDMAIATGGAVFEEEGLKLNLKD AQAHDLRKVVEVIATKDDAMLLKGKGDKAHIEKRIQEITEQLDITTSEYEKEKLKER LAK LSDGVAVLKVGGTSDVEVNEEKDRVTDVLNATRAAVEEGIVLGGGCALRWCIPALDSLKP ANEDQKIGIEVIKRALKIPAMTIVKNADAEGSLIVEQILQSSSEVGYEAMLGDFVNTVEK GIIDLRKV Hspd3P Mm GROEL pseudogene 3 14 qE2.3 + 5 101742761 101744429 Not identified at pseudogene.org MRPVSWALTPLLTQAYAKDV KFGADAQVLMLQGVDLLASTLAVKMGPTGKTVTIKQSWGH PKEKKRWEIAQVASISANGDKDIGNIISDAMKKAGRKGAITVVAIKAPGFGDNRKNQLKD MAIATGDLGKVVEVIVTKDDAMLLKGKADKAHTEKRTQKITEHLDITTGDYEKEKLDERL AKVSDGVAVLKVGGTSDVEVTEKKELQTVSLKATRAAVEEGIVLGRGCALLRCIPALDSL NPANEDQKIGVGGSLIVEKILQSSSEVGCGTM LGDFVNMVENGIIDPTKV Hspd4P Mm GROEL pseudogene 4 8 qB3.1 3 64965611 64967286 Not identified at pseudogene.org MGPKRRTLIIEQSQGSPKVTKGEVTVTKPIDLKDKCKTIRSKLVQGVANNTNEEAGDGTT TVSVLARSIANDFEKISKGKNPLEIRRGVMLAVSAVIAQLKKRSKPVTTPEEITQIATIS ANRGKDFGNIISDAMRK EALVIIAEDGDRETLRTLVLNRLTVGLQVVAVKAPGFGDNRKN QLKNMAIASGGVVFGEKGLNMNLEDVQAHDLGKVLEVIVTKNGAMLLKGKANNDKKIGIV IIKGALKIPAMMIVKNADVEGPLIVEKILQSSSAVSYDTMLGDFVNMVEKGIIDPTKVVR TALLDAAMVFSLLTT Hspd5P Mm GROEL pseudogene 5 8 qC2 + 6 85271516 85279593 Not identifi ed at pseudogene.org MRLMSLALAPLLTQAYAKDVKFGADAQALMLQGVDLLANVVAVTMGPKGKTVIIKQSWEI 
PKAKLVQDVANNTNEEAGYGTTTVSVLARSIAKEGFEKISKGANPLEIRRGGINCSGCYN FSNGNKDIGNIISDAMKKVKHVNSKMPVLLSEKKISSVQSIVPAFEIANAHQKPLVIITE DVDGEVSLPSPTSRKEAKKPMSFSMQRSSKCTGLEQAKRKVEEVIVTKDDA MLLKGKGDK AHIEKRIQKITEQLYITTSEDEKEKLNERLAKLSDG


Hspd6P Mm GROEL pseudogene 6 1 qC1.1 5 49455871 49457564 Not identified at pseudogene.org MRPVSPALAPHLTWAYAKDIKFVNFIKDKYKNIGHKLVQVVANNTNEEAGDGITTTTVLA WSIAKEDFEKSKSVNSKMPMFLLSEMKFSSVQSIVSALKIVNAHQMPLVI IAEDIDRVAL RSLILNKLKVCLQGLNQNLGDVHAHVIGKLGEVLFTKEDAMLLKGKGDKVHIEKHIQEIT EQLDITTALDSLKPTNEDQKIGIEIIKRALKIPAMMIAKDAGVKGSLTVDKILQNSSEVG YDAVLEDFVNMVEKGIIDLTKVIRTTLLDAAGVAFLLTTA Hspd7P Mm GROEL pseudogene 6 3 qE3 3 77889757 77891210 Not identified at ps eudogene.org MGSRGRTVIIEESWGSTKVTKDGITEAKSIALQDEYKNTGAKLVQDVANNTKEGAGDGTT YATVLKQSKPMAIPEEFAQVATISANGVKDTGNIISDAMKKVGRKWFILGKDGKTLNYEL ENIEGKQFDRGYISPYFVKEKEKLNKQVAKLSDGVTVLKVGGTSDVEVNEKKDRVTDILS ATGAAVEEGIVLGGGCALLRCVPALDSFKPSNEDQKIGIEIIKRALKIPPMAIAKNADD K GSVIVEQILQSSSEVGYDIKLGDFFTWCKRESLIQQKFKNCLNGCCWGGLLAHYI Hspd8P Mm GROEL pseudogene 7 4 qC4 5 89108911 89111283 Not identified at pseudogene.org LALPLTGSYAKDVKFGVDVRALTLQNVDLLADVLAVTMGPKGRTVIIGQSWGSLKVTKDG ACSTAKKAFERISKKANPVETQRGVMELVRFLPFLQMETSI GNIISDAMTKAGRKGVVTG KDGKKSLDDELEVTEMKLDRGYIYPYFINISKGYIVPALKTANAHWKPLVIIVEDDDGEA LSTMVLNMVKVGLQVVAVRAPGFGDNRKNQLKICGYCYWCVTRADVEESIVLREGCVLLQ CMPVLDSLKPANGNQKIGIEITKGALNIPAMTMQR Hspd9P Mm GROEL pseudogene 8 12 qA1.1 1 11672585 11673644 Not identifie d at pseudogene.org SNSLTILKPLTTPEEIAQVAIISANGNKDVGNIISDVMKKGGRKGVITVKDGKSLNDKL EIIKGMKFNRGYISPYFINTSKGQKCEFQNAYVLLSEKKFSSVQSIVPEIANAPRTPLVI IAEDGDGEVVSTLVLNRLKVGLQVIAVKAPRFRDNRKNQLKFMAVTTGGVVSGEEGLHLN LEDVQAHDIGKVGEVIVTKDDAMLLKGKGDKAHIEKCIQEIIEQVDITTGEYE KEKMNKQ LPKLSDRVAVLK Hspd10P Mm GROEL pseudogene 9 X qC3 + 2 93743990 93745672 Not identified at pseudogene.org DFANNTNKEAGDGTIIISVLAVSIAKDVSKVANPVEIQRCMMLSVDVVIAELKIQSKSVT TPEEIDHIATISENGNKDIGNIISDATKKIKRLLTVKAPGFGDNKKNQLPDMATGGAVFK EEALNLNLEDVKTHDL GKVGEVIVTKDDATLLKGKVDKVQIEKNGCKKSLGSYTSQLVII KRES Hspd11P Mm GROEL pseudogene 10 6 qA1 + 3 12515616 12517225 Not identified at pseudogene.org DVANNINEEAGNGITIAIVLAQSIAKNGSEKISKGANPVKIWRGVILAVDAVIVVLKKQF EPVATPEEISQVATSPANGDQVIENIILDAMNKVGRKGVITGKDIFP HILLTYQKSLVII 
AKDVDREALYTPVSNRLKVGIQVVAIKASGLVDNRKNQLTDMLLPLVPDNEDQKIGIEII KRILKIPAMTIAKNAGVEGPFIVEKILQSSSEVGYDAMLGDFVNVVENGIIDSTKV Hspd12P Mm GROEL pseudogene 11 1 qC1.1 + 2 45234912 45235851 Not identified at pseudogene.org MTTLEEIVQVAVISANGDKNIGNI ISDAMKKVGRKGVITVKDGKILHDELEIIEGMKFEI GCISTYFINTSKEALGDDPEDVDGETLSTLVLNRLIVGLQILAVKAPGFGDNKKNQLINM AIATGGVMFGEED Hspd13P Mm GROEL pseudogene 12 15 qA2 2 24133639 24135093 Not identified at pseudogene.org MKHDRGYISSHFIDTYIGQYCEFQDAYALMSEKKIPSGVALLKVK VNEKKDRVTGAFNAV RAVVEEGIILVRTRALLQCIPALDSLKPAIEDQKICIEIKEHSKFLHSSKVGCDPMLGDF VNMVEKGIINSTKIVRLL Hspd14P Mm GROEL pseudogene 13 2 qC3 + 1 79001216 79001752 Not identified at pseudogene.org MRPVSRALALHLTWVYTKDVKFGVDARALMLQCVDLLADAVAVTMGPKGRTVIAEQSWQ V PKMLPIAQKKEAEDGITTATVLAHSMAKEGFKGFNPVEIQK Hspd15P Mm GROEL pseudogene 14 9 qA1 + 1 12819585 12820103 Not identified at pseudogene.org GVAVLKVGGRSDVEVTEKKDRVTDAFNAIRAAVEEGIVLGRGCSLLWCISALDSLKPAN EVQKIGIEIIKRALKIPAMMIAKNAGIEGSLIEKIPQSSSKVSYGAMLGDFVHM VEKKII DPTKA Hspd16P Mm GROEL pseudogene 15 3 qE1 1 61872231 61873436 Not identified at pseudogene.org MGFGEEGWNLNPEDVQTHVLGKVGEVIVTKDDDKLLKEKCDKIQFEKCIQEMTKQLEITI SEYEKEKLNEQLAKSLSGVVVLKVQGT Hspd17P Mm GROEL pseudogene 16 1 qA1 + 2 5053347 50 54702 Not identified at pseudogene.org DKLVQGVAINTNEEACDGTTIATVLSWSIANQGFEKISKVSKTVKIPEENIQVAMISANG DKEFGNIVCDAMKMAGRMIIMTVMDGKTLNDELEIIEGMKSGIFPNISLIRGRYRTYSKR LKIPAMTSAKNAVVEGPLVVEKILQSSVEFGYDAILGDFVNMVGRGIIEPTEV Hspd18P Mm GROEL pseudogene 17 8 qB 2 1 58011087 58011631 Not identified at pseudogene.org MDSLKPANGDLKISIKIIKRALEVTAMTIAENVGVGGSLIIEEILQISSEVGYNAKLGDF MNMIENGIIDQTKVLRSALLDASGVASLLTTGEAVVIEIPKKEMVPGMGAMAR Hspd19P Mm GROEL pseudogene 18 16 qC1.3 + 1 61460831 61461298 Not identified a t pseudogene.org MIPKKTGVEGSWIDKKILQISSEVGYDTLLGDFVNMVEKGIIDTTKVVRPALLDAARVAS LLTTVNAVVTEAPKEEKDTGTSAMGGMGV Hspd20P Mm GROEL pseudogene 19 2 qE1 + 2 98441180 98441466 Not identified at pseudogene.org DQERGIEIIKRTLKIPAMMIAKNAEVGYDAILGDFVNIVEKGIIDPTKV Hspd21P Mm GROEL 
pseudogene 20 4 qC5 + 2 91375464 91376660 Not identified at pseudogene.org MPLKGKGKGDKAQIERCTQECIPAFTLLNPANENQKIGIEITKRLLKISAMTAAMSI Cct81P Mm CCTeta pseudogene 1 2 qC1.1 + 1 51795721 51795882 Not identified at pseudogene.org MTHQTHYERSGSASPCCTNDIMASLTQQQEVGVMSAHRVNNTVLVTLTAHPSISISHDSS NFNNEHIRVLLDDIKGEENLIDAQSNTTVGSGEDVIITGDKVANMDLHYASKYNTINRTF LQGMEGTEFASIKIEMRGWLHETTHVKEGAKHITPYGEPRPGLEQYTRKKLIVAFEAIIW ALSEISRVKANDW Cct73P Mm CCTeta pseudogene 3 1 qA4 + 4 20811426 20812 537 Not identified at pseudogene.org TTLGPHGNLLWMAKAKQQFLLLKLLDGGHPAAKTLVDIAGSEDTDTSDGTTVVILLAEEF LKQSGFEMQPKNYENPTIAFSNVELKLKAEKDNAEIRVHTHQAILDAKRNILYDKLVIYF ADRNMFCAGHWPEEDLKRTMMACGGSTQTNGNALIPDVLGHSQVFKETQIRGKRCNFFTG FPKAKTCTTILC Cct74P Mm CCTeta pseudogene 4 11 qB1.2 1 48381857 48382517 Not identified at pseudogene.org MKPTPVILIKEGTDSSQGIPQLVSNKSACQVIVEAVRTTLGHHGMDKLIVDGRGKATISN DGATILKLPNFLKHVKPYVEEGLHPQIIIHTATQLAVNKIIAIAVTVKKQDKVEQRKMLE KCAVTALSSKMISQQKAFFTKMEPAMVHINTLTAASEAACLIVSMDETIKNPRST Cct75P Mm CCTeta pseudogene 5 13 qB1 + 1 58647550 58647741 Not identified at pseudogene.org NTFSSAKFETRPKESKNAKIALLNVELELKAEKDNAEIRVHT Cct82P Mm CCTeta pseudogene 2 3 qB 3 35774568 35782271 Not identified at pseudogene.org PGVQLLNVTLFVTHDEACS LIWRHILYLMEGNGKYIGQQIISYGDTYPGLPQYAFRKFTE LFEAIPWALALDQGIFAVKDMLEAGAVSLHLGKYKAAHGPYCCKPCNDQSGHHPKPAGGS KPPCGKKKTGMMTKKI Hspd22P Mm GROEL pseudogene 21 17 qC 1 48192945 48193097 Not identified at pseudogene.org KFGADARALMLQGVDLLADAVAVTMGPTGRTVIIE QRWGSPK Mkks Rn NP_001008354 NM_001008353.1 MKKS (real gene) 3 3q36 4 124975607 124981825 MSRLEAKKPSLCKTEPLTSERVRSTLSVLKGIIASCYGPSGRLKQLHNGLGGCVCTTSQSSALLRNLSVT HPILKVLTSSVQNHVCCFSDCGLFTAILCCNLIENIQRIGLTPTTAIKLNKYLLGLSISYLKSEACSCRI PVDFRSTHTFLNLVHSIL TSKPACMLTRKEIDHIGALILKAFLLTIPESAEERMVLGKSIIVPLKGQQVT DSTVLPGLLIEASEVQLRRLLPTQKSSTLRVALFCASLSGDFSNAGEGTLVVHYQVSLENAVLEQLLNLG RQLVSDHVDLVLCQKVIHPSLKQFLSEHQIIAIDRVGVTLMEPLSKVTGATPIGSLYPIVSTTYGSVKDV RSARFGSKYFFHLLPNEATICSLLLCSRNDTAWEELKLTCQTAMHVLQLTIKEPWVLLGGG CVETHLAAY 
IRHKVHNEAEAIVRDDGCSQAELHIATEAFCSALESAAVSLEHDGGEILIDMKYGHFWSCPADSASVGNW PDTLSRCGCGLYNSQEELSWSVLRSTYHPFAPQTCLPQAASGSVSNLTVDSFTAKLSGLQVAVETANLIL DLSYVIEDKN


60748 RGD1560748_predicted Rn BBS10 (real gene) 7 q21 + 2 50317911 50320406 MASQGSVTAALR VAEVLETIANRCVGPEGGLVLCTKPTGEVLLSRDGGCLLEALHLEHPLARMIVACVSS HLKKTGDGAKTFIIFLCHLLRGLHAIGEKEKDSFTSEDIQSHERHWKNCCQWKSISRALLRFQTQTLGCI VDQHLSRHYLSAFSSSAEGRTLCRRSLELLLEPYFCGRVGRNNHRFISQLMCDYVFKCMACESGIEVFEF LDNCFVELNVGVTGLPVSESRIVDGLVLPRDFSVYCPADGGIRMVIVTEILQPLF SSSSSEFVLDSETQF QASQSWIMDRTKTVMNHLRSHNVKLLLSSVKQPDLVTYCARLNSISVVECLSSEEVSLVQRITGLSPCVL PELASQCHIADSALVRFCKPLIVRSGRYAHLGLVSTCAFVPHCVVLCGPVLGLVQQHESAFHGAFKMLRQ LFTDLDLNCMIQTKERRNPSPLADSSNRESSHSPKTGKYQDVVAKNKDKLETQTRLEVYPRLGASDTELI TCKLWSAHKETSIDPSQTNEIPKRLSP EKSRIVDNCEVFLENNPTGNPTAEDTRTEMSFKHLQVADNPGE GYTLPVTYKSPDTCPSQAHCSSAVPAGCVLPVGGHFEILMHFYLLNYAKQCRQSDDAVISMLIANALLGV PKILYKPKKGKDSFPHIYTRALRALQTRQPIVSGQSGFESVAGKYQLLTSVLQCLMKILTIDLIINVKRQ PQKTGDQESEDEL 125233 MGC125233 Rn NP_001032883 XP_575319 CCTtheta_L 1/L2 (real gene) 4 q11 1 5004090 5005772 MAVSSSQVTQTKVQSDLELPQRLKLGLEKNPESQGEEPLCILRATAAAQTLASIIRSCYGPYGLQKFLVS AQGETVCTGHAAAILKALELEHPAARFVQELAQTQAENTGDGTAFVVLLTEALLEQAQYLLWAGLTPAQL REAFVTATAEVLTALPSLAICSLGPLEDPSWALYSVMSTHTLSNAEYLTKLVAQACWISREPNGSFKPES IVVCILQGGILTDSRIIPGIAICGKLCGRKTEVLNDARVALFNCPFGPSNPFAPATLRLSSPEELIRFRK QTEQVEMEIAELAMMGINVAVVLGEVNERSVDQADYCGVMVIQVKSRKEIVYLSDKLGVPLLNRILPPLE PGKCHKVYRMEFGESALIMFEWEREIAPFLSVVLRGPTIQGLRGAEQAVYYGIDAFSQLCQDPRLLPGAG ATEMALARMLVDKGSRLDGPNGLAFQAFAQALSSLPKTLAEN AGLAAQSVLAEMSGYHQAGNFVIGVGTD GLVNVAQEGIWDILRTKAQGLQAVTGLVQQLVTVDQIIVARKTPRYRLIPQSAQNANTSSPLRAKFFGKY E 61608 RGD1561608_predicted Rn XP_345203 BBS12 2 q25 + 1 123866726 123868843 MEMDYRVLNRRRHVGLQQLSSFAQTGRSFLGPVKATKFITDAECHESVLVGSTVRLLEGL DLTCAVGHLLNEA VQAQNVTYKTGASTLLFLVGAWSRAVEDCLHLGVPTTVIVSVMSEGL NSCIEAVVSLQVPIHNVFDHIDNTSTVYKLETVDVSLCPFLQVPSGSGLLEEKHDFKDAT SQLLSTYSLSGRRAKSPEFFKPQAKVETENTSQALKNNLYTDSFCRKSALTHSRHFNRTE NSHWISRPDGFLEHLRSTPKVLRCNDLGELAVGLSHGDHSSMTLAKAAVRLQWQAVCLQP ANCMAPFMFDISRLLTCCLPGLPET FSCVCLGYVTSVTMPSITLIKELQDQPFRVILIEG DLTESYRHLGFNKSVNIRTKSDSGQLSEDSTEELWTNRVLEVLIQFNVNLILVQGSVSEH 
LTEKCMHSKRLVIGSVNGRVLQAFAEATRAVPVAYVTQVNEDCVGNGVSVTFWTGPHDIN RSNREILLTAEGINLITAVLTSPASAQMEMKEDRFWSCVNRLCHALKEEKVFLGGGAVEF LCLSHLQILAEQSLNKGNHACLGWLPDSSSWMASSLS VYRPTVLKCLAGGWHEFLSAIMC NTATYPSAVEASTFIQHHVQSAADSGSPSSYILSEYSELSSGLFHSDISNNLELVPRVYD TVTPKIEAWRRALDLVLLVLQTDSEIITGLVHTQMNSQELDGVLFL Cct6a Rn NP_001028856 XP_213765 NM_001033684.1 CCTzeta (real gene) 12 12q13 + 14 27992366 28002203 MAAVKTLNPKAEVARAQAALA VNISAARGLQDVLRTNLGPKGTMKMLVSGAGDIKLTKDGNVLLHEMQIQ HPTASLIAKVATAQDDITGDGTTSNVLIIGELLKQADLYISEGLHPRIITEGFEAAKEKALQFLEQVKVS KEMDRETLIDVARTSLRTKVHAELADVLTEAVVDSILAIRKKDEPIDLFMVEIMEMKHKSETDTSLIRGL VLDHGARHPDMKKRVENAYILTCNVSLEYEKTEVNSGFFYKSAEEREKLVKAERKFIEDRVKKI VELKKK VCGDSDKGFVVINQKGIDPFSLDALAKEGIVALRRAKRRNMERLTLACGGIALNSFDDLNPDCLGHAGLV YEYTLGEEKFTFIEKCNNPRSVTLLVKGPNKHTLTQIKDAIRDGLRAVKNAIDDGCVVPGAGAVEVALAE ALIKYKPSVKGRAQLGVQAFADALLIIPKVLAQNSGFDLQETLVKVQAEHSESGQLVGVDLNTGEPMVAA EMGVWDNYCVKKQLLHSCTVIATNILLVDEIMRAGM SSLKG L63658 LOC363658 Rn NP_001014250 XP_343949 NM_001014228.1 CCTzeta (real gene) 10 q26 11 72107176 72144991 MAAIKIANPGAEVTRSQAALAVNICAARGLQDVLRPSLGPKGALKMLVSGAGDIKLTKDGNVLLHEMQIQ HPTASIIAKVAAAQDHITGDGTTSNVLIIGELLKQADLYISEGLHPRIIAEGFDVAKTKALEVLDKIK VQ KEMKREMLLDVARTSLRTKVHTELADILTEAVVDSVLAIRRPGIPIDLFMVEIVEMRHKSETDTQLVRGL VLDHGARHPRMKKQVQDAYILICNVSLEYEKTEVSSGFFYKTVEEKEKLVKAERKFIEDRVQKIIDLKQK VCAESNKGFVVINQKGIDPVSLEMLAKHNIVALRRAKRRNLERLTLACGGLAVNSLEDLSEECLGHAGLV FEYTLGEEKFTFIEDCVNPLSVTLLVKGPNKHTLIQIKDA LRDGLRAVKNAIEDGCVVPGAGAVEVAIAE ALVNYKHCVQGRARLGIQAFADALLIIPKVLAQNSGYDLQETLIKIQTKHAESKELVGIDLNTGEPMVAA EAGIWDNYCVKKHILHSCTVIATNVLLVDEIMRAGMSSLRE Tcp1 Rn NP_036802 CCTalpha vert 1 q11 12 42103629 42111145 MEGPLSVFGDRSTGEAIRSQNVMAAASIANIVKSSLGPVGLDKMLV DDIGDVTITNDGAT ILKLLEVEHPAAKVLCELADLQDKEVGDGTTSVVIIAAELLKNADELVKQKIHPTSVISG YRLACKEAVRYINENLIINTDELGRDCLINAAKTSMSSKIIGINGDFFANMVVDAVLAVK YTDIRGQPRYPVNSVNILKAHGRSQIESMLINGYALNCVVGSQGMLKRIVNAKIACLDFS LQKTKMKLGVQVVITDPEKLDQIRQRESDITKERIQKILATGANVILTTGGIDDMCLK YF 
VEAGAMAVRRVLKRDLKRIAKASGASILSTLANLEGEETFEATMLGQAEEVVQERICDDE LILIKNTKARTSASIILRGANDFMCDEMERSLHDALCVVKRVLESKSVVPGGGAVEAALS IYLENYATSMGSREQLAIAEFARSLLVIPNTLAVNAAQDSTDLVAKLRAFHNEAQVNPER KNLKWIGLDLVHGKPRDNKQAGVFEPTIVKVKSLKFATEAAITILRIDDLIKLHPESKDD KHGGYENAV HSGALDD Cct8 Rn XP_213673 CCTtheta (real gene) vert 11 q11 15 27234460 27245292 MALHVPKAPGFAQMLKDGAKHFSGLEEAVYRNIQACKELAQTTRTAYGPNGMNKMVINRLEKLFVTNDAA TILRELEVQHPAAKMIVMASHMQEQEVGDGTNFVLVFAGALLELAEELLRIGLSVSEVITGYEIACKKAH EILPDLVCCSAKNLRDVDEVSSLLRTS IMSKQYGSEEFLAKLISQACVSIFPDSGNFNVDNIRVCKILGS GVYSSSVLHGMVFKKETEGDVTSVKDAKIAVYSCPFDGMITETKGTVLIKTAEELMNFSKGEENLMDAQV KAIAGTGANVIVTGGKVADMALHYANKYNIMLVRLNSKWDLRRLCKTVGATALPKLTPPVLEEMGHCDSV YLSEVGDTQVVVFKHEKEDGAISTIVLRGSTDNLMDDIERAVDDGVNTFKVLTRDKRLVPGGGATEIELA KQITSYGETCPGLEQYAIKKFAEAFEAIPRALAENSGVKANEVISKLYSVHQEGNKNVGLDIEAEVPAVK DMLEASILDTYLGKYWAIKLATNAAVTVLRVDQIIMAKPAGGPKPPSGKKDWDDDQND Cct1P Rn CCTalpha pseudogene 1 4 q34 1 122326746 122328035 Not identified at pseudogene.org EAVRYINENLIINTDELGRDCLINAA KTSMSSKIIGINGDFFANMVVDAVLAVKYTDIRG QPRYPVNSVNILKAHGRSQIESMLINGYALNCVVGSQGMLKRIVNAKIACLDFSLQKTKM KLGVQVVITDPEKLDQIRQRESDITKERIQKILATGANVILTTGGIDDMCLKYFVEAGAM AVRRVLKRDLKRIAKASGASILSTLANLEGEETFEATMLGQAEEVVQERICDDELILIKN TKARTSASIILRGANDFMCDEMERSLHDALCVLKGVLE SKSVVPGGGAVEAALSIYLENY ATSMGSREQLAIAEFARSLLVIPNTLAVNAAQDSTDLVAKLRAFHNEAQVNPERKNLKWI GLDLVHGKPRDNKQAGVFEPTIVKVKSLKFATEAAITILRIDDLIKLHPESKDDKHGGYE NAVHSGALDD Cct2 Rn NP_001005905 XP_216891 NM_001005905.1 CCTbeta (real gene) 7 q22 9 56394922 56402261 Not identified at pseudogene.org MASLSLAPVNIFKAGADEERAETARLSSFIGAIAIGDLVKSTLGPKGMDKILLSSGRDAS LMVTNDGATILKNIGVDNPAAKVLVDMSRVQDDEVGDGTTSVTVLAAELLREAESLIAKK IHPQTIIAGWREATKAAREALLSSAVDHGSDEVKFWQDLMNIAGTTLSSKLLTHHKDHFT KLAVEAVLRLKGSGNLEAIHVIKKLGGSLADSYLDEGFLLDKK IGVNQPKRIENAKILIA NTGMDTDKIKIFGSRVRVDSTAKVAEIEHAEKEKMKEKVERILKHGINCFINRQLIYNYP EQLFGAAGVMAIEHADFAGVERLALVTGGEIASTFDHPELVKLGSCKLIEEVMIGEDKLI HFSGVALGEACTIVLRGATQQILDEAERSLHDALCVLAQTVKDPRTVYGGGCSEMLMAHA 
VTMLASRTPGKEAVAMESFAKALRMLPTIIADNAGYDSADLVAQLRAAHSEGRIT AGLDM KEGSIGDMAVLGITESFQVKRQVLLSAAEAAEVILRVDNIIKAAPRKRVPDHHPC Cct21P Rn CCTbeta pseudogene 1 X q31 + 2 94540595 94541059 Not identified at pseudogene.org ECILKHGINCFVSRQLIYNYPEEIFSTAGVIAIEHADFGGVEFLALVTGKLMQFPGLAVG DVCTIVLHGATQDILDEAERTLYDALCVLAQIVKDP RTVYGGGCLELLMAHVVTKLASGT PGKEV


Cct3 Rn NP_954522 XP_215611 NM_199091.1 CCTgamma (real gene) vert 2 q34 + 12 180383106 180433933 Not identified at pseudogene.org MMGHRPVLVLSQNTKRESGRKVQSGNINAAKTIADIIRTCLGPKSMMKMLLDPMGGIVMTNDGNAILREI QVQHPAAKSMIEISRTQDEEV GDGTTSVIILAGEMLSVAEHFLEQQMHPTVVISAYRMALDDMVSTLKKI STPVDVNNRDMMLNIINSSITTKVISRWSSLACNIALDAVKTVQFEENGRKEIDIKKYARVEKIPGGIIE DSCVLRGVMINKDVTHPRMRRYIKNPRIVLLDSSLEYKKGESQTDIEITREEDFTRILQMEEEYIQQLCE DIIQLKPDVVITEKGISDLAQHYLMRANVTAIRRVRKTDNNRIARACGARIVSRPEELREDDVG TGAGLL EIKKIGDEYFTFITDCKDPKACTILLRGASKEILSEVERNLQDAMQVCRNVLLDPQLVPGGGASEMAVAH ALTEKSKAMTGVEQWPYRAVAQALEVIPRTLIQNCGASTIRLLTSLRAKHTQENCETWGVNGETGTLVDM KELGIWEPLAVKLQTYKTAVETAVLLLRIDDIVSGHKKKGDDQNRQTGAPDAGQE Cct31P Rn CCTgamma pseudogene 1 vert 19 p11 + 3 21483523 21485110 Not identified at pseudogene.org MVGHCPVLLLSQNTQRVSGRKVQSGNINTAKTIADIIWTCLEPKSIMKMLLDPARGIVMT NDEVGDGTTSVIILAGEMLSVAEHFLEQQMHPTVVISAYHMALDDMINTLKKISIPVDVN NRDMMLNIINSSITTKVISRWSSLACNIALDAVKTVQFEENGRKEIAMKKYARVEKIPRG IFEDSCVLRRVMINKDVTHP RMHRYIKNPRIVLLVSFLDLCSWIVSPPEELREDDVGTGA GLLEIKKIGEEYFTFITDCKDPKACIILLRGASKRDTIGSRTQPPGCLASVHNVLLDPQL VTGGGASEMAVAHALTQKSKAMTGVEQWPYRALAQALEVIPQTLIQNCGASTIRLLTSLR AQHTQENCETWGVNGATGTLCGLERAGYLGAIGHEATNIQNSSGGCSSASADEDIVSGHK KKGDDQNRQTSAADAGQE Cct32P Rn CCTgamma pseudogene 2 vert 13 q13 4 51648197 51649727 Not identified at pseudogene.org NTKRESGRKVQSGNVNAAKRSANIIWTCLEPKSMMKMLLDPMMTNDEMLSVSKHFLQQQM HPTVVISAYRMALDGMISTLKQISTPVDGKNHDVVLNIINSSITTKSSVTGGFLLNPANE RRLHLQLCKDIIQLKDDVVITENGISDLVQHYLMWANVTAIRR VRKTDNNHIARACGART RSQRRDNIGSRTQPPGCHASVRNVLLDPQRVPSGHVEQWSYRAVAQALEVIPLTLIQNYG ASTIRLLISLRAKHTQENCEIWGVKSKT Cct33P Rn CCTgamma pseudogene 3 vert X q31 + 3 93409548 93410990 Not identified at pseudogene.org IQVQHPAAKSIIEISGTQDEVARDGTISVIILVGEMLSVAEQFLE QQTHSTVVSSAYHMA LDDMLSTLKQINSPVDVNDRDMMLNIINSSIITKVIRLWSSLAYNIALDAVKIIGLCYWI LLWNTKQEKARVSLRLHERRISPESCKWEKSTLSSCDAMQVCCNVLLDPYLMPSDGASEM AVAHAVTEKAKARTVMEPWPYRAVAQALE Cct34P Rn CCTgamma pseudogene 4 vert 4 q42 2 149237097 149237635 Not identified a t 
pseudogene.org AISDLAQHHLVWANVTATPRVWKTVNNDAARACGARIVSAETQRLAPLFLEEPARKYFWK KSAPWQDAMQVCHSILLDPQLVPGSGASEMAHALIEKPKAMTG Cct4 Rn NP_877966 NM_182814.2 CCTdelta (real gene) vert 14 q22 + 13 103675244 103687376 Not identified at pseudogene.org MPENVASRSG PPAAGPGNRGKGAYQDRDKPAQIRFSNISAAKAVADAIRTSLGPKGMDKMIQDGKGDVTI TNDGATILKQMQVLHPAARMLVELSKAQDIEAGDGTTSVVIIAGSLLDSCTKLLQKGIHPTIISESFQKA LEKGLEILTDMSRPVQLSDRETLLNSATTSLNSKVVSQYSSLLSPMSVNAVMKVIDPATATSVDLRDIKI VKKLGGTIDDCELVEGLVLTQKVANSGITRVEKAKIGLIQFCLSAPKTDMDNQ IVVSDYAQMDRVLREER AYILNLVKQIKKTGCNVLLIQKSILRDALSDLALHFLNKMKIMVVKDIEREDIEFICKTIGTKPVAHIDQ FTPDMLGSAELAEEVSLNGSGKLFKITGCTSPGKTVTIVVRGSNKLVIEEAERSIHDALCVIRCLVKKRA LIAGGGAPEIELALRLTEYSRTLSGMESYCVRAFADAMEVIPSTLAENAGLNPISTVTELRNRHAQGEKT TGINVRKGGISNILEEMVVQPLLVS VSALTLATETVRSILKIDDVVNTR Cct5 Rn NP_001004078 XP_215516 NM_001004078.1 CCTepsilon (real gene) vert 2 q22 11 83681899 83692935 MASVGTLAFDEYGRPFLIIKDQDRKSRLMGLEALKSHIMAAKAVANTMRTSLGPNGLDKMMVDKDGDVTV TNDGATILSMMDVDHQIAKLMVELSKSQDDEIGDGTTGVVVLAGALLEEAEQL LDRGIHPIRIADGYEQA ARIAIQHLDKISDNVLVDINNPEPLIQTAKTTLGSKVVNSCHRQMAEIAVNAVLTVADMERRDVDFELIK VEGKVGGRLEDTKLIKGVIVDKDFSHPQMPKEVLNAKIAILTCPFEPPKPKTKHKLDVTSVEDYKALQKY EKEKFEEMIAQIKETGANLAICQWGFDDEANHLLLQNGLPAVRWVGGPEIELIAIATGGRIVPRFSELTS EKLGFAGVVREISFGTTKDKMLVIE QCKNSRAVTIFIRGGNKMIIEEAKRSLHDALCVIRNLIRDNRVVY GGGAAEISCALAVSQEADKCPTLEQYAMRAFADALEVIPMALSENSGMNPIQTMTEVRARQVKESNPALG IDCLHKGSNDMQYQHVIETLIGKKQQISLATQMVRMILKIDDIRKPGESEE Cct51P Rn CCTepsilon pseudogene 1 X q32 2 111419580 111419580 Not identified at pseudogene.org KDQDCKSGLLEFEALKSHIMAAKAVAKTTQMSLGPNTLNMMIVDKGGDVTVNNEDANILS MIDVDHEIAKLRVELSKSQDDENRYGTTRMDVLKHLDKISNNVLVDINNPESLGSKVVNS CHQQMTEISVNSIITVADTELRDVDFELIKWDSKILGWLENTRLIKHVIIDKDFSHPQMS GKKMVDSNISILMCHFEPAKQKKRHKL Cct52P Rn CCTepsilon p seudogene 2 8 q13 1 17626224 17626583 Not identified at pseudogene.org RSLIHYNRVVYGGGAAEISCALGVSQEVDKCPTLEQYAMRTFADALEVIHMALSENGDMNPFQTMTEVLARQVEESNPA Cct6A1P Rn CCTzeta pseudogene 1 8 q13 5 31685049 31688384 Not identified at 
pseudogene.org MAAVK TLNPKAEVACVQAALAVNICLARGLQDVLRTNLGPKGTMKMLVSGLHPRIITEGF EAAKEKALQFLEQVKVSKEMDRETLIDVARTSLRTKVHAELADVLTEAVVDSILAIRKKD EPIDLFMVEIMEMKHKSETDTSLIRGLVLDHGARHPDMKKRVENAYILTCNVSLEYEKTE VNSGFFFFFNKSAEEREKLVKAERKFIEDRVKKSCSWKEGIVALHRAKRRNMERLTPACG GIALNSFDDLNPDCLGH AGLFYEYTLVHDGGRRASSSSSVREMGDSGSSTEKFTFIEKYN NPRSVTLLVKGPNKHTLTQIKDVIRDGLRAVKNAIDDGCVVPGAGAAEAFADTLLIIPKV LAQNSGFDLQETLVKAEHSESGQLVGVDLNTGEPMVAAEMGVWDNYCVKKQLLHSCTVIT TNILL Cct6A2P Rn CCTzeta pseudogene 2 10 q32.1 + 2 88702370 88703903 AVNISAAGGLQDVLR TNLRPKGTMKMLVSGAGDTKLTKDGNVLLNEMPIQHPIASLIAKV ATAQDDISGDGTTSSVLIIRYLLKQVDLYISEGLHPRIITEGFKAAKEKALQFLEQVKVS KEMDRKTLINVARTSLWTKVHAELADVLTEAVVASILAIGKKDERIDLFMVQIMEMKHKS ETDTSLIRGLKKVCGDSDKAFVVINQKGIDPFSLDALAKEGIVALCRAKRRHMERMTLAC SGIALNSFDDLNPDCLGHAGLVYEYTL GEEKFTFIEKCNNPSSVTLLVKGPNKHMLTQIK DAIRDGLRAVKNATDDGCVVPGAGAEVALAEALIKYKPSVKDRAQLGVQAFADALLIIPK VLVQNSGFDLQETLVKVQAEHSEFGQLVRVDLNTGEPMVATDMGVWDNYCVKKQLLHSCT VVVTKFLL Cct6A3P Rn CCTzeta pseudogene 3 5 q11 + 3 240279 241905 MAVVKTLNPKAEVARVQAALVVNISAAPG LQDVLRTNLGPKGTVKMLVSGAGDIKLTKDG NVLLYEMFFFYLQQIQHPTASLIAKVATVQDDIIDDDTTFNVLIIGELLKQADLYISEGL HPRIITESFETAKEKARQFLKQVKVNKEMDRETLIDVARTSLRTKVHAEFADVLTETVVY SILAIRKKDEPIDLFLVEIMEMKHKSETDTNLIRGLVVDHQARHSDMKKRVENAYILTCN VSLDAEEREKLVKAERKFIEDRVKKIIVLKKKVCGDSDKGF VVINLKGIDPFSLDALADK GILPLRRAKRRNMERLTLDCGPNKHMLTQIKDAIRDGLGGVKMLLMMAVLSRWRCSRSGT AFADALLIISKVLAQNSGFDLQETLVKVQAEHSETGQLVGVDLNTGEPMVAAEIGVWYNY YVKKQLLHSCTVIATNILLVNEIMRAG Cct6A4P Rn CCTzeta pseudogene 4 5 q24 3 76957850 76959406 MKMLVSGSGDIELTKDGNVL LHEMQIQHPTASLIAKMATAQGDITGDGTTSNVLIIGSCS NKCLYRGYGGLHFGHWEKEEPTELFMVETMEMKHKYETDTSLIRGLVLDHGAWHPDMKKR VENAYILTGNSAEEREKLVKAERKLIENRVKKIIELKKKVCGDLDKGFLIINQKGIDPFS LDALAKEGIVALCRAKRRKADSCLCAGAAEVALAEALIKYKPSVKGRTQLGVQAFADALL IIPKVLLQNSGFGLQETLVKVHAEHLESGQLV GVDLNTGEPMVASEMGVWDNYCVKKQLL PSCTLIATNILLVDKIMRAGMSSLK


Cct6A5P Rn CCTzeta pseudogene 5 18 p11 4 35236443 35237988 MMAVKTLNPKAEVAGAQAVLVVNSSAAWGLQDVLRTNSGLKGTVKMFVSGAGDIKCTKDG NDDITGDGTTSNVLIIGELHKQEDLYVSEGLHPRIISEGFEAAKEKALQFLEQVKVSKEM DRETLIDIKDLL LIKRVLTPFLCKRRYHSSAQSQEEKHGKLTLACGGIALNSFDDLNLDC LGHAGLVYEYTLGKEMFTCIEKCSNPCSVTLLVKGPNKHMLTQIKDTIRDGLRAVKNAID DGCVVPVVPGAGAEVALAEALIKYKPSVKGRAQLGVQACTDALLIIPKVLAQNSGFDLQK TLKFKLNIQNLVS Cct6A6P Rn CCTzeta pseudogene 6 1 q12 5 67148659 67170130 MAVVK TLNPKTEVAREQAALVVNSSAALVLQDDTMKKLVSGAEDIKLTKDGNVLIHEMQI QHPTASLIAKVATAQDDITGDGTISNVLIIRELLKQIDLYVSEGLHPRIISEGFEAAKEK ALQFLEQVKVSKEIDRETLINMARVSLLQFMLNLLGLVLDHGVWHPDMKKRVESAYILTC NVSLEYEKTEGNSGSFTRIELKEIELKKKVCGDSDKGFVVINQKGIDPFSLDALVNEAMN LSVDLNPDCLGHAGLVY EYTLGEEKFTVMKKRDNLQSVTLLAKGPNKHTLTQIKDAIRDG LRAVKMLLMMAVLSWVQVLVQKSSFDLQEKRVKVKAEHSESCQLVGVDLSTGEAMVATEM SDWDNYCVKKQLLHSCTVMATNILLVDEIMRAG Cct6A7P Rn CCTzeta pseudogene 7 1_random 5 1252980 1254555 MKKLVSGAEDIKLTKDGNVLIHEMQIQHPTASLIAKVATAQDDITGDGT ISNVLIIRELL KQIDLYVSEGLHPRIISEGFEAAKEKALQFLEQVKVSKEIDRETLINMARKRVESAYILT CNVSLEYEKTEGNSGSFTRTQEEKHGKADTCVGIAMNLSVDLNPDCLGHAGLVYEYTLGE EKFTVMKKRDNLQSVTLLAKGPNKHTLTQIKDAIRDGLRAVKMLLMMAVLSWVQVLVQKS SFDLQEKRVKVKAEHSESCQLVGVDLSTGEAMVATEMSDWDNYCVKKQLLHSCTVMATNI LLVDEIMRAG Cct6A8P Rn CCTzeta pseudogene 8 13 q22 5 79050104 79053455 VVRAQTALAVNISTAWGQQDVLRTNLRPKGTVKMPVSGAGDTKLTKDDNTVDLYISKGLH PRITEGFEAAKEKALQFLEQVKGSKEMDRGTLIDVAKTSLRTKVHAELADVLTGAVTMEV KHKSETDTSLARGLALDLVAQVSLEYKKTEMPLERRYHSSVQSQEEKHGKLTLACG GIAL NSFDNLNPDGLGHSGLAYKYTLKRSSPVLRSVTIPVLSVYWCSRSGPGRGSRIKYKPSVK DRAQLGVQACADALLIIPKVLAQNSAFDLQETLKFKLNIQNPVSLS Cct6A9P Rn CCTzeta pseudogene 9 16 q12.5 4 88811021 88812383 MVAVKTLNPKADVARTQAALAVNISEALGVQDVLRSNLGSKGTMKTLISGARDIKNSKDS NVLLHEMQIQHPI ASLIAKVATAQDCITGNGTTSNVLIIWELLQQVDIYISKGLHPRIIT GGLEIAKEKNSKLWWTPFWLLGKKDEPIDLSMVENMEMKHKSETDTNLISRLFLDHGMEL KKIIELKKKVCGDSDKGFVIINQKGTNPFYLDALLKEVTLLIKGPNNHMLTQIKDAIRDG LRAIKNAIDDGCVVSGAGAVGLALAETLINYKPSVKGRAQLGVQAFADTLLIFPK Cct6A10P Rn CCTzeta pseud ogene 10 1 q12 3 67148933 67150124 
LGPKDTMKKLVSGAEDIKLTKDGNVLIHEMQIQHPTASLIAKVATAQDDITGDGTISNVL IIRELLKQIDLYVSEGLHPRIISEGFEAAKEKALQFLEQVKVSKEIDRETLINMARVSLL QFMLNLLGLVLDHGVWHPDMKKRVESAYILTCNVSLEYEKTEGNSGSFTRIELKEIELKK KVCGDSDKGFVVINQKGIDPFSLDALVNEAMNLSV DLNPDCLGHAGLVYEYTLGEEKFTV MKKRDNLQSVTLLAKGPNKHTLTQIKDAIRDGLRAVKMLLMMAVLSWVQ Cct6A11P Rn CCTzeta pseudogene 11 6 q23 + 5 83591559 83592798 SVGLHPRIITEDFEAAKQKALQCVEQVKVSTEMHREMLIDVARISLQSKVHAEFSDVLTE AVIMEMKHKSETDTSLITGLVLDHGAKHPDMKKKIELIKIVELKKKVCGNS DKEFVIINQ KGIDPFSLDTLVKEGLIYEYTLCEEKFNFIEKCNCSRSLTLLVNGPNKHTLTEIKVAIRD GLRAVKNAVDDGCVVCVQLVGVDLNTGERMVAAEMGMWDNYCVKKQLLHPCTVIATNILL VDEIMRAG Cct6A12P Rn CCTzeta pseudogene 12 9 q22 4 50508448 50509665 RIITEGFEAAKERALQFLEQVKASKEMNRKTLIDVAKTSLQSKVHVE LADVLTEAVVNSI LAIRKRDEPTYLFVVEIMEMNHKSETDTSLIKGLNLDHGAGHPDMKKRVDNAYILMCNSE EEKEKLVKAERTFIDDTVKNIIDLNKKLCTEPKGETWKADTSCGGIALNFFANLTPDCLG HGGLVYEYIMGPTKSMLTEIKDSIRDGLRDVKNAIDDGHRCNRSGTGRRSDQIQTQCEGQ GLAWGPGICRHLAYHSQAETMMATEMGLWDNYCVNKQLLHSCIIITTNFLLVEEIMRAG Cct6A13P Rn CCTzeta pseudogene 13 14 q11 1 66456610 66457113 EKCNNPHSVTLPVKGPNKHMLTQIKDALRDGLRAVNNVIDNGSVVPGTDALLIIPKVLEQ NSGFDLQETLVKVQAEHCKSNQLVDVDLNTSELMVVTEMS Cct6A14P Rn CCTzeta pseudogene 14 1 q12 1 67168555 67168845 VLVQKSGFDLQEKRVK VKAEHSESCQLVGVDLNTGEAIVATEMSEWDNYCVKKQLLHSCT AMATNILLVDEIMRAG Cct6A15P Rn CCTzeta pseudogene 15 5 q11 + 2 8728585 8728815 MVAAEMGVWDNYCVKKQLLHSCTVIATNILLVDEIMRAGMSSLKG Cct6A16P Rn CCTzeta pseudogene 16 18 q12.3 + 1 76283952 76284215 VLAQTSSSDLW ETLVMIQTEHSELSRLVCVDLNTGEPRVVAEVSVS Cct7 Rn NP_001100073 XP_001073942 XP_216180 NM_001106603.1 CCTeta (real gene) vert 4 q34 + 11 119714400 119722770 MMPTPVILLKEGTDSSQGIPQLVSNISACQVIAEAVRTTLGPRGMDKLIVDGRGKATISNDGATILKLLD VVHPAAKTLVDIAKSQDAEVGDGTTSVTLLA AEFLKQVKPYVEEGLHPQIIIRAFRTATQLAVNKIKEIA VTVKKQDKVEQRKMLEKCAMTALSSKLISQQKVFFAKMVVDAVMMLDELLQLKMIGIKKVQGGALEESRL VAGVAFKKTFSYAGFEMQPKKYKNPKIALLNVELELKAEKDNAEIRVHTVEDYQAIVDAEWNILYDKLEK IHQSGAKVILSKLPIGDVATQYFADRDMFCAGRVPEEDLKRTMMACGGSIQTSVNALIPDVLGRCQVFEE TQI 
GGERYNFFTGCPKAKTCTIILRGGAEQFMEETERSLHDAIMIVRRAIKNDSVVAGGGAIEMELSKYL RDYSRTIPGKQQLLIGAYAKALEIIPRQLCDNAGFDATNILNKLRARHAQGGMWYGVDINNEDIADNFQA FVWEPAMVRINALTAASEAACLIVSVDETIKNPRSTVDAPAPAAGRGRGQGRFH Cct71P Rn CCTeta pseudogene 1 1 q43 + 1 216186823 2161884 54 MMPTPVILLKEGTDSSQGIPQLVSNMSACEVIAEAVRTTLGPHGMDKLIVDGRGKATISN DGATILKLLDVVHPAAKTLMDIAKSQDAEVGDGTTSVTLLAAEFLKQVKPYVEEGLHPQI IIRALLTATQLAVNKIKEIAVTVKKQDKVEQRKMLEKCAMTALSSKLIFQQKVFFAKMVV DAVQGGALEESRLMAGVAFKKTFSYAGFEMQPKKYKNPKIALLNVELKLKAEKDNAEIRV HTVEDYQA IVDSEWNILYDKLEKIHQSGAKVILSKLPIGDVATQYFADRDMFCAGRVPEE DLKRTMMACGGSIQTSVNALIPDVLGRCQVFEETQIGGERYNFFTGCPKAKTCTIILHGG AEQFMEETERSLHDAIMIVRKAIKNDSVVAGGRATEMELSKYLWDYSRTIPGKQQLLIGA YAKALEIIPRQLCDNAGFDATNILNKLRARHAQEGMWYGVDINKEDIADNFQAFVWEPAM MCINTLTAASEAACLIVSVD ETIKNPRSTVDAPAPAAGRGRGQGRFH Cct72P Rn CCTeta pseudogene 2 5 q36 + 3 145306136 145307710 MPTPVILLKEGTDSFQGFPQLVTPVPAKRLLRLELGRCGTDKLIVDGQGKATISNGGATV RKSLDVVHPAAKTLVLLPNPKTLRLVMAPPHSELISGRKVFAKMAAYAVMMPDEVLQLTG AGAAFEKTFSYAGFEMQPEKSKNPKSALLNVELERKAEKD DAEIRVHTGEDFQAMMDTKW SIRYDKLEKTHQSGAIVSKLPIGDVAIQYSADRDVFCADCKPEEDLKKTMTACGGSIQSS VDALTPDVLGYCQVFEETPIGGERHNFFTGCPKAKTCTSLLHGGAEQFMEEAERSLHDAI ITVRRAIKNDSVAAGGGLCDNAGFDATNILNKLQAGHAQGGMWYVVDINNKDIAQNLQAF VWEPASEAACLIVSVDETIKSPRSTVDAPGPAAGG Cct73P Rn C CTeta pseudogene 3 9 q13 + 3 19556657 19557897 MTVKKQDKVEQRKMLEKSAVTAPSSELISQQKVCFAKVVSDAVMMLDETLSLGLGLKGSP RSIRTLLNVELKLKAQKDNVETTVHTADEYKEIVDDKQNILDGKLVIGSSSCCRLEEDLK RLMNVKALIPDVLGHSQVSEETQIGGERCNFFILRPRHVASSSVVALSSLSLHGAIMFVR SVIKYDS Cct81P Rn CC Teta pseudogene 1 11 q12 + 1 43940890 43941006 MNKMVINHLEKLFMT Cct82P Rn CCTeta pseudogene 2 3 q12 + 1 32962674 32962763 GDKVTNMALHYANKGNTML


Hspd1 LOC684747 Rn 9 q31 11 53884643 53894161 MLRLPTVLRQMRPVSRALAPHLTRAYAKDVKFGADARALMLQGVDLLADAVAVT MGPKGR TVIIEQSWGSPKVTKDGVTVAKSIDLKDKYKNIGAKLVQDVANNTNEEAGDGTTTATVLA RSIAKEGFEKISKGANPVEIRRGVMLAVDAVIAELKKQSKPVTTPEEIAQVATISANGDK DIGNIISDAMKKVGRKGVITVKDGKTLNDELEIIEGMKFDRGYISPYFINTSKGQKCEFQ DAYVLLSEKKISSVQSIVPALEIANAHRKPLVIIAEDVDGEALSTLVLNRLKVGLQVVAV KAPGF GDNRKNQLKDMAIATGGAVFGEEGLNLNLEDVQAHDLGKVGEVIVTKDDAMLLKG KGDKAHIEKRIQEITEQLDITTSEYEKEKLNERLAKLSDGVAVLKVGGTSDVEVNEKKDR VTDALNATRAAVEEGIVLGGGCALLRCIPALDSLKPANEDQKIGIEIIKRALKIPAMTIA KNAGVEGSLIVEKILQSSSEVGYDAMLGDFVNMVEKGIIDPTKVVRTALLDAAGVASLLT TAEAVVTEIPKEEKDPG MGAMGGMGGGMGGGMF Hspd1P Rn GROEL pseudogene 1 14 q22 + 1 107651987 107653696 ASTTHSPSPNETSVSGTGSSSHYAKDVKFGADARALMLQGVDLLADAVAVTMGPKGRTVM IEQSWGSPKVTKDGVTVAKSIDLKDKYKNIGAKLVQDVANNTNEEAGDGTTTATVLARSI AKEGFEKISKGANPVEIRRGVMLAVDAVIAELKKQSKPVTTPEEIAQVATI SANGDKDIG NIISDAMKKVGRKGVITVKDGKTLNDGLEIIEGMKFDRGYISPYFINTSKGQKCEFQDAY VLLSEKKISVHYVDGEALSTLVLNRLKVGLQVVAVKAPGFGDNRKNQLKDMAIATGGAVF GEEGLNLNLEDVQAHDLGKVGEVIVTKDDAMLLKGKGDKAHIEKRIQEITEQLDITTSEY EKEKLNERLAKLPDGVAVLKVGGTSDVEVNEEKDRVTDALNATRAAVEEGIVLGGGYQKI GI EIIKRALKIPAMTIAKNAGVEGSLIVEKILQSSSEVGYDAMLGDFVNMVEKGIIDPTK VVRTALLDAAGVASLLTTAEAVVTEIPKEEKDPGMGAMGGMGGGMGGG Hspd2P Rn GROEL pseudogene 2 14 p11 + 1 42554408 42556112 MLRLPTVLCQMRPVSRALAPHLTQAYAKDVKFGADARALMLQGVDLLADAVAVPVGPKGR TVTLEQSWGSTKVTKDGVTIAKL IDLKDKYKNIGIKLVQDVANDTNEEAGDGTTTATVLA RSIAKEGLEKISQGAEPVGIRRGVMSAVDAVIAELKKQSKPGTTPEETAQVATTSANGDK DTGNIISDAVKKVGRKGVITKISSVQSIVPALEIANAHRKPLVTIAEDVDGEALRTPVLN RLKVGLQVVADKAPGFGDNRKNQLKDMTIATGSAVFGEELNLNLEDVQAHDLGKVGQVIV TKDDAMLLKGKGDKAHIEKILGFIKPANEDQKIGI EIVKRALKIPAMTIAKNAGVEGSLI VEKILQSSSELVVRTALLDAAGVVSLLTTAEAVVTEIPKEEKDPGMGSMGGMGGGMGGG Hspd3P Rn GROEL pseudogene 3 17 q12.1 + 2 58629782 58633066 PKVTKDGVTVAKSIDLKDKYKNIGAILVQDVANNTNEEAGDGTTTATVLAWSIAKEGLEK ISKGANPVEIRRGVMLAVDAVIAELKKQSKPVTTPEEIAQVAT ISANRDKDTGNIISDAV KKVGRKGVITVKDGKTPNDELEIIEGMKFDRGYISPYFINTSKGQKCEFQDAYVLLSEKK 
ISSVQSTVPALEIANAHRKPLVIIAEDDDAEALSTLVLNRLKVGLQVVAVKAPGFVDNRE NQLKDKAITIGGTVFGEEEKHVQEVTEQLDVTTSEYEKEKLNERLAKLSDGVAVLKVGGT SDVEVNEKKDRVTDALNATRAAVEEGIVLGEGCALLRCIPALDSLKLANEDQKIG IEIIK RALKIPAMIIAKNADFEKMVEKGIIDPKKVVRTALLDAAGMASLLTTAETVVTEIPKEEK DPGMGAMGGMGGDMGGG Hspd4P Rn GROEL pseudogene 4 14 p11 2 41523338 41525050 ALITHSPLPDQTSVSGTGSSSHSGLCQRCKIWCGSQALMLQGVDLLADAVAVPMGPKGRT VIIEQSWGSPKVTKDGVTVAKSIDLKDKYKNIGTELVQDVADNTNK EAGDGTTTATVLAR SIAKEGLEKISQGANPVEIQRPVMLAVDAVIAELKKQSKPVTTPEETAQVATTSANGDKD TGNIISDAVKKVGRKGVITFDRGYISPYFINTSKGQKCEFQDAYVLLSEKKISSVQSIVR ALEIANAHRKPLVTIAEDVDGEALSTPVLNRLKVGLQVVADKAPGFGDNRKNQLKDMTIA TGSAVFGEEELNLNLEDVQAHDLGKVGQVIVTKDDAMLLKGKGDKAHFEKRNQEITEQ LD VTTSEYEKEKLNERLAKLPDGVAVLKVGGTSDVEVNEEKDRVTDALNATRAAVEEAIVLE GTVLCFVIDPTKVVRTALLDAAGVVSLLTTAEAVVTEIPKEEKDPGMGSMSGMRGAVGGG Hspd5P Rn GROEL pseudogene 5 1 q35 2 179041398 179043070 LLADAVAVPMGPKGRTVIIEQSWGSPKVTKDGVTVAKSIDLKDKYKNIGATLVQNVANNT KRLGM HHHCYCSGRSIAKEGFEKNSKGANPVEIRRGVMLTVDAVIAELKKPSKPVTTPEE ISQVATISANGDKDIGNIISDAMKKVGRKGVIAVKDGKTLNDELEIIEGMKCDRENMSPY FINTSKGQKCEFQDAYVLLSEKKISSVRSIVPALEIANAHRKSLVIIAEDVDGEALSTLV LNRLKTGLQAVAVKAPGFGDNRKNQLNDTVIVTGGAVFGEERLNLNLEDVQAHDLGKVGE VIVTKDDAMLLKGKGDE AHIEKCIQEITEQLDVTTSEYEKEKLNERLAKLPDGVAVLKVG GTSDVDVEVNEKKDRVTDALNATRAAVEKGIVVEGSLRVQKILQSSSEVGYDAMLGDFVN MVRKGIIDPTKVVRTALLDAPGVASLLSTAEAVVTEIPKEEKDPGMGAMGGMGGGVGGG Hspd6P Rn GROEL pseudogene 6 20 q11 + 4 30691244 30692932 HSPSPDETSVSGTGSSLHSGLCQRVKF GADAPALMLQGVDLLANAVAVTTGPKGRTVTTE QSWESFKVTKDGATVAKSIDLKDKYKNNGAKLVQDVANNTNEEAEDGTTTATLLAWSIAK EGFEKISKGANPVGIRRGVMVADDAVVAELKKQSKPVTTPEEIAIISTNGDKGIGNIISD ATKKVGRKGYISPYFINVSKGQKCEFQDAYILLREKKISSVRSIVPALEIANAHRKPLVI IAEDVDGEALSALVLNRLKVGLQVVAVKAPGFEDNRKNQ FKDMAIATGGAVFVEEGLNLN LEDVQDHDLGKVGEVIVTEEDAMFLKGEGDKAHIEKHIQEITEQPDVTNEYEKEKLNERL AKLPDGVAVLKVGGTTAVEEGIVLGGGCALFHCIPALHSLKPANEDQKICIEIIKRALKI PAMTVAKNAGVEGSLIVEKILQSSSEVDYDAVLGDFVNMVERELLIQKGSVVTDISKEEK DPGKGAMGGIGGGMGGG Hspd7P Rn GROEL pseudogene 7 1 q22 2 103951550 103953587 
LPTVLHQMRPVSRALTPHLTWAYAKDVKYGADARASMFQGVVLLVHAVAVTIWPKGKTVI IDQSWGSPKDAPNNTNEEAGDGTTTATVLAPSVAKEGFEKISKGANPVGIRRGMMLAVDA VTAELKKQSKPVTTPEEIAQVATISANGDKDIGNIISDAMKVGRKGVIKVKDGKTLNNEL EIIEGMKCDRGYIFPYFINTSQDQKCEFQMPMFYVDGEALS TLGLNRLAVSLQLVAVRVQ GFGDNKKNQLKDMAIATGDGVAVLKVGGTSDVEVNEKKDRVTDALSATRAAVEEGISANE DQKIGIEIIKRALKIPAMTIAKNTGIEGSLIVDKNSTEFLRSWGGPLANYSRSFSDIPKE EKDLGMDTMGGMRGGMGGG Hspd8P Rn GROEL pseudogene 8 13 p13 + 1 3553285 3554991 PTVLHQIRPVSRSLAPHLTRSYAKDIKYGVDA RALMLQGVDLLVNAVAVTMGPKGRTLII EQSWGSPKDVANNTNEEAGNGTTTATVLAWSIAKVGFEKFSKGANPVEIRRGVMLTVDAV IAEFKKQSKPVTITEEIAQVVTISANGDKDFGNIISGAVKKVGRKGVITVKDRKTLNDEL ETIEGMKFDRGYIFPYFSNTSKGQKCEFLNAYVLLSEKKISSVKSTVPALEIVNAHWKPL VEIAEDVDGEALSIVVLNRLKVGLQVVAVKAPGFGDNRKNQLKD MAFTAGYTVFGEEGLD LNLKDVQAHGLGKVEEVIFTNDDAMLLKGKGDKVHIEKCIQEITEQLDITTSVKLGGTSD VEVNEKKGRVTDAIDATRAAVEEGIVLGGGYSLLWCIPVLDSLKPANKDQRIGIEIIKRA LNIPAMMIAKNTGAEGSLIVEKILQCSSEVSYDAMLGDFVNMIEKGTIDPKKGCKNCFTG CC Hspd9P Rn GROEL pseudogene 9 2 q43 + 1 234250749 234252288 PSPDETVSRALAPLLTRAYGKDVKFGVDARASVLQGSWGSLKVTKDGVTVAQSIDLKDKY KNIGAKVVQDVTNNPNEEAGDVTTTATLLAQSIAKEGFLKIGKGVNPVEIQRGVMLAVDA VIAELKKQSKPATTPEEIAQVARISANGDKDIGDIISDAMKKVGRKGVITVKDGKTQNDE LEIIEGMKFDRGYISPYFINTSKFIVSALEVANAHGKPLVIIAEDVDGEALSALVLNRLN VGLQVVAVKAPGSAVFGEERLNLNLEDVQARDLGKVGEVIVTQDDAMLLKGKGDKGHTEN CIQEITEQLEITTSESEKEKLNERLANLSNGVSPADKDQKIGVEIIKRALKIPAVTIAKN AGVEGSLIVEKILQSSSEIGYDAVLGEFVNM Hspd10P Rn GROEL pseudogene 10 5 q24 4 76955304 76956924 NAVAVTMESKRTVLIEQSLGSPKVTKDGVTVAKKSKS RETWRTAVDAVIAELKKQSKPVT TPEEIVQVATISANGDKDIGNIISDSTKVGRKGAITVKDGKTLNDDLEIIEGMKFDRRCI SLYFITTSKGQKCEFQDAYVLLSEKKISSVQSIVPALEIANAHRKPLVIIAEDVDGEALS TLVLDRLKVGLQVVAVQSSRETLEMSLSPKMMRCFWKEKLTLKNVKGKGDKAHIEKHVQE ITEQLDITTSEYKKEKPNEWLGKLADGVAVLQVGRTSGMEASEKKDRVT DAFNASRATFE EGIVLGGGCALLRCIPALDSLKPANEDQNIGIEIIKRALKIPAVTIAKNAILIVEKILII DPTKVVRTVLLDAAGVASLLTTAEAVVTEIPKEKKDPGMGAMEGWEGGMRDG


Hspd11P Rn GROEL pseudogene 11 6 q12 + 5 19170317 19172008 MLELNTVLHQVRPVSWTLFPHLTLAYAKDTQFGTDAKALLFQDIDLLADVVVVTKGPKGR T VIIEQSWGSPKVTKDGITIRKSIDLNDEYKNNRANLVQNEGFQKINKGANPVEIQRVKD GKILNDELEIIEGMKFDRGYISSYFINTSKGQKCEFQDGYVLLSEKKFSSVQSIVPSLEI ANGSPWKIQFKDTAISTGGAVFGKEGLNLNLEDVQAHDLGKVGDVIVTKDGSLLVEKILQ SSSEDGYDVMLRDFVNMVEKGIFDPTKVRRTALLAAAAVDSLLTIAEVKVTKIPNEEKNS GMGAMGGMVGGMG Hspd12P Rn GROEL pseudogene 12 2 q42 + 1 220584917 220585779 MLRLPTVLLKMRPMSQALAPHLTWVYAKDVKFGAAKLVQDVANNTNKEAGDGTTTATVLA WSIAEEGFEKISKGVNPVEIWRGVMLADDAVIAELKKQSKPVPTPEEIAQVATISANGDK DIGNIISDAMKKVGRKGAITVKDGKTLNDELEIIEGMKFDRGYISPYFINISKCHKCEFQ DAYVL LSEKNISSVQTIVPALEIANAHRKPLVIIAEDVDGEALSTL Hspd13P Rn GROEL pseudogene 13 9 q22 + 4 43887785 43896197 MVLHQMKPVSRALTLPLTRACIKDIKFGADARALMLQGIVNLLAIAIVVIMGSKGRTVII EQSWRSPKITKDWFSVAKSIDLKDKYKIMELNLFRMLPITQTKRLGMAPPLSLFWQVEIQ RDVMSAVDAVITELNKQSKPVTAPEEI AQVATISANVEKDIGNIISDAMKKSIVSALEIA NAHQSSLVIITEVKAPGFGDNRKNQFKVMAITIGGVAVGPLNLNLEDVQDHDLGKVGEII VTKDDAILLKGIGDKAHIEK Hspd14P Rn GROEL pseudogene 14 5 q32 4 109039079 109040435 AVIAELIKQSKPVTAPEEIAQVASISTNGDKDIGNIISDAIEKVGRKGVITVKDEQTLMD KLEIIEAIVPAL EVANTRRKPLLIMVEDVDGETLSTLVLNRLEVGLQVVSVKALGFRDNR KNQLKGVSIATGGEVFGEGRLNLNLEEVQAHDLGKVGSVTVNRDGTMFLKGKGDKAHIEK CIQGITEQLDITTSEYEKGKLNKLLAKLSDRVAVWKDKWYGTAVEESIILTEGWVLLQCI PVLRSLKPANGDQKTAEAVRTEIPKEENDPRIGAMDGKAVGMG Hspd15P Rn GROEL pseudogene 15 4 q21 + 5 37890620 37892172 SLALAPHLTQAYAKYVKFHTDPEALMLQGVVLIIDAVAVCYNGIKGKNSGHWIDWESPRH RDGSTTAAVLACSIAKGGFDKSSKGANPVQIQRKVTLAVDAVIAELNQSKPVTTPKKLLS GKKISSAQPMVHALEIANAHPKALVIITEDVVRDTLSTLVLNRLKVPSDGVPVLKLGGTS DVEVGEKNGRITDVLSFIRAAVEEGIVLGGGCLRKFCRFPQKLVMMLFF FEIFVNMVENR IIDPTKVVRISILDAAGVASLLIATKAVMTEIPKEETDPGT Hspd16P Rn GROEL pseudogene 16 X q32 + 1 110646161 110647153 YVLLSGKKISSVQIIVPTFEIANAHRKPLVIIAEDVDGEALRTLVLNRLKTILQVVTVKA PGFAENRKNQFKDMAIATGGVVFGEEGLNLNLKDVQPQDLREIGEAIVTKDDAILLKGKV TKLTLKNKKLNER LAKLSDGIAVLKFGGTCDIEVNEKKDRVTDDLNFPTAAVQEGIVLGR GCALLRCIPALNSLNLLMKNLSVESSLLVEKILQSSSEVAYKAMVVDFVNMVEKGIIDPT 
Hspd17P Rn GROEL pseudogene 17 X q31 + 4 84112821 84114244 KSILAQFVQDVANNTNKGTGDGTTTITVLAWSIANEGFENVSKGANPVEIQGSMMLTVDA VIVELKTQSKSMTTPEEIALL ATISANVKCNGESIGNIISDVTKKVGRKGCHDRKDGKAL SDELEVSESMKFDRGYISIYQHIKRSKCDFQDAYVLLSTKKVSSPLNRKYTCFEQTQSWS SGCSSQSSKVWGQEEPLTDMAIATGGGVFGEEGLNLNLEDAKAHHLRKVGEVIVTKDDAV LLKGKDGVAVLKVGEISDVEVNKKKDRVTDALSATRASVKEDIVLVGAALHFSSSSDVGY DAILRDFVSMEKGINNRSNKGCKNCFTGYCWGD QSCSDRNS Hspd18P Rn GROEL pseudogene 18 9 q22 1 46394084 46394824 GGAVFGEEGLNLSLEDVYAHDLGKVGEVNVNVTKDDATLLKGKGDKTDIEKCIQEITEQL DIPTGEYEKESWMSDLQNFQTEAVLKAGGQVTLKCVRRRQGYTCSQCYRSSCGRRGCALL RCIPVLDSLKPANEDQKIGIEITKRAGKIPAMMTAKNEGVEGSLIVEKILQSSSEVGYEA MLGDFVNMAEKGIIDRTKVVRTTLLDAGVASLLTTTEAVVTEIPKEEKDPGMGAMGGMGG GMGGG Hspd19P Rn GROEL pseudogene 19 20 q13 1 52019335 52019963 KGDKAHIEKCTQEITEQLDITTKEGVADALKATRTAVEEGIVLGGGCAQVQCIPALDSLK PANEDQKIGIEIIKRALKIPVITIVKNAVVEGSLIVEKILQSSPEVGYGAMLGDFVSMVE KGIIDPTKFVRIALLDAAEVASLLTIAEAVVTEIPEEEKDPGKGSLGGMGWGTG Hspd20P Rn GROEL pseudogene 20 15 p16 + 2 8089404 8090167 MGPKRRTVIIEQNRKSPKATKDGVTVAKSISVKDKHKTIRDKLVQDVANNTKNRLEMAPP LPLFRHSLLPRRALRRSAKGKSREVRKTLKDQKIGREIIKRALKIPAMIIVKSAGLEGSL IVEKILQSSSEVG YDSMLGDFVNMVEKGIINPKKILRAALPNAAGMASLLTTLEAVVT Hspd21P Rn GROEL pseudogene 21 X q35 + 1 132806121 132806537 GIVLGVGCALLQCMPALNSLKATNEDQQIGVEIIKRALKIPAMMIAKNAGVEGSLIVVKI LQSSSEVGYDAMLGDFVNMMEKGIIDPTNVVRTALLDAAGVASLLTTVEAVVPEIPKEEK DPGMGAMGGMGGGMEGG Hs pd22P Rn GROEL pseudogene 24 17 q12.1 1 58562739 58563142 KEKLNERLAKLSDGVAVLKVGGPSDVEVNEKKARVHRCSQCSKTAAVEEGIVLGEGCALL RCIPALDSLKLANEDQKIGIEIIKRALKIPAMIIAKNA Hspd23P Rn GROEL pseudogene 25 2 q31 4 151675083 151676174 MKEIGRKSVITVKDGKPLNNELEI DAYFLLSEKKISSVPFIVPALEIGNAYWKSLVIIAE DNDGKAIRSLVLNMLKFAPQVVVIRALGFGDNRQNHMAIVSGAHVLRKRMHVLKDHVLTK DDDKLLKEKGDKLKLKNVFKKSLSSWTSQLFLRHWHDAMLRDFVNMWKGEIIDPTKVVRT TLLDAVEMALLVTTAET Hspd24P Rn GROEL pseudogene 26 15_random 1 414366 414686 QKICVESIKRALK IPAMTIAKNAGIEGSLIVEKILQSSSEVGYDAMLGDFVNMVEKGIID LTKVVRTALLDAAGVASLLMTAEAVVTEIPKEEKEPAMGAM Hspd25P Rn GROEL 
pseudogene 27 2 q22 2 75608618 75611557 SIEPALKIANAHWKPLVITADDIDKKSLSTLDLTRLKLLKKVLFLEGPELCFRTLKIPAM KFSKNVGVEESLTVDKILQRSSEVGYNAMLEDFVNMVEKG IIDPTKIIRLLYWMLL Hspd26P Rn GROEL pseudogene 28 6 q23 1 80028738 80029070 MTIAKNAGVERSSIVEKILQNSSEAGYDIMLGDFVNMVEKGIIDPTNILRTALLGASGVT SLLTTAKAVVKEISKEEKDPGMDAMGGMGGGMG Hspd27P Rn GROEL pseudogene 29 7 q11 + 2 11777078 11777682 IFPYFNDTSKDDRC ELQDAYFLLSEKKISSVPFIVPSLEIGNAYWKSLVIIAEDDDGKAI SSLVLNMHLSEYVKERLNEKLANFQM Hspd28P Rn GROEL pseudogene 30 2 q16 + 1 62800517 62800732 SSSEVGYEAMLGDFVNMVEKGIIDPQKVVRTASLDATGVASLLTTSGAVVTEIPKEEKDP GV Hspd29P Rn GROEL pseudogene 31 5 q12 5 1469682 8 14698283 MLPITQMKRLGSGSPIATLLSRRALRRLAKATVPEEIVQVAMISANGDEDIGNIICDAKK MVGRTMLTRSPDALSSVGLNRLIVGIQVITAKSPRFGGNRKNQTKDMAVTPGGAVLKERE WNSNLKDVQAHDLGMSVKRKNYEWLMKASGGAAVLKVGGSSDAEENEEKDCYGYSECFTS SCSEKKKKKRNVKNAVVEGTLIVKEILQSSAEVGYDAARWKEESLIQQR Hspd30P R n GROEL pseudogene 32 9 q31 1 53891997 53892110 MLRLSTVLRQMRPVSRALAPHLTRAYAKDVKF Hspd31P Rn GROEL pseudogene 33 4 q22 + 1 53483173 53483310 ASLLTTAKAVVTEIPKEEKDPRMGAMGRM Hspd32P Rn GROEL pseudogene 34 4 q13 1 30834360 30834470 MVEKGIIGS TMVVRTALLDTPGVASLLTTTEAV


Tcp1 Rn NP_036802 NM_012670.1 Cctalpha (real gene) vert 1 q12 12 42103629 42111145 MEGPLSVFGDRSTGEAIRSQNVMAAASIANIVKSSLGPVGLDKMLVDDIGDVTITNDGATILKLLEVEHP AAKVLCELADLQDKEVGDGTTSVVIIAAELLKNADELVKQKIHPTSVISGYRLACKEAVRYINENLIINT DELGRDCLINAAKTSMSSKIIGINGDFFANMVVDAVLAVKYTDIRGQPRYPVNSVNILKAHGRSQIESML INGYALNCVVGSQGMLKRIVNAKIACLDFSLQKTKMKLGVQVVITDPEKLDQIRQRESDITKERIQKILA TGANVILTTGGIDDMCLKYFVEAGAMAVRRVLKRDLKRIAKASGASILSTLANLEGEETFEATMLGQAEE VVQERICDDELILIKNTKARTSASIILRGANDFMCDEMERSLH DALCVVKRVLESKSVVPGGGAVEAALS IYLENYATSMGSREQLAIAEFARSLLVIPNTLAVNAAQDSTDLVAKLRAFHNEAQVNPERKNLKWIGLDL VHGKPRDNKQAGVFEPTIVKVKSLKFATEAAITILRIDDLIKLHPESKDDKHGGYENAVHSGALDD

Table S7. Gene and pseudogene information, including start and end position, chromosomal location, strand, number of exons, GenBank accession number for functional genes, and Ensembl or Pseudogene.org ID for pseudogenes.




[Tree image not reproduced. Leaf labels: thsa, Hs_CCT5, Hs_CCT4, Hs_CCT3, Hs_CCT6B, Hs_CCT6A, Hs_CCT7, Hs_CCT1, Hs_CCT2, Hs_CCT8, Hs_CCT8L1, Hs_CCT8L2. Node support values in extraction order: 100, 100, 43, 100, 15, 17, 26, 78, 36. Scale bar: 0.2.]

Figure S1. ML tree of human CCT and CCT8L proteins, excluding BBS. "thsa" indicates the Thermoplasma acidophilum alpha subunit of the thermosome. The scale bar represents the indicated number of substitutions per position for a unit branch length.




[Tree image not reproduced. Leaf labels: thsa, Hs_CCT5, Hs_CCT4, Hs_CCT3, Hs_CCT6B, Hs_CCT6A, Hs_CCT1, Hs_CCT2, Hs_CCT7, Hs_CCT8, Hs_MKKS. Node support values in extraction order: 100, 47, 55, 62, 7, 14, 59, 20. Scale bar: 0.2.]

Figure S2. ML tree of human CCT proteins with MKKS (BBS6), excluding other BBS proteins and CCT8L proteins. "thsa" represents the Thermoplasma acidophilum alpha subunit of the thermosome. The scale bar represents the indicated number of substitutions per position for a unit branch length.




[Tree image not reproduced. Leaf labels: thsa, Hs_CCT5, Hs_CCT4, Hs_CCT3, Hs_CCT6B, Hs_CCT6A, Hs_CCT2, Hs_CCT7, Hs_CCT1, Hs_CCT8, Hs_BBS10. Node support values in extraction order: 53, 77, 23, 3, 100, 1, 1, 30, 23. Scale bar: 0.2.]

Figure S3. ML tree of human CCT proteins with BBS10, excluding other BBS proteins and CCT8L proteins. "thsa" represents the Thermoplasma acidophilum alpha subunit of the thermosome. The scale bar represents the indicated number of substitutions per position for a unit branch length.




Table S4. Expression pattern of the human CCT genes¹

Body tissue/site CCT1 CCT2 CCT3 CCT4 CCT5 CCT6A CCT6B CCT7 CCT8 CCT8L1 CCT8L2 BBS6 BBS10 BBS12
adipose 3 0 5 4 2 1 0 6 5 0 0 0 1 0
adrenal 16 4 23 12 17 6 0 28 13 0 0 2 1 1
ascites 11 15 48 18 25 10 0 43 13 0 0 1 0 0
bladder 14 14 19 10 19 5 1 14 17 0 0 1 0 2
blood 44 28 50 16 64 19 0 58 36 0 0 4 1 3
bone 8 29 23 31 20 25 0 21 14 0 0 2 2 1
bone marrow 7 12 26 6 18 11 0 9 14 0 0 0 0 0
brain 494 162 655 281 424 118 10 663 417 0 4 105 36 18
cervix 8 8 65 26 38 16 1 55 21 0 0 3 5 0
connective 26 17 69 96 25 20 1 50 27 0 0 9 0 4
ear 4 1 4 1 0 6 0 0 9 0 0 0 0 1
embryonic 48 84 118 59 94 98 0 104 83 0 0 11 4 1
esophagus 11 7 34 19 32 3 0 44 28 0 0 2 0 0
eye 31 41 145 52 67 71 1 133 64 0 0 21 4 4
heart 32 18 44 33 13 20 0 58 32 0 0 8 0 1
intestine 88 43 183 53 87 39 0 156 105 0 0 10 2 3
kidney 45 38 132 41 50 50 0 85 52 0 1 14 4 8
larynx 2 5 4 2 2 0 0 1 0 0 0 0 1 0
liver 42 37 98 89 93 66 1 91 59 0 0 13 1 0
lung 61 58 254 94 115 68 1 150 88 0 0 12 5 2
lymph 11 6 171 14 21 3 4 0 71 21 0 0 0 0 0
lymph node 14 15 17 24 14 28 0 9 5 0 0 0 2 0
mammary 43 36 84 39 68 17 0 71 25 0 0 6 5 0
mouth 4 1 9 4 10 12 0 10 4 0 0 7 2 3
muscle 17 9 66 17 17 26 1 78 17 0 0 1 3 2
nerve 2 1 3 0 2 1 0 3 3 0 0 1 0 0
ovary 19 14 91 36 42 11 1 81 30 0 0 4 0 0
pancreas 26 30 82 25 49 43 6 143 25 0 0 7 1 0
parathyroid 2 1 4 11 1 7 0 0 0 0 0 3 0 2
pharynx 8 6 45 5 22 3 1 5 23 0 0 4 0 0
pituitary 2 6 5 1 6 3 0 4 7 0 0 1 0 0
placenta 61 72 140 82 109 65 0 115 75 1 0 11 1 0
prostate 39 28 113 33 5 0 46 2 49 36 0 0 11 1 0
salivary 0 1 6 2 4 2 0 9 8 0 0 1 0 0
skin 44 57 239 58 86 103 1 164 59 0 0 11 6 3
spleen 22 2 23 9 14 2 1 23 19 0 0 6 0 2
stomach 22 27 54 25 35 30 4 55 26 0 0 11 1 0
testis 202 174 283 146 234 51 160 313 177 0 10 25 7 29
thymus 60 7 26 18 16 4 0 34 37 0 0 6 5 0
thyroid 15 7 9 9 9 5 0 9 24 0 0 1 0 0
tonsil 6 0 13 2 5 3 0 16 6 0 0 1 0 0
trachea 10 6 6 7 5 0 1 16 11 0 0 5 5 3
umbilical 9 8 16 8 8 3 0 11 13 0 0 1 0 0
uterus 65 54 119 60 84 87 1 120 87 0 0 21 3 7
vascular 16 13 46 14 30 3 2 32 17 0 0 0 0 1

¹ Number of ESTs reported for each body tissue/site.




Table S1. The mouse hsp60 genes and pseudogenes

A. Genes

Name            Start¹        End²          Str³  Chr⁴  Loc⁵    Ex⁶
Cct1            13,109,331    13,117,933    +     17    A2      12
Cct2            116,490,071   116,500,106         10    qD2     14
Cct3            88,103,257    88,125,467    +     3     qF1     13
Cct4            22,890,754    22,902,933    +     11    qA3.3   13
Cct5            31,520,686    31,531,460          15    qB3.2   11
Cct6a           130,293,315   130,321,693   +     5     F       13
Cct6b           82,532,867    82,577,729          11    B5      14
Cct7            85,409,109    85,418,268    +     6     D1      11
Cct8            87,484,236    87,496,033          16    qC3.3   15
Gm443 (CCT8L)   25,022,107    25,023,792    +     5     qA3     1
Mkks            136,700,005   136,706,971         2     qF3     4
Bbs10           110,735,779   110,738,219   +     10    qD1     4
Bbs12           37,217,982    37,220,105    +     3     qB      1
Hspd1           55,135,135    55,143,783          1     qC1.2   11

B. Cct related pseudogenes

Name            Start¹        End²          Str³  Chr⁴  Loc⁵    Ex⁶
Cct1 1P         67,963,157    67,964,324          4     qC1     3
Cct1 2P         13,129,036    13,136,541    +     17    qA1     2
Cct3 1P         90,721,084    90,722,575          11    qC      5
Cct3 2P         52,913,471    52,915,552    +     4     B3      3
Cct3 3P         113,303,406   113,303,687         6     qE3     1
Cct4 1P         26,227,193    26,228,371          7     qA3     3
Cct6A 1P        52,361,244    52,362,111    +     8     qB1.3   3
Cct6A 2P        97,306,133    97,306,902          14    qE2.1   2
Cct6A 3P        79,544,122    79,545,468    +     18    qE3     1
Cct7 1P         12,863,162    12,881,543    +     X     A1.3    5
Cct7 2P         87,401,390    87,403,113    +     18    E4      5
Cct7 3P         20,811,426    20,812,537    +     1     qA4     4
Cct7 4P         48,381,857    48,382,517          11    qB1.2   1
Cct7 5P         58,647,550    58,647,741    +     13    qB1     1
Cct8 1P         51,795,721    51,795,882    +     2     qC1.1   1
Cct8 2P         35,774,568    35,782,271          3     qB      3


Table S1 (continued)
C. Hspd1-related pseudogenes
Name  Start 1  End 2  Str 3  Chr 4  Loc 5  Ex 6
Hspd1-1P 7  41,312,239  41,313,954  +  11  qA5  1
Hspd1-2P  105,808,716  105,810,383  -  4  qC7  1
Hspd1-3P  101,742,761  101,744,429  +  14  qE2.3  5
Hspd1-4P  64,965,611  64,967,286  -  8  qB3.1  3
Hspd1-5P  85,271,516  85,279,593  +  8  qC2  6
Hspd1-6P  49,455,871  49,457,564  -  1  qC1.1  5
Hspd1-7P  77,889,757  77,891,210  -  3  qE3  3
Hspd1-8P  89,108,911  89,111,283  -  4  qC4  5
Hspd1-9P  11,672,585  11,673,644  -  12  qA1.1  1
Hspd1-10P  93,743,990  93,745,672  +  X  qC3  2
Hspd1-11P  12,515,616  12,517,225  +  6  qA1  3
Hspd1-12P  45,234,912  45,235,851  +  1  qC1.1  2
Hspd1-13P  24,133,639  24,135,093  -  15  qA2  2
Hspd1-14P  79,001,216  79,001,752  +  2  qC3  1
Hspd1-15P  12,819,585  12,820,103  +  9  qA1  1
Hspd1-16P  61,872,231  61,873,436  -  3  qE1  1
Hspd1-17P  5,053,347  5,054,702  +  1  qA1  2
Hspd1-18P  58,011,087  58,011,631  -  8  qB2  1
Hspd1-19P  61,460,831  61,461,298  +  16  qC1.3  1
Hspd1-20P  98,441,180  98,441,466  +  2  qE1  2
Hspd1-21P  91,375,464  91,376,660  +  4  qC5  2
Hspd1-22P  48,192,945  48,193,097  -  17  qC  1
1 Start and 2 End of the gene or pseudogene in the genome; 3 Strand; 4 Chromosome; 5 Location; 6 Number of exons; 7 Identified in Ensembl.




Table S5. Expression pattern of the human cpn60 gene (HSPD1) and pseudogenes 1
Body tissue/site  HSPD1  HSPD1-5P (LOC644745)  HSPD1-6P (LOC645548)
adipose  6  0  0
adrenal  96  0  0
ascites  31  1  0
bladder  21  0  1
blood  90  0  0
bone  14  0  0
bone marrow  37  1  0
brain  537  1  0
cervix  24  2  0
connective  32  0  0
ear  4  0  0
embryonic  101  1  1
esophagus  22  0  0
eye  45  1  0
heart  33  0  0
intestine  107  0  2
kidney  146  0  0
larynx  4  0  0
liver  143  1  0
lung  96  0  0
lymph  30  0  0
lymph node  4  0  0
mammary  35  1  0
muscle  18  0  0
nerve  43  0  0


Table S5 (continued)
Body tissue/site  HSPD1  HSPD1-5P (LOC644745)  HSPD1-6P (LOC645548)
ovary  3  0  0
pancreas  27  0  0
parathyroid  24  0  0
pharynx  1  0  0
pituitary  12  0  0
placenta  3  0  0
prostate  61  0  0
salivary  50  10  0
skin  4  0  0
soft  75  0  0
spleen  21  0  0
stomach  31  0  0
testis  135  0  0
thymus  38  0  0
thyroid  17  0  1
tongue  0  0  0
tonsil  19  0  0
trachea  6  0  0
umbilical cord  78  2  0
uterus  43  0  0
vascular  6  0  0
1 Number of ESTs reported in each body tissue/site




Table S2. The rat hsp60 genes and pseudogenes
A. Genes
Name  Start 1  End 2  St 3  Chr 4  Loc 5  Ex 6
Tcp1  42,103,629  42,111,145  -  11  q12  12
Cct2  56,394,922  56,402,261  -  7  q22  9
Cct3  180,383,106  180,433,933  +  2  q34  12
Cct4  103,675,244  103,687,376  +  14  q22  13
Cct5  83,681,899  83,692,935  -  2  q22  11
Cct6a  27,992,366  28,002,203  +  12  q13  14
Cct6b (L63658)  72,107,176  72,144,991  -  10  q26  11
Cct7  119,714,400  119,722,770  +  4  q34  11
Cct8  27,234,460  27,245,292  -  11  q11  15
Cct8L (125233)  5,004,090  5,005,772  -  4  q11  1
Mkks  124,975,607  124,981,825  -  3  q36  4
BBS10 (60748)  50,317,911  50,320,406  +  7  q21  2
BBS12 (61608)  123,866,726  123,868,843  +  2  q25  1


Table S2 (continued 1)
B. Cct-related pseudogenes
Name  Start 1  End 2  St 3  Chr 4  Loc 5  Ex 6
Cct1-1P  122,326,746  122,328,035  -  4  q34  1
Cct2-1P  94,540,595  94,541,059  +  X  q31  2
Cct3-1P  21,483,523  21,485,110  +  19  p11  3
Cct3-2P  51,648,197  51,649,727  -  13  q13  4
Cct3-3P  93,409,548  93,410,990  +  X  q31  3
Cct3-4P  149,237,097  149,237,635  -  4  q42  2
Cct5-1P  111,419,580  111,419,580  -  X  q32  2
Cct5-2P  17,626,224  17,626,583  -  8  q13  1
Cct6A-1P  31,685,049  31,688,384  -  8  q13  5
Cct6A-2P  88,702,370  88,703,903  +  10  q32.1  2
Cct6A-3P  240,279  241,905  +  5  q11  3
Cct6A-4P  76,957,850  76,959,406  -  5  q24  3
Cct6A-5P  35,236,443  35,237,988  -  18  p11  4
Cct6A-6P  67,148,659  67,170,130  -  1  q12  5
Cct6A-7P  1,252,980  1,254,555  -  1 rnd 7  N.A. 8  5
Cct6A-8P  79,050,104  79,053,455  -  13  q22  5
Cct6A-9P  88,811,021  88,812,383  -  16  q12.5  4
Cct6A-10P  67,148,933  67,150,124  -  1  q12  3
Cct6A-11P  83,591,559  83,592,798  +  6  q23  5
Cct6A-12P  50,508,448  50,509,665  -  9  q22  4
Cct6A-13P  66,456,610  66,457,113  -  14  q11  1
Cct6A-14P  67,168,555  67,168,845  -  1  q12  1
Cct6A-15P  8,728,585  8,728,815  +  5  q11  2
Cct6A-16P  76,283,952  76,284,215  +  18  q12.3  1
Cct7-1P  216,186,823  216,188,454  +  1  q43  1
Cct7-2P  145,306,136  145,307,710  +  5  q36  3
Cct7-3P  19,556,657  19,557,897  +  9  q13  3
Cct8-1P  43,940,890  43,941,006  +  11  q12  1
Cct8-2P  32,962,674  32,962,763  +  3  q12  1


Table S2 (continued 2)
C. Hspd1-related pseudogenes
Name  Start 1  End 2  St 3  Chr 4  Loc 5  Ex 6
Hspd1-1P  107,651,987  107,653,696  +  14  q22  1
Hspd1-2P  42,554,408  42,556,112  +  14  p11  1
Hspd1-3P  58,629,782  58,633,066  +  17  q12.1  2
Hspd1-4P  41,523,338  41,525,050  -  14  p11  2
Hspd1-5P  179,041,398  179,043,070  -  1  q35  2
Hspd1-6P  30,691,244  30,692,932  +  20  q11  4
Hspd1-7P  103,951,550  103,953,587  -  1  q22  2
Hspd1-8P  3,553,285  3,554,991  +  13  p13  1
Hspd1-9P  234,250,749  234,252,288  +  2  q43  1
Hspd1-10P  76,955,304  76,956,924  -  5  q24  4
Hspd1-11P  19,170,317  19,172,008  +  6  q12  6
Hspd1-12P  220,584,917  220,585,779  +  2  q42  1
Hspd1-13P  43,887,785  43,896,197  +  9  q22  4
Hspd1-14P  109,039,079  109,040,435  -  5  q32  4
Hspd1-15P  37,890,620  37,892,172  +  4  q21  5
Hspd1-16P  110,646,161  110,647,153  +  X  q32  1
Hspd1-17P  84,112,821  84,114,244  +  X  q31  4
Hspd1-18P  46,394,084  46,394,824  -  9  q22  1
Hspd1-19P  52,019,335  52,019,963  -  20  q13  1
Hspd1-20P  8,089,404  8,090,167  +  15  p16  2
Hspd1-21P  132,806,121  132,806,537  +  X  q35  1
Hspd1-22P  58,562,739  58,563,142  -  17  q12.1  1
Hspd1-23P  151,675,083  151,676,174  -  2  q31  4
Hspd1-24P  414,366  414,686  -  15 rnd 7  N.A. 8  1
Hspd1-25P  75,608,618  75,611,557  -  2  q22  2
Hspd1-26P  80,028,738  80,029,070  -  6  q23  1
Hspd1-27P  11,777,078  11,777,682  +  7  q11  2
Hspd1-28P  62,800,517  62,800,732  +  2  q16  1
Hspd1-29P  14,696,828  14,698,283  -  5  q12  5
Hspd1-30P  53,891,997  53,892,110  -  9  q31  1
Hspd1-31P  53,483,173  53,483,310  +  4  q22  1
Hspd1-32P  30,834,360  30,834,470  -  4  q13  1
1 Start and 2 End of the gene/pseudogene in the genome; 3 Strand; 4 Chromosome; 5 Location; 6 Number of exons; 7 Location not known; 8 N.A., Not Available




Table S6. Residue types at monomer-monomer interaction positions in thermosome, CCT, BBS, and CCT8L proteins
A. Intra-ring
Position  Thermosome 1  CCTs 2  BBS12 3  BBS10 3  MKKS 3  CCT8L 3  Conserved aa types in CCTs
19  Q  HQRST  R  S  R  S
24  Q  LQRV  R  V  V  L
27  N  HNS  G*  A*  T*  S
40  T  ST  T  C*  S  P*  Hydroxyl
45  K  KNRV  L*  E*  S*  H*
47  M  LMT  S*  R*  R*  R*
48  D  DMN  S*  Q*  L*  Q*
49  K  K  K  V*  K  K  Lysine
50  M  ILM  F*  L  Q*  F*  Intermediate size hydrophobic
51  L  ILMV  I  C*  L  L  Intermediate size hydrophobic
52  V  ILQV  I  T*  H*  V
57  D  DGK  H*  E*  G  E*
58  I  AILV  S*  V  V  T*  Aliphatic
60  I  ILMV  L  L  T*  C*  Intermediate size hydrophobic
76  P  PQ  A*  P  P  P
77  T  AIT  V*  I  I  A
79  K  KRS  Q*  R  K  W*
80  M  LMSTV  L  M  I*  L
83  E  DEKM  E  D  A*  E
87  A  ALMSTV  A  S  N*  T
88  Q  Q  Q  H*  H*  Q  Glutamine
89  D  DE  N*  L*  V*  A*  Acidic
90  T*  ADEIKQ  N*  K  S*  E
18  V*  ILM  V*  K*  L  L  Intermediate size hydrophobic
119  H  HS  P*  D*  T*  P*
120  P  PV  I*  P  P  R
121  T  IQRST  S  L*  T  P*
128  R  EQR  S*  K*  K*  A*
135  R  HILRV  S*  L  I  L
163  N*  AILQV  P*  V  P*  P*
166  L*  EHIQRS  S  N  M*
203  N*  HKLPQV  P  N*  K  HP
206  S  AGILRST  P*  P*  R  T
207  V*  EGILQ  E  V*  V*  L
208  N*  ADEMST  T  S  I*  E
224  H  GHNSVY  N  V  Q*  G
226  K*  DGQR  P*  M*  M*
245  I*  AGKLPTY  Y  P  G  P
246  K  DEKMPT  R*  L*  D  A*
247  K  AIK  H*  F*  T*  H*
248  T  EGIMPT  L*  S*  S  P
249  E  DEK  G*  T*  D  N*  Charged
250  I  DILMSTV  F*  S  T  A*
251  E*  DGKNQ  N  G  G  P*
252  A  AGHNSTV  K*  S  E*  A
253  K  DEGKQRT  S*  E  G  T
254  V  FILV  A*  F  T*  A*  Hydrophobic


Table S6 (continued 1)
Position  Thermosome 1  CCTs 2  BBS12 3  BBS10 3  MKKS 3  CCT8L 3  Conserved aa types in CCTs
255  Q*  DEFLRV  N*  I*  V  C*R
256  I  IVY  I  L*  V  L*
257  S  DHKST  K  N*  V*  S
259  P  AEPTVY  V  E  Y  P
262  I*  FKLMRVY  S*  F  L
263  Q  ADEKMQT  M  Q  V*  A
264  D*  AEKNQR  R  T*  S*  Q
265  F  FILV  L  S*  L  F  Hydrophobic
266  L  ELQRSV  Q  Q  E  S
268  Q*  AEGMRY  D*  W*  A  G
269  E  E  S*  I*  V*  S*  Glutamate
270  T*  EKRSW  S  M*  L*  D*
274  K  DEKLQY  N*  K  L  E
275  Q  ADENQ  H*  A  N  K*
297  V  EFLMVY  R*  L  S*  E
300  H  DHKQ  E*  Y*  Q  T*
301  Y  AFLSY  K*  Y  F  L
304  K  ADEKQR  N*  V*  M*  K
315  K  EFGKR  G  S*  V*  RW*
319  E  EKNR  Q*  S*  E  I*
328  K  EKMRSTV  V  S  Q*  P*
330  V  ALQV  V  F*  I*  L
331  T  ANPST  A  V*  G*  P
332  D  HRST  Y*  P  S  R
335  D  ADEHNPQ  Q  A  S*  P
340  V  CDEKMV  C  L*  S*  R*
354  D  DEGKS  R*  Y*  D
376  G  AGPS  P  P  R*  A
377  T  ANST  V*  V*  N  T
378  D  DEKQ  T*  H*  D  T*
379  H  EFHLMNQ  A*  G*  T*  Q
380  V  FILMTV  Q*  L  A*  G*  Hydrophobic
500  H  KNQSW  E*  Q  S  Q
503  E  HKLST  R*  T  Q*  R*
504  S  AFLST  R*  S  V*  A
507  E  ENQV  D*  Q  E  E
508  V  AIMT  L*  C*  T  V*
510  T  CEIRTV  L*  T  N*  L*
511  M  LMNSTV  L  K*  L  Q*
514  R  KLRS  Q*  T*  D*  T*
515  I  IV  T*  I  L*  V  Beta-branched aliphatic
516  D  D  D  D  S*  D  Aspartate
517  D  DENQ  S*  M*  Y*  E
518  V  ILTV  E*  V  V  I  Intermediate size hydrophobic
519  I  IMRV  I  I  I  V
520  A  KMNRS  I*  T*  E*  V*


Table S6 (continued 2)
B. Inter-ring
Position  Thermosome 1  CCTs 2  BBS12 3  BBS10 3  MKKS 3  CCT8L 3  Conserved aa types in CCTs
25  R*  AFKLSVY  H*  K  R*  L
29  E*  CIMNQS  Q  Q  S  A*
108  K  DEKRS  S  R  E  E  Charged/hydrophilic
115  D  ADEKQRS  H*  D  K
116  Q  EIKQR  L*  R  A*F*
117  G  GKQ  G  E*  G  G
429  R  KLMRV  W*  E*  E*  P*
432  L  LVWY  N*  M*  L  P*
439  K  DEKQR  S*  N*  S*  RW*  Charged/hydrophilic
446  R  KMNRST  T  K  G*  K
450  E  DEQV  N*  K*  H*  E
455  D  DKNS  S  K  I*  A*  Charged/hydrophilic
456  P  ALPST  S  Y*  L  V*
457  I  AINQT  E*  S*  T  S*
458  N  DENQRS  F*  F*  D  D
1 Residue observed in the model structure of thermosome subunit A from T. acidophilum; 2 Residues observed in all human CCT subunits; 3 Residues observed in human BBS and CCT8L proteins; * Residue types not observed in any human CCT subunit.
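The asterisk convention of Table S6 amounts to a membership test: a residue found in a BBS or CCT8L protein is flagged when it does not occur at that position in any human CCT subunit. The short Python sketch below illustrates the rule on intra-ring position 27. The function name `flag_residue` is hypothetical and is not part of the original analysis.

```python
def flag_residue(residue: str, cct_residues: str) -> str:
    """Append '*' when the residue is absent from the set seen in human CCT subunits."""
    return residue if residue in set(cct_residues) else residue + "*"

# Intra-ring position 27: residues observed across human CCT subunits are H, N and S.
cct_27 = "HNS"
observed = {"BBS12": "G", "BBS10": "A", "MKKS": "T", "CCT8L": "S"}

flags = {protein: flag_residue(aa, cct_27) for protein, aa in observed.items()}
print(flags)
```

Applied to this position, the rule flags the BBS12, BBS10 and MKKS residues and leaves the CCT8L serine unflagged, as in the corresponding table row.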




Table S3. Lineage-specific mutation events along human and chimp CCT8L evolutionary branches. 1
Name  Branch No. 2  Codon base I  Codon base II  Codon base III
Human CCT8L1  1  4  3  5
Chimp CCT8L1  2  6  3  5
Common CCT8L1  3  2  1  2
Total CCT8L1    12  7  12
Human CCT8L2  4  3  1  5
Chimp CCT8L2  5  1  3  3
Common CCT8L2  6  2  4  7
Total CCT8L2    6  8  15
1 Number of substitution events inferred by parsimony along the indicated evolutionary branches; 2 Branch numbers refer to the schematic evolutionary tree represented in Figure 3.
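The "Total" rows of Table S3 are the position-wise sums of the human, chimp and common branches. A minimal Python sketch of that bookkeeping follows, with the counts copied from the table; the variable and function names are illustrative assumptions only.

```python
# Substitution counts per codon position (I, II, III), copied from Table S3.
branches = {
    "CCT8L1": {"human": (4, 3, 5), "chimp": (6, 3, 5), "common": (2, 1, 2)},
    "CCT8L2": {"human": (3, 1, 5), "chimp": (1, 3, 3), "common": (2, 4, 7)},
}

def column_totals(rows):
    """Sum substitution counts position by position across branches."""
    return tuple(sum(column) for column in zip(*rows))

totals = {gene: column_totals(counts.values()) for gene, counts in branches.items()}
print(totals)
```

The sums reproduce the totals listed in the table for both genes.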


!DOCTYPE art SYSTEM 'http:www.biomedcentral.comxmlarticle.dtd'
ui 1471-2148-10-64ji 1471-2148fm
dochead Research article
bibl
title
p Chaperonin genes on the rise: new divergent classes and intense duplication in human and other vertebrate genomes
aug
au id A1 snm Mukherjeefnm Krishanuinsr iid I1 I2 email krishanu@ufl.edu
A2 Conway de MacarioEverlyI3 everlyc@gmail.com
ca yes A3 Macariomi JLAlbertomacarioster@gmail.com
A4 BrocchieriLucianolucianob@ufl.edu
insg
ins Department of Molecular Genetics and Microbiology, University of Florida, College of Medicine, 1660 SW Archer Road, Gainesville, FL 32610, USA
Genetics Institute, University of Florida, Cancer and Genetics Research Complex, 2033 Mowry Road, Gainesville, FL 32610, USA
University of Maryland, Columbus Center, 701 East Pratt Street, Baltimore, MD 21202, USA
source BMC Evolutionary Biology
issn 1471-2148
pubdate 2010
volume 10
issue 1
fpage 64
url http://www.biomedcentral.com/1471-2148/10/64
xrefbib pubidlist pubid idtype pmpid 20193073doi 10.1186/1471-2148-10-64
history rec date day 13month 8year 2009acc 132010pub 132010
cpyrt 2010collab Mukherjee et al; licensee BioMed Central Ltd.note This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
abs
sec
st
Abstract
Background
Chaperonin proteins are well known for the critical role they play in protein folding and in disease. However, the recent identification of three diverged chaperonin paralogs associated with the human Bardet-Biedl and McKusick-Kaufman Syndromes (BBS and MKKS, respectively) indicates that the eukaryotic chaperonin-gene family is larger and more differentiated than previously thought. The availability of complete genome sequences makes possible a definitive characterization of the complete set of chaperonin sequences in human and other species.
Results
We identified fifty-four chaperonin-like sequences in the human genome and similar numbers in the genomes of the model organisms mouse and rat. In mammal genomes we identified, besides the well-known CCT chaperonin genes and the three genes associated with the MKKS and BBS pathological conditions, a newly-defined class of chaperonin genes named CCT8L, represented in human by the two sequences CCT8L1 and CCT8L2. Comparative analyses from several vertebrate genomes established the monophyletic origin of chaperonin-like MKKS and BBS genes from the CCT8 lineage. The CCT8L gene originated from a later duplication also in the CCT8 lineage at the onset of mammal evolution and duplicated in primate genomes. The functionality of CCT8L genes in different species was confirmed by evolutionary analyses and in human by expression data. Detailed sequence analysis and structural predictions of MKKS, BBS and CCT8L proteins strongly suggested that they conserve a typical chaperonin-like core structure but that they are unlikely to form a CCT-like oligomeric complex. The characterization of many newly-discovered chaperonin pseudogenes uncovered the intense duplication activity of eukaryotic chaperonin genes.
Conclusions
In vertebrates, chaperonin genes, driven by intense duplication processes, have diversified into multiple classes and functionalities that extend beyond their well-known protein-folding role as part of the typical oligomeric chaperonin complex, emphasizing previous observations on the involvement of individual CCT monomers in microtubule elongation. The functional characterization of newly identified chaperonin genes will be a challenge for future experimental analyses.
meta
classifications
classification endnote subtype user_supplied_xml type bmc
bdy
Background
Hsp60-like chaperonin proteins are well known for their role in assisting protein folding and in protecting cells from the deleterious effects of stress abbrgrp
abbr bid B1 1
B2 2
B3 3
B4 4
B5 5
. The eukaryotic cell expresses representatives of two distinct groups of chaperonin genes that are otherwise typical of bacteria (Group I) or archaea (Group II). In eukaryotes, Group I chaperonins are mostly expressed in mitochondria and chloroplasts, and Group II chaperonins are found in the eukaryotic cytosol
1
B6 6
B7 7
B8 8
B9 9
B10 10
. Chaperonin proteins form typical multi-subunit double-ringed structures collectively called "chaperonins"
9
10
B11 11
B12 12
B13 13
. The Group I chaperonins are typically formed by the products of a single gene (groEL in bacteria; hsp60/cpn60 in mitochondria) assembled into a 14-subunit double-ringed structure in bacteria and into a double- or single-ringed structure in mitochondria
B14 14
. Eukaryotic Group II chaperonin proteins assemble in a similar double-ringed oligomeric structure, called TRiC or CCT complex
B15 15
, composed of 16 subunits that in human are encoded by nine distinct genes (tcp1/cct1, cct2-5, cct6A-B, cct7-8)
8
9
10
. The CCT complex is mostly known for its role in folding the cytoskeleton proteins actin and tubulin
7
B16 16
and mutations in individual CCT subunits lead to defects in the functioning of the cytoskeleton and mitosis arrest
B17 17
.
As for other chaperones, the malfunctioning of chaperonin proteins has been associated with various human pathological conditions, the chaperonopathies
B18 18
B19 19
B20 20
. In this respect, besides the canonical cct and cpn60 genes described above, three divergent hsp60-like genes have been more recently identified
B21 21
B22 22
B23 23
in association with pathological conditions. One gene, MKKS
21
, was named for its association with the developmental disease McKusick-Kaufman Syndrome and was soon after also identified as BBS6
B24 24
for its association with the Bardet-Biedl Syndrome (BBS), another developmental condition involving cilium-related dysfunction
B25 25
. More recently two other hsp60-like BBS genes, named BBS10
22
and BBS12
23
, have been identified among fourteen genes (BBS1 to BBS14) so far associated with BBS. The protein products of MKKS/BBS6, BBS10 and BBS12 localize to the basal body of cilia and to the centrosome
B26 26
B27 27
B28 28
. We will hereafter refer to the MKKS/BBS6 gene as MKKS, and collectively to the three hsp60-like BBS genes as the "BBS genes". The identification of these genes provides new perspectives on the spectrum of functionalities of Hsp60-like proteins in eukaryotes and on their role in development.
The recognition of chaperonopathies has increased the importance of elucidating the entire set of chaperone genes present in the human genome
19
. The work reported here was conceived to: a) identify all Hsp60-like sequences encoded in the human and other genomes including all diverged chaperonin genes; b) reconstruct the evolutionary origins and relations of diverged chaperonin genes; c) distinguish with bioinformatics methods functional genes from pseudogenes; d) characterize structural properties of the corresponding proteins. We mostly devoted our attention to the characterization of the evolutionary history and structural properties of newly or recently identified sequences, referring the reader to the vast amount of published literature for information on functional/structural properties and the evolutionary history of mitochondrial Cpn60 or CCT-complex proteins.
Exhaustive searches of hsp60-like sequences were carried out in human and other genomes following and extending our "chaperonomics" methodological protocol
B29 29
. The extensive analysis of the genomes of human and other vertebrate species led to the identification and characterization of many previously unknown sequences and to the discovery of a new, mammal-specific class of chaperonin proteins. Classification, evolutionary analysis and structural characterization of diverged chaperonin-like sequences should provide valuable information for future studies on the functional roles of these proteins.
Results
Chaperonin sequences in the human genome
To identify all human hsp60-like sequences, we queried the human genome using the nine human CCT subunit and mitochondrial Cpn60 sequences. Analogous extensive searches were performed in the mouse and rat genomes using corresponding queries. In the human genome, we found a total of 54 sequences with significant similarity to Hsp60 proteins (Tables 1 and 2). Fifteen sequences had an NCBI Entrez
B30 30
gene descriptor assigned. Nine of these corresponded to the canonical CCT-subunit sequences and one, HSPD1, encoded the mitochondrial Cpn60 protein. Three sequences corresponded to the BBS genes MKKS, BBS10 and BBS12. We recovered two additional uncharacterized sequences designated in the NCBI Entrez Gene database as CCT8L1 and CCT8L2. Besides these complete Hsp60-like sequences, a sequence domain conserved across eukaryote species with highest similarity to the apical domain of the CCT3 protein has also been reported in PIKFYVE
B31 31
, a kinase belonging to the Fab1p protein family involved in corneal pathological conditions
B32 32
. In addition, we identified 39 other human hsp60 sequences that did not correspond to a gene descriptor in the NCBI Entrez Gene database (Table 2). All of these sequences contained in-frame stop codons or frame-shifts, suggesting that they were most likely pseudogenes. Thirty-five of these had not been described in the Pseudogene.org pseudogene database
B33 33
and 33 were not listed in the Ensembl database
B34 34
, and are here annotated and classified for the first time. In analogous searches of the complete genomes of mouse and rat, we identified 14 chaperonin genes in each genome (nine for the canonical CCT monomers, one for the mitochondrial Cpn60, three BBS genes and one CCT8L gene), together with 38 pseudogenes in mouse and 61 pseudogenes in rat (see additional file 1: Table S1, for mouse sequences, and additional file 2: Table S2, for rat sequences).
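One of the criteria used above to flag likely pseudogenes, the presence of in-frame stop codons, can be illustrated with a short Python sketch. This is not the authors' pipeline: the function name `count_inframe_stops` and the two toy sequences are hypothetical, the standard nuclear genetic code is assumed, and frame-shift detection (which requires alignment to the parental gene) is not attempted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code (assumption)

def count_inframe_stops(cds: str, frame: int = 0) -> int:
    """Count stop codons in the given reading frame, ignoring a terminal stop."""
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]  # a stop at the very end is the normal terminator
    return sum(codon in STOP_CODONS for codon in codons)

# Toy sequences invented for illustration only.
intact = "ATGGCTGAAAAGTAA"     # ATG GCT GAA AAG TAA
disrupted = "ATGGCTTGAAAGTAA"  # ATG GCT TGA AAG TAA (premature stop)

print(count_inframe_stops(intact), count_inframe_stops(disrupted))
```

A non-zero count in the frame of the parental gene corresponds to the "SC" column of Table 2.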
suppl
Additional file 1
text
b Table S1. Mouse hsp60 genes and pseudogenes.
file name 1471-2148-10-64-S1.DOC
Additional file 2
Table S2. The rat hsp60 genes and pseudogenes.
1471-2148-10-64-S2.DOC
tbl Table 1caption The human hsp60 genestblbdy cols 10
r
c left
Namesup 1
Alternative names
Start2
End3
Str4
Chr5
Loc6
IF7
Exons8
aa9
cspan
hr
CCT110
TCP1, CCTa, CCTα, TCP-1-α
160,119,520
160,130,731
-
6
q25.3
2
12, 7
556, 401
CCT2
CCT β, TCP-1-β
68,266,317
68,280,052
+
12
q15
1
14
535
CCT3
CCT γ, TCP-1-γ
154,545,617
154,572,307
-
1
q23.1
3
13, 13, 12
545, 544, 507
CCT4
CCT δ, TCPD, TCP-1-δ
61,950,076
61,969,146
-
2
p15
1
13
539
CCT5
CCT ε, TCP1E, TCP-1-ε
10,303,453
10,317,892
+
5
p15.2
1
11
541
CCT6A
CCT ζ, CCT ζ-1, TCP-1-ζ, CCT6, Cctz, HTR3, TCP20, TCPZ, TTCP20
56,087,036
56,098,269
+
7
p11.2
2
14, 13
531, 486
CCT6B
CCT ζ-2, TCP-1-ζ-2, Cctz2, TSA303, Tcp20
30,279,183
30,312,525
-
17
q12
1
14
530
CCT7
CCT η, TCP-1-η, Ccth, NIP7-1
73,320,279
73,333,494
+
2
p13.2
2
12, 7
543, 339
CCT8
CCT θ, TCP-1-θ, Cctq
29,350,670
29,367,782
-
21
q21.3
1
15
548
CCT8L1
LOC155100
151,773,495
151,775,165
+
7
q36.1
1
1
557
CCT8L2
GROL, CESK1
15,451,770
15,453,440
-
22
q11.1
1
1
557
MKKS
BBS6
10,333,898
10,342,162
-
20
p12.2
2
4, 4
570, 570
BBS10
C12orf58, FLJ23560
75,263,727
75,266,269
-
12
q21.2
1
2
723
BBS12
C4orf24, FLJ35630, FLJ41559
123,882,498
123,884,627
+
4
q27
1
1
710
HSPD1
GROEL, HSP60, SPG13, CPN60, HuCHA60
198,060,018
198,071,817
-
2
q33.1
2
11, 11
573, 573
(PIKFYVE)
11
CFD, FAB1, PIP5K, PIP5K3
209,182,591
209,190,094
+
2
q34
1
5
224
tblfn
1Official NCBI Entrez gene database name; 2Start and 3End of coding region; 4Strand "+" indicates sequenced strand. "-" indicates complementary strand; 5Chromosome; 6Chromosome location; 7Number of isoforms; 8Number of exons. Multiple numbers indicate the number of exons in each isoform; 9Total amino acids; 10The official Entrez name is TCP1. CCT1 improves consistency with other subunit gene names. 11Fab1_TCP sequence domain of PIKFYVE kinase, most similar to the apical domain of CCT3. Features refer to the domain portion of the gene/protein.
Table 2The human hsp60 pseudogenes13
Name1
Start2
End3
Str4
Chr5
Loc6
Ex7
P/D8
Ka/Ks9
LRT10
FS11
SC12
aa13
CCT1-1P
19,986,638
19,987,216
+
12
p12.2
1
P
0.75
0.16
5
2
190
CCT1-2P
41,621,756
41,623,646
-
5
p13.1
2?
D
1.21
0.15
7
5
512
CCT1-3P14
42,801,030
42,802,033
+
7
p14.1
3
D
0.68
2.10
1
1
367
CCT3-1P
16,177,578
16,178,178
+
8
p22
1
P
1.02
0.0
2
2
159
CCT4-1P
64,177,578
64,409,590
+
X
q12
3
D
0.65
1.76
0
3
512
CCT4-2P
140,344,301
140,345,787
-
7
q34
4
D
0.82
1.24
2
10
278
CCT5-1P14,15
78,382,086
78,382,680
+
13
q31.1
1
P
0.81
0.20
3
4
549
CCT5-2P15
78,382,866
78,382,967
-
13
q31.1
1
?
-
-
-
1
34
CCT5-3P
114,876,388
114,877,290
+
5
q22.3
1
P
0.25
2.12
6
3
201
CCT6-1P
14,692,965
14,693,954
-
5
p15.2
1
P
1.00
0.0
2
2
330
CCT6-2P
109,013,584
109,014,117
-
11
q22.3
1
P
0.90
0.0
0
2
178
CCT6-3P16
64,162,812
64,171,325
+
7
q11.21
8
D
0.43
3.06
5
4
289
CCT6-4P
191,915,332
191,916,879
+
3
q28
1
P
0.57
6.90**
9
4
292
CCT6-5P14,16
64,853,564
64,865,440
+
7
q11.21
10
D
0.84
0.34
12
6
399
CCT7-1P14
92,251,627
92,307,366
-
5
q15
1
P
0.45
1.88
3
5
145
CCT7-2P
150,242,815
150,243,240
+
6
q25.1
1
P
0.87
0.10
3
4
552
CCT8-1P14
145,141,482
145,143,137
-
1
q21.1
1
P
1.14
0.10
2
3
561
HSPD1-1P14
135,744,902
135,745,039
-
5
q31.1
1
P
1.46
0.27
0
0
48
HSPD1-2P14,17
21,919,402
21,920,175
-
5
p14.3
1
P
0.90
0.10
1
1
264
HSPD1-3P
43,602,029
43,602,280
-
20
q13.12
4
D
-
-
0
1
84
HSPD1-4P
88,065,673
88,066,269
+
6
q15
1
P
0.55
1.08
4
5
199
HSPD1-5P14
55,191,053
55,192,769
+
12
q13.2
1
P
0.56
3.08
2
1
499
HSPD1-6P14
36,783,612
36,785,195
-
3
p22.3
1
P
0.59
2.46
2
2
443
HSPD1-7P18
7,263,938
7,265,475
+
8
p23.1
1
P
1.12
0.18
5
4
396
HSPD1-8P
145,986,418
145,987,946
+
4
q31.21
1
P
0.63
2.32
4
3
458
HSPD1-9P18
7,785,932
7,787,502
-
8
p23.1
1
P
0.91
0.08
5
3
416
HSPD1-10P
8,058,884
8,082,857
+
12
p13.31
2?
D
0.78
0.94
1
1
307
HSPD1-11P
95,130,459
95,132,169
+
5
q15
5
D
0.74
2.44
6
6
375
HSPD1-12P
78,321,372
78,323,341
+
13
q31.1
1
P
0.62
4.98*
5
4
410
HSPD1-13P
153,068,626
153,068,943
+
6
q25.2
1
P
0.54
2.84
1
2
108
HSPD1-14P14
37,465,288
37,466,827
-
13
q13.3
4
D
0.68
3.3
6
3
361
HSPD1-15P
19,269,394
19,270,353
+
5
p14.3
4
D
0.74
1.24
4
4
241
HSPD1-16P
105,082,802
105,083,755
+
11
q22.3
2?
D
0.51
6.24*
5
4
199
HSPD1-17P
34,077,070
34,078,293
+
1
p35.1
3
D
0.74
1.48
2
2
217
HSPD1-18P
56,105,684
56,108,736
+
20
q13.32
5
D
0.48
10.84**
3
2
299
HSPD1-19P
50,318,868
50,319,008
+
10
q11.23
1
?
2.42
0.72
0
0
47
HSPD1-20P
78,924,341
78,924,478
-
12
q21.31
1
?
0.40
3.08
0
0
46
HSPD1-21P
60,994,430
60,994,876
-
5
q12.1
6
D
-
-
0
6
155
HSPD1-22P
29,181,851
29,183,334
-
21
q21.3
2?
D
0.69
1.4
3
4
344
1Pseudogene names follow the HUGO nomenclature. They are composed of the name of the parental gene followed by a unique number identifier and the suffix "P" (Pseudogene); 2Start and 3 End positions of the pseudogene on the chromosome; 4Strand; 5Chromosome; 6Location on the chromosome; 7Number of exons. A question mark indicates gene fragments with uncertain numbers of exons; 8Processed (P), duplicated (D) or undetermined (?); 9Ratio of non-synonymous vs. synonymous substitution rates; 10Likelihood Ratio Test (LRT) values. Values different from 1.0 with probability p < 0.01 (**) or p < 0.05 (*) are shown in bold-face; 11Number of Frame-Shifts recognized in the coding region of the pseudogene; 12Number of in-frame Stop Codons recognized in the coding region of the pseudogene; 13Length in amino acids of pseudo-translation of the recognized pseudogene sequence; 14Ten pseudogenes previously reported in the Ensembl (roman), Pseudogene.org (italics) or NCBI (bold) databases: CCT1-3P = OTTHUMG00000033751; CCT5-1P = Human.chr13.mb78; CCT6-5P = ENSP00000275603, Human.chr7.mb64; CCT7-1P = ENST00000399032; CCT8-1P = Human.chr1.mb145; HSPD1-1P = ENSG00000162241, Human.chr5.mb135; HSPD1-2P = ENSP00000328369; HSPD1-5P = LOC644745; HSPD1-6P = LOC645548; HSPD1-14P = OTTHUMG00000016753; 15,16,18Tandemly duplicated;17Previously identified as Hsp60s2 (Hsp60 short form 2).
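The significance marks attached to the LRT values in Table 2 can be reproduced under one assumption that the table footnote does not state explicitly: that the LRT statistic is compared with a chi-square distribution with one degree of freedom. The Python sketch below uses the identity P(X > x) = erfc(sqrt(x/2)) for that distribution; the function names are hypothetical.

```python
import math

def lrt_p_value(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom (assumed null)."""
    return math.erfc(math.sqrt(statistic / 2.0))

def significance_mark(statistic: float) -> str:
    """Return '**' for p < 0.01, '*' for p < 0.05, and '' otherwise."""
    p = lrt_p_value(statistic)
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# LRT values taken from Table 2.
for name, lrt in [("CCT6-4P", 6.90), ("HSPD1-12P", 4.98), ("HSPD1-5P", 3.08)]:
    print(name, lrt, significance_mark(lrt))
```

Under this assumption the marks agree with those printed in the table for the starred entries (6.90, 10.84, 4.98, 6.24) and for the unstarred ones checked here.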
Evolutionary origins of human BBS and CCT8L genes
A maximum-likelihood (ML) phylogenetic tree of human chaperonin-like proteins (Figure 1a) indicated that Hsp60-like BBS proteins are monophyletic (bootstrap support 86%) and that their common ancestor derived from a duplication event in the CCT8 lineage (bootstrap support 88%). The tree also showed that the unique ancestor of the two closely related genes CCT8L1 and CCT8L2 originated in the CCT8 lineage as well, from a more recent duplication event (bootstrap support 75%). The relation of BBS and CCT8L proteins with the CCT8 chaperonin subunit was confirmed with strong conditional probability support (0.99) by Bayesian tree construction (Figure 1b).
fig Figure 1Evolutionary trees of CCT proteins
Evolutionary trees of CCT proteins. (a) Maximum-likelihood evolutionary tree of all human chaperonin-like proteins, including CCT monomers, MKKS, BBS10, BBS12 and the two members, CCT8L1 and CCT8L2, of the newly defined CCT8L class. Numbers associated with each branch indicate bootstrap support from 100 replicates. Tree rooted by the archaeal thermosome alpha subunit of Sulfolobus solfataricus (Ss_ThsA). (b) Bayesian evolutionary tree of the same sequences shown in (a). The numbers assigned to each branch indicate posterior probabilities. Tree rooted by the thermosome alpha subunit of Thermoplasma acidophilum (Ta_ThsA). The scale bars represent the indicated number of substitutions per position for a unit branch length.
graphic 1471-2148-10-64-1 hint_layout double
Although the association of BBS and CCT8L proteins with the CCT lineage was robustly supported, the high divergence of these sequences could produce artifactual clustering in the trees due to long-branch attraction. To address this concern, we built independent ML trees for each BBS or CCT8L sequence, adding them separately to the tree of CCT subunits. All individual trees confirmed with strong bootstrap support the association of each BBS or CCT8L lineage with the CCT8 lineage (see additional file 3: Figure S1, additional file 4: Figure S2, additional file 5: Figure S3 and additional file 6: Figure S4). An ML evolutionary tree including hsp60-gene homologs found in the genomes of eighteen other vertebrate species, including representatives of several mammals, chicken, frogs, and fish, also confirmed the origin of BBS and CCT8L genes from the CCT8 lineage (see additional file 7: Figure S5).
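The bootstrap support values quoted for these trees are, in general terms, the percentage of replicate trees in which a given group of sequences is recovered as a clade. The Python sketch below illustrates only that counting step. The replicate data are invented and deliberately constructed so that 86 of 100 replicates contain the clade; they are not derived from the trees of this study, and the function name is hypothetical.

```python
def bootstrap_support(clade, replicate_trees):
    """Percentage of replicate trees in which the clade appears as a monophyletic group."""
    clade = frozenset(clade)
    hits = sum(clade in tree for tree in replicate_trees)
    return 100.0 * hits / len(replicate_trees)

# Toy data (invented): each replicate tree is stored as the set of its clades.
with_group = {frozenset({"MKKS", "BBS10", "BBS12"}), frozenset({"BBS10", "BBS12"})}
without_group = {frozenset({"MKKS", "CCT8"}), frozenset({"BBS10", "BBS12"})}
replicates = [with_group] * 86 + [without_group] * 14

support = bootstrap_support({"MKKS", "BBS10", "BBS12"}, replicates)
print(support)
```

In practice the replicate trees come from re-running the tree inference on resampled alignment columns; only the final tally is shown here.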
Additional file 3
Figure S1. Phylogenetic tree of human CCT1-8 and CCT8L proteins.
1471-2148-10-64-S3.PDF
Additional file 4
Figure S2. Phylogenetic tree of human CCT1-8 and MKKS proteins.
1471-2148-10-64-S4.PDF
Additional file 5
Figure S3. Phylogenetic tree of human CCT1-8 and BBS10 proteins.
1471-2148-10-64-S5.PDF
Additional file 6
Figure S4. Phylogenetic tree of human CCT1-8 and BBS12 proteins.
1471-2148-10-64-S6.PDF
Additional file 7
Figure S5. Phylogenetic tree of vertebrate CCT1-8, MKKS, BBS10, BBS12 and CCT8L proteins.
1471-2148-10-64-S7.PDF
We did not find CCT8L genes in the genomes of chicken, Xenopus laevis, or Danio rerio, representatives of the reptile/bird, amphibian and fish lineages, respectively. However, among mammals we identified orthologs of CCT8L genes in the genomes not only of placental mammals (Eutheria), but also of the marsupial opossum (Metatheria) and of the egg-laying platypus (Prototheria), suggesting that the CCT8L gene class originated at the onset of mammal evolution. All CCT8L gene orthologs were intron-less, indicating that their ancestor originated from a retro-transposition event. Two copies of CCT8L sequences were found in human and chimp and one CCT8L gene in all other genomes examined, including those of the other primates rhesus monkey (Macaca mulatta) and gray mouse lemur (Microcebus murinus) (Figure 2), suggesting that a duplication of the CCT8L gene occurred in Hominoidea after their separation from Old World monkeys. However, the lone gene copy of CCT8L identified in rhesus monkey clustered with CCT8L1 in evolutionary trees (Figure 2), which would instead suggest an earlier duplication of the gene followed by loss of the CCT8L2 copy from the genome of rhesus monkey. Close inspection of protein alignments revealed that the rhesus monkey CCT8L sequence included an anomalously diverged segment of about 50 amino acids of uncertain alignment. Excluding this segment from the analysis, we obtained a different and more robustly supported tree topology (75% vs. 20% bootstrap value, see additional file 8: Figure S6, panels a and b), consistent with a later duplication of the CCT8L gene in Hominoidea. The tree also indicated that the removed segment was alone responsible for the overall higher evolutionary rate predicted for this sequence (see additional file 8: Figure S6).
Additional file 8
Figure S6. Phylogenetic trees of CCT8L protein sequences from primates (a,b) and partial alignment showing a divergent region in the sequence from rhesus monkey (c).
1471-2148-10-64-S8.PDF
Figure 2Evolutionary tree of CCT8L sequences
Evolutionary tree of CCT8L sequences. ML tree of CCT8L sequences from various mammal genomes. The homolog of human CCT8L1 in chimp (Ptr) is characterized as a pseudogene and is shown in bold-italics font. Species abbreviations: Bt, Bos taurus (cow); Cf, Canis lupus familiaris (dog); Dn, Dasypus novemcinctus (nine-banded armadillo); Dr, Danio rerio (zebrafish); Ec, Equus caballus (horse); Ga, Gasterosteus aculeatus (stickleback, fish); Gg, Gallus gallus domesticus (chicken); Hs, Homo sapiens (human); La, Loxodonta africana (African bush elephant); Md, Monodelphis domestica (South American gray short-tailed opossum, marsupial); Mm, Mus musculus (mouse); Mmu, Macaca mulatta (rhesus monkey); Mmur, Microcebus murinus (gray mouse lemur); Oa, Ornithorhynchus anatinus (platypus); Ol, Oryzias latipes (medaka or Japanese killifish); Pp, Pongo pygmaeus (northwest Bornean orangutan); Ptr, Pan troglodytes (chimpanzee); Rn, Rattus norvegicus (rat); Tn, Tetraodon nigroviridis (spotted green pufferfish); Tr, Takifugu rubripes (Japanese pufferfish); Xl, Xenopus laevis (African clawed frog, amphibian); Xt, Xenopus tropicalis (western clawed frog, amphibian). The scale bar represents the indicated number of substitutions per position for a unit branch length.
1471-2148-10-64-2 single
Differentiation rate of BBS and CCT8L proteins
The branch lengths of the trees shown in Figure 1 indicate that BBS and CCT8L proteins have differentiated at much higher rates than CCT subunits. We applied a newly-developed, unbiased measure of differentiation called "B-index" (see Methods) to calculate differentiation of MKKS, BBS10 and BBS12 proteins from their respective last ancestor common to Actinopterygii (ray-finned fishes) and Sarcopterygii (including tetrapods), determined by rooting the trees with CCT8 proteins from corresponding fish and tetrapod species. Similarly, we calculated differentiation of CCT8L proteins from a eutherian ancestor, rooting their tree with corresponding sets of CCT8 proteins (see footnotes of Table 3 and legend for Figure 2 for species represented in each tree). We estimated for the MKKS family an average evolutionary distance from their root of almost 0.7 substitutions per site, corresponding to a 6-fold higher rate of differentiation compared to the number of substitutions estimated in CCT8 proteins over the same period of time. For BBS10 and BBS12, we calculated a distance of about 1.0-1.2 substitutions per site, corresponding to a substitution rate about 8-10 times higher than in CCT8. Finally, for the mammal-specific family of CCT8L proteins, we estimated an evolutionary distance from their mammal root of about 0.3 substitutions per site. The smaller divergence of CCT8L proteins compared to BBS proteins reflects the more recent origin of the CCT8L gene. However, when scaled to the evolution of CCT8 sequences over the same periods of time, the substitution rate of CCT8L proteins was about 14-15 times higher than in CCT8 and 1.4-2.3 times higher than in BBS proteins.
Table 3. Divergence of BBS and CCT8L proteins relative to CCT8 proteins

| | MKKS | BBS10 | BBS12 | CCT8L [1] |
| No. species [2] | 14 | 11 | 11 | 5 |
| Size (W_B) [3] | 5.7770 | 4.6020 | 5.8949 | 3.3859 |
| B-index (B_B) [4] | 0.6976 | 1.1079 | 1.0284 | 0.3196 |
| Unbiased pair-wise distance (B_B × 2) | 1.3952 | 2.2159 | 2.0568 | 0.6393 |
| L_B (B_B × W_B) [5] | 4.0300 | 5.0987 | 6.0623 | 1.0822 |
| Average Dij (D_B) [6] | 2.0951 | 2.9660 | 3.2858 | 0.8017 |
| CCT8 [7] | | | | |
| Size (W_C) | 5.5202 | 4.5503 | 4.3647 | 3.1100 |
| B-index (B_C) | 0.1146 | 0.1373 | 0.0992 | 0.0227 |
| Unbiased pair-wise distance (B_C × 2) | 0.2291 | 0.2747 | 0.1983 | 0.0454 |
| L_C (B_C × W_C) | 0.6324 | 0.6250 | 0.4328 | 0.0706 |
| Average Dij (D_C) | 0.3394 | 0.3687 | 0.2709 | 0.0545 |
| B_B/B_C | 6.0873 | 8.0692 | 10.3669 | 14.0793 |
| L_B/L_C | 6.3725 | 8.1579 | 14.0072 | 15.3286 |
| D_B/D_C | 6.1730 | 8.0445 | 12.1292 | 14.7101 |
| W_B/W_C | 1.0465 | 1.0114 | 1.3506 | 1.0887 |

[1] Only the human CCT8L2 branch was included in the tree. The CCT8L1 branch had equivalent length.
[2] Chaperonin-BBS sequences used in the trees were from the following species (see the legend for Figure 2 for a complete list of abbreviations and species names). MKKS: Bt, Cf, Dr, Ec, Ga, Gg, Hs, Md, Mm, Mmu, Ol, Rn, Tr, Xt; BBS10: Bt, Cf, Dr, Ec, Ga, Hs, Md, Mm, Ol, Rn, Tr; BBS12: Bt, Cf, Dr, Ec, Ga, Gg, Hs, Mm, Ol, Rn, Tr, Xl; CCT8L: Bt, Cf, Hs, Mm, Rn.
[3] Size is the average number of sequences contained in a cluster over evolutionary time (see Methods).
[4] The B-index measures the average substitutions per site (evolutionary distance) of the sequences within a cluster from their common ancestor.
[5] L is the length of the tree (sum of the lengths of all branches).
[6] Average Dij is the average pair-wise evolutionary distance of the sequences.
[7] Estimates for CCT8 were computed over corresponding species represented by the sets of MKKS, BBS10, BBS12 or CCT8L proteins (see footnote 2, above).
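The ratio rows of Table 3 follow arithmetically from the B-index and size rows. The short Python sketch below recomputes them from the tabulated values as a consistency check. The variable and function names are illustrative only, and the sketch does not reproduce the B-index calculation itself, which is described in Methods.

```python
# Values transcribed from Table 3, in column order MKKS, BBS10, BBS12, CCT8L.
FAMILIES = ("MKKS", "BBS10", "BBS12", "CCT8L")
B_FAMILY = (0.6976, 1.1079, 1.0284, 0.3196)  # B-index of each family (B_B)
W_FAMILY = (5.7770, 4.6020, 5.8949, 3.3859)  # cluster size of each family (W_B)
B_CCT8 = (0.1146, 0.1373, 0.0992, 0.0227)    # B-index of matched CCT8 sets (B_C)
W_CCT8 = (5.5202, 4.5503, 4.3647, 3.1100)    # cluster size of matched CCT8 sets (W_C)


def relative_rates():
    """Return, per family, the derived quantities reported in Table 3."""
    rows = {}
    for name, bb, wb, bc, wc in zip(FAMILIES, B_FAMILY, W_FAMILY, B_CCT8, W_CCT8):
        rows[name] = {
            "pairwise_distance": 2.0 * bb,        # unbiased pair-wise distance
            "tree_length": bb * wb,               # L_B = B_B x W_B
            "B_ratio": bb / bc,                   # B_B / B_C
            "L_ratio": (bb * wb) / (bc * wc),     # L_B / L_C
            "W_ratio": wb / wc,                   # W_B / W_C
        }
    return rows


if __name__ == "__main__":
    for family, values in relative_rates().items():
        print(family, {key: round(val, 4) for key, val in values.items()})
```

Small differences in the last decimal places relative to the table are expected, because the table ratios were presumably computed from unrounded values.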
Functional constraints in the evolution of CCT8L genes
We tested functionality of CCT8L genes from several species by estimating ratios of non-synonymous and synonymous substitution rates (Ka/Ks) along their respective lineages (see Methods). The results of this analysis are shown in Table 4, which indicates the gene(s) analyzed (foreground), the two genes used to identify foreground and background branches, the estimated Ka/Ks values and their significance. The evolutionary lineages for which Ka/Ks values were evaluated correspond to the branch numbers identified in the overall tree topology shown in Figure 3. In this tree are represented the "molecular tree" of mammal phylogenetic relations [35], the gene duplication event involving the CCT8L gene family in primates as inferred by this analysis, and the pre-mammal separations of the CCT7, CCT8 and CCT8L families of paralogs. This topology is in agreement with the evolutionary tree of CCT8L genes (Figure 2) with the only exception of the weakly supported position of the CCT8L sequence from rhesus monkey (see above). The highly significant constraints in non-synonymous substitution rates (Ka/Ks < 1.0) estimated in the overall evolution of the CCT8L family (Table 4, foreground genes: "All CCT8L1/2") indicated that the CCT8L sequences are genes generally expressing functional proteins. In evaluating Ka/Ks ratios for individual CCT8L gene lineages (Table 4), significantly constrained evolution (Ka/Ks < 1.0) was detected for branches leading to most sequences, including those of murids, lemur, cow, dog, elephant, marsupial, and to the human CCT8L1 and CCT8L2 group along the hominoid lineage. Constrained evolution was also estimated for the CCT8L genes of armadillo and rhesus monkey, and for human CCT8L1 and human and chimp CCT8L2 after divergence of human and chimp, although in these cases Ka/Ks values did not reach significance. In the cases of the human and chimp CCT8L1 and CCT8L2 genes, the lack of significance can be related to the loss of power of the test, since few mutations accumulated after separation of these sequences (see additional file 9: Table S3). In the case of rhesus monkey CCT8L, we found that its relatively high estimate of Ka/Ks (= 0.73) was due to the previously mentioned 50-amino-acid diverged region within this sequence. After removing this region we estimated Ka/Ks = 0.55. Only for the lineage of chimp CCT8L1 did we estimate Ka/Ks ≅ 1, consistent with differentiation of a non-functional sequence. Since this sequence was also characterized by an internal stop codon and a frame-shift, all evidence strongly suggests that chimp CCT8L1 is a pseudogene.
Additional file 9
Table S3. Codon-base specific counts of mutation events along human and chimp CCT8L evolutionary branches.
1471-2148-10-64-S9.DOC
Table 4. Ka/Ks substitution ratios in CCT8L genes evolution

| Foreground genes [1] | Background genes [1] | Foreground Ka/Ks [2] | LRT (p) [3] | Foreground branches [4] |
| All CCT8L1/2 | Human CCT8, Human CCT7 | 0.29 | 205.06 (<0.001) | 1 to 25 |
| Human CCT8L1 | Chimp CCT8L1, Human CCT8L2 | 0.58 | 0.88 | 1 |
| Chimp CCT8L1 | Human CCT8L1, Human CCT8L2 | 1.02 | 0.00 | 2 |
| Human CCT8L2 | Chimp CCT8L2, Human CCT8L1 | 0.48 | 1.2 | 4 |
| Chimp CCT8L2 | Human CCT8L1, Human CCT8L2 | 0.39 | 1.8 | 5 |
| Human CCT8L2 | Human CCT8L1, Rhesus CCT8L | 0.42 | 4.02 (<0.05) | 4+6 |
| Human CCT8L1 | Human CCT8L2, Rhesus CCT8L | 0.29 | 5.72 (<0.05) | 1+3 |
| Mouse and Rat CCT8L | Cow CCT8L, Human CCT8L2 | 0.38 | 31.14 (<0.001) | 12+13+14 |
| Mouse CCT8L | Rat CCT8L, Human CCT8L2 | 0.64 | 1.21 | 12 |
| Rat CCT8L | Mouse CCT8L, Human CCT8L2 | 0.49 | 5.91 (<0.05) | 13 |
| Rhesus CCT8L | Lemur CCT8L, Human CCT8L2 | 0.73 (0.55) [5] | 1.91 (1.22) [5] | 8 |
| Lemur CCT8L | Human CCT8L2, Mouse CCT8L | 0.29 | 36.82 (<0.001) | 10 |
| Dog CCT8L | Cow CCT8L, Human CCT8L2 | 0.31 | 12.07 (<0.001) | 16 |
| Cow CCT8L | Dog CCT8L, Human CCT8L2 | 0.13 | 113.78 (<0.001) | 17 |
| Armadillo CCT8L | Elephant CCT8L, Human CCT8L2 | 0.36 | 0.57 | 20 |
| Elephant CCT8L | Marsupial CCT8L, Human CCT8L2 | 0.29 | 14.96 (<0.001) | 21 |
| Marsupial CCT8L | Elephant CCT8L, Human CCT8L2 | 0.31 | 62.63 (<0.001) | 23+24 |

[1] See text for the definition and meaning of Foreground and Background species.
[2] Ka/Ks is the estimated ratio of non-synonymous and synonymous substitution rates.
[3] LRT, Likelihood Ratio Test results for estimated Ka/Ks vs. Ka/Ks = 1.0 (see Methods). Probabilities (p) not shown signify p > 0.05.
[4] Foreground-branch numbers correspond to the numbering in the schematic tree shown in Figure 3.
[5] Values in parentheses were obtained after removing an unusually diverged region from rhesus CCT8L (see text)./p
/tblfn/tbl
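The significance levels in Table 4 can be related to the LRT statistics with a short calculation. The Python sketch below assumes that the test compares two nested models differing by one free parameter (Ka/Ks estimated versus Ka/Ks fixed at 1.0), so that the statistic is referred to a chi-square distribution with one degree of freedom. The number of degrees of freedom is an assumption made here for illustration and should be checked against Methods. The function name `lrt_p_value` is hypothetical.

```python
import math


def lrt_p_value(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)), because X is the
    square of a standard normal variable.
    """
    if statistic < 0:
        raise ValueError("a likelihood-ratio statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # LRT statistics transcribed from Table 4 (foreground gene, statistic).
    examples = [
        ("Human CCT8L1 (branch 1)", 0.88),
        ("Human CCT8L2 (branches 4+6)", 4.02),
        ("Dog CCT8L (branch 16)", 12.07),
    ]
    for label, stat in examples:
        print(f"{label}: LRT = {stat:.2f}, p = {lrt_p_value(stat):.4g}")
```

Under this assumption, statistics above about 3.84 correspond to p < 0.05 and statistics above about 10.83 to p < 0.001, which agrees with the thresholds marked in the table.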
fig id="F3"titlepFigure 3/p/titlecaptionpEvolutionary relations of CCT8L genes/p/captiontext
pbEvolutionary relations of CCT8L genes/b. Schematic representation of evolutionary relations of CCT8L genes from different eukaryotic species rooted by CCT8 and CCT7 sequences. The numbers associated with each branch identify the branches for which branch-specific Ka/Ks values are evaluated (Table 4)./p
/textgraphic file="1471-2148-10-64-3" hint_layout="single"//fig
pTo assess the functionality of human CCT8L sequences we investigated their expression profiles in comparison to those of human CCT monomers and BBS genes (see additional file supplr sid="S10"10/supplr: Table S4). Expression of CCT8L2 was confirmed by fifteen ESTs mostly identified from the testis, whereas only one EST identified as a CCT8L1 transcript has been so far reported (NCBI UniGene database, November 20, 2009). Querying the NCBI GEO microarray database, we found 542 expression-profile records identifying expression of CCT8L2, and none identifying expression of CCT8L1 (as of November 20, 2009). It must be noted, however, that CCT8L2 and CCT8L1 have similarity of 97.3% at the DNA level. Similarly to CCT8L2, another mammal-specific chaperonin gene, CCT6B, is also expressed almost exclusively in the testis, from which 160 ESTs have been reported versus an average of 4.4 ESTs (from 0 to 10 per tissue) found in all other tissues./p
suppl id="S10"
title
pAdditional file 10/p
/title
text
p
bTable S4/b. Expression pattern (EST counts) of the human CCT and BBS genes from the UniGene database./p
/text
file name="1471-2148-10-64-S10.DOC"
/file
/suppl
/sec
sec
st
pPseudogenes/p
/st
pWe identified in the human genome 39 sequences with significant similarity to CCT or HSPD1 genes that either were short fragments or were characterized by in-frame stop codons or frame-shifts. Based on their corruption, we classified these sequences as pseudogenes (Table tblr tid="T2"2/tblr). Similarly, searching the mouse and rat genomes we identified 38 and 61 pseudogenes, respectively (see additional file supplr sid="S1"1/supplr: Table S1 and additional file supplr sid="S2"2/supplr: Table S2). Most of these sequences have not been previously reported and are here systematically annotated and classified for the first time./p
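The corruption criteria used above (in-frame stop codons and frame-shifts) can be illustrated with a minimal Python sketch. It assumes the candidate sequence is already given in the reading frame of its parental gene, and it uses a length that is not a multiple of three as a crude stand-in for a frame-shift. Locating real frame-shifts requires an alignment to the parental gene, which is not attempted here. The function names are hypothetical and this is not the procedure used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def in_frame_stop_positions(cds: str) -> list:
    """Return 0-based nucleotide offsets of premature in-frame stop codons.

    A stop codon that occupies the final complete codon is treated as the
    normal terminator and is not reported.
    """
    seq = cds.upper()
    last_full = len(seq) - len(seq) % 3  # end of the last complete codon
    stops = [i for i in range(0, last_full, 3) if seq[i:i + 3] in STOP_CODONS]
    if stops and stops[-1] == last_full - 3:
        stops = stops[:-1]  # drop the terminal stop codon
    return stops


def looks_corrupted(cds: str) -> bool:
    """Flag a sequence with a premature stop or a length inconsistent with a codon frame."""
    return len(cds) % 3 != 0 or bool(in_frame_stop_positions(cds))


if __name__ == "__main__":
    print(looks_corrupted("ATGAAATAA"))     # intact toy reading frame
    print(looks_corrupted("ATGTAAAAATAA"))  # premature stop at offset 3
```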
pBased on phylogenetic-tree reconstructions (see additional file supplr sid="S11"11/supplr: Figure S7) or on similarity for the most corrupted sequences, we identified the association of 17 pseudogenes from human, 16 from mouse and 29 from rat with one of the nine CCT genes. None of the pseudogenes were related to MKKS, BBS10, BBS12 or CCT8L. To estimate the time of origin of the pseudogenes, we constructed trees using their translated sequences and chaperonin subunits from various vertebrate species (see additional file supplr sid="S12"12/supplr: Figure S8, and additional file supplr sid="S13"13/supplr: Figure S9). The trees indicated that all recognizable human CCT pseudogenes originated in the mammal lineage after separation from the reptile/bird lineage./p
suppl id="S11"
title
pAdditional file 11/p
/title
text
p
bFigure S7/b. Evolutionary tree of vertebrate CCT1-8 and CCT8L proteins including associated human pseudogenes./p
/text
file name="1471-2148-10-64-S11.PDF"
/file
/suppl
suppl id="S12"
title
pAdditional file 12/p
/title
text
p
bFigure S8/b. Evolutionary trees of individual CCT1, CCT3 and CCT4 proteins from vertebrates including associated human pseudogenes./p
/text
file name="1471-2148-10-64-S12.PDF"
/file
/suppl
suppl id="S13"
title
pAdditional file 13/p
/title
text
p
bFigure S9/b. Evolutionary trees of individual CCT5, CCT7 and CCT8 proteins from vertebrates including associated human pseudogenes./p
/text
file name="1471-2148-10-64-S13.PDF"
/file
/suppl
pOf particular interest were the evolutionary relations of CCT6 genes and pseudogenes. Two CCT6 gene copies (CCT6A and CCT6B) were found, besides placental mammals, also in platypus and in opossum (see additional file supplr sid="S11"11/supplr: Figure S7), suggesting that the duplication of the CCT6 gene occurred in mammal evolution before separation of Theria (marsupial and placental mammals) and Prototheria (monotremes). We constructed an evolutionary tree of mammal CCT6 genes and pseudogenes (Figure figr fid="F4"4/figr) rooted by the corresponding gene sequences from chicken and frog (the diverged sequence Oa_con2651 from platypus was excluded from this tree to avoid long-branch attraction). Surprisingly, all recognizable human, mouse, and rat pseudogenes belonging to the CCT6 class branched in the tree from the CCT6A lineage after separation of the platypus, marsupial and placental mammal lineages./p
fig id="F4"titlepFigure 4/p/titlecaptionpEvolutionary tree of vertebrate CCT6 proteins/p/captiontext
pbEvolutionary tree of vertebrate CCT6 proteins/b. ML tree of CCT6 proteins from mammals, chicken, and frog (in roman font) and translated sequences of the related pseudogenes from human, mouse, and rat (in bold-italics font). Only one copy of CCT6 was found in chicken and frog. Two copies, CCT6A and CCT6B, were found in all mammals examined, including marsupial (Md) and platypus (Oa). The CCT6 sequences from chicken (Gg) and from the two amphibians itXenopus laevis /it(Xl) and itXenopus tropicalis /it(Xt) were used to root the tree. All human, mouse, and rat pseudogenes clustered with the CCT6A sequences. Numbers next to branches indicate percent bootstrap values. Only bootstrap values 30% are shown. For all species abbreviations see the legend for Figure 2. The scale bar represents the indicated number of substitutions per position for a unit branch length.p
textgraphic file="1471-2148-10-64-4" hint_layout="single"fig
pTwenty-two pseudogenes in human (Table tblr tid="T2"2tblr), and 22 and 32 pseudogenes in mouse and rat, respectively (see additional file supplr sid="S1"1supplr: Table S1 and additional file supplr sid="S2"2supplr: Table S2), associated with the mitochondrial HSPD1 gene (Group I itcpn60it). Evolutionary trees incorporating all pseudogenes from different vertebrate species were uninformative due to the presence among the pseudogenes of highly corrupted sequences, resulting in extensive long-branch attraction (not shown). An ML tree built using only translations of the most conserved pseudogenes (Figure figr fid="F5"5figr) showed weakly supported but consistent association of the human pseudogenes with HSPD1 from primates, whereas pseudogenes from mouse and rat all associated with murid Hspd1 sequences, also indicating their relatively recent origin.p
fig id="F5"titlepFigure 5ptitlecaptionpEvolutionary tree of vertebrate mitochondrial Cpn60pcaptiontext
pbEvolutionary tree of vertebrate mitochondrial Cpn60b. ML tree of mitochondrial Cpn60 proteins from mammals, chicken, and frog (in roman font) and translated sequences of the related pseudogenes from human, mouse, and rat (in bold-italics font). Highly degraded pseudogenes for which only fragments could be detected were not considered. Human pseudogenes clustered with primate Cpn60 sequences whereas mouse and rat pseudogenes clustered with rodent counterparts, indicating independent evolution of these pseudogenes in these species. For all species abbreviations see legend for Figure 2. The scale bar represents the indicated number of substitutions per position for a unit branch length.p
textgraphic file="1471-2148-10-64-5" hint_layout="single"fig
sec
sec
st
pKa/Ks ratio in the evolution of putative pseudogene sequencesp
st
pOur characterization of many ithsp60 itsequences as pseudogenes was based on the presence of signs of corruption in the sequence (in-frame stop codons and frame-shifts). However, in-frame stop codons and frame-shifts may correspond to truncated proteins that are still functional. For example, although human HSPD1-5P and HSPD1-6P sequences contain signs of sequence corruption, EST data indicate that these sequences are expressed and possibly functional (see additional file supplr sid="S14"14supplr: Table S5). To confirm our characterization, we estimated Ka/Ks ratios in trees that identified the pseudogene-sequence lineage (branch) including as out-group its parental gene and the orthologous gene sequence from chicken (see Methods). The results of these analyses (Table tblr tid="T2"2tblr) showed in most cases Ka/Ks values not significantly different from 1.0, as expected in the differentiation of pseudogene sequences not constrained by coding of functional amino acids. Significant differences in mutation rate were estimated in the case of four sequences. These sequences, however, contained multiple in-frame stop codons and frame-shifts (Table tblr tid="T2"2tblr).p
suppl id="S14"
title
pAdditional file 14p
title
text
p
bTable S5b. Expression pattern of the human itcpn60 itgene (HSPD1) and pseudogenes from the UniGene database.p
text
file name="1471-2148-10-64-S14.DOC"
file
suppl
sec
sec
st
pStructural features of BBS and CCT8L proteinsp
st
pBecause of their high sequence divergence, it is unclear whether BBS and CCT8L Hsp60-like proteins conserve the typical fold of chaperonin subunits and their ability to assemble into typical oligomeric chaperonin complexes. Chaperonin monomers are characterized by three structural domains (apical, intermediate and equatorial) with distinct functional roles and it was relevant to investigate whether BBS and CCT8L proteins conserve each of the domains typical of chaperonins. Experimental models of eukaryotic Group II chaperonins are not available but their structural properties can be inferred by comparison with their closest relative, the archaeal thermosome. To infer tertiary-structure conservation in BBS and CCT8L proteins we predicted the secondary structure for each family from alignments of multiple sequences, excluding structure and sequence information from other families. The results of these predictions are schematically represented in Figure figr fid="F6"6afigr, in relation to the secondary structure description of the PDB structure ext-link ext-link-id="1a6d" ext-link-type="pdb"1a6dext-link chain A of the thermosome subunit ThsA from itThermoplasma acidophilum it
abbrgrp
abbr bid="B36"36abbr
abbrgrp (see additional file supplr sid="S15"15supplr: Figure S10, additional file supplr sid="S16"16supplr: Figure S11, additional file supplr sid="S17"17supplr: Figure S12, additional file supplr sid="S18"18supplr: Figure S13, additional file supplr sid="S19"19supplr: Figure S14, and additional file supplr sid="S20"20supplr: Figure S15 for detailed representations of multiple alignments, secondary structure predictions and alignments to the secondary-structure elements of ThsA). In Figure figr fid="F6"6afigr, the secondary structure description of ThsA is shown (line "ext-link ext-link-id="1a6d" ext-link-type="pdb"1a6dext-link") in relation to the position of the equatorial, intermediate, and apical domains. The position of these elements in the tertiary structure of ThsA is represented in Figure figr fid="F6"6bfigr. Results of a blind test of the performance of the method on the corresponding ThsA sequence are also shown (Figure figr fid="F6"6afigr, line "Ta_ThsA"). In this test most strand and helix elements (all "core" helices) described in the crystal structure were correctly predicted by the method, increasing our confidence in the reliability of other predictions. As expected, extensive conservation of predicted secondary-structure elements were also obtained from the alignment of human CCT sequences (Figure figr fid="F6"6afigr, line "CCT") with only few discrepancies involving mostly short beta strands (4, 5, 18, and 21) and one short helix (P) exposed at the external surface of the archaeal thermosome complex. Secondary-structure predictions for mammal CCT8L and for vertebrate MKKS, BBS10 or BBS12 sequences were also largely consistent with the secondary-structure description of thermosome proteins. In the equatorial domain, CCT8L and BBS structure predictions corresponded to the mostly alpha-helical composition of this region. 
Variations were more obvious in BBS12 and involved mostly terminal elements of helices (most notably helices P and Q) and exposed beta-strands (strands 19-21). In the intermediate domain the core helical-bundle elements (helices F, G, and K) as well as the extensive beta-sheet composition of this region were predicted in all BBS and CCT8L proteins. Exceptions were, in all sequences, the two short strands 5 and 6, which are part of an external elongated loop in the thermosome structure, and, in BBS12, the N-terminal part of helix K, which in the thermosome protrudes towards the central cavity covering the ATP hydrolysis site (Figure figr fid="F6"6bfigr). The apical domain is formed in the thermosome by a 4-strand anti-parallel beta-sheet (strands 9, 10, 15, and 16) with strand 10 extending into a second parallel beta-sheet (strands 10, 12, 13, and 14). The two sheets are flanked by a helix (J) and are surmounted by a structure composed of two contacting helices (H and I) and an extended loop including strand 11. All helices and most strands of the apical domain were recognized in BBS sequences. Most obvious differences were observed in BBS12 proteins, where the long apical helix H was predicted to be shortened, and in CCT8L, where helix I and strand 11 were not predicted.p
suppl id="S15"
title
pAdditional file 15p
title
text
p
bFigure S10b. Alignment and secondary-structure prediction of archaeal thermosome sequences.p
text
file name="1471-2148-10-64-S15.PDF"
file
suppl
suppl id="S16"
title
pAdditional file 16p
title
text
p
bFigure S11b. Alignment and secondary-structure prediction of human CCT1-8 protein sequences.p
text
file name="1471-2148-10-64-S16.PDF"
file
suppl
suppl id="S17"
title
pAdditional file 17p
title
text
p
bFigure S12b. Alignment and secondary-structure prediction of vertebrate CCT8L protein sequences.p
text
file name="1471-2148-10-64-S17.PDF"
file
suppl
suppl id="S18"
title
pAdditional file 18p
title
text
p
bFigure S13b. Alignment and secondary-structure prediction of vertebrate MKKS protein sequences.p
text
file name="1471-2148-10-64-S18.PDF"
file
suppl
suppl id="S19"
title
pAdditional file 19p
title
text
p
bFigure S14b. Alignment and secondary-structure prediction of vertebrate BBS10 protein sequences.p
text
file name="1471-2148-10-64-S19.PDF"
file
suppl
suppl id="S20"
title
pAdditional file 20p
title
text
p
bFigure S15b. Alignment and secondary-structure prediction of vertebrate BBS12 protein sequences.p
text
file name="1471-2148-10-64-S20.PDF"
file
suppl
fig id="F6"titlepFigure 6ptitlecaptionpSecondary structure predictions of chaperonin proteinspcaptiontext
pbSecondary structure predictions of chaperonin proteinsb. (a) Secondary structure predictions of itThermoplasma acidophilum itthermosome alpha subunit ThsA (line Ta_ThsA), human CCTs, mammal CCT8Ls and vertebrate BBSs (lines MKKS, BBS10 and BBS12) compared to the secondary structure description of ThsA (top line ext-link ext-link-id="1a6d" ext-link-type="pdb"1a6dext-link) determined from its crystal structure (PDB code ext-link ext-link-id="1a6d" ext-link-type="pdb"1a6dext-link, chain A). Helices are represented as red boxes, beta-strands as yellow boxes and loops as black lines. Secondary structure elements in ext-link ext-link-id="1a6d" ext-link-type="pdb"1a6dext-link are labeled in succession with numbers (strands) or letters (helices). The first 16 N-terminal residues of ThsA, predicted to contain a strand, are not included in the ext-link ext-link-id="1a6d" ext-link-type="pdb"1a6dext-link crystal structure (top line). Secondary structure elements in all proteins recognized as homologous to the thermosome chain elements by sequence similarity and positional equivalence are vertically aligned. Blue circles indicate the position of sequence insertions in CCT8L and BBS sequences. (b) The three-dimensional fold of the secondary structure elements in the thermosome structure ext-link ext-link-id="1a6d" ext-link-type="pdb"1a6dext-link chain A. Red cylinders represent helices and yellow arrows represent strands. Labels (i.e., letters and numbers) correspond to those in panel "a". Elements not predicted in some of the BBS and CCT8L sequences are labeled in gray. The positions of the ATP binding and hydrolysis sites are highlighted in green.p
textgraphic file="1471-2148-10-64-6" hint_layout="double"fig
sec
sec
st
pDifferentiation of monomer-monomer interaction regions in BBS and CCT8L proteinsp
st
pTo investigate the potential of CCT8L and BBS proteins to establish intra-ring and inter-ring monomer-monomer contacts, we examined the relative conservation of predicted contact positions in CCT, BBS and CCT8L sequences. We identified potential contact positions in these families based on homology to the positions involved in inter-monomer contacts in the crystal structure of the itT. acidophilum itthermosome complex (PDB code ext-link ext-link-id="1a6d" ext-link-type="pdb"1a6dext-link). After identifying all contact positions in CCT monomers, we distinguished among them those that conserved similar amino acid types across the nine monomers. We counted how many amino acid types observed in all or in conserved contact positions of CCT monomers were also observed in the itT. acidophilum itThsA sequence, in human CCT8Ls or in human BBS sequences (Table tblr tid="T5"5tblr). A complete list of all and conserved positions considered and of the residue types observed in these positions in all sequences can be found in additional file supplr sid="S21"21supplr: Table S6. ThsA and CCT subunits conserve 89% similarity in monomer-monomer contact positions, which is substantially higher than the average similarity (62%-66%) of all homologous positions between the two families. The higher similarity of monomer-monomer contact regions is consistent with functional conservation of these positions between the two families. In contrast, the high rate of differentiation in comparison to global average differentiation shown in putative monomer-monomer contact positions in BBS or CCT8L sequences (Table tblr tid="T5"5tblr) suggests a loss of capability to associate into a typical CCT-like oligomeric complex. This result is consistent with the presence in BBS proteins of inserted elements (Figure figr fid="F6"6figr) that would interfere with formation of the complex abbrgrp
abbr bid="B22"22abbr
abbr bid="B23"23abbr
abbrgrp.p
suppl id="S21"
title
pAdditional file 21p
title
text
p
bTable S6b. Residue types at monomer-monomer interaction positions in thermosome, CCT, BBS, and CCT8L proteins.p
text
file name="1471-2148-10-64-S21.DOC"
file
suppl
tbl id="T5"titlepTable 5ptitlecaptionpConservation of monomer-monomer contact residues relative to CCT subunitssup1suppcaptiontblbdy cols="5"
r
c ca="left"
p
bProteinb
p
c
c ca="center"
p
bMMb
p
c
c ca="center"
p
bCMMb
p
c
c ca="center"
p
bRRb
p
c
c ca="center"
p
bGlobalsup2supb
p
c
r
r
c cspan="5"
hr
c
r
r
c ca="left"
pThsAp
c
c ca="center"
p78 (83.9)p
c
c ca="center"
p16 (94.1)p
c
c ca="center"
p13 (86.7)p
c
c ca="center"
p62.0-66.4p
c
r
r
c ca="left"
pBBS12p
c
c ca="center"
p37 (39.8)p
c
c ca="center"
p7 (41.2)p
c
c ca="center"
p6 (40.0)p
c
c ca="center"
p35.5-38.0p
c
r
r
c ca="left"
pBBS10p
c
c ca="center"
p45 (48.4)p
c
c ca="center"
p8 (47.1)p
c
c ca="center"
p7 (46.7)p
c
c ca="center"
p34.3-35.6p
c
r
r
c ca="left"
pMKKSp
c
c ca="center"
p42 (45.2)p
c
c ca="center"
p8 (47.1)p
c
c ca="center"
p7 (46.7)p
c
c ca="center"
p48.8-51.6p
c
r
r
c ca="left"
pCCT8Lp
c
c ca="center"
p54 (58.1)p
c
c ca="center"
p8 (47.1)p
c
c ca="center"
p7 (46.7)p
c
c ca="center"
p53.4-61.1p
c
r
tblbdytblfn
psup1supConservation of archaeal ThsA and human BBS and CCT8L sequences relative to human CCT monomers. Sequence-positions are considered conserved if they are occupied by residue-types appearing in the homologous position in any of the human CCT sequences. Ninety-three intra-ring contact positions and 15 inter-ring contact positions were identified from the thermosome structure (ext-link ext-link-id="1a6d" ext-link-type="pdb"1a6dext-link). Contact positions were defined by a distance of their side-chain heavy atoms of at most 4.0 Å from any heavy atom of the nearby monomer in the thermosome structure. For each protein family, the table indicates the number and percentage (in parentheses) of positions conserved among: all 93 intra-ring contact positions (bMMb); seventeen intra-ring contact-positions conserved among human CCT monomers (bCMMb); all 15 inter-ring contact positions, none of which were conserved among CCT monomers (bRRb). sup2supbGlobal bindicates the range of similarities (percent values) of each sequence to human CCT-subunit proteins within all aligned positions.p
tblfntbl
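The percentages in Table 5 follow directly from the counts and from the numbers of contact positions given in the table footnote (93 intra-ring, 17 conserved intra-ring, and 15 inter-ring positions). The Python sketch below recomputes them as a simple check. The dictionary and function names are illustrative only.

```python
# Numbers of contact positions in each class, from the footnote of Table 5.
TOTAL_POSITIONS = {"MM": 93, "CMM": 17, "RR": 15}

# Conserved-position counts transcribed from Table 5, in order (MM, CMM, RR).
CONSERVED_COUNTS = {
    "ThsA": (78, 16, 13),
    "BBS12": (37, 7, 6),
    "BBS10": (45, 8, 7),
    "MKKS": (42, 8, 7),
    "CCT8L": (54, 8, 7),
}


def percent_conserved(count: int, total: int) -> float:
    """Percentage of contact positions occupied by a residue type seen in CCT."""
    return round(100.0 * count / total, 1)


def conservation_table():
    """Return {protein: {class: percent}} for the three contact classes."""
    classes = ("MM", "CMM", "RR")
    return {
        protein: {
            cls: percent_conserved(count, TOTAL_POSITIONS[cls])
            for cls, count in zip(classes, counts)
        }
        for protein, counts in CONSERVED_COUNTS.items()
    }


if __name__ == "__main__":
    for protein, row in conservation_table().items():
        print(protein, row)
```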
sec
sec
st
pConservation of ATP-binding and hydrolysis residues in BBS and CCT8L proteinsp
st
pWe compared conservation in CCT, BBS and CCT8L sequences of the ATP-binding and ATP-hydrolysis motifs typical of chaperonins of Group II (Figure figr fid="F7"7figr). Although there is considerable variation among BBS and CCT8L sequences at some of the ATP-binding positions, we observed complete conservation of the crucial ATP-binding dipeptide Gly-Pro, suggesting that these otherwise divergent proteins conserve ATP-binding ability. In the ATP-hydrolysis sites, substantial loss of conservation has been reported in MKKS abbrgrp
abbr bid="B27"27abbr
abbrgrp and in BBS12 abbrgrp
abbr bid="B23"23abbr
abbrgrp. In the CCT8L, MKKS and BBS10 families, unusual substitutions are observed in phosphate-binding positions and within the catalytic triad, where only Asp is conserved in MKKS. The effect that these mutations may have on the hydrolytic activity in these protein families is unclear. The high level of differentiation of this region in BBS12 (where the ATP-hydrolysis motif is not recognizable) strongly suggests that BBS12 has lost hydrolytic activity.p
fig id="F7"titlepFigure 7ptitlecaptionpProfile logos of ATP-binding and ATP-hydrolysis sites in chaperonin proteinspcaptiontext
pbProfile logos of ATP-binding and ATP-hydrolysis sites in chaperonin proteinsb. Sequence profiles of ATP/ADP-binding and ATP-hydrolysis sites for CCTs, CCT8L and BBS (MKKS, BBS10 and BBS12) proteins from the multiple sequence alignments of sequences obtained from the species listed in the legend for Figure 2. Letters indicate the amino acid types observed at each position. The height of each stack of symbols in each position is proportional to the information content at that position and the height of each letter within the stack is proportional to the frequency of the corresponding residue at that position. Residues involved in direct contacts with base, ribose or phosphate groups, as determined by homology to the known thermosome structures, are indicated.p
textgraphic file="1471-2148-10-64-7" hint_layout="single"fig
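The stack and letter heights described in the legend of Figure 7 can be illustrated with a short calculation. The Python sketch below computes the information content of one alignment column as the maximum entropy for a 20-letter alphabet minus the observed Shannon entropy, and scales each letter by its frequency. It omits small-sample corrections and any treatment of gaps, both of which logo-drawing programs commonly apply, so the values are illustrative rather than a reproduction of the figure. The function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column, without corrections."""
    if not column:
        raise ValueError("column must contain at least one residue")
    total = len(column)
    entropy = -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(column).values()
    )
    return math.log2(alphabet_size) - entropy


def letter_heights(column: str, alphabet_size: int = 20) -> dict:
    """Height of each residue letter: its frequency times the stack height."""
    info = column_information(column, alphabet_size)
    total = len(column)
    return {res: (count / total) * info for res, count in Counter(column).items()}


if __name__ == "__main__":
    # Toy columns: a fully conserved Gly and an evenly split Gly/Pro position.
    print(round(column_information("GGGG"), 3))
    print({res: round(h, 3) for res, h in letter_heights("GGPP").items()})
```

A fully conserved column reaches the maximum stack height of log2(20) bits, whereas a column with many different residues approaches zero, which is why divergent positions appear as short stacks in the logos.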
sec
sec
st
pConservation of substrate-binding positionsp
st
pThree positions crucial in determining substrate-specificity of CCT monomers have been identified in the distal region of helix I in the apical domain abbrgrp
abbr bid="B37"37abbr
abbrgrp. We analyzed conservation at these positions across vertebrate species in all Group II chaperonin families and in the Fab1_TCP domain across vertebrate orthologs of the PIKFYVE protein kinase (Table tblr tid="T6"6tblr). These positions are strikingly conserved within each CCT monomer type (with the exception of CCT6B) across species and are characteristically different between monomer types. They are mostly conserved also in the Fab1_TCP domain across vertebrate sequences. In contrast, in BBS and, particularly, in CCT8L sequences, the homologous positions are significantly more differentiated.p
tbl id="T6"titlepTable 6ptitlecaptionpConservation of potential substrate-binding residue positionssup1suppcaptiontblbdy cols="5"
r
c ca="left"
p
bFamilyb
p
c
c ca="center"
p
b
itIit
sup2sup
b
p
c
c ca="center"
p
biti it+1sup2supb
p
c
c ca="center"
p
bitiit+4sup2supb
p
c
c ca="center"
p
bDescriptionb
p
c
r
r
c cspan="5"
hr
c
r
r
c ca="left"
pCCT1p
c
c ca="center"
pKp
c
c ca="center"
pYp
c
c ca="center"
pDEp
c
c ca="center"
pLys/Tyr/Acidicp
c
r
r
c ca="left"
pCCT2p
c
c ca="center"
pQp
c
c ca="center"
pLp
c
c ca="center"
pA (GQ)sup3supp
c
c ca="center"
pGln/Leu/Alap
c
r
r
c ca="left"
pCCT3p
c
c ca="center"
pHp
c
c ca="center"
pYp
c
c ca="center"
pKRp
c
c ca="center"
pHis/Tyr/Basicp
c
r
r
c ca="left"
pCCT4p
c
c ca="center"
pHp
c
c ca="center"
pFp
c
c ca="center"
pKp
c
c ca="center"
pHis/Phe/Lysp
c
r
r
c ca="left"
pCCT5p
c
c ca="center"
pHp
c
c ca="center"
pLp
c
c ca="center"
pQp
c
c ca="center"
pHis/Leu/Glnp
c
r
r
c ca="left"
pCCT6Ap
c
c ca="center"
pDp
c
c ca="center"
pAp
c
c ca="center"
pKp
c
c ca="center"
pAsp/Ala/Lysp
c
r
r
c ca="left"
pCCT6Bp
c
c ca="center"
pDEp
c
c ca="center"
pAILMSVp
c
c ca="center"
pK (R)p
c
c ca="center"
pAcidic/Medium-Small/Lysp
c
r
r
c ca="left"
pCCT7p
c
c ca="center"
pQp
c
c ca="center"
pYp
c
c ca="center"
pD (Y)p
c
c ca="center"
pGln/Tyr/Aspp
c
r
r
c ca="left"
pCCT8p
c
c ca="center"
pHp
c
c ca="center"
pYp
c
c ca="center"
pKp
c
c ca="center"
pHis/Tyr/Lysp
c
r
r
c ca="left"
pCCT8Lp
c
c ca="center"
pDILPTp
c
c ca="center"
pHLQRp
c
c ca="center"
pKNRYp
c
c ca="center"
pVariable/Variable/Polar-Basicp
c
r
r
c ca="left"
pMKKSp
c
c ca="center"
pQ (H)p
c
c ca="center"
pFY (H)p
c
c ca="center"
pDEMQSTp
c
c ca="center"
pGln/Aromatic/Medium-Smallp
c
r
r
c ca="left"
pBBS10p
c
c ca="center"
pY (AFQS)p
c
c ca="center"
pCLY (W)p
c
c ca="center"
pLMQVp
c
c ca="center"
pTyr/Variable/Variablep
c
r
r
c ca="left"
pBBS12p
c
c ca="center"
pE (KLQ)p
c
c ca="center"
pKR (HQ)p
c
c ca="center"
pHNR (ASD)p
c
c ca="center"
pGlu/Basic/Polar-Basicp
c
r
r
c ca="left"
pFab1_TCPsup4supp
c
c ca="center"
pD (EN)p
c
c ca="center"
pI (LMV)p
c
c ca="center"
pQp
c
c ca="center"
pAsp/Ile/Glnp
c
r
tblbdytblfn
psup1supConservation evaluated among sequences in vertebrate genomes. sup2supPotential substrate-binding positions, corresponding to yeast CCT1 positions 308, 309 and 312 (itiit = 308) abbrgrpabbr bid="B37"37abbrabbrgrp. sup3supRare substitutions are listed in parentheses. sup4supFab1_TCP domain of vertebrate PIKFYVE orthologs.p
tblfntbl
sec
sec
sec
st
pDiscussionp
st
pWe identified the full complement of chaperonin ithsp60 itgenes and pseudogenes encoded in the human genome and, for comparison, in the genomes of the model organisms mouse and rat. We delimited the set of ithsp60 itgenes encoded in the human genome to: a) nine canonical itcct itgenes (CCT1 to CCT8 including CCT6A and CCT6B) involved in formation of the CCT complex; b) the itcpn60 itgene (HSPD1) of mitochondrial origin; c) the three highly diverged ithsp60it-like BBS genes MKKS, BBS10 and BBS12; and d) a newly characterized class of genes, CCT8L, represented in human by CCT8L1 and CCT8L2. We also identified a plethora of pseudogene sequences, many of which had not been previously reported. The comparative analyses of these families of functional genes and of their pseudogenes revealed their evolutionary history and relationships.p
pIn contrast to the uncertainty of the duplication pattern of canonical CCT subunits (our results and abbrgrp
abbr bid="B38"38abbr
abbr bid="B39"39abbr
abbrgrp), the origin of Hsp60-like BBS and CCT8L proteins was unambiguously identified by phylogenetic tree reconstructions. Our analyses indicated that ithsp60it-like BBS genes originated monophyletically from a gene duplication event in the CCT8 gene lineage. In addition, we determined that the CCT8L family also originated in the CCT8 lineage, from a more recent retrotransposition event. The presence of this gene family in placental mammals, marsupials and monotremes, but not in reptiles/birds or other vertebrate species, indicates that this family originated at the onset of mammal evolution, before the divergence of Theria and Prototheria. The presence of two highly similar CCT8L genes (CCT8L1 and CCT8L2) in the genomes of human and chimp, and of a single copy in other mammal genomes, including rhesus monkey, suggests that the duplication of this gene occurred in the ape lineage (Hominoidea) after its divergence from the Old World monkeys (Cercopithecidae). Multiple lines of evidence gathered in this work indicate that CCT8L sequences (and at least one of the two paralogs in Hominoidea) encode functional proteins: (i) reduced rates of non-synonymous mutation were estimated along their lineages, as expected for functionally constrained protein-coding genes; (ii) pseudogenes as ancient as, or more recent than, the CCT8L genes were heavily degenerated and no pseudogenes pre-dating mammal evolution could be identified, whereas CCT8L sequences, although they originated early in mammal evolution, did not show signs of degeneration (with the exception of the chimp CCT8L1 ortholog); (iii) multiple EST and microarray data have been collected for CCT8L2, mostly from testis, and one EST for CCT8L1 has been reported from placental tissue (as per the UniGene EST and GEO expression data, November 23, 2009). These features taken together are strong evidence that at least CCT8L2 in Hominoidea and the lone CCT8L gene in other mammal lineages encode functional proteins. The sparse expression of CCT8L1 in human and the presence of one in-frame stop codon and one frame-shift in its orthologous sequence from chimp raise doubts about the functionality of this sequence.p
pNumerous sequences associated with itcct itor itcpn60 itgenes found in the human, mouse or rat genomes were classified as pseudogenes based on the presence of internal stop codons, frame-shifts and the absence of a significant difference between synonymous and non-synonymous mutation rates. Among them, the sequences HSPD1-5P and HSPD1-6P appear to be expressed based on EST analysis (see additional file supplr sid="S14"14supplr: Table S5) and may represent instances of expressed pseudogenes abbrgrp
abbr bid="B40"40abbr
abbrgrp. A general explosion of pseudogene generation in the human and murid lineages after they separated from the carnivore lineage has been reported abbrgrp
abbr bid="B41"41abbr
abbrgrp. Our analysis of chaperonin pseudogenes is consistent with this observation, although their relatively high rate of degeneration suggests that pseudogenes generated before the origin of mammals may have degraded beyond recognition. The intense duplication of chaperonin sequences, witnessed by the many pseudogenes identified in the human and murid genomes, very likely provided opportunities for multiple paralogy, resulting in the proliferation of chaperonin classes in the vertebrate and mammal lineages.p
pAlthough the Hsp60-like BBS and CCT8L protein families have considerably differentiated from the canonical CCT subunits and within themselves, our analyses indicated that they still conserve the overall three-domain structure typical of CCT proteins. Structure and sequence variations predicted for their apical domains may reflect distinctive substrate specificities. In particular, lack of conservation at positions crucial in providing substrate-specificity to CCT monomers abbrgrp
abbr bid="B37"37abbr
abbrgrp suggests that BBS and CCT8L proteins may interact with their substrate(s) in different regions as compared with the canonical CCT subunits. Sequence differentiation patterns and acquisition of inserted elements in correspondence to potential monomer-monomer contact regions suggested that BBS and CCT8L proteins do not assemble in a CCT-like complex. This prediction is supported by experimental evidence showing that MKKS localizes as a free monomer at the pericentriolar material of centrosomes abbrgrp
abbr bid="B27"27abbr
abbrgrp. In this respect, it is also interesting to observe that among BBS and CCT8L sequences the ATP-hydrolysis motif "Gly-Asp-Gly-Thr", remarkably conserved among canonical chaperonins abbrgrp
abbr bid="B42"42abbr
abbrgrp, has differentiated in MKKS and in BBS12 abbrgrp
abbr bid="B23"23abbr
abbr bid="B27"27abbr
abbrgrp. This condition may indicate that these families have lost the hydrolytic activity necessary for the functionality of the chaperonin complex abbrgrp
abbr bid="B43"43abbr
abbr bid="B44"44abbr
abbr bid="B45"45abbr
abbr bid="B46"46abbr
abbr bid="B47"47abbr
abbr bid="B48"48abbr
abbr bid="B49"49abbr
abbr bid="B50"50abbr
abbr bid="B51"51abbr
abbr bid="B52"52abbr
abbrgrp. It has been shown for the archaeal thermosome complex that mutation of the ATP-hydrolysis-motif Asp residue prevents hydrolysis and productive protein folding abbrgrp
abbr bid="B49"49abbr
abbrgrp and that some CCT subunits, among which CCT8, dissociate itin vitro itfrom the complex in conditions that prevent hydrolysis of ATP abbrgrp
abbr bid="B53"53abbr
abbrgrp.p
pFunctionalities independent from formation of the complex have also been reported for canonical CCT subunits. TCP1 monomers not in complex confer enhanced salt tolerance in plants abbrgrp
abbr bid="B54"54abbr
abbrgrp. Individual CCT subunits have been reported to associate itin vitro itwith cytoskeleton structures, selectively binding to microtubule filaments abbrgrp
abbr bid="B55"55abbr
abbrgrp or to actin polymerizing filaments abbrgrp
abbr bid="B56"56abbr
abbrgrp. The localization of Hsp60-like BBS proteins at the cilium basal body and at the centrosome abbrgrp
abbr bid="B26"26abbr
abbr bid="B27"27abbr
abbr bid="B28"28abbr
abbrgrp suggests that they may also interact and associate with, for example, cytoskeleton structures in promoting the correct development of cilia abbrgrp
abbr bid="B28"28abbr
abbr bid="B57"57abbr
abbrgrp. The multiple lines of structural and experimental evidence that BBS and CCT8L proteins do not form a canonical CCT-like complex provide a strong indication that eukaryotic Group II chaperonin-protein functionalities extend beyond those of the typical oligomeric complex.p
sec
sec
st
pConclusionsp
st
pChaperonin proteins are key players in ensuring and preserving cell and organism functionality under normal and stressful conditions and their biological and medical importance is undeniable. The recent discovery of ithsp60 itgenes directly implicated in specific pathological conditions, the chaperonopathies, extends our understanding of the roles of chaperonin proteins in cellular processes and enhances awareness of their importance in pathology abbrgrp
abbr bid="B18"18abbr
abbr bid="B19"19abbr
abbr bid="B20"20abbr
abbrgrp. Here, we have provided a comprehensive, unifying framework encompassing all members of the extended ithsp60 itfamily of genes and pseudogenes. This unifying framework contributes to our understanding of the evolutionary history of the extended ithsp60 itfamily and widens our perspectives on the multiple roles that chaperonin proteins have acquired in vertebrates. Our findings highlight how differentiation of the chaperonin protein family in mammals has been facilitated by intense processes of gene duplication. The roles, mechanisms of action, and involvement in pathogenesis of individual chaperonin molecules beyond those typical of their canonical oligomeric complexes constitute aspects of chaperonin physiology particularly promising for future experimental testing.p
sec
sec
st
pMethodsp
st
sec
st
pIdentification of chaperonin genes in eukaryotic genomesp
st
pExhaustive searches for genes encoding Hsp60-like proteins were performed using TBLASTN abbrgrp
abbr bid="B58"58abbr
abbrgrp at Ensembl abbrgrp
abbr bid="B34"34abbr
abbrgrp and BLAT abbrgrp
abbr bid="B59"59abbr
abbrgrp at UCSC abbrgrp
abbr bid="B60"60abbr
abbrgrp on the genome sequences of human (NCBI Assembly 36, Genebuild Ensembl Dec 2006), mouse (NCBI Assembly m37, Genebuild Ensembl Apr 2007) and rat (Assembly RGSC 3.4, Genebuild Ensembl Feb 2006). We used the nine canonical human CCT proteins and the Cpn60 protein (mitochondrial Hsp60) as queries. We recursively queried the genomes with the sequences recovered from previous searches until no other Hsp60 sequences were detected. We used both search engines also to recover the full list of annotated ithsp60it-like genes in several other mammal genomes and in chicken. Sequences from frog (itXenopus itsp.) were retrieved from the NCBI nr (non-redundant) database using PSI-BLAST abbrgrp
abbr bid="B61"61abbr
abbrgrp with Cpn60 and the individual CCT subunits as queries. To recover complete ithsp60 itgene and pseudogene sequences, after the TBLASTN searches the genomic sequences from approximately 2,000 nt upstream to 2,000 nt downstream of the hit-regions were excised and the ithsp60 itsequences were extracted using the homology-based gene prediction method implemented in FGENESH+ abbrgrp
abbr bid="B62"62abbr
abbrgrp at the Softberry web site abbrgrp
abbr bid="B63"63abbr
abbrgrp. For pseudogenes, when FGENESH+ failed to recognize the complete sequence due to in-frame stop codons or frame shifts in the sequence, the coding region was manually reconstructed, aligning the three-frame-translations of the genomic sequence to the query sequence with the multiple protein alignment program ITERALIGN abbrgrp
abbr bid="B64"64abbr
abbrgrp. The Pseudogene.org abbrgrp
abbr bid="B33"33abbr
abbr bid="B65"65abbr
abbrgrp database and Ensembl abbrgrp
abbr bid="B34"34abbr
abbrgrp, Entrez abbrgrp
abbr bid="B30"30abbr
abbrgrp and HUGO abbrgrp
abbr bid="B66"66abbr
abbrgrp annotations were consulted for the presence of annotated human pseudogenes, as recorded in our tables of results.p
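pThe recursive querying and the excision of flanked hit regions described above can be pictured with a minimal Python sketch. The itsearchit callable, the toy hit table and both function names are hypothetical stand-ins for TBLASTN or BLAT runs and are not the authors' pipeline; coordinates are assumed to be 1-based and inclusive.p

```python
from collections import deque


def recursive_search(seed_queries, search):
    """Re-query with every newly recovered sequence until nothing new appears.

    `search` is a hypothetical callable mapping a query ID to an iterable of
    hit IDs (standing in for a TBLASTN/BLAT run against a genome).
    """
    found = set(seed_queries)
    queue = deque(seed_queries)
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit not in found:
                found.add(hit)
                queue.append(hit)  # new sequences become queries in turn
    return found


def flanked_window(hit_start, hit_end, chrom_length, flank=2000):
    """Extend a hit region by `flank` nt on each side, clamped to the chromosome."""
    return max(1, hit_start - flank), min(chrom_length, hit_end + flank)


# Toy hit table (illustration only; identifiers are invented).
toy_hits = {"q1": ["h1", "h2"], "h1": ["h3"], "h2": ["h1"], "h3": []}
print(sorted(recursive_search(["q1"], lambda q: toy_hits.get(q, []))))
print(flanked_window(5000, 6000, 100000))
```

pThe loop terminates because each sequence is queued at most once, which mirrors the stopping rule of querying until no further Hsp60 sequences are detected.p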
sec
sec
st
pMultiple sequence alignment and secondary structure predictionp
st
pMultiple sequence alignments were obtained using MUSCLE abbrgrp
abbr bid="B67"67abbr
abbrgrp, which in previous analyses abbrgrp
abbr bid="B68"68abbr
abbr bid="B69"69abbr
abbrgrp performed well when aligning divergent sequences. Alignments were manually adjusted as needed. Predictions of secondary structure for each protein family were performed from their multiple alignment using the Jnet algorithm as implemented in the JPRED-3 secondary structure prediction server abbrgrp
abbr bid="B70"70abbr
abbr bid="B71"71abbr
abbrgrp.p
sec
sec
st
pEvolutionary tree reconstructionsp
st
pTo infer phylogenetic relationships, evolutionary trees were obtained using the maximum-likelihood (ML) tree-building procedure implemented in PHYML abbrgrp
abbr bid="B72"72abbr
abbrgrp using the default JTT substitution model and 100 bootstrap resampling replicates (each ML tree reconstruction being quite time consuming). Selected trees were compared with those obtained with the Bayesian approach implemented in MrBayes 3.1 abbrgrp
abbr bid="B73"73abbr
abbrgrp using the WAG substitution model and 10,000 iterations for the MCMC process. Conditional probabilities were estimated by sampling the MCMC process every 10 iterations after 2,500 burn-in iterations (sample size 750).p
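pThe stated sample size follows directly from the chain settings, as the short Python check below shows. The function name is hypothetical, and the calculation assumes that the burn-in iterations are simply discarded before sampling at the stated interval.p

```python
def retained_samples(total_iterations, burn_in, sampling_interval):
    """Number of MCMC samples kept after discarding the burn-in."""
    return (total_iterations - burn_in) // sampling_interval


# Settings reported in the text: 10,000 iterations, 2,500 burn-in, every 10th kept.
print(retained_samples(10_000, 2_500, 10))
```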
sec
sec
st
pEstimates of evolutionary divergence of sequence familiesp
st
pWe obtained rates of divergence among families of sequences using a newly developed estimator, called "B-index". The B-index is an unbiased estimator of the average divergence of a family of sequences from its last common ancestor (root) that takes into consideration the correlations among sequences determined by their phylogenetic tree. Briefly, given a rooted tree, a terminal branch of length $d_i$ of the original tree is considered a "cluster" of size $w_i = 1$ and length $d = d_i$. Each fork-structure comprising two terminal branches (clusters) of lengths $d_1$ and $d_2$ and sizes $w_1$ and $w_2$ bifurcating from a stem-branch of length $d_s$ is considered in turn. The average length $d$ of each fork-structure is computed as $d = (d_1 + d_2)/2 + d_s$ and the average size $w$ of the structure is defined as $w = [2(d_1 + d_2)/2 + 1 \cdot d_s]/[(d_1 + d_2)/2 + d_s] = (d_1 + d_2 + d_s)/d$. Each fork-structure is progressively replaced by a corresponding cluster of length $d$ and size $w$. The procedure is repeated merging bifurcating clusters of lengths $d_1$ and $d_2$ and sizes $w_1$ and $w_2$ connected to a stem-branch of length $d_s$ into a larger cluster of average length $d = (w_1 d_1 + w_2 d_2)/(w_1 + w_2) + d_s$ and average size $w = (d_1 w_1 + d_2 w_2 + d_s)/d$, until the tree is reduced to two clusters connected to the root ($d_s = 0$). The global average differentiation $D$ ("B-index") and size $W$ can finally be computed as $D = (w_1 d_1 + w_2 d_2)/(w_1 + w_2)$ and $W = w_1 + w_2$. It can be shown that $DW = L$, where $L$ is the length of the tree (sum of all branch lengths). If two sequence families $A$ and $B$ are sampled from the same set of species and $W_A = W_B$, then $D_B/D_A = L_B/L_A$ and the relative rate of differentiation of the two families of sequences can be estimated by the ratio of their tree lengths. The B-index has several advantages compared to the most commonly used average pair-wise sequence-similarity measure: (i) it takes into account the correlation among sequences imposed by the topology of the evolutionary tree; (ii) in contrast to average pair-wise similarity, its expectations are invariant over the number and phylogenetic relations of sequences sampled from a cluster with the same common ancestor and evolutionary model; and (iii) with the B-index, the average differentiation rate of a protein family relative to a reference family sharing the same evolutionary relations (e.g., sampled from the same set of species) is simply estimated by the ratio of the lengths of the evolutionary trees of the two families.p
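pThe merging procedure described above can be sketched in a few lines of Python. This is a minimal illustration written from the description in this section, not the authors' implementation: the tree encoding (a node is a pair of stem-branch length and a list of zero or two child nodes), the function names and the example branch lengths are all hypothetical.p

```python
def leaf(length):
    """A terminal branch: a cluster of size 1 with the given length."""
    return (length, [])


def b_index(node):
    """Return (d, w) for the cluster rooted at `node`, stem branch included.

    Applied to the root (stem length 0) this returns (D, W).
    """
    stem, children = node
    if not children:
        return stem, 1.0
    left, right = children  # the procedure assumes a bifurcating tree
    d1, w1 = b_index(left)
    d2, w2 = b_index(right)
    d = (w1 * d1 + w2 * d2) / (w1 + w2) + stem
    if d == 0:
        return 0.0, w1 + w2  # degenerate cluster with no branch length
    w = (d1 * w1 + d2 * w2 + stem) / d
    return d, w


def tree_length(node):
    """Sum of all branch lengths (L)."""
    stem, children = node
    return stem + sum(tree_length(child) for child in children)


def scale(node, factor):
    """Multiply every branch length by `factor` (a uniformly faster family)."""
    stem, children = node
    return (stem * factor, [scale(child, factor) for child in children])


# Hypothetical rooted tree ((A:1, B:3):2, C:4); not data from this study.
tree_a = (0.0, [(2.0, [leaf(1.0), leaf(3.0)]), leaf(4.0)])
tree_b = scale(tree_a, 2.0)

D_a, W_a = b_index(tree_a)
D_b, W_b = b_index(tree_b)
print(D_a, W_a, D_a * W_a, tree_length(tree_a))
print(D_b / D_a, tree_length(tree_b) / tree_length(tree_a))
```

pIn this toy case the product of itDit and itWit equals the tree length, and doubling every branch leaves itWit unchanged while doubling itDit, so the ratio of B-indices equals the ratio of tree lengths, as stated in the text.p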
sec
sec
st
pEstimates of ratios of non-synonymous vs. synonymous mutation rates (Ka/Ks)p
st
pClassification of ithsp60 itsequences as functional genes or pseudogenes was supported by the absence or presence of in-frame stop codons and frame-shifts, and by estimating non-synonymous itvs. itsynonymous mutation-rate ratios (Ka/Ks) along relevant branches of evolutionary trees. Estimates were obtained using the maximum-likelihood branch-specific model implemented in PAML4 abbrgrp
abbr bid="B74"74abbr
abbrgrp. In the case of pseudogenes, Ka/Ks values are expected not to differ significantly from 1 (absence of positive or negative selection at the protein level), whereas protein-coding genes, whose evolution is dominated by negative or positive selection, are expected to be characterized, respectively, by Ka/Ks < 1 or Ka/Ks > 1. Briefly, we applied the PAML4 "branch-specific model" to an evolutionary tree including the sequences whose evolutionary lineage was tested, the appropriate sister sequence (in the case of pseudogenes, the gene sequence from whose lineage the pseudogene originated) and an out-group sequence. The tree branch(es) to be tested are designated as "foreground" and other branches as "background." Using the branch-specific model, the Ka/Ks ratio is estimated for the foreground branch(es) and an analogous ratio is estimated for the background branches. The likelihood $L_1$ generated using this evolutionary model is compared to the likelihood $L_0$ of a null model where Ka/Ks for foreground branches is fixed to 1.0. In the log-likelihood ratio test (LRT), the significance of the likelihood difference between the model with free estimate of Ka/Ks and the null model is evaluated with the statistic $2\ln(L_1/L_0)$, which approximately follows a $\chi^2$ distribution.p
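pThe likelihood-ratio comparison described above can be sketched as follows. The example assumes one degree of freedom, because the null model fixes a single parameter (the foreground Ka/Ks) that the alternative model estimates freely; the text does not state the degrees of freedom explicitly. The function name and the log-likelihood values are hypothetical and are not PAML output from this study.p

```python
import math


def lrt_ka_ks(lnL_free, lnL_null):
    """Likelihood-ratio test of free foreground Ka/Ks against Ka/Ks = 1.

    Inputs are log-likelihoods, so 2*ln(L1/L0) = 2*(lnL1 - lnL0).
    Assumption: the statistic is compared to chi-square with 1 d.f.
    """
    statistic = max(2.0 * (lnL_free - lnL_null), 0.0)
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for one tested branch.
statistic, p_value = lrt_ka_ks(lnL_free=-2450.30, lnL_null=-2455.10)
print(round(statistic, 2), p_value < 0.05)
```

pA non-significant result is the outcome expected for a pseudogene, since its Ka/Ks is not expected to differ from 1; a significant result with an estimated Ka/Ks below 1 points to functional constraint.p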
sec
sec
st
pData availabilityp
st
pAll relevant gene and pseudogene information, including start and end positions, chromosomal location, strand, number of exons, GenBank accession number for functional genes, and Ensembl or Pseudogene.org ID for pseudogenes, can be found in additional file supplr sid="S22"22supplr: Table S7. Newly annotated sequences have been approved and deposited in the Human Genome Organization (HUGO) database abbrgrp
abbr bid="B66"66abbr
abbrgrp.p
suppl id="S22"
title
pAdditional file 22p
title
text
p
bTable S7b. Database and sequence information on all hsp60-like sequences identified in the human, mouse and rat genomes.p
text
file name="1471-2148-10-64-S22.PDF"
pClick here for filep
file
suppl
sec
sec
sec
st
pAbbreviationsp
st
pBBS: Bardet-Biedl Syndrome; CCT: Chaperonin Containing TCP1; ML: Maximum-Likelihood; MKKS: McKusick-Kaufman Syndrome; TRiC: TCP1 Ring Complex.p
sec
sec
st
pAuthors' contributionsp
st
pKM participated in research and methodological approach design, carried out all searches and most data analyses, wrote drafts of the manuscript and participated in its refinement, compiled all tables and produced most figures; EC de M and AJLM envisioned the research project, started data collection and participated in research design and in manuscript preparation; LB participated in research design and methodological approach, produced differentiation and mutation-accumulation estimates and analyses and participated in writing the manuscript. All authors read and approved the final manuscript.p
sec
bdybm
ack
sec
st
pAcknowledgementsp
st
pThe authors thank an anonymous reviewer for providing valuable information. AJLM and EC de M thank Wesley Harlow for his help in the initial stages of this work and the San Francisco Foundation for support. LB and KM thank Mr. Steve Oden and Ms. Shaina R. Wallach for critical proofreading of the manuscript. LB thanks the University of Florida Genetics Institute for financial support.p
sec
ack
refgrpbibl id="B1"titlepMolecular chaperones in the cytosol: from nascent chain to folded proteinptitleaugausnmHartlsnmfnmFUfnmauausnmHayer-HartlsnmfnmMfnmauaugsourceSciencesourcepubdate2002pubdatevolume295volumefpage1852fpagelpage1858lpagexrefbibpubidlistpubid idtype="doi"10.1126science.1068408pubidpubid idtype="pmpid" link="fulltext"11884745pubidpubidlistxrefbibbiblbibl id="B2"titlepFolding of newly translated proteins in vivo: the role of molecular chaperonesptitleaugausnmFrydmansnmfnmJfnmauaugsourceAnnu Rev Biochemsourcepubdate2001pubdatevolume70volumefpage603fpagelpage647lpagexrefbibpubidlistpubid idtype="doi"10.1146annurev.biochem.70.1.603pubidpubid idtype="pmpid" link="fulltext"11395418pubidpubidlistxrefbibbiblbibl id="B3"titlepStructure and function in GroEL-mediated protein foldingptitleaugausnmSiglersnmfnmPBfnmauausnmXusnmfnmZfnmauausnmRyesnmfnmHSfnmauausnmBurstonsnmfnmSGfnmauausnmFentonsnmfnmWAfnmauausnmHorwichsnmfnmALfnmauaugsourceAnnu Rev Biochemsourcepubdate1998pubdatevolume67volumefpage581fpagelpage608lpagexrefbibpubidlistpubid idtype="doi"10.1146annurev.biochem.67.1.581pubidpubid idtype="pmpid" link="fulltext"9759498pubidpubidlistxrefbibbiblbibl id="B4"titlepThe Hsp70 and Hsp60 chaperone machinesptitleaugausnmBukausnmfnmBfnmauausnmHorwichsnmfnmALfnmauaugsourceCellsourcepubdate1998pubdatevolume92volumefpage351fpagelpage366lpagexrefbibpubidlistpubid idtype="doi"10.1016S0092-8674(00)80928-9pubidpubid idtype="pmpid" link="fulltext"9476895pubidpubidlistxrefbibbiblbibl id="B5"titlepHomologous plant and bacterial proteins chaperone oligomeric protein assemblyptitleaugausnmHemmingsensnmfnmSMfnmauausnmWoolfordsnmfnmCfnmauausnmViessnmmnmvan dermnmfnmSMfnmauausnmTillysnmfnmKfnmauausnmDennissnmfnmDTfnmauausnmGeorgopoulossnmfnmCPfnmauausnmHendrixsnmfnmRWfnmauausnmEllissnmfnmRJfnmauaugsourceNaturesourcepubdate1988pubdatevolume333volumefpage330fpagelpage334lpagexrefbibpubidlistpubid idtype="doi"10.1038333330a0pubidpubid idtype="pmpid" 
link="fulltext"2897629pubidpubidlistxrefbibbiblbibl id="B6"titlepA molecular chaperone from a thermophilic archaebacterium is related to the eukaryotic protein t-complex polypeptide-1ptitleaugausnmTrentsnmfnmJDfnmauausnmNimmesgernsnmfnmEfnmauausnmWallsnmfnmJSfnmauausnmHartlsnmfnmFUfnmauausnmHorwichsnmfnmALfnmauaugsourceNaturesourcepubdate1991pubdatevolume354volumefpage490fpagelpage493lpagexrefbibpubidlistpubid idtype="doi"10.1038354490a0pubidpubid idtype="pmpid" link="fulltext"1836250pubidpubidlistxrefbibbiblbibl id="B7"titlepThe chaperonin containing t-complex polypeptide 1 (TCP-1). Multisubunit machinery assisting in protein folding and assembly in the eukaryotic cytosolptitleaugausnmKubotasnmfnmHfnmauausnmHynessnmfnmGfnmauausnmWillisonsnmfnmKfnmauaugsourceEur J Biochemsourcepubdate1995pubdatevolume230volumefpage3fpagelpage16lpagexrefbibpubidlistpubid idtype="doi"10.1111j.1432-1033.1995.tb20527.xpubidpubid idtype="pmpid" link="fulltext"7601114pubidpubidlistxrefbibbiblbibl id="B8"titlepEvolution of assisted protein folding: the distribution of the main chaperoning systems within the phylogenetic domain archaeaptitleaugausnmMacariosnmfnmAJLfnmauausnmMalzsnmfnmMfnmauausnmConway de MacariosnmfnmEfnmauaugsourceFront Bioscisourcepubdate2004pubdatevolume9volumefpage1318fpagelpage1332lpagexrefbibpubidlistpubid idtype="doi"10.27411328pubidpubid idtype="pmpid" link="fulltext"14977547pubidpubidlistxrefbibbiblbibl id="B9"titlepStructural comparison of prokaryotic and eukaryotic chaperoninsptitleaugausnmCarrascosasnmfnmJLfnmauausnmLlorcasnmfnmOfnmauausnmValpuestasnmfnmJMfnmauaugsourceMicronsourcepubdate2001pubdatevolume32volumefpage43fpagelpage50lpagexrefbibpubidlistpubid idtype="doi"10.1016S0968-4328(00)00027-5pubidpubid idtype="pmpid" link="fulltext"10900379pubidpubidlistxrefbibbiblbibl id="B10"titlepArchaeal chaperoninsptitleaugausnmLargesnmfnmATfnmauausnmLundsnmfnmPAfnmauaugsourceFront 
Bioscisourcepubdate2009pubdatevolume14volumefpage1304fpagelpage1324lpagexrefbibpubidlistpubid idtype="doi"10.27413310pubidpubid idtype="pmpid" link="fulltext"19273132pubidpubidlistxrefbibbiblbibl id="B11"titlepAllosteric signaling of ATP hydrolysis in GroEL-GroES complexesptitleaugausnmRansonsnmfnmNAfnmauausnmClaresnmfnmDKfnmauausnmFarrsnmfnmGWfnmauausnmHouldershawsnmfnmDfnmauausnmHorwichsnmfnmALfnmauausnmSaibilsnmfnmHRfnmauaugsourceNat Struct Mol Biolsourcepubdate2006pubdatevolume13volumefpage147fpagelpage152lpagexrefbibpubidlistpubid idtype="doi"10.1038nsmb1046pubidpubid idtype="pmpid" link="fulltext"16429154pubidpubidlistxrefbibbiblbibl id="B12"titlepChaperonins can catalyse the reversal of early aggregation steps when a protein misfoldsptitleaugausnmRansonsnmfnmNAfnmauausnmDunstersnmfnmNJfnmauausnmBurstonsnmfnmSGfnmauausnmClarkesnmfnmARfnmauaugsourceJ Mol Biolsourcepubdate1995pubdatevolume250volumefpage581fpagelpage586lpagexrefbibpubidlistpubid idtype="doi"10.1006jmbi.1995.0399pubidpubid idtype="pmpid" link="fulltext"7623376pubidpubidlistxrefbibbiblbibl id="B13"titlepChaperoninsptitleaugausnmRansonsnmfnmNAfnmauausnmWhitesnmfnmHEfnmauausnmSaibilsnmfnmHRfnmauaugsourceBiochem Jsourcepubdate1998pubdatevolume333volumeissuePt 2issuefpage233fpagelpage242lpagexrefbibpubidlistpubid idtype="pmcid"1219577pubidpubid idtype="pmpid" link="fulltext"9657960pubidpubidlistxrefbibbiblbibl id="B14"titlepType I chaperonins: not all are created equalptitleaugausnmLevy-RimlersnmfnmGfnmauausnmBellsnmfnmREfnmauausnmBen-TalsnmfnmNfnmauausnmAzemsnmfnmAfnmauaugsourceFEBS Lettsourcepubdate2002pubdatevolume529volumefpage1fpagelpage5lpagexrefbibpubidlistpubid idtype="doi"10.1016S0014-5793(02)03178-2pubidpubid idtype="pmpid" link="fulltext"12354603pubidpubidlistxrefbibbiblbibl id="B15"titlepFunction in protein folding of TRiC, a cytosolic ring complex containing TCP-1 and structurally related 
subunitsptitleaugausnmFrydmansnmfnmJfnmauausnmNimmesgernsnmfnmEfnmauausnmErdjument-BromagesnmfnmHfnmauausnmWallsnmfnmJSfnmauausnmTempstsnmfnmPfnmauausnmHartlsnmfnmFUfnmauaugsourceEMBO Jsourcepubdate1992pubdatevolume11volumefpage4767fpagelpage4778lpagexrefbibpubidlistpubid idtype="pmcid"556952pubidpubid idtype="pmpid"1361170pubidpubidlistxrefbibbiblbibl id="B16"titlepIdentification of six Tcp-1-related genes encoding divergent subunits of the TCP-1-containing chaperoninptitleaugausnmKubotasnmfnmHfnmauausnmHynessnmfnmGfnmauausnmCarnesnmfnmAfnmauausnmAshworthsnmfnmAfnmauausnmWillisonsnmfnmKfnmauaugsourceCurr Biolsourcepubdate1994pubdatevolume4volumefpage89fpagelpage99lpagexrefbibpubidlistpubid idtype="doi"10.1016S0960-9822(94)00024-2pubidpubid idtype="pmpid" link="fulltext"7953530pubidpubidlistxrefbibbiblbibl id="B17"titlepReview: the Cct eukaryotic chaperonin subunits of Saccharomyces cerevisiae and other yeastsptitleaugausnmStoldtsnmfnmVfnmauausnmRademachersnmfnmFfnmauausnmKehrensnmfnmVfnmauausnmErnstsnmfnmJFfnmauausnmPearcesnmfnmDAfnmauausnmShermansnmfnmFfnmauaugsourceYeastsourcepubdate1996pubdatevolume12volumefpage523fpagelpage529lpagexrefbibpubidlistpubid idtype="doi"10.1002(SICI)1097-0061(199605)12:6<523::AID-YEA962>3.0.CO;2-Cpubidpubid idtype="pmpid"8771707pubidpubidlistxrefbibbiblbibl id="B18"titlepHsp60 expression, new locations, functions and perspectives for cancer diagnosis and therapyptitleaugausnmCappellosnmfnmFfnmauausnmConway de MacariosnmfnmEfnmauausnmMarasasnmfnmLfnmauausnmZummosnmfnmGfnmauausnmMacariosnmfnmAJLfnmauaugsourceCancer Biol Thersourcepubdate2008pubdatevolume7volumefpage801fpagelpage809lpagexrefbibpubid idtype="pmpid" link="fulltext"18497565pubidxrefbibbiblbibl id="B19"titlepChaperonopathies by defect, excess, or mistakeptitleaugausnmMacariosnmfnmAJLfnmauausnmConway de MacariosnmfnmEfnmauaugsourceAnn N Y Acad Scisourcepubdate2007pubdatevolume1113volumefpage178fpagelpage191lpagexrefbibpubidlistpubid 
doi:10.1196/annals.1391.009; PMID 17483209.
20. Macario AJL, Conway de Macario E: Sick chaperones, cellular stress, and disease. N Engl J Med 2005, 353:1489-1501. doi:10.1056/NEJMra050111; PMID 16207851.
21. Stone DL, Slavotinek A, Bouffard GG, Banerjee-Basu S, Baxevanis AD, Barr M, Biesecker LG: Mutation of a gene encoding a putative chaperonin causes McKusick-Kaufman syndrome. Nat Genet 2000, 25:79-82. doi:10.1038/75637; PMID 10802661.
22. Stoetzel C, Laurier V, Davis EE, Muller J, Rix S, Badano JL, Leitch CC, Salem N, Chouery E, Corbani S, et al.: BBS10 encodes a vertebrate-specific chaperonin-like protein and is a major BBS locus. Nat Genet 2006, 38:521-524. doi:10.1038/ng1771; PMID 16582908.
23. Stoetzel C, Muller J, Laurier V, Davis EE, Zaghloul NA, Vicaire S, Jacquelin C, Plewniak F, Leitch CC, Sarda P, et al.: Identification of a novel BBS gene (BBS12) highlights the major role of a vertebrate-specific branch of chaperonin-related proteins in Bardet-Biedl syndrome. Am J Hum Genet 2007, 80:1-11. doi:10.1086/510256; PMCID 1785304; PMID 17160889.
24. Katsanis N, Beales PL, Woods MO, Lewis RA, Green JS, Parfrey PS, Ansley SJ, Davidson WS, Lupski JR: Mutations in MKKS cause obesity, retinal dystrophy and renal malformations associated with Bardet-Biedl syndrome. Nat Genet 2000, 26:67-70. doi:10.1038/79201; PMID 10973251.
25. Blacque OE, Leroux MR: Bardet-Biedl syndrome: an emerging pathomechanism of intracellular transport. Cell Mol Life Sci 2006, 63:2145-2161. doi:10.1007/s00018-006-6180-x; PMID 16909204.
26. Hirayama S, Yamazaki Y, Kitamura A, Oda Y, Morito D, Okawa K, Kimura H, Cyr DM, Kubota H, Nagata K: MKKS is a centrosome-shuttling protein degraded by disease-causing mutations via CHIP-mediated ubiquitination. Mol Biol Cell 2008, 19:899-911. doi:10.1091/mbc.E07-07-0631; PMCID 2262992; PMID 18094050.
27. Kim JC, Ou YY, Badano JL, Esmail MA, Leitch CC, Fiedrich E, Beales PL, Archibald JM, Katsanis N, Rattner JB, et al.: MKKS/BBS6, a divergent chaperonin-like protein linked to the obesity disorder Bardet-Biedl syndrome, is a novel centrosomal component required for cytokinesis. J Cell Sci 2005, 118:1007-1020. doi:10.1242/jcs.01676; PMID 15731008.
28. Marion V, Stoetzel C, Schlicht D, Messaddeq N, Koch M, Flori E, Danse JM, Mandel JL, Dollfus H: Transient ciliogenesis involving Bardet-Biedl syndrome proteins is a fundamental characteristic of adipogenic differentiation. Proc Natl Acad Sci USA 2009, 106:1820-1825. doi:10.1073/pnas.0812518106; PMCID 2635307; PMID 19190184.
29. Brocchieri L, Conway de Macario E, Macario AJL: Chaperonomics, a new tool to study ageing and associated diseases. Mech Ageing Dev 2007, 128:125-136. doi:10.1016/j.mad.2006.11.019; PMID 17123587.
30. Entrez Gene [http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene]
31. Shisheva A, Sbrissa D, Ikonomov O: Cloning, characterization, and expression of a novel Zn2+-binding FYVE finger-containing phosphoinositide kinase in insulin-sensitive cells. Mol Cell Biol 1999, 19:623-634. PMCID 83920; PMID 9858586.
32. Li S, Tiab L, Jiao X, Munier FL, Zografos L, Frueh BE, Sergeev Y, Smith J, Rubin B, Meallet MA, et al.: Mutations in PIP5K3 are associated with Francois-Neetens mouchetee fleck corneal dystrophy. Am J Hum Genet 2005, 77:54-63. doi:10.1086/431346; PMCID 1226194; PMID 15902656.
33. Karro JE, Yan Y, Zheng D, Zhang Z, Carriero N, Cayting P, Harrison P, Gerstein M: Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation. Nucleic Acids Res 2007, 35:D55-60. doi:10.1093/nar/gkl851; PMCID 1669708; PMID 17099229.
34. Ensembl [http://www.ensembl.org/index.html]
35. Springer MS, Stanhope MJ, Madsen O, de Jong WW: Molecules consolidate the placental mammal tree. Trends Ecol Evol 2004, 19:430-438. doi:10.1016/j.tree.2004.05.006; PMID 16701301.
36. Ditzel L, Lowe J, Stock D, Stetter KO, Huber H, Huber R, Steinbacher S: Crystal structure of the thermosome, the archaeal chaperonin and homolog of CCT. Cell 1998, 93:125-138. doi:10.1016/S0092-8674(00)81152-6; PMID 9546398.
37. Spiess C, Miller EJ, McClellan AJ, Frydman J: Identification of the TRiC/CCT substrate binding sites uncovers the function of subunit diversity in eukaryotic chaperonins. Mol Cell 2006, 24:25-37. doi:10.1016/j.molcel.2006.09.003; PMID 17018290.
38. Fares MA, Wolfe KH: Positive selection and subfunctionalization of duplicated CCT chaperonin subunits. Mol Biol Evol 2003, 20:1588-1597. doi:10.1093/molbev/msg160; PMID 12832642.
39. Archibald JM, Logsdon JM Jr, Doolittle WF: Origin and evolution of eukaryotic chaperonins: phylogenetic evidence for ancient duplications in CCT genes. Mol Biol Evol 2000, 17:1456-1466. PMID 11018153.
40. Harrison PM, Zheng D, Zhang Z, Carriero N, Gerstein M: Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability. Nucleic Acids Res 2005, 33:2374-2383. doi:10.1093/nar/gki531; PMCID 1087782; PMID 15860774.
41. Yu Z, Morais D, Ivanga M, Harrison PM: Analysis of the role of retrotransposition in gene evolution in vertebrates. BMC Bioinformatics 2007, 8:308. doi:10.1186/1471-2105-8-308; PMCID 2048973; PMID 17718914.
42. Brocchieri L, Karlin S: Conservation among HSP60 sequences in relation to structure, function, and evolution. Protein Sci 2000, 9:476-486. PMCID 2144576; PMID 10752609.
43. Bigotti MG, Bellamy SR, Clarke AR: The asymmetric ATPase cycle of the thermosome: elucidation of the binding, hydrolysis and product-release steps. J Mol Biol 2006, 362:835-843. doi:10.1016/j.jmb.2006.07.064; PMID 16942780.
44. Bigotti MG, Clarke AR: Cooperativity in the thermosome. J Mol Biol 2005, 348:13-26. doi:10.1016/j.jmb.2005.01.066; PMID 15808850.
45. Cliff MJ, Kad NM, Hay N, Lund PA, Webb MR, Burston SG, Clarke AR: A kinetic analysis of the nucleotide-induced allosteric transitions of GroEL. J Mol Biol 1999, 293:667-684. doi:10.1006/jmbi.1999.3138; PMID 10543958.
46. Jackson GS, Staniforth RA, Halsall DJ, Atkinson T, Holbrook JJ, Clarke AR, Burston SG: Binding and hydrolysis of nucleotides in the chaperonin catalytic cycle: implications for the mechanism of assisted protein folding. Biochemistry 1993, 32:2554-2563. doi:10.1021/bi00061a013; PMID 8095403.
47. Kafri G, Horovitz A: Transient kinetic analysis of ATP-induced allosteric transitions in the eukaryotic chaperonin containing TCP-1. J Mol Biol 2003, 326:981-987. doi:10.1016/S0022-2836(03)00046-9; PMID 12589746.
48. Kafri G, Willison KR, Horovitz A: Nested allosteric interactions in the cytoplasmic chaperonin containing TCP-1. Protein Sci 2001, 10:445-449. doi:10.1110/ps.44401; PMCID 2373951; PMID 11266630.
49. Kanzaki T, Iizuka R, Takahashi K, Maki K, Masuda R, Sahlan M, Yebenes H, Valpuesta JM, Oka T, Furutani M, et al.: Sequential action of ATP-dependent subunit conformational change and interaction between helical protrusions in the closure of the built-in lid of group II chaperonins. J Biol Chem 2008, 283:34773-34784. doi:10.1074/jbc.M805303200; PMID 18854314.
50. Staniforth RA, Burston SG, Atkinson T, Clarke AR: Affinity of chaperonin-60 for a protein substrate and its modulation by nucleotides and chaperonin-10. Biochem J 1994, 300(Pt 3):651-658. PMCID 1138217; PMID 7912068.
51. Todd MJ, Viitanen PV, Lorimer GH: Dynamics of the chaperonin ATPase cycle: implications for facilitated protein folding. Science 1994, 265:659-666. doi:10.1126/science.7913555; PMID 7913555.
52. Yifrach O, Horovitz A: Coupling between protein folding and allostery in the GroE chaperonin system. Proc Natl Acad Sci USA 2000, 97:1521-1524. doi:10.1073/pnas.040449997; PMCID 26467; PMID 10677493.
53. Roobol A, Grantham J, Whitaker HC, Carden MJ: Disassembly of the cytosolic chaperonin in mammalian cell extracts at intracellular levels of K+ and ATP. J Biol Chem 1999, 274:19220-19227. doi:10.1074/jbc.274.27.19220; PMID 10383429.
54. Yamada A, Sekiguchi M, Mimura T, Ozeki Y: The role of plant CCTalpha in salt- and osmotic-stress tolerance. Plant Cell Physiol 2002, 43:1043-1048. doi:10.1093/pcp/pcf120; PMID 12354922.
55. Roobol A, Sahyoun ZP, Carden MJ: Selected subunits of the cytosolic chaperonin associate with microtubules assembled in vitro. J Biol Chem 1999, 274:2408-2415. doi:10.1074/jbc.274.4.2408; PMID 9891010.
56. Grantham J, Ruddock LW, Roobol A, Carden MJ: Eukaryotic chaperonin containing T-complex polypeptide 1 interacts with filamentous actin and reduces the initial rate of actin polymerization in vitro. Cell Stress Chaperones 2002, 7:235-242. doi:10.1379/1466-1268(2002)007<0235:ECCTCP>2.0.CO;2; PMCID 514823; PMID 12482199.
57. Shah AS, Farmen SL, Moninger TO, Businga TR, Andrews MP, Bugge K, Searby CC, Nishimura D, Brogden KA, Kline JN, et al.: Loss of Bardet-Biedl syndrome proteins alters the morphology and function of motile cilia in airway epithelia. Proc Natl Acad Sci USA 2008, 105:3380-3385. doi:10.1073/pnas.0712327105; PMCID 2265193; PMID 18299575.
58. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402. doi:10.1093/nar/25.17.3389; PMCID 146917; PMID 9254694.
59. Kent WJ: BLAT--the BLAST-like alignment tool. Genome Res 2002, 12:656-664. PMCID 187518; PMID 11932250.
60. BLAT Search Genome [http://genome.ucsc.edu/cgi-bin/hgBlat]
61. Altschul SF, Koonin EV: Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem Sci 1998, 23:444-447. doi:10.1016/S0968-0004(98)01298-5; PMID 9852764.
62. Salamov AA, Solovyev VV: Ab initio gene finding in Drosophila genomic DNA. Genome Res 2000, 10:516-522. doi:10.1101/gr.10.4.516; PMCID 310882; PMID 10779491.
63. Softberry [http://www.softberry.com]
64. Brocchieri L, Karlin S: A symmetric-iterated multiple alignment of protein sequences. J Mol Biol 1998, 276:249-264. doi:10.1006/jmbi.1997.1527; PMID 9514731.
65. Pseudogene.org [http://www.pseudogene.org]
66. HUGO Gene Nomenclature Committee [http://www.genenames.org]
67. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32:1792-1797. doi:10.1093/nar/gkh340; PMCID 390337; PMID 15034147.
68. Mukherjee K, Bürglin TR: MEKHLA, a novel domain with similarity to PAS domains, is fused to plant homeodomain-leucine zipper III proteins. Plant Physiol 2006, 140:1142-1150. doi:10.1104/pp.105.073833; PMCID 1435804; PMID 16607028.
69. Mukherjee K, Bürglin TR: Comprehensive analysis of animal TALE homeobox genes: new conserved motifs and cases of accelerated evolution. J Mol Evol 2007, 65:137-153. doi:10.1007/s00239-006-0023-0; PMID 17665086.
70. Cole C, Barber JD, Barton GJ: The Jpred 3 secondary structure prediction server. Nucleic Acids Res 2008, 36:W197-201. doi:10.1093/nar/gkn238; PMCID 2447793; PMID 18463136.
71. Jpred 3. A Secondary Structure Prediction Server [http://www.compbio.dundee.ac.uk/www-jpred]
72. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 2003, 52:696-704. doi:10.1080/10635150390235520; PMID 14530136.
73. Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 2003, 19:1572-1574. doi:10.1093/bioinformatics/btg180; PMID 12912839.
74. Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 2007, 24:1586-1591. doi:10.1093/molbev/msm088; PMID 17483113.
Permanent Link: http://ufdc.ufl.edu/UF00099891/00001
 Material Information
Title: Chaperonin genes on the rise: new divergent classes and intense duplication in human and other vertebrate genomes
Physical Description: Book
Language: English
Creator: Mukherjee, Krishanu
Conway de Macario, Everly
Macario, Alberto
Brocchieri, Luciano
Publisher: BMC Evolutionary Biology
Publication Date: 2010
 Notes
Abstract: BACKGROUND: Chaperonin proteins are well known for the critical role they play in protein folding and in disease. However, the recent identification of three diverged chaperonin paralogs associated with the human Bardet-Biedl and McKusick-Kaufman Syndromes (BBS and MKKS, respectively) indicates that the eukaryotic chaperonin-gene family is larger and more differentiated than previously thought. The availability of complete genome sequences makes possible a definitive characterization of the complete set of chaperonin sequences in human and other species. RESULTS: We identified fifty-four chaperonin-like sequences in the human genome and similar numbers in the genomes of the model organisms mouse and rat. In mammal genomes we identified, besides the well-known CCT chaperonin genes and the three genes associated with the MKKS and BBS pathological conditions, a newly-defined class of chaperonin genes named CCT8L, represented in human by the two sequences CCT8L1 and CCT8L2. Comparative analyses from several vertebrate genomes established the monophyletic origin of chaperonin-like MKKS and BBS genes from the CCT8 lineage. The CCT8L gene originated from a later duplication also in the CCT8 lineage at the onset of mammal evolution and duplicated in primate genomes. The functionality of CCT8L genes in different species was confirmed by evolutionary analyses and in human by expression data. Detailed sequence analysis and structural predictions of MKKS, BBS and CCT8L proteins strongly suggested that they conserve a typical chaperonin-like core structure but that they are unlikely to form a CCT-like oligomeric complex. The characterization of many newly-discovered chaperonin pseudogenes uncovered the intense duplication activity of eukaryotic chaperonin genes. CONCLUSIONS: In vertebrates, chaperonin genes, driven by intense duplication processes, have diversified into multiple classes and functionalities that extend beyond their well-known protein-folding role as part of the typical oligomeric chaperonin complex, emphasizing previous observations on the involvement of individual CCT monomers in microtubule elongation. The functional characterization of newly identified chaperonin genes will be a challenge for future experimental analyses.
General Note: Start page 64
General Note: M3: 10.1186/1471-2148-10-64
 Record Information
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: Open Access: http://www.biomedcentral.com/info/about/openaccess/
Resource Identifier: issn - 1471-2148
http://www.biomedcentral.com/1471-2148/10/64
System ID: UF00099891:00001

Full Text


Mukherjee et al. BMC Evolutionary Biology 2010, 10:64
http://www.biomedcentral.com/1471-2148/10/64




Chaperonin genes on the rise: new divergent classes and intense duplication in human and other vertebrate genomes

Krishanu Mukherjee1,2, Everly Conway de Macario3, Alberto JL Macario3*, Luciano Brocchieri1,2*


Abstract
Background: Chaperonin proteins are well known for the critical role they play in protein folding and in disease.
However, the recent identification of three diverged chaperonin paralogs associated with the human Bardet-Biedl
and McKusick-Kaufman Syndromes (BBS and MKKS, respectively) indicates that the eukaryotic chaperonin-gene
family is larger and more differentiated than previously thought. The availability of complete genome sequences
makes possible a definitive characterization of the complete set of chaperonin sequences in human and other
species.
Results: We identified fifty-four chaperonin-like sequences in the human genome and similar numbers in the
genomes of the model organisms mouse and rat. In mammal genomes we identified, besides the well-known CCT
chaperonin genes and the three genes associated with the MKKS and BBS pathological conditions, a newly-defined
class of chaperonin genes named CCT8L, represented in human by the two sequences CCT8L1 and CCT8L2.
Comparative analyses from several vertebrate genomes established the monophyletic origin of chaperonin-like
MKKS and BBS genes from the CCT8 lineage. The CCT8L gene originated from a later duplication also in the CCT8
lineage at the onset of mammal evolution and duplicated in primate genomes. The functionality of CCT8L genes
in different species was confirmed by evolutionary analyses and in human by expression data. Detailed sequence
analysis and structural predictions of MKKS, BBS and CCT8L proteins strongly suggested that they conserve a
typical chaperonin-like core structure but that they are unlikely to form a CCT-like oligomeric complex. The
characterization of many newly-discovered chaperonin pseudogenes uncovered the intense duplication activity of
eukaryotic chaperonin genes.
Conclusions: In vertebrates, chaperonin genes, driven by intense duplication processes, have diversified into
multiple classes and functionalities that extend beyond their well-known protein-folding role as part of the typical
oligomeric chaperonin complex, emphasizing previous observations on the involvement of individual CCT
monomers in microtubule elongation. The functional characterization of newly identified chaperonin genes will be
a challenge for future experimental analyses.


* Correspondence: macarioster@gmail.com; lucianob@ufl.edu
1Department of Molecular Genetics and Microbiology, University of Florida,
College of Medicine, 1660 SW Archer Road, Gainesville, FL 32610, USA
University of Maryland, Columbus Center, 701 East Pratt Street, Baltimore,
MD 21202, USA

© 2010 Mukherjee et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.


Background
Hsp60-like chaperonin proteins are well known for their
role in assisting protein folding and in protecting cells
from the deleterious effects of stress [1-5]. The
eukaryotic cell expresses representatives of two distinct
groups of chaperonin genes that are otherwise typical of
bacteria (Group I) or archaea (Group II). In eukaryotes,
Group I chaperonins are mostly expressed in mitochondria
and chloroplasts, and Group II chaperonins are found in
the eukaryotic cytosol [1,6-10]. Chaperonin proteins form
typical multi-subunit double-ringed structures
collectively called "chaperonins" [9-13]. The Group I
chaperonins are typically formed by the products of a
single gene (groEL in bacteria; hsp60/cpn60 in
mitochondria) assembled into a 14-subunit double-ringed
structure in bacteria and into a double- or single-ringed
structure in mitochondria [14]. Eukaryotic Group II
chaperonin proteins assemble in a similar double-ringed
oligomeric structure, called TRiC or CCT complex [15],
composed of 16 subunits that in human are encoded by
nine distinct genes (tcp1/cct1, cct2-5, cct6A-B, cct7-8)
[8-10]. The CCT complex is mostly known for its role
in folding the cytoskeleton proteins actin and tubulin
[7,16] and mutations in individual CCT subunits lead to
defects in the functioning of the cytoskeleton and
mitosis arrest [17].
As for other chaperones, the malfunctioning of cha-
peronin proteins has been associated with various
human pathological conditions, the chaperonopathies
[18-20]. In this respect, besides the canonical cct and
cpn60 genes described above, three divergent hsp60-like
genes have been more recently identified [21-23] in
association with pathological conditions. One gene,
MKKS [21], was named for its association with the
developmental disease McKusick-Kaufman Syndrome
and was soon after also identified as BBS6 [24] for its
association with the Bardet-Biedl Syndrome (BBS),
another developmental condition involving cilium-
related dysfunction [25]. More recently two other
hsp60-like BBS genes, named BBS10 [22] and BBS12
[23], have been identified among fourteen genes (BBS1
to BBS14) so far associated with BBS. The protein pro-
ducts of MKKS/BBS6, BBS10 and BBS12 localize to the
basal body of cilia and to the centrosome [26-28]. We
will hereafter refer to the MKKS/BBS6 gene as MKKS,
and collectively to the three hsp60-like BBS genes as the
"BBS genes". The identification of these genes provides
new perspectives on the spectrum of functionalities of
Hsp60-like proteins in eukaryotes and on their role in
development.
The recognition of chaperonopathies has increased the
importance of elucidating the entire set of chaperone
genes present in the human genome [19]. The work
reported here was conceived to: a) identify all Hsp60-
like sequences encoded in the human and other gen-
omes including all diverged chaperonin genes; b) recon-
struct the evolutionary origins and relations of diverged
chaperonin genes; c) distinguish with bioinformatics
methods functional genes from pseudogenes; d) charac-
terize structural properties of the corresponding pro-
teins. We mostly devoted our attention to the
characterization of the evolutionary history and struc-
tural properties of newly or recently identified
sequences, referring the reader to the vast amount of
published literature for information on functional/struc-
tural properties and the evolutionary history of mito-
chondrial Cpn60 or CCT-complex proteins.
Exhaustive searches of hsp60-like sequences were car-
ried out in human and other genomes following and
extending our "chaperonomics" methodological protocol
[29]. The extensive analysis of the genomes of human
and other vertebrate species led to the identification
and characterization of many previously unknown
sequences and to the discovery of a new, mammal-spe-
cific class of chaperonin proteins. Classification, evolu-
tionary analysis and structural characterization of
diverged chaperonin-like sequences should provide valu-
able information for future studies on the functional
roles of these proteins.

Results
Chaperonin sequences in the human genome
To identify all human hsp60-like sequences we queried
the human genome using the nine human CCT subunit
and mitochondrial Cpn60 sequences. Analogous exten-
sive searches were performed in the mouse and rat gen-
omes using corresponding queries. In the human
genome, we found a total of 54 sequences with significant
similarity to Hsp60 proteins (Tables 1 and 2). Fifteen
sequences had an NCBI Entrez [30] gene descriptor
assigned. Nine of these corresponded to the canonical
CCT-subunit sequences and one, HSPD1, encoded the
mitochondrial Cpn60 protein. Three sequences corre-
sponded to the BBS genes MKKS, BBS10 and BBS12.
We recovered two additional uncharacterized sequences
designated in the NCBI Entrez Gene database as
CCT8L1 and CCT8L2. Besides these complete Hsp60-
like sequences, a sequence domain conserved across
eukaryote species with highest similarity to the apical
domain of the CCT3 protein has also been reported in
PIKFYVE [31], a kinase belonging to the Fab1p protein
family involved in corneal pathological conditions [32].
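
The breakdown of the fifteen named sequences can be checked with a few lines of Python. This is only a bookkeeping sketch: the dictionary labels are shorthand introduced here, and the counts are the ones stated in the paragraph above.

```python
# Counts as stated in the text for human Hsp60-like sequences that
# carry an NCBI Entrez gene descriptor. Labels are shorthand for this sketch.
named_human_genes = {
    "canonical CCT subunits": 9,
    "mitochondrial Cpn60 (HSPD1)": 1,
    "BBS genes (MKKS, BBS10, BBS12)": 3,
    "CCT8L genes (CCT8L1, CCT8L2)": 2,
}

total_named = sum(named_human_genes.values())
print(total_named)  # 15
```

The PIKFYVE domain is not part of this count, because the text lists it separately from the complete Hsp60-like sequences.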
In addition, we identified 39 other human hsp60
sequences that did not correspond to a gene descriptor
in the NCBI Entrez Gene database (Table 2). All of
these sequences contained in-frame stop codons or
frame-shifts, suggesting that they were most likely pseu-
dogenes. Thirty-five of these had not been described in
the Pseudogene.org pseudogene database [33] and 33
were not listed in the Ensembl database [34], and are
here annotated and classified for the first time. In analo-
gous searches of the complete genomes of mouse and
rat, we identified in each genome 14 chaperonin genes
(nine for the canonical CCT monomers, one for the
mitochondrial Cpn60, three BBS genes and one CCT8L
gene), 38 pseudogenes in mouse and 61 pseudogenes in
rat (see additional file 1: Table S1, for mouse sequences,
and additional file 2: Table S2, for rat sequences).
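
The pseudogene criterion described above (in-frame stop codons or frame-shifts) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the input is assumed to be a candidate coding sequence already extracted in its expected reading frame, and a length that is not a multiple of three is used only as a crude stand-in for a frame-shift, which in practice would be detected by aligning the candidate to its parental gene.

```python
from typing import List

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(cds: str) -> List[int]:
    """Return 0-based codon indices of stop codons found before the final codon."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # The last complete codon is allowed to be a stop; earlier ones are premature.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def looks_like_pseudogene(cds: str) -> bool:
    """Flag a premature stop or a length that breaks the reading frame (crude proxy)."""
    return len(cds) % 3 != 0 or bool(in_frame_stops(cds))


print(looks_like_pseudogene("ATGAAATAA"))     # False: intact frame, terminal stop only
print(looks_like_pseudogene("ATGTAAAAATAA"))  # True: premature stop at codon index 1
```

A real screen would also have to handle introns and sequencing errors, so a flag from this sketch is a hint rather than a classification.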

Evolutionary origins of human BBS and CCT8L genes
A maximum-likelihood (ML) phylogenetic tree of
human chaperonin-like proteins (Figure 1a) indicated
that Hsp60-like BBS proteins are monophyletic
(bootstrap support 86%) and that their common ancestor
derived from a duplication event in the CCT8 lineage
(bootstrap support 88%). The tree also showed that the
unique ancestor of the two closely related genes
CCT8L1 and CCT8L2 also originated in the CCT8 lineage
from a more recent duplication event (bootstrap
support 75%). The relation of BBS and CCT8L proteins
with the CCT8 chaperonin subunit was confirmed with
strong conditional probability support (0.99) by Bayesian
tree construction (Figure 1b).

Table 1 The human hsp60 genes
Name(1) | Alternative names | Start(2) | End(3) | Strand(4) | Chr(5) | Location(6)
CCT1(10) | TCP1, CCTa, CCT α, TCP-1-α | 160,119,520 | 160,130,731 | | 6 | q25.3
CCT2 | CCT β, TCP-1-β | 68,266,317 | 68,280,052 | + | 12 | q15
CCT3 | CCT γ, TCP-1-γ | 154,545,617 | 154,572,307 | | 1 | q23.1
CCT4 | CCT δ, TCPD, TCP-1-δ | 61,950,076 | 61,969,146 | | 2 | p15
CCT5 | CCT ε, TCP1E, TCP-1-ε | 10,303,453 | 10,317,892 | + | 5 | p15.2
CCT6A | CCT ζ, CCT ζ-1, TCP-1-ζ, CCT6, Cctz, HTR3, TCP20, TCPZ, TTCP20 | 56,087,036 | 56,098,269 | + | 7 | p11.2
CCT6B | CCT ζ-2, TCP-1-ζ-2, Cctz2, TSA303, Tcp20 | 30,279,183 | 30,312,525 | | | q12
CCT7 | CCT η, TCP-1-η, Ccth, NIP7-1 | 73,320,279 | 73,333,494 | | | p13.2
CCT8 | CCT θ, TCP-1-θ, Cctq | 29,350,670 | 29,367,782 | | | q21.3
CCT8L1 | LOC155100 | 151,773,495 | 151,775,165 | | | q36.1
CCT8L2 | GROL, CESK1 | 15,451,770 | 15,453,440 | | | q11.1
MKKS | BBS6 | 10,333,898 | 10,342,162 | | | p12.2
BBS10 | C12orf58, FLJ23560 | 75,263,727 | 75,266,269 | | | q21.2
BBS12 | C4orf24, FLJ35630, FLJ41559 | 123,882,498 | 123,884,627 | | | q27
HSPD1 | GROEL, HSP60, SPG13, CPN60, HuCHA60 | 198,060,018 | 198,071,817 | | | q33.1
PIKFYVE(11) | CFD, FAB1, PIP5K, PIP5K3 | 209,182,591 | 209,190,094 | | | q34

(1) Official NCBI Entrez gene database name; (2) Start and (3) End of coding region; (4) Strand: "+" indicates sequenced strand, "-" indicates complementary strand;
(5) Chromosome; (6) Chromosome location; (7) Number of isoforms; (8) Number of exons. Multiple numbers indicate the number of exons in each isoform; (9) Total amino
acids; (10) The official Entrez name is TCP1. CCT1 improves consistency with other subunit gene names. (11) Fab1_TCP sequence domain of PIKFYVE kinase, most
similar to the apical domain of CCT3. Features refer to the domain portion of the gene/protein.

Although the association of BBS and CCT8L proteins
with the CCT lineage was robustly supported, the high
divergence of these sequences could produce clustering
in the trees due to long-branch attraction. To address
this concern, we built independent ML trees for each
BBS or CCT8L sequence adding them separately to the
tree of CCT subunits. All individual trees confirmed
with strong bootstrap support the association of each
BBS or CCT8L lineage with the CCT8 lineage (see addi-
tional file 3: Figure S1, additional file 4: Figure S2, addi-
tional file 5: Figure S3 and additional file 6: Figure S4).
A ML evolutionary tree including hsp60-gene homologs
found in the genomes of eighteen other vertebrate spe-
cies, including representatives of several mammals,
chicken, frogs, and fish, also confirmed the origin of
BBS and CCT8L genes from the CCT8 lineage (see
additional file 7: Figure S5).
We did not find CCT8L genes in the genomes of
chicken, Xenopus laevis, or Danio rerio, representatives
respectively of the reptile/bird, amphibian and fish
lineages. However, among mammals we identified


orthologs of CCT8L genes in genomes not only of placen-
tal mammals (Eutheria), but also of the marsupial opos-
sum (Metatheria) and of the egg-laying platypus
(Prototheria), suggesting that the CCT8L gene class origi-
nated at the onset of mammal evolution. All CCT8L gene
orthologs were intron-less, indicating that their ancestor
originated from a retro-transposition event. Two copies of
CCT8L sequences were found in human and chimp and
one CCT8L gene in all other genomes examined, includ-
ing those from the other primate rhesus monkey (Macaca
mulatta) and gray mouse lemur (Microcebus murinus)
(Figure 2), suggesting that a duplication of the CCT8L
gene occurred in Hominoidea after their separation from
old world monkeys. However, the lone gene copy of
CCT8L identified in rhesus monkey clustered with
CCT8L1 in evolutionary trees (Figure 2), suggesting an
earlier duplication of the gene and successive loss of the
CCT8L2 copy from the genome of rhesus monkey. Close
inspection of protein alignments revealed that the rhesus
monkey CCT8L sequence included an anomalously
diverged segment of about 50 amino acids of uncertain
alignment. Excluding this segment from the analysis we
obtained a different and more robustly supported tree
topology (75% vs. 20% bootstrap value, see additional file
8: Figure S6, panels a and b), consistent with a later dupli-
cation of the CCT8L gene in Hominoidea. The tree also
indicated that the removed segment was alone responsible
for the overall higher evolutionary rate predicted for this
sequence (see additional file 8: Figure S6).


Page 3 of 19


Str4 Chr5 Loc6 IF7 Exons8 aa9


+ 2
21
+ 7
22
20
12
+ 4
2
+ 2


CCT6B
CCT7
CCT8
CCT8L1
CCT8L2
MKKS
BBS10
BBS12
HSPD1
(PIKFYVE)1


12,7
14
13, 13, 12
13
11
14, 13

14
12,7
15
1
1
4,4
2


556, 401
535
545,544, 507
539
541
531,486

530
543,339
548
557
557
570,570
723
710
573,573
224










Table 2 The human hsp60 pseudogenes

Name        Start        End          Str   Chr  Loc     Ex  P/D
CCT1-1P     19,986,638   19,987,216   +     12   p12.2   1   P
CCT1-2P     41,621,756   41,623,646   -     5    p13.1   2?  D
CCT1-3P     42,801,030   42,802,033   +     7    p14.1   3   D
CCT3-1P     16,177,578   16,178,178   +     8    p22     1   P
CCT4-1P     64,177,578   64,409,590   +     X    q12     3   D
CCT4-2P     140,344,301  140,345,787  -     7    q34     4   D
CCT5-1P     78,382,086   78,382,680   +     13   q31.1   1   P
CCT5-2P     78,382,866   78,382,967   -     13   q31.1   1   ?
CCT5-3P     114,876,388  114,877,290  +     5    q22.3   1   P
CCT6-1P     14,692,965   14,693,954   -     5    p15.2   1   P
CCT6-2P     109,013,584  109,014,117  n.r.  11   q22.3   1   P
CCT6-3P     64,162,812   64,171,325   +     7    q11.21  8   D
CCT6-4P     191,915,332  191,916,879  +     3    q28     1   P
CCT6-5P     64,853,564   64,865,440   +     7    q11.21  10  D
CCT7-1P     92,251,627   92,307,366   -     5    q15     1   P
CCT7-2P     150,242,815  150,243,240  +     6    q25.1   1   P
CCT8-1P     145,141,482  145,143,137  -     1    q21.1   1   P
HSPD1-1P    135,744,902  135,745,039  -     5    q31.1   1   P
HSPD1-2P    21,919,402   21,920,175   -     5    p14.3   1   P
HSPD1-3P    43,602,029   43,602,280   -     20   q13.12  4   D
HSPD1-4P    88,065,673   88,066,269   +     6    q15     1   P
HSPD1-5P    55,191,053   55,192,769   +     12   q13.2   1   P
HSPD1-6P    36,783,612   36,785,195   -     3    p22.3   1   P
HSPD1-7P    7,263,938    7,265,475    +     8    p23.1   1   P
HSPD1-8P    145,986,418  145,987,946  +     4    q31.21  1   P
HSPD1-9P    7,785,932    7,787,502    -     8    p23.1   1   P
HSPD1-10P   8,058,884    8,082,857    +     12   p13.31  2?  D
HSPD1-11P   95,130,459   95,132,169   +     5    q15     5   D
HSPD1-12P   78,321,372   78,323,341   +     13   q31.1   1   P
HSPD1-13P   153,068,626  153,068,943  +     6    q25.2   1   P
HSPD1-14P   37,465,288   37,466,827   -     13   q13.3   4   D
HSPD1-15P   19,269,394   19,270,353   +     5    p14.3   4   D
HSPD1-16P   105,082,802  105,083,755  +     11   q22.3   2?  D
HSPD1-17P   34,077,070   34,078,293   +     1    p35.1   3   D
HSPD1-18P   56,105,684   56,108,736   +     20   q13.32  5   D
HSPD1-19P   50,318,868   50,319,008   +     10   q11.23  1   ?
HSPD1-20P   78,924,341   78,924,478   -     12   q21.31  1   ?
HSPD1-21P   60,994,430   60,994,876   -     5    q12.1   6   D
HSPD1-22P   29,181,851   29,183,334   -     21   q21.3   2?  D

The Ka/Ks, LRT, FS, SC and aa columns survive only as unaligned
fragments whose row assignment is not recoverable:
0.74 1.48 2; 0.48 10.84** 3; 2.42 0.72 0; 0.40 3.08 0; 0; 0.69 1.4 3
1.08; 3.08; 2.46; 0.18; 2.32; 0.08; 0.94; 2.44; 4.98*; 2.84; 3.3; 1.24; 6.24*
0 1; 4 5; 2 1; 2 2; 5 4; 4 3; 5 3; 1 1; 6 6; 5 4; 1 2; 6 3
2 2 159; 0 3 512; 2 10 278; 3 4 549; 1 34; 6 3 201; 2 2 330; 0 2 178; 5 4 289; 9 4 292; 12 6 399; 3 5 145

Name: pseudogene names follow the HUGO nomenclature. They are composed
of the name of the parental gene followed by a unique number
identifier and the suffix "P" (Pseudogene). Start, End: start and end
positions of the pseudogene on the chromosome. Str: strand. Chr:
chromosome. Loc: location on the chromosome. Ex: number of exons; a
question mark indicates gene fragments with uncertain numbers of
exons. P/D: processed (P), duplicated (D) or undetermined (?). Ka/Ks:
ratio of non-synonymous vs. synonymous substitution rates. LRT:
Likelihood Ratio Test (LRT) values; values different from 1.0 with
probability p < 0.01 (**) or p < 0.05 (*) are shown in bold-face. FS:
number of frame-shifts recognized in the coding region of the
pseudogene. SC: number of in-frame stop codons recognized in the
coding region of the pseudogene. aa: length in amino acids of the
pseudo-translation of the recognized pseudogene sequence. n.r.: value
not recoverable from this copy of the table.
Ten pseudogenes were previously reported in the Ensembl (roman),
Pseudogene.org (italics) or NCBI (bold) databases: CCT1-3P =
OTTHUMG00000033751; CCT5-1P = Human.chr13.mb78; CCT6-5P =
ENSP00000275603, Human.chr7.mb64; CCT7-1P = ENST00000399032; CCT8-1P =
Human.chr1.mb145; HSPD1-1P = ENSG00000162241, Human.chr5.mb135;
HSPD1-2P = ENSP00000328369; HSPD1-5P = LOC644745; HSPD1-6P =
LOC645548; HSPD1-14P = OTTHUMG00000016753. Footnotes 15, 16 and 18
mark tandemly duplicated pseudogenes; footnote 17 marks a pseudogene
previously identified as Hsp60s2 (Hsp60 short form 2).








Figure 1 Evolutionary trees of CCT proteins. (a) Maximum-likelihood evolutionary tree of all human chaperonin-like proteins, including CCT
monomers, MKKS, BBS10, BBS12 and the two members, CCT8L1 and CCT8L2, of the newly defined CCT8L class. Numbers associated with each
branch indicate bootstrap support from 100 replicates. Tree rooted by the archaeal thermosome alpha subunit of Sulfolobus solfataricus
(Ss_ThsA). (b) Bayesian evolutionary tree of the same sequences shown in (a). The numbers assigned to each branch indicate posterior
probabilities. Tree rooted by the thermosome alpha subunit of Thermoplasma acidophilum (Ta_ThsA). The scale bars represent the indicated
number of substitutions per position for a unit branch length.


Differentiation rate of BBS and CCT8L proteins
The branch lengths of the trees shown in Figure 1 indi-
cate that BBS and CCT8L proteins have differentiated at
much higher rates than CCT subunits. We applied a
newly-developed, unbiased measure of differentiation
called "B-index" (see Methods) to calculate differentia-
tion of MKKS, BBS10 and BBS12 proteins from their
respective last ancestor common to Actinopterygii (ray-
finned fishes) and Sarcopterygii (including tetrapods),
determined by rooting the trees with CCT8 proteins
from corresponding fish and tetrapod species. Similarly,
we calculated differentiation of CCT8L proteins from a
eutherial ancestor rooting their tree with corresponding
sets of CCT8 proteins (see footnotes of Table 3 and
legend for Figure 2 for species represented in each tree).
We estimated for the MKKS family an average evolu-
tionary distance from their root of almost 0.7 substitu-
tions per site, corresponding to a 6-fold higher rate of
differentiation compared to the number of substitutions
estimated in CCT8 proteins over the same period of
time. For BBS10 and BBS12, we calculated a distance of


about 1.0-1.2 substitutions per site, corresponding to a
substitution rate about 8-10 times higher than in CCT8.
Finally, for the mammal-specific family of CCT8L pro-
teins, we estimated an evolutionary distance from their
mammal root of about 0.3 substitutions per site. The
smaller divergence of CCT8L proteins compared to BBS
proteins reflects the more recent origin of the CCT8L
gene. However, when scaled to the evolution of CCT8
sequences over the same periods of time, the substitu-
tion rate of CCT8L proteins was about 14-15 times
higher than in CCT8 and 1.4-2.3 times higher than in
BBS proteins.
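
The fold differences quoted above follow directly from the B-index values reported in Table 3. As a minimal sketch (the dictionary layout and the name `rate_ratio` are illustrative choices, not the authors' code), the ratios can be recomputed as follows:

```python
# B-index values (substitutions per site) as listed in Table 3:
# (family value, value for CCT8 over the corresponding species set).
b_index = {
    "MKKS":  (0.6976, 0.1146),
    "BBS10": (1.1079, 0.1373),
    "BBS12": (1.0284, 0.0992),
    "CCT8L": (0.3196, 0.0227),
}

def rate_ratio(family: str) -> float:
    """Differentiation of a family relative to CCT8 over the same period."""
    b_family, b_cct8 = b_index[family]
    return b_family / b_cct8

ratios = {family: rate_ratio(family) for family in b_index}
for family, ratio in ratios.items():
    print(f"{family}: {ratio:.2f}-fold relative to CCT8")

# CCT8L relative to the slowest and fastest BBS families.
cct8l_vs_bbs = (ratios["CCT8L"] / ratios["BBS12"], ratios["CCT8L"] / ratios["MKKS"])
print(f"CCT8L vs. BBS: {cct8l_vs_bbs[0]:.1f} to {cct8l_vs_bbs[1]:.1f} times higher")
```

Because the tabulated values are rounded to four decimals, the recomputed ratios may differ from the published ones in the last digit.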

Functional constraints in the evolution of CCT8L genes
We tested functionality of CCT8L genes from several
species estimating ratios of non-synonymous and synon-
ymous substitution rates (Ka/Ks) along their respective
lineages (see Methods). The results of this analysis are
shown in Table 4, which indicates the gene(s) analyzed
(foreground), the two genes used to identify foreground
and background branches, the estimated Ka/Ks values




and their significance. The evolutionary lineages for which Ka/Ks
values were evaluated correspond to the branch numbers identified in
the overall tree topology shown in Figure 3. This tree represents the
"molecular tree" of mammal phylogenetic relations [35], the gene
duplication event involving the CCT8L gene family in primates as
inferred by this analysis, and the pre-mammal separations of the
CCT7, CCT8 and CCT8L families of paralogs. This topology is in
agreement with the evolutionary tree of CCT8L genes (Figure 2) with
the only exception of the weakly supported position of the CCT8L
sequence from rhesus monkey (see above). The highly significant
constraints in non-synonymous substitution rates (Ka/Ks < 1.0)
estimated in the overall evolution of the CCT8L family (Table 4,
foreground genes: "All CCT8L1/2") indicated that the CCT8L sequences
are genes generally expressing functional proteins. In evaluating
Ka/Ks ratios for individual CCT8L gene lineages (Table 4),
significantly constrained evolution (Ka/Ks < 1.0) was detected for
branches leading to most sequences, including those of murids, lemur,
cow, dog, elephant, marsupial, and to the human CCT8L1 and CCT8L2
group along the hominoid lineage. Constrained evolution was also
estimated for the CCT8L genes of armadillo and rhesus monkey, and for
human CCT8L1 and human and chimp CCT8L2 after divergence of human and
chimp, although in these cases Ka/Ks values did not reach
significance. In the cases of the human and chimp CCT8L1 and

Table 3 Divergence of BBS and CCT8L proteins relative to CCT8 proteins

                                          MKKS     BBS10    BBS12    CCT8L
No. species                               14       11       11       5
BBS or CCT8L family
  Size (W_B)                              5.7770   4.6020   5.8949   3.3859
  B-index (B_B)                           0.6976   1.1079   1.0284   0.3196
  Unbiased pair-wise distance (B_B x 2)   1.3952   2.2159   2.0568   0.6393
  L_B (B_B x W_B)                         4.0300   5.0987   6.0623   1.0822
  Average D_ij (D_B)                      2.0951   2.9660   3.2858   0.8017
CCT8
  Size (W_C)                              5.5202   4.5503   4.3647   3.1100
  B-index (B_C)                           0.1146   0.1373   0.0992   0.0227
  Unbiased pair-wise distance (B_C x 2)   0.2291   0.2747   0.1983   0.0454
  L_C (B_C x W_C)                         0.6324   0.6250   0.4328   0.0706
  Average D_ij (D_C)                      0.3394   0.3687   0.2709   0.0545
Ratios
  B_B/B_C                                 6.0873   8.0692   10.3669  14.0793
  L_B/L_C                                 6.3725   8.1579   14.0072  15.3286
  D_B/D_C                                 6.1730   8.0445   12.1292  14.7101
  W_B/W_C                                 1.0465   1.0114   1.3506   1.0887

CCT8L: only the human CCT8L2 branch was included in the tree; the
CCT8L1 branch had equivalent length. No. species: chaperonin-BBS
sequences used in the trees were from the following species (see the
legend for Figure 2 for a complete list of abbreviations and species
names). MKKS: Bt, Cf, Dr, Ec, Ga, Gg, Hs, Md, Mm, Mmu, Ol, Rn, Tr, Xt;
BBS10: Bt, Cf, Dr, Ec, Ga, Hs, Md, Mm, Ol, Rn, Tr; BBS12: Bt, Cf, Dr,
Ec, Ga, Gg, Hs, Mm, Ol, Rn, Tr, Xl; CCT8L: Bt, Cf, Hs, Mm, Rn. Size:
the average number of sequences contained in a cluster over
evolutionary time (see Methods). B-index: the average substitutions
per site (evolutionary distance) of the sequences within a cluster
from their common ancestor. L: the length of the tree (sum of the
lengths of all branches). Average D_ij: the average pair-wise
evolutionary distance of the sequences. CCT8: estimates for CCT8 were
computed over corresponding species represented by the sets of MKKS,
BBS10, BBS12 or CCT8L proteins (see the species lists above).

Figure 2 Evolutionary tree of CCT8L sequences. ML tree of
CCT8L sequences from various mammal genomes. The homolog of
human CCT8L1 in chimp (Ptr) is characterized as pseudogene and is
shown in bold-italics font. Species abbreviations: Bt, Bos taurus
(cow); Cf, Canis lupus familiaris (dog); Dn, Dasypus novemcinctus
(nine-banded armadillo); Dr, Danio rerio (zebrafish); Ec, Equus
caballus (horse); Ga, Gasterosteus aculeatus (stickleback, fish); Gg,
Gallus gallus domesticus (chicken); Hs, Homo sapiens (human); La,
Loxodonta africana (african bush elephant); Md, Monodelphis
domestica (south american gray short-tailed opossum, marsupial);
Mm, Mus musculus (mouse); Mmu, Macaca mulatta (rhesus monkey);
Mmur, Microcebus murinus (gray mouse lemur); Oa, Ornithorhynchus
anatinus (platypus); Ol, Oryzias latipes (the medaka or japanese
killifish); Pp, Pongo pygmaeus (northwest bornean orangutan); Ptr,
Pan troglodytes (chimpanzee); Rn, Rattus norvegicus (rat); Tn,
Tetraodon nigroviridis (spotted green pufferfish); Tr, Takifugu rubripes
(Japanese pufferfish); Xl, Xenopus laevis (african clawed frog,
amphibian); Xt, Xenopus tropicalis (western clawed frog, amphibian).
The scale bar represents the indicated number of substitutions per
position for a unit branch length.


Table 4 Ka/Ks substitution ratios in CCT8L genes evolution

Foreground genes      Background genes                 Foreground Ka/Ks  LRT (p)           Foreground branches
All CCT8L1/2          Human CCT8, Human CCT7           n.r.              205.06 (<0.001)   1 to 25
Human CCT8L1          Chimp CCT8L1, Human CCT8L2       n.r.              0.88              1
Chimp CCT8L1          Human CCT8L1, Human CCT8L2       n.r.              0.00              2
Human CCT8L2          Chimp CCT8L2, Human CCT8L1       n.r.              1.2               4
Chimp CCT8L2          Human CCT8L1, Human CCT8L2       n.r.              1.8               5
Human CCT8L2          Human CCT8L1, Rhesus CCT8L       n.r.              4.02 (<0.05)      4+6
Human CCT8L1          Human CCT8L2, Rhesus CCT8L       n.r.              5.72 (<0.05)      1+3
Mouse and Rat CCT8L   Cow CCT8L, Human CCT8L2          n.r.              31.14 (<0.001)    12+13+14
Mouse CCT8L           Rat CCT8L, Human CCT8L2          0.64              1.21              12
Rat CCT8L             Mouse CCT8L, Human CCT8L2        0.49              5.91 (<0.05)      13
Rhesus CCT8L          Lemur CCT8L, Human CCT8L2        0.73 (0.55)       1.91 (1.22)       8
Lemur CCT8L           Human CCT8L2, Mouse CCT8L        0.29              36.82 (<0.001)    10
Dog CCT8L             Cow CCT8L, Human CCT8L2          0.31              12.07 (<0.001)    16
Cow CCT8L             Dog CCT8L, Human CCT8L2          0.13              113.78 (<0.001)   17
Armadillo CCT8L       Elephant CCT8L, Human CCT8L2     0.36              0.57              20
Elephant CCT8L        Marsupial CCT8L, Human CCT8L2    0.29              14.96 (<0.001)    21
Marsupial CCT8L       Elephant CCT8L, Human CCT8L2     0.31              62.63 (<0.001)    23+24

Foreground and Background genes: see text for the definition and
meaning of Foreground and Background species. Ka/Ks: the estimated
ratio of non-synonymous and synonymous substitution rates. LRT:
Likelihood Ratio Test results for estimated Ka/Ks vs. Ka/Ks = 1.0 (see
Methods); probabilities (p) not shown signify p > 0.05. Foreground
branches: numbers correspond to the numbering in the schematic tree
shown in Figure 3. Values in parentheses were obtained after removing
an unusually diverged region from rhesus CCT8L (see text). n.r.: value
not recoverable from this copy of the table.



Figure 3 Evolutionary relations of CCT8L genes. Schematic representation of evolutionary relations of CCT8L genes from different eukaryotic
species rooted by CCT8 and CCT7 sequences. The numbers associated with each branch identify the branches for which branch-specific Ka/Ks
values are evaluated (Table 4).




CCT8L2 genes, the lack of significance can be related to
the loss of power of the test since few mutations accumu-
lated after separation of these sequences (see additional
file 9: Table S3). In the case of rhesus monkey CCT8L, we
found that its relatively high estimate of Ka/Ks (= 0.73)
was due to the previously mentioned 50-amino-acid
diverged region within this sequence. After removing this
region we estimated Ka/Ks = 0.55. Only for the lineage of
chimp CCT8L1 we estimated Ka/Ks ≈ 1, consistent with
differentiation of a non-functional sequence. Since this
sequence was also characterized by an internal stop codon
and a frame-shift, all evidence strongly suggests that
chimp CCT8L1 is a pseudogene.
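
The significance levels attached to the LRT values can be reproduced approximately with a short calculation. The sketch below assumes that the LRT statistic is compared against a chi-square distribution with one degree of freedom, the usual choice when a single Ka/Ks parameter is fixed to 1.0; this is an assumption here, and the Methods remain the authority on the test actually used. The function name `lrt_p_value` is illustrative.

```python
import math

def lrt_p_value(lrt: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom.

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(lrt / 2.0))

# LRT values taken from Table 4 (rhesus, rat and dog CCT8L lineages).
for label, lrt in [("Rhesus CCT8L", 1.91), ("Rat CCT8L", 5.91), ("Dog CCT8L", 12.07)]:
    print(f"{label}: LRT = {lrt}, p = {lrt_p_value(lrt):.4f}")
```

Under this assumption the rhesus value falls above p = 0.05, the rat value between 0.01 and 0.05, and the dog value below 0.001, in line with the annotations in Table 4.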
To assess the functionality of human CCT8L sequences
we investigated their expression profiles in comparison to
those of human CCT monomers and BBS genes (see addi-
tional file 10: Table S4). Expression of CCT8L2 was con-
firmed by fifteen ESTs mostly identified from the testis,
whereas only one EST identified as a CCT8L1 transcript
has been so far reported (NCBI UniGene database,
November 20, 2009). Querying the NCBI GEO microarray
database, we found 542 expression-profile records identify-
ing expression of CCT8L2, and none identifying expres-
sion of CCT8L1 (as of November 20, 2009). It must be
noted, however, that CCT8L2 and CCT8L1 have similarity
of 97.3% at the DNA level. Similarly to CCT8L2, another
mammal-specific chaperonin gene, CCT6B, is also
expressed almost exclusively in the testis, from which 160
ESTs have been reported versus an average of 4.4 ESTs
(from 0 to 10 per tissue) found in all other tissues.

Pseudogenes
We identified in the human genome 39 sequences with
significant similarity to CCT or HSPD1 genes that either
were short fragments or were characterized by in-frame
stop codons or frame-shifts. Based on their corruption,
we classified these sequences as pseudogenes (Table 2).
Similarly, searching the mouse and rat genomes we
identified 38 and 61 pseudogenes, respectively (see addi-
tional file 1: Table S1 and additional file 2: Table S2).
Most of these sequences have not been previously
reported and are here systematically annotated and clas-
sified for the first time.
Based on phylogenetic-tree reconstructions (see addi-
tional file 11: Figure S7) or on similarity for the most cor-
rupted sequences, we identified the association of 17
pseudogenes from human, 16 from mouse and 29 from
rat with one of the nine CCT genes. None of the pseudo-
genes were related to MKKS, BBS10, BBS12 or CCT8L.
To estimate the time of origin of the pseudogenes, we
constructed trees using their translated sequences and
chaperonin subunits from various vertebrate species (see
additional file 12: Figures S8, and additional file 13: Fig-
ure S9). The trees indicated that all recognizable human


CCT pseudogenes originated in the mammal lineage after
separation from the reptile/bird lineage.
Of particular interest were the evolutionary relations
of CCT6 genes and pseudogenes. Two CCT6 gene
copies (CCT6A and CCT6B) were found, besides pla-
cental mammals, also in platypus and in opossum (see
additional file 11: Figure S7), suggesting that the dupli-
cation of the CCT6 gene occurred in mammal evolution
before separation of Theria (marsupial and placental
mammals) and Prototheria (monotremes). We constructed
an evolutionary tree of mammal CCT6 genes
and pseudogenes (Figure 4) rooted by the corresponding
gene sequences from chicken and frog (the diverged
sequence Oa_con2651 from platypus was excluded from
this tree to avoid long-branch attraction). Surprisingly,
all recognizable human, mouse, and rat pseudogenes
belonging to the CCT6 class branched in the tree from
the CCT6A lineage after separation of the platypus,
marsupial and placental mammal lineages.
Twenty-two pseudogenes in human (Table 2), and 22
and 32 pseudogenes in mouse and rat, respectively (see
additional file 1: Table S1 and additional file 2: Table
S2), associated with the mitochondrial HSPD1 gene
(Group I cpn60). Evolutionary trees incorporating all
pseudogenes from different vertebrate species were
uninformative due to the presence among the pseudo-
genes of highly corrupted sequences, resulting in exten-
sive long-branch attraction (not shown). An ML tree
built using only translations of the most conserved pseu-
dogenes (Figure 5) showed weakly supported but consis-
tent association of the human pseudogenes with HSPD1
from primates, whereas pseudogenes from mouse and
rat all associated with murid Hspd1 sequences, also
indicating their relatively recent origin.
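
The pseudogene counts given in this section can be checked for internal consistency: in each species, the pseudogenes assigned to CCT genes and those assigned to HSPD1 should add up to the total reported. A minimal sketch, with the counts transcribed from the text above and a hypothetical helper name `unassigned`:

```python
# Pseudogene counts transcribed from the text of this section.
counts = {
    "human": {"CCT": 17, "HSPD1": 22, "total": 39},
    "mouse": {"CCT": 16, "HSPD1": 22, "total": 38},
    "rat":   {"CCT": 29, "HSPD1": 32, "total": 61},
}

def unassigned(species: str) -> int:
    """Pseudogenes not accounted for by the CCT and HSPD1 parental classes."""
    c = counts[species]
    return c["total"] - c["CCT"] - c["HSPD1"]

for species in counts:
    print(f"{species}: {unassigned(species)} pseudogenes unassigned")
```

A result of zero for every species means that each reported pseudogene is attributed to one of the nine CCT genes or to HSPD1, consistent with the statement that none were related to MKKS, BBS10, BBS12 or CCT8L.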

Ka/Ks ratio in the evolution of putative pseudogene
sequences
Our characterization of many hsp60 sequences as pseu-
dogenes was based on the presence of signs of corrup-
tion in the sequence (in-frame stop codons and frame-
shifts). However, in-frame stop codons and frame-shifts
may correspond to truncated proteins that are still func-
tional. For example, although human HSPD1-5P and
HSPD1-6P sequences contain signs of sequence corrup-
tion, EST data indicate that these sequences are
expressed and possibly functional (see additional file 14:
Table S5). To confirm our characterization, we esti-
mated Ka/Ks ratios in trees that identified the pseudo-
gene-sequence lineage (branch) including as out-group
its parental gene and the orthologous gene sequence
from chicken (see Methods). The results of these ana-
lyses (Table 2) showed in most cases Ka/Ks values not
significantly different from 1.0, as expected in the differ-
entiation of pseudogene sequences not constrained by






Figure 4 Evolutionary tree of vertebrate CCT6 proteins. ML tree of CCT6 proteins from mammals, chicken, and frog (in roman font) and
translated sequences of the related pseudogenes from human, mouse, and rat (in bold-italics font). Only one copy of CCT6 was found in
chicken and frog. Two copies, CCT6A and CCT6B, were found in all mammals examined, including marsupial (Md) and platypus (Oa). The CCT6
sequences from chicken (Gg) and from the two amphibians Xenopus laevis (Xl) and Xenopus tropicalis (Xt) were used to root the tree. All human,
mouse, and rat pseudogenes clustered with the CCT6A sequences. Numbers next to branches indicate percent bootstrap values. Only bootstrap
values > 30% are shown. For all species abbreviations see the legend for Figure 2. The scale bar represents the indicated number of
substitutions per position for a unit branch length.


coding of functional amino acids. Significant differences
in mutation rate were estimated in the case of four
sequences. These sequences, however, contained multiple
in-frame stop codons and frame-shifts (Table 2).

Structural features of BBS and CCT8L proteins
Because of their high sequence divergence, it is unclear
whether BBS and CCT8L Hsp60-like proteins conserve
the typical fold of chaperonin subunits and their ability
to assemble into typical oligomeric chaperonin complexes.
Chaperonin monomers are characterized by
three structural domains (apical, intermediate and equatorial)
with distinct functional roles and it was relevant
to investigate whether BBS and CCT8L proteins conserve
each of the domains typical of chaperonins.
Experimental models of eukaryotic Group II








Figure 5 Evolutionary tree of vertebrate mitochondrial Cpn60. ML tree of mitochondrial Cpn60 proteins from mammals, chicken, and frog
(in roman font) and translated sequences of the related pseudogenes from human, mouse, and rat (in bold-italics font). Highly degraded
pseudogenes for which only fragments could be detected were not considered. Human pseudogenes clustered with primate Cpn60 sequences
whereas mouse and rat pseudogenes clustered with rodent counterparts, indicating independent evolution of these pseudogenes in these
species. For all species abbreviations see legend for Figure 2. The scale bar represents the indicated number of substitutions per position for a
unit branch length.


chaperonins are not available but their structural prop-
erties can be inferred by comparison with their closest
relative, the archaeal thermosome. To infer tertiary-
structure conservation in BBS and CCT8L proteins we
predicted the secondary structure for each family from
alignments of multiple sequences, excluding structure
and sequence information from other families. The
results of these predictions are schematically represented
in Figure 6a, in relation to the secondary structure


description of the PDB structure 1a6d chain A of the
thermosome subunit ThsA from Thermoplasma acido-
philum [36] (see additional file 15: Figure S10, additional
file 16: Figure S11, additional file 17: Figure S12, addi-
tional file 18: Figure S13, additional file 19: Figure S14,
and additional file 20: Figure S15 for detailed represen-
tations of multiple alignments, secondary structure pre-
dictions and alignments to the secondary-structure
elements of ThsA). In Figure 6a, the secondary structure



M
Figure 6 Secondary structure predictions of chaperonin proteins. (a) Secondary structure predictions of Thermoplasma acidophilum
thermosome alpha subunit ThsA (line Ta_ThsA), human CCTs, mammal CCT8Ls and vertebrate BBSs (lines MKKS, BBS10 and BBS12) compared to
the secondary structure description of ThsA (top line 1a6d) determined from its crystal structure (PDB code 1a6d, chain A). Helices are
represented as red boxes, beta-strands as yellow boxes and loops as black lines. Secondary structure elements in 1a6d are labeled in succession
with numbers (strands) or letters (helices). The first 16 N-terminal residues of ThsA, predicted to contain a strand, are not included in the 1a6d
crystal structure (top line). Secondary structure elements in all proteins recognized as homologous to the thermosome chain elements by
sequence similarity and positional equivalence are vertically aligned. The positions of sequence insertions in CCT8L and BBS
sequences are marked in blue. (b) The three-dimensional fold of the secondary structure elements in the thermosome structure 1a6d chain A. Red cylinders
represent helices and yellow arrows represent strands. Labels (i.e., letters and numbers) correspond to those in panel "a". Elements not predicted
in some of the BBS and CCT8L sequences are labeled in gray. The positions of the ATP binding and hydrolysis sites are highlighted in green.



description of ThsA is shown (line "1a6d") in relation to
the position of the equatorial, intermediate, and apical
domains. The position of these elements in the tertiary
structure of ThsA is represented in Figure 6b. Results of
a blind test of the performance of the method on the
corresponding ThsA sequence are also shown (Figure
6a, line "Ta_ThsA"). In this test most strand and helix
elements (all "core" helices) described in the crystal
structure were correctly predicted by the method,
increasing our confidence in the reliability of other predictions.
As expected, extensive conservation of predicted
secondary-structure elements was also obtained
from the alignment of human CCT sequences (Figure
6a, line "CCT") with only few discrepancies involving
mostly short beta strands (4, 5, 18, and 21) and one
short helix (P) exposed at the external surface of the
archaeal thermosome complex. Secondary-structure pre-
dictions for mammal CCT8L and for vertebrate MKKS,
BBS10 or BBS12 sequences were also largely consistent
with the secondary-structure description of thermosome
proteins. In the equatorial domain, CCT8L and BBS
structure predictions corresponded to the mostly alpha-
helical composition of this region. Variations were more
obvious in BBS12 and involved mostly terminal ele-
ments of helices (most notably helices P and Q) and
exposed beta-strands (strands 19-21). In the intermediate
domain, the core helical-bundle elements (helices F,
G, and K) as well as the extensive beta-sheet composition
of this region were predicted in all BBS and CCT8L
proteins. Exceptions were, in all sequences, the two
short strands 5 and 6, which are part of an external
elongated loop in the thermosome structure, and, in
BBS12, the N-terminal part of helix K, which in the
thermosome protrudes towards the central cavity cover-
ing the ATP hydrolysis site (Figure 6b). The apical
domain is formed in the thermosome by a 4-strand
anti-parallel beta-sheet (strands 9, 10, 15, and 16) with
strand 10 extending into a second parallel beta-sheet
(strands 10, 12, 13, and 14). The two sheets are flanked
by a helix (J) and are surmounted by a structure com-
posed of two contacting helices (H and I) and an
extended loop including strand 11. All helices and most
strands of the apical domain were recognized in BBS
sequences. The most obvious differences were observed in
BBS12 proteins, where the long apical helix H was predicted
to be shortened, and in CCT8L, where helix I
and strand 11 were not predicted.

Differentiation of monomer-monomer interaction regions
in BBS and CCT8L proteins
To assess the potential of CCT8L and BBS proteins to
establish intra-ring and inter-ring monomer-monomer
contacts, we examined the relative conservation of predicted
contact positions in CCT, BBS and CCT8L
sequences. We identified potential contact positions in
these families based on homology to the positions involved
in inter-monomer contacts in the crystal structure of the
T. acidophilum thermosome complex (PDB code 1a6d).
After identifying all contact positions in CCT monomers,
we distinguished among them those that conserved similar
amino acid types across the nine monomers. We counted
how many amino acid types observed in all or in con-
served contact positions of CCT monomers were also
observed in the T. acidophilum ThsA sequence, in human
CCT8Ls or in human BBS sequences (Table 5). A complete
list of all contact positions and of the conserved positions considered, and of
the residue types observed at these positions in all
sequences, can be found in additional file 21: Table S6.
ThsA and CCT subunits conserve 89% similarity in monomer-monomer
contact positions, which is substantially
higher than the average similarity (62%-66%) of all homologous
positions between the two families. The higher
similarity of monomer-monomer contact regions is consistent
with functional conservation of these positions between the two
families. In contrast, the high rate of differentiation,
relative to the global average differentiation,
shown by putative monomer-monomer contact positions
in BBS or CCT8L sequences (Table 5) suggests a loss of
capability to associate into a typical CCT-like oligomeric
complex. This result is consistent with the presence in
BBS proteins of inserted elements (Figure 6) that would
interfere with formation of the complex [22,23].
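
The counting rule behind Table 5 (a position is conserved when the residue of the tested sequence also occurs at the homologous position of any human CCT sequence) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the toy aligned sequences and the position list are all invented, and the sequences are assumed to be rows of a single alignment so that an index denotes a shared column.

```python
def count_conserved(query, references, positions):
    """Count positions whose query residue occurs in any reference sequence.

    All sequences are rows of one alignment, so an index is a shared column.
    Returns (number conserved, percentage of the positions examined).
    """
    conserved = 0
    for pos in positions:
        # Residue types seen at this column in the reference family (gaps ignored).
        allowed = {ref[pos] for ref in references} - {"-"}
        if query[pos] in allowed:
            conserved += 1
    return conserved, 100.0 * conserved / len(positions)


# Toy aligned sequences, invented for illustration only (not chaperonin data).
reference_family = ["MKGDA", "MRGEA", "LKGDS"]
query_sequence = "MKADT"
contact_positions = [0, 1, 2, 3, 4]

n_conserved, percent = count_conserved(query_sequence, reference_family, contact_positions)
print(n_conserved, f"({percent:.1f})")  # same "count (percent)" layout as Table 5
```

With the real data, `positions` would hold the alignment columns homologous to the thermosome contact residues, and `references` the human CCT sequences.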

Conservation of ATP-binding and hydrolysis residues in
BBS and CCT8L proteins
We compared conservation in CCT, BBS and CCT8L
sequences of the ATP-binding and ATP-hydrolysis
motifs typical of chaperonins of Group II (Figure 7).


Table 5 Conservation of monomer-monomer contact residues relative to CCT subunits¹

Protein   MM          CMM         RR          Global²
ThsA      78 (83.9)   16 (94.1)   13 (86.7)   62.0-66.4
BBS12     37 (39.8)    7 (41.2)    6 (40.0)   35.5-38.0
BBS10     45 (48.4)    8 (47.1)    7 (46.7)   34.3-35.6
MKKS      42 (45.2)    8 (47.1)    7 (46.7)   48.8-51.6
CCT8L     54 (58.1)    8 (47.1)    7 (46.7)   53.4-61.1

¹Conservation of archaeal ThsA and human BBS and CCT8L sequences relative
to human CCT monomers. Sequence positions are considered conserved if
they are occupied by residue types appearing in the homologous position in
any of the human CCT sequences. Ninety-three intra-ring contact positions
and 15 inter-ring contact positions were identified from the thermosome
structure (1a6d). Contact positions were defined by a distance of their
side-chain heavy atoms of at most 4.0 Å from any heavy atom of the nearby
monomer in the thermosome structure. For each protein family, the table
indicates the number and percentage (in parentheses) of positions conserved
among: all 93 intra-ring contact positions (MM); the seventeen intra-ring
contact positions conserved among human CCT monomers (CMM); all 15 inter-ring
contact positions, none of which were conserved among CCT monomers (RR).
²Global indicates the range of similarities (percent values) of each sequence
to human CCT-subunit proteins within all aligned positions.
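
The contact criterion stated in the footnote (a side-chain heavy atom within 4.0 Å of any heavy atom of the neighbouring monomer) amounts to a simple distance test. The sketch below assumes that atom coordinates have already been extracted as (x, y, z) tuples in Å; the function name and the toy coordinates are hypothetical and are not taken from the 1a6d structure.

```python
import math


def in_contact(side_chain_atoms, neighbor_atoms, cutoff=4.0):
    """True if any side-chain heavy atom lies within `cutoff` angstroms
    of any heavy atom of the neighbouring monomer."""
    return any(
        math.dist(atom, other) <= cutoff
        for atom in side_chain_atoms
        for other in neighbor_atoms
    )


# Toy coordinates in angstroms, invented for illustration.
side_chain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near_neighbor = [(5.0, 0.0, 0.0), (9.0, 0.0, 0.0)]   # closest pair is 3.5 apart
far_neighbor = [(10.0, 0.0, 0.0)]                    # closest pair is 8.5 apart

print(in_contact(side_chain, near_neighbor))
print(in_contact(side_chain, far_neighbor))
```

For a whole complex, a spatial index would avoid the all-pairs loop; the brute-force form is kept here because it mirrors the definition directly.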




[Figure 7 image: profile logos of the ATP/ADP-binding and ATP-hydrolysis sites for CCT8, all other CCTs, CCT8L, MKKS, BBS10 and BBS12.]
Contacts: R = ribose; B = base; P = phosphates.
Figure 7 Profile logos of ATP-binding and ATP-hydrolysis sites
in chaperonin proteins. Sequence profiles of ATP/ADP-binding
and ATP-hydrolysis sites for CCTs, CCT8L and BBS (MKKS, BBS10 and
BBS12) proteins from the multiple sequence alignments of
sequences obtained from the species listed in the legend for Figure
2. Letters indicate the amino acid types observed at each position.
The height of each stack of symbols in each position is proportional
to the information content at that position and the height of each
letter within the stack is proportional to the frequency of the
corresponding residue at that position. Residues involved in direct
contacts with base, ribose or phosphate groups, as determined by
homology to the known thermosome structures, are indicated.
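
The stack and letter heights described in this legend can be illustrated for a single alignment column. The sketch assumes the usual sequence-logo definition of information content (log2 of the alphabet size minus the Shannon entropy of the column), which the legend does not spell out; it ignores gaps and omits any small-sample correction. The function name and the example columns are hypothetical.

```python
import math
from collections import Counter


def logo_column(residues, alphabet_size=20):
    """Return (information content in bits, per-residue letter heights)
    for one alignment column given as a string of residues."""
    counts = Counter(residues)
    total = sum(counts.values())
    freqs = {aa: n / total for aa, n in counts.items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    info = math.log2(alphabet_size) - entropy        # stack height
    heights = {aa: p * info for aa, p in freqs.items()}  # letter heights
    return info, heights


# An invariant column gives the tallest possible stack, a mixed one a shorter stack.
invariant_info, invariant_heights = logo_column("GGGG")
mixed_info, mixed_heights = logo_column("GGAA")
print(round(invariant_info, 3), round(mixed_info, 3))
```

A fully conserved position such as the Gly-Pro dipeptide therefore appears as a single full-height letter, while a variable position appears as a short stack of several letters.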

Although there is considerable variation among BBS and
CCT8L sequences at some of the ATP-binding posi-
tions, we observed complete conservation of the crucial
ATP-binding dipeptide Gly-Pro, suggesting that these
otherwise divergent proteins conserve ATP-binding abil-
ity. In the ATP-hydrolysis sites, substantial loss of con-
servation has been reported in MKKS [27] and in BBS12
[23]. In the CCT8L, MKKS and BBS10 families, unusual
substitutions are observed in phosphate-binding posi-
tions and within the catalytic triad, where only Asp is
conserved in MKKS. The effect that these mutations
may have on the hydrolytic activity in these protein
families is unclear. The high level of differentiation of
this region in BBS12 (where the ATP-hydrolysis motif is
not recognizable) strongly suggests that BBS12 has lost
hydrolytic activity.


Conservation of substrate-binding positions
Three positions crucial in determining substrate-specifi-
city of CCT monomers have been identified in the distal
region of helix I in the apical domain [37]. We analyzed
conservation at these positions across vertebrate species
in all Group II chaperonin families and in the
Fab1_TCP domain across vertebrate orthologs of the
PIKFYVE protein kinase (Table 6). These positions are
strikingly conserved within each CCT monomer type
(with the exception of CCT6B) across species and are
characteristically different between monomer types.
They are mostly conserved also in the Fab1_TCP
domain across vertebrate sequences. In contrast, in BBS
and, particularly, in CCT8L sequences, the homologous
positions are significantly more differentiated.

Discussion
We identified the full complement of chaperonin hsp60
genes and pseudogenes encoded in the human genome
and, for comparison, in the genomes of the model
organisms mouse and rat. We delimited the set of hsp60
genes encoded in the human genome to: a) nine canoni-
cal cct genes (CCT1 to CCT8 including CCT6A and
CCT6B) involved in formation of the CCT complex; b)
the cpn60 gene (HSPD1) of mitochondrial origin; c) the
three highly diverged hsp60-like BBS genes MKKS,
BBS10 and BBS12; and d) a newly characterized class of
genes, CCT8L, represented in human by CCT8L1 and
CCT8L2. We also identified a plethora of pseudogene
sequences, many of which had not been previously

Table 6 Conservation of potential substrate-binding residue positions¹

Family      i²          i+1²        i+4²        Description
CCT1        K           Y           DE          Lys/Tyr/Acidic
CCT2        Q           L           A (GQ)³     Gln/Leu/Ala
CCT3        H           Y           KR          His/Tyr/Basic
CCT4        H           F           K           His/Phe/Lys
CCT5        H           L           Q           His/Leu/Gln
CCT6A                               N           Asp/Ala/Lys
CCT6B                   LMSV        K (R)       Acidic/Medium-Small/Lys
CCT7                    Y           D (Y)       Gln/Tyr/Asp
CCT8                                K           His/Tyr/Lys
CCT8L       DILPT       HLQR        KNRY        Variable/Variable/Polar-Basic
MKKS        Q (H)       FY (H)      DEMQST      Gln/Aromatic/Medium-Small
BBS10       Y (AFQS)    CLY (W)     LMQV        Tyr/Variable/Variable
BBS12       E (KLQ)     KR (HQ)     HNR (ASD)   Glu/Basic/Polar-Basic
Fab1_TCP⁴   D (E)       (LMV)       Q           Asp/Ile/Gln

¹Conservation evaluated among sequences in vertebrate genomes. ²Potential
substrate-binding positions, corresponding to yeast CCT1 positions 308, 309
and 312 (i = 308) [37]. ³Rare substitutions are listed in parentheses. ⁴Fab1_TCP
domain of vertebrate PIKFYVE orthologs.








reported. The comparative analyses of these families of
functional genes and of their pseudogenes revealed their
evolutionary history and relationships.
In contrast to the uncertainty of the duplication pattern
of canonical CCT subunits (our results and [38,39]),
the origin of Hsp60-like BBS and CCT8L proteins was
unambiguously identified by phylogenetic tree recon-
structions. Our analyses indicated that hsp60-like BBS
genes originated monophyletically from a gene duplica-
tion event in the CCT8 gene lineage. In addition, we
determined that the CCT8L family also originated in the
CCT8 lineage, from a more recent retrotransposition
event. The presence of this gene family in placental
mammals, marsupials and monotremes but not in rep-
tiles/birds or other vertebrate species, indicates that this
family originated at the onset of mammal evolution,
before divergence of Theria and Prototheria. Presence
of two highly similar CCT8L genes (CCT8L1 and
CCT8L2) in the genomes of human and chimp and of a
single copy in other mammal genomes, including rhesus
monkey, suggests that the duplication of this gene
occurred in the ape lineage (Hominoidea) after its diver-
gence from the old-world monkeys (Cercopithecidae).
Multiple lines of evidence gathered in this work indicate that
CCT8L sequences (and at least one of the two paralogs
in Hominoidea) are functional genes: (i) reduced
rates of non-synonymous mutation were estimated
along their lineages, as expected for functionally constrained
protein-coding genes; (ii) pseudogenes as
ancient as, or more recent than, the CCT8L genes were
heavily degenerated, and no pseudogenes pre-dating
mammal evolution could be identified, whereas
CCT8L sequences, although they originated early in mammal
evolution, did not show signs of degeneration (with
the exception of the chimp CCT8L1 ortholog); (iii) multiple
EST and microarray data have been collected for
CCT8L2, mostly from testis, and one EST for CCT8L1
has been reported from placental tissue (as per the UniGene
EST and GEO expression data, November 23,
2009). These features taken together are strong evidence
that at least CCT8L2 in Hominoidea and the lone
CCT8L gene in other mammal lineages encode functional
proteins. The sparse expression of CCT8L1 in
human and the presence of one in-frame stop codon
and one frame-shift in its orthologous sequence from
chimp raise doubts about the functionality of this
sequence.
Numerous sequences associated with cct or cpn60
genes found in the human, mouse or rat genomes were
classified as pseudogenes based on the presence of inter-
nal stop codons, frame-shifts and non-significant differ-
ence in synonymous and non-synonymous mutation
rates. Among them, the sequences HSPD1-5P and
HSPD1-6P appear to be expressed based on EST


analysis (see additional file 14: Table S5) and may repre-
sent instances of expressed pseudogenes [40]. A general
explosion of pseudogene generation in the human and
murid lineages after they separated from the carnivore
lineage has been reported [41]. Our analysis of chapero-
nin pseudogenes is consistent with this observation,
although their relatively high rate of degeneration sug-
gests that pseudogenes generated before the origin of
mammals may have degraded beyond recognition. The
intense duplication of chaperonin sequences, witnessed
by the many pseudogenes identified in the human and
murid genomes, very likely provided opportunities for
multiple paralogy, resulting in the proliferation of chaperonin
classes in the vertebrate and mammal lineages.
Although the Hsp60-like BBS and CCT8L protein
families have considerably differentiated from the cano-
nical CCT subunits and within themselves, our analyses
indicated that they still conserve the overall three-
domain structure typical of CCT proteins. Structure and
sequence variations predicted for their apical domains
may reflect distinctive substrate specificities. In particu-
lar, lack of conservation at positions crucial in providing
substrate-specificity to CCT monomers [37] suggests
that BBS and CCT8L proteins may interact with their
substrate(s) in different regions as compared with the
canonical CCT subunits. Sequence differentiation patterns
and the acquisition of inserted elements at
potential monomer-monomer contact regions
suggest that BBS and CCT8L proteins do not assemble
into a CCT-like complex. This prediction is supported
by experimental evidence showing that MKKS localizes
as a free monomer at the pericentriolar material of cen-
trosomes [27]. In this respect, it is also interesting to
observe that among BBS and CCT8L sequences the
ATP-hydrolysis motif "Gly-Asp-Gly-Thr", remarkably
conserved among canonical chaperonins [42], has differ-
entiated in MKKS and in BBS12 [23,27]. This condition
may indicate that these families have lost the hydrolytic
activity necessary for the functionality of the chaperonin
complex [43-52]. It has been shown for the archaeal
thermosome complex that mutation of the ATP-hydro-
lysis-motif Asp residue prevents hydrolysis and produc-
tive protein folding [49] and that some CCT subunits,
among which CCT8, dissociate in vitro from the com-
plex in conditions that prevent hydrolysis of ATP [53].
Functionalities independent from formation of the
complex have also been reported for canonical CCT
subunits. TCP1 monomers not in complex confer
enhanced salt tolerance in plants [54]. Individual CCT
subunits have been reported to associate in vitro with
cytoskeleton structures, selectively binding to microtu-
bule filaments [55] or to actin polymerizing filaments
[56]. The localization of Hsp60-like BBS proteins at the
cilium basal body and at the centrosome [26-28]




suggests that they may also interact and associate with,
for example, cytoskeleton structures in promoting the
correct development of cilia [28,57]. The multiple lines of structural
and experimental evidence that BBS and CCT8L
proteins do not form a canonical CCT-like complex
provide a strong indication that eukaryotic Group II
chaperonin-protein functionalities extend beyond those of
the typical oligomeric complex.

Conclusions
Chaperonin proteins are key players in ensuring and
preserving cell and organism functionality under normal
and stressful conditions and their biological and medical
importance is undeniable. The recent discovery of hsp60
genes directly implicated in specific pathological condi-
tions, the chaperonopathies, extends our understanding
of the roles of chaperonin proteins in cellular processes
and enhances awareness of their importance in pathol-
ogy [18-20]. Here, we have provided a comprehensive,
unifying framework encompassing all members of the
extended hsp60 family of genes and pseudogenes. This
unifying framework contributes to our understanding of
the evolutionary history of the extended hsp60 family
and widens our perspectives on the multiple roles that
chaperonin proteins have acquired in vertebrates. Our
findings highlight how differentiation of the chaperonin
protein family in mammals has been facilitated by
intense processes of gene duplication. The roles,
mechanisms of action, and involvement in pathogenesis
of individual chaperonin molecules beyond those typical
of their canonical oligomeric complexes constitute
aspects of chaperonin physiology particularly promising
for future experimental testing.

Methods
Identification of chaperonin genes in eukaryotic genomes
Searches of genes for Hsp60-like proteins were exhaus-
tively performed using TBLASTN [58] at Ensembl [34]
and BLAT [59] at UCSC [60] on the genome sequences
of human (NCBI Assembly 36, Genebuild Ensembl Dec
2006), mouse (NCBI Assembly m37, Genebuild Ensembl
Apr 2007) and rat (Assembly RGSC 3.4, Genebuild
Ensembl Feb 2006). We used the nine canonical human
CCT proteins and the Cpn60 protein (mitochondrial
Hsp60) as queries. We recursively queried the genomes
with the sequences recovered from previous searches
until no other Hsp60 sequences were detected. We used
both search engines also to recover the full list of anno-
tated hsp60-like genes in several other mammal gen-
omes and in chicken. Sequences from frog (Xenopus sp.)
were retrieved from the NCBI nr (non-redundant) data-
base using PSI-BLAST [61] with Cpn60 and the indivi-
dual CCT subunits as queries. To recover complete
hsp60 gene and pseudogene sequences, after the


TBLASTN searches the genomic sequences from
approximately 2,000 nt upstream to 2,000 nt down-
stream of the hit-regions were excised and the hsp60
sequences were extracted using the homology-based
gene prediction method implemented in FGENESH+
[62] at the Softberry web site [63]. For pseudogenes,
when FGENESH+ failed to recognize the complete
sequence due to in-frame stop codons or frame shifts in
the sequence, the coding region was manually recon-
structed, aligning the three-frame-translations of the
genomic sequence to the query sequence with the multi-
ple protein alignment program ITERALIGN [64]. The
Pseudogene.org [33,65] database and Ensembl [34],
Entrez [30] and HUGO [66] annotations were consulted
for the presence of annotated human pseudogenes, as
recorded in our tables of results.
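
The recursive querying described above (re-querying with every recovered sequence until nothing new is found) and the excision of roughly 2,000 nt flanks can be sketched as below. This is only an outline under stated assumptions: `search` stands in for a TBLASTN or BLAT run and is a hypothetical callable, the toy hit table is invented, and coordinates are assumed to be 1-based and inclusive.

```python
def exhaustive_search(seed_queries, search):
    """Query with every newly recovered sequence until no new hits appear."""
    found = set(seed_queries)
    pending = list(seed_queries)
    while pending:
        query = pending.pop()
        for hit in search(query):
            if hit not in found:
                found.add(hit)
                pending.append(hit)   # a new hit becomes a query in turn
    return found


def flanked_window(hit_start, hit_end, chrom_length, flank=2000):
    """Extend a hit region by `flank` nt on each side, clamped to the chromosome."""
    return max(1, hit_start - flank), min(chrom_length, hit_end + flank)


# Toy stand-in for a similarity search: each query "finds" the listed sequences.
toy_hits = {"seqA": ["seqB", "seqC"], "seqB": ["seqD"], "seqC": ["seqA"], "seqD": []}


def toy_search(query):
    return toy_hits.get(query, [])


recovered = exhaustive_search(["seqA"], toy_search)
window = flanked_window(1500, 3000, 1_000_000)
print(sorted(recovered), window)
```

The loop terminates because each sequence enters the pending list at most once, which corresponds to stopping when a round of searches detects no further Hsp60 sequences.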

Multiple sequence alignment and secondary structure
prediction
Multiple sequence alignments were obtained using
MUSCLE [67], which in previous analyses [68,69] per-
formed well when aligning divergent sequences. Align-
ments were manually adjusted as needed. Predictions of
secondary structure for each protein family were per-
formed from their multiple alignment using the Jnet
algorithm as implemented in the JPRED-3 secondary
structure prediction server [70,71].

Evolutionary tree reconstructions
To infer phylogenetic relationships, evolutionary trees
were obtained using the maximum-likelihood (ML) tree-
building procedure implemented in PHYML [72] using
the default JTT substitution model and 100 bootstrap
resampling replicates (each ML tree reconstruction
being quite time consuming). Selected trees were com-
pared with those obtained with the Bayesian approach
implemented in MrBayes 3.1 [73] using the WAG sub-
stitution model and 10,000 iterations for the MCMC
process. Conditional probabilities were estimated sam-
pling the MCMC process every 10 iterations after 2,500
burn-in iterations (sample size 750).
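
The stated sample size follows directly from the run settings. A minimal arithmetic check, with a hypothetical helper name, is:

```python
def retained_samples(iterations, burn_in, sample_every):
    """Number of MCMC samples kept after discarding the burn-in iterations."""
    return (iterations - burn_in) // sample_every


# 10,000 iterations, 2,500 burn-in iterations, one sample every 10 iterations.
sample_size = retained_samples(10_000, 2_500, 10)
print(sample_size)
```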

Estimates of evolutionary divergence of sequence families
We obtained rates of divergence among families of
sequences using a newly developed estimator, called "B-
index". The B-index is an unbiased estimator of the
average divergence of a family of sequences from its last
common ancestor (root) that takes into consideration
the correlations among sequences determined by their
phylogenetic tree. Briefly, given a rooted tree, a terminal
branch of length $d_i$ of the original tree is considered a
"cluster" of size $w_i = 1$ and length $d = d_i$. Each fork-structure
comprising two terminal branches (clusters) of
lengths $d_1$ and $d_2$ and sizes $w_1$ and $w_2$ bifurcating from




a stem-branch of length $d_s$ is considered in turn. The
average length $d$ of each fork-structure is computed as
$d = (d_1 + d_2)/2 + d_s$, and the average size $w$ of the structure
is defined as $w = [2(d_1 + d_2)/2 + 1 \cdot d_s]/[(d_1 + d_2)/2 + d_s] = (d_1 + d_2 + d_s)/d$.
Each fork-structure is progressively
replaced by a corresponding cluster of length $d$
and size $w$. The procedure is repeated, merging bifurcating
clusters of lengths $d_1$ and $d_2$ and sizes $w_1$ and $w_2$
connected to a stem-branch of length $d_s$ into a larger
cluster of average length $d = (w_1 d_1 + w_2 d_2)/(w_1 + w_2) + d_s$
and average size $w = (d_1 w_1 + d_2 w_2 + d_s)/d$, until the
tree is reduced to two clusters connected to the root ($d_s = 0$).
The global average differentiation $D$ ("B-index")
and size $W$ can finally be computed as $D = (w_1 d_1 + w_2 d_2)/(w_1 + w_2)$
and $W = w_1 + w_2$. It can be shown that
$DW = L$ is the length of the tree (sum of all branch
lengths). If two sequence families A and B are sampled
from the same set of species and $W_A = W_B$, then $D_B/D_A = L_B/L_A$
and the relative rate of differentiation of the
two families of sequences can be estimated by the ratio
of their tree lengths. The B-index has several advantages
compared to the most commonly used average pair-wise
sequence-similarity measure: (i) it takes into account the
correlation among sequences imposed by the topology
of the evolutionary tree; (ii) in contrast to average pair-
wise similarity, its expectations are invariant over the
number and phylogenetic relations of sequences
sampled from a cluster with the same common ancestor
and evolutionary model; and (iii) with the B-index, the
average differentiation rate of a protein family relative to
a reference family sharing the same evolutionary rela-
tions (e.g., sampled from the same set of species) is sim-
ply estimated by the ratio of the lengths of the
evolutionary trees of the two families.
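
The cluster-merging recursion defined above can be written compactly. The sketch below is an illustration of the stated formulas, not the authors' implementation: the dictionary-based tree representation, the function names and the toy tree are assumptions, the tree is taken to be strictly bifurcating, and branch lengths are assumed positive so that no division by zero occurs.

```python
def cluster(node):
    """Return (d, w): average length and size of the cluster rooted at `node`.

    A node is a dict with a stem "length" and, for internal nodes, a
    two-element "children" list.
    """
    ds = node["length"]
    if "children" not in node:
        return ds, 1.0                      # terminal branch: d = d_i, w = 1
    (d1, w1), (d2, w2) = (cluster(child) for child in node["children"])
    weighted = w1 * d1 + w2 * d2
    d = weighted / (w1 + w2) + ds           # average length, stem included
    w = (weighted + ds) / d                 # average size
    return d, w


def tree_length(node):
    """Sum of all branch lengths in the subtree, stem included."""
    return node["length"] + sum(tree_length(c) for c in node.get("children", []))


# Toy rooted tree ((A:1, B:3):2, C:4); the root has no stem (length 0).
toy_tree = {
    "length": 0.0,
    "children": [
        {"length": 2.0, "children": [{"length": 1.0}, {"length": 3.0}]},
        {"length": 4.0},
    ],
}

D, W = cluster(toy_tree)       # at the root d_s = 0, so (d, w) is (D, W)
L = tree_length(toy_tree)
print(D, W, D * W, L)
```

For the toy tree the fork (A, B) collapses to a cluster with d = 4 and w = 1.5, and the root then gives D = 4, W = 2.5 and DW = 10, equal to the total branch length, as the identity DW = L requires.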

Estimates of ratios of non-synonymous vs. synonymous
mutation rate (Ka/Ks)
Classification of hsp60 sequences as functional genes or
pseudogenes was supported by the absence or presence
of in-frame stop codons and frame-shifts, and by esti-
mating non-synonymous vs. synonymous mutation-rate
ratios (Ka/Ks) along relevant branches of evolutionary
trees. Estimates were obtained using the maximum-like-
lihood branch-specific model implemented in PAML4
[74]. In the case of pseudogenes, Ka/Ks values are
expected not to significantly differ from 1 (absence of
positive or negative selection at the protein level)
whereas protein-coding genes, whose evolution is domi-
nated by negative or positive selection, are expected to
be characterized, respectively, by Ka/Ks < 1 or Ka/Ks >
1. Briefly, we applied the PAML4 "branch-specific
model" creating an evolutionary tree including the
sequences whose evolutionary lineage was tested, the


appropriate sister sequence (in the case of pseudogenes,
the gene sequence from whose lineage the pseudogene
originated) and an out-group sequence. The tree branch(es)
to be tested are designated as "foreground" and the
other branches as "background." Using the branch-specific
model, the Ka/Ks ratio is estimated for the foreground
branch(es) and an analogous ratio is estimated
for the background branches. The likelihood $L_1$ generated
using this evolutionary model is compared to the
likelihood $L_0$ of a null model where Ka/Ks for foreground
branches is fixed to 1.0. In the Log-likelihood
Ratio Test (LRT), the significance of the likelihood difference
between the model with free estimate of Ka/Ks
and the null model is assessed with the quantity $2\ln(L_1/L_0)$,
which approximately follows a $\chi^2$ distribution.
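
The test statistic and its significance can be computed from the two log-likelihoods as sketched below. The sketch assumes one degree of freedom, since the null model fixes a single parameter (the foreground Ka/Ks); the text itself does not state the degrees of freedom. The function name and the log-likelihood values are hypothetical.

```python
import math


def lrt_one_df(lnL_free, lnL_null):
    """Likelihood-ratio test with one degree of freedom.

    Returns (statistic, p-value), where statistic = 2 * (lnL_free - lnL_null)
    and the p-value is the upper tail of a chi-square distribution with 1 df.
    """
    statistic = 2.0 * (lnL_free - lnL_null)
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)); negative values are clipped to 0.
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods: free Ka/Ks model versus Ka/Ks fixed to 1.
statistic, p_value = lrt_one_df(-1000.0, -1001.9207294103471)
print(round(statistic, 3), round(p_value, 3))
```

A p-value above the chosen threshold means that Ka/Ks on the foreground branch does not differ significantly from 1, the pattern expected for a pseudogene.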

Data availability
All relevant gene and pseudogene information, including
start and end positions, chromosomal location, strand,
number of exons, GenBank accession number for func-
tional genes, and Ensembl or Pseudogene.org ID for
pseudogenes, can be found in additional file 22: Table
S7. Newly annotated sequences have been approved and
deposited in the Human Genome Organization (HUGO)
database [66].

Additional file 1: Table S1. Mouse hsp60 genes and pseudogenes.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S1.DOC]
Additional file 2: Table S2. The rat hsp60 genes and pseudogenes.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S2.DOC]
Additional file 3: Figure S1. Phylogenetic tree of human CCT1-8 and
CCT8L proteins.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S3.PDF]
Additional file 4: Figure S2. Phylogenetic tree of human CCT1-8 and
MKKS proteins.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S4.PDF]
Additional file 5: Figure S3. Phylogenetic tree of human CCT1-8 and
BBS10 proteins.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S5.PDF]
Additional file 6: Figure S4. Phylogenetic tree of human CCT1-8 and
BBS12 proteins.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S6.PDF]
Additional file 7: Figure S5. Phylogenetic tree of vertebrate CCT1-8,
MKKS, BBS10, BBS12 and CCT8L proteins.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S7.PDF]




Additional file 8: Figure S6. Phylogenetic trees of CCT8L protein
sequences from primates (a,b) and partial alignment showing a divergent
region in the sequence from rhesus monkey (c).
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S8.PDF]
Additional file 9: Table S3. Codon-base specific counts of mutation
events along human and chimp CCT8L evolutionary branches.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S9.DOC]
Additional file 10: Table S4. Expression pattern (EST counts) of the
human CCT and BBS genes from the UniGene database.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S10.DOC]
Additional file 11: Figure S7. Evolutionary tree of vertebrate CCT1-8
and CCT8L proteins including associated human pseudogenes.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S11.PDF]
Additional file 12: Figure S8. Evolutionary trees of individual CCT1,
CCT3 and CCT4 proteins from vertebrates including associated human
pseudogenes.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S12.PDF]
Additional file 13: Figure S9. Evolutionary trees of individual CCT5,
CCT7 and CCT8 proteins from vertebrates including associated human
pseudogenes.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S13.PDF]
Additional file 14: Table S5. Expression pattern of the human cpn60
gene (HSPD1) and pseudogenes from the UniGene database.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S14.DOC]
Additional file 15: Figure S10. Alignment and secondary-structure
prediction of archaeal thermosome sequences.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S15.PDF]
Additional file 16: Table S11. Alignment and secondary-structure
prediction of human CCT1-8 protein sequences.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S16.PDF]
Additional file 17: Table S12. Alignment and secondary-structure
prediction of vertebrate CCT8L protein sequences.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S17.PDF]
Additional file 18: Table S13. Alignment and secondary-structure
prediction of vertebrate MKKS protein sequences.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S18.PDF]
Additional file 19: Table S14. Alignment and secondary-structure
prediction of vertebrate BBS10 protein sequences.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S19.PDF]
Additional file 20: Table S15. Alignment and secondary-structure
prediction of vertebrate BBS12 protein sequences.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S20.PDF]
Additional file 22: Table S7. Database and sequence information on all
hsp60-like sequences identified in the human, mouse and rat genomes.
[http://www.biomedcentral.com/content/supplementary/1471-2148-10-64-S22.PDF]




Abbreviations
BBS: Bardet-Biedl Syndrome; CCT: Chaperonin Containing TCP1; ML:
Maximum-Likelihood; MKKS: McKusick-Kaufman Syndrome; TRiC: TCP1 Ring
Complex.

Acknowledgements
The authors thank an anonymous reviewer for providing valuable
information. AJLM and EC de M thank Wesley Harlow for his help in the
initial stages of this work and the San Francisco Foundation for support. LB
and KM thank Mr Steve Oden and Ms Shaina R Wallach for critical
proofreading of the manuscript. LB thanks the University of Florida Genetics
Institute for financial support.

Author details
1Department of Molecular Genetics and Microbiology, University of Florida,
College of Medicine, 1660 SW Archer Road, Gainesville, FL 32610, USA.
2Genetics Institute, University of Florida, Cancer and Genetics Research
Complex, 2033 Mowry Road, Gainesville, FL 32610, USA.
3University of Maryland, Columbus Center, 701 East Pratt Street, Baltimore, MD 21202, USA.

Authors' contributions
KM participated in research and methodological approach design, carried
out all searches and most data analyses, wrote drafts of the manuscript and
participated in its refinement, compiled all tables and produced most
figures; EC de M and AJLM envisioned the research project, started data
collection and participated in research design and in manuscript
preparation; LB participated in research design and methodological
approach, produced differentiation and mutation-accumulation estimates
and analyses and participated in writing the manuscript. All authors read
and approved the final manuscript.

Received: 13 August 2009 Accepted: 1 March 2010
Published: 1 March 2010

References
1. Hartl FU, Hayer-Hartl M: Molecular chaperones in the cytosol: from
nascent chain to folded protein. Science 2002, 295:1852-1858.
2. Frydman J: Folding of newly translated proteins in vivo: the role of
molecular chaperones. Annu Rev Biochem 2001, 70:603-647.
3. Sigler PB, Xu Z, Rye HS, Burston SG, Fenton WA, Horwich AL: Structure and
function in GroEL-mediated protein folding. Annu Rev Biochem 1998, 67:581-608.
4. Bukau B, Horwich AL: The Hsp70 and Hsp60 chaperone machines. Cell
1998, 92:351-366.
5. Hemmingsen SM, Woolford C, Vies van der SM, Tilly K, Dennis DT,
Georgopoulos CP, Hendrix RW, Ellis RJ: Homologous plant and bacterial
proteins chaperone oligomeric protein assembly. Nature 1988,
333:330-334.
6. Trent JD, Nimmesgern E, Wall JS, Hartl FU, Horwich AL: A molecular
chaperone from a thermophilic archaebacterium is related to the
eukaryotic protein t-complex polypeptide-1. Nature 1991, 354:490-493.
7. Kubota H, Hynes G, Willison K: The chaperonin containing t-complex
polypeptide 1 (TCP-1). Multisubunit machinery assisting in protein
folding and assembly in the eukaryotic cytosol. Eur J Biochem 1995,
230:3-16.




8. Macario AJL, Malz M, Conway de Macario E: Evolution of assisted protein
folding: the distribution of the main chaperoning systems within the
phylogenetic domain archaea. Front Biosci 2004, 9:1318-1332.
9. Carrascosa JL, Llorca O, Valpuesta JM: Structural comparison of prokaryotic
and eukaryotic chaperonins. Micron 2001, 32:43-50.
10. Large AT, Lund PA: Archaeal chaperonins. Front Biosci 2009, 14:1304-1324.
11. Ranson NA, Clare DK, Farr GW, Houldershaw D, Horwich AL, Saibil HR:
Allosteric signaling of ATP hydrolysis in GroEL-GroES complexes. Nat
Struct Mol Biol 2006, 13:147-152.
12. Ranson NA, Dunster NJ, Burston SG, Clarke AR: Chaperonins can catalyse
the reversal of early aggregation steps when a protein misfolds. J Mol
Biol 1995, 250:581-586.
13. Ranson NA, White HE, Saibil HR: Chaperonins. Biochem J 1998, 333(Pt
2):233-242.
14. Levy-Rimler G, Bell RE, Ben-Tal N, Azem A: Type I chaperonins: not all are
created equal. FEBS Lett 2002, 529:1-5.
15. Frydman J, Nimmesgern E, Erdjument-Bromage H, Wall JS, Tempst P,
Hartl FU: Function in protein folding of TRiC, a cytosolic ring complex
containing TCP-1 and structurally related subunits. EMBO J 1992,
11:4767-4778.
16. Kubota H, Hynes G, Carne A, Ashworth A, Willison K: Identification of six
Tcp-1-related genes encoding divergent subunits of the TCP-1-
containing chaperonin. Curr Biol 1994, 4:89-99.
17. Stoldt V, Rademacher F, Kehren V, Ernst JF, Pearce DA, Sherman F: Review:
the Cct eukaryotic chaperonin subunits of Saccharomyces cerevisiae and
other yeasts. Yeast 1996, 12:523-529.
18. Cappello F, Conway de Macario E, Marasa L, Zummo G, Macario AJL: Hsp60
expression, new locations, functions and perspectives for cancer
diagnosis and therapy. Cancer Biol Ther 2008, 7:801-809.
19. Macario AJL, Conway de Macario E: Chaperonopathies by defect, excess,
or mistake. Ann N Y Acad Sci 2007, 1113:178-191.
20. Macario AJL, Conway de Macario E: Sick chaperones, cellular stress, and
disease. N Engl J Med 2005, 353:1489-1501.
21. Stone DL, Slavotinek A, Bouffard GG, Banerjee-Basu S, Baxevanis AD, Barr M,
Biesecker LG: Mutation of a gene encoding a putative chaperonin causes
McKusick-Kaufman syndrome. Nat Genet 2000, 25:79-82.
22. Stoetzel C, Laurier V, Davis EE, Muller J, Rix S, Badano JL, Leitch CC,
Salem N, Chouery E, Corbani S, et al: BBS10 encodes a vertebrate-specific
chaperonin-like protein and is a major BBS locus. Nat Genet 2006,
38:521-524.
23. Stoetzel C, Muller J, Laurier V, Davis EE, Zaghloul NA, Vicaire S, Jacquelin C,
Plewniak F, Leitch CC, Sarda P, et al: Identification of a novel BBS gene
(BBS12) highlights the major role of a vertebrate-specific branch of
chaperonin-related proteins in Bardet-Biedl syndrome. Am J Hum Genet
2007, 80:1-11.
24. Katsanis N, Beales PL, Woods MO, Lewis RA, Green JS, Parfrey PS, Ansley SJ,
Davidson WS, Lupski JR: Mutations in MKKS cause obesity, retinal
dystrophy and renal malformations associated with Bardet-Biedl
syndrome. Nat Genet 2000, 26:67-70.
25. Blacque OE, Leroux MR: Bardet-Biedl syndrome: an emerging
pathomechanism of intracellular transport. Cell Mol Life Sci 2006,
63:2145-2161.
26. Hirayama S, Yamazaki Y, Kitamura A, Oda Y, Morito D, Okawa K, Kimura H,
Cyr DM, Kubota H, Nagata K: MKKS is a centrosome-shuttling protein
degraded by disease-causing mutations via CHIP-mediated
ubiquitination. Mol Biol Cell 2008, 19:899-911.
27. Kim JC, Ou YY, Badano JL, Esmail MA, Leitch CC, Fiedrich E, Beales PL,
Archibald JM, Katsanis N, Rattner JB, et al: MKKS/BBS6, a divergent
chaperonin-like protein linked to the obesity disorder Bardet-Biedl
syndrome, is a novel centrosomal component required for cytokinesis. J
Cell Sci 2005, 118:1007-1020.
28. Marion V, Stoetzel C, Schlicht D, Messaddeq N, Koch M, Flori E, Danse JM,
Mandel JL, Dollfus H: Transient ciliogenesis involving Bardet-Biedl
syndrome proteins is a fundamental characteristic of adipogenic
differentiation. Proc Natl Acad Sci USA 2009, 106:1820-1825.
29. Brocchieri L, Conway de Macario E, Macario AJL: Chaperonomics, a new
tool to study ageing and associated diseases. Mech Ageing Dev 2007,
128:125-136.
30. Entrez Gene. [http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene]
31. Shisheva A, Sbrissa D, Ikonomov O: Cloning, characterization, and
expression of a novel Zn2+-binding FYVE finger-containing
phosphoinositide kinase in insulin-sensitive cells. Mol Cell Biol 1999,
19:623-634.
32. Li S, Tiab L, Jiao X, Munier FL, Zografos L, Frueh BE, Sergeev Y, Smith J,
Rubin B, Meallet MA, et al: Mutations in PIP5K3 are associated with
Francois-Neetens mouchetee fleck corneal dystrophy. Am J Hum Genet
2005, 77:54-63.
33. Karro JE, Yan Y, Zheng D, Zhang Z, Carriero N, Cayting P, Harrison P,
Gerstein M: Pseudogene.org: a comprehensive database and comparison
platform for pseudogene annotation. Nucleic Acids Res 2007, 35:D55-60.
34. Ensembl. [http://www.ensembl.org/index.html]
35. Springer MS, Stanhope MJ, Madsen O, de Jong WW: Molecules consolidate
the placental mammal tree. Trends Ecol Evol 2004, 19:430-438.
36. Ditzel L, Lowe J, Stock D, Stetter KO, Huber H, Huber R, Steinbacher S:
Crystal structure of the thermosome, the archaeal chaperonin and
homolog of CCT. Cell 1998, 93:125-138.
37. Spiess C, Miller EJ, McClellan AJ, Frydman J: Identification of the TRiC/CCT
substrate binding sites uncovers the function of subunit diversity in
eukaryotic chaperonins. Mol Cell 2006, 24:25-37.
38. Fares MA, Wolfe KH: Positive selection and subfunctionalization of
duplicated CCT chaperonin subunits. Mol Biol Evol 2003, 20:1588-1597.
39. Archibald JM, Logsdon JM Jr, Doolittle WF: Origin and evolution of
eukaryotic chaperonins: phylogenetic evidence for ancient duplications
in CCT genes. Mol Biol Evol 2000, 17:1456-1466.
40. Harrison PM, Zheng D, Zhang Z, Carriero N, Gerstein M: Transcribed
processed pseudogenes in the human genome: an intermediate form of
expressed retrosequence lacking protein-coding ability. Nucleic Acids Res
2005, 33:2374-2383.
41. Yu Z, Morais D, Ivanga M, Harrison PM: Analysis of the role of
retrotransposition in gene evolution in vertebrates. BMC Bioinformatics
2007, 8:308.
42. Brocchieri L, Karlin S: Conservation among HSP60 sequences in relation
to structure, function, and evolution. Protein Sci 2000, 9:476-486.
43. Bigotti MG, Bellamy SR, Clarke AR: The asymmetric ATPase cycle of the
thermosome: elucidation of the binding, hydrolysis and product-release
steps. J Mol Biol 2006, 362:835-843.
44. Bigotti MG, Clarke AR: Cooperativity in the thermosome. J Mol Biol 2005,
348:13-26.
45. Cliff MJ, Kad NM, Hay N, Lund PA, Webb MR, Burston SG, Clarke AR: A
kinetic analysis of the nucleotide-induced allosteric transitions of GroEL.
J Mol Biol 1999, 293:667-684.
46. Jackson GS, Staniforth RA, Halsall DJ, Atkinson T, Holbrook JJ, Clarke AR,
Burston SG: Binding and hydrolysis of nucleotides in the chaperonin
catalytic cycle: implications for the mechanism of assisted protein
folding. Biochemistry 1993, 32:2554-2563.
47. Kafri G, Horovitz A: Transient kinetic analysis of ATP-induced allosteric
transitions in the eukaryotic chaperonin containing TCP-1. J Mol Biol
2003, 326:981-987.
48. Kafri G, Willison KR, Horovitz A: Nested allosteric interactions in the
cytoplasmic chaperonin containing TCP-1. Protein Sci 2001, 10:445-449.
49. Kanzaki T, Iizuka R, Takahashi K, Maki K, Masuda R, Sahlan M, Yebenes H,
Valpuesta JM, Oka T, Furutani M, et al: Sequential action of ATP-
dependent subunit conformational change and interaction between
helical protrusions in the closure of the built-in lid of group II
chaperonins. J Biol Chem 2008, 283:34773-34784.
50. Staniforth RA, Burston SG, Atkinson T, Clarke AR: Affinity of chaperonin-60
for a protein substrate and its modulation by nucleotides and
chaperonin-10. Biochem J 1994, 300(Pt 3):651-658.
51. Todd MJ, Viitanen PV, Lorimer GH: Dynamics of the chaperonin ATPase
cycle: implications for facilitated protein folding. Science 1994,
265:659-666.
52. Yifrach O, Horovitz A: Coupling between protein folding and allostery in
the GroE chaperonin system. Proc Natl Acad Sci USA 2000, 97:1521-1524.
53. Roobol A, Grantham J, Whitaker HC, Carden MJ: Disassembly of the
cytosolic chaperonin in mammalian cell extracts at intracellular levels of
K+ and ATP. J Biol Chem 1999, 274:19220-19227.
54. Yamada A, Sekiguchi M, Mimura T, Ozeki Y: The role of plant CCTalpha in
salt- and osmotic-stress tolerance. Plant Cell Physiol 2002, 43:1043-1048.
55. Roobol A, Sahyoun ZP, Carden MJ: Selected subunits of the cytosolic
chaperonin associate with microtubules assembled in vitro. J Biol Chem
1999, 274:2408-2415.
56. Grantham J, Ruddock LW, Roobol A, Carden MJ: Eukaryotic chaperonin
containing T-complex polypeptide 1 interacts with filamentous actin
and reduces the initial rate of actin polymerization in vitro. Cell Stress
Chaperones 2002, 7:235-242.
57. Shah AS, Farmen SL, Moninger TO, Businga TR, Andrews MP, Bugge K,
Searby CC, Nishimura D, Brogden KA, Kline JN, et al: Loss of Bardet-Biedl
syndrome proteins alters the morphology and function of motile cilia in
airway epithelia. Proc Natl Acad Sci USA 2008, 105:3380-3385.
58. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W,
Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Res 1997, 25:3389-3402.
59. Kent WJ: BLAT-the BLAST-like alignment tool. Genome Res 2002,
12:656-664.
60. BLAT Search Genome. [http://genome.ucsc.edu/cgi-bin/hgBlat]
61. Altschul SF, Koonin EV: Iterated profile searches with PSI-BLAST-a tool for
discovery in protein databases. Trends Biochem Sci 1998, 23:444-447.
62. Salamov AA, Solovyev VV: Ab initio gene finding in Drosophila genomic
DNA. Genome Res 2000, 10:516-522.
63. Softberry. [http://www.softberry.com]
64. Brocchieri L, Karlin S: A symmetric-iterated multiple alignment of protein
sequences. J Mol Biol 1998, 276:249-264.
65. Pseudogene.org.
66. HUGO Gene Nomenclature Committee. [http://www.genenames.org]
67. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and
high throughput. Nucleic Acids Res 2004, 32:1792-1797.
68. Mukherjee K, Burglin TR: MEKHLA, a novel domain with similarity to PAS
domains, is fused to plant homeodomain-leucine zipper III proteins.
Plant Physiol 2006, 140:1142-1150.
69. Mukherjee K, Burglin TR: Comprehensive analysis of animal TALE
homeobox genes: new conserved motifs and cases of accelerated
evolution. J Mol Evol 2007, 65:137-153.
70. Cole C, Barber JD, Barton GJ: The Jpred 3 secondary structure prediction
server. Nucleic Acids Res 2008, 36:W197-201.
71. Jpred 3. A Secondary Structure Prediction Server.
[http://www.compbio.dundee.ac.uk/www-jpred/]
72. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate
large phylogenies by maximum likelihood. Syst Biol 2003, 52:696-704.
73. Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference
under mixed models. Bioinformatics 2003, 19:1572-1574.
74. Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol
Evol 2007, 24:1586-1591.

doi:10.1186/1471-2148-10-64
Cite this article as: Mukherjee et al.: Chaperonin genes on the rise: new
divergent classes and intense duplication in human and other
vertebrate genomes. BMC Evolutionary Biology 2010, 10:64.

