Supplementary Information for Braun et al. Homoplastic Microinversions and the Avian Tree of Life

Supplementary methods

Evolutionary rate estimates for Figure 1

A comprehensive meta-analysis of the rates at which various genomic changes accumulate is beyond the scope of the present work. Even so, it is important to place estimates of the rate of accumulation for relatively poorly characterized genomic changes in the context of rates that are more familiar. For that reason, we reviewed the relevant literature to obtain approximate estimates of these rates. The Chaisson et al. (2006) estimate of the mammalian microinversion rate (MI) was ~0.015 microinversions per megabase per million years (microinversions Mb⁻¹ MY⁻¹). Chaisson et al. (2006) did not present detailed information regarding the size spectrum of the microinversions they identified, which makes comparisons to other studies difficult. Nevertheless, these units (genomic changes Mb⁻¹ MY⁻¹) seem reasonable to use for comparisons. These units were also used for genomes that are < 1 Mb in size (e.g., chloroplast genomes). Nucleotide substitutions are the most rapidly accumulating genomic changes used for this comparison. The neutral nucleotide substitution rate in mammals is often assumed to be 0.005 substitutions site⁻¹ MY⁻¹ (e.g., Penny et al. 2001). This value is roughly 3.5-fold higher than the estimate of the neutral nucleotide substitution rate that Chojnowski et al. (2008) obtained for birds using introns (0.0014 substitutions site⁻¹ MY⁻¹). Gaut (1998) reviewed the available estimates of angiosperm nucleotide substitution rates. These estimates ranged from 0.005 to 0.03 substitutions site⁻¹ MY⁻¹ for nuclear substitutions (a "consensus" value of 0.015 substitutions site⁻¹ MY⁻¹ was chosen for simplicity). For chloroplast substitutions they ranged from 0.001 to 0.003 substitutions site⁻¹ MY⁻¹ (a consensus value of 0.002 substitutions site⁻¹ MY⁻¹ was chosen).
To convert these rates to the appropriate units for comparison (nucleotide substitutions Mb⁻¹ MY⁻¹), rate estimates were multiplied by 10⁶. Estimates of the mammalian non-coding indel rate are ~1/8 of the nucleotide substitution rate (Lunter 2007), which suggests a rate of ~625 indels Mb⁻¹ MY⁻¹. The avian non-coding indel rate was extrapolated from Bonilla et al. (2010), who used ~4.8 kb of non-coding DNA for a 23-taxon tree. MP analysis of a dataset generated using the simple indel coding method of Simmons & Ochoterena (2000) revealed a total of 902 indel events. Combining this value with the sequence length and TL (604 MY; obtained as described below) allowed us to estimate a rate of ~300 indels Mb⁻¹ MY⁻¹ for avian non-coding regions. It is unclear whether this difference reflects the methods used to estimate the indel rate for non-coding regions, differences among the regions of the genome examined by these studies, or differences between birds and mammals. Regardless, these estimates suggest that typical vertebrate non-coding indel rates are approximately an order of magnitude lower than the neutral nucleotide substitution rate. The mammalian coding-region indel rate was obtained using information from Murphy et al. (2007), who reported 5,648 and 5,723 coding-region indels in human-armadillo and human-elephant comparisons, respectively. Combining the amount of sequence data examined (17 to 19 Mb) with the divergence time estimate (105 MYA) from Murphy et al. (2007) suggests a rate of ~1.5 indels Mb⁻¹ MY⁻¹ in coding regions. We could not obtain a similar estimate using avian data. Rates of accumulation for transposable element (TE) insertions show substantial variation that reflects the specifics of the elements being considered. For simplicity, we limited
consideration to the accumulation of the most common avian TE (the CR1 element), because it was possible to use the same methods and data matrix that we used to estimate MI. Han et al. (2011) identified 59 CR1 insertions in an alignment similar to the alignment used for this study. The sequence length and time available for TE insertions in the Han et al. (2011) study are similar to those for this study, so the CR1 insertion rate is ~0.4 TE insertions Mb⁻¹ MY⁻¹. Because Chaisson et al. (2006) did not present information regarding their microinversion sizes, we used Ma et al. (2006) to obtain a second a priori estimate of MI for vertebrate nuclear genomes. Ma et al. (2006) identified 12,057 microinversions in complete genome comparisons involving four mammals. They also presented a size spectrum that overlaps with the sizes of the microinversions we obtained. Combining this number of microinversions with divergence time estimates from Murphy et al. (2007) suggests an estimate of MI (~0.018 microinversions Mb⁻¹ MY⁻¹) very similar to the estimate from Chaisson et al. (2006). Most (6,277) of the microinversions identified by Ma et al. (2006) were identified when the two rodents in the study (rat and mouse) were compared, suggesting that MI could be higher in murid rodents (~0.065 microinversions Mb⁻¹ MY⁻¹). Several other types of substitutions accumulate more rapidly in rodents than in other mammalian lineages (e.g., Cooper et al. 2003), making it reasonable to speculate that MI is also higher in this lineage. However, it remains possible that MI has been underestimated in the other mammalian lineages that Ma et al. (2006) examined. It is also important to recognize that there is no direct evidence that MI is correlated with the rates of accumulation for other types of genomic changes in mammals or any other vertebrate lineage. Although the microinversion size spectra presented in some prior studies (Feuk et al. 2005; Ma et al.
2006) overlap with the sizes of the microinversions we identified, it seems likely that these large-scale studies did not identify the smallest inversions. Taken as a whole, our analyses of avian microinversions suggest that the relatively low MI estimates based upon Chaisson et al. (2006) and Ma et al. (2006) reflect this acquisition bias. Thus, it may be more appropriate to view the estimates of MI from those studies as the rate of accumulation for longer microinversions (>30 bp, based upon the size spectra reported by Ma et al. 2006). We also extended our estimates of MI to angiosperm chloroplast DNA by using information about a 1.7 kb region sequenced from ten Jasminum species (and one Monodora outgroup), in which five inversion events were found (Kim & Lee 2005). TL for that dataset was ~135 MY, based upon examination of branch lengths and a molecular clock analysis of the same group (Lee et al. 2007). This suggests that the angiosperm chloroplast MI is ~22 microinversions Mb⁻¹ MY⁻¹. This rate should be viewed as approximate, given the relatively small number of microinversions that were identified. Several different types of genomic changes have been shown to accumulate at different rates in different parts of the genome. For example, regional variation has been documented for both neutral nucleotide substitution rates (e.g., Webster et al. 2004) and non-coding indel rates (Lunter 2007). Likewise, there are many examples of among-taxon variation in the evolutionary rate within groups. Figure 3 in Hackett et al. (2008) provides an excellent example of this rate variation using a subset of the data that were analyzed for this study. Our analyses of MI in birds also indicate that there is among-locus variation in the rate of accumulation for these changes (see main text).
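The unit conversions and rate estimates in this subsection all reduce to counting events per unit of sequence per unit of time. The short Python sketch below reproduces that arithmetic from the numbers quoted above, as a check on the approximate values rather than as the procedure the authors used. The function names are illustrative. The sketch also treats each human-armadillo or human-elephant comparison as spanning 2 × 105 MY (both descending lineages). That is our assumption, chosen because it is the reading that recovers the ~1.5 indels Mb⁻¹ MY⁻¹ value.

```python
def per_site_to_per_mb(rate_per_site_per_my: float) -> float:
    """Convert substitutions site^-1 MY^-1 to substitutions Mb^-1 MY^-1."""
    return rate_per_site_per_my * 1e6


def events_per_mb_per_my(events: float, length_mb: float, time_my: float) -> float:
    """Rate of genomic change: events / (sequence length in Mb x time in MY)."""
    return events / (length_mb * time_my)


# Mammalian neutral substitution rate, with non-coding indels at ~1/8 of it
mammal_subs = per_site_to_per_mb(0.005)        # 5,000 substitutions Mb^-1 MY^-1
mammal_noncoding_indels = mammal_subs / 8      # ~625 indels Mb^-1 MY^-1

# Avian non-coding indels: 902 MP indel events, ~4.8 kb (0.0048 Mb), TL = 604 MY
avian_noncoding_indels = events_per_mb_per_my(902, 0.0048, 604)   # ~311

# Mammalian coding indels: 5,648 to 5,723 indels in 17 to 19 Mb of sequence.
# ASSUMPTION: each pairwise comparison spans 2 x 105 MY (both lineages).
pair_time = 2 * 105
coding_low = events_per_mb_per_my(5648, 19, pair_time)    # ~1.42
coding_high = events_per_mb_per_my(5723, 17, pair_time)   # ~1.60

# Angiosperm chloroplast MI: 5 inversions in 1.7 kb (0.0017 Mb), TL ~135 MY
chloroplast_mi = events_per_mb_per_my(5, 0.0017, 135)     # ~21.8

if __name__ == "__main__":
    print(f"mammalian non-coding indels: {mammal_noncoding_indels:.0f}")
    print(f"avian non-coding indels:     {avian_noncoding_indels:.0f}")
    print(f"mammalian coding indels:     {coding_low:.2f} to {coding_high:.2f}")
    print(f"chloroplast MI:              {chloroplast_mi:.1f}")
```

The coding-region rate is bracketed rather than reported as a single number because the 17 to 19 Mb range is not assigned to a specific comparison above. Like the text, the sketch treats all of these values as order-of-magnitude figures.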
MI may also exhibit among-taxon variation, although the limited number of inversions observed in this study makes it difficult to establish the magnitude of this variation, or even whether this variation actually exists. However, variation among taxa and among loci is
typically less than an order of magnitude, suggesting that the rate estimates presented here provide a sufficient framework to understand the rates at which different genomic changes accumulate over evolutionary time (summarized in Figure 1 in the main text).

Avian microinversion rate estimates

To calculate MI in birds, we estimated the number of inversion events using the MP criterion. We combined that information with the mean length of the non-coding data examined (in Mb) and the amount of time available for microinversions to accumulate. Time available for inversions was estimated using the treelength (TL) of the Hackett et al. (2008) tree in MY. TL was calculated on an ultrametric version of that tree generated by non-parametric rate smoothing (NPRS; Sanderson 1997). Because data were missing for specific taxa in some partitions, TL may differ among loci. Locus-specific TL values were calculated using the ultrametric Hackett et al. (2008) tree after pruning branches that correspond to missing taxa. Divergence times on the ultrametric tree were calculated by assuming that the basal split in Neoaves occurred 100 MYA, a value consistent with the higher estimates obtained in molecular clock studies (e.g., Brown et al. 2008). The timing of evolutionary divergences in birds remains controversial (Brown et al. 2007; Ericson et al. 2007; Chojnowski et al. 2008). We therefore view the value of TL based upon the 100 MYA Neoaves origin calibration as a reasonable upper limit for the time available for mutation. An approximate lower limit can be obtained by constraining the divergence of Anseranas and the Anatidae to be 68 MYA, based upon the fossil Vegavis iaai (Clarke et al. 2005). For the purposes of this study, the more ancient calibration is conservative because it is not expected to overestimate MI. This is potentially an important consideration given the relatively high estimates of MI that we obtained.
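Locus-specific TL is the total branch length of the subtree that connects the taxa actually sampled for a locus. The Python sketch below shows one way to compute it. It assumes a hypothetical encoding of the ultrametric tree as a map from each node to its parent and the length in MY of the branch above it. This is not the format of the files on the Early Bird website. The sketch sums the branches on the root-ward paths of the sampled taxa and then drops the branches shared by every path, because those lie above the taxa's most recent common ancestor. In a chronogram calibrated at a single node, all node ages scale linearly with the calibration age. Rescaling that age therefore rescales TL, and hence MI = k/(Len × TL), by the same ratio.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical tree encoding: node -> (parent node, length in MY of the branch above it).
Tree = Dict[str, Tuple[Optional[str], float]]


def rootward_branches(tree: Tree, taxon: str) -> Set[str]:
    """Nodes whose parent branch lies on the path from `taxon` to the root."""
    branches = set()
    node = taxon
    while tree[node][0] is not None:
        branches.add(node)
        node = tree[node][0]
    return branches


def pruned_treelength(tree: Tree, sampled_taxa: Iterable[str]) -> float:
    """TL (in MY) of the subtree spanning only the sampled taxa."""
    paths = [rootward_branches(tree, t) for t in sampled_taxa]
    if not paths:
        return 0.0
    used = set().union(*paths)
    above_mrca = set.intersection(*paths)  # shared by every path, so pruned away
    return sum(tree[n][1] for n in used - above_mrca)


def microinversion_rate(k: int, length_mb: float, tl_my: float) -> float:
    """MI = k / (Len x TL), in microinversions Mb^-1 MY^-1."""
    return k / (length_mb * tl_my)


# Toy ultrametric tree with a root age of 100 MY: ((A:60, B:60)X:40, C:100)
toy: Tree = {
    "root": (None, 0.0),
    "X": ("root", 40.0),
    "A": ("X", 60.0),
    "B": ("X", 60.0),
    "C": ("root", 100.0),
}

if __name__ == "__main__":
    print(pruned_treelength(toy, ["A", "B", "C"]))  # 260.0 (all taxa sampled)
    print(pruned_treelength(toy, ["A", "B"]))       # 120.0 (C missing for this locus)
    # Moving the calibration from 100 to 85.4 MY shrinks every TL by 85.4/100,
    # so MI estimates grow by a factor of 100/85.4.
    print(round(100 / 85.4, 2))                     # 1.17
```

Representing each branch by its child node keeps the pruning step to simple set operations. A tree library would work equally well for trees read from nexus files.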
To convert microinversion rate estimates based upon the 100 MYA calibration to those based upon the Vegavis constraint (which implies that the origin of Neoaves was 85.4 MYA), they should be multiplied by a constant (100/85.4 ≈ 1.17). A similar approach was used to generate an ultrametric tree based upon the Bonilla et al. (2010) galliform phylogeny. Briefly, the tree presented as Figure 2 in Bonilla et al. (2010) was rooted between Megapodidae and the remaining Galliformes, as suggested by Hackett et al. (2008) and a number of additional studies (e.g., Crowe et al. 2006). This tree was then subjected to NPRS (Sanderson 1997). TL was calculated by assuming that the divergence between the families Cracidae and Phasianidae occurred 68.8 MYA. This calibration was based upon the ultrametric Hackett et al. (2008) tree described above. Both ultrametric trees are available from the Early Bird website (http://www.biology.ufl.edu/earlybird) or upon request from ELB.

Among-locus microinversion rate variation

We examined the rate of microinversion accumulation by comparing the simplest evolutionary model (a Poisson process) to the more complex negative binomial (NB) model, which allows rate variation across loci, as described by Han et al. (2011). Consider a specific locus of length Len with k observed microinversions and a taxon sample corresponding to treelength TL. The ML estimate of MI for that locus under the Poisson model is simply k/(Len × TL). The likelihood is proportional to the probability of observing k inversions given MI, Len, and TL, which can be calculated using equation 1:

$$P(k \mid MI, Len, TL) = \frac{(MI \cdot Len \cdot TL)^{k}\, e^{-MI \cdot Len \cdot TL}}{k!} \qquad (1)$$
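The comparison between the Poisson model of equation 1 and the NB alternative can be sketched in a few lines of Python. The sketch assumes the standard NB parameterization with mean μ = MI·Len·TL and variance μ + cμ², which reduces to the Poisson as c → 0. The function names, the grid over c, and the toy counts and exposures (Len × TL, in Mb·MY) are all hypothetical. They are not the data or code used by Han et al. (2011). Because c = 0 lies on the boundary of the parameter space, the naive χ² (1 df) p-value returned here is conservative; halving it is a common adjustment.

```python
import math
from typing import List, Sequence, Tuple


def poisson_loglik(k: int, mu: float) -> float:
    """log P(k | MI, Len, TL) from equation 1, with mu = MI * Len * TL."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def nb_loglik(k: int, mu: float, c: float) -> float:
    """log NB probability with mean mu and variance mu + c * mu**2 (Poisson if c == 0)."""
    if c == 0:
        return poisson_loglik(k, mu)
    r = 1.0 / c
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + k * math.log(mu / (mu + r)) - r * math.log1p(c * mu))


def golden_max(f, lo: float, hi: float, iters: int = 80) -> float:
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = f(x1), f(x2)
    for _ in range(iters):
        if f1 < f2:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = f(x2)
        else:        # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = f(x1)
    return (lo + hi) / 2


def fit_poisson(counts: Sequence[int], exposure: Sequence[float]) -> Tuple[float, float]:
    """Shared-rate Poisson MLE (sum k / sum Len*TL) and its log-likelihood over loci."""
    if sum(counts) == 0:
        raise ValueError("at least one microinversion is needed to fit MI")
    mi = sum(counts) / sum(exposure)
    return mi, sum(poisson_loglik(k, mi * e) for k, e in zip(counts, exposure))


def fit_nb(counts, exposure, c_grid: List[float]) -> Tuple[float, float, float]:
    """For each c on a grid, maximize the NB log-likelihood over log(MI); keep the best."""
    log_mi0 = math.log(fit_poisson(counts, exposure)[0])
    best = (float("nan"), float("nan"), -math.inf)
    for c in c_grid:
        def ll(log_mi: float) -> float:
            mi = math.exp(log_mi)
            return sum(nb_loglik(k, mi * e, c) for k, e in zip(counts, exposure))
        log_mi = golden_max(ll, log_mi0 - 5, log_mi0 + 5)
        value = ll(log_mi)
        if value > best[2]:
            best = (math.exp(log_mi), c, value)
    return best


def likelihood_ratio_test(counts, exposure, c_grid):
    """LRT statistic 2*(lnL_NB - lnL_Poisson) and its naive chi-square (1 df) p-value."""
    _, ll_p = fit_poisson(counts, exposure)
    _, _, ll_nb = fit_nb(counts, exposure, c_grid)
    stat = max(0.0, 2 * (ll_nb - ll_p))
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical per-locus counts and exposures (Len x TL, in Mb*MY); not study data.
    counts = [1, 2, 0, 14, 9, 1]
    exposure = [50.0] * 6
    grid = [i * 0.02 for i in range(251)]  # c from 0 to 5
    print(fit_poisson(counts, exposure))
    print(likelihood_ratio_test(counts, exposure, grid))
```

A grid over c keeps the sketch dependency-free. A numerical optimizer over both parameters would be the usual choice for real data.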
The likelihood given multiple loci is simply the product of the likelihoods for individual loci. The NB model is similar, but it adds a non-negative variance inflation parameter (c) to the other variables used in equation 1, as shown below:

$$P(k \mid MI, Len, TL, c) = \frac{(MI \cdot Len \cdot TL)^{k}}{k!} \cdot \frac{\Gamma(1/c + k)}{\Gamma(1/c)\,(MI \cdot Len \cdot TL + 1/c)^{k}} \cdot \left(1 + \frac{MI \cdot Len \cdot TL}{1/c}\right)^{-1/c} \qquad (2)$$

The likelihood ratio test is straightforward because the NB and Poisson models differ by a single parameter (equation 2 reduces to equation 1 in the limit c → 0). This allows us to compare the null hypothesis of equal microinversion rates at different loci to the alternative hypothesis of variable microinversion rates at different loci.

Power analyses

The question of how many data are necessary to yield a satisfactory estimate of a phylogeny has been examined in at least two distinct ways (reviewed by Spinks et al. 2009). The first set of approaches is based upon empirical data. These approaches range from simple brute force (i.e., continuing data collection until a specified degree of precision or accuracy is achieved) to extrapolation from existing data matrices (e.g., the "pseudo-bootstrap" method; DeFilippis & Moore 2000). The small number of microinversions available for phylogenetic studies makes empirical approaches difficult. The other set of approaches uses evolutionary models to predict the amount of data necessary to achieve a specific degree of precision or accuracy. These predictions can use either analytical approaches (e.g., Walsh et al. 1999; Braun & Kimball 2001) or simulations (e.g., Chojnowski et al. 2008; Spinks et al. 2009).
This approach is also difficult for microinversions because of the limited information regarding the appropriate model to use (see main text). However, it is possible to place some limits on the amount of data one would need to collect using the simple power analysis of Braun & Kimball (2001). Braun & Kimball (2001) pointed out that the probability of obtaining at least one synapomorphy along a short internal branch is straightforward to calculate using equation 1 (above), provided that the relevant characters accumulate according to a Poisson model of evolution. A similar calculation is possible for the probability of finding some minimum number of genomic changes (e.g., three or more) on a branch of specified length. In this case, the complement of the probability of finding fewer changes is used (i.e., one minus the sum of the probabilities of zero, one, or two changes). This approach was used to establish the amount of sequence data needed to be 95% certain that the specified number of microinversions could be identified. The Braun & Kimball (2001) power analysis treats internal branch lengths in a phylogenetic tree as the effect size. This method assumes that homoplasy is rare and that the simplest evolutionary model (a Poisson process) provides an adequate description of the evolutionary process. It focuses on establishing the sequence length necessary to have a specific probability (e.g., 95%) of finding at least one genomic change (i.e., microinversion) that occurred on a relevant internal branch. Despite the evidence that some microinversions exhibit homoplasy, the assumption that homoplastic microinversions are relatively rare seems justified. The use of a Poisson model may seem problematic given the better fit of the NB model. However, we could not reject a Poisson model once the hotspot loci (CLTC and CLTCL1) were removed, which suggests that using the simpler model is justified.
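Under the Poisson assumption, the power calculation described above reduces to two steps. First, find the smallest expected number of events λ for which P(X ≥ m) ≥ 0.95. Then solve λ = MI × Len × (total target branch length) for Len. The Python sketch below does this by bisection. It uses ten target branches for the "≥1 informative" case and a single branch for the "≥3 on a specific branch" case. With those settings it reproduces the entries of Table S3 (e.g., ~12 Mb and ~252 Mb for MI = 0.25 and 0.1 MY branches). The function names are hypothetical, and like the Braun & Kimball (2001) approach, the sketch ignores homoplasy and hemiplasy.

```python
import math


def prob_fewer_than(m: int, lam: float) -> float:
    """Poisson P(X < m) for expected count lam (equation 1 summed over k = 0..m-1)."""
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(m))


def required_lambda(min_events: int, power: float = 0.95) -> float:
    """Smallest expected count lam with P(X >= min_events) >= power, by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if prob_fewer_than(min_events, mid) > 1 - power:
            lo = mid  # not enough power yet: a larger expected count is needed
        else:
            hi = mid
    return hi


def data_needed_mb(mi: float, branch_my: float, min_events: int = 1,
                   n_branches: int = 1, power: float = 0.95) -> float:
    """Sequence length (Mb) giving `power` probability of >= min_events inversions
    across n_branches branches, each branch_my long, at rate mi (Mb^-1 MY^-1)."""
    return required_lambda(min_events, power) / (mi * branch_my * n_branches)


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0):
        row = [data_needed_mb(mi, t, m, n)
               for mi in (0.25, 0.015) for (m, n) in ((1, 10), (3, 1))]
        print(f"{t:4.1f} MY:", " ".join(f"{x:8.2f}" for x in row))
```

Because the total target branch length enters λ linearly, requiring at least one microinversion on one specific branch rather than on any of ten branches multiplies the required length by ten.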
A reasonable set of target branches for this power analysis can be obtained by considering the relationships that were poorly supported in Hackett et al. (2008). These relationships were summarized as an incompletely resolved tree (Figure 4 in Hackett et al. 2008) that would require the addition of ten internal branches for complete resolution. Thus, a reasonable lower limit on the amount of non-coding sequence data necessary to provide useful information regarding these branches can be obtained by calculating the probability of finding at least one inversion along any one of those (currently unresolved) branches. Estimates of the amounts of sequence data necessary, given different assumptions regarding the lengths of the internal branches, are provided in Table S3 in the column labeled "≥1 informative". To obtain the amount of data needed for the same probability of finding at least one microinversion along a specific short branch, multiply the sequence lengths in Table S3 by ten. Hemiplasy is also a problem for analyses of microinversions, especially for the shortest branches presented in Table S3. Thus, it is desirable to obtain multiple microinversions along specific branches before drawing firm conclusions regarding relationships. For this reason, the amount of data necessary to have a 95% probability of obtaining at least three microinversions along a specific short branch is also presented. Estimating the exact amounts of data necessary under these conditions is relatively complex. The answer depends upon both the model of microinversion accumulation and the demographic factors that determine the probability of hemiplasy. Thus, we view the estimates in Table S3, which can be as large as 20% of avian genomes, as approximate guidelines.

Supplementary references

Supplementary references are listed alphabetically, and references that were also used in the main text are indicated with an asterisk.

*Bonilla et al. 2010: see main text.
*Braun & Kimball 2001: see main text.
Brown JW, Payne RB, Mindell DP: Nuclear DNA does not reconcile 'rocks' and 'clocks' in Neoaves: a comment on Ericson et al. Biol Lett 2007, 3:257-259.
*Brown et al. 2008: see main text.
Chaisson et al. 2006: see main text.
*Chojnowski et al. 2008: see main text.
Clarke JA, Tambussi CP, Noriega JI, Erickson GM, Ketcham RA: Definitive fossil evidence for the extant avian radiation in the Cretaceous. Nature 2005, 433:305-308.
Cooper et al. 2003: see main text.
Crowe TM, Bowie RCK, Bloomer P, Mandiwana TG, Hedderson TAJ, Randi E, Pereira SL, Wakeling J: Phylogenetics, biogeography and classification of, and character evolution in, gamebirds (Aves: Galliformes): effects of character exclusion, data partitioning and missing data. Cladistics 2006, 22:495-532.
DeFilippis VR, Moore WS: Resolution of phylogenetic relationships among recently evolved species as a function of amount of DNA sequence: an empirical study based on woodpeckers (Aves: Picidae). Mol Phylogenet Evol 2000, 16:143-160.
Ericson PGP, Anderson CL, Mayr G: Hangin' onto our rocks 'n clocks: a reply to Brown et al. Biol Lett 2007, 3:260-261.
Gaut BS: Molecular clocks and nucleotide substitution rates in plants. In Evolutionary Biology. Volume 30. Edited by Hecht MK, MacIntyre RJ, Clegg MT. New York: Plenum Press; 1998:93-120.
*Hackett et al. 2008: see main text.
*Han et al. 2011: see main text.
*Kim & Lee 2005: see main text.
Lunter G: Probabilistic whole-genome alignments reveal high indel rates in the human and mouse genomes. Bioinformatics 2007, 23:i289-i296.
Murphy WJ, Pringle TH, Crider TA, Springer MS, Miller W: Using genomic data to unravel the root of the placental mammal phylogeny. Genome Res 2007, 17:413-421.
Penny D, McComish BJ, Charleston MD, Hendy MD: Mathematical elegance with biochemical realism: the covarion model of molecular evolution. J Mol Evol 2001, 53:711-723.
Sanderson MJ: A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol Biol Evol 1997, 14:1218-1231.
Simmons MP, Ochoterena H: Gaps as characters in sequence-based phylogenetic analysis. Syst Biol 2000, 49:369-381.
Spinks PQ, Thomson RC, Lovely GA, Shaffer HB: Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from emydid turtles. BMC Evol Biol 2009, 9:56.
*Stamatakis 2006: see main text.
Walsh HE, Kidd MG, Moum T, Friesen VL: Polytomies and the power of phylogenetic inference. Evolution 1999, 53:932-937.
Webster MT, Smith NGC, Lercher MJ, Ellegren H: Gene expression, synteny, and local similarity in human noncoding mutation rates. Mol Biol Evol 2004, 21:1820-1830.
Zuker M: Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 2003, 31:3406-3415.
[Figure S1 graphic: (a) cladogram of Taxa A to K showing a microinversion uniting taxa H and I, with Group 1 and Group 2 labeled; (b) matrix of pairwise complementary strand alignments among taxa A to K, scored from '+' to '++++'.]

Figure S1. Microinversion search strategy using complementary strand alignments. Hypothetical example of the microinversion search strategy we used. (a) Cladogram showing a microinversion uniting taxa H and I. Informative microinversions will define two groups of taxa: the outgroup taxa (group 1) and members of the clade united by the microinversion (group 2). (b) Matrix showing the taxa with the potential to exhibit detectable pairwise complementary strand alignments. Divergence among taxa due to subsequent point mutations and indels is expected to erode sequence similarity in the inverted region. The expected "strength" of the complementary strand alignments is indicated using qualitative values ('+' to '++++'). These values are included to provide an approximate guideline. Specific mutations (e.g., deletions) have the potential to render some microinversions undetectable in closely related taxa, even when they remain detectable when more distantly related taxa are compared.
Figure S2. Multiple sequence alignment showing the homoplastic microinversions in CLTC intron 6. Inverted sequences are outlined in white. Phylogenetic relationships based upon Hackett et al. (2008) are shown to the left of the alignment. Well-supported branches (defined as those with ML bootstrap support ≥80% in the Hackett et al. (2008) study) are indicated with stars below the relevant branch. Following Figure 3 in the main text, inversion events are indicated on the phylogeny using green lines. Support for the branches separating the taxa with microinversions in the CLTC gene tree is presented below in Figure S4b.
Figure S3. Chronogram based upon the Hackett et al. (2008) phylogeny. An ultrametric version of the avian phylogeny presented as Figure 3 in Hackett et al. (2008), with branch lengths that reflect non-parametric rate smoothing (Sanderson 1997) of the ML branch length estimates. The time scale was calibrated by assuming that the basal split in Neoaves occurred 100 MYA. This figure is divided into two parts: (a) Paleognathae, Galloanserae, and part of Neoaves; and (b) the remainder of Neoaves. This chronogram is available on the Early Bird website (http://www.biology.ufl.edu/earlybird) in nexus format or upon request from ELB.
Figure S3 (continued). Chronogram based upon the Hackett et al. (2008) phylogeny. (b) Approximate divergence times for a subset of Neoaves (shorebirds [Charadriiformes] and the Hackett et al. (2008) landbird clade). See Table S1 for taxonomic details; additional information about this chronogram is presented in the Figure S3a legend (above).
Figure S4. The CLTC gene tree indicates that the homoplastic microinversions are unlikely to reflect hemiplasy. Estimate of the CLTC gene tree (ln L = −75153.652482) obtained using the GTR+Γ model in RAxML (Stamatakis 2006) with 25 randomized starting-tree searches. (a) Phylogram showing relative branch lengths, with the homoplastic microinversions indicated in green (CLTC intron 6) and red (CLTC intron 7). The names of larger clades are presented in color if all members have the inverted form of the sequence; otherwise they are presented in black. The character state for the deepest-branching member of Passeriformes (Acanthisitta chloris) is unclear because the relevant region of CLTC intron 7 is covered by a large deletion. For simplicity, the inversion event is presented on the branch uniting all other Passeriformes. Note that this gene tree requires five inversion events in CLTC intron 7 instead of the four events required on the Hackett et al. (2008) topology (see Figure 3). (b) Cladogram showing ML bootstrap support for groups in the CLTC gene tree (below).
Figure S4 (continued). The CLTC gene tree indicates that the homoplastic microinversions are unlikely to reflect hemiplasy. (b) ML bootstrap consensus tree for CLTC presented as a cladogram. Bootstrap support (as a percentage of 500 replicates) is presented alongside branches, with values <50% omitted. Microinversions are labeled on this tree in the same way as in Figure S4a (above).
Figure S5. The homoplastic microinversions in CLTC do not appear to be associated with conserved stem-loop structures. (a) Expected position of an inverted sequence (highlighted) under the stem-loop model (e.g., Kim & Lee 2005), which predicts that homoplastic microinversions are associated with conserved stem-loop structures. DNA mfold (http://mfold.bioinfo.rpi.edu/cgi-bin/dna-form1.cgi; Zuker 2003) did not reveal stem-loop structures in CLTC intron 6 of the (b) Great potoo (Nyctibius grandis), (c) Common loon (Gavia immer), or (d) Osprey (Pandion haliaetus). The structures presented here emphasize both the absence of conserved folding structures and the lack of correspondence between any possible stem-loops and the inversions in N. grandis, Gavia, and Pandion. Alternative folding structures predicted by mfold showed a similar lack of correspondence with the position of the inversion. Similar results were obtained using the other homoplastic microinversion and the overlapping microinversions (data not shown).
[Figure S6 graphic: (a) Type I (detectable): homoplastic or overlapping inversions in separate clades; the taxa form two groups (Group 1: taxa A, E, F, G, J, and K; Group 2: taxa B, C, D, H, and I). (b) Type II (detectable): homoplastic or overlapping inversions in a single clade; the taxa form two groups (Group 1: taxa A, H, I, J, and K; Group 2: taxa B, C, D, E, F, and G). (c) Type III (difficult or impossible to detect): homoplastic or overlapping inversions on a single branch. (d) Hypothetical sequence of events (type III; ancestors anc1 to anc3) and the inferred alignment of taxa G and H.]

Figure S6. Homoplastic and overlapping microinversions. Homoplastic and overlapping microinversions can be divided into three categories, two of which can be detected using the strategy outlined in the Methods. Both type I (a) and type II (b) microinversions divide the taxa into at least two groups (similar to the groups in Figure S1). There may be three groups if the overlapping region is short. INV 13 (Table S2) is the one exception: its most parsimonious reconstruction for Passeriformes, Psittaciformes, Falconidae, and Cariamidae conforms to a type II distribution. All other homoplastic and overlapping microinversions that we identified had a type I distribution. (c) Type III microinversions are undetectable unless the overlap is very short, although they are expected to be rare given the low estimates of MI. (d) Example showing the impact that overlapping microinversions might have upon sequence alignment. Inversions are indicated in this example by shading the affected nucleotides (colors indicate the nucleotides inverted during each of the events indicated in part c). This could render the microinversion undetectable even in the absence of point mutations and indels; additional mutations will further complicate the detection of these microinversions. Type II microinversions may appear to be type III microinversions when taxon sampling is sparse.
Table S1. Taxa used for this analysis and accession numbers for novel sequences. This table is presented as an Excel spreadsheet (Additional File 1).

Table S2. Microinversions identified by this study. A complete list of the microinversions identified is presented below.

Table S3. Power of microinversions to define short internal branches. Amount of sequence data needed (in Mb)[a].

Branch length[c] (MY) | MI = 0.25[b], ≥1 informative[d] | MI = 0.25[b], ≥3 on a specific branch[e] | MI = 0.015[b], ≥1 informative[d] | MI = 0.015[b], ≥3 on a specific branch[e]
0.1 | 12 | 252 | 200 | [f]
0.2 | 6 | 126 | 100 | [f]
0.3 | 4 | 84 | 67 | [f]
0.4 | 3 | 63 | 50 | [f]
0.5 | 2.4 | 50 | 40 | 839
0.6 | 2 | 42 | 33 | 700
0.7 | 1.7 | 36 | 29 | 600
0.8 | 1.5 | 31 | 25 | 525
0.9 | 1.3 | 28 | 22 | 466
1 | 1.2 | 25 | 20 | 420
1.25 | 0.96 | 20 | 16 | 336
1.5 | 0.8 | 17 | 13 | 280
1.75 | 0.68 | 14 | 11 | 240
2 | 0.6 | 13 | 10 | 210

[a] This table presents the amount of sequence data needed to be 95% certain that the listed number of microinversions will be found on an internal branch.
[b] The microinversion rate (MI), presented as inversions Mb⁻¹ MY⁻¹. Estimates of MI used are 0.25 (from this study) and 0.015 (from Chaisson et al. 2006). The lower Chaisson et al. (2006) rate appears to reflect the failure to identify very short inversions. It should therefore be viewed as an estimate of the rate for longer microinversions (those longer than ~30 bp, based upon the Ma et al. (2006) size spectrum).
[c] Length (in millions of years) assumed for the short branches in the avian tree of life.
[d] Amount of sequence data needed to have a 95% probability of finding at least one parsimony-informative microinversion on any of the ten short internal branches in the avian tree of life. The amount of data needed to identify at least one microinversion on a specific short branch is 10-fold greater.
[e] Amount of sequence data needed to have a 95% probability of finding at least three parsimony-informative microinversions on a specific short internal branch in the avian tree of life.
[f] The amount of sequence data necessary to meet this criterion exceeds the size of some avian genomes.
Table S2 Microinversions identified by this study.
INV # | Locus | Intron | Length (bp) [a] | Inverted sequence found in | Identified by bl2seq | Identified "by eye" | Notes
ALDOB (total of 4 sites with inversions)
1 | ALDOB | 3 | 10 | Vidua | no | yes | Sequence is relatively divergent.
2 | ALDOB | 5 | 6 | Psittaciformes | no | yes |
3 | ALDOB | 6 | 11 | Podiceps | no | yes |
4 | ALDOB | 7 | 6 | Smithornis | no | yes |
CLTC (total of 14 sites [b] with inversions)
5 | CLTC | 6 | 19-22 | Capito, Megalaima, Indicator, Dryocopus | yes | no |
6 | CLTC | 6 | 19 | Phaethontidae | yes | no |
7 | CLTC | 6 | 38 | Anseranas | yes | yes |
8 | CLTC | 6 | 24 | Chauna | yes | no | Overlaps with INV 9.
9 | CLTC | 6 | 29 | Phaethontidae | yes | no | Overlaps with INV 8.
10 | CLTC | 6 | 37 | Pelecanus | yes | yes |
11 | CLTC | 6 | 28 | Nyctibius grandis, Gavia, Pandion | yes | yes | Homoplastic [b] (see text).
12 | CLTC | 7 | 24 | Coliiformes | yes | no |
13 | CLTC | 7 | 19-29 | Crax, Anhinga, Cariama, Falconidae, Passeriformes | yes | yes | Homoplastic [b] (see text) and overlaps with INV 14 and INV 15. Character state unknown in Acanthisitta (Passeriformes) due to a large deletion in this region.
14 | CLTC | 7 | 29-30 | Crypturellus, Tinamus | yes | yes | Overlaps with INV 13 and INV 15.
15 | CLTC | 7 | 34 | Oceanites | yes | yes | Overlaps with INV 13 and INV 14.
16 | CLTC | 7 | 27-39 | Galloanserae (Galliformes and Anseriformes) | yes | no |
17 | CLTC | 7 | 30-31 | Megapodidae (Alectura, Megapodius) | yes | no |
18 | CLTC | 7 | 20-28 | Passeriformes | yes | no |
CLTCL1 (total of 5 sites with inversions)
19 | CLTCL1 | 7 | 20 | Upupa | yes | no |
20 | CLTCL1 | 7 | 28 | Chalcopsitta | yes | no |
21 | CLTCL1 | 7 | 22 | Otididae (Choriotis, Eupodotis) | yes | no | Overlaps with INV 22.
22 | CLTCL1 | 7 | 16 | Crypturellus | yes | no | Overlaps with INV 21.
23 | CLTCL1 | 7 | 16-22 | Palaeognathae or Neognathae | yes | no | Derived form of inversion unclear.
EEF2 (total of 6 sites with inversions)
24 | EEF2 | 5 | 18 | Syrrhaptes | yes | no |
25 | EEF2 | 6 | 21 | Urocolius | yes | no |
26 | EEF2 | 6 | 15 | Climacteris | yes | no |
27 | EEF2 | 7 | 14 | Pandion | yes | no |
28 | EEF2 | 8 | 17 | Jacana | yes | no | Overlaps with INV 29.
29 | EEF2 | 8 | 23 | Haematopus | yes | no | Overlaps with INV 28.
FGB (total of 4 sites with inversions)
30 | FGB | 5 | 16 | Charadrius, Haematopus, Phegornis | yes | yes |
31 | FGB | 5 | 5 | Sulae (Anhinga, Morus, Phalacrocorax) | no | yes |
32 | FGB | 6 | 32 | Coturnix | yes | yes |
33 | FGB | 7 | 20 | Gallus | yes | yes |
GH1 (total of 3 sites with inversions)
34 | GH1 | 3 | 9-12 | Phasianidae (Coturnix, Gallus, Numida, Rollulus) | yes | yes |
35 | GH1 | 3 | 36 | Sapayoa | yes | no |
36 | GH1 | 3 | 13 | Turnix | yes | no |
HMGN2 (total of 4 sites with inversions)
37 | HMGN2 | 2 | 25 | Eurypyga, Rhynochetos | yes | no |
38 | HMGN2 | 4 | 18 | Struthio | yes | no |
39 | HMGN2 | 4 | 41 | Mesitornis | yes | no |
40 | HMGN2 | 5 | 28 | Arenaria | yes | yes |
IRF2 (total of 2 sites with inversions)
41 | IRF2 | 2 | 14 | Chalcopsitta | yes | no | Overlaps with INV 42.
42 | IRF2 | 2 | 28 | Upupa | yes | no | Overlaps with INV 41.
PCBD1 (total of 5 sites with inversions)
43 | PCBD1 | 2 | 14 | Tinamus | yes | no |
44 | PCBD1 | 3 | 38 | Rhea | yes | no |
45 | PCBD1 | 3 | 24 | Chauna | yes | no |
46 | PCBD1 | 3 | 18 | Phegornis | yes | no |
47 | PCBD1 | 3 | 25 | Tockus | yes | no |
TGFB2 (single site with an inversion)
48 | TGFB2 | 5 | 10 | Colius | yes | yes |
TPM1 (single site with an inversion)
49 | TPM1 | 6 | 24 | Trogon | yes | yes |
[a] The range of observed lengths for the inverted sequence is reported; this value is not an estimate of the ancestral length of the microinversion.
[b] Microinversions were considered to have occurred at the same site only if both endpoints of the inversion appeared identical based upon the length of the relevant complementary strand alignments.
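The Notes column separates inversions at the same site from overlapping ones, using the rule in footnote b. The sketch below shows one way to express that rule. It assumes each inversion has already been placed on 1-based, inclusive alignment-column endpoints. The `Inversion` type, the `relationship` helper and the example coordinates are hypothetical; none of them come from the study's own program or alignments.

```python
from typing import NamedTuple


class Inversion(NamedTuple):
    """A microinversion placed on alignment columns (1-based, inclusive).

    Hypothetical representation used only for illustration.
    """
    label: str
    start: int
    end: int


def relationship(a: Inversion, b: Inversion) -> str:
    """Classify two inversions following footnote b of Table S2.

    "same site":   both endpoints identical (one event, or a homoplastic pair
                   if the inverted forms occur in divergent lineages)
    "overlapping": the column ranges share at least one column but differ
                   in at least one endpoint (independent events)
    "distinct":    the column ranges share no column
    """
    if a.start > a.end or b.start > b.end:
        raise ValueError("start must not exceed end")
    if (a.start, a.end) == (b.start, b.end):
        return "same site"
    if a.start <= b.end and b.start <= a.end:
        return "overlapping"
    return "distinct"


if __name__ == "__main__":
    # Hypothetical column coordinates, not real Table S2 positions.
    inv_a_lineage1 = Inversion("inv_a (lineage 1)", 120, 148)
    inv_a_lineage2 = Inversion("inv_a (lineage 2)", 120, 148)
    inv_b = Inversion("inv_b", 131, 160)
    inv_c = Inversion("inv_c", 40, 63)
    print(relationship(inv_a_lineage1, inv_a_lineage2))
    print(relationship(inv_a_lineage1, inv_b))
    print(relationship(inv_c, inv_b))
```

Same-site inversions found in divergent lineages are the ones the text treats as homoplastic (INV 11 and INV 13). Overlapping inversions with different endpoints count as separate events. The source does not give a distance threshold for inversions "located near each other", so this sketch does not test for adjacency.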
RESEARCH ARTICLE Open Access

Homoplastic microinversions and the avian tree of life

Edward L Braun1*, Rebecca T Kimball1, Kin-Lan Han1, Naomi R Iuhasz-Velez2, Amber J Bonilla1, Jena L Chojnowski1, Jordan V Smith1, Rauri CK Bowie3,4, Michael J Braun5,6, Shannon J Hackett3, John Harshman3,7, Christopher J Huddleston5, Ben D Marks8, Kathleen J Miglia9, William S Moore9, Sushma Reddy3,10, Frederick H Sheldon8, Christopher C Witt8,11 and Tamaki Yuri1,5,12

Abstract
Background: Microinversions are cytologically undetectable inversions of DNA sequences that accumulate slowly in genomes. Like many other rare genomic changes (RGCs), microinversions are thought to be virtually homoplasy-free evolutionary characters, suggesting that they may be very useful for difficult phylogenetic problems such as the avian tree of life. However, few detailed surveys of these genomic rearrangements have been conducted, making it difficult to assess this hypothesis or understand the impact of microinversions upon genome evolution.
Results: We surveyed non-coding sequence data from a recent avian phylogenetic study and found substantially more microinversions than expected based upon prior information about vertebrate inversion rates, although this is likely due to underestimation of these rates in previous studies. Most microinversions were lineage-specific or united well-accepted groups. However, some homoplastic microinversions were evident among the informative characters. Hemiplasy, which reflects differences between gene trees and the species tree, did not explain the observed homoplasy. Two specific loci were microinversion hotspots, with high numbers of inversions that included both the homoplastic as well as some overlapping microinversions. Neither stem-loop structures nor detectable sequence motifs were associated with microinversions in the hotspots.
Conclusions: Microinversions can provide valuable phylogenetic information, although power analysis indicates that large amounts of sequence data will be necessary to identify enough inversions (and similar RGCs) to resolve short branches in the tree of life. Moreover, microinversions are not perfect characters and should be interpreted with caution, just as with any other character type. Independent of their use for phylogenetic analyses, microinversions are important because they have the potential to complicate alignment of non-coding sequences. Despite their low rate of accumulation, they have clearly contributed to genome evolution, suggesting that active identification of microinversions will prove useful in future phylogenomic studies.

Background
Reconstructing the evolutionary relationships among organisms and the changes in their genomes are major goals of phylogenomics [1-3]. The characteristics of genomes that have been used to reconstruct evolutionary history reflect the multitude of changes that arise due to distinct mutational mechanisms and accumulate at a variety of rates (Figure 1). The most slowly accumulating changes, collectively designated rare genomic changes (RGCs), reflect a heterogeneous set of mutational processes. RGCs include transposable element insertions (e.g., Kriegs et al.), gene order changes, and additional less-studied phenomena [6-8]. Microinversions are one of these relatively poorly-studied types of RGCs.

Despite this heterogeneity, RGCs are thought to exhibit less homoplasy (evolutionary convergence and reversals) than nucleotide substitutions. Indeed, some RGCs have been viewed as "perfect" homoplasy-free (or virtually homoplasy-free) characters. Establishing that specific types of RGCs, like microinversions, are perfect characters is important for two reasons.
*Correspondence: email@example.com
Department of Biology, University of Florida, Gainesville, FL 32611, USA
Full list of author information is available at the end of the article
Braun et al. BMC Evolutionary Biology 2011, 11:141
http://www.biomedcentral.com/1471-2148/11/141
© 2011 Braun et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
First, it would provide information about the mutational and evolutionary processes that underlie their accumulation, illuminating processes that contribute to genome evolution. Second, perfect RGCs could provide a practical means to assemble the tree of life, because phylogenetic reconstruction is straightforward when homoplasy is absent.

Even perfect RGCs can appear homoplastic when found in genomic regions with an evolutionary history incongruent with the species tree [5,10]. The appearance of homoplasy due to incomplete lineage sorting, called hemiplasy, typically occurs in trees with short internal branches [12,13]. However, rapid radiations with short internal branches ("bushes" or "biological big bangs") may be relatively common events in the tree of life [14,15]. This suggests that analyses of RGC data should consider hemiplasy explicitly.

Microinversions are defined as cytologically undetectable inversions, although in practice the size range considered depends on the type of data examined and the method used for detection. Feuk et al. classified inversions ranging in size from 23 base pairs (bp) to 62 megabases (Mb) as microinversions, whereas Ma et al. considered all inversions greater than 50 kilobases (kb) to be "large" inversions rather than microinversions. The lower limit also varies, going down to 4 bp. Not surprisingly, studies using whole genomes (e.g., [1,16]) have identified larger inversions, while phylogenetic studies (often restricted to a single locus or region of an organellar genome) have typically revealed much smaller microinversions (e.g., [17-21]). Nonetheless, the size spectra reported for genome-scale and phylogenetic studies overlap, suggesting that both types of studies include at least some inversions that result from similar biological phenomena. Using the term "microinversion" to refer to inversions that are long enough to include one or more complete genes seems inappropriate, suggesting that it should be reserved for shorter inversions.
However, this criterion may be difficult to apply in practice, since gene length varies substantially among organisms and within genomes. The majority of genes are <50 kb in length in most vertebrate lineages, suggesting that the Ma et al. size criterion may be appropriate and simple to use. Therefore, we recommend using 50 kb as the maximum size for microinversions in most vertebrate genomes, although we also note that the most appropriate size criterion is likely to depend upon the focal organism.

The hypothesis that microinversions and other RGCs are perfect characters reflects both their large state space (number of potential character states) and their slow rate of accumulation over evolutionary time, which together make independent changes to the same state unlikely. The state space for different RGCs will depend upon the details of each type of genomic change, but it seems likely that the state space for microinversions is large. Microinversions can be of a variety of lengths and can have any specific nucleotide as an endpoint, making it unlikely that independent microinversions will appear identical. Previous studies have also suggested that microinversions accumulate at a very low rate (Figure 1), although this observation may be biased by the size spectrum of the inversions that were identified and considered to be microinversions. Ma et al. reported that smaller microinversions (they identified inversions as short as 31 bp) occur more frequently than larger ones. However, the rate of accumulation for inversions that are even shorter than those identified by Ma et al. remains unclear, and these differences among previous studies make direct comparisons challenging. Nonetheless, it seems certain that microinversions accumulate at least several orders of magnitude more slowly than nucleotide substitutions. Thus, the hypothesis that microinversions are perfect characters that will be very useful for assembling the tree of life remains reasonable.

Figure 1 Approximate rates of accumulation for different genomic changes over evolutionary time. Details of the literature survey used to estimate these rates are provided in Additional file 2. The estimate of the avian microinversion rate reflects the results of this paper. Estimates of evolutionary rates for nucleotide substitutions and indels in birds appear lower than those for mammals, consistent with some previous publications, but it is important to note that substantial rate variation occurs within each group (e.g., [27,60]). As described in the text, it may be better to interpret prior estimates of the mammalian microinversion rate as the rate at which relatively long microinversions accumulate.

The mechanism(s) responsible for microinversion accumulation remain poorly characterized, making empirical tests of the "perfect character" hypothesis for these relatively poorly studied RGCs critical. Indeed, homoplastic microinversions have been identified in angiosperm chloroplast genomes [17,19], contrary to the expectation based upon the perfect character hypothesis. Most chloroplast microinversions appear to be associated with palindromic sequences that have the potential to form stem-loop structures in transcripts [17,19], and these palindromes may facilitate inversion. Indeed, Catalano et al. reported that microinversions are correlated with higher stability of the hairpins that have the potential to form at these stem-loop regions, in agreement with the hypothesis that hairpin formation facilitates inversion. Since many chloroplast stem-loop structures have regulatory functions (e.g., Stern et al.), they are typically conserved, creating the potential for recurrent inversions at specific sites. Regulatory stem-loops are present in vertebrate introns (e.g., Hugo et al.), and at least one microinversion noted in a vertebrate phylogenetic study was associated with an inverted repeat. However, conserved stem-loops appear to be uncommon in vertebrate introns, whereas chloroplast stem-loops are relatively common [22,24]. This difference is consistent with the observation that few animal microinversions appear homoplastic [6,25]. Indeed, all microinversions observed in those studies were either homoplasy-free or conflicted with short branches. Thus, the small number of animal microinversions that appear to conflict with the species tree based upon other data may result from hemiplasy rather than homoplasy. Consequently, microinversions in animal nuclear genomes remain candidates for "ideal" RGCs able to recover branches in gene trees accurately.

Microinversions can be difficult to identify, making the study of these interesting and phylogenetically useful genomic changes challenging. In fact, ~80% of the inversions identified in the Feuk et al. comparison of the human and chimpanzee genomes were later suggested to be contig assembly artifacts. This problem can be avoided by restricting the term microinversion to the shortest part of the inversion spectrum, limiting the maximum size of microinversions to less than the length of an individual sequencing read (i.e., focusing on inversions that are <400 bp for Sanger sequencing).
Comparing closely related taxa also has the potential to facilitate microinversion identification. Indeed, most microinversions identified in a comparison of four mammalian genomes were found in the two most closely related taxa. Here we use these strategies to identify microinversions in non-coding regions associated with 17 loci from 169 birds. We examined variation among loci in the microinversion rate (hereafter abbreviated λMI), identified phylogenetically informative and homoplastic microinversions, and found evidence that the number of microinversions has been underestimated in previous large-scale studies.

Methods
Sequencing, Alignment and Microinversion Identification
We primarily used published data [26-28], although some novel CLTCL1 sequences were generated using the primers and PCR conditions from Kimball et al. (for details, see Additional file 1). For this study, we focused on shorter sequences with extensive taxon sampling (Table 1) instead of complete genomic sequences [26-28]. Sequences were aligned manually, sometimes starting from an alignment produced in an automated manner (i.e., using Clustal or MAFFT). Alignments were refined iteratively with input from at least two different individuals. During this process alignments were examined carefully, which resulted in the identification of a number of microinversions "by eye" (Additional file 2, Table S2).
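The by-eye search above and the complementary strand searches described next both rest on one test. The question is whether a segment of one sequence, once reverse-complemented, matches the homologous segment of another sequence. Figure 2 shows a case: inverting the Trogon segment reproduces the Pharomachrus sequence.

The sketch below is a minimal version of that test. It assumes DNA strings over A, C, G and T, and it uses the 5 bp minimum size applied in this study. The function names and example sequences are hypothetical.

The sketch also shows why palindromic segments produce false positives. A palindrome is its own reverse complement, so it matches on the complementary strand even though nothing was inverted.

```python
MIN_INVERSION_BP = 5  # shortest microinversion considered in this study

# Complement table for uppercase DNA; other symbols (e.g. N) are left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Drop alignment gaps and normalise to uppercase."""
    return seq.replace("-", "").upper()


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase, ungapped DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindrome(segment: str) -> bool:
    """True when a segment equals its own reverse complement."""
    s = clean(segment)
    return len(s) > 0 and s == reverse_complement(s)


def looks_inverted(query: str, reference: str) -> bool:
    """True when `query` reads as the reverse complement of `reference`.

    Segments shorter than MIN_INVERSION_BP, or of unequal ungapped length,
    are rejected. Palindromes are rejected as well: they read the same on
    both strands, so a complementary-strand match carries no evidence of
    an inversion (this is a source of false positives).
    """
    q, r = clean(query), clean(reference)
    if len(q) < MIN_INVERSION_BP or len(q) != len(r):
        return False
    if is_palindrome(q):
        return False
    return reverse_complement(q) == r


if __name__ == "__main__":
    # Hypothetical segments from two sister taxa.
    print(looks_inverted("CTCAGGT", "ACCTGAG"))
    # A palindrome (EcoRI site) matches its own complementary strand.
    print(is_palindrome("GAATTC"), looks_inverted("GAATTC", "GAATTC"))
```

Real data add point substitutions and indels after the inversion, as the Results discuss. A practical check would therefore score an alignment to the reverse complement, as bl2seq and YASS do, rather than require an exact match.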
Microinversions were also identified by a computational method that combined the multiple sequence alignments with the results of complementary strand alignments for all pairs of sequences (Additional file 2, Figure S1). The pairwise complementary strand alignments were generated using bl2seq and YASS and mapped onto the multiple sequence alignments using a program written by ELB. This program saved a table that included the first and last positions of each pairwise complementary strand alignment in the multiple sequence alignment and highlighted the overlapping pairwise complementary strand alignments (an example is presented in Additional File 3, along with a description of the algorithm in pseudocode). Microinversions are expected to result in complementary strand alignments that either overlap or are located near each other in the sequence alignment. Each position identified as a significant complementary strand hit involving sequences that overlapped or were located near each other in the multiple sequence alignment was then inspected visually to validate the presence or absence of a microinversion. Microinversion endpoints were assigned based upon the length of the complementary strand alignments, although there were
some cases where inversion endpoints were difficult to identify (e.g., Figure 2). Validating microinversions shorter than 5 bp was difficult, so that was the minimum size considered. The DNA mfold server (http://mfold.bioinfo.rpi.edu/cgi-bin/dna-form1.cgi) was used to search for stem-loop structures, and the MEME server (http://meme.sdsc.edu/meme4_4_0/intro.html) was used to search for sequence motifs that might be associated with inversions.

Patterns and Rates of Microinversion Evolution
Microinversions were coded as binary characters, and PAUP* 4.0b10 was used to calculate numbers of inversion events using maximum parsimony (MP) and the Hackett et al. topology. λMI was expressed as microinversions Mb^-1 MY^-1 to facilitate comparison to other studies. The null hypothesis of equal genome-wide microinversion rates was tested as described by Han et al. Briefly, a global Poisson model (which assumes equal genome-wide rates) was used as the null hypothesis, and the fit of that null model was compared to that of the more general negative binomial (NB) model (which permits variation in λMI) using a likelihood ratio test (LRT). See Additional file 2 for details.

Phylogenetic Analyses
Phylogenetic analyses of the CLTC alignment, conducted to provide an estimate of the CLTC gene tree, used RAxML 7.0.4. Microinversions and sites with gaps and/or missing data in more than 50% of taxa were excluded before conducting the RAxML search. See Additional file 2 for details.

Figure 2 Example of a microinversion. (a) A conserved region in TPM1 intron 6 with a 24 bp microinversion (outlined in white) in Trogon personatus. (b) Inverting the Trogon sequence (indicated in lower-case) results in a sequence identical to Pharomachrus auriceps, its sister taxon in the tree.

Table 1 Estimates of the microinversion rate (λMI) for different loci
Locus | Chr [a] | Mean non-coding length (bp) | Tree length (MY) [b] | # of inversions [c] | Estimated rate λMI (inversions Mb^-1 MY^-1)
CLTCL1 | 15 | 360 | 8890 | 5 | 1.58
CLTC | 19 | 1310 | 9280 | 19 | 1.56
PCBD1 | 6 | 800 | 9150 | 5 | 0.68
HMGN2 | 23 | 1340 | 5400 | 4 | 0.55
EEF2 | 28 | 1210 | 9230 | 6 | 0.54
IRF2 | 4 | 600 | 9090 | 2 | 0.37
GH1 | 27 | 1030 | 9090 | 3 | 0.32
ALDOB | Z | 1450 | 8850 | 4 | 0.31
TPM1 | 10 | 450 | 8090 | 1 | 0.28
FGB | 4 | 2070 | 9360 | 4 | 0.21
TGFB2 | 3 | 560 | 9360 | 1 | 0.19
CRYAA | 1 | 930 | 8740 | 0 | 0
EGR1 | 13 | 490 [d] | 8970 | 0 | 0
MB | 1 | 680 | 9190 | 0 | 0
MUSK | Z | 510 | 8810 | 0 | 0
MYC | 2 | 620 [d] | 9240 | 0 | 0
RHO | 12 | 1190 | 8990 | 0 | 0
Overall | | 15600 | | 54 | 0.39
Excluding hotspots [e] | | 13930 | | 30 | 0.25
[a] Chromosomal location in the chicken (Gallus gallus).
[b] Sum of the branch lengths after rate smoothing, in millions of years (MY). Divergence times were calibrated by assuming a mid-Cretaceous (~100 MYA) origin of Neoaves. Differences among loci reflect the amounts of missing data.
[c] The number of inversion events based upon the MP criterion.
[d] The non-coding portions of two loci (EGR1 and MYC) include 820 bp of 3′ UTR. All EGR1 non-coding sequence is 3′ UTR, and about half (330 bp) of the MYC non-coding sequence is 3′ UTR.
[e] CLTC and CLTCL1 were excluded for this estimate.

Results and Discussion
Many Avian Microinversions were Identified
Manual and automated searches revealed that non-coding regions associated with 11 of the 17 loci we examined contained microinversions (e.g., Figure 2) ranging from 5 bp to 38 bp (Additional file 2, Table S2). Their median length was 22 bp. A number of the microinversions identified here were much shorter than those reported in genome-scale comparisons of mammals
[1,16], where the smallest microinversions were 23 bp and 31 bp, respectively. Although it is possible that birds and mammals have distinct microinversion size spectra, it seems more likely that the large-scale surveys of mammalian data failed to identify the shortest microinversions.

If λMI were similar in birds and mammals, fewer than four microinversions would be expected given the amount of sequence data examined; instead, microinversions were identified at 49 positions (Table 1). Ma et al. reported that short inversions are more common than long inversions. If this pattern continues as microinversions become even shorter than those they identified, the larger number of microinversions that we observed could reflect our identification of smaller inversions rather than any inherent difference between mammalian and avian genomes. The denser taxon sampling in our study, relative to whole-genome studies in mammals, is also likely to have improved microinversion identification. Taken as a whole, our results suggest that previous studies that used mammalian data [1,6] underestimated λMI.

The identification of microinversions can be difficult because point mutations and insertion-deletion events (indels) continue to accumulate after inversions. This has the potential to make ancient microinversions particularly difficult, or impossible, to identify. Denser taxon sampling can help by increasing the number of sequences closely related to those with the microinversion and by providing multiple versions of the inverted sequence (Additional file 2, Figure S1). Although the taxon sampling for this study was denser than in previous surveys that used mammalian data, computational searches for microinversions were difficult. Many complementary strand alignments were not validated as actual inversions; the false positives reflected palindromes and other phenomena. bl2seq performed better than YASS, producing fewer false positives while still identifying all of the microinversions also found by YASS. However, even after employing two computational approaches, some microinversions were only identified "by eye" (Additional file 2, Table S2), suggesting that further improvements to the methods used to identify microinversions are required.
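Two results rest on a single quantity: the megabase-million-years of sequence evolution surveyed. One is the comparison above, where fewer than four microinversions are expected at the mammalian rate but 49 positions were observed. The other is the set of rates in Table 1. For each locus, the surveyed quantity is the mean non-coding length in Mb multiplied by the tree length in MY.

The sketch below makes these calculations explicit. It assumes the rounded Table 1 values and the Chaisson et al. (2006) estimate of ~0.015 microinversions Mb^-1 MY^-1 used for Figure 1. Because the table entries are rounded, recomputed rates can differ from the published ones in the second decimal place.

```python
# Rounded Table 1 values: mean non-coding length (bp), tree length (MY), inversions.
TABLE1 = {
    "CLTCL1": (360, 8890, 5),
    "CLTC": (1310, 9280, 19),
    "PCBD1": (800, 9150, 5),
    "HMGN2": (1340, 5400, 4),
    "EEF2": (1210, 9230, 6),
    "IRF2": (600, 9090, 2),
    "GH1": (1030, 9090, 3),
    "ALDOB": (1450, 8850, 4),
    "TPM1": (450, 8090, 1),
    "FGB": (2070, 9360, 4),
    "TGFB2": (560, 9360, 1),
    "CRYAA": (930, 8740, 0),
    "EGR1": (490, 8970, 0),
    "MB": (680, 9190, 0),
    "MUSK": (510, 8810, 0),
    "MYC": (620, 9240, 0),
    "RHO": (1190, 8990, 0),
}
HOTSPOTS = {"CLTC", "CLTCL1"}
MAMMALIAN_RATE = 0.015  # Chaisson et al. (2006), microinversions Mb^-1 MY^-1


def exposure(length_bp: float, tree_length_my: float) -> float:
    """Megabase-million-years surveyed: length in Mb times summed branch length in MY."""
    return (length_bp / 1e6) * tree_length_my


def locus_rate(locus: str) -> float:
    """Inversions per Mb per MY for a single locus."""
    length_bp, tree_length_my, inversions = TABLE1[locus]
    return inversions / exposure(length_bp, tree_length_my)


def pooled_rate(loci) -> float:
    """Shared (Poisson maximum-likelihood) rate: total events / total exposure."""
    loci = list(loci)
    events = sum(TABLE1[name][2] for name in loci)
    total = sum(exposure(*TABLE1[name][:2]) for name in loci)
    return events / total


def expected_inversions(rate: float, loci) -> float:
    """Expected number of inversions across the given loci at a fixed rate."""
    return rate * sum(exposure(*TABLE1[name][:2]) for name in loci)


if __name__ == "__main__":
    for name in TABLE1:
        print(f"{name:7s} {locus_rate(name):.2f}")
    print(f"overall: {pooled_rate(TABLE1):.2f}")
    print(f"excluding hotspots: {pooled_rate(set(TABLE1) - HOTSPOTS):.2f}")
    print(f"expected at mammalian rate: {expected_inversions(MAMMALIAN_RATE, TABLE1):.1f}")
```

With these inputs, about two events would be expected across all 17 loci at the mammalian rate, well below the number actually observed.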
Most microinversions were assigned to terminal branches in the Hackett et al. phylogeny (Figure 3) when the MP criterion was used. This raises the question of whether an acquisition bias caused us to miss a number of ancient microinversions that occurred closer to the base of the tree. However, the structure of the avian tree of life is dominated by a rapid radiation at the base of Neoaves, the most speciose avian supergroup (identified in Figure 3), leading to a tree dominated by terminal branches. Indeed, 70.8% of the overall tree length in the Hackett et al. ML tree comprises terminal branches. The number of microinversions observed on terminal branches was not significantly different from expectation given the proportion of the tree that reflected internal and terminal branches (χ² = 3.0; P = 0.08). Thus, acquisition bias did not have a major impact upon our ability to identify ancient inversions.

Avian Microinversion Rates Vary Among Loci
Estimates of λMI differ among loci (Table 1). The Poisson model of microinversion accumulation (the null hypothesis) was rejected in favour of the NB model (which includes rate variation) using the LRT (2ΔlnL = 27.55; P < 10^-6). Excluding the highest-rate loci (CLTC and CLTCL1) eliminated our ability to reject the Poisson model (2ΔlnL = 2.29; P = 0.13) and reduced the λMI estimate to 0.25 microinversions Mb^-1 MY^-1 (the value presented in Figure 1; 95% confidence interval of 0.17-0.36). This suggests a "hotspot" model in which CLTC and CLTCL1 are inversion-prone. However, even the lower estimate of λMI for "non-hotspot" loci greatly exceeded previous estimates of λMI, consistent with our hypothesis that the identification of microinversions, especially the shortest inversions, has been improved relative to prior studies.
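The P values above can be recovered from the reported test statistics. Each statistic is referred to a chi-square distribution with one degree of freedom. For the LRT, this assumes that the NB model adds a single dispersion parameter to the Poisson model.

The sketch below uses only the standard library and relies on the identity P(χ²₁ > x) = erfc(√(x/2)). The function name is ours.

One caveat applies to the LRT. The Poisson model lies on the boundary of the NB parameter space, and some authors halve this P value for that reason. The plain χ²₁ reference is the one that reproduces the values reported here.

```python
import math


def chi2_sf_1df(statistic: float) -> float:
    """Upper-tail probability P(X > statistic) for a chi-square variable with 1 df.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), which is exact for one
    degree of freedom.
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    reported = {
        "LRT, all loci (2 delta lnL)": 27.55,
        "LRT, hotspots excluded (2 delta lnL)": 2.29,
        "terminal vs internal branches (chi-square)": 3.0,
    }
    for label, stat in reported.items():
        print(f"{label}: P = {chi2_sf_1df(stat):.3g}")
```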
Surprisingly, both hotspot loci encode clathrin heavy chains, which are proteins critical for endocytosis, suggesting that the high microinversion rates could reflect their functional similarities. However, these clathrin heavy chain paralogs arose by duplication early in vertebrate evolution, and the homologous introns in CLTC and CLTCL1 do not exhibit detectable sequence similarity. Although specific intronic motifs can be overrepresented in functionally related genes, motifs common to the CLTC and CLTCL1 introns were not identified (data not shown). This suggests that it will be necessary to identify additional hotspot loci to understand the basis for inversion hotspots.

Figure 3 Microinversions indicated on the Hackett et al. phylogeny. Inversions in introns are indicated with tick marks (blue for no homoplasy, green for the homoplastic inversions in CLTC intron 6, and red for the homoplastic inversions in CLTC intron 7). The 3′ UTR inversion in PCBD1, which was obtained from selected galliforms (see Results and Discussion), is indicated with a blue diamond. This mapping of character state changes assumes a reversal to the ancestral state in Psittaciformes for the CLTC intron 7 microinversion (indicated by an X over the red tick mark). An inversion in CLTCL1 where Palaeognathae and Neognathae differ is shown along the root branch. Orders united by microinversions are indicated using names above the branch uniting them and brackets to the right. The order Galliformes is emphasized because 3′ UTRs were sequenced from additional taxa in that order (see text). This phylogeny is presented as a cladogram because many internal branches are very short and this presentation makes it easier to locate the inversion events. For branch length information, refer to Figure 3 in Hackett et al. and the chronogram presented for this publication (Additional file 2, Figure S3).

Microinversions were absent in some loci (Table 1), but it is unclear whether this reflects stochastic variation or the existence of "coldspots". 3′ UTRs are coldspot candidates because they exhibit a lower rate of sequence evolution than introns [29,41] and they are known to include regulatory elements. Many of these regulatory sequences are non-palindromic [43,44] and are unlikely to remain functional after inversion. Two to three microinversions were expected in our 3′ UTR data (assuming equal rates for non-hotspot loci), but none were identified. We therefore examined the 3′ UTRs of five additional loci (ALDOB, CRYAA, EEF2, HMGN2, and PCBD1), four of which have intronic microinversions (Table 1), in 23 members of the avian order Galliformes. A 36 bp microinversion is present in the Rollulus roulroul PCBD1 3′ UTR, indicating that
these regions are not absolutely refractory to microinversions. Thus, future surveys should include 3′ UTRs to improve λMI estimates for those regions and to establish whether they exhibit among-locus rate variation similar to introns.

Homoplastic and Overlapping Microinversions Exist
Two microinversions in CLTC appeared homoplastic because the inverted forms were present in divergent lineages (e.g., Additional File 2, Figure S2). On the Hackett et al. phylogeny, these homoplastic microinversions required at least three (CLTC intron 6) or four (CLTC intron 7) changes under the MP criterion to explain the observed distribution of character states (Figure 3). Errors in the phylogeny are unlikely to explain this observation, since the relevant branches are well supported (compare Figure 3 to Figure 2 of Hackett et al.; also see Additional File 2, Figure S2). Moreover, when these microinversions were mapped on other recent estimates of avian phylogeny using the MP criterion, they required similar levels of homoplasy. These other estimates of phylogeny are based upon nuclear [26,45], mitochondrial [46-48], and morphological data [49,50], as well as expert opinion (e.g., Figure 27.10 in Cracraft et al. and Figure 5 in Mayr).

Hemiplasy is unlikely to explain the observed homoplastic microinversions for two reasons. First, hemiplasy would require maintenance of polymorphic inversions over multiple, long internal branches (estimates of branch lengths are presented as a chronogram in Additional File 2, Figure S3). Second, the estimate of the CLTC gene tree was not consistent with the microinversion distribution (Additional file 2, Figure S4), even in the single case in which branch lengths are short enough that hemiplasy is plausible. Thus, the CLTC inversions reflect genuine homoplasy, not hemiplasy, a novel finding for microinversions in animal nuclear genomes.
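Counting the minimum number of changes that a binary microinversion character requires on a fixed tree is a small-parsimony problem. On the Hackett et al. phylogeny, the answer is three for the CLTC intron 6 inversion and four for the CLTC intron 7 inversion.

The study used PAUP* for this count. The sketch below substitutes a plain implementation of Fitch's algorithm for a rooted binary tree written as nested tuples. Unknown states, such as Acanthisitta for INV 13, are treated as compatible with either state. The tree, taxon names and character states in the example are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple, Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

UNKNOWN = frozenset({0, 1})  # missing data is compatible with either state


def fitch(tree: Tree, states: Dict[str, Optional[int]]) -> Tuple[FrozenSet[int], int]:
    """Fitch small-parsimony pass for one binary character.

    `states` maps each taxon to 0 (uninverted), 1 (inverted) or None (unknown).
    Returns the candidate state set at the root of `tree` and the minimum
    number of changes required within it.
    """
    if isinstance(tree, str):
        state = states[tree]
        return (UNKNOWN if state is None else frozenset({state})), 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Disjoint child sets: take the union and charge one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_changes(tree: Tree, states: Dict[str, Optional[int]]) -> int:
    """Minimum number of state changes for the character on `tree`."""
    return fitch(tree, states)[1]


if __name__ == "__main__":
    # Hypothetical tree and character: the inverted state occurs in two
    # non-sister taxa, so two changes are needed; F has an unknown state.
    tree = ((("A", "B"), "C"), (("D", "E"), "F"))
    states = {"A": 1, "B": 0, "C": 0, "D": 1, "E": 0, "F": None}
    print(parsimony_changes(tree, states))
```

A character whose inverted state unites a single clade needs only one change. Any count above one on a well-supported tree is the kind of homoplasy described above.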
In addition to the homoplastic microinversions in CLTC, we also found several overlapping microinversions (Additional file 2, Table S2). All of these overlapping microinversions reflected independent inversions in distinct lineages. We identified two overlapping microinversions in CLTC and one in CLTCL1; the two overlapping microinversions in CLTC (INV-14 and INV-15; see Additional file 2, Table S2) also overlapped with one of the homoplastic microinversions in CLTC (INV-13). Thus, there were at least 12 inversion events in four specific regions of the two hotspot loci. There were also two additional overlapping inversions in low-rate loci (EEF2 and IRF2). Neither the homoplastic nor the overlapping microinversions were associated with stem-loop motifs (e.g., Additional file 2, Figure S4) or with any other motifs that could be identified using MEME. These homoplastic and overlapping microinversions indicate that the actual state space for microinversions is likely to be smaller than their potential state space.

Are Microinversions Useful for Phylogenetics?
Although the existence of homoplastic microinversions demonstrates that they are not perfect characters, they still have the potential to be useful phylogenetic markers. The retention index of microinversions (RI_MI = 0.949) given the Hackett et al. tree is substantially higher than the retention index for nucleotide changes (RI_intron = 0.52, RI_coding-exon = 0.54, RI_UTR = 0.58). Such a low amount of homoplasy suggests that an appropriate analytical approach (one that accommodates homoplasy and hemiplasy) should yield an accurate species tree given a sufficient number of inversions.
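The power analysis reported in the next paragraph (Additional file 2, Table S3) asks how much sequence would supply a sufficient number of inversions on very short branches.

The sketch below assumes that microinversions on a branch follow a Poisson process with mean λMI × L × t. Here λMI is in Mb^-1 MY^-1, L is in Mb per taxon, and t is in MY. It also assumes a 95% probability target for observing at least one inversion on a specified branch. That target is our assumption: with the non-hotspot estimate λMI = 0.25 and t = 1 MY, it reproduces the ~12 Mbp per taxon quoted below. The function names are hypothetical.

```python
import math

NON_HOTSPOT_RATE = 0.25  # lambda_MI excluding CLTC and CLTCL1 (Mb^-1 MY^-1)


def prob_at_least_one(rate: float, length_mb: float, branch_my: float) -> float:
    """P(>= 1 microinversion on a branch) when events are Poisson with mean rate*L*t."""
    return 1.0 - math.exp(-rate * length_mb * branch_my)


def required_length_mb(rate: float, branch_my: float, target: float = 0.95) -> float:
    """Smallest sequence length (Mb per taxon) giving P(>= 1 inversion) >= target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    if rate <= 0 or branch_my <= 0:
        raise ValueError("rate and branch length must be positive")
    return -math.log(1.0 - target) / (rate * branch_my)


if __name__ == "__main__":
    # Point estimate and the 95% confidence limits for the non-hotspot rate.
    for rate in (0.17, NON_HOTSPOT_RATE, 0.36):
        print(f"rate {rate:.2f}: {required_length_mb(rate, 1.0):.1f} Mb per taxon")
```

Only the ~12 Mbp figure is reproduced here. The ~1.2 Mbp figure, for finding at least one informative inversion anywhere in the tree, depends on details of Table S3 that are not given in this section.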
Branches at the base of Neoaves are very short, and this radiation is a classic example of a "bush" phylogeny. In fact, the base of Neoaves has even been suggested to be a "hard" polytomy. Hard polytomies reflect genuine multiple speciation events, so they cannot be represented as bifurcating trees. Even if Neoaves is a "soft" polytomy, many branches are likely to be <1 MY in length (Additional File 2, Figure S3; also see [26,45]). The low estimates of λMI imply that microinversions will seldom occur along these short branches. How much sequence data would be necessary to resolve internodes of this length using microinversions? Power analysis assuming 1 MY branch lengths and the rate estimate that excludes the hotspot loci indicates that ~1.2 Mbp of non-coding sequence per taxon is needed to find at least one informative inversion, and ~12 Mbp per taxon to identify an inversion on a specific branch (Additional file 2, Table S3). These estimates are orders of magnitude larger than the amount needed for conventional analyses of sequence data (cf. Chojnowski et al.). Moreover, it is desirable to identify multiple informative inversions along internodes given the potential for hemiplasy and homoplasy. This suggests that using microinversions as the sole source of information to estimate a phylogeny similar to the avian tree of life would require even more data (Additional file 2, Table S3).

Microinversions and Multiple Sequence Alignment
The identification of microinversions is also important to ensure correct sequence alignment. Otherwise, estimates of the amount of evolutionary change will be distorted, potentially resulting in incorrect phylogenetic estimation. Algorithms for sequence alignment that include the possibility of inversions have been proposed [55-57], and they have the potential advantage of incorporating explicit penalties for inversion events. However, the optimal inversion penalty to limit false positives may
be difficult to determine, and the available algorithms are limited to the identification of non-overlapping microinversions. Overlapping microinversions were found at four of the loci that we examined, suggesting that the inability to identify overlapping inversions may represent a major limitation. Overlapping and homoplastic microinversions can be divided into three basic categories (Additional file 2, Figure S6), and the strategy we employed should be able to detect two of these categories efficiently. The third category (type III in Additional file 2, Figure S6, which corresponds to the case of multiple homoplastic or overlapping inversion events on a single branch) is expected to be rare. It may be possible to overcome this problem in a multiple sequence alignment framework using a divide-and-conquer approach, selecting subsets of taxa for which overlapping microinversions are less likely to be present. This would necessitate a subsequent assembly of the alignments. Moreover, such an approach might eliminate the benefits of dense taxon sampling. Despite these limitations, fully automated approaches could be less labour intensive than our approach. However, it is unclear whether microinversion identification can be fully automated, since our results suggest that short microinversions may always require manual validation. Taken as a whole, these issues further emphasize the need to continue to improve algorithms for the detection and alignment of these interesting genomic changes.

Conclusions
These analyses demonstrate that the identification of microinversions is important, despite the relatively low rate of accumulation of these genomic changes. This study revealed that microinversions accumulate more rapidly in avian genomes than expected based upon prior analyses of mammalian genomes, although this difference is likely to reflect the failure to identify very short inversions in the large-scale comparisons of mammalian data. If this failure to identify short microinversions does explain the differences between this and previous studies, the estimates of λMI presented here, which are similar to the rate of accumulation of the most common type of avian TE insertion (Figure 1), may be more typical of vertebrate genomes. The possibility that typical vertebrate
λMI values may be higher than suggested by previous studies emphasizes the importance of understanding the impact of microinversions upon genome evolution. We also documented the existence of microinversion hotspots, suggesting that some regions of the genome are especially prone to these mutations. The identification of additional hotspots may provide information about the mechanistic basis of these mutations. Indeed, we were able to exclude one proposed mechanism, the existence of conserved stem-loops, based upon an examination of the inversion hotspots identified here. Despite our observation that microinversions can exhibit homoplasy, they are still relatively reliable RGCs and as such may define gene tree bipartitions more accurately than conventional sequence data (see Nishihara et al.). In the future, analytical methods that integrate microinversions with sequence data and information about other RGCs (and incorporate the potential for both hemiplasy and homoplasy) will facilitate robust resolution of difficult nodes in the tree of life and provide additional insights into the mechanism(s) responsible for their accumulation over evolutionary time.

Additional material
Additional file 1: Taxon list. List of the taxa used for this analysis and the accession numbers for the novel CLTCL1 sequences collected for this study, in Microsoft Excel format.
Additional file 2: Supplementary information. Six figures, three tables, and supplementary methods (including the details of the literature survey used to estimate the rates of various types of genomic changes and the power analysis described in the main text), in pdf format.
Additional file 3: Details of a microinversion search. An example of a microinversion search (of TPM1 intron 6) is presented along with a description of the search algorithm using pseudocode, in Microsoft Excel format.
Acknowledgements
We thank Clare Rittschof and members of the Kimball-Braun lab group for helpful comments on this manuscript, and we are grateful to the museums and collectors (Additional file 1) for loaning samples. This work was supported by the U.S. National Science Foundation Assembling the Tree of Life program, grants DEB-0228682 (to RTK, ELB, and David W Steadman), DEB-0228675 (to SJH), DEB-0228688 (to FHS), and DEB-0228617 (to WSM). Publication of this article was funded in part by the University of Florida Open-Access Publishing Fund.

Author details
1 Department of Biology, University of Florida, Gainesville, FL 32611, USA.
2 Department of Mathematics, University of Florida, Gainesville, FL 32611, USA.
3 Zoology Department, Field Museum of Natural History, 1400 S. Lakeshore Drive, Chicago, IL 60605, USA.
4 Museum of Vertebrate Zoology and Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, USA.
5 Department of Vertebrate Zoology, Smithsonian Institution, 4210 Silver Hill Road, Suitland, MD 20746, USA.
6 Behavior, Ecology, Evolution, and Systematics Program, University of Maryland, College Park, MD 20742, USA.
7 4869 Pepperwood Way, San Jose, CA 95124, USA.
8 Museum of Natural Science and Department of Biological Sciences, 119 Foster Hall, Louisiana State University, Baton Rouge, LA 70803, USA.
9 Department of Biological Sciences, Wayne State University, 5047 Gullen Mall, Detroit, MI 48202, USA.
10 Biology Department, Loyola University Chicago, Chicago, IL 60626, USA.
11 Department of Biology and Museum of Southwestern Biology, University of New Mexico, Albuquerque, NM 87131, USA.
12 Sam Noble Oklahoma Museum of Natural History, University of Oklahoma, Norman, OK 73072, USA.

Authors' contributions
ELB designed the study, wrote many of the computer programs, conducted analyses, and wrote the manuscript. RTK helped design the study, validated computational microinversion searches, and helped draft the manuscript. KLH manually identified the homoplastic microinversions and some overlapping microinversions. NRI-V contributed to analyses and wrote a
novelcomputerprogram.AJB,JLC,JVS,RCKB,MJB,JH,SR,CCW,andTY helpedtodraftandrevisethemanuscript.Allauthors(exceptNRI-V) contributedtothesequencedataanalysed,helpedconstructandedit alignments,andmanuallysearchedthealignmentsformicroinversions.All authorsreadandapprovedthefinalmanuscript. Received:26October2010Accepted:25May2011 Published:25May2011 References1.MaJ,ZhangL,SuhBB,RaneyBJ,BurhansRC,KentWJ,BlanchetteM, HausslerD,MillerW: Reconstructingcontiguousregionsofanancestral genome. GenomeRes 2006, 16 :1557-1565. 2.RascolVL,PontarottiP,LevasseurA: Ancestralanimalgenomes reconstruction. CurrOpinImmunol 2007, 19 :542-546. 3.LevasseurA,PontarottiP,PochO,ThompsonJD: Strategiesforreliable exploitationofevolutionaryconceptsinhighthroughputbiology. Evol BioinformOnline 2008,, 4: 121-137. 4.KriegsJO,ChurakovG,KiefmannM,JordanU,BrosiusJ,SchmitzJ: Retroposedelementsasarchivesfortheevolutionaryhistoryof placentalmammals. PLoSBiol 2006, 4 :e91. 5.BooreJL: Theuseofgenome-levelcharactersforphylogenetic reconstruction. TrendsEcolEvol 2006, 21 :439-446. 6.ChaissonMJ,RaphaelBJ,PevznerPA: Microinversionsinmammalian evolution. ProcNatlAcadSciUSA 2006, 103 :19824-19829. 7.KraussV,ThmmlerC,GeorgiF,LehmannJ,StadlerPF,EisenhardtC: Near intronpositionsarereliablephylogeneticmarkers:Anapplicationto holometabolousinsects. MolBiolEvol 2008, 25 :821-830. 8.RogozinIB,ThomsonK,CsrsM,CarmelL,KooninEV: Homoplasyin genome-wideanalysisofrareaminoacidreplacements:ThemolecularevolutionarybasisforVavilov slawofhomologousseries. BiolDirect 2008, 3 :7. 9.RokasA,HollandPWH: Raregenomicchangesasatoolfor phylogenetics. TrendsEcolEvol 2000, 15 :454-459. 10.HillisDM: SINEsoftheperfectcharacter. ProcNatlAcadSciUSA 1999, 96 :9979-9981. 11.AviseJC,RobinsonTJ: Hemiplasy:anewterminthelexiconof phylogenetics. SystBiol 2008, 57 :503-507. 12.PamiloP,NeiM: Relationshipsbetweengenetreesandspeciestrees. Mol BiolEvol 1988, 5 :568-583. 
13. Moore WS: Inferring phylogenies from mtDNA variation: Mitochondrial-gene trees versus nuclear-gene trees. Evolution 1995, 49:718-726.
14. Rokas A, Carroll SB: Bushes in the Tree of Life. PLoS Biol 2006, 4:e352.
15. Koonin EV: The Biological Big Bang model for the major transitions in evolution. Biol Direct 2007, 2:21.
16. Feuk L, MacDonald JR, Tang T, Carson AR, Li M, Rao G, Khaja R, Scherer SW: Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies. PLoS Genet 2005, 1:e56.
17. Kelchner SA, Wendel JF: Hairpins create minute inversions in non-coding regions of chloroplast DNA. Curr Genet 1996, 30:259-262.
18. Harshman J, Huddleston CJ, Bollback JP, Parsons TJ, Braun MJ: True and false gharials: A nuclear gene phylogeny of crocodylia. Syst Biol 2003, 52:386-402.
19. Kim KJ, Lee HL: Widespread occurrence of small inversions in the chloroplast genomes of land plants. Mol Cells 2005, 19:104-113.
20. Kimball RT, Braun EL: A multigene phylogeny of Galliformes supports a single origin of erectile ability in non-feathered facial traits. J Avian Biol 2008, 39:438-445.
21. Catalano SA, Saidman BO, Vilardi JC: Evolution of small inversions in chloroplast genome: a case study from a recurrent inversion in angiosperms. Cladistics 2009, 25:93-104.
22. Stern DB, Jones H, Gruissem W: Function of plastid mRNA 3' inverted repeats. RNA stabilization and gene-specific protein binding. J Biol Chem 1989, 264:18742-18750.
23. Hugo H, Cures A, Suraweera N, Drabsch Y, Purcell D, Mantamadiotis T, Phillips W, Dobrovic A, Zupi G, Gonda TJ, Iacopetta B, Ramsay RG: Mutations in the MYB intron I regulatory sequence increase transcription in colon cancers. Genes Chromosomes Cancer 2006, 45:1143-1154.
24. Rott R, Liveanu V, Drager RG, Stern DB, Schuster G: The sequence and structure of the 3'-untranslated regions of chloroplast transcripts are important determinants of mRNA accumulation and stability. Plant Mol Biol 1998, 36:307-314.
25. Macdonald SJ, Long AD: Fine scale structural variants distinguish the genomes of Drosophila melanogaster and D. pseudoobscura. Genome Biol 2006, 7:R67.
26. Chojnowski JL, Kimball RT, Braun EL: Introns outperform exons in analyses of basal avian phylogeny using clathrin heavy chain genes. Gene 2008, 410:89-96.
27. Hackett SJ, Kimball RT, Reddy S, Bowie RCK, Braun EL, Braun MJ, Chojnowski JL, Cox WA, Han KL, Harshman J, Huddleston C, Marks BD, Miglia KJ, Moore WS, Sheldon FH, Steadman DW, Witt CC, Yuri T: A phylogenomic study of birds reveals their evolutionary history. Science 2008, 320:1763-1768.
28. Harshman J, Braun EL, Braun MJ, Huddleston CJ, Bowie RCK, Chojnowski JL, Hackett SJ, Han KL, Kimball RT, Marks BD, Miglia KJ, Moore WS, Reddy S, Sheldon FH, Steppan SJ, Witt CC, Yuri T: Phylogenomic evidence for multiple losses of flight in ratite birds. Proc Natl Acad Sci USA 2008, 105:13462-13467.
29. Kimball RT, Braun EL, Barker FK, Bowie RCK, Braun MJ, Chojnowski JL, Hackett SJ, Han KL, Harshman J, Heimer-Torres V, Holznagel W, Huddleston CJ, Marks BD, Miglia KJ, Moore WS, Reddy S, Sheldon FH, Smith JV, Witt CC, Yuri T: A well-tested set of primers to amplify regions spread across the avian genome. Mol Phylogenet Evol 2009, 50:654-660.
30. Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 2003, 31:3497-3500.
31. Katoh M, Kuma M: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 2002, 30:3059-3066.
32. Tatusova TA, Madden TL: BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett 1999, 174:247-250.
33. Noé L, Kucherov G: YASS: enhancing the sensitivity of DNA similarity search. Nucleic Acids Res 2005, 33:W540-W543.
34. Zuker M: Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 2003, 31:3406-3415.
35. Swofford DL: PAUP*: Phylogenetic analysis using parsimony (*and other methods), Version 4. Sunderland, MA: Sinauer Associates; 2003.
36. Han KL, Braun EL, Kimball RT, Reddy S, Bowie RCK, Braun MJ, Chojnowski JL, Hackett SJ, Harshman J, Huddleston CJ, Marks BD, Miglia KJ, Moore WS, Sheldon FH, Steadman DW, Witt CC, Yuri T: Are transposable element insertions homoplasy free? An examination using the avian tree of life. Syst Biol 2011, 60:375-386.
37. Stamatakis A: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 2006, 22:2688-2690.
38. Kirchhausen T: Clathrin. Annu Rev Biochem 2000, 69:699-727.
39. Wakeham DE, Abi-Rached L, Towler MC, Wilbur JD, Parham P, Brodsky FM: Clathrin heavy and light chain isoforms originated by independent mechanisms of gene duplication during chordate evolution. Proc Natl Acad Sci USA 2005, 102:7209-7214.
40. Tsirigos A, Rigoutsos I: Human and mouse introns are linked to the same processes and functions through each genome's most frequent nonconserved motifs. Nucleic Acids Res 2008, 36:3484-3493.
41. Bonilla AJ, Braun EL, Kimball RT: Comparative molecular evolution and phylogenetic utility of 3'-UTRs and introns in Galliformes. Mol Phylogenet Evol 2010, 56:536-542.
42. Warren WC, Clayton DF, Ellegren H, Arnold AP, Hillier LW, Kunstner A, Searle S, White S, Vilella AJ, Fairley S, Heger A, Kong L, Ponting CP, Jarvis ED, Mello CV, Minx P, Lovell P, Velho TAF, Ferris M, Balakrishnan CN, Sinha S, Blatti C, London SE, Li Y, Lin YC, George J, Sweedler J, Southey B, Gunaratne P, Watson M, et al: The genome of a songbird. Nature 2010, 464:757-762.
43. Khabar KSA: The AU-rich transcriptome: More than interferons and cytokines, and its role in disease. J Interferon Cytokine Res 2005, 25:1-10.
44. Chen JM, Férec C, Cooper DN: A systematic analysis of disease-associated variants in the 3' regulatory regions of human protein-coding genes II: The importance of mRNA secondary structure in assessing the functionality of 3' UTR variants. Hum Genet 2006, 120:301-333.
45. Ericson PGP, Anderson CL, Britton T, Elzanowski A, Johansson US, Källersjö M, Ohlson JI, Parsons TJ, Zuccon D, Mayr G: Diversification of Neoaves: Integration of molecular sequence data and fossils. Biol Lett 2006, 2:543-547.
46. Gibb GC, Kardailsky O, Kimball RT, Braun EL, Penny D: Mitochondrial genomes and avian phylogeny: Complex characters and resolvability without explosive radiations. Mol Biol Evol 2007, 24:269-280.
47. Brown JW, Payne RB, Mindell DP: Nuclear DNA does not reconcile 'rocks' and 'clocks' in Neoaves: a comment on Ericson et al. Biol Lett 2007, 3:257-259.
48. Pratt RC, Gibb GC, Morgan-Richards M, Phillips MJ, Hendy MD, Penny D: Toward resolving deep Neoaves phylogeny: Data, signal enhancement, and priors. Mol Biol Evol 2009, 26:313-326.
49. Mayr G, Clarke J: The deep divergences of neornithine birds: A phylogenetic analysis of morphological characters. Cladistics 2003, 19:527-553.
50. Livezey BC, Zusi RL: Higher-order phylogeny of modern birds (Theropoda, Aves: Neornithes) based on comparative anatomy. II. Analysis and discussion. Zool J Linn Soc 2007, 149:1-95.
51. Cracraft J, Barker FK, Braun M, Harshman J, Dyke GJ, Feinstein J, Stanley S, Cibois A, Schikler P, Beresford P, García-Moreno J, Yuri T, Mindell DP: Phylogenetic relationships among modern birds (Neornithes): Toward an avian tree of life. In Assembling the tree of life. Edited by: Cracraft J, Donoghue MJ. New York: Oxford University Press; 2004:468-489.
52. Mayr G: Metaves, Mirandornithes, Strisores and other novelties - a critical review of the higher-level phylogeny of neornithine birds. J Zool Syst Evol Res 2011, 49:58-76.
53. Poe S, Chubb AL: Birds in a bush: five genes indicate explosive evolution of avian orders. Evolution 2004, 58:404-415.
54. Braun EL, Kimball RT: Polytomies, the power of phylogenetic inference, and the stochastic nature of molecular evolution: A comment on Walsh et al. (1999). Evolution 2001, 55:1261-1263.
55. Schöniger M, Waterman MS: A local algorithm for DNA sequence alignment with inversions. Bull Math Biol 1992, 54:521-536.
56. Vellozo AF, Alves CER, do Lago AP: Alignment with non-overlapping inversions in O(n³)-time. Lect Notes Comput Sc 2006, 4175:186-196.
57. Ledergerber C, Dessimoz C: Alignments with non-overlapping moves, inversions and tandem duplications in O(n⁴) time. J Comb Optim 2008, 16:263-278.
58. Nishihara H, Maruyama S, Okada N: Retroposon analysis and recent geological data suggest near-simultaneous divergence of the three superorders of mammals. Proc Natl Acad Sci USA 2009, 106:5235-5240.
59. Mindell DP, Knight A, Baer C, Huddleston CJ: Slow rates of molecular evolution in birds and the metabolic rate and body temperature hypotheses. Mol Biol Evol 1996, 13:422-426.
60. Cooper GM, Brudno M, NISC Comparative Sequencing Program, Green ED, Batzoglou S, Sidow A: Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes. Genome Res 2003, 13:813-820.

doi:10.1186/1471-2148-11-141
Cite this article as: Braun et al.: Homoplastic microinversions and the avian tree of life. BMC Evolutionary Biology 2011, 11:141.