Trials
BioMed Central, Open Access

Study protocol

Subgroup Analysis of Trials Is Rarely Easy (SATIRE): a study protocol for a systematic review to characterize the analysis, reporting, and claim of subgroup effects in randomized trials

Xin Sun (1,2), Matthias Briel (1,3), Jason W Busse (1,4), Elie A Akl (5), John J You (1,6), Filip Mejza (7), Malgorzata Bala (8), Natalia Diaz-Granados (1), Dirk Bassler (9), Dominik Mertz (1,10), Sadeesh K Srinathan (1,11), Per Olav Vandvik (12), German Malaga (13), Mohamed Alshurafa (1), Philipp Dahm (14), Pablo Alonso-Coello (15,16), Diane M Heels-Ansdell (1), Neera Bhatnagar (17), Bradley C Johnston (1), Li Wang (2), Stephen D Walter (1), Douglas G Altman (18) and Gordon H Guyatt* (1,6)

Address: (1) Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada; (2) Center for Clinical Epidemiology and Evidence-Based Medicine, West China Hospital, Sichuan University, Chengdu, PR China; (3) Basel Institute for Clinical Epidemiology and Biostatistics, University Hospital Basel, Basel, Switzerland; (4) The Institute for Work & Health, Toronto, Ontario, Canada; (5) Departments of Medicine and Family Medicine, State University of New York at Buffalo, NY, USA; (6) Department of Medicine, McMaster University, Hamilton, Canada; (7) Department of Pulmonary Diseases, Jagiellonian University School of Medicine, Krakow, Poland; (8) Department of Internal Medicine, Jagiellonian University School of Medicine, Krakow, Poland; (9) University Children's Hospital Tuebingen, Department of Neonatology, Tuebingen, Germany; (10) Division of Infectious Diseases & Hospital Epidemiology, University Hospital Basel, Switzerland; (11) Section of Thoracic Surgery, Department of Surgery, University of Manitoba, Winnipeg, Manitoba, Canada; (12) Norwegian Knowledge Centre for the Health Services, Oslo, Norway; (13) Universidad Peruana Cayetano Heredia, Lima, Peru; (14) Department of Urology, University of Florida, College of Medicine, Gainesville, Florida, USA; (15) Iberoamerican Cochrane Center, Hospital de la Santa Creu i Sant Pau, Barcelona, Spain; (16) CIBER de Epidemiología y Salud Pública (CIBERESP), Spain; (17) Health Sciences Library, McMaster University, Hamilton, Canada; and (18) Centre for Statistics in Medicine, University of Oxford, Oxford, UK

Email: Xin Sun: sunx26@mcmaster.ca; Matthias Briel: MBriel@uhbs.ch; Jason W Busse: jbusse@iwh.on.ca; Elie A Akl: elieakl@buffalo.edu; John J You: jyou@mcmaster.ca; Filip Mejza: filipmejza@mp.pl; Malgorzata Bala: gosiabala@mp.pl; Natalia Diaz-Granados: natalia.diaz.granados@utoronto.ca; Dirk Bassler: dirk.bassler@med.uni-tuebingen.de; Dominik Mertz: DMertz@uhbs.ch; Sadeesh K Srinathan: ssrinathan@gmail.com; Per Olav Vandvik: pvandvik@start.no; German Malaga: gmalaga01@gmail.com; Mohamed Alshurafa: alshurm@mcmaster.ca; Philipp Dahm: Philipp.Dahm@urology.ufl.edu; Pablo Alonso-Coello: PAlonso@santpau.cat; Diane M Heels-Ansdell: ansdell@mcmaster.ca; Neera Bhatnagar: bhatnag@mcmaster.ca; Bradley C Johnston: bjohnston@med.ualberta.ca; Li Wang: wangli_74@hotmail.com; Stephen D Walter: walter@mcmaster.ca; Douglas G Altman: doug.altman@csm.ox.ac.uk; Gordon H Guyatt*: guyatt@mcmaster.ca
* Corresponding author

Published: 9 November 2009
Trials 2009, 10:101. doi:10.1186/1745-6215-10-101
Received: 12 September 2009
Accepted: 9 November 2009
This article is available from: http://www.trialsjournal.com/content/10/1/101
© 2009 Sun et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: Subgroup analyses in randomized trials examine whether effects of interventions differ between subgroups of study populations according to characteristics of patients or interventions. However, findings from subgroup analyses may be misleading, potentially resulting in suboptimal clinical and health decision making. Few studies have investigated the reporting and conduct of subgroup analyses, and a number of important questions remain unanswered. The objectives of this study are: 1) to describe the reporting of subgroup analyses and claims of subgroup effects in randomized controlled trials, 2) to assess study characteristics associated with reporting of subgroup analyses and with claims of subgroup effects, and 3) to examine the analysis and interpretation of subgroup effects for each study's primary outcome.

Methods: We will conduct a systematic review of 464 randomized controlled human trials published in 2007 in the 118 Core Clinical Journals defined by the National Library of Medicine. We will randomly select journal articles, stratified in a 1:1 ratio by higher impact versus lower impact journals. According to 2007 ISI total citations, we consider the New England Journal of Medicine, JAMA, Lancet, Annals of Internal Medicine and BMJ as higher impact journals. Teams of two reviewers will independently screen full texts of reports for eligibility, and abstract data, using standardized, pilot-tested extraction forms. We will conduct univariable and multivariable logistic regression analyses to examine the association of pre-specified study characteristics with reporting of subgroup analyses and with claims of subgroup effects for the primary and any other outcomes.

Discussion: A clear understanding of subgroup analyses as currently conducted and reported in published randomized controlled trials will reveal both strengths and weaknesses of this practice. Our findings will contribute to a set of recommendations to optimize the conduct and reporting of subgroup analyses, and claim and interpretation of subgroup effects in randomized trials.

Background
The effects of healthcare interventions on the entire study population are of primary interest in clinical trials. It remains appealing, however, for investigators and clinicians to identify differential effects in subgroups based on characteristics of patients or interventions.
This analytic approach, termed subgroup analysis, can sometimes be informative but it is often misleading [1-4].

Investigators frequently conduct subgroup analyses exploring multiple hypotheses [5]. Conducting multiple tests is associated with the risk of false positive results due to the play of chance [3]. This risk is particularly great if subgroup analyses are data driven: that is, when investigators perform numerous post hoc subgroup analyses seeking statistical significance. Even when investigators specify a limited number of subgroup analyses a priori, the play of chance may still result in identification of spurious subgroup effects.

Sometimes, investigators explore possible subgroup effects by testing the null hypothesis of no treatment effect in each of the relevant subgroups. A claim of subgroup effect is made if a significant effect is observed in one subgroup but not in the other(s) [6,7]. This strategy, however, fails to address the real issue of subgroup analysis: can chance explain the apparent difference between subgroups? This question can be addressed with a formal test of interaction in which the null hypothesis is that the underlying effect across subgroups is the same.

In other instances, investigators report and claim an effect in one subgroup of patients while omitting the results for the other subgroups. Investigators may also test the difference of effects between groups defined by a study characteristic measured after randomization. The apparent difference of effects may, however, be explained by the treatment intervention itself, or by differing prognostic characteristics in subgroups that emerge after randomization, rather than by the subgroup characteristic itself. Therefore, this approach to analyzing subgroups is highly problematic [4,8,9]. Many apparent subgroup effects have been proven to be spurious [10].
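To make the contrast between separate within-subgroup tests and a formal test of interaction concrete, the following is a minimal sketch for a binary outcome and two subgroups. It assumes the effect is measured as a risk ratio, uses the usual large-sample standard error of the log risk ratio, and compares the two subgroup estimates with a normal (z) test. The function names and the 2 × 2 counts are hypothetical and are not taken from any trial or from the protocol's own analysis plan.

```python
import math


def log_risk_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Return (log risk ratio, standard error) for one subgroup's 2 x 2 data."""
    log_rr = math.log((events_trt / n_trt) / (events_ctl / n_ctl))
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    return log_rr, se


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def interaction_test(subgroup_a, subgroup_b):
    """z-test of the null hypothesis that both subgroups share one log risk ratio."""
    est_a, se_a = log_risk_ratio(*subgroup_a)
    est_b, se_b = log_risk_ratio(*subgroup_b)
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return z, two_sided_p(z)


# Hypothetical counts: (events treated, n treated, events control, n control).
subgroup_a = (15, 100, 30, 100)  # risk ratio 0.5
subgroup_b = (24, 100, 30, 100)  # risk ratio 0.8

for label, data in (("A", subgroup_a), ("B", subgroup_b)):
    est, se = log_risk_ratio(*data)
    print(label, "within-subgroup p =", round(two_sided_p(est / se), 3))

z, p = interaction_test(subgroup_a, subgroup_b)
print("interaction z =", round(z, 2), "p =", round(p, 2))
```

In this made-up example the treatment effect is statistically significant in subgroup A but not in subgroup B, yet the interaction test does not reject the hypothesis of a common effect. This is the situation in which a claim based on separate subgroup tests would be unsupported.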
Misleading subgroup effects can result in withholding efficacious treatment from patients who would benefit, or encourage ineffective or potentially harmful treatments for subgroups who would fare better without. It is, therefore, imperative to critically assess the validity of claimed subgroup effects. One approach is to use seven previously proposed criteria for determining whether apparent differences in subgroup response are likely to be real [11]. These criteria have been widely used to evaluate subgroup analyses in randomized controlled trials (RCTs) and meta-analyses [12-15]. Several new criteria may further facilitate differentiation between spurious and real subgroup effects (Appendix 1).

A limited number of empirical studies have evaluated how trialists conduct and report subgroup analyses, and have revealed several weaknesses (Table 1) [16-21]. Weaknesses include the use of an excessive number of variables and outcomes, inappropriate statistical methods, and insufficient a priori specification of variables. A review of subgroup analyses reported in cardiovascular trials [17], for instance, identified one study that reported 23 subgroup variables and 17 outcomes. In another review of 27 surgical trials [16], a test of interaction was reported for only 5.8% (3/54) of subgroup hypotheses tested, whereas 72.2% (39/54) claimed subgroup effects. Across six reviews of subgroup analyses, the prevalence of trials claiming at least one subgroup effect ranged from 25% to 60% [16-20]. Two studies, one [18] restricted to trials published in the New England Journal of Medicine and another [17] restricted to moderate or large sized cardiovascular trials, found that larger sample size was the only study characteristic statistically associated with reporting of subgroup analyses.

Despite the merits of these studies, each of them examined only a relatively small number of trials (median 57, range 11-97). None compared the reporting of subgroup analyses in higher impact journals versus other journals; none examined the reporting of subgroup analyses in relation to type of outcomes (e.g. continuous, binary, time-to-event, count, or multinomial); and none specifically examined subgroup analysis reporting for the primary outcome. In addition, none of the previous reviews documented the magnitude of the apparent subgroup effects and the magnitude of p-values of interaction tests; none investigated the validity of claimed subgroup effects; none investigated study characteristics associated with claim of subgroup effects; and none addressed the credibility of the claimed subgroup effects. These shortcomings limit the generalizability of findings and leave important questions unanswered. Therefore, we will conduct a systematic review of RCTs to further inform the current use and reporting of subgroup analyses.

In this study, we have three main objectives. The first is to describe the reporting of subgroup analyses and claim of subgroup effects. The second is to assess study characteristics associated with reporting of subgroup analyses, and study characteristics associated with claim of subgroup effects, both for the primary outcome and for any outcome.
The third objective is to examine the analysis and interpretation of subgroup effects conducted for the primary outcome.

Table 1: Characteristics of six studies reviewing subgroup analyses in randomized trials
Study ID | Trial area | Source of study | Number of trials | Trial feature for eligibility criteria
Wang (2007) | Multiple | NEJM (July 2005 to June 2006) | 97 (59 reporting subgroup analyses) | No restrictions
Bhandari (2006) | Surgical | Two surgical journals plus NEJM, JAMA, BMJ, and Lancet (Jan 2000 to Apr 2003) | 72 (27 reporting subgroup analyses) | No restriction on size and other trial characteristics
Hernandez (2006) | Cardiovascular | Four cardiovascular journals plus "Top Five" (2002 and 2004) | 63 (39 reporting subgroup analyses) | Phase 3 parallel trials, n 100, superiority trials; restricted to main reports
Hernandez (2005) | Traumatic brain | MEDLINE (1966 to Apr 2004), EMBASE (1978 to Apr 2004), CENTRAL (Apr 2004) | 18 (11 reporting subgroup analyses) | Phase 3, parallel trials, n 50 per arm, Glasgow Outcome Scale (GOS) at 3 months as outcome
Moreira Jr (2001) | Multiple | NEJM, JAMA, Lancet, American Journal of Public Health (July 1998) | 32 (17 reporting subgroup analyses) | No restrictions mentioned
Assmann (2000) | Multiple | NEJM, JAMA, BMJ, and Lancet (July to Sep 1997) | 50 (35 reporting subgroup analyses) | No crossover and cluster trials, n 50

Methods

Study Design Overview
We will conduct a systematic review of RCTs conducted in humans and published in 2007 in the Core Clinical Journals defined by the National Library of Medicine (http://www.nlm.nih.gov/bsd/aim.html). To maximize the generalizability of study findings, we will include parallel, cross-over, and factorial randomized trials, and both individual and cluster randomized trials. Unless the authors report findings to the contrary, we will assume no treatment-by-treatment interaction in factorial studies, no treatment-by-sequence interaction in cross-over studies, and no treatment-by-cluster interactions in cluster-randomized studies. We will use the standard methodology for conducting systematic reviews [22].

Definition of Subgroup, Subgroup Analysis, and Subgroup Effect
For this study, we define a subgroup as a subset of a trial population that is identified on the basis of a patient or intervention characteristic that is measured either at baseline or after randomization. We define a subgroup analysis as a statistical analysis that explores whether effects of the intervention (i.e. experimental versus control) differ according to status of a subgroup variable. This includes a case in which investigators report a main result and analyze only a subset of patients. We define a subgroup effect as a difference in the magnitude of a treatment effect across subgroups of a study population. The null hypothesis for a test of a subgroup effect (i.e. subgroup hypothesis) is that there is no difference in the magnitude of a treatment effect across subgroups. We will consider both absolute and relative effect measures in our study.

Eligibility Criteria
The inclusion criteria are: 1) The study is an RCT; 2) The participants are human; 3) The study is published in 2007 in a core clinical journal (as defined by the National Library of Medicine).

The exclusion criteria are: 1) The report does not include the entire population enrolled in the original study (i.e. the report focuses on a subset of the original study population); 2) The study is explicitly labelled as a phase I trial; 3) The study is exclusively a pharmacokinetic study; 4) The study is reported as a Research Letter.

No restrictions apply with respect to the following aspects:
- Trial design (i.e., parallel, factorial or cross-over);
- Number of trial arms (i.e., two or more);
- Unit of randomization (i.e., individual patient or cluster);
- Type of outcome (i.e., continuous, binary, time-to-event, count, or multinomial);
- Type of trial (i.e., superiority, non-inferiority or equivalence trial);
- Type of report (i.e., main report, longer follow-up report, or interim report);
- Subgroup variables measured at baseline versus after randomization;
- Sample size, length of follow up, and loss to follow up;
- Statistical significance versus non-significance of overall main effects.

Literature Search
We will search for RCTs published in the Core Clinical Journals in 2007.
This group of journals is defined by the National Library of Medicine, includes a total of 118 journals covering all specialities of clinical medicine and public health sciences, and is known as the Abridged Index Medicus. We will run the Medline search using the OVID platform and a search strategy (Appendix 2) developed with the help of an experienced librarian.

Random Sampling of Citations
We will stratify the Core Clinical Journals into higher and lower impact journals. For this study we define higher impact journals as the five journals with the highest total citations in 2007: the New England Journal of Medicine, JAMA, Lancet, Annals of Internal Medicine and BMJ. Lower impact journals consist of the remaining Core Clinical Journals. We will randomly sample the journal articles, with 1:1 stratification by journal type (i.e. higher and lower impact). We will continue the random sampling process until the number of eligible studies meets our required sample size.

Review process
Teams of two trained reviewers will perform citation and full text screening and data abstraction, in duplicate and independently, including the selection of the primary outcome (using pre-specified criteria; see below) and the selection of the pair-wise comparison for analysis (if there are three or more arms). Each team will attempt to resolve discrepancies by consensus or, if a discrepancy remains, through discussion with one of two arbitrators (XS, GHG). The arbitrator will independently review the trial report before discussing it with the reviewers. Before the review formally starts, we will conduct calibration exercises to ensure consistency across reviewers. We will use electronic forms, developed with Microsoft Access and Excel, for study screening and data extraction.
The forms will be standardized and pilot-tested, and detailed written instructions will be developed to assist with study screening and data extraction.

Study Screening
Two reviewers will independently screen the title and abstract of each randomly chosen citation for potential eligibility. In the title and abstract screening, they will judge only if the study is a randomized controlled trial enrolling human participants. Two reviewers will then independently screen the full text of the potentially eligible trials to determine eligibility.
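Taken together, the stratified random sampling and the eligibility screening amount to a loop that draws citations in random order within each journal stratum and stops once enough eligible trials have been found. The sketch below illustrates that idea only. The function name, the `is_eligible` callback (standing in for the reviewers' duplicate screening) and the per-stratum quota are assumptions for illustration. A quota of 232 per stratum would follow from 464 trials split 1:1, which is an inference from the protocol, not a figure it states.

```python
import random


def sample_until_quota(citations_by_stratum, is_eligible, quota_per_stratum, seed=0):
    """Draw citations in random order within each stratum until the quota of
    eligible trials is met (or the stratum is exhausted).

    citations_by_stratum: dict mapping a stratum label to a list of citation ids.
    is_eligible: hypothetical callback standing in for reviewer screening.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    selected = {}
    for stratum, pool in citations_by_stratum.items():
        order = list(pool)  # copy, so the caller's list is left untouched
        rng.shuffle(order)
        eligible = []
        for citation in order:
            if len(eligible) == quota_per_stratum:
                break
            if is_eligible(citation):
                eligible.append(citation)
        selected[stratum] = eligible
    return selected


# Toy example with made-up citation ids; even ids play the role of eligible RCTs.
pools = {"higher impact": list(range(0, 20)), "lower impact": list(range(100, 120))}
picked = sample_until_quota(pools, lambda cid: cid % 2 == 0, quota_per_stratum=3, seed=1)
print({stratum: len(ids) for stratum, ids in picked.items()})
```

Sampling each stratum separately keeps the 1:1 ratio exact even when the proportion of eligible citations differs between higher and lower impact journals.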


At the full text screening stage, the reviewers will select a primary outcome for eligible studies, using the following strategy. If the report specifies a primary outcome, we will select it as the primary outcome. If the report specifies more than one primary outcome (i.e. co-primary outcomes), we will select the one with the largest number of subgroup analyses; if outcomes have the same number of subgroup analyses, we will select the one with the greatest relevance to patients according to a pre-defined outcome hierarchy; and if more than one outcome is in the same category, we will take the first reported outcome in the abstract (Appendix 3). If the report does not specify a primary outcome, we will select the outcome used for the study sample size calculation, but if there is no sample size calculation reported or if there is a sample size calculation for several outcomes, we will proceed as detailed in the previous sentence.

Reviewers will also identify a pair-wise comparison of interest, using the following strategy. If there are only two groups, we will use them for the pair-wise comparison.
If there are three or more groups, we will select the comparison that was clearly and explicitly defined as the primary comparison in the study report; if the primary comparison was not explicitly defined, we will select the comparison that reports the largest number of subgroup analyses for the selected primary outcome; if more than one comparison reported the same largest number of subgroup analyses, we will select the comparison that reports the smallest interaction p-value; if the interaction p-value is not available, we will select the one that has the smallest p-value for the main effect.

Data Abstraction

Study Characteristics
We will extract information on funding sources, clinical area, type of intervention, trial design (parallel, cross-over, or factorial), trial type (superiority, non-inferiority, or equivalence), unit of randomization (randomization at individual or cluster level), methodological characteristics of trials (allocation concealment; blinding of patients, healthcare givers, data collectors, outcome adjudicators, or data analysts; stopping trials early for benefit), number of participants randomized for the selected comparison, and total number of participants randomized.

We will categorize the selected primary outcome according to whether it is a composite endpoint, whether the results are statistically significant, and the type of outcome variable (time-to-event, binary, continuous, count, or multinomial). We will record the type of effect measure for the selected primary outcome.
If more than one effect measure is used for binary, time-to-event, or count outcomes, we will use a hierarchical approach to select an effect measure, as follows:
- Select the effect measure that the investigators clearly indicated as the effect measure for the primary analysis;
- Select the effect measure on which the subgroup analysis is reported and a subgroup effect is claimed;
- Select the measure that yields the smallest reported p-value of the main effect;
- Otherwise, use the following order for binary outcomes: risk ratio > odds ratio > relative risk reduction > risk difference; and the following for time-to-event outcomes: hazard ratio > incidence rate ratio > ratio of cumulative incidence > ratio of time > difference in incidence rate > difference in cumulative incidence > difference in time.

If no effect measure is reported but data for a 2 × 2 table are available for the primary outcome, we will calculate risk ratios.

For binary, time-to-event, and count primary outcomes, we will document their point estimates and 95% confidence intervals for the main effects, as well as, whenever possible, events and number of patients in a 2 × 2 table. For continuous outcomes, we will document the number of patients analyzed in the experimental and control groups, and the summary measure (i.e. means, medians) and associated measure of precision (i.e. inter-quartile range, 95% confidence interval, standard deviation, or standard error). We will not document the magnitude of the main effect for multinomial primary outcomes.

Reporting of subgroup analyses
We will record whether trials report subgroup analyses for any outcomes (i.e. primary or secondary), the number of outcomes for which subgroup analyses are reported, the type of outcomes, the number of subgroup variables reported in the trial report, the number of subgroup analyses that were most likely conducted, the number of subgroup analyses reported, whether any subgroup analysis was specified a priori, and whether any subgroup effect was stated to have been analyzed by a test of interaction. We will also document the above information specifically for the primary outcome.

We will consider that a subgroup analysis has been reported if: 1) the investigators report a point estimate and an associated confidence interval or a p-value for one or more subgroups of the original study population, 2) the investigators report the magnitude of difference in the effect according to status of a subgroup variable, 3) the investigators report results from an interaction test, or 4) the investigators explicitly state that they conducted subgroup analyses but do not report any of the data mentioned above.
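The hierarchical rule for choosing among several reported effect measures, described under Study Characteristics, can be read as a short decision procedure. The sketch below is one possible reading of it, not the review's actual extraction tool: the `ReportedMeasure` record and its field names are hypothetical, and the third step is interpreted as picking the smallest main-effect p-value among the measures that report one. No default order is given in the text for count outcomes, so the function returns `None` in that case.

```python
from dataclasses import dataclass
from typing import List, Optional

# Default preference orders as listed in the protocol text.
DEFAULT_ORDER = {
    "binary": ["risk ratio", "odds ratio", "relative risk reduction", "risk difference"],
    "time-to-event": [
        "hazard ratio", "incidence rate ratio", "ratio of cumulative incidence",
        "ratio of time", "difference in incidence rate",
        "difference in cumulative incidence", "difference in time",
    ],
}


@dataclass
class ReportedMeasure:  # hypothetical record of one measure found in a trial report
    name: str
    is_primary_analysis: bool = False     # flagged by the authors as the primary measure
    has_subgroup_claim: bool = False      # subgroup analysis reported and effect claimed
    main_effect_p: Optional[float] = None # reported p-value of the main effect, if any


def select_effect_measure(measures: List[ReportedMeasure], outcome_type: str) -> Optional[str]:
    # Step 1: the measure the investigators indicated for the primary analysis.
    for m in measures:
        if m.is_primary_analysis:
            return m.name
    # Step 2: the measure on which a subgroup effect is reported and claimed.
    for m in measures:
        if m.has_subgroup_claim:
            return m.name
    # Step 3: the measure with the smallest reported main-effect p-value.
    with_p = [m for m in measures if m.main_effect_p is not None]
    if with_p:
        return min(with_p, key=lambda m: m.main_effect_p).name
    # Step 4: fall back to the default order for this outcome type.
    reported = {m.name for m in measures}
    for name in DEFAULT_ORDER.get(outcome_type, []):
        if name in reported:
            return name
    return None


print(select_effect_measure(
    [ReportedMeasure("odds ratio"), ReportedMeasure("risk ratio")], "binary"))
```

Encoding the rule this way makes the precedence explicit: a later step is consulted only when every earlier step fails to single out a measure.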


Claim of subgroup effects
We will record whether trials claim a subgroup effect for any outcomes (i.e. primary or secondary outcome), the number of subgroup effects claimed in the trial report, and the type of outcomes used for the claim. We will judge the strength of the claim based on the inferences drawn by the investigators in the abstract or discussion section. We will also document the above information specifically for the primary outcome.

We will consider that a subgroup effect is claimed if, in the abstract or discussion of the trial report, the investigators state that the effects of intervention differed, or may have differed, according to status of a subgroup variable. We will classify the strength of a claim according to four categories, defined as follows:
1) Strong claim of a definitive effect: The authors convey a conviction that the subgroup effect truly exists.
2) Claim of a likely effect: The authors convey a belief that the subgroup effect likely exists.
3) Suggestion of a possible effect: The authors suggest a subgroup effect and convey an uncertainty whether the subgroup effect exists.
4) No claim of a subgroup effect: The authors do not make a claim of a subgroup effect.
We have developed explicit criteria to judge the strength of claim (Table 2).

Table 2: Criteria for judging the strength of a subgroup claim
Criteria | Strong claim | Claim of a likely effect | Suggestion of a possible effect
1. Did the investigators claim the effect in the abstract? | Yes | Possible | No
2. Did the investigators claim the effect in the conclusion of the abstract? | Possible* | No | No
3. Did the investigators claim the effect in the discussion? | Yes | Possible | Yes
4. Did the investigators use descriptive words (e.g. appear/seem to be, may, and might) to soften their statements of the claims? | No | Possible | Possible
5. Did the investigators use descriptive words (e.g. particular, and special) to strengthen the statement of the claims? | Possible | No | No
6. Were the authors obviously cautious about the apparent subgroup effect (e.g. they stated the subgroup effect did not meet some of the important criteria to believe a subgroup effect)? | No | Some caution possible | Yes
7. Did the investigators indicate the apparent effects need to be explored in future studies (i.e. hypothesis generating)? | No | Possible; say desirable to confirm | Yes
* If a claim appears in the conclusion section of the abstract, it is considered a strong claim.

Analysis of subgroup effect for the primary outcome
We will document, for each subgroup analysis, whether the subgroup variable is a baseline characteristic or based on an after-randomization event, whether the investigators specified the variable a priori, whether the investigators specified the direction a priori, whether the subgroup variable was used as a stratification factor in randomization, the type of tests used for analyzing subgroup effects (test of significance of individual groups, interaction test, or both), the statistical approaches used for a test of interaction, and the methods of adjusting for multiple interaction effects. We will also document, whenever possible, the 2 × 2 data, the reported point estimate, 95% confidence interval, and p-value of the effect of each subgroup, as well as the reported p-value of the interaction test.

Interpretation of claimed subgroup effect for the primary outcome
For each of the claimed subgroup effects, we will further document whether the authors provided a supportive biological rationale or cited external evidence that is consistent with the observed subgroup effect, whether the authors indicated that the pre-specified direction was correct, and whether they indicated that the observed subgroup effect was consistent across closely related outcomes.


Sample Size
We conducted a pilot study including 139 randomized trials. The results showed that 62 (44.6%) trials reported subgroup analyses for any outcome, and 41 (29.5%) reported them for the primary outcome; 27 (19.4%) trials claimed a subgroup effect for any outcome, and 18 (12.9%) claimed one for the primary outcome. We calculate the sample size based on the examination of study characteristics associated with claim of subgroup effects for any outcome. In our regression of study characteristics with claim of subgroup effects, we will include 6 study characteristics, a total of 9 categories of variables. We will require 10 events (i.e. claim of subgroup effect) per category to examine the association, resulting in a total of 90 events (and at least 90 total non-events). Given the pilot claim rate of 19.4% for any outcome, we will require a total of 464 trials for this study.

Statistical Analysis
We will assess agreement between reviewers for study inclusion at the full text screening stage, and for reviewers' judgments of whether the investigators reported a subgroup analysis, claimed a subgroup effect, pre-specified the subgroup hypothesis, or used the interaction test. We will calculate both crude agreement and chance-corrected agreement. We will interpret the agreement statistics using the guidelines proposed by Landis and Koch [23]: kappa values of 0 to 0.20 represent slight agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 substantial agreement, and greater than 0.80 almost perfect agreement.

We will calculate the proportions of trials reporting at least one subgroup analysis for the primary outcome and for any outcome.
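The sample size arithmetic and the agreement statistics described above can be checked with a few lines of code. This is a sketch under stated assumptions: the events-per-category calculation divides the required number of events by the pilot claim rate, which reproduces the figure of 464 and so appears consistent with the stated result. Chance-corrected agreement is computed here as Cohen's kappa for two reviewers, which is an assumption, since the text names kappa but not the variant. The label for negative kappa values comes from Landis and Koch's original scale and is not part of the ranges quoted in the text. All function names and the reviewer ratings are hypothetical.

```python
import math


def required_trials(categories, events_per_category, pilot_events, pilot_total):
    """Trials needed so that the expected number of events reaches the target."""
    events_needed = categories * events_per_category
    return math.ceil(events_needed * pilot_total / pilot_events)


def crude_agreement(ratings_a, ratings_b):
    """Proportion of items on which the two reviewers gave the same rating."""
    return sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two reviewers (Cohen's kappa)."""
    n = len(ratings_a)
    observed = crude_agreement(ratings_a, ratings_b)
    labels = set(ratings_a) | set(ratings_b)
    expected = sum((ratings_a.count(l) / n) * (ratings_b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch descriptive categories."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    if kappa >= 0:
        return "slight"
    return "poor"  # below zero; not among the ranges quoted in the text


# 9 categories x 10 events each, with 27 claims among 139 pilot trials.
print(required_trials(9, 10, 27, 139))

# Hypothetical eligibility judgments (1 = include, 0 = exclude) from two reviewers.
reviewer_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
reviewer_2 = [1, 1, 0, 0, 1, 0, 0, 1, 0, 1]
print(crude_agreement(reviewer_1, reviewer_2), cohen_kappa(reviewer_1, reviewer_2))
```

In the made-up ratings the reviewers agree on 8 of 10 items, but half of that agreement would be expected by chance alone, which is why the protocol reports chance-corrected agreement alongside the crude figure.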
Treating the reporting of a subgroup analysis as the dependent variable, we will conduct univariable and multivariable logistic regression analyses to examine its association with the pre-specified study characteristics, both for the primary outcome and for any outcome. We will also calculate the proportions of trials claiming a subgroup effect for the primary outcome and for any outcome among trials that report a subgroup analysis, and conduct univariable and multivariable logistic regression analyses to examine the association of pre-specified study characteristics with claim of a subgroup effect for the primary outcome and for any outcome.

Our pre-specified study characteristics for the regression analyses are: average sample size per study arm, journal type (higher vs. lower impact journals), source of funding (partially or completely funded by a private for-profit organization vs. others), statistical significance of the main effect, trial area (medical vs. surgical), number of pre-specified primary outcomes (used for the regression of reporting of subgroup analyses only), and number of subgroup analyses (used for the regression of claim of subgroup effects only). We hypothesize that trials are more likely to report subgroup analyses or claim a subgroup effect if they have a larger sample size, are published in higher impact journals, receive funding from for-profit organizations, do not achieve statistical significance for the main effect, investigate medical rather than surgical interventions, have more pre-specified primary outcomes, and report a larger number of subgroup analyses. In the multivariable logistic regression analysis for reporting of subgroup analysis, we will also examine the interaction of source of funding and significance of main effect.

We will describe the details of reporting of subgroup analyses and claim of subgroup effects both for any outcome and specifically for the primary outcome.
If a variable, in both univariable and multivariable analyses, is found to be significantly associated with reporting of a subgroup analysis and/or claim of a subgroup effect, we will also present the above information stratified by the type of journal. We will describe the details of analysis of subgroup effects for the primary outcome by journal type (i.e. five highest impact journals versus other journals), and by claim versus no claim of a subgroup effect. We will also describe the details of interpretation of claimed subgroup effects by journal type.DiscussionOur study is designed to comprehensively address the analysis, reporting, and claim of subgroup effects in a representative sample of recent RCTs. This study protocol follows the publications of two other protocols [24,25] which reflects our continuing efforts to make objectives and design of methodological studies more transparent.Strengths and limitationsOur study has several strengths. First, we will employ rigorous systematic review methods including explicit and reproducible eligibility criteria, sensitive search strategies, and the use of standardized, pilot-tested forms accompanied by written instructions for study screening and data extraction. Teams of two trained reviewers will independently and in duplicate conduct study screening. We will also undertake calibration exercises and pilot data extraction to enhance consistency between reviewers before embarking on data abstraction. Second, our eligibility criteria are broad, and compared to the previous empirical studies our study findings will be more generalizable. Third, we conducted a pilot study to calculate the required sample size for the definitive study. Finally, our study will

be the largest empirical study of subgroup analyses, which will allow us to reliably address a number of important questions that have not been addressed by existing reviews.
Our study also has several limitations. It will be based on reported trial information, and our findings may be vulnerable to underreporting or selective reporting [26]. The limited space allowed by medical journals for reporting on trials may prevent authors from sufficiently reporting relevant information on subgroup analyses. Consequently, the proportion of trials reporting subgroup analyses is probably smaller than the proportion of trials actually conducting subgroup analyses, and the number of subgroup analyses reported in each trial is probably smaller than the actual number of conducted subgroup analyses. In relation to this problem, we will also estimate the number of subgroup analyses that were most likely conducted. Similarly, other details about subgroup analyses, such as a priori specification of the subgroup hypothesis and direction, may also be under-reported. Our study does not include all medical journals, and our findings may not be applicable to journals outside our sample. Our study, however, includes many more journals than the previous studies, which typically included high impact journals or specialty journals only. We chose the Core Clinical Journals because they cover all clinical and public health areas, and include all major medical journals. We consider that the quality of studies in these journals will be no worse than that in other journals, and expect that the quality of subgroup analyses reported in other journals will be no better than that in the Core Clinical Journals. Our study will involve reviewers' judgement of the strength of the claim of subgroup effect, and the determination of strength may be subjective and vary across reviewers.
We have developed detailed written instructions to assist reviewers in judging the strength, and will check the inter-reviewer agreement.
Implications of this study
Although a few empirical studies restricted to certain disease areas or journal type have found a significant association between sample size and reporting of subgroup analyses, factors that drive reporting and claiming of subgroup effects in a more representative set of trials remain uncertain. The results of this study will provide robust, generalizable, and reliable evidence on the factors that impact reporting and claiming of subgroup effects. Considerable work, including methodological advocacy [3,27-31] and empirical investigation [5,18,19], has been done to inform the conduct of subgroup analyses. However, few reports have systematically developed the framework of analysis, reporting, claim, and interpretation of subgroup effects. The findings of this study will further aid in the development of recommendations for adequate reporting, and appropriate analysis, claim, and interpretation of subgroup effects. Claimed subgroup effects are of primary interest to clinicians, investigators and other users. Claims of spurious subgroup effects can distort clinical practice and public health decision making, with serious consequences for patients and unnecessary expenditures. Methodological safeguards have been proposed to protect from spurious subgroup findings [4,10,30], but empirical evidence of their validity is limited. The results of this study will reveal the extent to which the investigators considered methodological safeguards in their claims, and provide some evidence regarding the extent to which claims of subgroup effects are valid. The findings of the SATIRE study may influence recommendations on reporting, conduct, claim, and interpretation of subgroup analyses.
These will be of particular interest to the stakeholders that have direct influence on trial design, analysis, and reporting, including investigators, health decision makers, guideline developers, funding agencies, and medical journal editors.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
XS and GHG conceptualized the study. All authors contributed to design of the study and read and approved the manuscript. XS developed the first draft of the manuscript and incorporated comments from authors for successive drafts.
Appendices
Appendix 1. The eleven criteria for assessing credibility of claimed subgroup effects
• Is the subgroup variable a characteristic at randomization?
• Is the effect suggested by comparisons within rather than between studies?
• Does the interaction test suggest a low likelihood that chance explains the apparent subgroup effect?
• Is the significant interaction effect independent of other potential subgroup effects?
• Was the hypothesis specified a priori?
• Was the correct direction of subgroup effect specified a priori?
• Was the subgroup effect one of a small number of hypothesized effects tested?
• Is the magnitude of the subgroup effect large?
• Is the interaction consistent across studies?
• Is the interaction consistent across closely related outcomes within the study?
• Is there indirect evidence that supports the hypothesized interaction?
The new criteria are italicized.
Appendix 2: Search strategy
1. exp Randomized Controlled Trials/
2. (randomized controlled trial$ or randomised controlled trial$).mp. [mp = title, original title, abstract, name of substance word, subject heading word]
3. (randomized trial$ or randomised trial$).mp. [mp = title, original title, abstract, name of substance word, subject heading word]
4. (randomized clinical trial$ or randomised clinical trial$).mp. [mp = title, original title, abstract, name of substance word, subject heading word]
5. 1 or 2 or 3 or 4
6. limit 5 to (English language and humans and "core clinical journals (aim)" and yr="2007")
Appendix 3: Hierarchy of outcomes
I. Mortality
1) all cause mortality
2) disease specific mortality
II. Morbidity
1) cardiovascular major morbid events
2) other major morbid events (e.g. loss of vision, seizures, fracture, revascularization)
3) recurrence/relapse/remission of cancer/disease free survival
4) renal failure requiring dialysis
5) hospitalizations
6) infections
7) dermatological/rheumatologic disorders
III. Symptoms/Quality of life/Functional status (e.g. failure to become pregnant, successful nursing/breastfeeding, depression)
IV. Surrogate outcomes (e.g. viral load, physical activity, post operative atrial fibrillation)
Acknowledgements
We thank Monica Owen for administrative assistance. We thank Aravin Duraik for developing the study electronic forms. The study is partially supported by the National Natural Science Foundation of China (NSFC, 70703025).
The funder had no role in the study design, in the writing of the manuscript, or in the decision to submit this or future manuscripts for publication. Xin Sun is supported by two research scholarships from the National Natural Science Foundation of China (70503021, 70703025). Matthias Briel was supported by a scholarship from the Swiss National Science Foundation (PASMA-112951/1) and the Roche Research Foundation. Dominik Mertz was partially supported by a research scholarship from the Swiss National Science Foundation (PBBSP3-124436). Jason Busse is funded by a New Investigator Award from the Canadian Institutes of Health Research and Canadian Chiropractic Research Foundation.
References
1. Fletcher J: Subgroup analyses: how to avoid being misled. BMJ 2007, 335:96-97.
2. Oxman AD, Guyatt GH: A consumer's guide to subgroup analyses. Ann Intern Med 1992, 116:78-84.
3. Schulz KF, Grimes DA: Multiplicity in randomised trials II: subgroup and interim analyses. Lancet 2005, 365:1657-61.
4. Yusuf S, Wittes J, Probstfield J, et al.: Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA 1991, 266:93-98.
5. Pocock SJ, Hughes MD, Lee RJ: Statistical problems in the reporting of clinical trials. A survey of three medical journals. N Engl J Med 1987, 317:426-32.
6. Barnett HJM, Taylor DW, Eliasziw M, et al.: Benefit of Carotid Endarterectomy in Patients with Symptomatic Moderate or Severe Stenosis. N Engl J Med 1998, 339:1415-1425.
7. Weisberg LA, Ticlopidine Aspirin Stroke Study Group: The efficacy and safety of ticlopidine and aspirin in non-whites: Analysis of a patient subgroup from the Ticlopidine Aspirin Stroke Study. Neurology 1993, 43:27.
8. van Walraven C, Davis D, Forster AJ, et al.: Time-dependent bias was common in survival analyses published in leading clinical journals. J Clin Epidemiol 2004, 57:672-82.
9. Hirji K, Fagerland M: Outcome based subgroup analysis: a neglected concern. Trials 2009, 10:33.
10. Guyatt G, Wyer PC, Ioannidis J: When to Believe a Subgroup Analysis. In User's Guide to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Edited by: Guyatt G, et al. AMA: Chicago; 2008:571-583.
11. Oxman A, Guyatt G, Green L, et al.: When to believe a subgroup analysis. In Users' guides to the medical literature. A manual for evidence-based clinical practice. Edited by: Guyatt G, Rennie D. Chicago, IL: AMA Press; 2002:553-65.
12. Hatala R, Keitz S, Wyer P, et al.: Tips for learners of evidence-based medicine: 4. Assessing heterogeneity of primary studies in systematic reviews and whether to combine their results. CMAJ 2005, 172:661-665.
13. Montori VM, Jaeschke R, Schunemann HJ, et al.: Users' guide to detecting misleading claims in clinical research reports. BMJ 2004, 329:1093-1096.
14. Trevor A, Sheldon GGAH: Criteria for the Implementation of Research Evidence in Policy and Practice. Getting Research Findings Into Practice. Second edition. 2008:11-18.
15. Martin CM, Guyatt G, Montori VM: The sirens are singing: the perils of trusting trials stopped early and subgroup analyses. Crit Care Med 2005, 33:1870-1.
16. Bhandari M, Devereaux PJ, Li P, et al.: Misuse of baseline comparison tests and subgroup analyses in surgical trials. Clin Orthop Relat Res 2006, 447:247-51.
17. Hernandez AV, Boersma E, Murray GD, et al.: Subgroup analyses in therapeutic cardiovascular clinical trials: are most of them misleading? Am Heart J 2006, 151:257-64.
18. Wang R, Lagakos SW, Ware JH, et al.: Statistics in Medicine - Reporting of Subgroup Analyses in Clinical Trials. N Engl J Med 2007, 357:2189-2194.
19. Assmann SF, Pocock SJ, Enos LE, et al.: Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet 2000, 355:1064-9.
20. Hernandez AV, Steyerberg EW, Taylor GS, et al.: Subgroup analysis and covariate adjustment in randomized clinical trials of traumatic brain injury: a systematic review. Neurosurgery 2005, 57:1244-53; discussion 1244-53.
21. Moreira ED Jr, Stein Z, Susser E: Reporting on methods of subgroup analysis in clinical trials: a survey of four scientific journals. Brazilian Journal of Medical and Biological Research 2001, 34:1441-1446.
22. Higgins JPT, Green S (editors): Cochrane Handbook for Systematic Reviews of Interventions Version 5.0.1 [updated September 2008]. The Cochrane Collaboration; 2008.
23. Landis JR, Koch GG: The measurement of observer agreement for categorical data. Biometrics 1977, 33:159-74.
24. Akl EA, Briel M, You JJ, et al.: LOST to follow-up Information in Trials (LOST-IT): a protocol on the potential impact. Trials 2009, 10:40.
25. Briel M, Lane M, Montori VM, et al.: Stopping randomized trials early for benefit: a protocol of the Study Of Trial Policy Of Interim Truncation-2 (STOPIT-2). Trials 2009, 10:49.
26. Chan A-W, Hrobjartsson A, Jorgensen KJ, et al.: Discrepancies in sample size calculations and data analyses reported in randomised trials: comparison of publications with protocols. BMJ 2008, 337:a2299.
27. Cui L, Hung HM, Wang SJ, et al.: Issues related to subgroup analysis in clinical trials. J Biopharm Stat 2002, 12:347-58.
28. Pocock SJ, Assmann SE, Enos LE, et al.: Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Statistics in Medicine 2002, 21:2917-2930.
29. Brookes ST, Whitely E, Egger M, et al.: Subgroup analyses in randomized trials: risks of subgroup-specific analyses; power and sample size for the interaction test. J Clin Epidemiol 2004, 57:229-36.
30. Rothwell PM: Treating individuals 2. Subgroup analysis in randomised controlled trials: importance, indications, and interpretation. Lancet 2005, 365:176-86.
31. Moher D, Schulz KF, Altman DG: The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 2001, 357:1191-1194.




Study protocol
Subgroup Analysis of Trials Is Rarely Easy (SATIRE): a study protocol for a systematic review to characterize the analysis, reporting, and claim of subgroup effects in randomized trials
Authors (affiliation numbers in parentheses):
Xin Sun (1,2), sunx26@mcmaster.ca
Matthias Briel (1,3), MBriel@uhbs.ch
Jason W Busse (1,4), jbusse@iwh.on.ca
Elie A Akl (5), elieakl@buffalo.edu
John J You (1,6), jyou@mcmaster.ca
Filip Mejza (7), filipmejza@mp.pl
Malgorzata Bala (8), gosiabala@mp.pl
Natalia Diaz-Granados (1), natalia.diaz.granados@utoronto.ca
Dirk Bassler (9), dirk.bassler@med.uni-tuebingen.de
Dominik Mertz (1,10), DMertz@uhbs.ch
Sadeesh K Srinathan (1,11), ssrinathan@gmail.com
Per Olav Vandvik (12), pvandvik@start.no
German Malaga (13), gmalaga01@gmail.com
Mohamed Alshurafa (1), alshurm@mcmaster.ca
Philipp Dahm (14), Philipp.Dahm@urology.ufl.edu
Pablo Alonso-Coello (15,16), PAlonso@santpau.cat
Diane M Heels-Ansdell (1), ansdell@mcmaster.ca
Neera Bhatnagar (17), bhatnag@mcmaster.ca
Bradley C Johnston (1), bjohnston@med.ualberta.ca
Li Wang (2), wangli_74@hotmail.com
Stephen D Walter (1), walter@mcmaster.ca
Douglas G Altman (18), doug.altman@csm.ox.ac.uk
Gordon H Guyatt (1,6), guyatt@mcmaster.ca (corresponding author)
Affiliations:
1. Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada
2. Center for Clinical Epidemiology and Evidence-Based Medicine, West China Hospital, Sichuan University, Chengdu, PR China
3. Basel Institute for Clinical Epidemiology and Biostatistics, University Hospital Basel, Basel, Switzerland
4. The Institute for Work & Health, Toronto, Ontario, Canada
5. Departments of Medicine and Family Medicine, State University of New York at Buffalo, NY, USA
6. Department of Medicine, McMaster University, Hamilton, Canada
7. Department of Pulmonary Diseases, Jagiellonian University School of Medicine, Krakow, Poland
8. Department of Internal Medicine, Jagiellonian University School of Medicine, Krakow, Poland
9. University Children's Hospital Tuebingen, Department of Neonatology, Tuebingen, Germany
10. Division of Infectious Diseases & Hospital Epidemiology, University Hospital Basel, Switzerland
11. Section of Thoracic Surgery, Department of Surgery, University of Manitoba, Winnipeg, Manitoba, Canada
12. Norwegian Knowledge Centre for the Health Services, Oslo, Norway
13. Universidad Peruana Cayetano Heredia, Lima, Peru
14. Department of Urology, University of Florida, College of Medicine, Gainesville, Florida, USA
15. Iberoamerican Cochrane Center. Hospital de la Santa Creu i Sant Pau, Barcelona, Spain
16. CIBER de Epidemiología y Salud Pública (CIBERESP), Spain
17. Health Sciences Library, McMaster University, Hamilton, Canada
18. Centre for Statistics in Medicine, University of Oxford, Oxford, UK
Source: Trials 2009, 10(1):101
ISSN: 1745-6215
URL: http://www.trialsjournal.com/content/10/1/101
PubMed ID: 19900273
DOI: 10.1186/1745-6215-10-101
Received: 12 September 2009
Accepted: 9 November 2009
Published: 9 November 2009
Copyright 2009 Sun et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Background
Subgroup analyses in randomized trials examine whether effects of interventions differ between subgroups of study populations according to characteristics of patients or interventions. However, findings from subgroup analyses may be misleading, potentially resulting in suboptimal clinical and health decision making. Few studies have investigated the reporting and conduct of subgroup analyses and a number of important questions remain unanswered. The objectives of this study are: 1) to describe the reporting of subgroup analyses and claims of subgroup effects in randomized controlled trials, 2) to assess study characteristics associated with reporting of subgroup analyses and with claims of subgroup effects, and 3) to examine the analysis, and interpretation of subgroup effects for each study's primary outcome.
Methods
We will conduct a systematic review of 464 randomized controlled human trials published in 2007 in the 118 Core Clinical Journals defined by the National Library of Medicine. We will randomly select journal articles, stratified in a 1:1 ratio by higher impact versus lower impact journals. According to 2007 ISI total citations, we consider the New England Journal of Medicine, JAMA, Lancet, Annals of Internal Medicine, and BMJ as higher impact journals. Teams of two reviewers will independently screen full texts of reports for eligibility, and abstract data, using standardized, pilot-tested extraction forms. We will conduct univariable and multivariable logistic regression analyses to examine the association of pre-specified study characteristics with reporting of subgroup analyses and with claims of subgroup effects for the primary and any other outcomes.
Discussion
A clear understanding of subgroup analyses, as currently conducted and reported in published randomized controlled trials, will reveal both strengths and weaknesses of this practice. Our findings will contribute to a set of recommendations to optimize the conduct and reporting of subgroup analyses, and claim and interpretation of subgroup effects in randomized trials.
Background
The effects of healthcare interventions on the entire study population are of primary interest in clinical trials. It remains appealing, however, for investigators and clinicians to identify differential effects in subgroups based on characteristics of patients or interventions. This analytic approach, termed subgroup analysis, can sometimes be informative but it is often misleading [1-4].
Investigators frequently conduct subgroup analyses exploring multiple hypotheses [5]. Conducting multiple tests is associated with the risk of false positive results due to the play of chance [3]. This risk is particularly great if subgroup analyses are data driven: that is, when investigators perform numerous post hoc subgroup analyses seeking statistical significance. Even when investigators specify a limited number of subgroup analyses a priori, the play of chance may still result in identification of spurious subgroup effects.
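To make the multiplicity risk concrete, the following minimal sketch computes the chance of at least one false positive across several subgroup tests. It assumes the tests are independent and that no true subgroup effect exists. Real subgroup tests within a trial are usually correlated, so the figures are only indicative. The function name and the 0.05 threshold are illustrative choices, not taken from the protocol.

```python
def prob_at_least_one_false_positive(alpha, n_tests):
    """Chance of one or more false positives across n_tests independent
    tests at significance level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative numbers of subgroup tests, each at the 0.05 level.
for n_tests in (1, 5, 10, 20):
    risk = prob_at_least_one_false_positive(0.05, n_tests)
    print(f"{n_tests:2d} tests: {risk:.3f}")
```

Under these assumptions, the risk grows quickly with the number of tests, which is why a small number of pre-specified hypotheses is one of the credibility criteria.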
Sometimes, investigators explore possible subgroup effects by testing the null hypothesis of no treatment effect in each of the relevant subgroups. A claim of subgroup effect is made if a significant effect is observed in one subgroup but not in the other(s) [6,7]. This strategy, however, fails to address the real issue of subgroup analysis: can chance explain the apparent difference between subgroups? This question can be addressed with a formal test of interaction in which the null hypothesis is that the underlying effect across subgroups is the same. In another instance, investigators report and claim the effect in one subgroup of patients while omitting results for the other subgroups. Investigators may also test the difference of effects between groups defined by a characteristic measured after randomization. The apparent difference of effects may, however, be explained by the treatment intervention itself, or by differing prognostic characteristics in subgroups that emerge after randomization, rather than by the subgroup characteristic itself. Therefore, this approach to analyzing subgroups is highly problematic [4,8,9].
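One common form of the interaction test mentioned above compares two subgroup estimates directly. The sketch below is an assumption-laden illustration, not the protocol's prescribed method: it takes two independent subgroup estimates on the log scale (for example, log odds ratios) with their standard errors and applies a normal approximation. The input values are hypothetical.

```python
import math


def interaction_test(log_effect_a, se_a, log_effect_b, se_b):
    """Return (z, two-sided p) for the difference between two independent
    subgroup effect estimates given on the log scale."""
    difference = log_effect_a - log_effect_b
    se_difference = math.sqrt(se_a ** 2 + se_b ** 2)
    z = difference / se_difference
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value


# Hypothetical subgroup odds ratios of 0.5 and 1.0, each with SE 0.2
# on the log scale (illustrative numbers only).
z, p_value = interaction_test(math.log(0.5), 0.2, math.log(1.0), 0.2)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The point of the design is that the p-value refers to the difference between subgroups, not to whether each subgroup effect differs from zero on its own.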
Many apparent subgroup effects have been proven to be spurious [10]. Misleading subgroup effects can result in withholding efficacious treatment from patients who would benefit, or encourage ineffective or potentially harmful treatments for subgroups who would fare better without. It is, therefore, imperative to critically assess the validity of claimed subgroup effects. One approach is to use seven previously proposed criteria for determining whether apparent differences in subgroup response are likely to be real [11]. These criteria have been widely used to evaluate subgroup analyses in randomized controlled trials (RCTs) and meta-analyses [12-15]. Several new criteria may further facilitate differentiation between spurious and real subgroup effects (Appendix 1).
A limited number of empirical studies have evaluated how trialists conduct and report subgroup analyses, and have revealed several weaknesses (Table 1) [16-21]. Weaknesses include the use of an excessive number of variables and outcomes, inappropriate statistical methods, and insufficient a priori specification of variables. A review of subgroup analyses reported in cardiovascular trials [17], for instance, identified one study that reported 23 subgroup variables and 17 outcomes. In another review of 27 surgical trials [16], a test of interaction was reported for only 5.8% (3/54) of subgroup hypotheses tested, whereas 72.2% (39/54) claimed subgroup effects. Across six reviews of subgroup analyses, the prevalence of trials claiming at least one subgroup effect ranged from 25% to 60% [16-20]. Two studies, one [18] restricted to trials published in the New England Journal of Medicine and another [17] restricted to moderate or large sized cardiovascular trials, found that larger sample size was the only study characteristic statistically associated with reporting of subgroup analyses.
Table 1. Characteristics of six studies reviewing subgroup analyses in randomized trials
Study ID | Trial area | Source of study | Number of trials | Trial feature for eligibility criteria
Wang (2007) | Multiple | NEJM (July 2005 to June 2006) | 97 (59 reporting subgroup analyses) | No restrictions
Bhandari (2006) | Surgical | Two surgical journals plus NEJM, JAMA, BMJ, and Lancet (Jan 2000 to Apr 2003) | 72 (27 reporting subgroup analyses) | No restriction on size and other trial characteristics
Hernandez (2006) | Cardiovascular | Four cardiovascular journals plus "Top Five" (2002 and 2004) | 63 (39 reporting subgroup analyses) | Phase 3 parallel trials, n ≥ 100, superiority trials; restricted to main reports
Hernandez (2005) | Traumatic brain | MEDLINE (1966 to Apr 2004), EMBASE (1978 to Apr 2004), CENTRAL (Apr 2004) | 18 (11 reporting subgroup analyses) | Phase 3, parallel trials, n ≥ 50 per arm; Glasgow Outcome Scale (GOS) at 3 months as outcome
Moreira Jr (2001) | Multiple | NEJM, JAMA, Lancet, American Journal of Public Health (July 1998) | 32 (17 reporting subgroup analyses) | No restrictions mentioned
Assmann (2000) | Multiple | NEJM, JAMA, BMJ, and Lancet (July to Sep 1997) | 50 (35 reporting subgroup analyses) | No crossover and cluster trials, n ≥ 50
Despite the merits of these studies, each of them examined only a relatively small number of trials (median 57, range 11-97). None compared the reporting of subgroup analyses in higher impact journals versus other journals; none examined the reporting of subgroup analyses in relation to type of outcomes (e.g. continuous, binary, time-to-event, count, or multinomial); and none specifically examined subgroup analysis reporting for the primary outcome. In addition, none of the previous reviews documented the magnitude of the apparent subgroup effects and magnitude of p-values of interaction tests; none investigated the validity of claimed subgroup effects; none investigated study characteristics associated with claim of subgroup effects; and none addressed the credibility of the claimed subgroup effects.
These shortcomings limit the generalizability of findings and leave important questions unanswered. Therefore, we will conduct a systematic review of RCTs to further inform the current use and reporting of subgroup analyses.
In this study, we have three main objectives. The first is to describe the reporting of subgroup analyses and claim of subgroup effects. The second is to assess study characteristics associated with reporting of subgroup analyses, and study characteristics associated with claim of subgroup effects, both for the primary outcome and for any outcome. The third objective is to examine the analysis and interpretation of subgroup effects conducted for the primary outcome.
Methods
Study Design Overview
We will conduct a systematic review of RCTs conducted in humans and published in 2007 in the Core Clinical Journals defined by the National Library of Medicine (http://www.nlm.nih.gov/bsd/aim.html). To maximize the generalizability of study findings, we will include parallel, cross-over, and factorial randomized trials, and both individual and cluster randomized trials. Unless the authors report findings to the contrary, we will assume no treatment-by-treatment interaction in factorial studies, no treatment-by-sequence interaction in cross-over studies, and no treatment-by-cluster interactions in cluster-randomized studies. We will use the standard methodology for conducting systematic reviews [22].
Definition of Subgroup, Subgroup Analysis, and Subgroup Effect
For this study, we define a subgroup as a subset of a trial population that is identified on the basis of a patient or intervention characteristic that is either measured at baseline or after randomization.
We define a subgroup analysis as a statistical analysis that explores whether effects of the intervention (i.e. experimental versus control) differ according to status of a subgroup variable. This includes a case in which investigators report a main result and analyze only a subset of patients.
We define a subgroup effect as a difference in the magnitude of a treatment effect across subgroups of a study population. The null hypothesis for a test of a subgroup effect (i.e. subgroup hypothesis) is that there is no difference in the magnitude of a treatment effect across subgroups. We will consider both absolute and relative effect measures in our study.
Eligibility Criteria
The inclusion criteria are:
1) The study is an RCT;
2) The participants are human;
3) The study is published in 2007 in a core clinical journal (as defined by the National Library of Medicine).
The exclusion criteria are:
1) The report does not include the entire population enrolled in the original study (i.e. the report focuses on a subset of the original study population);
2) The study is explicitly labelled as a phase I trial;
3) The study is exclusively a pharmacokinetic study;
4) The study is reported as a Research Letter.
No restrictions apply with respect to the following aspects:
• Trial design (i.e., parallel, factorial or cross-over);
• Number of trial arms (i.e., two or more);
• Unit of randomization (i.e., individual patient or cluster);
• Type of outcome (i.e., continuous, binary, time-to-event, count, or multinomial);
• Type of trial (i.e., superiority, non-inferiority or equivalence trial);
• Type of report (i.e., main report, longer follow-up report, or interim report);
• Subgroup variables measured at baseline versus after randomization;
• Sample size, length of follow up, and loss to follow up;
• Statistical significance versus non-significance of overall main effects.
Literature Search
We will search for RCTs published in the Core Clinical Journals in 2007. This group of journals is defined by the National Library of Medicine, includes a total of 118 journals covering all specialities of clinical medicine and public health sciences, and is known as the Abridged Index Medicus. We will run the Medline search using the OVID platform and a search strategy (Appendix 2) developed with the help of an experienced librarian.
Random Sampling of Citations
We will stratify the Core Clinical Journals into higher and lower impact journals. For this study we define higher impact journals as the five journals with the highest total citations in 2007: the New England Journal of Medicine, JAMA, Lancet, Annals of Internal Medicine, and BMJ. Lower impact journals consist of the remaining Core Clinical Journals. We will randomly sample the journal articles, with 1:1 stratification by journal type (i.e. higher and lower impact). We will continue the random sampling process until the number of eligible studies meets our required sample size.
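As an illustration only, the stratified sampling procedure described above could be sketched as follows. The function name `sample_until_target`, the fixed seed, the equal target per stratum (our reading of 1:1 stratification), and the toy eligibility rule are all assumptions made for this sketch and are not part of the protocol; in the actual study, eligibility is judged by duplicate screening.

```python
import random


def sample_until_target(citations_by_stratum, is_eligible, target_per_stratum, seed=2007):
    """Screen citations in random order within each stratum until the target is met.

    Hypothetical helper: `is_eligible` stands in for the duplicate screening step.
    """
    rng = random.Random(seed)
    selected = {}
    for stratum, citations in citations_by_stratum.items():
        shuffled = list(citations)
        rng.shuffle(shuffled)  # random order, sampling without replacement
        eligible = []
        for citation in shuffled:
            if len(eligible) == target_per_stratum:
                break  # stop once this stratum has enough eligible trials
            if is_eligible(citation):
                eligible.append(citation)
        selected[stratum] = eligible
    return selected


# Toy citation pool; the IDs and the eligibility rule are invented for the demo.
pool = {
    "higher_impact": [f"H{i}" for i in range(50)],
    "lower_impact": [f"L{i}" for i in range(200)],
}
sample = sample_until_target(pool, lambda c: int(c[1:]) % 3 != 0, target_per_stratum=10)
```

Sampling within each stratum separately keeps the two journal types equally represented even though the lower impact pool is much larger.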
Review process
Teams of two trained reviewers will perform citation and full text screening and data abstraction, in duplicate and independently, including selection of the primary outcome (using pre-specified criteria; see below) and selection of the pair-wise comparison for analysis (if there are three or more arms). Each team will attempt to resolve discrepancies by consensus or, if discrepancy remains, through discussion with one of two arbitrators (XS, GHG). The arbitrator will independently review the trial report before discussing it with the reviewers. Before the review formally starts, we will conduct calibration exercises to ensure consistency across reviewers. We will use electronic forms, developed with Microsoft Access and Excel, for study screening and data extraction. The forms will be standardized and pilot-tested, and detailed written instructions will be developed to assist with study screening and data extraction.
Study Screening
Two reviewers will independently screen the title and abstract of each randomly chosen citation for potential eligibility. In the title and abstract screening, they will judge only if the study is a randomized controlled trial enrolling human participants. Two reviewers will then independently screen the full text of the potentially eligible trials to determine eligibility.
At the full text screening stage, the reviewers will select a primary outcome for eligible studies, using the following strategy. If the report specifies a single primary outcome, we will select it. If the report specifies more than one primary outcome (i.e. co-primary outcomes), we will select the one with the largest number of subgroup analyses; if outcomes have the same number of subgroup analyses, we will select the one with the greatest relevance to patients according to a pre-defined outcome hierarchy (Appendix 3); and if more than one outcome is in the same category, we will take the outcome reported first in the abstract. If the report does not specify a primary outcome, we will select the outcome used for the study sample size calculation, but if there is no sample size calculation reported or if there is a sample size calculation for several outcomes, we will proceed as detailed in the previous sentence.
Reviewers will also identify a pair-wise comparison of interest, using the following strategy. If there are only two groups, we will use them for the pair-wise comparison. If there are three or more groups, we will select the comparison that was clearly and explicitly defined as the primary comparison in the study report; if the primary comparison was not explicitly defined, we will select the comparison that reports the largest number of subgroup analyses for the selected primary outcome; if more than one comparison reported the same largest number of subgroup analyses, we will select the comparison that reports the smallest interaction p value; if the interaction p value is not available, we will select the one that has the smallest p value for the main effect.
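The decision rules for choosing the pair-wise comparison can be read as an ordered sequence of tie-breakers. The sketch below is one possible encoding, offered for illustration only: the dictionary keys (`is_primary`, `n_subgroup_analyses`, `interaction_p`, `main_effect_p`) and the function name are hypothetical, and we assume that a comparison with a missing interaction p value is simply skipped at that step whenever another tied comparison does report one.

```python
def select_comparison(comparisons):
    """Apply the tie-breaking rules in order and return one comparison.

    Each comparison is a dict with hypothetical keys invented for this sketch.
    """
    if len(comparisons) == 1:
        return comparisons[0]  # two-arm trial: only one comparison exists

    # Rule 1: a comparison explicitly defined as primary in the report.
    primary = [c for c in comparisons if c.get("is_primary")]
    if primary:
        return primary[0]

    # Rule 2: the comparison with the most subgroup analyses.
    most = max(c["n_subgroup_analyses"] for c in comparisons)
    tied = [c for c in comparisons if c["n_subgroup_analyses"] == most]
    if len(tied) == 1:
        return tied[0]

    # Rule 3: the smallest reported interaction p value among tied comparisons.
    with_interaction = [c for c in tied if c.get("interaction_p") is not None]
    if with_interaction:
        return min(with_interaction, key=lambda c: c["interaction_p"])

    # Rule 4: otherwise the smallest p value for the main effect.
    return min(tied, key=lambda c: c.get("main_effect_p", float("inf")))


# Invented three-arm example.
comparisons = [
    {"label": "A vs placebo", "is_primary": False, "n_subgroup_analyses": 4,
     "interaction_p": 0.20, "main_effect_p": 0.01},
    {"label": "B vs placebo", "is_primary": False, "n_subgroup_analyses": 4,
     "interaction_p": 0.03, "main_effect_p": 0.04},
    {"label": "A vs B", "is_primary": False, "n_subgroup_analyses": 2,
     "interaction_p": None, "main_effect_p": 0.50},
]
chosen = select_comparison(comparisons)
```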
Data Abstraction
Study Characteristics
We will extract information on funding sources, clinical area, type of intervention, trial design (parallel, cross-over, or factorial), trial type (superiority, non-inferiority, or equivalence), unit of randomization (randomization at individual or cluster level), methodological characteristics of trials (allocation concealment; blinding of patients, healthcare givers, data collectors, outcome adjudicators, or data analysts; stopping trials early for benefit), number of participants randomized for the selected comparison, and total number of participants randomized.
We will categorise the selected primary outcome, according to whether it is a composite endpoint, whether the results are statistically significant, and the type of outcome variable (time-to-event, binary, continuous, count, or multinomial). We will record the type of effect measure for the selected primary outcome. If more than one effect measure is used for binary, time-to-event, or count outcomes, we will use a hierarchical approach to select an effect measure, as follows:
• Select the effect measure that the investigators clearly indicated as the effect measure for the primary analysis;
• Select the effect measure on which the subgroup analysis is reported and a subgroup effect is claimed;
• Select the measure that yields the smallest reported p-value of the main effect;
• Otherwise, use the following order for binary outcomes: risk ratio, odds ratio, relative risk reduction, risk difference; and the following for time-to-event outcomes: hazard ratio, incidence rate ratio, ratio of cumulative incidence, ratio of time, difference in incidence rate, difference in cumulative incidence, difference in time.
If no effect measure is reported but data for a 2 × 2 table are available for the primary outcome, we will calculate risk ratios.
For binary, time-to-event, and count primary outcomes, we will document their point estimates and 95% confidence intervals for the main effects, as well as, whenever possible, the events and number of patients in a 2 × 2 table. For continuous outcomes, we will document the number of patients analyzed in the experimental and control groups, and the summary measure (i.e. mean or median) and associated measure of precision (i.e. inter-quartile range, 95% confidence interval, standard deviation, or standard error). We will not document the magnitude of the main effect for multinomial primary outcomes.
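For readers who want to see the arithmetic, a risk ratio can be derived from 2 × 2 data as in the minimal sketch below. The protocol states only that risk ratios will be calculated; the confidence interval shown here uses the common large-sample method on the log scale, which is our assumption, and the function name and example counts are invented.

```python
import math


def risk_ratio(events_exp, n_exp, events_ctl, n_ctl, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale.

    Assumes no zero cells; the CI method is an illustrative choice.
    """
    rr = (events_exp / n_exp) / (events_ctl / n_ctl)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(1 / events_exp - 1 / n_exp + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Invented data: 15/100 events with the experimental intervention, 30/100 with control.
rr, lower, upper = risk_ratio(15, 100, 30, 100)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these invented counts the risk ratio is 0.50 and the interval excludes 1.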
Reporting of subgroup analyses
We will record whether trials report subgroup analyses for any outcomes (i.e. primary or secondary), the number of outcomes for which subgroup analyses are reported, the type of outcomes, the number of subgroup variables reported in the trial report, the number of subgroup analyses that were most likely conducted, the number of subgroup analyses reported, whether any subgroup analysis was specified a priori, and whether any subgroup effect was stated to have been analyzed by a test of interaction. We will also document the above information specifically for the primary outcome.
We will consider that a subgroup analysis has been reported if: 1) the investigators report a point estimate and an associated confidence interval or a p-value for one or more subgroups of the original study population, 2) the investigators report the magnitude of difference in the effect according to status of a subgroup variable, 3) the investigators report results from an interaction test, or 4) the investigators explicitly state that they conducted subgroup analyses but do not report any of the data mentioned above.
Claim of subgroup effects
We will record whether trials claim a subgroup effect for any outcomes (i.e. primary or secondary outcome), number of subgroup effects claimed in the trial report, and type of outcomes used for the claim. We will judge the strength of the claim based on the inferences drawn by the investigators in the abstract or discussion section. We will also document the above information specifically for the primary outcome.
We will consider that a subgroup effect is claimed if, in the abstract or discussion of the trial report, the investigators state that the effects of intervention differed, or may have differed, according to status of a subgroup variable.
We will classify the strength of a claim according to four categories, defined as follows:
1) Strong claim of a definitive effect: The authors convey a conviction that the subgroup effect truly exists.
2) Claim of a likely effect: The authors convey a belief that the subgroup effect likely exists.
3) Suggestion of a possible effect: The authors suggest a subgroup effect and convey an uncertainty whether the subgroup effect exists.
4) No claim of a subgroup effect: The authors do not make a claim of a subgroup effect.
We have developed explicit criteria to judge the strength of claim (Table 2).
Table 2. Criteria for judging the strength of a subgroup claim
| Criteria | Strong claim | Claim of a likely effect | Suggestion of a possible effect |
| --- | --- | --- | --- |
| 1. Did the investigators claim the effect in the abstract? | Yes | Possible | No |
| 2. Did the investigators claim the effect in the conclusion of the abstract? | Possible* | No | No |
| 3. Did the investigators claim the effect in the discussion? | Yes | Possible | Yes |
| 4. Did the investigators use descriptive words (e.g. appear/seem to be, may, and might) to soften their statements of the claims? | No | Possible | Possible |
| 5. Did the investigators use descriptive words (e.g. particular and special) to strengthen the statement of the claims? | Possible | No | No |
| 6. Were the authors obviously cautious about the apparent subgroup effect (e.g. they stated the subgroup effect did not meet some of the important criteria to believe a subgroup effect)? | No | Some caution possible | Yes |
| 7. Did the investigators indicate the apparent effects need to be explored in future studies (i.e. hypothesis generating)? | No | Possible (say desirable to confirm) | Yes |
* If a claim appears in the conclusion section of the abstract, it is considered a strong claim.
Analysis of subgroup effect for the primary outcome
We will document, for each subgroup analysis, whether the subgroup variable is a baseline characteristic or based on an after-randomization event, whether the investigators specified the variable a priori, whether the investigators specified the direction a priori, whether the subgroup variable was used as a stratification factor in randomization, the type of tests used for analyzing subgroup effects (test of significance of individual groups, interaction test, or both), the statistical approaches used for a test of interaction, and the methods of adjusting for multiple interaction effects.
We will also document, whenever possible, the 2 × 2 data, the reported point estimate, 95% confidence interval, and p-value of the effect of each subgroup, as well as the reported p-value of the interaction test.
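To make the distinction between testing individual subgroups and testing an interaction concrete, the sketch below compares the log risk ratios of two subgroups with a z-test. This is one generic way an interaction p-value can be obtained from 2 × 2 data and is shown for illustration only; it is not a method prescribed by this protocol, and the function names and counts are invented.

```python
import math


def log_rr_and_se(events_exp, n_exp, events_ctl, n_ctl):
    """Log risk ratio and its large-sample standard error for one subgroup."""
    log_rr = math.log((events_exp / n_exp) / (events_ctl / n_ctl))
    se = math.sqrt(1 / events_exp - 1 / n_exp + 1 / events_ctl - 1 / n_ctl)
    return log_rr, se


def interaction_p_value(subgroup_a, subgroup_b):
    """Two-sided p-value for the difference between two subgroup log risk ratios."""
    log_rr_a, se_a = log_rr_and_se(*subgroup_a)
    log_rr_b, se_b = log_rr_and_se(*subgroup_b)
    z = (log_rr_a - log_rr_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


# Invented counts: (events_exp, n_exp, events_ctl, n_ctl) in each subgroup.
younger = (10, 100, 30, 100)
older = (30, 100, 30, 100)
p_interaction = interaction_p_value(younger, older)
```

The null hypothesis here matches the subgroup hypothesis defined earlier: no difference in the magnitude of the treatment effect across subgroups.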
Interpretation of claimed subgroup effect for the primary outcome
For each of the claimed subgroup effects, we will further document whether the authors provided a supportive biological rationale or cited external evidence that is consistent with the observed subgroup effect, whether the authors indicated that the pre-specified direction was correct, and whether they indicated that the observed subgroup effect was consistent across closely related outcomes.
Sample Size
We conducted a pilot study including 139 randomized trials. The results showed that 62 (44.6%) trials reported subgroup analyses for any outcome, and 41 (29.5%) reported them for the primary outcome; 27 (19.4%) trials claimed a subgroup effect for any outcome, and 18 (12.9%) claimed one for the primary outcome.
We calculate the sample size based on the examination of study characteristics associated with claim of subgroup effects for any outcome. In our regression of study characteristics with claim of subgroup effects, we will include 6 study characteristics, a total of 9 categories of variables. We will require 10 events (i.e. claim of subgroup effect) per category to examine the association, resulting in a total of 90 events (and at least 90 total non-events). Given the results of the pilot study, we will require a total of 464 trials for this study.
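The arithmetic behind the required sample size can be reproduced in a few lines. The variable names are ours; the inputs are the figures reported above (9 categories, 10 events per category, and 27 claims among 139 pilot trials).

```python
import math

categories = 9             # categories of variables in the regression
events_per_category = 10   # events required per category
pilot_claims = 27          # pilot trials claiming a subgroup effect for any outcome
pilot_trials = 139         # pilot trials in total

required_events = categories * events_per_category   # 90 claims needed
claim_rate = pilot_claims / pilot_trials              # about 19.4%
required_trials = math.ceil(required_events / claim_rate)

print(required_events, round(claim_rate, 3), required_trials)
```

Dividing the 90 required events by the pilot claim rate and rounding up gives 464 trials, which also leaves well over 90 non-events.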
Statistical Analysis
We will assess agreement between reviewers for study inclusion at the full text screening stage, and for reviewers' judgments of whether the investigators reported a subgroup analysis, claimed a subgroup effect, pre-specified the subgroup hypothesis, or used the interaction test. We will calculate both crude agreement and chance-corrected agreement. We will interpret the agreement statistics using the guidelines proposed by Landis and Koch [23]: kappa values of 0 to 0.20 represent slight agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 substantial agreement, and greater than 0.80 almost perfect agreement.
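A minimal sketch of the agreement calculation follows, assuming Cohen's kappa for two reviewers as the chance-corrected statistic (the protocol does not name a specific kappa variant). The function names and the ten example judgments are invented, and values below 0 are flagged as outside the ranges quoted above.

```python
def agreement(ratings_a, ratings_b):
    """Return (crude agreement, Cohen's kappa) for two reviewers' judgments."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    labels = set(ratings_a) | set(ratings_b)
    # Agreement expected by chance, from each reviewer's marginal proportions.
    expected = sum(
        (ratings_a.count(label) / n) * (ratings_b.count(label) / n) for label in labels
    )
    return observed, (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Map a kappa value to the bands quoted in the text."""
    if kappa < 0:
        return "below 0 (outside the quoted ranges)"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Invented judgments: did the trial claim a subgroup effect?
reviewer_1 = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"]
reviewer_2 = ["yes", "yes", "yes", "no", "yes", "no", "no", "no", "no", "no"]
crude, kappa = agreement(reviewer_1, reviewer_2)
band = landis_koch(kappa)
```

In this invented example the reviewers agree on 8 of 10 trials, but kappa is lower than the crude agreement because some of that agreement is expected by chance.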
We will calculate the proportions of trials reporting at least one subgroup analysis for the primary outcome and for any outcome. Treating the reporting of a subgroup analysis as the dependent variable, we will conduct univariable and multivariable logistic regression analyses to examine its association with the pre-specified study characteristics for both the primary outcome and for any outcome.
We will also calculate the proportions of trials claiming a subgroup effect for the primary outcome and for any outcome in trials that report a subgroup analysis, and conduct univariable and multivariable logistic regression analyses to examine the association of pre-specified study characteristics with claim of a subgroup effect for the primary outcome and for any outcome.
Our pre-specified study characteristics for the regression analyses are: average sample size per study arm, journal type (higher vs. lower impact journals), source of funding (partially or completely funded by a private for-profit organization vs. others), statistical significance of the main effect, trial area (medical vs. surgical), number of pre-specified primary outcomes (used for the regression of reporting of subgroup analyses only), and number of subgroup analyses (used for the regression of claim of subgroup effects only). We hypothesize that trials are more likely to report subgroup analyses or claim a subgroup effect if they have a larger sample size, are published in higher impact journals, receive funding from for-profit organizations, do not achieve statistical significance for the main effect, investigate medical versus surgical interventions, have more pre-specified primary outcomes, and report a larger number of subgroup analyses. In the multiple logistic regression analysis for reporting of subgroup analysis, we will also examine the interaction of source of funding and significance of main effect.
We will describe the details of reporting of subgroup analyses and claim of subgroup effects for both any outcome and specifically for the primary outcome. If a variable, in both univariable and multivariable analyses, is found to be significantly associated with reporting of a subgroup analysis and/or claim of a subgroup effect, we will also present the above information stratified by the type of journal.
We will describe the details of analysis of subgroup effects for the primary outcome by journal type (i.e. five highest impact journals versus other journals), and by claim versus no claim of a subgroup effect. We will also describe the details of interpretation of claimed subgroup effects by journal type.
Discussion
Our study is designed to comprehensively address the analysis, reporting, and claim of subgroup effects in a representative sample of recent RCTs. This study protocol follows the publication of two other protocols [24,25], reflecting our continuing efforts to make the objectives and design of methodological studies more transparent.
Strengths and limitations
Our study has several strengths. First, we will employ rigorous systematic review methods including explicit and reproducible eligibility criteria, sensitive search strategies, and the use of standardized, pilot-tested forms accompanied by written instructions for study screening and data extraction. Teams of two trained reviewers will independently and in duplicate conduct study screening. We will also undertake calibration exercises and pilot data extraction to enhance consistency between reviewers before embarking on data abstraction. Second, our eligibility criteria are broad, and compared to the previous empirical studies, our study findings will be more generalizable. Third, we conducted a pilot study to calculate the required sample size for the definitive study. Finally, our study will be the largest empirical study of subgroup analyses, which will allow us to reliably address a number of important questions that have not been addressed by existing reviews.
Our study also has several limitations. It will be based on reported trial information, and our findings may be vulnerable to underreporting or selective reporting [26]. The limited space allowed by medical journals for reporting on trials may prevent authors from sufficiently reporting relevant information on subgroup analyses. Consequently, the proportion of trials reporting subgroup analyses is probably smaller than the proportion of trials actually conducting subgroup analyses, and the number of subgroup analyses reported in each trial is probably smaller than the actual number of conducted subgroup analyses. In relation to this problem, we will also estimate the number of subgroup analyses that were most likely conducted. Similarly, other details about subgroup analyses, such as a priori specification of the subgroup hypothesis and direction, may also be under-reported.
Our study does not include all medical journals, and our findings may not be applicable to journals outside our sample. Our study, however, includes many more journals than the previous studies that typically included high impact journals or specialty journals only. We chose the Core Clinical Journals because they cover all clinical and public health areas, and include all major medical journals. We consider that the quality of studies in these journals will be no worse than that in other journals, and expect that the quality of subgroup analyses reported in other journals will be no better than that in the Core Clinical Journals.
Our study will involve reviewers' judgement of the strength of the claim of subgroup effect, and the determination of strength may be subjective and vary across reviewers. We have developed detailed written instructions to assist reviewers in judging the strength, and will check the inter-reviewer agreement.
Implications of this study
Although a few empirical studies restricted to certain disease areas or journal type have found a significant association between sample size and reporting of subgroup analyses, factors that drive reporting and claiming of subgroup effects in a more representative set of trials remain uncertain. The results of this study will provide robust, generalizable, and reliable evidence on the factors that impact reporting and claiming of subgroup effects.
Considerable work, including methodological advocacy [3,27-31] and empirical investigation [5,18,19], has been done to inform the conduct of subgroup analyses. However, few reports have systematically developed the framework of analysis, reporting, claim, and interpretation of subgroup effects. The findings of this study will further aid in the development of recommendations for adequate reporting, and appropriate analysis, claim, and interpretation of subgroup effects.
Claimed subgroup effects are of primary interest to clinicians, investigators and other users. Claims of spurious subgroup effects can distort clinical practice and public health decision making, with serious consequences for patients and unnecessary expenditures. Methodological safeguards have been proposed to protect from spurious subgroup findings [4,10,30], but empirical evidence of their validity is limited. The results of this study will reveal the extent to which the investigators considered methodological safeguards in their claims, and provide some evidence regarding the extent to which claims of subgroup effects are valid.
The findings of the SATIRE study may influence recommendations on reporting, conduct, claim, and interpretation of subgroup analyses. These will be of particular interest to the stakeholders that have direct influence on trial design, analysis, and reporting, including investigators, health decision makers, guideline developers, funding agencies, and medical journal editors.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
XS and GHG conceptualized the study. All authors contributed to design of the study and read and approved the manuscript. XS developed the first draft of the manuscript and incorporated comments from authors for successive drafts.
Appendices
Appendix 1. The eleven criteria for assessing credibility of claimed subgroup effects
• Is the subgroup variable a characteristic at randomization? (new)
• Is the effect suggested by comparisons within rather than between studies?
• Does the interaction test suggest a low likelihood that chance explains the apparent subgroup effect?
• Is the significant interaction effect independent of other potential subgroup effects? (new)
• Was the hypothesis specified a priori?
• Was the correct direction of subgroup effect specified a priori? (new)
• Was the subgroup effect one of a small number of hypothesized effects tested?
• Is the magnitude of the subgroup effect large?
• Is the interaction consistent across studies?
• Is the interaction consistent across closely related outcomes within the study? (new)
• Is there indirect evidence that supports the hypothesized interaction?
The new criteria, italicized in the original publication, are marked "(new)".
Appendix 2: Search strategy
1. exp Randomized Controlled Trials/
2. (randomized controlled trial$ or randomised controlled trial$).mp. [mp = title, original title, abstract, name of substance word, subject heading word]
3. (randomized trial$ or randomised trial$).mp. [mp = title, original title, abstract, name of substance word, subject heading word]
4. (randomized clinical trial$ or randomised clinical trial$).mp. [mp = title, original title, abstract, name of substance word, subject heading word]
5. 1 or 2 or 3 or 4
6. limit 5 to (English language and humans and "core clinical journals (aim)" and yr="2007")
Appendix 3: Hierarchy of outcomes
I. Mortality
   1) all cause mortality
   2) disease specific mortality
II. Morbidity
   1) cardiovascular major morbid events
   2) other major morbid events (e.g. loss of vision, seizures, fracture, revascularization)
   3) recurrence/relapse/remission of cancer/disease free survival
   4) renal failure requiring dialysis
   5) hospitalizations
   6) infections
   7) dermatological/rheumatologic disorders
III. Symptoms/Quality of life/Functional status (e.g. failure to become pregnant, successful nursing/breastfeeding, depression)
IV. Surrogate outcomes (e.g. viral load, physical activity, post operative atrial fibrillation)
Acknowledgements
We thank Monica Owen for administrative assistance. We thank Aravin Duraik for developing the study electronic forms. The study is partially supported by the National Natural Science Foundation of China (NSFC, 70703025). The funder had no role in the study design, in the writing of the manuscript, or in the decision to submit this or future manuscripts for publication. Xin Sun is supported by two research scholarships from the National Natural Science Foundation of China (70503021, 70703025). Matthias Briel was supported by a scholarship from the Swiss National Science Foundation (PASMA-1129511) and the Roche Research Foundation. Dominik Mertz was partially supported by a research scholarship from the Swiss National Science Foundation (PBBSP3-124436). Jason Busse is funded by a New Investigator Award from the Canadian Institutes of Health Research and Canadian Chiropractic Research Foundation.
References
1. Fletcher J: Subgroup analyses: how to avoid being misled. BMJ 2007, 335:96-97. doi:10.1136/bmj.39265.596262.AD. PMID 17626964. PMCID 1914513.
2. Oxman AD, Guyatt GH: A consumer's guide to subgroup analyses. Ann Intern Med 1992, 116:78-84. PMID 1530753.
3. Schulz KF, Grimes DA: Multiplicity in randomised trials II: subgroup and interim analyses. Lancet 2005, 365:1657-61. doi:10.1016/S0140-6736(05)66516-6. PMID 15885299.
4. Yusuf S, Wittes J, Probstfield J, et al: Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA 1991, 266:93-98. doi:10.1001/jama.266.1.93. PMID 2046134.
5. Pocock SJ, Hughes MD, Lee RJ: Statistical problems in the reporting of clinical trials. A survey of three medical journals. N Engl J Med 1987, 317:426-32. PMID 3614286.
6. Barnett HJM, Taylor DW, Eliasziw M, et al: Benefit of Carotid Endarterectomy in Patients with Symptomatic Moderate or Severe Stenosis. N Engl J Med 1998, 339:1415-1425. doi:10.1056/NEJM199811123392002. PMID 9811916.
7. Weisberg LA, Ticlopidine Aspirin Stroke Study G: The efficacy and safety of ticlopidine and aspirin in non-whites: Analysis of a patient subgroup from the Ticlopidine Aspirin Stroke Study. Neurology 1993, 43:27. PMID 8423906.
8. van Walraven C, Davis D, Forster AJ, et al: Time-dependent bias was common in survival analyses published in leading clinical journals. J Clin Epidemiol 2004, 57:672-82. doi:10.1016/j.jclinepi.2003.12.008. PMID 15358395.
9. Hirji K, Fagerland M: Outcome based subgroup analysis: a neglected concern. Trials 2009, 10:33. doi:10.1186/1745-6215-10-33. PMID 19454041. PMCID 2693510.
10. Guyatt G, Wyer PC, Ioannidis J: When to Believe a Subgroup Analysis. In User's Guide to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Edited by Guyatt G, et al. AMA: Chicago; 2008:571-583.
11. Oxman A, Guyatt G, Green L, et al: When to believe a subgroup analysis. In Users' guides to the medical literature. A manual for evidence-based clinical practice. Edited by Guyatt G, Rennie D. Chicago, IL: AMA Press; 2002:553-65.
12. Hatala R, Keitz S, Wyer P, et al: Tips for learners of evidence-based medicine: 4. Assessing heterogeneity of primary studies in systematic reviews and whether to combine their results. CMAJ 2005, 172:661-665. PMID 15738493. PMCID 550638.
13. Montori VM, Jaeschke R, Schunemann HJ, et al: Users' guide to detecting misleading claims in clinical research reports. BMJ 2004, 329:1093-1096. doi:10.1136/bmj.329.7474.1093. PMID 15528623. PMCID 526126.
14. Trevor A, Sheldon GGAH: Criteria for the Implementation of Research Evidence in Policy and Practice. In Getting Research Findings Into Practice. Second edition. Edited by Andrew Haines AD. 2008:11-18.
15. Martin CM, Guyatt G, Montori VM: The sirens are singing: the perils of trusting trials stopped early and subgroup analyses. Crit Care Med 2005, 33:1870-1. doi:10.1097/01.CCM.0000174484.77537.F2. PMID 16096474.
16. Bhandari M, Devereaux PJ, Li P, et al: Misuse of baseline comparison tests and subgroup analyses in surgical trials. Clin Orthop Relat Res 2006, 447:247-51. doi:10.1097/01.blo.0000218736.23506.fe. PMID 16672904.
bibl id="B17"
title
pSubgroup analyses in therapeutic cardiovascular clinical trials: are most of them misleadingp
title
aug
au
snmHernandezsnm
fnmAVfnm
au
au
snmBoersmasnm
fnmEfnm
au
au
snmMurraysnm
fnmGDfnm
au
etal
aug
sourceAm Heart Jsource
pubdate2006pubdate
volume151volume
fpage257fpage
lpage64lpage
xrefbib
pubidlist
pubid idtype="doi"10.1016j.ahj.2005.04.020pubid
pubid idtype="pmpid" link="fulltext"16442886pubid
pubidlist
xrefbib
bibl
bibl id="B18"
title
pStatistics in Medicine -- Reporting of Subgroup Analyses in Clinical Trialsp
title
aug
au
snmWangsnm
fnmRfnm
au
au
snmLagakossnm
fnmSWfnm
au
au
snmWaresnm
fnmJHfnm
au
etal
aug
sourceN Engl J Medsource
pubdate2007pubdate
volume357volume
fpage2189fpage
lpage2194lpage
xrefbib
pubidlist
pubid idtype="doi"10.1056NEJMsr077003pubid
pubid idtype="pmpid" link="fulltext"18032770pubid
pubidlist
xrefbib
bibl
bibl id="B19"
title
pSubgroup analysis and other (mis)uses of baseline data in clinical trialsp
title
aug
au
snmAssmannsnm
fnmSFfnm
au
au
snmPococksnm
fnmSJfnm
au
au
snmEnossnm
fnmLEfnm
au
etal
aug
sourceLancetsource
pubdate2000pubdate
volume355volume
fpage1064fpage
lpage9lpage
xrefbib
pubidlist
pubid idtype="doi"10.1016S0140-6736(00)02039-0pubid
pubid idtype="pmpid" link="fulltext"10744093pubid
pubidlist
xrefbib
bibl
bibl id="B20"
title
pSubgroup analysis and covariate adjustment in randomized clinical trials of traumatic brain injury: a systematic reviewp
title
aug
au
snmHernandezsnm
fnmAVfnm
au
au
snmSteyerbergsnm
fnmEWfnm
au
au
snmTaylorsnm
fnmGSfnm
au
etal
aug
sourceNeurosurgerysource
pubdate2005pubdate
volume57volume
fpage1244fpage
lpage53lpage
notediscussion 1244-53note
xrefbib
pubidlist
pubid idtype="doi"10.122701.NEU.0000186039.57548.96pubid
pubid idtype="pmpid" link="fulltext"16331173pubid
pubidlist
xrefbib
bibl
bibl id="B21"
title
pReporting on methods of subgroup analysis in clinical trials: a survey of four scientific journalsp
title
aug
au
snmMoreirasnm
fnmEDfnm
sufJrsuf
au
au
snmSteinsnm
fnmZfnm
au
au
snmSussersnm
fnmEfnm
au
aug
sourceBrazilian Journal of Medical and Biological Researchsource
pubdate2001pubdate
volume34volume
fpage1441fpage
lpage1446lpage
bibl
bibl id="B22"
title
pCochrane Handbook for Systematic Reviews of Interventions Version 5.0.1 [updated September 2008]p
title
aug
au
snmHigginssnm
fnmJPTfnm
au
au
snmGsnm
fnmSfnm
au
au
cnm(editors)cnm
au
aug
publisherThe Cochrane Collaborationpublisher
pubdate2008pubdate
bibl
bibl id="B23"
title
pThe measurement of observer agreement for categorical datap
title
aug
au
snmLandissnm
fnmJRfnm
au
au
snmKochsnm
fnmGGfnm
au
au
snmLandissnm
fnmJRfnm
au
etal
aug
sourceBiometricssource
pubdate1977pubdate
volume33volume
fpage159fpage
lpage74lpage
xrefbib
pubidlist
pubid idtype="doi"10.23072529310pubid
pubid idtype="pmpid"843571pubid
pubidlist
xrefbib
bibl
bibl id="B24"
title
pLOST to follow-up Information in Trials (LOST-IT): a protocol on the potential impactp
title
aug
au
snmAklsnm
fnmEAfnm
au
au
snmBrielsnm
fnmMfnm
au
au
snmYousnm
fnmJJfnm
au
etal
aug
sourceTrialssource
pubdate2009pubdate
volume10volume
fpage40fpage
xrefbib
pubidlist
pubid idtype="pmcid"2706244pubid
pubid idtype="pmpid" link="fulltext"19519891pubid
pubid idtype="doi"10.11861745-6215-10-40pubid
pubidlist
xrefbib
bibl
bibl id="B25"
title
pStopping randomized trials early for benefit: a protocol of the Study Of Trial Policy Of Interim Truncation-2 (STOPIT-2)p
title
aug
au
snmBrielsnm
fnmMfnm
au
au
snmLanesnm
fnmMfnm
au
au
snmMontorisnm
fnmVMfnm
au
etal
aug
sourceTrialssource
pubdate2009pubdate
volume10volume
fpage49fpage
xrefbib
pubidlist
pubid idtype="pmcid"2723099pubid
pubid idtype="pmpid" link="fulltext"19580665pubid
pubid idtype="doi"10.11861745-6215-10-49pubid
pubidlist
xrefbib
bibl
bibl id="B26"
title
pDiscrepancies in sample size calculations and data analyses reported in randomised trials: comparison of publications with protocolsp
title
aug
au
snmChansnm
fnmA-Wfnm
au
au
snmHrobjartssonsnm
fnmAfnm
au
au
snmJorgensensnm
fnmKJfnm
au
etal
aug
sourceBMJsource
pubdate2008pubdate
volume337volume
fpagea2299fpage
xrefbib
pubidlist
pubid idtype="pmcid"2600604pubid
pubid idtype="pmpid" link="fulltext"19056791pubid
pubid idtype="doi"10.1136bmj.a2299pubid
pubidlist
xrefbib
bibl
bibl id="B27"
title
pIssues related to subgroup analysis in clinical trialsp
title
aug
au
snmCuisnm
fnmLfnm
au
au
snmHungsnm
fnmHMfnm
au
au
snmWangsnm
fnmSJfnm
au
etal
aug
sourceJ Biopharm Statsource
pubdate2002pubdate
volume12volume
fpage347fpage
lpage58lpage
xrefbib
pubidlist
pubid idtype="doi"10.1081BIP-120014565pubid
pubid idtype="pmpid"12448576pubid
pubidlist
xrefbib
bibl
bibl id="B28"
title
pSubgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practiceand problemsp
title
aug
au
snmPococksnm
fnmSJfnm
au
au
snmAssmannsnm
fnmSEfnm
au
au
snmEnossnm
fnmLEfnm
au
etal
aug
sourceStatistics in Medicinesource
pubdate2002pubdate
volume21volume
fpage2917fpage
lpage2930lpage
xrefbib
pubidlist
pubid idtype="doi"10.1002sim.1296pubid
pubid idtype="pmpid" link="fulltext"12325108pubid
pubidlist
xrefbib
bibl
bibl id="B29"
title
pSubgroup analyses in randomized trials: risks of subgroup-specific analyses; power and sample size for the interaction testp
title
aug
au
snmBrookessnm
fnmSTfnm
au
au
snmWhitelysnm
fnmEfnm
au
au
snmEggersnm
fnmMfnm
au
etal
aug
sourceJ Clin Epidemiolsource
pubdate2004pubdate
volume57volume
fpage229fpage
lpage36lpage
xrefbib
pubidlist
pubid idtype="doi"10.1016j.jclinepi.2003.08.009pubid
pubid idtype="pmpid" link="fulltext"15066682pubid
pubidlist
xrefbib
bibl
bibl id="B30"
title
pTreating individuals 2. Subgroup analysis in randomised controlled trials: importance, indications, and interpretationp
title
aug
au
snmRothwellsnm
fnmPMfnm
au
aug
sourceLancetsource
pubdate2005pubdate
volume365volume
fpage176fpage
lpage86lpage
xrefbib
pubidlist
pubid idtype="doi"10.1016S0140-6736(05)17709-5pubid
pubid idtype="pmpid" link="fulltext"15639301pubid
pubidlist
xrefbib
bibl
bibl id="B31"
title
pThe CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trialsp
title
aug
au
snmMohersnm
fnmDfnm
au
au
snmSchulzsnm
fnmKFfnm
au
au
snmAltmansnm
fnmDGfnm
au
aug
sourceLancetsource
pubdate2001pubdate
volume357volume
fpage1191fpage
lpage1194lpage
xrefbib
pubidlist
pubid idtype="doi"10.1016S0140-6736(00)04337-3pubid
pubid idtype="pmpid" link="fulltext"11323066pubid
pubidlist
xrefbib
bibl
refgrp
bm
art


Full Citation
Permanent Link: http://ufdc.ufl.edu/UF00100283/00001
 Material Information
Title: Subgroup Analysis of Trials Is Rarely Easy (SATIRE): a study protocol for a systematic review to characterize the analysis, reporting, and claim of subgroup effects in randomized trials
Series Title: Trials 2009, 10:101
Physical Description: Archival
Creator: Sun X
Briel M
Busse JW
Akl EA
You JJ
Mejza F
Bala M
Diaz-Granados N
Bassler D
Mertz D
Srinathan SK
Vandvik PO
Malaga G
Alshurafa M
Dahm P
Alonso-Coello P
Heels-Ansdell DM
Bhatnagar N
Johnston BC
Wang L
Walter SD
Altman DG
Guyatt GH
Publication Date: 9 November 2009
 Record Information
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: Open Access: http://www.biomedcentral.com/info/about/openaccess/
System ID: UF00100283:00001

Full Text



Trials BioMed Central



Study protocol

Subgroup Analysis of Trials Is Rarely Easy (SATIRE): a study
protocol for a systematic review to characterize the analysis,
reporting, and claim of subgroup effects in randomized trials
Xin Sun1,2, Matthias Briel1,3, Jason W Busse1,4, Elie A Akl5, John J You1,6,
Filip Mejza7, Malgorzata Bala8, Natalia Diaz-Granados1, Dirk Bassler9,
Dominik Mertz1,10, Sadeesh K Srinathan1,11, Per Olav Vandvik12,
German Malaga13, Mohamed Alshurafa1, Philipp Dahm14, Pablo Alonso-Coello15,16,
Diane M Heels-Ansdell1, Neera Bhatnagar17,
Bradley C Johnston1, Li Wang2, Stephen D Walter1, Douglas G Altman18 and
Gordon H Guyatt*1,6


Address: 1Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada, 2Center for Clinical Epidemiology and
Evidence-Based Medicine, West China Hospital, Sichuan University, Chengdu, PR China, 3Basel Institute for Clinical Epidemiology and
Biostatistics, University Hospital Basel, Basel, Switzerland, 4The Institute for Work & Health, Toronto, Ontario, Canada, 5Departments of Medicine
and Family Medicine, State University of New York at Buffalo, NY, USA, 6Department of Medicine, McMaster University, Hamilton, Canada,
7Department of Pulmonary Diseases, Jagiellonian University School of Medicine, Krakow, Poland, 8Department of Internal Medicine, Jagiellonian
University School of Medicine, Krakow, Poland, 9University Children's Hospital Tuebingen, Department of Neonatology, Tuebingen, Germany,
10Division of Infectious Diseases & Hospital Epidemiology, University Hospital Basel, Switzerland, 11Section of Thoracic Surgery, Department of
Surgery, University of Manitoba, Winnipeg, Manitoba, Canada, 12Norwegian Knowledge Centre for the Health Services, Oslo, Norway,
13Universidad Peruana Cayetano Heredia, Lima, Peru, 14Department of Urology, University of Florida, College of Medicine, Gainesville, Florida,
USA, 15Iberoamerican Cochrane Center. Hospital de la Santa Creu i Sant Pau, Barcelona, Spain, 16CIBER de Epidemiología y Salud Pública
(CIBERESP), Spain, 17Health Sciences Library, McMaster University, Hamilton, Canada and 18Centre for Statistics in Medicine, University of
Oxford, Oxford, UK
Email: Xin Sun sunx26@mcmaster.ca; Matthias Briel MBriel@uhbs.ch; Jason W Busse jbusse@iwh.on.ca; Elie A Akl elieakl@buffalo.edu;
John J You jyou@mcmaster.ca; Filip Mejza filipmejza@mp.pl; Malgorzata Bala gosiabala@mp.pl; Natalia Diaz-
Granados natalia.diaz.granados@utoronto.ca; Dirk Bassler dirk.bassler@med.uni-tuebingen.de; Dominik Mertz DMertz@uhbs.ch;
Sadeesh K Srinathan ssrinathan@gmail.com; Per Olav Vandvik pvandvik@start.no; German Malaga gmalaga0l @gmail.com;
Mohamed Alshurafa alshurm@mcmaster.ca; Philipp Dahm Philipp.Dahm@urology.ufl.edu; Pablo Alonso-Coello PAlonso@santpau.cat;
Diane M Heels-Ansdell ansdell@mcmaster.ca; Neera Bhatnagar bhatnag@mcmaster.ca; Bradley C Johnston bjohnston@med.ualberta.ca;
Li Wang wangli_74@hotmail.com; Stephen D Walter walter@mcmaster.ca; Douglas G Altman doug.altman@csm.ox.ac.uk;
Gordon H Guyatt* guyatt@mcmaster.ca
* Corresponding author



Received: 12 September 2009
Accepted: 9 November 2009
Published: 9 November 2009
Trials 2009, 10:101 doi:10.1186/1745-6215-10-101
This article is available from: http://www.trialsjournal.com/content/10/1/101
© 2009 Sun et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



Abstract
Background: Subgroup analyses in randomized trials examine whether effects of interventions
differ between subgroups of study populations according to characteristics of patients or
interventions. However, findings from subgroup analyses may be misleading, potentially resulting in
suboptimal clinical and health decision making. Few studies have investigated the reporting and
conduct of subgroup analyses and a number of important questions remain unanswered. The
objectives of this study are: 1) to describe the reporting of subgroup analyses and claims of
subgroup effects in randomized controlled trials, 2) to assess study characteristics associated with
reporting of subgroup analyses and with claims of subgroup effects, and 3) to examine the analysis
and interpretation of subgroup effects for each study's primary outcome.
Methods: We will conduct a systematic review of 464 randomized controlled human trials
published in 2007 in the 118 Core Clinical Journals defined by the National Library of Medicine.
We will randomly select journal articles, stratified in a 1:1 ratio by higher impact versus lower
impact journals. According to 2007 ISI total citations, we consider the New England Journal of
Medicine, JAMA, Lancet, Annals of Internal Medicine, and BMJ as higher impact journals. Teams of two
reviewers will independently screen full texts of reports for eligibility, and abstract data, using
standardized, pilot-tested extraction forms. We will conduct univariable and multivariable logistic
regression analyses to examine the association of pre-specified study characteristics with reporting
of subgroup analyses and with claims of subgroup effects for the primary and any other outcomes.
Discussion: A clear understanding of subgroup analyses, as currently conducted and reported in
published randomized controlled trials, will reveal both strengths and weaknesses of this practice.
Our findings will contribute to a set of recommendations to optimize the conduct and reporting
of subgroup analyses, and claim and interpretation of subgroup effects in randomized trials.


Background
The effects of healthcare interventions on the entire study
population are of primary interest in clinical trials. It
remains appealing, however, for investigators and clini-
cians to identify differential effects in subgroups based on
characteristics of patients or interventions. This analytic
approach, termed subgroup analysis, can sometimes be
informative but it is often misleading [1-4].

Investigators frequently conduct subgroup analyses
exploring multiple hypotheses [5]. Conducting multiple
tests is associated with the risk of false positive results due
to the play of chance [3]. This risk is particularly great if
subgroup analyses are data driven: that is, when investiga-
tors perform numerous post hoc subgroup analyses seeking
statistical significance. Even when investigators specify a
limited number of subgroup analyses a priori, the play of
chance may still result in identification of spurious sub-
group effects.
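
The way this risk grows with the number of tests can be illustrated with a short calculation. The sketch below assumes that no true subgroup effect exists and that the tests are independent, each carried out at significance level alpha. Both the independence assumption and the function name are ours, introduced for illustration only; subgroup tests within one trial are usually correlated, so the figures are approximate.

```python
def familywise_false_positive_risk(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n_tests independent tests.

    Assumes every null hypothesis is true and every test uses level alpha.
    """
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    # P(at least one false positive) = 1 - P(no false positive in any test)
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        risk = familywise_false_positive_risk(k)
        print(f"{k:2d} subgroup tests: risk of a spurious finding = {risk:.3f}")
```

Under these assumptions, ten tests at the 0.05 level already carry a risk of about 0.40 that at least one appears statistically significant by chance alone.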

Sometimes, investigators explore possible subgroup
effects by testing the null hypothesis of no treatment effect
in each of the relevant subgroups. A claim of subgroup
effect is made if a significant effect is observed in one
subgroup but not in the other(s) [6,7]. This strategy, however,
fails to address the real issue of subgroup analysis: can
chance explain the apparent difference between
subgroups? This question can be addressed with a formal test
of interaction in which the null hypothesis is that the
underlying effect across subgroups is the same. In other
instances, investigators report and claim an effect in one
subgroup of patients while omitting the results for the other
subgroups. Investigators may also test the difference of effects
between subgroups defined by a characteristic measured
after randomization. The apparent difference of
effects may, however, be explained by the treatment
intervention itself, or by differing prognostic characteristics in
subgroups that emerge after randomization, rather than
by the subgroup characteristic itself. Therefore, this
approach to analyzing subgroups is highly problematic
[4,8,9].
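
A minimal sketch of one common form of interaction test follows, to show why separate significance tests within each subgroup can mislead. It compares two subgroup risk ratios on the log scale with a z-test, recovering each standard error from the reported 95% confidence interval. The choice of this particular test, the function names, and the numbers in the example are our assumptions for illustration; they are not taken from any trial or from the methods of this protocol.

```python
import math


def log_effect_and_se(ratio: float, ci_low: float, ci_high: float, z_crit: float = 1.96):
    """Return the log ratio and its standard error, derived from a 95% CI."""
    if min(ratio, ci_low, ci_high) <= 0:
        raise ValueError("ratio and confidence limits must be positive")
    log_ratio = math.log(ratio)
    # On the log scale a 95% CI spans 2 * 1.96 standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    return log_ratio, se


def interaction_test(subgroup_a, subgroup_b):
    """Test the null hypothesis that two subgroups share the same ratio effect.

    Each argument is a (ratio, ci_low, ci_high) tuple.
    Returns (ratio of ratios, z statistic, two-sided p-value).
    """
    log_a, se_a = log_effect_and_se(*subgroup_a)
    log_b, se_b = log_effect_and_se(*subgroup_b)
    diff = log_a - log_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return math.exp(diff), z, p_two_sided


if __name__ == "__main__":
    # Hypothetical results: the effect is "significant" in subgroup A only.
    ratio_of_ratios, z, p = interaction_test((0.50, 0.29, 0.87), (0.90, 0.58, 1.40))
    print(f"ratio of risk ratios = {ratio_of_ratios:.2f}, z = {z:.2f}, p = {p:.3f}")
```

In this hypothetical example the confidence interval excludes 1 in one subgroup and includes 1 in the other, yet the interaction p-value exceeds 0.05, so chance remains a plausible explanation for the apparent difference.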

Many apparent subgroup effects have been proven to be
spurious [10]. Misleading subgroup effects can result in
withholding efficacious treatment from patients who
would benefit, or encourage ineffective or potentially
harmful treatments for subgroups who would fare better
without. It is, therefore, imperative to critically assess the
validity of claimed subgroup effects. One approach is to
use seven previously proposed criteria for determining
whether apparent differences in subgroup response are
likely to be real [11]. These criteria have been widely used
to evaluate subgroup analyses in randomized controlled
trials (RCTs) and meta-analyses [12-15]. Several new cri-
teria may further facilitate differentiation between spuri-
ous and real subgroup effects (Appendix 1).

A limited number of empirical studies have evaluated
how trialists conduct and report subgroup analyses, and
have revealed several weaknesses (Table 1) [16-21]. Weak-
nesses include the use of an excessive number of variables
and outcomes, inappropriate statistical methods, and
insufficient a priori specification of variables. A review of
subgroup analyses reported in cardiovascular trials [17],
for instance, identified one study that reported 23 subgroup
variables and 17 outcomes. In another review of 27 surgi-
cal trials [16], a test of interaction was reported for only
5.8% (3/54) of subgroup hypotheses tested, whereas
72.2% (39/54) claimed subgroup effects. Across six
reviews of subgroup analyses, the prevalence of trials
claiming at least one subgroup effect ranged from 25% to
60% [16-20]. Two studies, one [18] restricted to trials
published in the New England Journal of Medicine and
another [17] restricted to moderate or large sized
cardiovascular trials, found that larger sample size was the only
study characteristic statistically associated with reporting
of subgroup analyses.

Table 1: Characteristics of six studies reviewing subgroup analyses in randomized trials

Study ID | Trial area | Source of study | Trial feature for eligibility criteria
Wang (2007) | Multiple | NEJM (July 2005 to June 2006) | No restrictions
Bhandari (2006) | Surgical | Two surgical journals plus NEJM, JAMA, BMJ, and Lancet (Jan 2000 to Apr 2003) | No restriction on size and other trial characteristics
Hernandez (2006) | Cardiovascular | Four cardiovascular journals plus "Top Five" (2002 and 2004) | Phase 3 parallel trials, n > 100, superiority trials; restricted to main reports
Hernandez (2005) | Traumatic brain injury | MEDLINE (1966 to Apr 2004), EMBASE (1978 to Apr 2004), CENTRAL (Apr 2004) | Phase 3, parallel trials, n > 50 per arm; Glasgow Outcome Scale (GOS) at 3 months as outcome
Moreira Jr (2001) | Multiple | NEJM, JAMA, Lancet, American Journal of Public Health (July 1998) | No restrictions mentioned
Assmann (2000) | Multiple | NEJM, JAMA, BMJ, and Lancet (July to Sep 1997) | No crossover and cluster trials, n > 50

Despite the merits of these studies, each of them exam-
ined only a relatively small number of trials (median 57,
range 11-97). None compared the reporting of subgroup
analyses in higher impact journals versus other journals;
none examined the reporting of subgroup analyses in rela-
tion to type of outcomes (e.g. continuous, binary, time-to-
event, count, or multinomial); and none specifically
examined subgroup analysis reporting for the primary
outcome. In addition, none of the previous reviews docu-
mented the magnitude of the apparent subgroup effects
and magnitude of p-values of interaction tests; none
investigated the validity of claimed subgroup effects; none
investigated study characteristics associated with claim of
subgroup effects; and none addressed the credibility of the
claimed subgroup effects.

These shortcomings limit the generalizability of findings
and leave important questions unanswered. Therefore, we
will conduct a systematic review of RCTs to further inform
the current use and reporting of subgroup analyses.

In this study, we have three main objectives. The first is to
describe the reporting of subgroup analyses and claim of
subgroup effects. The second is to assess study characteris-


tics associated with reporting of subgroup analyses, and
study characteristics associated with claim of subgroup
effects, both for the primary outcome and for any out-
come. The third objective is to examine the analysis and
interpretation of subgroup effects conducted for the pri-
mary outcome.

Methods
Study Design Overview
We will conduct a systematic review of RCTs conducted in
humans and published in 2007 in the Core Clinical Journals
defined by the National Library of Medicine
(http://www.nlm.nih.gov/bsd/aim.html). To maximize the
generalizability of study findings, we will include parallel,
cross-over, and factorial randomized trials, and both indi-
vidual and cluster randomised trials. Unless the authors
report findings to the contrary, we will assume no treat-
ment-by-treatment interaction in factorial studies, no
treatment-by-sequence interaction in cross-over studies,
and no treatment-by-cluster interactions in cluster-rand-
omized studies. We will use the standard methodology for
conducting systematic reviews [22].

Definition of Subgroup, Subgroup Analysis, and Subgroup
Effect
For this study, we define a subgroup as a subset of a trial
population that is identified on the basis of a patient or
intervention characteristic that is either measured at base-
line or after randomization.

We define a subgroup analysis as a statistical analysis that
explores whether effects of the intervention (i.e. experimental
versus control) differ according to status of a subgroup
variable. This includes a case in which investigators
report a main result and analyze only a subset of patients.

Table 1 (continued): Number of trials in each review

Study ID | Number of trials (number reporting subgroup analyses)
Wang (2007) | 97 (59)
Bhandari (2006) | 72 (27)
Hernandez (2006) | 63 (39)
Hernandez (2005) | 18 (11)
Moreira Jr (2001) | 32 (17)
Assmann (2000) | 50 (35)

We define a subgroup effect as a difference in the magni-
tude of a treatment effect across subgroups of a study pop-
ulation. The null hypothesis for a test of a subgroup effect
(i.e. subgroup hypothesis) is that there is no difference in
the magnitude of a treatment effect across subgroups. We
will consider both absolute and relative effect measures in
our study.

Eligibility Criteria
The inclusion criteria are:

1) The study is an RCT;

2) The participants are human;

3) The study is published in 2007 in a core clinical journal
(as defined by the National Library of Medicine).

The exclusion criteria are:

1) The report does not include the entire population
enrolled in the original study (i.e. the report focuses on a
subset of the original study population);

2) The study is explicitly labelled as a phase I trial;

3) The study is exclusively a pharmacokinetic study;

4) The study is reported as a Research Letter.

No restrictions apply with respect to the following aspects:

* Trial design (i.e., parallel, factorial or cross-over);

* Number of trial arms (i.e., two or more);

* Unit of randomization (i.e., individual patient or clus-
ter);

* Type of outcome (i.e., continuous, binary, time-to-
event, count, or multinomial);

* Type of trial (i.e., superiority, non-inferiority or equiva-
lence trial);

* Type of report (i.e., main report, longer follow-up
report, or interim report);

* Sample size, length of follow up, and loss to follow up;

* Statistical significance versus non-significance of overall
main effects;

* Subgroup variables measured at baseline versus after
randomization.

Literature Search
We will search for RCTs published in the Core Clinical
Journals in 2007. This group of journals is defined by the
National Library of Medicine, includes a total of 118 jour-
nals covering all specialities of clinical medicine and pub-
lic health sciences, and is known as the Abridged Index
Medicus. We will run the Medline search using the OVID
platform and a search strategy (Appendix 2) developed
with the help of an experienced librarian.

Random Sampling of Citations
We will stratify the Core Clinical Journals into higher and
lower impact journals. For this study we define higher
impact journals as the five journals with the highest total
citations in 2007: the New England Journal of Medicine,
JAMA, Lancet, Annals of Internal Medicine, and BMJ. Lower
impact journals consist of the remaining Core Clinical
Journals. We will randomly sample the journal articles,
with 1:1 stratification by journal type (i.e. higher and
lower impact). We will continue the random sampling
process until the number of eligible studies meets our
required sample size.
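
The sampling procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the software used for the review: the function name, the fixed seed, the citation identifiers, and the toy eligibility rule are all hypothetical. With a target of 464 trials and 1:1 stratification, the quota would be 232 eligible trials per stratum.

```python
import random
from typing import Callable, Dict, List, Sequence


def sample_until_quota(
    citations_by_stratum: Dict[str, Sequence[str]],
    is_eligible: Callable[[str], bool],
    quota_per_stratum: int,
    seed: int = 2007,
) -> Dict[str, List[str]]:
    """Draw citations in random order within each stratum until the quota
    of eligible studies is met (or the stratum is exhausted)."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    selected: Dict[str, List[str]] = {}
    for stratum, pool in citations_by_stratum.items():
        order = list(pool)
        rng.shuffle(order)  # random order is equivalent to sampling without replacement
        eligible: List[str] = []
        for citation in order:
            if len(eligible) == quota_per_stratum:
                break
            if is_eligible(citation):  # stands in for duplicate screening by reviewers
                eligible.append(citation)
        selected[stratum] = eligible
    return selected


# Toy data: 50 hypothetical citations per stratum, even-numbered ones "eligible".
toy_pool = {
    "higher_impact": [f"H{i}" for i in range(50)],
    "lower_impact": [f"L{i}" for i in range(50)],
}
toy_sample = sample_until_quota(
    toy_pool, lambda c: int(c[1:]) % 2 == 0, quota_per_stratum=10
)
```

Shuffling each stratum once and reading down the list keeps the draw reproducible and lets screening stop as soon as the quota is reached.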

Review process
Teams of two trained reviewers will perform citation and
full text screening and data abstraction, in duplicate and
independently, including selection of the primary outcome
(using pre-specified criteria; see below) and selection
of the pair-wise comparison for analysis (if there are three
or more arms). Each team will attempt to resolve discrepancies
by consensus or, if a discrepancy remains, through
discussion with one of two arbitrators (XS, GHG). The
arbitrator will independently review the trial report before
discussing it with the reviewers. Before the review for-
mally starts, we will conduct calibration exercises to
ensure consistency across reviewers. We will use electronic
forms, developed with Microsoft Access and Excel, for
study screening and data extraction. The forms will be
standardized and pilot-tested, and detailed written
instructions will be developed to assist with study screen-
ing and data extraction.

Study Screening
Two reviewers will independently screen the title and
abstract of each randomly chosen citation for potential
eligibility. In the title and abstract screening, they will
judge only if the study is a randomized controlled trial
enrolling human participants. Two reviewers will then
independently screen the full text of the potentially eligi-
ble trials to determine eligibility.




At the full text screening stage, the reviewers will select a
primary outcome for eligible studies, using the following
strategy: If the report specifies a primary outcome, we will
select it as the primary outcome; if the report specifies
more than one primary outcome (i.e. co-primary out-
comes), we will select the one with the largest number of
subgroup analyses; if outcomes have the same number of
subgroup analyses, we will select the one with the greatest
relevance to patients according to a pre-defined outcome
hierarchy, and if more than one outcome is in the same
category, we will take the first reported outcome in the
abstract (Appendix 3). If the report does not specify a pri-
mary outcome, we will select the outcome used for the
study sample size calculation, but if there is no sample
size calculation reported or if there is a sample size calcu-
lation for several outcomes, we will proceed as detailed in
the previous sentence.
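
The selection rules in the preceding paragraph can be read as an ordered set of tie-breakers. The sketch below is one possible encoding, not the review's actual extraction logic. The `Outcome` fields are hypothetical, and `hierarchy_rank` stands in for the pre-defined outcome hierarchy of Appendix 3 (a lower rank meaning greater relevance to patients), which is not reproduced here. Where no primary outcome and no single sample-size outcome is available, we assume the tie-breakers apply to the sample-size outcomes if there are several, and otherwise to all reported outcomes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    name: str
    is_primary: bool            # labelled as primary in the report
    used_for_sample_size: bool  # basis of a reported sample size calculation
    n_subgroup_analyses: int    # subgroup analyses reported for this outcome
    hierarchy_rank: int         # hypothetical: 1 = most relevant to patients
    abstract_position: int      # order of appearance in the abstract


def select_primary_outcome(outcomes: List[Outcome]) -> Outcome:
    """Pick one outcome per trial following the ordered rules in the text."""
    if not outcomes:
        raise ValueError("at least one outcome is required")
    primaries = [o for o in outcomes if o.is_primary]
    if len(primaries) == 1:
        return primaries[0]
    if primaries:
        candidates = primaries  # co-primary outcomes
    else:
        sized = [o for o in outcomes if o.used_for_sample_size]
        if len(sized) == 1:
            return sized[0]
        candidates = sized or outcomes  # assumption, see lead-in
    # Tie-breakers: most subgroup analyses, then patient relevance,
    # then first reported in the abstract.
    return min(
        candidates,
        key=lambda o: (-o.n_subgroup_analyses, o.hierarchy_rank, o.abstract_position),
    )


example = [
    Outcome("hospitalisation", True, False, 4, 2, 1),
    Outcome("mortality", True, False, 4, 1, 2),
    Outcome("quality of life", False, False, 6, 3, 3),
]
chosen = select_primary_outcome(example)
```

Here the two co-primary outcomes have the same number of subgroup analyses, so the hypothetical hierarchy decides in favour of mortality; the secondary outcome is never considered despite having more subgroup analyses.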

Reviewers will also identify a pair-wise comparison of
interest, using the following strategy. If there are only two
groups, we will use them for the pair-wise comparison. If
there are three or more groups, we will select the compar-
ison that was clearly and explicitly defined as the primary
comparison in the study report; if the primary comparison
was not explicitly defined, we will select the comparison
that reports the largest number of subgroup analyses for
the selected primary outcome; if more than one compari-
son reported the same largest number of subgroup analy-
ses, we will select the comparison that reports the smallest
interaction p value; if the interaction p value is not availa-
ble, we will select the one that has the smallest p value for
the main effect.

Data Abstraction
Study Characteristics
We will extract information on funding sources, clinical
area, type of intervention, trial design (parallel, cross-over,
or factorial), trial type (superiority, non-inferiority, or
equivalence), unit of randomization (randomization at
individual or cluster level), methodological characteristics
of trials (allocation concealment; blinding of patients,
healthcare givers, data collectors, outcome adjudicators,
or data analysts; stopping trials early for benefit), number
of participants randomized for the selected comparison,
and total number of participants randomized.

We will categorise the selected primary outcome, accord-
ing to whether it is a composite endpoint, whether the
results are statistically significant, and the type of outcome
variable (time-to-event, binary, continuous, count, or
multinomial). We will record the type of effect measure
for the selected primary outcome. If more than one effect
measure is used for binary, time-to-event, or count out-
comes, we will use a hierarchical approach to select an
effect measure, as follows:


* Select the effect measure that the investigators clearly
indicated as the effect measure for the primary analysis;

* Select the effect measure on which the subgroup analysis
is reported and a subgroup effect is claimed;

* Select the measure that yields the smallest reported p-
value of the main effect;

* Otherwise, use the following order for binary outcomes:
risk ratio > odds ratio > relative risk reduction > risk differ-
ence; and the following for time-to-event outcomes: haz-
ard ratio > incidence rate ratio > ratio of cumulative
incidence > ratio of time > difference in incidence rate >
difference in cumulative incidence > difference in time

If no effect measure is reported but data for a 2 x 2 table
are available for the primary outcome, we will calculate
risk ratios.
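
As a worked illustration of the calculation from a 2 x 2 table, the sketch below computes a risk ratio together with a 95% confidence interval based on the usual large-sample standard error of the log risk ratio. The confidence interval method, the function name, and the example counts are our assumptions; the protocol states only that risk ratios will be calculated.

```python
import math
from typing import Tuple


def risk_ratio_with_ci(
    events_exp: int, n_exp: int, events_ctl: int, n_ctl: int, z_crit: float = 1.96
) -> Tuple[float, float, float]:
    """Risk ratio (experimental versus control) with a log-scale 95% CI."""
    if min(events_exp, events_ctl) <= 0:
        raise ValueError("this simple sketch does not handle zero-event arms")
    if events_exp > n_exp or events_ctl > n_ctl:
        raise ValueError("events cannot exceed the number of patients")
    rr = (events_exp / n_exp) / (events_ctl / n_ctl)
    # Large-sample standard error of log(RR)
    se_log_rr = math.sqrt(1 / events_exp - 1 / n_exp + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z_crit * se_log_rr)
    upper = math.exp(math.log(rr) + z_crit * se_log_rr)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 20/200 events (experimental) versus 40/200 (control)
    rr, lower, upper = risk_ratio_with_ci(20, 200, 40, 200)
    print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Trials with no events in one arm would need a continuity correction or another method, which this sketch deliberately leaves out.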

For binary, time-to-event, and count primary outcomes,
we will document the point estimates and 95% confidence
intervals for the main effects and, whenever
possible, the events and number of patients in a 2 x 2 table.
For continuous outcomes, we will document the number
of patients analyzed in the experimental and control
groups, and the summary measure (i.e. means, medians)
and associated measure of precision (i.e. inter-quartile
range, 95% confidence interval, standard deviation, or
standard error). We will not document the magnitude of
the main effect for multinomial primary outcomes.

Reporting of subgroup analyses
We will record whether trials report subgroup analyses for
any outcomes (i.e. primary or secondary), the number of
outcomes for which subgroup analyses are reported, the
type of outcomes, the number of subgroup variables
reported in the trial report, the number of subgroup anal-
yses that were most likely conducted, the number of sub-
group analyses reported, whether any subgroup analysis
was specified a priori, and whether any subgroup effect
was stated to have been analyzed by a test of interaction.
We will also document the above information specifically
for the primary outcome.

We will consider that a subgroup analysis has been reported
if: 1) the investigators report a point estimate and an
associated confidence interval or a p-value for one or more
subgroups of the original study population, 2) the
investigators report the magnitude of the difference in
effect according to the status of a subgroup variable, 3) the
investigators report results from an interaction test, or 4)
the investigators explicitly state that they conducted
subgroup analyses but do not report any of the data
mentioned above.




Claim of subgroup effects
We will record whether trials claim a subgroup effect for
any outcome (i.e. primary or secondary), the number of
subgroup effects claimed in the trial report, and the type
of outcomes used for the claim. We will judge the strength
of the claim based on the inferences drawn by the
investigators in the abstract or discussion section. We will
also document the above information specifically for the
primary outcome.

We will consider that a subgroup effect is claimed if, in
the abstract or discussion of the trial report, the
investigators state that the effects of the intervention
differed, or may have differed, according to the status of a
subgroup variable.

We will classify the strength of a claim into four
categories, defined as follows:

1) Strong claim of a definitive effect: The authors convey
a conviction that the subgroup effect truly exists.

2) Claim of a likely effect: The authors convey a belief that
the subgroup effect likely exists.

3) Suggestion of a possible effect: The authors suggest a
subgroup effect and convey an uncertainty whether the
subgroup effect exists.

4) No claim of a subgroup effect: The authors do not make
a claim of a subgroup effect.
Table 2: Criteria for judging the strength of a subgroup claim


We have developed explicit criteria to judge the strength of
claim (Table 2).

Analysis of subgroup effect for the primary outcome
We will document, for each subgroup analysis, whether
the subgroup variable is a baseline characteristic or based
on an after-randomization event, whether the investiga-
tors specified the variable a priori, whether the investiga-
tors specified the direction a priori, whether the subgroup
variable was used as a stratification factor in randomiza-
tion, the type of tests used for analyzing subgroup effects
(test of significance of individual groups, interaction test,
or both), the statistical approaches used for a test of inter-
action, and the methods of adjusting for multiple interac-
tion effects.

We will also document, whenever possible, the 2 x 2 data,
the reported point estimate, 95% confidence interval, and
p-value of the effect of each subgroup, as well as the
reported p-value of the interaction test.
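
When a trial reports subgroup-specific estimates but no interaction p-value, one widely used approximation compares two subgroup estimates on the log scale with a z-test. The sketch below illustrates that approach under stated assumptions; it is not a method prescribed by this protocol, which records what the trial investigators did. It assumes ratio measures with symmetric 95% confidence intervals on the log scale, and the function name is hypothetical.

```python
import math


def interaction_test(est_a, ci_a, est_b, ci_b, z_crit=1.96):
    """Approximate z-test for the difference between two subgroup ratios.

    est_a, est_b -- ratio estimates (e.g. risk ratios) in subgroups A and B
    ci_a, ci_b   -- (lower, upper) 95% confidence limits for each estimate
    Returns the ratio of ratios, the z statistic, and a two-sided p-value.
    """
    def log_se(ci):
        lower, upper = ci
        # Recover the standard error of the log estimate from the CI width.
        return (math.log(upper) - math.log(lower)) / (2 * z_crit)

    diff = math.log(est_a) - math.log(est_b)
    se_diff = math.sqrt(log_se(ci_a) ** 2 + log_se(ci_b) ** 2)
    z = diff / se_diff
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(diff), z, p_value


# Made-up subgroup results for illustration only.
ratio, z, p = interaction_test(0.5, (0.25, 1.0), 1.0, (0.5, 2.0))
print(f"ratio of ratios = {ratio:.2f}, z = {z:.2f}, p = {p:.3f}")
```

Note that in this made-up example one subgroup interval excludes a meaningful benefit while the other does not, yet the interaction test itself is far from conventional significance, which is the situation the distinction between "test of significance of individual groups" and "interaction test" is meant to capture.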

Interpretation of claimed subgroup effect for the primary outcome
For each of the claimed subgroup effects, we will further
document whether the authors provided a supportive
biological rationale or cited external evidence consistent
with the observed subgroup effect, whether they indicated
that the pre-specified direction was correct, and whether
they indicated that the observed subgroup effect was
consistent across closely related outcomes.


Criteria


Strong claim Claim of a likely effect


1. Did the investigators claim the effect in the abstract?

2. Did the investigators claim the effect in the conclusion
of the abstract?

3. Did the investigators claim the effect in the discussion?

4. Did the investigators use descriptive words (e.g.
appear/seem to be, may, and might) to soften their
statements of the claims?

5. Did the investigators use descriptive words (e.g.
particular, and special) to strengthen the statement of the
claims?


Possible


Possible*


Possible

Possible


Suggestion of a possible effect

No

No


Possible


Possible


6. Were the authors obviously cautious about the
apparent subgroup effect? (e.g. they stated the subgroup
effect did not meet some of the important criteria to believe
a subgroup effect)
   No        Some caution possible

7. Did the investigators indicate the apparent effects need
to be explored in future studies (i.e. hypothesis
generating)?
   No        Possible, say desirable to confirm

* If a claim appears in the conclusion section of the abstract, it is considered a strong claim.




Sample Size
We conducted a pilot study including 139 randomized
trials. Of these, 62 (44.6%) reported subgroup analyses for
any outcome and 41 (29.5%) reported them for the primary
outcome; 27 (19.4%) claimed a subgroup effect for any
outcome and 18 (12.9%) claimed one for the primary outcome.

We calculated the sample size for the analysis of study
characteristics associated with the claim of a subgroup
effect for any outcome. That regression will include 6
study characteristics, represented by a total of 9
categories of variables. We will require 10 events (i.e.
claims of a subgroup effect) per category to examine the
association, giving a total of 90 events (and at least 90
non-events). Given that 19.4% of trials in the pilot study
claimed a subgroup effect for any outcome, we will require
a total of 464 trials for this study.
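
The arithmetic behind this figure can be reproduced in a few lines. The sketch assumes, as the text implies, that the required number of trials is the number of events needed divided by the pilot proportion of trials claiming a subgroup effect for any outcome (27 of 139), rounded up. The function name is hypothetical.

```python
import math


def required_trials(n_categories, events_per_category,
                    pilot_events, pilot_total):
    """Trials needed to observe enough events for the regression."""
    events_needed = n_categories * events_per_category
    # Divide by the pilot event proportion and round up to a whole trial.
    trials = math.ceil(events_needed * pilot_total / pilot_events)
    return events_needed, trials


events_needed, trials = required_trials(
    n_categories=9, events_per_category=10,
    pilot_events=27, pilot_total=139,
)
print(events_needed, trials)                # 90 464
print(trials - events_needed >= 90)         # non-events also exceed 90: True
```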

Statistical Analysis
We will assess agreement between reviewers on study
inclusion at the full-text screening stage and on their
judgments of whether the investigators reported a subgroup
analysis, claimed a subgroup effect, pre-specified the
subgroup hypothesis, or used an interaction test. We will
calculate both crude agreement and chance-corrected agreement.
We will interpret the agreement statistics using the guide-
lines proposed by Landis and Koch [23]: kappa values of
0 to 0.20 represent slight agreement, 0.21 to 0.40 fair
agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80
substantial agreement, and greater than 0.80 almost per-
fect agreement.
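
A minimal sketch of these agreement statistics follows. It assumes that chance-corrected agreement means Cohen's kappa for two reviewers, which the protocol does not state explicitly. The "poor" label for negative values comes from Landis and Koch's original scale and is not listed in the text above. Function names and the example ratings are hypothetical.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Return (crude agreement, Cohen's kappa) for two reviewers."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Crude agreement: proportion of items rated identically.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each reviewer's marginal rates.
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n)
        for c in categories
    )
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch descriptive label."""
    if kappa < 0:
        return "poor"  # from the original scale; not listed in the protocol
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Made-up judgments (1 = subgroup effect claimed) for ten trials.
reviewer_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
reviewer_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
crude, kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"crude = {crude:.2f}, kappa = {kappa:.2f}")
```

Values that fall exactly on a band boundary are sensitive to floating-point rounding, so labels are best assigned after rounding kappa to the reported precision.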

We will calculate the proportions of trials reporting at
least one subgroup analysis for the primary outcome and
for any outcome. Treating the reporting of a subgroup
analysis as the dependent variable, we will conduct uni-
variable and multivariable logistic regression analyses to
examine its association with the pre-specified study char-
acteristics for both the primary outcome and for any out-
come.

We will also calculate the proportions of trials claiming a
subgroup effect for the primary outcome and for any out-
come in trials that report a subgroup analysis, and con-
duct univariable and multivariable logistic regression
analyses to examine the association of pre-specified study
characteristics with claim of a subgroup effect for the pri-
mary outcome and for any outcome.
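
For a single binary study characteristic, the odds ratio from a univariable logistic regression equals the cross-product ratio of the corresponding 2 x 2 table, so the simplest case can be sketched without a statistics package. This is only an illustration with made-up counts; the multivariable models described above would require dedicated software, and the function name is hypothetical.

```python
import math


def odds_ratio(claim_exposed, no_claim_exposed,
               claim_unexposed, no_claim_unexposed, z=1.96):
    """Odds ratio with a Wald 95% confidence interval from 2 x 2 counts.

    "Exposed" means trials having the characteristic of interest
    (e.g. published in a high impact journal).
    """
    cells = (claim_exposed, no_claim_exposed,
             claim_unexposed, no_claim_unexposed)
    if min(cells) == 0:
        raise ValueError("all four cells must be non-zero")
    estimate = (claim_exposed * no_claim_unexposed) / (
        no_claim_exposed * claim_unexposed
    )
    # Standard error of log(OR): square root of summed reciprocal counts.
    se_log_or = math.sqrt(sum(1 / c for c in cells))
    lower = math.exp(math.log(estimate) - z * se_log_or)
    upper = math.exp(math.log(estimate) + z * se_log_or)
    return estimate, lower, upper


# Made-up counts: 20 of 100 trials with the characteristic claim a
# subgroup effect, versus 10 of 100 trials without it.
estimate, lower, upper = odds_ratio(20, 80, 10, 90)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```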

Our pre-specified study characteristics for the regression
analyses are: average sample size per study arm, journal
type (high vs. lower impact journals), source of funding
(partially or completely funded by private for-profit
organizations vs. others), statistical significance of the
main effect, trial area (medical vs. surgical), number of
pre-specified primary outcomes (used for the regression of
reporting of subgroup analyses only), and number of
subgroup analyses (used for the regression of claim of
subgroup effects only). We hypothesize that trials are more
likely to report subgroup analyses or claim a subgroup
effect if they have a larger sample size, are published in
higher impact journals, receive funding from for-profit
organizations, do not achieve statistical significance for
the main effect, investigate medical rather than surgical
interventions, have more pre-specified primary outcomes, or
have a larger number of subgroup analyses. In the
multivariable logistic regression analysis for reporting of
subgroup analysis, we will also examine the interaction
between source of funding and significance of the main effect.

We will describe the details of reporting of subgroup anal-
yses and claim of subgroup effects for both any outcome
and specifically for the primary outcome. If a variable, in
both univariable and multivariable analyses, is found to
be significantly associated with reporting of a subgroup
analysis and/or claim of a subgroup effect, we will also
present the above information stratified by the type of
journal.

We will describe the details of analysis of subgroup effects
for the primary outcome by journal type (i.e. five highest
impact journals versus other journals), and by claim ver-
sus no claim of a subgroup effect. We will also describe the
details of interpretation of claimed subgroup effects by
journal type.

Discussion
Our study is designed to comprehensively address the
analysis, reporting, and claim of subgroup effects in a
representative sample of recent RCTs. This study protocol
follows the publication of two other protocols [24,25],
reflecting our continuing efforts to make the objectives
and design of methodological studies more transparent.

Strengths and limitations
Our study has several strengths. First, we will employ rig-
orous systematic review methods including explicit and
reproducible eligibility criteria, sensitive search strategies,
and the use of standardized, pilot-tested forms accompa-
nied by written instructions for study screening and data
extraction. Teams of two trained reviewers will independ-
ently and in duplicate conduct study screening. We will
also undertake calibration exercises and pilot data extrac-
tion to enhance consistency between reviewers before
embarking on data abstraction. Second, our eligibility cri-
teria are broad, and compared to the previous empirical
studies our study findings will be more generalizable.
Third, we conducted a pilot study to calculate the required
sample size for the definitive study. Finally, our study will
be the largest empirical study of subgroup analyses, which
will allow us to reliably address a number of important
questions that have not been addressed by existing
reviews.

Our study also has several limitations. It will be based on
reported trial information, and our findings may be vul-
nerable to underreporting or selective reporting [26]. The
limited space allowed by medical journals for reporting
on trials may prevent authors from sufficiently reporting
relevant information on subgroup analyses. Conse-
quently, the proportion of trials reporting subgroup anal-
yses is probably smaller than the proportion of trials
actually conducting subgroup analyses, and the number
of subgroup analyses reported in each trial is probably
smaller than the actual number of conducted subgroup
analyses. In relation to this problem, we will also estimate
the number of subgroup analyses that were most likely
conducted. Similarly, other details about subgroup analy-
ses, such as a priori specification of the subgroup hypoth-
esis and direction, may also be under-reported.

Our study does not include all medical journals, and our
findings may not be applicable to journals outside our
sample. Our study, however, includes many more jour-
nals than the previous studies that typically included high
impact journals or specialty journals only. We chose the
Core Clinical Journals because they cover all clinical and
public health areas, and include all major medical jour-
nals. We consider that the quality of studies in these jour-
nals will be no worse than that in other journals, and
expect that the quality of subgroup analyses reported in
other journals will be no better than that in the Core Clin-
ical Journals.

Our study will involve reviewers' judgement of the
strength of the claim of subgroup effect, and the determi-
nation of strength may be subjective and vary across
reviewers. We have developed detailed written instruc-
tions to assist reviewers in judging the strength, and will
check the inter-reviewer agreement.

Implications of this study
Although a few empirical studies restricted to certain dis-
ease areas or journal type have found a significant associ-
ation between sample size and reporting of subgroup
analyses, factors that drive reporting and claiming of sub-
group effects in a more representative set of trials remain
uncertain. The results of this study will provide robust,
generalizable, and reliable evidence on the factors that
impact reporting and claiming of subgroup effects.

Considerable work, including methodological advocacy
[3,27-31] and empirical investigation [5,18,19], has been
done to inform the conduct of subgroup analyses. However,
few reports have systematically developed a framework for
the analysis, reporting, claim, and interpretation of
subgroup effects. The findings of this study will further aid
in the development of recommendations for adequate
reporting, and appropriate analysis, claim, and
interpretation of subgroup effects.

Claimed subgroup effects are of primary interest to clini-
cians, investigators and other users. Claims of spurious
subgroup effects can distort clinical practice and public
health decision making, with serious consequences for
patients and unnecessary expenditures. Methodological
safeguards have been proposed to protect from spurious
subgroup findings [4,10,30], but empirical evidence of
their validity is limited. The results of this study will reveal
the extent to which the investigators considered method-
ological safeguards in their claims, and provide some evi-
dence regarding the extent to which claims of subgroup
effects are valid.

The findings of the SATIRE study may influence recom-
mendations on reporting, conduct, claim, and interpreta-
tion of subgroup analyses. These will be of particular
interest to the stakeholders that have direct influence on
trial design, analysis, and reporting, including investiga-
tors, health decision makers, guideline developers, fund-
ing agencies, and medical journal editors.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
XS and GHG conceptualized the study. All authors con-
tributed to design of the study and read and approved the
manuscript. XS developed the first draft of the manuscript
and incorporated comments from authors for successive
drafts.

Appendices
Appendix 1: The eleven criteria for assessing credibility of
claimed subgroup effects
* Is the subgroup variable a characteristic at randomization?

* Is the effect suggested by comparisons within rather than
between studies?

* Does the interaction test suggest a low likelihood that
chance explains the apparent subgroup effect?

* Is the significant interaction effect independent of other
potential subgroup effects?

* Was the hypothesis specified a priori?

* Was the correct direction of subgroup effect specified a priori?




* Was the subgroup effect one of a small number of
hypothesized effects tested?

* Is the magnitude of the subgroup effect large?

* Is the interaction consistent across studies?

* Is the interaction consistent across closely related outcomes
within the study?

* Is there indirect evidence that supports the hypothesized
interaction?

The new criteria are italicized.

Appendix 2: Search strategy
1. exp Randomized Controlled Trials/

2. (randomized controlled trial$ or randomised controlled
trial$).mp. [mp = title, original title, abstract, name of
substance word, subject heading word]

3. (randomized trial$ or randomised trial$).mp. [mp =
title, original title, abstract, name of substance word,
subject heading word]

4. (randomized clinical trial$ or randomised clinical
trial$).mp. [mp = title, original title, abstract, name of
substance word, subject heading word]

5. 1 or 2 or 3 or 4

6. limit 5 to (English language and humans and "core
clinical journals (aim)" and yr="2007")

Appendix 3: Hierarchy of outcomes
I. Mortality

1) all cause mortality

2) disease specific mortality

II. Morbidity

1) cardiovascular major morbid events

2) other major morbid events (e.g. loss of vision,
seizures, fracture, revascularization)

3) recurrence/relapse/remission of cancer/disease
free survival

4) renal failure requiring dialysis

5) hospitalizations

6) infections

7) dermatological/rheumatologic disorders

III. Symptoms/Quality of life/Functional status (e.g.
failure to become pregnant, successful nursing/breast-
feeding, depression)

IV. Surrogate outcomes (e.g. viral load, physical activ-
ity, post operative atrial fibrillation)

Acknowledgements
We thank Monica Owen for administrative assistance. We thank Aravin
Duraik for developing the study electronic forms. The study is partially
supported by the National Natural Science Foundation of China (NSFC,
70703025). The funder had no role in the study design, in the writing of the
manuscript, or in the decision to submit this or future manuscripts for
publication. Xin Sun is supported by two research scholarships from the
National Natural Science Foundation of China (70503021, 70703025). Mat-
thias Briel was supported by a scholarship from the Swiss National Science
Foundation (PASMA- 1 2951/1) and the Roche Research Foundation.
Dominik Mertz was partially supported by a research scholarship from the
Swiss National Science Foundation (PBBSP3-124436). Jason Busse is funded
by a New Investigator Award from the Canadian Institutes of Health
Research and Canadian Chiropractic Research Foundation.

References
1. Fletcher J: Subgroup analyses: how to avoid being misled. BMJ
2007, 335:96-97.
2. Oxman AD, Guyatt GH: A consumer's guide to subgroup
analyses. Ann Intern Med 1992, 116:78-84.
3. Schulz KF, Grimes DA: Multiplicity in randomised trials II: sub-
group and interim analyses. Lancet 2005, 365:1657-61.
4. Yusuf S, Wittes J, Probstfield J, et al.: Analysis and interpretation
of treatment effects in subgroups of patients in randomized
clinical trials. JAMA 1991, 266:93-98.
5. Pocock SJ, Hughes MD, Lee RJ: Statistical problems in the
reporting of clinical trials. A survey of three medical
journals. N Engl J Med 1987, 317:426-32.
6. Barnett HJM, Taylor DW, Eliasziw M, et al.: Benefit of Carotid
Endarterectomy in Patients with Symptomatic Moderate or
Severe Stenosis. N Engl J Med 1998, 339:1415-1425.
7. Weisberg LA, Ticlopidine Aspirin Stroke Study G: The efficacy and
safety of ticlopidine and aspirin in non-whites: Analysis of a
patient subgroup from the Ticlopidine Aspirin Stroke Study.
Neurology 1993, 43:27.
8. van Walraven C, Davis D, Forster AJ, et al.: Time-dependent bias
was common in survival analyses published in leading clinical
journals. J Clin Epidemiol 2004, 57:672-82.
9. Hirji K, Fagerland M: Outcome based subgroup analysis: a
neglected concern. Trials 2009, 10:33.
10. Guyatt G, Wyer PC, Ioannidis J: When to Believe a Subgroup
Analysis. In User's Guide to the Medical Literature: A Manual for
Evidence-Based Clinical Practice Edited by: Guyatt G, et al. AMA: Chicago;
2008:571-583.
11. Oxman A, Guyatt G, Green L, et al.: When to believe a subgroup
analysis. In Users' guides to the medical literature. A manual for
evidence-based clinical practice Edited by: Guyatt G, Rennie D. Chicago, IL: AMA
Press; 2002:553-65.
12. Hatala R, Keitz S, Wyer P, et al.: Tips for learners of evidence-
based medicine: 4. Assessing heterogeneity of primary stud-
ies in systematic reviews and whether to combine their
results. CMAJ 2005, 172:661-665.
13. Montori VM, Jaeschke R, Schunemann HJ, et al.: Users' guide to
detecting misleading claims in clinical research reports. BMJ
2004, 329:1093-1096.


5) hospitalizations




14. Trevor A, Sheldon GGAH: Criteria for the Implementation of
Research Evidence in Policy and Practice. Getting Research
Findings Into Practice Second edition. 2008:11-18.
15. Martin CM, Guyatt G, Montori VM: The sirens are singing: the
perils of trusting trials stopped early and subgroup analyses.
Crit Care Med 2005, 33:1870-1.
16. Bhandari M, Devereaux PJ, Li P, et al.: Misuse of baseline
comparison tests and subgroup analyses in surgical trials. Clin Orthop
Relat Res 2006, 447:247-51.
17. Hernandez AV, Boersma E, Murray GD, et al.: Subgroup analyses
in therapeutic cardiovascular clinical trials: are most of them
misleading? Am Heart J 2006, 151:257-64.
18. Wang R, Lagakos SW, Ware JH, et al.: Statistics in Medicine --
Reporting of Subgroup Analyses in Clinical Trials. N Engl J
Med 2007, 357:2189-2194.
19. Assmann SF, Pocock SJ, Enos LE, et al.: Subgroup analysis and
other (mis)uses of baseline data in clinical trials. Lancet 2000,
355:1064-9.
20. Hernandez AV, Steyerberg EW, Taylor GS, et al.: Subgroup analysis
and covariate adjustment in randomized clinical trials of
traumatic brain injury: a systematic review. Neurosurgery 2005,
57:1244-53. discussion 1244-53
21. Moreira ED Jr, Stein Z, Susser E: Reporting on methods of
subgroup analysis in clinical trials: a survey of four scientific
journals. Brazilian Journal of Medical and Biological Research 2001,
34:1441-1446.
22. Higgins JPT, Green S (editors): Cochrane Handbook for Systematic
Reviews of Interventions Version 5.0.1 [updated September
2008]. The Cochrane Collaboration; 2008.
23. Landis JR, Koch GG: The measurement of
observer agreement for categorical data. Biometrics 1977,
33:159-74.
24. Akl EA, Briel M, You JJ, et al.: LOST to follow-up Information in
Trials (LOST-IT): a protocol on the potential impact. Trials
2009, 10:40.
25. Briel M, Lane M, Montori VM, et al.: Stopping randomized trials
early for benefit: a protocol of the Study Of Trial Policy Of
Interim Truncation-2 (STOPIT-2). Trials 2009, 10:49.
26. Chan A-W, Hrobjartsson A, Jorgensen KJ, et al.: Discrepancies in
sample size calculations and data analyses reported in ran-
domised trials: comparison of publications with protocols.
BMJ 2008, 337:a2299.
27. Cui L, Hung HM, Wang SJ, et al.: Issues related to subgroup anal-
ysis in clinical trials. J Biopharm Stat 2002, 12:347-58.
28. Pocock SJ, Assmann SE, Enos LE, et al.: Subgroup analysis,
covariate adjustment and baseline comparisons in clinical trial
reporting: current practice and problems. Statistics in Medicine
2002, 21:2917-2930.
29. Brookes ST, Whitely E, Egger M, et al.: Subgroup analyses in ran-
domized trials: risks of subgroup-specific analyses; power
and sample size for the interaction test. J Clin Epidemiol 2004,
57:229-36.
30. Rothwell PM: Treating individuals 2. Subgroup analysis in ran-
domised controlled trials: importance, indications, and
interpretation. Lancet 2005, 365:176-86.
31. Moher D, Schulz KF, Altman DG: The CONSORT statement:
revised recommendations for improving the quality of
reports of parallel-group randomised trials. Lancet 2001,
357:1191-1194.
