Publication bias in meta-analyses of Posttraumatic Stress Disorder interventions

Meta-analyses are susceptible to publication bias, the selective publication of studies with statistically significant results. If publication bias is present in psychotherapy research, the efficacy of interventions will likely be overestimated. This study has two aims: (1) investigate whether the application of publication bias methods is warranted in psychotherapy research on posttraumatic stress disorder (PTSD) and (2) investigate the degree and impact of publication bias in meta-analyses of the efficacy of psychotherapeutic treatment for PTSD. A comprehensive literature search was conducted and 26 meta-analyses were eligible for bias assessment. A Monte-Carlo simulation study closely resembling characteristics of the included meta-analyses revealed that statistical power of publication bias tests was generally low. Our results showed that publication bias tests had low statistical power and that estimates corrected for publication bias were imprecise due to characteristics of the data. We recommend assessing publication bias using multiple publication bias methods, but including only methods that show acceptable performance in a method performance check that researchers first have to conduct themselves.

Posttraumatic stress disorder (PTSD) following potentially traumatic events is a highly distressing and common condition, with lifetime prevalence rates in the adult population of 11.7% for women and 4% for men in the United States of America (Kessler, Petukhova, Sampson, Zaslavsky, & Wittchen, 2012). PTSD is characterized by the re-experiencing of a traumatic event, avoidance of stimuli that could trigger traumatic memories, negative cognitions and mood, and alterations in arousal and reactivity (American Psychiatric Association, 2013). The DSM criteria have been updated recently, but most research is still based on the previous versions DSM-IV-TR (American Psychiatric Association, 2000), DSM-IV (American Psychiatric Association, 1994) or DSM-III-R (American Psychiatric Association, 1987).
Various forms of psychological interventions for treating PTSD have been investigated in a large number of studies. Cognitive behavioral therapies (CBT) and eye movement desensitization and reprocessing (EMDR) are the most frequently studied approaches (e.g., Bisson, Roberts, Andrew, Cooper, & Lewis, 2013). Trauma-focused cognitive behavioral therapies (TF-CBT) use exposure to trauma memory or reminders and the identification and modification of maladaptive cognitive distortions related to the trauma in their treatment protocols (e.g., Ehlers, Clark, Hackmann, McManus, & Fennell, 2005; Foa & Rothbaum, 1998; Resick & Schnicke, 1993). Non trauma-focused cognitive behavioral therapies (non TF-CBT) do not focus on trauma memory or meaning, but for example on stress management (Veronen & Kilpatrick, 1983). EMDR includes an imaginal confrontation of traumatic images, the use of eye movements and some core elements of TF-CBT (see Forbes et al., 2010). Although a range of other psychological treatments exists (e.g., psychodynamic therapies or hypnotherapy), fewer empirical studies of these approaches have been conducted (Bisson et al., 2013).
Meta-analysis methods are used to quantitatively synthesize the results of different studies on the same research question. Meta-analysis has become more popular, as shown by the gradual increase in published papers that apply meta-analysis methods, especially since the beginning of the 21st century (Aguinis, Dalton, Bosco, Pierce, & Dalton, 2010), and it has been called the "gold standard" for synthesizing individual study results (Aguinis, Gottfredson, & Wright, 2011; Head, Holman, Lanfear, Kahn, & Jennions, 2015). Results of meta-analyses are often used for deciding which treatment should be applied in clinical practice, and international evidence-based guidelines recommend TF-CBT and EMDR for the treatment of PTSD (ACPMH; Forbes et al., 2007; NICE; National Collaborating Centre for Mental Health, 2005).

Publication Bias in Psychotherapy Research
The validity of meta-analyses is highly dependent on the quality of the included data from primary studies (Valentine, 2009). One of the most severe threats to the validity of a meta-analysis is publication bias, which is the selective reporting of statistically significant results (Rothstein, Sutton, & Borenstein, 2005). Approximately 90% of the main hypotheses of published studies within psychology are statistically significant (Fanelli, 2012; Sterling, Rosenbaum, & Weinkam, 1995), and this is not in line with the low average statistical power of studies (Bakker, van Dijk, & Wicherts, 2012; Ellis, 2010). If only published studies are included in a meta-analysis, the efficacy of interventions may be overestimated (Hopewell, Clarke, & Mallett, 2005; Ioannidis, 2008; Lipsey & Wilson, 2001; Rothstein et al., 2005). About one out of four funded studies examining the efficacy of a psychological treatment for depression did not result in a publication, and adding the results of the retrieved unpublished studies lowered the mean effect estimate by 25% from a medium to a small effect size (Driessen, Hollon, Bockting, & Cuijpers, 2017).
The treatments in evidence-based psychotherapy are mainly selected based on published research (Gilbody & Song, 2000). The scientist-practitioner model (Shapiro & Forrest, 2001) calls for clinical psychologists to let empirical results guide their work, aiming to move away from opinion- and experience-driven therapeutic decision making toward the use of research results in clinical practice. If publication bias is present, guidelines may offer recommendations that appear to rest on empirical evidence but are only erroneously supported by the results of meta-analyses (Berlin & Ghersi, 2005).
Consequently, psychotherapists who follow the scientist-practitioner model would be prompted to apply interventions in routine care that may be less efficacious than assumed and may even have detrimental effects for patients.
A re-analysis of meta-analyses in psychotherapy research for schizophrenia and depression revealed that evidence for publication bias was found in about 15% of these meta-analyses (Niemeyer, Musch, & Pietrowsky, 2012, 2013). However, until now no further comprehensive assessment of publication bias in meta-analyses of the efficacy of psychotherapeutic treatments for other clinical disorders has been conducted. Hence, the presence and impact of publication bias in psychotherapy research, including research on PTSD, remain largely unknown. Although trauma-focused interventions are claimed to be efficacious, their efficacy may be overestimated and might be lower if publication bias were taken into account. This in turn would result in suboptimal recommendations in the treatment guidelines and consequently also in unnecessarily high costs for the health care system (Jaycox & Foa, 1999; Maljanen et al., 2016; Margraf, 2009).
Because publication bias is widespread and has a detrimental impact on the results of meta-analyses (Dickersin, 2005; Fanelli, 2012; Rothstein & Hopewell, 2009), a statistical assessment of publication bias should be conducted in every meta-analysis investigating psychotherapeutic treatments. This is in line with recommendations in the Meta-Analysis Reporting Standards (MARS; American Psychological Association, 2010) and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Moher, Liberati, Tetzlaff, & Altman, 2009). A considerable number of statistical methods to investigate the presence and impact of publication bias have been developed in recent years. These methods should also be applied to already published meta-analyses in order to examine whether publication bias distorts the results (Banks, Kepes, & Banks, 2012; van Assen, van Aert, & Wicherts, 2015).
The development of publication bias methods and recommendations to apply these methods will likely lead to a more routine assessment of publication bias in meta-analyses. However, research has shown that publication bias tests generally suffer from low statistical power, especially if only a small number of studies are included in a meta-analysis and publication bias is not extreme (Begg & Mazumdar, 1994; Egger, Smith, Schneider, & Minder, 1997; Renkewitz & Keiner, 2019; Sterne, Gavaghan, & Egger, 2000; van Assen, van Aert, & Wicherts, 2015). This raises the question whether routinely applying publication bias tests without taking into account characteristics of the meta-analysis, such as the number of included studies, is a good practice.

Objectives
The first goal of this paper is to study whether applying publication bias tests is warranted under conditions that are representative for published meta-analyses on PTSD treatments. Applying publication bias tests may not always be appropriate if, for example, the statistical power of these tests is low because only a small number of studies is included in the meta-analysis. Hence, we study the statistical properties of publication bias tests by conducting a Monte-Carlo simulation study that closely resembles the meta-analyses on PTSD treatments.
The second goal of our study is to assess the severity of publication bias in the meta-analyses published on PTSD treatments. We will not interpret the results of the publication bias tests if it turns out that these tests have low statistical power. Regardless of these results, we will apply multiple methods to correct effect size for publication bias to the meta-analyses on PTSD treatments. Effect size estimates of these methods become less precise (wider confidence intervals), but they still provide relevant insights into whether the effect size estimate becomes closer to zero if publication bias is taken into account.

Data Sources
We conducted a literature search following the search strategies recommended by Lipsey and Wilson (2001) to identify all meta-analyses published on PTSD treatments. We screened the databases PsycINFO, Psyndex, PubMed, and the Cochrane Database of Systematic Reviews for all published and unpublished meta-analyses in English or German up to 5th September 2015. The search combined terms indicative of meta-analyses or reviews and terms indicative of PTSD. The exact search terms were [("metaana*" OR "meta-ana*" OR "review" OR "Übersichtsarbeit") AND ("stress disorders, post traumatic" (MeSH) OR "post-trauma*" OR "posttrauma*" OR "posttraumatic stress disorder" OR "trauma*" OR "PTSD" OR "PTBS")].
In addition, a snowball search system was used for the identification of further potentially relevant studies by screening the reference lists of included articles and of conference programs from the field of PTSD and trauma as well as psychotherapy research (see https://osf.io/9b4df/ for more information). Experts in the field were contacted, but no additional meta-analyses were obtained. Meta-analyses were retrieved for further assessment if the title or abstract suggested that these dealt with a meta-analysis of psychotherapy for PTSD. If an abstract provided insufficient information, the respective article was examined in order not to miss a relevant meta-analysis.

Study Selection and Data Extraction
Meta-analyses were required to meet the following inclusion criteria: 1) A psychotherapeutic intervention was evaluated. Psychotherapy was defined as "the informed and intentional application of clinical methods and interpersonal stances derived from established psychological principles for the purpose of assisting people to modify their behaviors, cognitions, emotions, and/or other personal characteristics in directions that the participants deem desirable" (Norcross, 1990, p. 219). 2) The intervention aimed at reducing subclinical or clinical PTSD, according to diagnostic criteria for PTSD (e.g., using one of the versions of the DSM) or according to PTSD symptomatology as measured by a validated self-report or clinician measure in an adult population (i.e., aged 18 years and older). 3) A summary effect size was provided. Both uncontrolled designs investigating changes in one group (within-subjects design) and multiple group comparisons (between-subjects design) were suitable for inclusion. Exclusion criteria were: 1) pooling of studies with various disorders, so that samples composed of other disorders along with PTSD were included in a meta-analysis and the effect sizes were combined to an overall effect estimate not restricted to the treatment of PTSD; and 2) the meta-analysis examined the efficacy of pharmacological treatment. Three independent raters (DU, HN, SSch) decided on the inclusion or exclusion of each meta-analysis upon preliminary reading of the abstract and discussed in the case of dissent.¹ We included a meta-analysis if it did not explicitly target children and adolescents, but minor hints for the inclusion of such studies were present. However, this was only suitable if it concerned individual studies in a meta-analysis, and if we found such hints only when thoroughly checking the list of references.
For conciseness, we use the term meta-analysis to refer to the article that was published and use the term data set for the effect sizes included in a meta-analysis. A meta-analysis can comprise more than one data set if, for instance, treatment efficacy was investigated for different outcomes, such as PTSD symptoms and depressive symptoms, or when the efficacy of two treatments (e.g., TF-CBT and EMDR) was investigated separately in the same meta-analysis. The term primary study is used to refer to the original study that was included in the meta-analysis. When a meta-analysis consists of multiple data sets, we included all data sets for which primary studies' effect sizes and a measure of their precision were provided or could be computed.
We tried to extract effect sizes and their precision of the primary studies from the meta-analysis. If the required data were not reported, we contacted the corresponding authors and re-analyzed the primary studies in order to obtain the data. Data were extracted independently by one author (SSch), cross-checked by a second reviewer (HN), and in case of deviations during the statistical calculations checked by two researchers (RvA, HN). All data sets for which the data were available and for which we could reproduce the average effect size reported in the meta-analysis ourselves were included. An absolute difference in average effect size larger than 0.1 was set as the criterion for non-reproducibility. We labeled a data set as not reproducible if we could not reproduce the results based on the available data and description of the analyses after contacting the authors of a meta-analysis. Moreover, there were no restrictions with respect to the dependent variable. That is, all primary and secondary outcomes of the meta-analyses were suitable for inclusion. Primary outcomes in meta-analyses on PTSD are usually PTSD symptom score or clinical status, whereas secondary outcomes often vary (e.g., anxiety, depression, dropout, or other; see also Bisson, Roberts, Andrew, Cooper, & Lewis, 2013).
The objectives of our paper were to study whether applying publication bias tests is warranted in meta-analyses on the efficacy of psychotherapeutic treatment for PTSD and to assess the severity of publication bias in these meta-analyses. The majority of statistical methods to detect the presence of publication bias do not perform well if the true effect sizes are heterogeneous (e.g., Stanley & Doucouliagos, 2014; van Aert et al., 2016; van Assen et al., 2015); some are even recommended not to be used in this situation (Ioannidis, 2005). Hence, it was necessary to only include data sets where the proportion of variance that is caused by heterogeneity in true effect size, as quantified by the I²-statistic, was smaller than 50%.
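To make the heterogeneity criterion concrete, the following minimal sketch computes Cochran's Q and the I²-statistic for one data set using the common Q-based definition, I² = max(0, (Q - df) / Q). It is written in Python for illustration and is not the R code used for the analyses; the function name `i_squared` and the example numbers are hypothetical, and software that derives I² from an estimate of the between-study variance may return slightly different values.

```python
def i_squared(effects, variances):
    """Return (Q, I2 in percent) for one data set of effect sizes."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # Q-based definition; truncated at zero when Q is smaller than df.
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical data set with six effect sizes and their sampling variances.
effects = [0.42, 0.35, 0.51, 0.28, 0.46, 0.39]
variances = [0.04, 0.05, 0.03, 0.06, 0.04, 0.05]
q, i2 = i_squared(effects, variances)
eligible = i2 < 50.0  # inclusion rule described in the text
```

In this hypothetical example the effect sizes differ by less than their sampling error would suggest, so Q falls below its degrees of freedom and I² is truncated at 0%.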
We excluded all data sets of a meta-analysis that included fewer than six studies, because publication bias tests suffer from low statistical power in case of a small number of studies in a meta-analysis and if severe publication bias is absent (Begg & Mazumdar, 1994; Sterne et al., 2000). Others recommend a minimum of 10 studies (Sterne et al., 2011), but we adopted a less strict criterion for two reasons. First, we want to study whether applying publication bias tests is warranted for conditions that are representative for published meta-analyses. Meta-analyses often contain fewer than 10 studies. For example, the median number of studies in meta-analyses published in the Cochrane Database of Systematic Reviews is 3 (Rhodes, Turner, & Higgins, 2015; Turner, Jackson, Wei, Thompson, & Higgins, 2015). The number of studies in meta-analyses in psychotherapy research is also usually small. Meta-analyses on the efficacy of psychotherapy for schizophrenia (Niemeyer, Musch, & Pietrowsky, 2012) as well as depression (Niemeyer, Musch, & Pietrowsky, 2013) also applied a minimum of 6 studies as lower limit for the application of publication bias tests.
Second, more recently developed methods to correct effect size for publication bias can be used to estimate the effect size even if the number of studies in a meta-analysis is small. For example, a method that was developed for combining an original study and replication has shown that two studies can already be sufficient for accurately evaluating effect size (van Aert & van Assen, 2018). However, a consequence of applying publication bias methods to meta-analyses based on a small number of studies is that effect size estimates become less precise and corresponding confidence intervals wider (Stanley et al., 2017; van Assen et al., 2015).

Statistical Methods
Publication bias tests. We assessed for the following publication bias tests whether it was warranted to apply these methods to the data sets in PTSD psychotherapy research: Egger's regression test (Egger et al., 1997), rank-correlation test (Begg & Mazumdar, 1994), Test of Excess Significance (TES; Ioannidis & Trikalinos, 2007b), and p-uniform's publication bias test (van Assen et al., 2015). These methods were included because they are commonly applied in meta-analyses (Egger's regression test and rank-correlation test) or outperformed existing methods in some situations (TES and p-uniform's publication bias test; Renkewitz & Keiner, 2019). It is important to note that Egger's regression test and the rank-correlation test were developed to test for small-study effects. Small-study effects refer to the tendency of smaller studies to go along with larger effect sizes. One of the causes of small-study effects is publication bias, but another cause is, for instance, heterogeneity in true effect size (see Egger et al., 1997, for a list of causes of small-study effects). The TES was also not specifically developed to test for publication bias, but examines whether the observed and expected number of statistically significant effect sizes in a meta-analysis are in line with each other (see https://osf.io/b9t7v/ for an elaborate overview of existing publication bias tests).
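As an illustration of how a test for small-study effects operates, the sketch below implements the classical formulation of Egger's regression test: the standardized effect (effect size divided by its standard error) is regressed on precision (the inverse of the standard error), and an intercept that deviates from zero signals funnel plot asymmetry. This is a Python sketch under stated assumptions and not the implementation used for the analyses, which may parameterize the regression differently. The function name `egger_test` and the example data are hypothetical, and the returned t statistic still has to be compared with a t distribution with k - 2 degrees of freedom.

```python
import math


def egger_test(effects, std_errors):
    """Regress y/se on 1/se; return (intercept, t statistic, df)."""
    x = [1.0 / se for se in std_errors]                 # precision
    z = [y / se for y, se in zip(effects, std_errors)]  # standardized effect
    k = len(x)
    mean_x, mean_z = sum(x) / k, sum(z) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    # Residual variance and standard error of the intercept (ordinary least squares).
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    resid_var = sse / (k - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / k + mean_x ** 2 / sxx))
    return intercept, intercept / se_intercept, k - 2


# Hypothetical data set in which the smallest studies show the largest effects.
effects = [0.9, 0.7, 0.6, 0.4, 0.35, 0.3]
std_errors = [0.45, 0.40, 0.30, 0.20, 0.15, 0.10]
intercept, t_stat, df = egger_test(effects, std_errors)
```

With only six studies the test has four degrees of freedom, which is one way to see why its power is limited in data sets of the size considered here.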
In order to investigate whether the application of the publication bias tests to the included data sets was warranted, we conducted a Monte-Carlo simulation study to examine the statistical power of the publication bias tests for the data sets. Data were generated in a way to stay as close as possible to the characteristics of the data sets. That is, the same number of effect sizes as in the data set as well as the same effect size measure were used for generating the data. The data were simulated under the fixed-effect (a.k.a. equal-effects) model, so effect sizes for each data set were sampled from a normal distribution with mean equal to the true effect size and variance equal to the "observed" squared standard errors. Statistically significant effect sizes based on a one-tailed test with α = .025 (to reflect common practice of testing a two-tailed hypothesis and only reporting results in the predicted direction) were always "published" and included in a simulated meta-analysis. Publication bias restricted the "publication" of statistically nonsignificant effect sizes in a way that these effect sizes had a probability of 1 - pub to be included in a simulated meta-analysis. Effect sizes were simulated until the included number of simulated effect sizes equaled the number of effect sizes in a data set.
We examined the Type-I error rate and statistical power of Egger's regression test, rank-correlation test, TES, and p-uniform's publication bias test for each simulated meta-analysis using α = .05. Two-tailed hypothesis tests were conducted for Egger's regression test and the rank-correlation test. One-tailed hypothesis tests were used for TES and p-uniform's publication bias test, because only evidence in one direction for these methods is indicative of publication bias. For each simulated meta-analysis, we recorded the proportion of data sets for which the statistical power of a publication bias test was larger than 0.8. Meta-analyses were simulated 10,000 times for all included data sets. True effect size was fixed to zero for generating data, because this enabled simulating data using the same effect size measure as in the data sets. Selected values for publication bias (pub) were 0, 0.25, 0.5, 0.75, 0.85, 0.95, and 1, where pub equal to 0 indicates no publication bias and 1 extreme publication bias. This Monte-Carlo simulation study was programmed in R 3.5.3 (R Core Team, 2019) and the packages "metafor" (Viechtbauer, 2010), "puniform" (van Aert, 2019), and "parallel" (R Core Team, 2019) were used. R code for this Monte-Carlo simulation study is available at https://osf.io/pg7sj.
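The data-generating step described above can be sketched as follows. This is an illustrative Python reimplementation under stated assumptions and not the authors' R code, which is available at the OSF link above. In particular, it assumes that each observed standard error is reused once and that effect sizes are redrawn for that standard error until one is "published"; the function names and the `rejects` callback (any publication bias test returning True or False) are hypothetical.

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # critical z for a one-tailed test with alpha = .025


def simulate_meta_analysis(std_errors, pub, rng, true_effect=0.0):
    """Simulate one meta-analysis with as many effects as observed standard errors."""
    published = []
    for se in std_errors:
        while True:
            y = rng.gauss(true_effect, se)  # fixed-effect model
            significant = y / se > Z_CRIT
            # Significant effects are always published; nonsignificant ones
            # are published with probability 1 - pub.
            if significant or rng.random() < 1.0 - pub:
                published.append(y)
                break
    return published


def rejection_rate(std_errors, pub, rejects, n_sim=1000, seed=1):
    """Proportion of simulated meta-analyses in which `rejects` returns True."""
    rng = random.Random(seed)
    hits = sum(
        bool(rejects(simulate_meta_analysis(std_errors, pub, rng), std_errors))
        for _ in range(n_sim)
    )
    return hits / n_sim
```

With pub = 0 the rejection rate estimates the Type-I error rate of the supplied test, and with pub > 0 it estimates its statistical power for a data set with the given standard errors.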
Estimating effect size corrected for publication bias. Five different methods were included to estimate the effect size: traditional meta-analysis, trim and fill, PET-PEESE, p-uniform, and the selection model approach proposed by Vevea and Hedges (1995). Traditional meta-analysis was included, because it is the analysis that is conducted in every meta-analysis. Either a fixed-effect (FE) or random-effects (RE) model was selected depending on the statistical model used in the meta-analysis. These publication bias methods were selected, because they were either often applied in meta-analyses (trim and fill) or outperformed other methods (PET-PEESE, p-uniform, and the selection model approach; McShane et al., 2016; Stanley & Doucouliagos, 2014; van Assen et al., 2015). P-curve (Simonsohn et al., 2014) was not included in the present study because the methodology underlying p-curve is the same as p-uniform, and p-uniform has the advantage that it can also test for publication bias and estimate a 95% confidence interval (CI; see https://osf.io/b9t7v/ for an elaborate overview of existing methods to correct effect size for publication bias).
Average effect size estimates of traditional meta-analysis, trim and fill, PET-PEESE, p-uniform, and the selection model approach were computed and transformed to a common effect size measure (i.e., Cohen's d) before interpreting them. Analyses of data sets that used log relative risks as effect size measure were conducted based on log odds ratios, and these average effect size estimates were transformed to Cohen's d values. If there was not enough information to transform Hedges' g to Cohen's d, Hedges' g was used in the analyses. Effect sizes were computed using the formulas described in Borenstein (2009).
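The two conversions mentioned above can be illustrated with commonly used formulas: a log odds ratio is multiplied by √3/π to obtain Cohen's d, and Hedges' g is divided by the small-sample correction factor J = 1 - 3/(4df - 1), where df is the degrees of freedom of the primary study (n1 + n2 - 2 for two independent groups). The Python sketch below assumes that these standard formulas correspond to the ones taken from Borenstein (2009); the function names are hypothetical. It also shows why the conversion from g to d can fail: it requires df, which is not always reported.

```python
import math


def log_odds_ratio_to_d(log_or, var_log_or):
    """Convert a log odds ratio and its variance to Cohen's d and its variance."""
    factor = math.sqrt(3.0) / math.pi
    return log_or * factor, var_log_or * factor ** 2


def hedges_g_to_d(g, df):
    """Undo the small-sample correction J = 1 - 3 / (4 * df - 1)."""
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    return g / j


# Hypothetical values: an average log odds ratio with its variance,
# and a Hedges' g from a study with 15 participants per group.
d_from_or, var_d = log_odds_ratio_to_d(0.80, 0.04)
d_from_g = hedges_g_to_d(0.50, df=15 + 15 - 2)
```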
We assessed the severity of publication bias by computing difference scores in effect size estimates between traditional meta-analysis and each publication bias method (i.e., trim and fill, PET-PEESE, p-uniform, and the selection model approach). That is, we subtracted the effect size estimate of traditional meta-analysis from the method's effect size estimate. A difference score of zero reflects that the estimates of traditional meta-analysis and the publication bias method were the same, whereas a positive or negative difference score indicates that the estimates were different. Subsequently, the mean and standard deviation (SD) of these difference scores were computed for each publication bias method.
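The difference scores can be computed as in the short Python sketch below. The estimates are hypothetical and serve only to show the sign convention: a negative score means that the publication bias method yields an estimate below that of traditional meta-analysis. The sketch assumes the sample standard deviation (denominator n - 1); the text does not state which variant was used.

```python
from statistics import mean, median, stdev


def difference_scores(method_estimates, meta_estimates):
    """Method estimate minus traditional estimate, skipping unavailable estimates."""
    return [
        m - t
        for m, t in zip(method_estimates, meta_estimates)
        if m is not None
    ]


# Hypothetical Cohen's d estimates for four data sets; None marks a data set
# to which the method could not be applied.
meta = [0.50, 0.80, 0.30, 1.10]
corrected = [0.35, 0.82, None, 0.70]
diffs = difference_scores(corrected, meta)
summary = (mean(diffs), median(diffs), stdev(diffs))  # stdev uses n - 1
```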
All analyses were conducted using R version 3.5.3 (R Core Team, 2019). The "metafor" package (Viechtbauer, 2010) was used for conducting fixed-effect or random-effects meta-analysis, trim and fill, rank-correlation test, and Egger's regression test. The "puniform" package (van Aert, 2019) was used for applying the p-uniform method using the default estimator based on the Irwin-Hall distribution. In line with the recommendation by Stanley (2017), α = 0.1 was used for the right-tailed test of whether the intercept of a PET analysis was different from zero, and therefore of whether the results of PET or PEESE had to be interpreted. The selection model approach as proposed by Vevea and Hedges (1995) and implemented in the "weightr" package (Coburn & Vevea, 2019) was applied to all data sets. Data and R code of the analyses are available at https://osf.io/afnvr/ and https://osf.io/taq5f/.
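The conditional logic of PET-PEESE can be sketched as follows. PET regresses the effect sizes on their standard errors with inverse-variance weights, and PEESE uses the squared standard errors as predictor; in both cases the intercept serves as the corrected estimate. If the PET intercept is significantly larger than zero in a right-tailed test, the PEESE intercept is interpreted, and otherwise the PET intercept. This Python sketch is an illustration under stated assumptions rather than the analysis code: the function names are hypothetical, a multiplicative residual variance is estimated as in ordinary weighted least squares, and the critical value `t_crit` must be supplied by the user (for α = 0.1 it is the 0.90 quantile of a t distribution with k - 2 degrees of freedom, available for example via `scipy.stats.t.ppf(0.90, k - 2)` if SciPy is installed).

```python
import math


def wls_intercept(y, x, weights):
    """Weighted least squares of y on x; return (intercept, standard error)."""
    sw = sum(weights)
    xw = sum(w * xi for w, xi in zip(weights, x)) / sw  # weighted mean of x
    yw = sum(w * yi for w, yi in zip(weights, y)) / sw  # weighted mean of y
    sxx = sum(w * (xi - xw) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - xw) * (yi - yw) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = yw - slope * xw
    sse = sum(w * (yi - intercept - slope * xi) ** 2
              for w, xi, yi in zip(weights, x, y))
    resid_var = sse / (len(y) - 2)
    return intercept, math.sqrt(resid_var * (1.0 / sw + xw ** 2 / sxx))


def pet_peese(effects, std_errors, t_crit):
    """Conditional PET-PEESE estimate; t_crit is the right-tailed critical t value."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pet, pet_se = wls_intercept(effects, std_errors, weights)
    if pet / pet_se > t_crit:  # PET intercept significantly above zero
        squared = [se ** 2 for se in std_errors]
        peese, _ = wls_intercept(effects, squared, weights)
        return "PEESE", peese
    return "PET", pet
```

Because the intercept is an extrapolation to a standard error of zero, its estimate depends heavily on the spread of the standard errors in a data set, which is limited when only a handful of studies is available.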

Description of Meta-Analyses Investigated
A flowchart illustrating the procedure of selecting meta-analyses and data sets is presented in Figure 1. The literature search resulted in 7,647 hits including duplicates. The screening process reduced this number to 502 meta-analyses, of which 89 dealt with the efficacy of psychotherapeutic interventions for PTSD and were included (see Appendix A and https://osf.io/pkzx8/). Of these 89 meta-analyses, four could not be located as they were unpublished dissertations and the authors did not reply to our requests.² One meta-analysis was excluded because it used a network meta-analysis approach (Gerger et al., 2014) and the included publication bias methods cannot be applied to this type of data. A multi-site study (Morrissey et al., 2015) was excluded, because meta-analysis methods were used to combine the results from the different sites. Of the remaining 83 meta-analyses, we contacted 36 authors (43.4%) because the effect size data were not fully reported in their paper and obtained data from six authors (16.7%). Our analysis of the 83 meta-analyses first examined whether they discussed the problem of publication bias. Fifty-eight meta-analyses (69.9%) mentioned publication bias, whereas 25 (30.1%) did not mention it at all. In 35 meta-analyses (42.2%), it was specified that the search strategies included unpublished studies, and 20 (24.1%) indeed found and included unpublished studies. However, in 46 meta-analyses (55.4%) unpublished studies were explicitly regarded as unsuitable for inclusion, and two meta-analyses (2.4%) did not specify their search and inclusion criteria with respect to unpublished studies.
Forty-seven meta-analyses (56.6%) statistically assessed publication bias, whereas 36 (43.4%) did not. Five meta-analyses (6.0%) included the rank-correlation test, six (7.2%) Egger's regression test, and nine (10.8%) the trim and fill procedure. TES, PET-PEESE and p-uniform were not applied in any of the meta-analyses. A funnel plot (Light & Pillemer, 1984) was presented in 26 meta-analyses (31.3%) and fail-safe N (Rosenthal, 1979) was computed in 26 meta-analyses (31.3%). These results indicate that a large number of meta-analyses did not assess publication bias or only applied a selection of publication bias methods. PET-PEESE and p-uniform have been developed more recently and therefore we did not expect them to be regularly applied.
The 83 meta-analyses included a total number of 2,110 data sets, of which 98 (4.6%) data sets from 26 meta-analyses fulfilled all inclusion criteria and were eligible for publication bias assessment (see flowchart in Figure 1). Figure 2 is a histogram of the number of effect sizes per data set before data sets were excluded because they contained fewer than six studies or showed heterogeneity in true effect size. The results show that the majority of data sets contained fewer than six effect sizes, and that only a small number of data sets included more than 15 effect sizes.
Many data sets were excluded because they contained fewer than six studies (1,510 data sets) or because of heterogeneity in true effect size (309 data sets). All meta-analyses of which data sets were included in our study are marked with an asterisk in the list of references.
The median number of effect sizes in a data set was 7 (first quartile 7, third quartile 10). Since publication bias tests have low statistical power if the number of effect sizes is small in a meta-analysis (Begg & Mazumdar, 1994; Sterne et al., 2000; van Assen et al., 2015), the characteristics of many of the data sets are not well-suited for methods to detect publication bias. Additionally, p-uniform cannot be applied if there are no statistically significant effect sizes in a meta-analysis, because a requirement is that at least one study in a meta-analysis is statistically significant. The median number of statistically significant effect sizes in the data sets was 3 (34.3%; first quartile 1 (13%), third quartile 6 (80.4%)), and 77 data sets (78.6%) included at least one significant effect size (see Appendix A, which also reports the number of studies included in each data set).
Consequently, conditions were also not well-suited for p-uniform in particular, since this method uses only the statistically significant effect sizes. The median I²-statistic was 0% (first quartile 0%, third quartile 28.7%). We also studied in the simulations whether for each data set the statistical power of a publication bias test was larger than 0.8. This enabled us to select the data sets where publication bias tests would be reasonably powered to detect publication bias if it was present. Statistical power of none of the methods was larger than 0.8 for any data set if pub < 0.95 (results are available at https://osf.io/6bnc5/ for the rank-correlation test, https://osf.io/ufdps/ for Egger's test, https://osf.io/5yehp/ for the TES, and https://osf.io/feux3/ for p-uniform). It is highly unlikely that publication bias is this extreme in the included data sets, because many data sets contained statistically nonsignificant effect sizes (median percentage of nonsignificant effect sizes in a data set 65.7%). The publication bias tests would most likely be severely underpowered when applied to the published meta-analyses on PTSD, and it follows from these results that the tests should not be applied here. Therefore, we only report the results of applying the publication bias tests to the data sets as supplement in the online repository (https://osf.io/49cke/) for completeness.

Effect Size Corrected for Publication Bias
The data set with ID 77 (from the meta-analysis by Kehle-Forbes et al., 2013) was excluded for estimating effect sizes corrected for publication bias because not enough information was available to transform the log relative risks to Cohen's d. Hedges' g effect sizes could not be transformed into Cohen's d for 12 data sets and Hedges' g was used instead (see Appendix A). Descriptive results of the effect size estimates of traditional meta-analysis, trim and fill, PET-PEESE, p-uniform, and the selection model approach are presented in Table 1. P-uniform could only be applied to data sets with at least one statistically significant result (77 data sets), and the selection model approach did not converge for two data sets. Results showed that especially estimates of PET-PEESE were closer to zero than traditional meta-analysis and that the standard deviation of the estimates of PET-PEESE and p-uniform was larger than traditional meta-analysis, trim and fill, and the selection model approach. See Appendix A for the results of the effect size estimates corrected for publication bias per data set.

Table 1. Descriptive results of data sets analyzed with meta-analysis (fixed-effect or random-effects model depending on the model that was used in the original meta-analysis), trim and fill, PET-PEESE, p-uniform, and the selection model approach. Entries are the mean, median [min.; max.], and (SD) of the estimates.
The mean of the difference in effect size estimate between PET-PEESE and the meta-analytic estimate was -0.101 (SD = 0.872). However, the median of the difference in effect size estimate was close to zero (Mdn = -0.002), suggesting that the estimates of PET-PEESE and traditional meta-analysis were close. The mean differences between the estimates of trim and fill and traditional meta-analysis (-0.009, Mdn = 0, SD = 0.104) and between the selection model approach and traditional meta-analysis (0.026, Mdn = 0.026, SD = 0.145) were negligible.
Analyses for data sets including significant effect sizes. P-uniform was applied to a subset of 77 data sets (see Appendix A), because this method requires that at least one study is statistically significant. The mean of the difference in effect size estimate of p-uniform and traditional meta-analysis was -0.174 (Mdn = 0.04, SD = 1.273). The large standard deviation is caused by situations in which an extreme effect size was estimated because a primary study's effect size was only marginally significant (i.e., p-value just below .05). In order to counteract these extreme effect size estimates, we set p-uniform's effect size estimate to zero when the average of the statistically significant p-values was larger than half the α-level.³ This is in line with the recommendation by van Aert et al. (2016). Setting this effect size to zero resulted in a mean of the difference in effect size estimate between p-uniform and traditional meta-analysis of -0.019 (Mdn = 0.04, SD = 0.364). The change in difference in effect size estimate was caused by setting the effect size estimates of p-uniform in seven data sets to zero, in which p-uniform originally substantially corrected for publication bias. The mean of the difference scores between PET-PEESE and traditional meta-analysis when computed based on this subset of 77 data sets was -0.129 (Mdn = -0.011, SD = 0.968), for trim and fill the mean of the difference scores was -0.014 (Mdn = 0, SD = 0.105), and for the selection model approach the mean of the difference scores was 0.028 (Mdn = 0.024, SD = 0.155).
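The rule of setting p-uniform's estimate to zero can be written down in a few lines. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the p-values and the α-level are expressed on the same scale (both one-tailed or both two-tailed).

```python
from statistics import mean


def apply_zero_rule(p_values, estimate, alpha=0.05):
    """Return 0.0 if the mean significant p-value exceeds alpha / 2.

    p_values: p-values of all effect sizes in the data set.
    estimate: p-uniform's effect size estimate before applying the rule.
    """
    significant = [p for p in p_values if p < alpha]
    if not significant:
        raise ValueError("at least one significant effect size is required")
    if mean(significant) > alpha / 2:
        return 0.0  # significant p-values cluster just below alpha
    return estimate
```

For example, a data set whose only significant p-values are .040 and .045 has a mean significant p-value above .025, so the estimate is set to zero regardless of how extreme the unadjusted estimate was.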
Explaining estimates of p-uniform, the selection model approach, and PET-PEESE. We illustrate deficiencies of p-uniform, the selection model approach, and PET-PEESE by discussing the results of two exemplary data sets. Estimates of p-uniform can be imprecise (i.e., with a wide CI) if they are based on a small number of effect sizes in combination with p-values of these effect sizes close to the α-level. In 29 out of 77 data sets, p-uniform's estimate was based on at most three studies. For instance, the estimated average log relative risk of random-effects meta-analysis of the data set from Bisson et al. (2013, ID=20) was -0.177, 95% CI [-0.499, 0.145], whereas p-uniform's estimate was based on a single study and equaled -0.504, 95% CI [-3.809, 8.174]. The effect size estimate of p-uniform, as for any other method, is more precise the larger the number of effect sizes in a data set or the larger the primary studies' sample sizes.
The selection model approach also suffers from a small number of statistically significant effect sizes. The computed weights for the intervals of the method's selection model are imprecisely estimated if only a small number of effect sizes are within an interval. In an extreme situation where no effect sizes are observed in an interval of the selection model, the implementation of the selection model approach by Vevea and Hedges (1995) in the R package "weightr" assigns a weight of 0.01 to this interval. Bias in effect size estimation increases the more this weight deviates from its true value.
PET-PEESE also did not yield reasonable effect size estimates in every data set, especially not if the standard errors of the primary studies were highly similar (i.e., were based on similar sample sizes). Figure 4 shows the funnel plot based on the data set from Bisson et al. (2007) comparing TF-CBT versus wait list and active controls (ID=14; left panel), with the filled circles being the 15 observed effect sizes. The studies' standard errors diverged from each other, which makes it possible to fit a regression line through the observed effect sizes in the data set (dashed black line). PET-PEESE's effect size estimate was -0.027 (95% CI [-0.663, 0.609]; denoted by the asterisk in Figure 4), which was closer to zero than that of traditional meta-analysis (0.260, 95% CI [-0.057, 0.578]) but had a wider CI. The data set from Diehle et al. (2014) comparing two different treatments of TF-CBT (ID=44) is presented in the right panel of Figure 4. PET-PEESE was hindered by the highly similar standard errors of the studies, which ranged from 0.227 to 0.478. Hence, the effect size estimate of PET-PEESE (0.44, 95% CI [-1.079, 1.958]) was unrealistically larger than the estimate of traditional meta-analysis (0.153, 95% CI [-0.084, 0.39]), and its CI was wider.
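Why similar standard errors destabilize PET-PEESE can be seen from the regression itself. The following sketch is not the implementation used in this study: it fits the weighted regression with sampling variances treated as known, and the switching rule (a two-sided z-test on the PET intercept) is a simplifying assumption, since published implementations differ in the exact test. Function names and input values are hypothetical.

```python
from statistics import NormalDist


def wls_intercept(effects, predictor, ses):
    """Intercept and its standard error from a regression of effect size
    on `predictor`, weighted by 1 / se^2 (variances treated as known)."""
    w = [1 / s**2 for s in ses]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, predictor))
    swxx = sum(wi * x * x for wi, x in zip(w, predictor))
    swy = sum(wi * y for wi, y in zip(w, effects))
    swxy = sum(wi * x * y for wi, x, y in zip(w, predictor, effects))
    det = sw * swxx - swx**2  # close to zero when the predictor barely varies
    intercept = (swxx * swy - swx * swxy) / det
    return intercept, (swxx / det) ** 0.5


def pet_peese(effects, ses, alpha=0.05):
    """PET estimate unless its intercept is significant, otherwise PEESE."""
    pet, pet_se = wls_intercept(effects, ses, ses)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    if abs(pet / pet_se) > z_crit:
        return wls_intercept(effects, [s**2 for s in ses], ses)
    return pet, pet_se
```

The intercept is an extrapolation to a standard error of zero. When the observed standard errors span only a narrow range, `det` approaches zero and the standard error of the intercept grows sharply, which corresponds to the wide CIs reported above.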

Discussion
Publication bias is widespread in the psychology research literature (Bakker et al., 2012; Fanelli, 2012; Sterling et al., 1995), resulting in overestimated effect sizes in primary studies and meta-analyses (Kraemer, Gardner, Brooks, & Yesavage, 1998; Lane & Dunlap, 1978). Guidelines such as the MARS (American Psychological Association, 2010) and PRISMA (Moher, Liberati, Tetzlaff, & Altman, 2009) recommend routinely correcting for publication bias in any meta-analysis. Others recommend re-analyzing published meta-analyses to study the extent of publication bias in whole fields of research (Ioannidis, 2009; Ioannidis & Trikalinos, 2007a; van Assen et al., 2015) by using multiple publication bias methods (Coburn & Vevea, 2015; Kepes et al., 2012). However, the question is whether routinely assessing publication bias is indeed a good recommendation, because researchers may end up applying publication bias methods in situations where these do not have appropriate statistical properties, potentially leading to faulty conclusions. We tried to answer this question by re-analyzing a large number of meta-analyses published on the efficacy of psychotherapeutic treatment for PTSD.
We re-analyzed 98 data sets from 26 meta-analyses studying a wide variety of psychotherapeutic treatments for PTSD. We had to exclude a large portion of data sets (95.4%), mainly due to heterogeneity in true effect size and data sets containing fewer than six primary studies. These exclusion criteria were necessary, because publication bias methods do not perform well in case of heterogeneity in true effect size (Ioannidis, 2005) and a small number of primary studies yields low power of publication bias methods and imprecise effect size estimation (Sterne et al., 2000).
The included data sets contained a small number of primary studies (median 7 studies), resulting in challenging conditions for any publication bias method. Before applying publication bias tests, we studied whether these tests would have sufficient statistical power (> 0.8). We conducted a Monte-Carlo simulation study in which data were generated in a way to stay as close as possible to the included data sets. The statistical power of the publication bias tests was only larger than 0.8 in case of extreme publication bias (i.e., nonsignificant effect sizes having a probability of 0.05 or smaller to be included in a meta-analysis). Hence, we concluded that it was not warranted to apply publication bias tests. Of note is that the median percentage of nonsignificant effect sizes in a data set was 65.7%, suggesting that extreme publication bias was absent.
Publication bias methods that correct the effect size for bias are also affected by a small number of primary studies, because the effect size estimates then become imprecise (i.e., have a wide CI). However, comparing estimates of these methods with those of traditional meta-analysis that does not correct for publication bias still provides insights about the severity of publication bias. This analysis revealed no evidence for severe overestimation caused by publication bias, as the corrected estimates were close to those of traditional meta-analysis.
Our results imply that following up on the guidelines to assess publication bias in any meta-analysis is far from straightforward in practice. Many data sets in our study were too heterogeneous for publication bias analyses. Moreover, even after the exclusion of data sets with fewer than six studies, statistical power of publication bias tests for each data set was low if extreme publication bias was absent, and CIs of methods that provided estimates corrected for publication bias were wide. These results even call for revising the recommendation by Sterne et al. (2011) to apply publication bias tests only to meta-analyses with more than 10 studies. Our results are also corroborated by a recent study of Renkewitz and Keiner (2019), who concluded based on a simulation study that publication bias could only be reliably detected with at least 30 studies. However, a caveat here is that these recommendations heavily depend on the severity of publication bias that is assumed to be present in a meta-analysis. Hence, most important is that researchers are aware of the fact that publication bias tests suffer from low statistical power and that a nonsignificant publication bias test does not imply that publication bias is absent.

Recommendations
We consider it important to give practical advice to researchers. We recommend that researchers follow the MARS guidelines, apply publication bias tests, and report effect size estimates corrected for publication bias. However, a well-informed choice has to be made to select the publication bias methods with the best statistical properties, as no method outperforms all other methods in all conditions (Carter et al., 2019; Renkewitz & Keiner, 2019). Carter and colleagues (2019) conclude that it has not been investigated yet whether the application of publication bias methods is warranted in real data in psychology, and that this ultimately is an empirical question which should be the focus of future research. Routinely applying publication bias methods without paying attention to their statistical properties for the characteristics of the respective meta-analysis cannot be recommended. Hence, researchers need to consider the characteristics of the data sets and check the properties of publication bias methods for these data sets before actually applying these methods. Such a "method performance check" has also been recommended by Carter et al. (2019) for methods to correct effect size for publication bias and can be conducted by their meta explorer web application (http://www.shinyapps.org/apps/metaExplorer/) or simulation studies. A complicating factor, however, is that a method performance check requires information about the true effect size, heterogeneity in true effect size, and the extent of publication bias, none of which is available. Hence, researchers are advised to use multiple levels for these parameters in a method performance check as a sensitivity analysis.
As there is no single publication bias method that outperforms all other methods and selecting a method depends on unknown parameters, we recommend applying multiple publication bias methods that show acceptable performance in a method performance check. A so-called triangulation (Kepes et al., 2012; Coburn & Vevea, 2015) following a method performance check, rather than applying only one publication bias method, will yield more insight into the presence and severity of publication bias, because each method uses its own approach to examine publication bias. Researchers should refrain from testing for publication bias if a method performance check by means of a power analysis reveals that publication bias is unlikely to be detected in their meta-analysis. Applying methods to correct effect size for publication bias is still useful in case of a small number of studies in a meta-analysis, because estimates corrected for publication bias can be compared to the uncorrected estimate to assess the severity of publication bias.
We consider it important to emphasize that the reporting of publication bias methods should be independent of their results. The analysis procedure of the meta-analysis as well as the publication bias tests should preferably be preregistered in a pre-analysis plan before the analyses are actually conducted. Moreover, conflicting results of publication bias methods are an interesting and important finding in their own right that has to be discussed in the paper.

Limitations
Heterogeneous data sets had to be excluded, because assessing publication bias with the included methods is only accurate when based on meta-analyses with no or small heterogeneity in true effect size (Ioannidis & Trikalinos, 2007a; Terrin et al., 2003). For that reason, data sets were excluded from the analyses if the I²-statistic was larger than 50%, but the I²-statistic is generally imprecise, especially if the number of effect sizes in a meta-analysis is small (Ioannidis, Patsopoulos, & Evangelou, 2007). This is also reflected in the wide confidence intervals around the I²-statistics of the included data sets in the analyses (see Appendix A). Moreover, there is also an effect of publication bias on the I²-statistic which has been shown to be large, complex, and non-linear, such that publication bias can both dramatically decrease and increase the I²-statistic (Augusteijn, van Aert, & van Assen, 2019). Therefore, a consequence of using a selection criterion based on the I²-statistic in the current study is that this may have led to the inclusion of data sets with heterogeneity in true effect size, which may, in turn, also have biased the results of the publication bias methods because these methods do not perform well under substantial heterogeneity (Ioannidis, 2005; Terrin et al., 2003; van Assen et al., 2015).
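For reference, the I²-statistic underlying this exclusion criterion can be computed from Cochran's Q as I² = max(0, (Q - df) / Q) × 100, with df equal to the number of effect sizes minus one. The sketch below uses this standard definition with inverse-variance weights; the function name and example values are hypothetical and are not taken from the analyzed data sets.

```python
def i_squared(effects, ses):
    """Return (Q, I2 in percent) using inverse-variance weights."""
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two effect sizes with standard errors")
    w = [1 / s**2 for s in ses]
    pooled = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    # I2 is truncated at zero when Q is smaller than its degrees of freedom
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2
```

Because Q is computed from only a handful of effect sizes in most data sets considered here, a single deviating study can move I² across the 50% threshold, which is one reason the statistic is imprecise in small meta-analyses.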
Data sets affected by publication bias may also have been excluded by limiting ourselves to homogeneous data sets. Imagine a data set consisting of multiple statistically significant effect sizes because of publication bias and one nonsignificant effect size that is not influenced by publication bias. The inclusion of this nonsignificant effect size likely causes the I²-statistic to be larger than 50% while the true effect size in fact may be homogeneous. Hence, publication bias may also have resulted in the exclusion of homogeneous data sets. Another limitation is that questionable research practices, known as p-hacking (i.e., all behaviors researchers can use to obtain the desired results; Simmons, Nelson, & Simonsohn, 2011), may have further biased the results of the publication bias methods as well as the traditional meta-analysis (van Aert et al., 2016).
Of note is also that the data sets in the current investigation often contained multiple statistically nonsignificant effect sizes when an active treatment was compared to a passive or active control group, which is not expected in case of extreme publication bias. Especially comparisons between two active treatments resulted in very few significant differences in efficacy. These meta-analyses with nonsignificant comparative effects might also be affected by publication bias. For example, when a new treatment is found to be as efficacious as an established one, this might be newsworthy and have a larger chance to get published than a finding demonstrating the well-known superiority of the state-of-the-art treatment. This implies that publication bias can lead to the publication of statistically nonsignificant rather than significant effects. In such a situation, publication bias will not be detected by any of the methods in this study.

Conclusion
Routinely assessing publication bias in any meta-analysis is recommended by guidelines such as MARS and PRISMA. We have shown, however, that the characteristics of meta-analyses in research on PTSD treatments are generally unfavorable for publication bias methods. That is, heterogeneity and small numbers of studies in meta-analyses result in low statistical power and imprecise corrected estimates. Of note is that the need to interpret results from small data sets cautiously holds for meta-analyses in general. The characteristics of the meta-analyses in our study on PTSD treatments are deemed to be typical for psychotherapy research, and potentially for other areas of clinical psychology as well.
New publication bias methods need to be developed, and existing methods improved, so that they allow the true effect size to be heterogeneous and perform well in case of a small number of effect sizes in a meta-analysis. One promising development is the extension of p-uniform to enable accurate effect size estimation in the presence of heterogeneity in true effect size (van Aert & van Assen, 2020). Other promising developments are Bayesian methods to correct for publication bias (Du, Liu, & Wang, 2017; Guan & Vandekerckhove, 2016) and the increased attention for selection model approaches (Citkowicz & Vevea, 2017; McShane et al., 2016).
We hope that our work creates awareness of the limitations of publication bias methods and recommend that researchers apply and report multiple publication bias methods that have shown good statistical properties for the meta-analysis under study.
[...] entire editorial process, including the open reviews, are published in the online supplement.

Figure 1. Flow chart: Identification and selection of meta-analyses and data sets. Note. † Positive and negative signs were randomly assigned to each effect in the meta-analysis.

Figure 2. Histogram of the number of primary studies' effect sizes included in data sets.The vertical red dashed line denotes the cut-off that was used for assessing publication bias in a meta-analysis.
Before applying the publication bias tests to the data sets, we conducted a Monte-Carlo simulation study to examine whether statistical power of the tests is large enough (> 0.8) to warrant applying these tests. Type-I error rate and statistical power of the rank-correlation test (open circles), Egger's test (triangles), TES (diamonds), and p-uniform's publication bias test (solid circles) as a function of publication bias (pub) are shown in Figure 3. The results in the figure were obtained by averaging over the 98 data sets and the 10,000 replications in the Monte-Carlo simulation study. Type-I error rate of all publication bias tests was smaller than α = .05, implying that the tests were conservative. These results indicate that statistical power of all methods was not above 0.5 for pub < 0.95. Statistical power of only the TES was larger than 0.8 in case of extreme publication bias (pub = 1).

Figure 3. Type-I error rate and statistical power obtained with the Monte-Carlo simulation study of the rank-correlation test (open circles), Egger's test (triangles), test of excess significance (TES; diamonds), and p-uniform's publication bias test (solid circles)

Figure 4. Funnel plots of the data sets from Bisson et al. (2013) (ID=14; left panel) and Diehle et al. (2014) (ID=44; right panel).Filled circles are the observed effect sizes in a meta-analysis, the dashed black line is the fitted regression line through the observed effect sizes, the asterisks indicate the estimate of PET-PEESE.
grouped by treatment category.