Estimating Population Mean Power Under Conditions of Heterogeneity and Selection for Significance
In scientific fields that use significance tests, statistical power is important for successful replications of significant results because it is the long-run success rate in a series of exact replication studies. For any population of significant results, there is a population of power values of the statistical tests on which conclusions are based. We give exact theoretical results showing how selection for significance affects the distribution of statistical power in a heterogeneous population of significance tests. In a set of large-scale simulation studies, we compare four methods for estimating population mean power of a set of studies selected for significance (a maximum likelihood model, extensions of p-curve and p-uniform, and z-curve). The p-uniform and p-curve methods performed well with a fixed effect size and varying sample sizes. However, when there was substantial variability in effect sizes as well as sample sizes, both methods systematically overestimated mean power. With heterogeneity in effect sizes, the maximum likelihood model produced the most accurate estimates when the distribution of effect sizes matched the assumptions of the model, but z-curve produced more accurate estimates when the assumptions of the maximum likelihood model were not met. We recommend the use of z-curve to estimate the typical power of significant results, which has implications for the replicability of significant results in psychology journals.
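As a rough illustration of why selection for significance shifts the distribution of power, the sketch below assumes two-sided z-tests at alpha = 0.05 and a small, made-up set of noncentrality values; the function names are hypothetical and the code is not taken from the paper. Because a study produces a significant result with probability equal to its power, the significant results form a power-weighted sample, so their mean power is sum(power^2) / sum(power) rather than the simple average.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def z_test_power(ncp, alpha=0.05):
    """Power of a two-sided z-test with noncentrality parameter ncp."""
    crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - _STD_NORMAL.cdf(crit - ncp)
    lower_tail = _STD_NORMAL.cdf(-crit - ncp)
    return upper_tail + lower_tail


def mean_power_before_selection(powers):
    """Simple average of power over all studies that were run."""
    return sum(powers) / len(powers)


def mean_power_after_selection(powers):
    """Mean power among significant results only.

    Each study is selected with probability equal to its own power,
    so the selected population weights each power value by itself.
    """
    return sum(p * p for p in powers) / sum(powers)


# Illustrative (assumed) noncentrality values for a heterogeneous population.
ncps = [0.0, 1.0, 2.0, 3.0]
powers = [z_test_power(ncp) for ncp in ncps]

before = mean_power_before_selection(powers)
after = mean_power_after_selection(powers)
print(f"mean power before selection: {before:.3f}")
print(f"mean power after selection:  {after:.3f}")
```

With heterogeneous power values the second number is larger than the first; when every study has the same power, the two coincide.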
Copyright (c) 2020 Jerry Brunner, Ulrich Schimmack
This work is licensed under a Creative Commons Attribution 4.0 International License.