A Multi-faceted Mess: A Review of Statistical Power Analysis in Psychology Journal Articles
DOI: https://doi.org/10.15626/MP.2020.2643
Keywords: power analysis, statistical power, Type II error rate, precision-based sample planning, minimally meaningful effect size
Abstract
Many organizations recommend that a sample planning procedure, such as a traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required to detect a minimally meaningful effect size at a specified level of power and Type I error rate. However, several drawbacks render the procedure "a mess": identifying the minimally meaningful effect size is very challenging, the procedure is not precision-oriented, and it does not guide the researcher to collect as many participants as feasibly possible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether they are concerns in practice. To investigate how power analysis is currently used, we reviewed the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017 using Google Scholar. We found that researchers rarely use the minimally meaningful effect size as the rationale for the effect size chosen in a power analysis. Further, precision-based approaches and collecting the maximum feasible sample size are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers focus on tools beyond traditional power analysis when planning samples, such as collecting the maximum sample size feasible.
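To make the two approaches contrasted in the abstract concrete, the following is a minimal illustrative sketch (not the authors' method) of a power-based versus a precision-based per-group sample-size calculation for a two-group comparison of a standardized mean difference d. It uses the standard normal-approximation formulas and assumes SciPy is available; dedicated tools such as the cited pwr package or G*Power solve the exact t-based version, which gives slightly larger answers.

```python
import math
from scipy.stats import norm

def n_per_group_power(d, alpha=0.05, power=0.80):
    """Per-group n needed to detect standardized mean difference d
    in a two-sided two-sample test (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

def n_per_group_precision(half_width, alpha=0.05):
    """Per-group n so the (1 - alpha) confidence interval for d has
    roughly the desired half-width (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    return math.ceil(2 * (z_alpha / half_width) ** 2)

# Power-based: detect a minimally meaningful d = 0.5 at 80% power.
print(n_per_group_power(0.5))      # 63 per group (exact t-based tools give 64)

# Precision-based: estimate d with a 95% CI no wider than +/- 0.2.
print(n_per_group_precision(0.2))  # 193 per group
```

The contrast illustrates the paper's point: a power analysis asks "how many participants to *detect* the effect?", while a precision analysis asks "how many to *estimate* it narrowly?", and the two questions can yield very different sample sizes.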
License
Copyright (c) 2025 Nataly Beribisky, Udi Alter, Robert Cribbie

This work is licensed under a Creative Commons Attribution 4.0 International License.