A Multi-faceted Mess: A Review of Statistical Power Analysis in Psychology Journal Articles
DOI: https://doi.org/10.15626/MP.2020.2643
Keywords: power analysis, statistical power, Type II error rate, precision-based sample planning, minimally meaningful effect size
Abstract
Many organizations recommend that a sample planning procedure, such as a traditional NHST a priori power analysis, be conducted during the planning stages of a study. Power analysis allows the researcher to estimate how many participants are required to detect a minimally meaningful effect size at a specified level of power and Type I error rate. However, several drawbacks render the procedure "a mess": identifying the minimally meaningful effect size is very challenging, the procedure is not precision-oriented, and it does not guide the researcher to collect as many participants as feasibly possible. In this study, we explore how these three theoretical issues are reflected in applied psychological research in order to better understand whether they are concerns in practice. To investigate how power analysis is currently used, we reviewed the reporting of 443 power analyses in high-impact psychology journals in 2016 and 2017 using Google Scholar. We found that researchers rarely use the minimally meaningful effect size as the rationale for the effect size chosen in a power analysis. Further, precision-based approaches and collecting the maximum feasible sample size are almost never used in tandem with power analyses. In light of these findings, we suggest that researchers focus on tools beyond traditional power analysis when planning samples, such as collecting the maximum sample size feasible.
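To make the two approaches contrasted in the abstract concrete, the following is a minimal illustrative sketch (not the authors' method) of a power-based versus a precision-based per-group sample-size calculation for a two-group comparison of a standardized mean difference d. It uses the standard normal-approximation formulas and assumes SciPy is available; dedicated tools such as the cited pwr package or G*Power solve the exact t-based version, which gives slightly larger answers.

```python
import math
from scipy.stats import norm

def n_per_group_power(d, alpha=0.05, power=0.80):
    """Per-group n needed to detect standardized mean difference d
    in a two-sided two-sample test (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

def n_per_group_precision(half_width, alpha=0.05):
    """Per-group n so the (1 - alpha) confidence interval for d has
    roughly the desired half-width (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    return math.ceil(2 * (z_alpha / half_width) ** 2)

# Power-based: detect a minimally meaningful d = 0.5 at 80% power.
print(n_per_group_power(0.5))      # 63 per group (exact t-based tools give 64)

# Precision-based: estimate d with a 95% CI no wider than +/- 0.2.
print(n_per_group_precision(0.2))  # 193 per group
```

The contrast illustrates the paper's point: a power analysis asks "how many participants to *detect* the effect?", while a precision analysis asks "how many to *estimate* it narrowly?", and the two questions can yield very different sample sizes.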
License
Copyright (c) 2025 Nataly Beribisky, Udi Alter, Robert Cribbie

This work is licensed under a Creative Commons Attribution 4.0 International License.