Bayesian Evaluation of Replication Studies
DOI: https://doi.org/10.15626/MP.2020.2554
Keywords: Bayes factor, Informative Hypothesis, Replication Crisis, Replication Study
Abstract
This paper proposes a method for determining whether the result of an original study is corroborated by a replication study. The method is illustrated using two replication studies, and the corresponding original studies, from the Reproducibility Project: Psychology by the Open Science Collaboration. It emphasizes the need to determine what one wants to replicate from the original paper. This can be done by translating the research hypotheses formulated in the introduction into informative hypotheses, or by translating the results into interval hypotheses. The Bayes factor is then used to determine whether the hypotheses derived from the original study are corroborated by the replication study. Our method for assessing replication success better fits the needs of researchers in fields that use replication studies.
References
Anderson, S., Kelley, K., & Maxwell, S. (2017). Sample-size planning for more accurate statistical power: A method adjusting sample effect sizes for publication bias and uncertainty. Psychological Science, 28(11), 1547–1562.
Anderson, S., & Maxwell, S. (2016). There’s more than one way to conduct a replication study: Beyond statistical significance. Psychological Methods, 21(1), 1–12.
Bargh, J. A., Chen, M., & Burrows, L. (1996). Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71(2), 230–244.
Berger, J., & Pericchi, L. (2004). Training samples in objective Bayesian model selection. The Annals of Statistics, 32(3), 841–869.
Boeije, H. (2010). Analysis in qualitative research. Sage.
Brandt, M., IJzerman, H., Dijksterhuis, A., Farach, F., Geller, J., Giner-Sorolla, R., Grange, J., Perugini, M., Spies, J., & Van ’t Veer, A. (2014). The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217–224.
Charmaz, K. (2006). Constructing grounded theory: A practical guide through qualitative analysis. Sage.
Cho, H.-C., & Abe, S. (2013). Is two-tailed testing for directional research hypotheses tests legitimate? Journal of Business Research, 66(9), 1261–1266.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Lawrence Erlbaum Associates.
Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5, Article 778.
Doyen, S., Klein, O., Pichon, C. L., & Cleeremans, A. (2012). Behavioral priming: It’s all in the mind, but whose mind? PLoS ONE, 7(1), e29081.
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPower: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28(1), 1–11.
Etz, A., & Vandekerckhove, J. (2016). A Bayesian perspective on the reproducibility project: Psychology. PLoS ONE, 11(2), e0149794.
Field, S., Hoekstra, R., Brinkmann, L., & Van Ravenzwaaij, D. (2019). When and why to replicate: As easy as 1, 2, 3? Collabra: Psychology, 5(1), Article 46.
Fu, Q. (2022). Sample size determination for Bayesian informative hypothesis testing [Doctoral dissertation, Utrecht University]. Utrecht University Repository. https://dspace.library.uu.nl/handle/1874/416118
Galak, J. (2012). Replication of study 1 by Janssen, Schirm, Mahon, & Caramazza (2008, JEP:LMC). OSF. Retrieved from https://osf.io/uhpyr/
Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60(4), 328–331.
Gigerenzer, G. (2004). Judgment and decision making. In Blackwell handbook of judgment and decision making. Blackwell Publishing.
Glaser, B. (1978). Theoretical sensitivity. The Sociology Press.
Gu, X., Mulder, J., & Hoijtink, H. (2018). Approximate adjusted fractional Bayes factors: A general method for testing informative hypotheses. British Journal of Mathematical and Statistical Psychology, 71(2), 229–261.
Harms, C. (2018). A Bayes factor for replications of ANOVA results. The American Statistician, 73(4), 327–339.
Hawthorne, J. (2021). Inductive logic. In The Stanford Encyclopedia of Philosophy (Spring 2021 Edition). Retrieved December 2, 2022, from https://plato.stanford.edu/archives/spr2021/entries/logic-inductive/
Hoijtink, H. (2012). Informative hypotheses: Theory and practice for behavioral and social scientists. Chapman & Hall/CRC.
Hoijtink, H. (2022). Prior sensitivity of null hypothesis Bayesian testing. Psychological Methods, 27(5), 804–821.
Hoijtink, H., Gu, X., Mulder, J., & Rosseel, Y. (2019a). Bayesian evaluation of informative hypotheses for multiple populations. British Journal of Mathematical and Statistical Psychology, 72(2), 219–243.
Hoijtink, H., Gu, X., Mulder, J., & Rosseel, Y. (2019b). Computing Bayes factors from data with missing values. Psychological Methods, 24(2), 253–268.
Hoijtink, H., Mulder, J., Van Lissa, C., & Gu, X. (2019). A tutorial on testing hypotheses using the Bayes factor. Psychological Methods, 24(5), 539–556.
Hüffmeier, J., Mazei, J., & Schultze, T. (2016). Reconceptualizing replication as a sequence of different studies: A replication typology. Journal of Experimental Social Psychology, 66, 81–92.
Informative hypotheses [Computer software]. (2019). Retrieved from https://informative-hypotheses.sites.uu.nl/software/bain/
Janssen, N., Schirm, W., Mahon, B., & Caramazza, A. (2008). Semantic interference in a delayed naming task: Evidence for the response exclusion hypothesis. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(1), 249–256.
Joy-Gaba, J., Clay, R., & Cleary, H. (2012). Replication of keeping one’s distance: The influence of spatial distance cues on affect and evaluation by Williams & Bargh (2008, Psychological Science). OSF. Retrieved from https://osf.io/vnsqg/
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795.
Klugkist, I., Laudy, O., & Hoijtink, H. (2005). Inequality constrained analysis of variance: A Bayesian approach. Psychological Methods, 10(4), 477–493.
Kuiper, R., Hoijtink, H., Buskens, V., & Raub, W. (2013). Combining statistical evidence from several studies: A method using Bayesian updating and an example from research on trust problems in social and economic exchange. Sociological Methods and Research, 42(1), 60–81.
Lakens, D., Scheel, A., & Isager, P. (2018). Equivalence testing for psychological research: A tutorial. Advances in Methods and Practices in Psychological Science, 1(2), 259–269.
Ly, A., Etz, A., Marsman, M., & Wagenmakers, E.-J. (2018). Replication Bayes factors from evidence updating. Behavior Research Methods, 51(6), 2498–2508.
Marsman, M., Schönbrodt, F., Morey, R., Yao, Y., Gelman, A., & Wagenmakers, E.-J. (2017). A Bayesian bird’s eye view of “replications of important results in social psychology.” Royal Society Open Science, 4(1), Article 160426.
Morey, R. D., & Lakens, D. (n.d.). Why most of psychology is statistically unfalsifiable. Retrieved November 22, 2022, from https://raw.githubusercontent.com/richarddmorey/psychology_resolution/master/paper/response.pdf
Morey, R. D., Romeijn, J.-W., & Rouder, J. N. (2016). The philosophy of Bayes factors and the quantification of statistical evidence. Journal of Mathematical Psychology, 72, 6–18.
Mulder, J. (2014). Prior adjusted default Bayes factors for testing (in)equality constrained hypotheses. Computational Statistics and Data Analysis, 71, 448–463.
O’Hagan, A. (1995). Fractional Bayes factors for model comparison (with discussion). Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 99–138. Retrieved June 10, 2021, from http://www.jstor.org/stable/2346088
Open Science Collaboration. (2012). An open, large-scale, collaborative effort to estimate the reproducibility of psychological science. Perspectives on Psychological Science, 7(6), 657–660.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), Article aac4716.
Patil, P., Peng, R. D., & Leek, J. T. (2016). What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspectives on Psychological Science, 11(4), 539–544.
Popper, K. R. (1963). Science as falsification. In Conjectures and refutations (Vol. 1, pp. 33–39). Retrieved from https://curiousphilosophy.net/2023/09/is-sex-binary-a-reasoned-objection-to-rationality-rules-in-the-pursuit-of-truth/uploads/pdfs/Science_as_Falsification_Karl_R_Popper.pdf
Rosenthal, R. (1979). The “file drawer problem” and tolerance for null results. Psychological Bulletin, 86(3), 638–641.
Rouder, J. (2014). Optional stopping: No problem for Bayesians. Psychonomic Bulletin & Review, 21(2), 301–308.
Schönbrodt, F. D., & Wagenmakers, E.-J. (2018). Bayes factor design analysis: Planning for compelling evidence. Psychonomic Bulletin & Review, 25(1), 128–142.
Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26(5), 559–569.
Stiekema, J. (2017). Bayesiaanse evaluatie van informatieve hypotheses als methode voor replicatiebeoordeling [Unpublished bachelor’s thesis].
Van Aert, R., & Van Assen, M. (2017). Bayesian evaluation of effect size after replicating an original study. PLoS ONE, 12(4), e0175302.
Van Lissa, C., Gu, X., Mulder, J., Rosseel, Y., Van Zundert, C., & Hoijtink, H. (2020). Teacher’s corner: Evaluating informative hypotheses using the Bayes factor in structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 28(2), 292–301.
Verhagen, J., & Wagenmakers, E.-J. (2014). Bayesian tests to quantify the result of a replication attempt. Journal of Experimental Psychology: General, 143(4), 1457–1475.
Williams, L., & Bargh, J. (2008). Keeping one’s distance: The influence of spatial distance cues on affect and evaluation. Psychological Science, 19(3), 302–308.
Wilson, A. (2016). Exact replications in an inexact context: Commentary on Ebersole et al. Journal of Experimental Social Psychology, 67, 84–85.
Zondervan-Zwijnenburg, M., van de Schoot, R., & Hoijtink, H. (2019). Testing ANOVA replications by means of the prior predictive p-value. Retrieved January 28, 2019.
License
Copyright (c) 2024 Hidde Jelmer Leplaa, Charlotte Rietbergen, Herbert Hoijtink
This work is licensed under a Creative Commons Attribution 4.0 International License.