Responsible Research Assessment II: A specific proposal for hiring and promotion in psychology
DOI: https://doi.org/10.15626/MP.2024.4604
Keywords: DORA, CoARA, research assessment, research quality, impact
Abstract
Traditional metric indicators of scientific productivity (e.g., the journal impact factor; the h-index) have been heavily criticized for being invalid and for fueling a culture that focuses on the quantity, rather than the quality, of a person's scientific output. There is now widespread demand for viable alternatives to current academic evaluation practices. In a previous report, we laid out four basic principles of more responsible research assessment in academic hiring and promotion processes (Schönbrodt et al., 2025). The present paper offers a specific proposal for how these principles may be implemented in practice: We argue in favor of broadening the range of relevant research contributions and propose a set of concrete quality criteria (including a ready-to-use online tool) for research articles. These criteria are intended to be used primarily in the first phase of the assessment process. Their function is to help establish a minimum threshold of methodological (i.e., theoretical and empirical) rigor that candidates must pass in order to be considered further for hiring or promotion. The second phase of the assessment process, in contrast, focuses more on the actual content of candidates' research and necessarily relies on more narrative means of assessment. The debate over how to replace current, invalid evaluation criteria with ones that relate more closely to scientific quality continues. Its course and outcome will depend on the willingness of researchers to get involved and help shape it.
References
Abele-Brehm, A. E., & Bühner, M. (2016). Wer soll die Professur bekommen? Eine Untersuchung zur Bewertung von Auswahlkriterien in Berufungsverfahren der Psychologie. Psychologische Rundschau, 67(4), 250–261. https://doi.org/10.1026/0033-3042/a000335
Brembs, B., Button, K., & Munafò, M. (2013). Deep impact: Unintended consequences of journal rank. Frontiers in Human Neuroscience, 7, 291. https://doi.org/10.3389/fnhum.2013.00291
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363–369. https://doi.org/10.1177/1948550616673876
Chapman, C. A., Bicca-Marques, J. C., Calvignac-Spencer, S., Fan, P., Fashing, P. J., Gogarten, J., Guo, S., Hemingway, C. A., Leendertz, F., Li, B., Matsuda, I., Hou, R., Serio-Silva, J. C., & Stenseth, N. C. (2019). Games academics play and their consequences: How authorship, h-index and journal impact factors are shaping the future of academia. Proceedings of the Royal Society B: Biological Sciences, 286(1916), 20192047. https://doi.org/10.1098/rspb.2019.2047
Etzel, F. T., Seyffert-Müller, A., Schönbrodt, F. D., Kreuzer, L., Gärtner, A., Knischewski, P., & Leising, D. (2025). Inter-rater reliability in assessing the methodological quality of research papers in psychology [PsyArXiv Preprint]. https://doi.org/10.31234/osf.io/4w7rb_v2
Gärtner, A., Leising, D., & Schönbrodt, F. D. (2023). Empfehlungen zur Berücksichtigung von wissenschaftlicher Leistung bei Berufungsverfahren in der Psychologie. Psychologische Rundschau, 74(3), 166–174. https://doi.org/10.1026/0033-3042/a000630
Gärtner, A., Leising, D., & Schönbrodt, F. D. (2024). Towards responsible research assessment: How to reward research quality. PLoS Biology, 22(2), e3002553. https://doi.org/10.1371/journal.pbio.3002553
Heathers, J. A., Anaya, J., van der Zee, T., & Brown, N. J. (2018). Recovering data from summary statistics: Sample parameter reconstruction via iterative techniques (SPRITE) [PeerJ Preprint]. https://doi.org/10.7287/peerj.preprints.26968v1
Henninger, F., Shevchenko, Y., Mertens, U. K., Kieslich, P. J., & Hilbig, B. E. (2022). Lab.js: A free, open, online study builder. Behavior Research Methods, 54(2), 556–573. https://doi.org/10.3758/s13428-019-01283-5
Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). Bibliometrics: The Leiden manifesto for research metrics. Nature, 520(7548), 429–431. https://doi.org/10.1038/520429a
Kepes, S., Keener, S. K., McDaniel, M. A., & Hartman, N. S. (2022). Questionable research practices among researchers in the most research-productive management programs. Journal of Organizational Behavior, 43(7), 1190–1208. https://doi.org/10.1002/job.2623
Lange, J., Freyer, N., Musfeld, P., Schönbrodt, F., & Leising, D. (2025). A checklist for incentivizing and facilitating good theory building. Zeitschrift für Psychologie, 233(4), 279–283. https://doi.org/10.1027/2151-2604/a000604
Leising, D., Gärtner, A., & Schönbrodt, F. D. (2025). Responsible Research Assessment (Parts I and II): Responses to the Commentaries. Meta-Psychology, 9. https://doi.org/10.15626/MP.2024.4603
Leising, D., Thielmann, I., Glöckner, A., Gärtner, A., & Schönbrodt, F. (2022a). Ten steps toward a better personality science – how quality may be rewarded more in research evaluation. Personality Science, 3, e6029. https://doi.org/10.5964/ps.6029
Leising, D., Thielmann, I., Glöckner, A., Gärtner, A., & Schönbrodt, F. (2022b). Ten steps toward a better personality science – a rejoinder to the comments. Personality Science, 3, e7961. https://doi.org/10.5964/ps.7961
Muna, D., Alexander, M., Allen, A., Ashley, R., Asmus, D., Azzollini, R., Bannister, M., Beaton, R., Benson, A., Berriman, G. B., Bilicki, M., Boyce, P., Bridge, J., Cami, J., Cangi, E., Chen, X., Christiny, N., Clark, C., Collins, M., & Zonca, A. (2016). The Astropy problem [arXiv Preprint]. https://doi.org/10.48550/arXiv.1610.03159
Paulus, F. M., Cruz, N., & Krach, S. (2018). The impact factor fallacy. Frontiers in Psychology, 9, 1487. https://doi.org/10.3389/fpsyg.2018.01487
Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., Kastman, E., & Lindeløv, J. K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior Research Methods, 51(1), 195–203. https://doi.org/10.3758/s13428-018-01193-y
R Core Team. (2024). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2). https://doi.org/10.18637/jss.v048.i02
Schönbrodt, F. D., Gärtner, A., Frank, M., Gollwitzer, M., Ihle, M., Mischkowski, D., Phan, L. V., Schmitt, M., Scheel, A. M., Schubert, A.-L., Steinberg, U., & Leising, D. (2025). Responsible research assessment I: Implementing DORA and CoARA for hiring and promotion in psychology. Meta-Psychology, 9. https://doi.org/10.15626/MP.2024.4601
Stefan, A. M., & Schönbrodt, F. D. (2022). Big little lies: A compendium and simulation of p-hacking strategies [PsyArXiv Preprint]. https://doi.org/10.31234/osf.io/xy2dk
The PLoS Medicine Editors. (2006). The impact factor game. PLoS Medicine, 3(6), e291. https://doi.org/10.1371/journal.pmed.0030291
License
Copyright (c) 2025 Anne Gärtner, Daniel Leising, Nele Freyer, Philipp Musfeld, Jens Lange, Felix D. Schönbrodt

This work is licensed under a Creative Commons Attribution 4.0 International License.