Responsible Research Assessment II: A specific proposal for hiring and promotion in psychology

DOI: https://doi.org/10.15626/MP.2024.4604

Keywords: DORA, CoARA, research assessment, research quality, impact

Abstract

Traditional metric indicators of scientific productivity (e.g., the journal impact factor or the h-index) have been heavily criticized for being invalid and for fueling a culture that focuses on the quantity, rather than the quality, of a person's scientific output. There is now widespread demand for viable alternatives to current academic evaluation practices. In a previous report, we laid out four basic principles of a more responsible research assessment in academic hiring and promotion processes (Schönbrodt et al., 2025). The present paper offers a specific proposal for how these principles may be implemented in practice: We argue in favor of broadening the range of relevant research contributions and propose a set of concrete quality criteria (including a ready-to-use online tool) for research articles. These criteria are intended to be used primarily in the first phase of the assessment process. Their function is to help establish a minimum threshold of methodological (i.e., theoretical and empirical) rigor that candidates need to pass in order to be considered further for hiring or promotion. The second phase of the assessment process, in contrast, focuses on the actual content of candidates' research and necessarily relies on more narrative means of assessment. The debate over how to replace the current, invalid evaluation criteria with ones that relate more closely to scientific quality continues. Its course and outcome will depend on researchers' willingness to get involved and help shape it.
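To make the two-phase logic concrete, the following sketch (in Python, purely illustrative) shows how a phase-1 screen for a minimum level of methodological rigor might work. The criteria names, rating scale, and threshold below are hypothetical placeholders, not the actual instrument proposed in the paper.

# Hypothetical sketch of a phase-1 "minimum rigor" screen (illustrative only).
# Criteria, rating scale (0-2), and threshold are assumptions, not the paper's instrument.

from dataclasses import dataclass
from statistics import mean

CRITERIA = [
    "theory_specification",   # is the theoretical claim stated precisely?
    "open_data",              # are the data publicly available?
    "open_materials",         # are materials/code publicly available?
    "preregistration",        # was the study preregistered?
    "statistical_reporting",  # are analyses reported completely and correctly?
]

@dataclass
class ArticleRating:
    title: str
    scores: dict[str, int]  # criterion name -> rating on the assumed 0-2 scale

    def rigor_score(self) -> float:
        # Unweighted average across criteria, for simplicity.
        return mean(self.scores[c] for c in CRITERIA)

def passes_phase_one(articles: list[ArticleRating], threshold: float = 1.5) -> bool:
    # Phase 1: every submitted article must reach the assumed minimum rigor threshold;
    # only then does the candidate proceed to phase 2 (narrative assessment of content).
    return all(a.rigor_score() >= threshold for a in articles)

# Example: a candidate submits two articles for assessment.
candidate = [
    ArticleRating("Study A", {"theory_specification": 2, "open_data": 2, "open_materials": 1,
                              "preregistration": 2, "statistical_reporting": 2}),
    ArticleRating("Study B", {"theory_specification": 1, "open_data": 2, "open_materials": 2,
                              "preregistration": 1, "statistical_reporting": 2}),
]
print(passes_phase_one(candidate))  # True -> candidate proceeds to the narrative phase 2

In this sketch, a candidate proceeds to the narrative phase 2 only if every submitted article reaches the assumed minimum rigor score; how such criteria are actually defined and applied is what the paper's concrete proposal specifies.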

References

Abele-Brehm, A. E., & Bühner, M. (2016). Wer soll die Professur bekommen? Eine Untersuchung zur Bewertung von Auswahlkriterien in Berufungsverfahren der Psychologie [Who should get the professorship? An investigation of the evaluation of selection criteria in professorial appointment procedures in psychology]. Psychologische Rundschau, 67(4), 250–261. https://doi.org/10.1026/0033-3042/a000335

Brembs, B., Button, K., & Munafò, M. (2013). Deep impact: Unintended consequences of journal rank. Frontiers in Human Neuroscience, 7, 291. https://doi.org/10.3389/fnhum.2013.00291

Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363–369. https://doi.org/10.1177/1948550616673876

Chapman, C. A., Bicca-Marques, J. C., Calvignac-Spencer, S., Fan, P., Fashing, P. J., Gogarten, J., Guo, S., Hemingway, C. A., Leendertz, F., Li, B., Matsuda, I., Hou, R., Serio-Silva, J. C., & Stenseth, N. C. (2019). Games academics play and their consequences: How authorship, h-index and journal impact factors are shaping the future of academia. Proceedings of the Royal Society B: Biological Sciences, 286(1916), 20192047. https://doi.org/10.1098/rspb.2019.2047

Etzel, F. T., Seyffert-Müller, A., Schönbrodt, F. D., Kreuzer, L., Gärtner, A., Knischewski, P., & Leising, D. (2025). Inter-rater reliability in assessing the methodological quality of research papers in psychology [PsyArXiv Preprint]. https://doi.org/10.31234/osf.io/4w7rb_v2

Gärtner, A., Leising, D., & Schönbrodt, F. D. (2023). Empfehlungen zur Berücksichtigung von wissenschaftlicher Leistung bei Berufungsverfahren in der Psychologie [Recommendations for considering scientific achievement in professorial appointment procedures in psychology]. Psychologische Rundschau, 74(3), 166–174. https://doi.org/10.1026/0033-3042/a000630

Gärtner, A., Leising, D., & Schönbrodt, F. D. (2024). Towards responsible research assessment: How to reward research quality. PLoS Biology, 22(2), e3002553. https://doi.org/10.1371/journal.pbio.3002553

Heathers, J. A., Anaya, J., van der Zee, T., & Brown, N. J. (2018). Recovering data from summary statistics: Sample parameter reconstruction via iterative techniques (SPRITE) [PeerJ Preprint]. https://doi.org/10.7287/peerj.preprints.26968v1

Henninger, F., Shevchenko, Y., Mertens, U. K., Kieslich, P. J., & Hilbig, B. E. (2022). lab.js: A free, open, online study builder. Behavior Research Methods, 54(2), 556–573. https://doi.org/10.3758/s13428-019-01283-5

Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). Bibliometrics: The Leiden manifesto for research metrics. Nature, 520(7548), 429–431. https://doi.org/10.1038/520429a

Kepes, S., Keener, S. K., McDaniel, M. A., & Hartman, N. S. (2022). Questionable research practices among researchers in the most research-productive management programs. Journal of Organizational Behavior, 43(7), 1190–1208. https://doi.org/10.1002/job.2623

Lange, J., Freyer, N., Musfeld, P., Schönbrodt, F., & Leising, D. (2025). A checklist for incentivizing and facilitating good theory building. Zeitschrift für Psychologie, 233(4), 279–283. https://doi.org/10.1027/2151-2604/a000604

Leising, D., Gärtner, A., & Schönbrodt, F. D. (2025). Responsible research assessment (Parts I and II): Responses to the commentaries. Meta-Psychology, 9. https://doi.org/10.15626/MP.2024.4603

Leising, D., Thielmann, I., Glöckner, A., Gärtner, A., & Schönbrodt, F. (2022a). Ten steps toward a better personality science – how quality may be rewarded more in research evaluation. Personality Science, 3, e6029. https://doi.org/10.5964/ps.6029

Leising, D., Thielmann, I., Glöckner, A., Gärtner, A., & Schönbrodt, F. (2022b). Ten steps toward a better personality science – a rejoinder to the comments. Personality Science, 3, e7961. https://doi.org/10.5964/ps.7961

Muna, D., Alexander, M., Allen, A., Ashley, R., Asmus, D., Azzollini, R., Bannister, M., Beaton, R., Benson, A., Berriman, G. B., Bilicki, M., Boyce, P., Bridge, J., Cami, J., Cangi, E., Chen, X., Christiny, N., Clark, C., Collins, M., . . . Zonca, A. (2016). The Astropy problem. https://doi.org/10.48550/arXiv.1610.03159

Paulus, F. M., Cruz, N., & Krach, S. (2018). The impact factor fallacy. Frontiers in Psychology, 9, 1487. https://doi.org/10.3389/fpsyg.2018.01487

Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., Kastman, E., & Lindeløv, J. K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior Research Methods, 51(1), 195–203. https://doi.org/10.3758/s13428-018-01193-y

R Core Team. (2024). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02

Schönbrodt, F. D., Gärtner, A., Frank, M., Gollwitzer, M., Ihle, M., Mischkowski, D., Phan, L. V., Schmitt, M., Scheel, A. M., Schubert, A.-L., Steinberg, U., & Leising, D. (2025). Responsible research assessment I: Implementing DORA and CoARA for hiring and promotion in psychology. Meta-Psychology, 9. https://doi.org/10.15626/MP.2024.4601

Stefan, A. M., & Schönbrodt, F. D. (2022). Big little lies: A compendium and simulation of p-hacking strategies [PsyArXiv Preprint]. https://doi.org/10.31234/osf.io/xy2dk

The PLoS Medicine Editors. (2006). The impact factor game. PLoS Medicine, 3(6), e291. https://doi.org/10.1371/journal.pmed.0030291

Published

2025-12-30

Section

Commentaries