Peer review, DORA, and science

Today we continue our blog series for Peer Review Week 2020 with a piece by Haakon Gjerløw, discussing the San Francisco Declaration on Research Assessment (DORA) and how its criticism of the use of publication metrics in research assessment relates to peer review in journals.

Photo: Martin Adams

Without peer review, we are nothing but well-paid bloggers.

The system of peer review and the hierarchy of journal prestige are tightly connected for two reasons. First, more prestigious journals attract more prestigious, and hopefully better, reviewers. Second, when researchers review for more prestigious journals, they are stricter than when reviewing for less prestigious journals, since they are aware that more stringent expectations are placed on the editorial process.

This hierarchy and the peer review process have come under increased scrutiny as part of the debate surrounding open access and Plan S. The most sceptical individuals believe the system is broken. There is no correlation, they argue, between the hierarchy of journals and the quality of the research those journals publish. Consequently, they support the San Francisco Declaration on Research Assessment (the DORA declaration). The core of the debate concerns a part of the declaration arguing that institutions should ignore journal hierarchies in assessment processes for both recruitment and funding, and instead use qualitative indicators of research impact. In other words: read and evaluate the research yourself.

I have several reservations about such a system. In what follows, I highlight two important ones: first, that it invites discrimination; second, that it rests on a logical fallacy.

Rigged for discrimination

The expectation that hiring and funding committees should read all research by all applicants is naïve at best. Even if we disregard the enormous costs such a system implies, it rests on the assumption that the committee is qualified to evaluate all of that research. My own subject, political science, encompasses a range of topics: international war and diplomacy, municipal government, and political and economic development, to name but a few. I am in no position to evaluate the quality of research within the vast majority of these topics, because I am unfamiliar with their theoretical frontiers. This is not a problem in the system of journal peer review, which forms the basis of the journal hierarchy, because reviewers are invited on the basis of more specific expertise.

For institutions committed to DORA, however, this invites both conscious and subconscious discrimination. Everyone who has ever sat on a hiring or funding committee knows that researchers hold all kinds of prejudices against topics they are not familiar with. They will, in general, tend to favour applications dealing with subjects close to their hearts, and rank applications accordingly. While this might sometimes be done explicitly, a lack of expertise increases the influence of irrelevant heuristics when evaluating applicants. A system of hierarchy that is outside the control of these committees – such as the journal hierarchy – constrains the influence of such (sub)conscious discrimination and pushes the system towards greater meritocracy.

Of course, the risk of discrimination in purely qualitative assessment processes is not limited to topics. If you believe that hiring committees suffer from (sub)conscious discrimination based on, say, gender or ethnicity, then your reaction should be that those same individuals should not be allowed to define the hierarchy of the research being assessed without strict anonymity. Such anonymity is not usually practised in academic hiring.

Furthermore, even if one were to enforce anonymity in the DORA process, the importance of the most prestigious journals means that any discrimination there is more likely to be uncovered, criticized, and hopefully corrected. Such an investigation was recently conducted in top political science journals, for example, after they came under criticism for discriminating against women. I find it very hard to believe that individual institutions and their processes would be subject to the same scrutiny.

You cannot have it both ways

It remains unclear why the critics of the status quo believe that researchers conducting journal peer review are unable to distinguish ‘good’ from ‘bad’ research, but that those same researchers can make this distinction when conducting assessments on the basis of DORA principles. You simply cannot have it both ways: either researchers are able to do peer review, or they are unable to make assessments in accordance with DORA.

If, in response to my first criticism, institutions were to hire specific expertise in order to ‘DORA assess’ research on topics they are unfamiliar with, these researchers would be duplicating evaluations already done during journal peer review. Even if we scrapped journal peer review, these evaluations would overlap with all the other DORA-compliant assessments carried out by other institutions. The inefficiencies of such a system add to the already enormous costs of DORA-compliant assessments.

Without peer review, we are simply bloggers

Without going into a deeper discussion of what science is: peer review is the only institutional feature that distinguishes academic production from any other production of text. The raison d’être of philosophy of science is to try to figure out ways to discriminate good truth statements from bad truth statements. Since the discovery of the scientific method, we have a superior toolkit for this. The goal of peer review is to demand the proper use of this toolkit for anything that wants the stamp of science.

This is important because modern academia is, I believe, among the most important institutions we have. Our goal should be to make peer review work as well as possible.

However, I should note that the DORA declaration makes several suggestions for an improved system that I in fact support. For example, DORA suggests that publishers introduce a range of article-level metrics, such as one metric for an article’s theoretical contribution and another for its empirical contribution. We could also make the reviews themselves available, making it possible for others to review the reviews. (For a discussion of open peer review, see Sebastian Schutte’s blog in this series [LINK].) Going forward, improving the system should be the focus of attention, not scrapping it.


Haakon Gjerløw is a Senior Researcher at PRIO. His research investigates the dynamics of democracies and autocracies using quantitative methods.


2 Comments

Kevin K. Lehmann

Speaking from my experience in the physical sciences, it is clear that the highest-tier journals have far worse reviewing than the top journals devoted to specific subfields. The editors of Science and Nature have to cover much broader subject areas and are almost never scientists actively working in the field of a submission. As a result, they simply do not know, as well as specialists do, whom to consult. In discussions of selectivity, it is important to distinguish the rigor and quality of the research from how “impactful” or of broad interest an article will be. The highest-tier journals focus almost exclusively on the latter and expect their reviewers to do the same. As the Schoen scandal demonstrated, papers in Science and Nature are subjected to far lower standards of technical evaluation than those in specialist journals. The top-tier journals publish articles largely devoid of technical detail, which is considered not of interest to the broad audience they seek to appeal to. The much more common use of supplementary material has reduced this limitation to a significant degree, but not entirely eliminated it. It is widely acknowledged that articles in the highest-tier journals are much more likely to be wrong than those in specialist journals. This is to be expected for articles in emerging and highly competitive fields, but it is indefensible to claim that the highest-tier journals hold articles to higher standards of rigor; clearly, the opposite is true.

Brian Deer

“The raison d’être of philosophy of science is to try to figure out ways to discriminate good truth statements from bad truth statements. Since the discovery of the scientific method, we have a superior toolkit for this. The goal of peer review is to demand the proper use of this toolkit for anything that wants the stamp of science.”

It’s such a shame then that peer review is not a test of truth at all. It’s a test of plausibility. And too often not even very good at that.
