News and Opinion

On the Increasingly Stochastic Peer Review

The peer review system is broken. Artificial intelligence will not save us, but editors and conference chairs should at least try.

Not a month goes by without my reading a piece in popular media, be it a newspaper column or a blog post, about scientific peer review being broken. Increasingly, as an editor, an author, and a peer reviewer, I tend to agree with these critical opinions.

Granted, the peer review system is under tremendous pressure due to the volume of manuscripts submitted. As an anecdote: during the past two years, I have had three manuscripts "rejected" because the editors could not find peer reviewers, and two manuscripts are still sitting in editorial systems after fourteen months. Having said that, none of these issues matters much for established researchers; for doctoral researchers, and even for postdoctoral researchers, they can be career-breakers.

Three points are particularly notorious:

1. Increasing use of large language models (LLMs) by peer reviewers, whether by fellow peers reviewing a manuscript alongside me or by peers reviewing my own manuscripts. Such reviews are easy to spot: vague, often incorrect, and generally of poor quality. Among other things, I have several times seen a "peer review" that is merely a short, one-paragraph summary of the manuscript. Publishers are also taking note of the hype, and start-ups already offer LLM-based peer review to publishers. Upon a little investigation, however, many of these turn out to build on prompts such as "find five to seven mistakes in this manuscript". Such prompts will destroy the whole scientific enterprise.

2. Increasing prevalence of vague, unprofessional, and unethical reviews that are not LLM-based. Recently, Sankaralingam, of ACM fame, had a great take on these types of reviews. In particular, actually reading a manuscript is nowadays often lacking in peer review, criticism without substance is increasingly common, and hunting for so-called "fatal flaws" is used by many reviewers to "kill" papers.

3. Increasing reliance by editors on voting. As scientists, we should all know that two, three, or even ten peer reviewers do not constitute a representative sample of anything. This voting practice is particularly pronounced in computer science due to the field's still-strong preference for publishing in conferences. When coupled with voting, conferences reproduce grading on a curve, an idea recognized as silly for grading students from the 1960s onward. If all submissions to a conference are excellent, the criteria for rejection become more or less arbitrary. In reverse: if all submissions fall below the general quality level of a field, the conference in question should probably be canceled.

How to deal with these issues? One suggestion would be to get rid of voting altogether; that is, submission systems could remove the recommendation options to accept, revise, and so on. Editors and conference chairs would then have to actually read the reviews and the submissions to reach their verdicts. To this end, Smith's classical take on the task of the referee is worth quoting:

After the editor has received a sufficient number of referee reports, typically three, the editor must decide whether to accept the paper, and if so, to what extent revisions are required. The editor does not simply count the referee reports as votes. The editor must read the referee report recommendations, and their reasons, and must decide, using his own judgement, whether to accept the paper. An editor, in theory, can overrule the unanimous recommendation of the referees; in practice, the editor can and sometimes does side with a minority of the referees. It is important that the referees state the reasons for their recommendations and justify them; those reasons count as heavily or more heavily than the recommendations themselves.

As for the pressure from submission volumes, I would also recommend that editors desk reject much more liberally. You do not want to know what kinds of things I have had the privilege of reviewing recently. Another testable idea would be a fast-track channel for doctoral researchers and the like.