Structured AI Integration as Quality Control for Peer Review

Decades of research show that inter-rater reliability is low in grant and manuscript review. A new paper makes the case for institutional AI integration in peer review.

Mar 09, 2026

A few weeks ago I wrote about the idea that AI could serve as a rubric enforcer in peer review, reducing the variability introduced by fatigue, mood, and ordering effects while preserving the domain expertise that makes review valuable.

Rubrics, Not Vibes: Consistency as a Feature in Peer Review with AI (Stephen D. Turner, Feb 18)

Relatedly, late last year I wrote about how unregulated individual adoption of AI in review can introduce its own biases when reviewers selectively prompt and cherry-pick AI outputs.

Prejudicial Peer Review with AI (Stephen D. Turner, August 19, 2025)

A new preprint I co-authored with Agnieszka Swiatecka-Urban (UVA School of Medicine) and Arjun Krishnan (CU Anschutz), now available on SSRN, develops these arguments more fully, with a particular focus on AI as a quality control and consistency tool for peer review.

The paper surveys decades of research documenting low inter-rater reliability in both manuscript and grant review, all of which supports the idea that reviewer assignment, rather than scientific merit, is the primary driver of outcomes.
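To make "inter-rater reliability" concrete, here is a minimal sketch of Cohen's kappa, one common chance-corrected agreement statistic for two raters. The reviewer decisions are hypothetical, made up purely for illustration, and the studies surveyed in the paper may well use other measures (intraclass correlation, for example).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability the raters match if each labels independently
    # according to their own label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical decisions from two reviewers on the same eight submissions.
reviewer_1 = ["accept", "reject", "accept", "reject", "accept", "reject", "accept", "reject"]
reviewer_2 = ["accept", "accept", "reject", "reject", "accept", "reject", "reject", "accept"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this toy example the two reviewers agree on half of the submissions, which is exactly what chance alone would produce given their accept rates, so kappa comes out at zero. A value near 1 would indicate strong agreement.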

At the same time, the peer review system is under enormous structural pressure. Submission volumes continue to grow, reviewers are overburdened, and editors increasingly struggle to find willing reviewers. I see us entering a self-reinforcing cycle in which declining review quality encourages more speculative submissions, further taxing an already strained system.

This chart (from https://arxiv.org/stats/monthly_submissions) displays the number of new submissions received during each month since 1991. In October 2025 alone, arXiv received 27,692 submissions. That’s about **900 papers per day**, or about one submission **every 97 seconds**.
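A quick check of that arithmetic, using only the October 2025 figure quoted above and the 31 days in October (the variable names are just for illustration):

```python
# Figure from the post: arXiv submissions received in October 2025.
submissions = 27_692
days_in_october = 31

# Average submissions per day over the month.
per_day = submissions / days_in_october

# Average gap between submissions, in seconds (86,400 seconds per day).
seconds_between = 24 * 60 * 60 / per_day

print(f"{per_day:.0f} submissions per day")
print(f"one submission every {seconds_between:.0f} seconds")
```

That works out to roughly 893 submissions per day, or one about every 97 seconds, averaged around the clock.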

Against this backdrop, recent large-scale studies show that AI systems achieve very high concordance with human reviewers when evaluation criteria are explicit, and that structured AI feedback improved review quality in the vast majority of assessed cases in a randomized trial.

The paper proposes a framework distinguishing where AI can help (systematic rubric-based evaluation, quality control of the reviews themselves) from where human judgment remains essential (novelty assessment, feasibility, recognizing creative leaps). Surveys of researchers themselves support this kind of complementary model over full automation.

The paper includes two supplementary tables cataloging common failure modes in grant and manuscript review that are amenable to AI-assisted quality control, from score-comment misalignment and implicit demographic bias to citation coercion and unprofessional tone, along with specific AI approaches for flagging each one.

Supplementary Table 1 from the paper. Common peer review failure modes amenable to AI-assisted quality control in grant proposal review.

Supplementary Table 2 from the paper. Common peer review failure modes amenable to AI-assisted quality control in manuscript review.

We also genuinely engage with the risks, including confidentiality concerns, automation bias, gaming, and the worry that AI tools give overcommitted reviewers an excuse to skip the hard cognitive work of actually reading the manuscript. AI is a useful tool, but we try to be honest about the limitations.

Alternative views presented in the paper.

You can read the full paper here. Citation details below. The paper is in peer review right now. I’d welcome any thoughts or comments you have (after you read the paper, not just the title/abstract). You can leave a comment here or find me on Bluesky.

Turner, S., Swiatecka-Urban, A., & Krishnan, A. (2026). Rubrics, Not Vibes: Structured AI Integration as Quality Control for Peer Review. Social Science Research Network 6314421. https://doi.org/10.2139/ssrn.6314421
