Zero-shot reasoning for simulating scholarly peer-review
Khalid M. Saqr
TL;DR
The paper introduces xPeerd, a zero-shot reasoning framework that simulates scholarly peer review under normative constraints intended to ensure integrity, transparency, and contestability. It formalizes the reasoning as a constrained Bayesian-argumentation process, with a Dung graph representing attacks and supports among critiques and deontic guards enforcing disclosure, citation verification, and human adjudication. On a dataset of $352$ valid simulated reviews drawn from $n = 500$ cases, Revise decisions dominate across disciplines ($>50\%$), Reject rates vary by field and reach $45\%$ in Health Sciences, and the evidence-anchoring compliance rate holds steady at $29\%$ across tasks and domains. The results position xPeerd as a reproducible, auditable benchmark tool for policy and governance in scholarly publishing, capable of auditing workflows and managing integrity risks in AI-assisted peer review.
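To make the Dung-graph idea concrete, the following is a minimal sketch of how acceptable critiques could be selected from an attack graph using the standard grounded semantics of an abstract argumentation framework. It is not xPeerd's implementation: the function name `grounded_extension`, the example argument labels, and the restriction to attack edges only (supports, Bayesian weights, and deontic guards are omitted) are all assumptions made for illustration.

```python
from typing import Set, Tuple


def grounded_extension(arguments: Set[str],
                       attacks: Set[Tuple[str, str]]) -> Set[str]:
    """Return the grounded extension of an attack-only argumentation graph.

    `attacks` holds (attacker, target) pairs. The result is the least fixed
    point of the characteristic function, reached by iterating from the
    empty set.
    """
    # Precompute, for each argument, the set of arguments attacking it.
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}

    accepted: Set[str] = set()
    while True:
        # An argument is defended if each of its attackers is itself
        # attacked by some currently accepted argument.
        defended = {
            a for a in arguments
            if all(any((d, b) in attacks for d in accepted)
                   for b in attackers[a])
        }
        if defended == accepted:
            return accepted
        accepted = defended


# Hypothetical review fragment: a "missing citation" critique attacks the
# claim, and a verification result attacks that critique in turn.
args = {"claim_supported", "missing_citation", "citation_verified"}
atts = {("missing_citation", "claim_supported"),
        ("citation_verified", "missing_citation")}
result = grounded_extension(args, atts)
```

In this hypothetical fragment the verification result is unattacked, so it is accepted first; it defeats the missing-citation critique, which in turn reinstates the claim. Two critiques that only attack each other would both stay out of the grounded extension, the kind of unresolved conflict that the framework routes to human adjudication.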
Abstract
The scholarly publishing ecosystem faces a dual crisis of unmanageable submission volumes and unregulated AI, creating an urgent need for new governance models to safeguard scientific integrity. The traditional human-only peer-review regime lacks a scalable, objective benchmark, making editorial processes opaque and difficult to audit. Here we investigate a deterministic simulation framework that provides the first stable, evidence-based standard for evaluating AI-generated peer-review reports. Analyzing 352 peer-review simulation reports, we identify consistent system-state indicators that demonstrate the framework's reliability. First, the system simulates calibrated editorial judgment: 'Revise' decisions consistently form the majority outcome (>50%) across all disciplines, while 'Reject' rates adapt to field-specific norms, rising to 45% in Health Sciences. Second, it maintains procedural integrity, enforcing a stable 29% evidence-anchoring compliance rate that remains invariant across diverse review tasks and scientific domains. These findings demonstrate a system that is predictably rule-bound, mitigating the stochasticity of generative AI. For the scientific community, this provides a transparent tool to ensure fairness; for publishing strategists, it offers a scalable instrument for auditing workflows, managing integrity risks, and implementing evidence-based governance. The framework repositions AI as an essential component of institutional accountability, providing the critical infrastructure to maintain trust in scholarly communication.
