Evaluating LLM-Based Grant Proposal Review via Structured Perturbations

William Thorne; Joseph James; Yang Wang; Chenghua Lin; Diana Maynard

TL;DR

The paper develops a perturbation-based framework probing LLM sensitivity across six quality axes and concludes that current LLMs may provide supplementary value within EPSRC review but exhibit high variability and misaligned review priorities.

Abstract

As AI-assisted grant proposals outpace manual review capacity in a kind of "Malthusian trap" for the research ecosystem, this paper investigates the capabilities and limitations of LLM-based grant reviewing for high-stakes evaluation. Using six EPSRC proposals, we develop a perturbation-based framework probing LLM sensitivity across six quality axes: funding, timeline, competency, alignment, clarity, and impact. We compare three review architectures: single-pass review, section-by-section analysis, and a 'Council of Personas' ensemble emulating expert panels. The section-level approach significantly outperforms alternatives in both detection rate and scoring reliability, while the computationally expensive council method performs no better than baseline. Detection varies substantially by perturbation type, with alignment issues readily identified but clarity flaws largely missed by all systems. Human evaluation shows LLM feedback is largely valid but skewed toward compliance checking over holistic assessment. We conclude that current LLMs may provide supplementary value within EPSRC review but exhibit high variability and misaligned review priorities. We release our code and any non-protected data.
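To make the notion of a per-axis detection rate concrete, the following is a minimal sketch of how detection outcomes could be tallied per review system and perturbation axis. It is not the authors' released code: the record layout, the function name `detection_rates`, the system identifiers, and the example trials are all assumptions made for illustration, and the placeholder trials are not results from the paper.

```python
from collections import defaultdict

# The six quality axes and three review architectures named in the abstract.
# The identifier spellings below are assumptions, not the authors' names.
AXES = ["funding", "timeline", "competency", "alignment", "clarity", "impact"]
SYSTEMS = ["single_pass", "section_level", "council_of_personas"]


def detection_rates(trials):
    """Return {(system, axis): fraction of trials where the flaw was flagged}.

    `trials` is an iterable of (system, axis, detected) tuples, where
    `detected` is a bool. This record layout is hypothetical.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for system, axis, detected in trials:
        if system not in SYSTEMS or axis not in AXES:
            raise ValueError(f"unknown system/axis: {system!r}, {axis!r}")
        totals[(system, axis)] += 1
        hits[(system, axis)] += int(detected)
    return {key: hits[key] / totals[key] for key in totals}


# Placeholder trials for illustration only; NOT results from the paper.
example_trials = [
    ("section_level", "alignment", True),
    ("section_level", "alignment", True),
    ("section_level", "clarity", False),
    ("single_pass", "alignment", True),
    ("single_pass", "alignment", False),
]

rates = detection_rates(example_trials)
for (system, axis), rate in sorted(rates.items()):
    print(f"{system:>20} {axis:<10} {rate:.2f}")
```

Keying the result by (system, axis) pairs matches the grid of review systems against perturbation categories described for Figure 1, so a table of this shape could feed a heatmap directly.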

Paper Structure (47 sections, 1 equation, 3 figures, 12 tables)

Introduction
Related Work
LLM-Assisted Peer Review
Grant Reviewing vs. Paper Reviewing
Stance Detection and Argument Mining
Evaluation via Perturbation
Methodology
Review Frameworks
Zero-shot Baseline
Section-Level Review Framework
Council of Personas
Data
Assessment Context
Preprocessing
Perturbation Strategy
...and 32 more sections

Figures (3)

Figure 1: Detection scores across review systems and perturbation categories. Darker cells indicate higher detection rates. The section-level system shows strongest performance on alignment and impact perturbations, while all systems fail on clarity-based changes.
Figure 2: The claim-influence table (tab:claim-influence) shows the scale for severity. The dashed line on agreement indicates neutral agreement.
Figure 3: The claim-influence table (tab:claim-influence) shows the scale for severity. The dashed line on agreement indicates the neutral stance (2 on the scale).
