Why Agentic-PRs Get Rejected: A Comparative Study of Coding Agents

Sota Nakashima, Yuta Ishimoto, Masanari Kondo, Shane McIntosh, Yasutaka Kamei

TL;DR

This study compares PR rejection reasons across five coding agents and a human baseline using the AIDev dataset, revealing agent-specific rejection modes and a high incidence of rejections with no explicit reviewer feedback. By qualitatively coding 654 rejected PRs from six author types (five coding agents plus human authors), the authors extend prior analyses of Claude Code and identify seven rejection modes that occur only in Agentic-PRs. They also propose practical heuristics that reduce the proportion of rejections lacking explicit feedback, which supports more robust cross-agent comparisons. The findings highlight critical failure modes in autonomous coding and point toward more reliable agentic coding through preprocessing and future work on implicit signals. The work is relevant to improving the reliability and integration of autonomous coding agents in OSS workflows.

Abstract

Agentic coding -- software development workflows in which autonomous coding agents plan, implement, and submit code changes with minimal human involvement -- is rapidly gaining traction. Prior work has shown that Pull Requests (PRs) produced using coding agents (Agentic-PRs) are accepted less often than PRs that are not labeled as agentic (Human-PRs). The rejection reasons for a single agent (Claude Code) have been explored, but a comparison of how rejection reasons differ between Agentic-PRs generated by different agents has not yet been performed. This comparison is important since different coding agents are often used for different purposes, which can lead to agent-specific failure patterns. In this paper, we inspect 654 rejected PRs from the AIDev dataset covering five coding agents, as well as a human baseline. Our results show that seven rejection modes occur only in Agentic-PRs, including distrust of AI-generated code. We also observe agent-specific patterns (e.g., automated withdrawal of inactive PRs by Devin), reflecting differences in how agents are configured and used in practice. Notably, a large proportion of rejected PRs (67.9%) lack explicit reviewer feedback, making their rejection reasons difficult to determine. To mitigate this issue, we propose a set of heuristics that reduce the proportion of such cases, offering a practical preprocessing step for future studies of PR rejection in agentic coding.

Paper Structure

This paper contains 11 sections, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Distribution of rejection reasons by agent