Human-AI Synergy in Agentic Code Review

Suzhen Zhong; Shayan Noei; Ying Zou; Bram Adams

Abstract

Code review is a critical software engineering practice in which developers review code changes before integration to ensure code quality, detect defects, and improve maintainability. In recent years, AI agents that can understand code context, plan review actions, and interact with development environments have been increasingly integrated into the code review process. However, there is limited empirical evidence comparing the effectiveness of AI agents and human reviewers in collaborative workflows. To address this gap, we conduct a large-scale empirical analysis of 278,790 code review conversations across 300 open-source GitHub projects. We compare the feedback provided by human reviewers and AI agents, and we investigate human-AI collaboration patterns in review conversations to understand how interaction shapes review outcomes. Moreover, we analyze whether code suggestions from human reviewers and AI agents are adopted into the codebase and how adopted suggestions change code quality. We find that human reviewers provide additional types of feedback beyond those offered by AI agents, including understanding, testing, and knowledge transfer. Human reviewers exchange 11.8% more rounds when reviewing AI-generated code than when reviewing human-written code. Moreover, code suggestions made by AI agents are adopted into the codebase at a significantly lower rate than suggestions proposed by human reviewers. Over half of unadopted suggestions from AI agents are either incorrect or addressed through alternative fixes by developers. When adopted, suggestions provided by AI agents produce significantly larger increases in code complexity and code size than suggestions provided by human reviewers. Our findings suggest that while AI agents can scale defect screening, human oversight remains critical for ensuring suggestion quality and providing contextual feedback that AI agents lack.
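The comparison of review rounds in the abstract depends on splitting conversations by who reviews and who wrote the code under review. The Python sketch below is an illustration only, not the authors' pipeline. It assumes each conversation carries two hypothetical flags (`reviewer_is_agent`, `author_is_agent`) and a hypothetical `rounds` count, and it assumes the reported 11.8% is a relative difference in mean rounds between human reviews of agent-generated code (HRA) and human reviews of human-written code (HRH). The toy data are invented and are not from the study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Conversation:
    # Hypothetical fields; the paper's actual data schema is not given here.
    reviewer_is_agent: bool  # True if an AI agent wrote the review comments
    author_is_agent: bool    # True if the reviewed code was agent-generated
    rounds: int              # number of review exchange rounds


def category(conv: Conversation) -> str:
    """Label a conversation as HRH, HRA, ARH, or ARA (reviewer first, then author)."""
    reviewer = "A" if conv.reviewer_is_agent else "H"
    author = "A" if conv.author_is_agent else "H"
    return f"{reviewer}R{author}"


def mean_rounds(convs: list[Conversation], cat: str) -> float:
    """Mean rounds for one category; raises StatisticsError if the category is empty."""
    return mean(c.rounds for c in convs if category(c) == cat)


def relative_increase(value: float, baseline: float) -> float:
    """Percentage by which `value` exceeds `baseline`."""
    return (value - baseline) / baseline * 100.0


# Invented toy data, for illustration only.
convs = [
    Conversation(reviewer_is_agent=False, author_is_agent=False, rounds=2),
    Conversation(reviewer_is_agent=False, author_is_agent=False, rounds=4),
    Conversation(reviewer_is_agent=False, author_is_agent=True, rounds=4),
    Conversation(reviewer_is_agent=False, author_is_agent=True, rounds=5),
]

hrh = mean_rounds(convs, "HRH")
hra = mean_rounds(convs, "HRA")
print(f"HRA vs HRH: {relative_increase(hra, hrh):.1f}% more rounds")
```

A category with no conversations makes `mean_rounds` raise `statistics.StatisticsError`, which is arguably safer than silently reporting zero rounds.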

Paper Structure (15 sections, 4 equations, 9 figures, 6 tables)

This paper contains 15 sections, 4 equations, 9 figures, 6 tables.

Introduction
Experiment Setup
  Overview of Our Approach
  Data Collection
  Labeling Conversations
  Interaction Pattern Extraction
  Code Metric Assessment
Results
  RQ1: What are the similarities and differences between the review comments by AI agents and human reviewers?
  RQ2: How do interaction patterns differ between human and AI agent code reviews?
  RQ3: What is the impact of code suggestions from human reviewers and AI agents on code quality?
Implication
Threats to Validity
Related Work
Conclusion

Figures (9)

Figure 1: Example of an AI agent (GitHub Copilot) reviewing a hunk. A hunk is a block of changes in the code, displaying lines added (+) and removed (-). The AI agent provides feedback in natural language along with code suggestions (proposed modifications enclosed in triple backticks with the suggestion tag) to fix a typo.
Figure 2: Overview of our approach.
Figure 3: Prompt for labeling feedback types.
Figure 4: Distribution of feedback categories across the four review categories: HRH (Human reviews Human-written code), HRA (Human reviews Agent-generated code), ARH (Agent reviews Human-written code), and ARA (Agent reviews Agent-generated code).
Figure 5: Results of the Scott-Knott ESD test on Comment-to-Code Density (CD) across review categories. Categories within the same group (G1, G2, or G3) exhibit no statistically significant difference in CD.
...and 4 more figures
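Figure 1 describes code suggestions as proposed modifications enclosed in triple backticks with the suggestion tag. As a rough illustration of how such suggestions might be separated from the natural-language part of a review comment, the Python sketch below uses a regular expression. It is an assumption-based sketch, not the extraction procedure used in the paper: `extract_suggestions` is a hypothetical helper, the sample comment is invented, and the pattern ignores edge cases such as longer fences or code blocks nested inside a suggestion.

```python
import re

# Build the fence at runtime so this snippet can itself be embedded in a
# Markdown code block without closing it early.
FENCE = "`" * 3

# A suggestion block opens with a fence tagged "suggestion" at the start of a
# line and closes with a bare fence on its own line; group 1 captures the
# proposed replacement lines.
SUGGESTION_RE = re.compile(
    rf"^{FENCE}suggestion[^\n]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_suggestions(comment_body: str) -> list[str]:
    """Return the replacement text of every suggestion block in a comment."""
    return [m.group(1) for m in SUGGESTION_RE.finditer(comment_body)]


# Hypothetical review comment in the style of Figure 1 (a typo fix).
comment = (
    "There is a typo in the variable name.\n"
    f"{FENCE}suggestion\n"
    "    total = price * quantity\n"
    f"{FENCE}\n"
)

print(extract_suggestions(comment))
```

A non-empty result marks a comment as carrying a code suggestion. Deciding whether that suggestion was later adopted would need a separate comparison against subsequent commits, which this sketch does not attempt.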