Automating Code Review: A Systematic Literature Review

Rosalia Tufano, Gabriele Bavota

TL;DR

The paper presents the largest known systematic review of automating code review, compiling 119 primary studies across 34 automated tasks and mapping the techniques, datasets, and evaluation practices used. It reveals a strong shift toward deep learning and large language models for generative tasks (e.g., generating review comments and revising code) while maintaining diverse approaches for classification and retrieval tasks, with substantial emphasis on Java and language-independent solutions. The study also analyzes replication and data sharing, finding that about half of the papers provide replication packages and highlighting ongoing challenges in data quality, evaluation metrics, usability, and deployment in industry. Collectively, the work guides researchers and practitioners by clarifying current capabilities, identifying gaps, and proposing directions toward more usable, verifiable, and impactful code review automation. The findings underscore the importance of realistic evaluations and open data to advance practical adoption in software engineering workflows.

Abstract

Code review consists of assessing the code written by teammates with the goal of increasing code quality. Empirical studies have documented the benefits of this practice, which nonetheless comes at a cost in terms of developers' time. For this reason, researchers have proposed techniques and tools to automate code review tasks such as reviewer selection (i.e., identifying suitable reviewers for a given code change) or the actual review of a given change (i.e., recommending improvements to the contributor as a human reviewer would do). Given the substantial number of papers recently published on the topic, it may be challenging for researchers and practitioners to get a complete overview of the state of the art. We present a systematic literature review (SLR) featuring 119 papers concerning the automation of code review tasks. We provide: (i) a categorization of the code review tasks automated in the literature; (ii) an overview of the under-the-hood techniques used for the automation, including the datasets used for training data-driven techniques; and (iii) publicly available techniques and datasets used for their evaluation, with a description of the evaluation metrics usually adopted for each task. The SLR concludes with a discussion of the current limitations of the state of the art, with insights for future research directions.

Paper Structure

This paper contains 35 sections, 4 figures, 12 tables.

Figures (4)

  • Figure 1: Study selection process
  • Figure 2: Publication years
  • Figure 3: Publication venues
  • Figure 4: Availability of a working replication package by publication year