Table of Contents
Fetching ...

Understanding Code Understandability Improvements in Code Reviews

Delano Oliveira, Reydne Santos, Benedito de Oliveira, Martin Monperrus, Fernando Castor, Fernanda Madeiral

TL;DR

This work manually analyzed 2,401 code review comments from Java open-source projects on GitHub and found that over 42% of all comments focus on improving code understandability, demonstrating the significance of this quality attribute in code reviews.

Abstract

Motivation: Code understandability is crucial in software development, as developers spend 58% to 70% of their time reading source code. Improving it can improve productivity and reduce maintenance costs. Problem: Experimental studies often identify factors influencing code understandability in controlled settings but overlook real-world influences like project culture, guidelines, and developers' backgrounds. Ignoring these factors may yield results with limited external validity. Objective: This study investigates how developers enhance code understandability through code review comments, assuming that code reviewers are specialists in code quality. Method and Results: We analyzed 2,401 code review comments from Java open-source projects on GitHub, finding that over 42% focus on improving code understandability. We further examined 385 comments specifically related to this aspect and identified eight categories of concerns, such as inadequate documentation and poor identifiers. Notably, 83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted. We identified various types of patches that enhance understandability, from simple changes like removing unused code to context-dependent improvements such as optimizing method calls. Additionally, we evaluated four well-known linters for their ability to flag these issues, finding they cover less than 30%, although many could be easily added as new rules. Implications: Our findings encourage the development of tools to enhance code understandability, as accepted changes can serve as reliable training data for specialized machine-learning models. Our dataset supports this training and can inform the development of evidence-based code style guides. Data Availability: Our data is publicly available at https://codeupcrc.github.io.

Understanding Code Understandability Improvements in Code Reviews

TL;DR

This work manually analyzed 2,401 code review comments from Java open-source projects on GitHub and found that over 42% of all comments focus on improving code understandability, demonstrating the significance of this quality attribute in code reviews.

Abstract

Motivation: Code understandability is crucial in software development, as developers spend 58% to 70% of their time reading source code. Improving it can improve productivity and reduce maintenance costs. Problem: Experimental studies often identify factors influencing code understandability in controlled settings but overlook real-world influences like project culture, guidelines, and developers' backgrounds. Ignoring these factors may yield results with limited external validity. Objective: This study investigates how developers enhance code understandability through code review comments, assuming that code reviewers are specialists in code quality. Method and Results: We analyzed 2,401 code review comments from Java open-source projects on GitHub, finding that over 42% focus on improving code understandability. We further examined 385 comments specifically related to this aspect and identified eight categories of concerns, such as inadequate documentation and poor identifiers. Notably, 83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted. We identified various types of patches that enhance understandability, from simple changes like removing unused code to context-dependent improvements such as optimizing method calls. Additionally, we evaluated four well-known linters for their ability to flag these issues, finding they cover less than 30%, although many could be easily added as new rules. Implications: Our findings encourage the development of tools to enhance code understandability, as accepted changes can serve as reliable training data for specialized machine-learning models. Our dataset supports this training and can inform the development of evidence-based code style guides. Data Availability: Our data is publicly available at https://codeupcrc.github.io.

Paper Structure

This paper contains 26 sections, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Inline code review suggestion to replace an if statement by a ternary expression r409002709.
  • Figure 2: Inline code review suggestion to replace a ternary expression by an if statement r489083957.
  • Figure 3: Example of comment comprised solely of images r423093729.
  • Figure 4: Example of a suggested change to preserve the function but change the logic r471343349.
  • Figure 5: The reviewer explicitly asks for understandability improvement r385993973.
  • ...and 2 more figures