Understanding Code Change with Micro-Changes

Lei Chen, Michele Lanza, Shinpei Hayashi

TL;DR

This work tackles the challenge of understanding developer code changes by converting textual diffs into a semantic, natural-language layer called micro-changes. It defines a 20-type catalog focused on conditional changes and develops an automated detector that integrates AST-based diffs with refactoring signals. Across 73 open-source Java repositories, the approach explains about 67.1% of conditional-related changes, outperforming traditional refactoring-only explanations and achieving high precision on manual checks. The study highlights practical applications such as automated commit-message generation and micro-change–annotated code reviews, with a clear path to expanding the catalog to broader change types and languages.

Abstract

A crucial activity in software maintenance and evolution is comprehending the changes developers make when they submit a pull request or commit to a repository. Typically, code changes are represented as code diffs: textual representations that highlight the differences between two file versions by showing the added, removed, and changed lines. Developers must interpret this simplistic representation and mentally lift it to a higher abstraction level that more closely resembles natural-language descriptions and eases the creation of a mental model of the changes. However, the textual diff-based representation is cumbersome, and the lifting requires considerable domain knowledge and programming skills. We present an approach, based on the concept of micro-changes, that overcomes these difficulties by translating code diffs into a series of pre-defined change operations that can be described in natural language. We present a catalog of micro-changes together with an automated micro-change detector. To evaluate our approach, we performed an empirical study on a large set of open-source repositories, focusing on a subset of our micro-change catalog, namely the micro-changes that affect conditional logic. We found that our detector can explain more than 67% of the changes taking place in the systems under study.

Paper Structure (26 sections, 7 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Example of the code diff in a commit in repository HikariCP.
  • Figure 2: Overview of micro-change detection.
  • Figure 3: Rename Attribute refactoring.
  • Figure 4: Distribution of dataset.
  • Figure 5: Coverage across the dataset.
  • ...and 2 more figures