Hyperparameter Optimization for AST Differencing

Matias Martinez; Jean-Rémy Falleri; Martin Monperrus

TL;DR

The paper shows that the hyperparameters of AST differencing algorithms can degrade or improve diff quality, and introduces Diff Auto Tuning (DAT) to optimize GumTree configurations in a data-driven manner. DAT uses Grid Search, Hyperopt, and Optuna to minimize edit-script length across a training set. Global optimization learns one default configuration per language or meta-model, while local optimization tailors the configuration to each individual file pair. Empirical results show substantial gains: global optimization improves the edit-scripts in 21.8% (JDT) and 16.1% (Spoon) of cases, local optimization improves up to 27.4% of cases, and Hyperopt and Optuna search faster than exhaustive grid search. The work provides a public tool and protocol, showing that hyperparameter tuning is broadly applicable to AST differencing and can produce shorter, more understandable edit-scripts in practice.
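The difference between global and local optimization comes down to two objective functions over the same configuration space. The sketch below is a minimal Python illustration under stated assumptions, not DAT's implementation. It uses exhaustive grid search only; Hyperopt and Optuna would replace the `min` over `config_space` with guided sampling. The hyperparameter names `min_height` and `min_sim` are hypothetical, and `toy_es_length` is a made-up stand-in for running GumTree with a configuration and counting the actions in the resulting edit-script.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, TypeVar

P = TypeVar("P")              # one training example (a file pair)
Config = Dict[str, float]     # one hyperparameter configuration
Grid = Dict[str, Sequence[float]]


def config_space(grid: Grid) -> List[Config]:
    """Enumerate every combination of values in the grid (exhaustive Grid Search)."""
    names = sorted(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def global_optimize(pairs: Sequence[P], grid: Grid,
                    es_length: Callable[[P, Config], int]) -> Config:
    """One configuration for the whole training set: minimise total edit-script length."""
    return min(config_space(grid),
               key=lambda cfg: sum(es_length(p, cfg) for p in pairs))


def local_optimize(pair: P, grid: Grid,
                   es_length: Callable[[P, Config], int]) -> Config:
    """One configuration per file pair: minimise that pair's edit-script length."""
    return min(config_space(grid), key=lambda cfg: es_length(pair, cfg))


# Hypothetical stand-in for "diff the pair with cfg and count edit actions".
# Each "pair" is just an int here; real use would diff two ASTs with GumTree.
def toy_es_length(pair: int, cfg: Config) -> int:
    penalty = 1 if cfg["min_sim"] < 0.5 else 0
    return int(abs(pair - cfg["min_height"])) + penalty


grid = {"min_height": [1, 2, 3, 4], "min_sim": [0.3, 0.5]}
training = [2, 3, 3]
best_global = global_optimize(training, grid, toy_es_length)  # {"min_height": 3, "min_sim": 0.5}
best_local = local_optimize(2, grid, toy_es_length)           # {"min_height": 2, "min_sim": 0.5}
```

With this toy cost, the global configuration gives the pair `2` an edit-script of length 1, while that pair's local configuration gives length 0. This mirrors why local optimization can improve cases that a single global default cannot, at the price of one search per file pair.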

Abstract

Computing the differences between two versions of the same program is an essential task for software development and software evolution research. AST differencing is the most advanced way of doing so, and an active research area. Yet, AST differencing algorithms rely on configuration parameters that may have a strong impact on their effectiveness. In this paper, we present a novel approach named DAT (Diff Auto Tuning) for hyperparameter optimization of AST differencing. We thoroughly state the problem of hyper-configuration for AST differencing. We evaluate our data-driven approach DAT to optimize the edit-scripts generated by the state-of-the-art AST differencing algorithm named GumTree in different scenarios. DAT is able to find a new configuration for GumTree that improves the edit-scripts in 21.8% of the evaluated cases.
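The headline figure in the abstract, a new configuration that improves the edit-scripts in 21.8% of the evaluated cases, comes from a per-case comparison against the default configuration. The following minimal sketch shows that bookkeeping. It assumes that "improves" means a strictly shorter edit-script, which matches the edit-script-length objective described in the TL;DR. The configuration keys and `toy_es_length` are hypothetical stand-ins for real GumTree runs.

```python
from typing import Callable, Dict, Sequence, Tuple, TypeVar

P = TypeVar("P")            # one evaluated case (a file pair)
Config = Dict[str, float]   # one hyperparameter configuration


def improvement_rates(pairs: Sequence[P], default_cfg: Config, tuned_cfg: Config,
                      es_length: Callable[[P, Config], int]) -> Tuple[float, float]:
    """Return (fraction improved, fraction degraded); shorter edit-script = better."""
    if not pairs:
        raise ValueError("need at least one evaluated case")
    improved = degraded = 0
    for p in pairs:
        before, after = es_length(p, default_cfg), es_length(p, tuned_cfg)
        if after < before:
            improved += 1
        elif after > before:
            degraded += 1
    return improved / len(pairs), degraded / len(pairs)


# Hypothetical stand-in for running the differ and counting edit actions.
def toy_es_length(pair: int, cfg: Config) -> int:
    return int(abs(pair - cfg["min_height"]))


default = {"min_height": 2, "min_sim": 0.5}
tuned = {"min_height": 3, "min_sim": 0.5}
better, worse = improvement_rates([1, 2, 3, 3, 4], default, tuned, toy_es_length)
```

Reporting the degraded share alongside the improved share keeps visible the cases where a tuned configuration lengthens the edit-script, which a single "percentage improved" figure hides.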