CommitBench: A Benchmark for Commit Message Generation

Maximilian Schall; Tamara Czinczoll; Gerard de Melo

TL;DR

CommitBench is a new large-scale dataset for commit message generation, compiled following best practices for dataset creation. It is used to compare existing models, showing that a Transformer model pretrained on source code outperforms other approaches.

Abstract

Writing commit messages is a tedious daily task for many software developers, and often remains neglected. Automating this task has the potential to save time while ensuring that messages are informative. A high-quality dataset and an objective benchmark are vital preconditions for solid research and evaluation towards this goal. We show that existing datasets exhibit various problems, such as the quality of the commit selection, small sample sizes, duplicates, privacy issues, and missing licenses for redistribution. This can lead to unusable models and skewed evaluations, where inferior models achieve higher evaluation scores due to biases in the data. We compile a new large-scale dataset, CommitBench, adopting best practices for dataset creation. We sample commits from diverse projects with licenses that permit redistribution and apply our filtering and dataset enhancements to improve the quality of generated commit messages. We use CommitBench to compare existing models and show that other approaches are outperformed by a Transformer model pretrained on source code. We hope to accelerate future research by publishing the source code (https://github.com/Maxscha/commitbench).

Paper Structure (60 sections, 4 figures, 11 tables)

Introduction
Problem Statement
Related Work
Commit Messages
Commit Generation Approaches
Text-Based Approaches
Structure-Based Approaches
Previous Datasets
CommitGenData jiang_automatically_2017
NNgenData liu_neural-machine-translation-based_2018
PtrGNCMsgData liu_generating_2019
MultiLang loyola_neural_2017
CoDiSumData xu_commit_2019
CommitBert jung_commitbert_2021
MCMD tao_evaluation_2021
...and 45 more sections

Figures (4)

Figure 1: Example of a diff, the reference commit message, and predicted commit messages from CodeTransBase trained on the respective datasets. It can be observed that the better-generated commit message is further away from the reference message, which results in a lower evaluation score for this sample.
Figure 2: Sample of CommitBench-Test data, with predictions from CommitGen, NNGen, T5Small, T5Base, CodeTransSmall, and CodeTransBase
Figure 3: Sample of CommitBench-Test data, with predictions from CommitGen, NNGen, T5Small, T5Base, CodeTransSmall, and CodeTransBase
Figure 4: Sample of CommitBench-Test data, with predictions from CommitGen, NNGen, T5Small, T5Base, CodeTransSmall, and CodeTransBase
