Rationale Dataset and Analysis for the Commit Messages of the Linux Kernel Out-of-Memory Killer
Mouna Dhaouadi, Bentley James Oakes, Michalis Famelis
TL;DR
We detail the creation of a labelled dataset for analyzing the code commit messages of the Linux Kernel Out-of-Memory Killer component, and we study aspects of rationale information such as its presence, temporal evolution, and structure.
Abstract
Code commit messages can contain useful information about why a developer made a change. However, the presence and structure of rationale in real-world code commit messages are not well studied. Here, we detail the creation of a labelled dataset for analyzing the code commit messages of the Linux Kernel Out-of-Memory Killer component. We study aspects of rationale information such as its presence, temporal evolution, and structure. We find that 98.9% of commits in our dataset contain sentences with rationale information, and that experienced developers report rationale in about 60% of the sentences in their commits. We report on the challenges we faced and provide examples of our labelling.
