The List is the Process: Reliable Pre-Integration Tracking of Commits on Mailing Lists

Ralf Ramsauer, Daniel Lohmann, Wolfgang Mauerer

TL;DR

We present a novel method for tracking the otherwise invisible pre-integration evolution of software changes on mailing lists by connecting all early revisions of a change to its final version in the repository. This makes it possible, for the first time, to determine quantitatively whether an open development process effectively aligns with given formal process requirements.

Abstract

A considerable corpus of research on software evolution focuses on mining changes in software repositories, but omits their pre-integration history. We present a novel method for tracking this otherwise invisible evolution of software changes on mailing lists by connecting all early revisions of changes to their final version in repositories. Since artefact modifications on mailing lists are communicated only as updates to fragments (i.e., patches), identifying semantically similar changes is a non-trivial task, which our approach solves in a language-independent way. We evaluate our method on high-profile open source software (OSS) projects such as the Linux kernel, and validate its high accuracy against an elaborately created ground truth. Our approach can be used to quantify properties of OSS development processes, which is an essential requirement for using OSS in reliable or safety-critical industrial products, where certifiability and conformance to processes are crucial. To the best of our knowledge, the high accuracy of our technique makes it possible for the first time to determine quantitatively whether an open development process effectively aligns with given formal process requirements.

Paper Structure

This paper contains 31 sections, 8 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Typical workflow: a patch is resubmitted and improved twice before its integration
  • Figure 2: Example of two mails and one commit that were automatically found and linked by our tool
  • Figure 3: $\alpha$: sim determines the similarity (edge weights) of patches. Dashed edges remain below the threshold $\text{ta} = 0.80$. $\beta$: Connected components above the threshold form equivalence classes of similar patches. As an example, green vertices denote patches on the mailing list (ML) and orange vertices denote commits.
  • Figure 4: Boxplot of irrelevant parameters: filename and hunk header threshold have no substantial influence.
  • Figure 5: Illustration of the influence of autoaccept threshold, diff-length ratio and the message-diff weight (connecting lines in all figures are used to guide the eye).
  • ...and 2 more figures
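
The grouping step described in the caption of Figure 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `group_similar` and `jaccard`, the token-overlap similarity, and the choice of `>=` at the threshold are all hypothetical. Only the threshold value 0.80 and the idea of taking connected components as equivalence classes come from the caption.

```python
from itertools import combinations


def group_similar(items, sim, threshold=0.80):
    """Return the connected components of the graph whose edges join
    every pair of items with sim(a, b) >= threshold.

    Assumption: an edge at exactly the threshold counts as "above" it.
    """
    parent = list(range(len(items)))  # union-find forest over item indices

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare all pairs; merge the components of sufficiently similar items.
    for i, j in combinations(range(len(items)), 2):
        if sim(items[i], items[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect the items of each component, keeping input order.
    classes = {}
    for i, item in enumerate(items):
        classes.setdefault(find(i), []).append(item)
    return list(classes.values())


def jaccard(a, b):
    """Toy similarity on whitespace tokens (a hypothetical stand-in for
    the paper's sim function, which is not reproduced here)."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


if __name__ == "__main__":
    patches = [
        "fix null pointer in driver init",
        "fix null pointer in driver init path",
        "add docs for scheduler",
    ]
    for group in group_similar(patches, jaccard):
        print(group)
```

Because the classes are connected components, membership is transitive: two revisions that are not directly similar still end up in the same class if a chain of intermediate revisions links them, which fits the case of a patch that changes gradually across resubmissions.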