Using Changeset Descriptions as a Data Source to Assist Feature Location

Muslim Chochlov, Michael English, Jim Buckley

TL;DR

This paper introduces ACIR, a feature-location technique (FLT) that uses changeset descriptions from version control as the lexical basis for annotating software artifacts. ACIR partitions code into artifacts (at file or method level), aggregates the descriptions of the relevant changesets (only the most recent one, or all of them), and builds an IR corpus indexed with Lucene under a Vector Space Model. An empirical study on Rhino and Mylyn.Tasks assesses efficiency, the effect of granularity, and the influence of changeset range by reenacting change requests and measuring standard IR metrics (MAP, MRR, effectiveness). The findings show that ACIR is competitive with existing text-based FLTs, that method-level granularity reduces developer effort by up to 64%, and that the impact of changeset range depends on the project's history. The work highlights changeset descriptions as a viable data source for feature location and outlines directions for scaling, evolution-aware selection, and integration with other feature location approaches.
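
The three evaluation metrics named above can be sketched in a few lines of Python, assuming their common IR definitions: effectiveness as the 1-based rank of the first relevant artifact in a ranked list, reciprocal rank as the inverse of that rank, and average precision as the mean of the precision values at each relevant artifact's rank. The function names and the `(ranking, relevant)` input format are hypothetical; the paper's exact evaluation protocol (for example, how reenacted change requests yield the relevant sets) is not reproduced here.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical input: one (ranked artifact list, set of relevant artifacts) pair per query.
Run = Tuple[Sequence[str], Set[str]]


def effectiveness(ranking: Sequence[str], relevant: Set[str]) -> Optional[int]:
    """1-based rank of the first relevant artifact, or None if none was retrieved."""
    for rank, artifact in enumerate(ranking, start=1):
        if artifact in relevant:
            return rank
    return None


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Inverse of the first relevant rank; 0.0 when nothing relevant is retrieved."""
    rank = effectiveness(ranking, relevant)
    return 0.0 if rank is None else 1.0 / rank


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which relevant artifacts appear."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, artifact in enumerate(ranking, start=1):
        if artifact in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant artifacts that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def mean_reciprocal_rank(runs: List[Run]) -> float:
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def mean_average_precision(runs: List[Run]) -> float:
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Dividing average precision by the total number of relevant artifacts, rather than by the number retrieved, penalizes a ranking that misses relevant code entirely.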

Abstract

Feature location attempts to assist developers in discovering functionality in source code. Many textual feature location techniques use information retrieval and rely on the comments and identifiers in source code to describe software entities. An interesting alternative is to describe each software entity using the descriptions of the changesets in which it was altered. To investigate this, we implement a technique that uses changeset descriptions and conduct an empirical study of its overall performance. Moreover, we study how granularity (i.e., whether software entities are files or methods) and changeset range inclusion (i.e., whether only the most recent or all historical changesets are used) affect such an approach. The results of a preliminary study with the Rhino and Mylyn.Tasks systems suggest that the approach could lead to an efficient feature location technique. They also suggest that configuring the technique at method-level granularity is advantageous in terms of developer effort, and that older changesets from older systems may reduce the technique's effectiveness.
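
The two configuration dimensions studied here, granularity and changeset range inclusion, can be illustrated with a minimal Python sketch; it is not ACIR's actual implementation. It assumes a hypothetical `Changeset` record that already lists the files and methods each changeset touched (mining this from version control is not shown), concatenates the descriptions per artifact, and ranks artifacts with a plain TF-IDF cosine Vector Space Model. This scoring stands in for Lucene's, whose weighting and normalization differ. The toy history is invented for illustration and is not data from Rhino or Mylyn.Tasks.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Changeset:
    """Hypothetical record of one commit: its order, description, and touched artifacts."""
    revision: int
    message: str
    files: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)


def build_corpus(changesets: Iterable[Changeset], granularity: str = "method",
                 changeset_range: str = "all") -> Dict[str, str]:
    """Annotate each artifact with the descriptions of the changesets that touched it."""
    if granularity not in ("file", "method"):
        raise ValueError("granularity must be 'file' or 'method'")
    if changeset_range not in ("recent", "all"):
        raise ValueError("changeset_range must be 'recent' or 'all'")
    texts: Dict[str, List[str]] = {}
    # Oldest first, so in "recent" mode newer descriptions overwrite older ones.
    for cs in sorted(changesets, key=lambda c: c.revision):
        artifacts = cs.methods if granularity == "method" else cs.files
        for artifact in artifacts:
            if changeset_range == "recent":
                texts[artifact] = [cs.message]
            else:
                texts.setdefault(artifact, []).append(cs.message)
    return {artifact: " ".join(msgs) for artifact, msgs in texts.items()}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(corpus: Dict[str, str], query: str) -> List[Tuple[str, float]]:
    """Rank artifacts by TF-IDF cosine similarity to the query (plain VSM, not Lucene)."""
    doc_tf = {a: Counter(tokenize(t)) for a, t in corpus.items()}
    n_docs = len(doc_tf)
    df = Counter(term for tf in doc_tf.values() for term in tf)
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    def weigh(tf: Counter) -> Dict[str, float]:
        # Query terms absent from the corpus are dropped.
        return {t: c * idf[t] for t, c in tf.items() if t in idf}

    def norm(vec: Dict[str, float]) -> float:
        return math.sqrt(sum(w * w for w in vec.values()))

    q_vec = weigh(Counter(tokenize(query)))
    q_norm = norm(q_vec)
    scores = []
    for artifact, tf in doc_tf.items():
        d_vec = weigh(tf)
        d_norm = norm(d_vec)
        dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        score = dot / (q_norm * d_norm) if q_norm and d_norm else 0.0
        scores.append((artifact, score))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Invented toy history for illustration only.
TOY_HISTORY = [
    Changeset(1, "Fix regexp parser crash on empty group",
              files=["Parser.java"], methods=["Parser.parseRegExp"]),
    Changeset(2, "Add support for getter setter in object literals",
              files=["Parser.java", "IRFactory.java"],
              methods=["Parser.objectLiteral", "IRFactory.transformObjectLiteral"]),
    Changeset(3, "Optimize regexp matching performance",
              files=["NativeRegExp.java"], methods=["NativeRegExp.matchRegExp"]),
]

if __name__ == "__main__":
    for granularity in ("method", "file"):
        for changeset_range in ("all", "recent"):
            corpus = build_corpus(TOY_HISTORY, granularity, changeset_range)
            top, score = rank(corpus, "regexp crash")[0]
            print(f"{granularity:6} {changeset_range:6} -> {top} ({score:.3f})")
```

In this toy history, keeping only the most recent description of `Parser.java` discards the older commit that mentions the crash. The file-level "recent" configuration therefore ranks a different file first than the "all" configuration does, a small-scale instance of how changeset range can change results.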

Paper Structure (9 sections, 3 figures, 5 tables)

This paper contains 9 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: The design of ACIR
  • Figure 2: The effort of different levels of granularity: a) when the method-level effectiveness is adjusted; b) when the file-level effectiveness is adjusted. * The legend for reading the data is as in the effort statistics table (tbl:effort_stats). Abbreviations: R: Rhino, M: Mylyn.Tasks.
  • Figure 3: The effectiveness of different changeset ranges: a) in Rhino; b) in Mylyn.Tasks. * The legend for reading the data is as in the effort statistics table (tbl:effort_stats).