TRIAD: Automated Traceability Recovery based on Biterm-enhanced Deduction of Transitive Links among Artifacts

Hui Gao, Hongyu Kuang, Wesley K. G. Assunção, Christoph Mayr-Dorn, Guoping Rong, He Zhang, Xiaoxing Ma, Alexander Egyed

TL;DR

The paper tackles the abstraction gap in cross-artifact software traceability by introducing TRIAD, a biterm-enhanced method that leverages intermediate artifacts through consensual biterms and outer- and inner-transitive links. The approach comprises three stages: extracting intermediate-centric biterms from natural language and code, enriching artifacts and computing IR-based candidate links, and adjusting IR scores via transitive links to form robust trace paths. Empirical evaluation on five real-world systems shows TRIAD outperforms four strong baselines in AP and MAP across three IR models, with ablation studies confirming the additive value of biterms and both transitive link types. The work demonstrates meaningful improvements in traceability recovery and offers practical guidance on threshold tuning and artifact selection to adapt TRIAD to real-world settings. Data and code are publicly available, enabling replication and further research in automated traceability.
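
To make the first two stages more concrete, the following is a minimal Python sketch of the consensual-biterm idea, not the authors' implementation. It assumes that a biterm is a pair of terms from one artifact and that a biterm is "consensual" when it also occurs in an intermediate artifact. The summary indicates that the paper extracts biterms from natural language and code; the naive all-pairs extraction below is only a stand-in for that step, and every function name here (`naive_biterms`, `consensual_biterms`, `enrich`) is hypothetical.

```python
from itertools import combinations


def naive_biterms(text):
    """Toy extractor (assumption): every unordered pair of distinct lowercase tokens."""
    tokens = sorted(set(text.lower().split()))
    return set(combinations(tokens, 2))


def consensual_biterms(intermediate_biterms, other_biterms):
    """Biterms shared by an intermediate artifact and a source or target artifact."""
    return intermediate_biterms & other_biterms


def enrich(terms, consensual):
    """Append each consensual biterm to the artifact's term list as one joined token."""
    return list(terms) + [f"{a}_{b}" for a, b in sorted(consensual)]


# Hypothetical artifacts, loosely inspired by the (assign, route) biterm in Figure 1.
requirement = "operator shall assign route"
design = "assign route to uav"

shared = consensual_biterms(naive_biterms(design), naive_biterms(requirement))
enriched_requirement = enrich(requirement.split(), shared)
```

The enriched term list would then be passed to an IR model such as VSM, LSI, or JS, so that artifacts sharing a consensual biterm gain extra textual overlap.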

Abstract

Traceability allows stakeholders to extract and comprehend the trace links among software artifacts introduced across the software life cycle, providing significant support for software engineering tasks. Despite its proven benefits, software traceability is challenging to recover and maintain manually. Hence, many approaches for automated traceability have been proposed. Most rely on textual similarities among software artifacts, such as those based on Information Retrieval (IR). However, artifacts at different abstraction levels usually have different textual descriptions, which can greatly hinder the performance of IR-based approaches (e.g., a requirement in natural language may have low textual similarity to a Java class). In this work, we leverage consensual biterms and transitive relationships (i.e., inner- and outer-transitive links) based on intermediate artifacts to improve IR-based traceability recovery. We first extract and filter biterms from all source, intermediate, and target artifacts. We then use the consensual biterms from the intermediate artifacts to extend the biterms of both source and target artifacts, and finally deduce outer- and inner-transitive links to adjust text similarities between source and target artifacts. We conducted a comprehensive empirical evaluation on five systems widely used in the literature to show that our approach can outperform four state-of-the-art approaches, and to examine how its performance is affected by different conditions of source, intermediate, and target artifacts. The results indicate that our approach outperforms baseline approaches by over 15% in AP and over 10% in MAP on average.
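
The final step, adjusting source-to-target similarities through intermediate artifacts, can be illustrated with a small Python sketch. This is an illustration under stated assumptions rather than the paper's formula: it covers only a source → intermediate → target path (assumed here to correspond to the outer-transitive case), and the combination rule (best product of the two hop scores, scaled by `weight` and added to the direct score) as well as the `threshold` and `weight` parameters are hypothetical choices. Inner-transitive links are not modeled.

```python
def adjust_with_transitive_links(direct, src_to_mid, mid_to_tgt,
                                 threshold=0.5, weight=0.5):
    """Boost direct IR scores using source -> intermediate -> target paths.

    All arguments map (artifact_id, artifact_id) pairs to similarity scores.
    The boost rule is an assumption made for illustration only.
    """
    adjusted = {}
    for (src, tgt), score in direct.items():
        best_path = 0.0
        for (s, mid), sm_score in src_to_mid.items():
            # Only consider confident hops that start at this source artifact.
            if s != src or sm_score < threshold:
                continue
            mt_score = mid_to_tgt.get((mid, tgt), 0.0)
            if mt_score >= threshold:
                best_path = max(best_path, sm_score * mt_score)
        adjusted[(src, tgt)] = score + weight * best_path
    return adjusted


# Hypothetical scores: requirement R1, design document D1, classes C1 and C2.
direct = {("R1", "C1"): 0.1, ("R1", "C2"): 0.2}
src_to_mid = {("R1", "D1"): 0.8}
mid_to_tgt = {("D1", "C1"): 0.9, ("D1", "C2"): 0.3}

adjusted = adjust_with_transitive_links(direct, src_to_mid, mid_to_tgt)
ranking = sorted(adjusted, key=adjusted.get, reverse=True)
```

In this toy input, C1 has a lower direct similarity to R1 than C2, but the strong path through D1 lifts it above C2 in the ranking, which is the kind of effect the abstract attributes to transitive links.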

Paper Structure (30 sections, 6 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Motivating example adapted from the Dronology system with the following three consensual biterms: (apply, operation), (assign, route), and (select, UAV)
  • Figure 2: Overview of the TRIAD framework
  • Figure 3: Stanford CoreNLP parse result for the sentence in DD-647.
  • Figure 4: Applying TRIAD in the motivating scenario depicted in Figure 1
  • Figure 5: Precision/Recall curves grouped by evaluated systems and IR models (VSM, LSI, and JS).