SoK: Systematizing Software Artifacts Traceability via Associations, Techniques, and Applications

Zhifei Chen, Lata Yi, Liming Nie, Yangyang Zhao, Hao Liu, Yiqing Shi, Wei Song

Abstract

Software development relies heavily on traceability links between various software artifacts to ensure quality and facilitate maintenance. While automated traceability recovery techniques have advanced for different artifact pairs, the field remains fragmented with an incomplete overview of artifact associations, ambiguous linking techniques, and fragmented knowledge of application scenarios. To bridge these gaps, we conducted a systematic literature review on software traceability recovery to synthesize the linked artifacts, recovery tools, and usage scenarios across the traceability ecosystem. First, we constructed the first global artifacts traceability graph of 23 associations among 22 artifact types, exposing a severe research imbalance that heavily favors code-related links. Second, while recovery techniques are shifting toward deep semantic models, a reproducibility crisis persists (e.g., only 37% of studies released code); to address this, we provided a comprehensive evaluation framework including a technical decision map and standardized benchmarks. Finally, we quantified an industrial adoption gap (i.e., 95% of tools remain confined to academia) and proposed a role-centric framework to dynamically align artifact paths with concrete engineering activities. This review contributes a coherent knowledge framework for artifacts traceability research, identifies current trends, and provides directions for future work.

Paper Structure

This paper contains 37 sections, 1 equation, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: Overview of the Study Framework.
  • Figure 2: Hierarchical Software Artifacts Diagram. It presents a hierarchical taxonomy of 22 distinct software artifacts identified in the literature, organized into eight functional groups and three granularity layers. In each group, green, blue, and yellow nodes denote the first, second, and third hierarchical layers, respectively.
  • Figure 3: Artifacts Traceability Graph. Background colors represent the different artifact groups in Fig. 2. Dashed lines denote "is-a" relationships, while solid lines represent concrete relationships. The upper-right corner displays the links corresponding to each relationship.
  • Figure 4: Technical Decision Map for Artifacts Traceability Linking. The techniques for each pair of artifact representations are categorized into four groups based on their costs: LLLC (low labor and low computation), LLHC (low labor and high computation), HLLC (high labor and low computation), and HLHC (high labor and high computation).
  • Figure 5: Metrics and Datasets Recommendation for Software Artifact Pairs. This figure summarizes common datasets and minimum/additional metrics criteria for artifact pairs studied in the literature.
  • ...and 1 more figures