On Interpreting the Effectiveness of Unsupervised Software Traceability with Information Theory
David N. Palacio, Daniel Rodriguez-Cardenas, Denys Poshyvanyk, Kevin Moran
TL;DR
The paper addresses the limited effectiveness of unsupervised traceability methods in software engineering by showing that standard evaluation metrics can mislead when data are imbalanced. It introduces TraceXplainer, an information-theoretic framework that uses self-information, mutual information, information loss, noise, and a new minimum shared information metric to diagnose and bound traceability performance across multiple datasets and neural encodings. The empirical results reveal a consistent imbalance in information content between source and target artifacts ($H(Y) - H(X) \approx 1.48\,\mathcal{B}$, where $\mathcal{B}$ denotes bits), together with an average mutual information of $4.81\,\mathcal{B}$, an information loss of $1.75\,\mathcal{B}$, and noise of $0.28\,\mathcal{B}$, which signal fundamental limits of unsupervised approaches under current data conditions. The findings offer guidance for artifact refactoring and dataset design to improve traceability and motivate further research at the intersection of information theory and traceability interpretability.
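The reported quantities can be related through standard Shannon identities. This is a sketch that assumes "loss" and "noise" are the two conditional entropies of the source/target pair $(X, Y)$; the summary does not define them, so that reading is an assumption:

$$
\begin{aligned}
I(X;Y) &= H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) \\
H(Y) - H(X) &= H(Y \mid X) - H(X \mid Y)
\end{aligned}
$$

If the reported loss is read as $H(Y \mid X)$ and the noise as $H(X \mid Y)$, and all averages are taken over the same artifact pairs, the second line predicts an entropy gap of about $1.75 - 0.28 = 1.47$ bits, close to the reported 1.48 bits. Under the opposite assignment the sign would flip, so this is only a consistency check, not the paper's stated derivation.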
Abstract
Traceability is a cornerstone of modern software development, ensuring system reliability and facilitating software maintenance. While unsupervised techniques leveraging Information Retrieval (IR) and Machine Learning (ML) methods have been widely used for predicting trace links, their effectiveness remains underexplored. In particular, these techniques often assume that traceability patterns are present within textual data, a premise that may not hold universally. Moreover, standard evaluation metrics such as precision, recall, accuracy, or F1 measure can misrepresent model performance when the underlying data distributions are not properly analyzed. Given that automated traceability techniques often struggle to establish links, we need further insight into the information limits of traceability artifacts. In this paper, we propose an approach, TraceXplainer, that uses information-theoretic metrics to evaluate unsupervised traceability techniques and to better understand their performance limits. Specifically, we introduce self-information, cross-entropy, and mutual information (MI) as metrics to measure the informativeness and reliability of traceability links. Through a comprehensive replication and analysis of well-studied datasets and techniques, we investigate the effectiveness of unsupervised techniques that predict traceability links using IR/ML. This application of TraceXplainer reveals an imbalance in typical traceability datasets: source code carries, on average, 1.48 more bits of information (i.e., entropy) than the linked documentation. Additionally, we show that an average MI of 4.81 bits, an information loss of 1.75 bits, and noise of 0.28 bits indicate information-theoretic limits on the effectiveness of unsupervised traceability techniques. We hope these findings spur further work on understanding the limits and progress of traceability research.
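To make the named metrics concrete, here is a minimal sketch of the standard Shannon quantities involved: entropy (the expected self-information), cross-entropy, mutual information, and the two conditional entropies. It is not TraceXplainer's implementation. It assumes artifacts are given as token lists and that a joint distribution over (source, target) values is already available as a dictionary; the function names `token_entropy`, `cross_entropy`, and `channel_metrics` are hypothetical. In Shannon's channel view, $H(X \mid Y)$ is commonly called equivocation (information loss) and $H(Y \mid X)$ noise, but since the abstract does not state which convention the paper adopts, the sketch reports both under their formal names.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating standard Shannon quantities (in bits).
# They are NOT TraceXplainer's API; representations are assumptions.


def token_entropy(tokens):
    """Entropy (bits) of the empirical token distribution of one artifact,
    i.e., the expected self-information -log2 p(token)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x).
    Assumes q contains every key with p(x) > 0 and that q(x) > 0 there."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)


def _entropy(dist):
    """Entropy (bits) of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def channel_metrics(joint):
    """Entropies, mutual information, and conditional entropies (bits)
    from an assumed joint distribution {(x, y): probability}."""
    # Marginalize the joint distribution onto X and onto Y.
    px, py = Counter(), Counter()
    for (x, y), pxy in joint.items():
        px[x] += pxy
        py[y] += pxy
    h_x, h_y, h_xy = _entropy(px), _entropy(py), _entropy(joint)
    return {
        "H(X)": h_x,
        "H(Y)": h_y,
        "I(X;Y)": h_x + h_y - h_xy,  # shared information
        "H(X|Y)": h_xy - h_y,        # equivocation
        "H(Y|X)": h_xy - h_x,        # noise in the X -> Y channel view
    }


if __name__ == "__main__":
    print(token_entropy(["return", "the", "user", "id"]))
    print(channel_metrics({(0, 0): 0.5, (1, 1): 0.5}))
```

For instance, `token_entropy(["return", "the", "user", "id"])` gives 2.0 bits because the four tokens are equally likely, and the perfectly dependent joint `{(0, 0): 0.5, (1, 1): 0.5}` yields $I(X;Y) = 1$ bit with both conditional entropies equal to 0. Computing mutual information as $H(X) + H(Y) - H(X, Y)$ keeps all five quantities consistent with the identities that link them.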
