BugLens: Leveraging Bisection for Lightweight Compiler Bug Deduplication
Xintong Zhou, Zhenyang Xu, Yongqiang Tian, Chengnian Sun
TL;DR
This paper tackles the bug deduplication problem in compiler testing by evaluating whether bisection, a simple debugging technique, can effectively identify unique miscompilation bugs exposed by random testing. It introduces BugLens, a bisection-based deduplication approach that uses failure-inducing commits as the primary criterion and augments this with bug-triggering optimizations to mitigate false negatives. Empirical results across four real-world GCC/LLVM datasets show BugLens significantly reduces human effort compared with state-of-the-art analysis-based methods (Tamer and D3), while maintaining strong generality and practical efficiency. The work demonstrates that a lightweight, generalizable strategy can outperform more complex, tool-heavy approaches, highlighting the value of re-evaluating simple techniques for real-world compiler debugging tasks.
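Bisection here means a binary search over the compiler's commit history to find the first commit at which a test program starts to fail. The sketch below is a minimal illustration, not BugLens itself. It assumes a linearly ordered history, a known-good oldest commit, a known-bad newest commit, and a single good-to-bad transition. The function name and the `is_bad` predicate are hypothetical stand-ins for rebuilding the compiler at a commit and checking whether the test program is miscompiled.

```python
from typing import Callable, Sequence, TypeVar

C = TypeVar("C")


def find_failure_inducing_commit(commits: Sequence[C], is_bad: Callable[[C], bool]) -> C:
    """Return the first commit for which is_bad(commit) is True.

    Hypothetical helper. Assumes commits are ordered oldest to newest,
    commits[0] is good, commits[-1] is bad, and the predicate flips
    from good to bad exactly once along the history.
    """
    lo, hi = 0, len(commits) - 1
    if is_bad(commits[lo]) or not is_bad(commits[hi]):
        raise ValueError("expected a good oldest commit and a bad newest commit")
    # Invariant: commits[lo] is good and commits[hi] is bad.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the failure was introduced at or before mid
        else:
            lo = mid  # the failure was introduced after mid
    return commits[hi]
```

Each probe halves the remaining range, so only about log2(n) compiler builds are needed for n commits. This is what makes bisection cheap enough to run on every failing test program. In a Git repository, `git bisect run` automates the same loop.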
Abstract
Random testing has proven to be an effective technique for compiler validation. However, debugging the bugs found through random testing is challenging because many test programs frequently expose the same compiler bug. Identifying such duplicates is a practical research problem known as bug deduplication. Prior compiler bug deduplication methods rely primarily on program analysis to extract bug-related features for identifying duplicates, which can incur substantial computational overhead and limit generalizability. This paper investigates the feasibility of using bisection, a standard debugging procedure largely overlooked in prior research on compiler bug deduplication, for this purpose. Our study shows that the failure-inducing commits located by bisection provide a valuable criterion for deduplication, although supplementary techniques are needed for more accurate identification. Building on these results, we introduce BugLens, a novel deduplication method that primarily uses bisection, enhanced by the identification of bug-triggering optimizations to minimize false negatives. Empirical evaluations on four real-world datasets demonstrate that BugLens significantly outperforms the state-of-the-art analysis-based methods Tamer and D3, saving on average 26.98% and 9.64% of the human effort, respectively, needed to identify the same number of distinct bugs. Given its inherent simplicity and generalizability, bisection offers a highly practical solution for compiler bug deduplication in real-world applications.
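The combination of a primary criterion (the failure-inducing commit) with a secondary one (the bug-triggering optimization) can be illustrated with a small grouping sketch. This is an assumption-laden illustration, not the paper's algorithm. The `FailingTest` record, the `signature` and `prioritize` helpers, and the placeholder commit and optimization names are all hypothetical. The sketch assumes that a developer inspects test programs in the returned order, so listing one representative per signature first reduces inspection effort.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FailingTest:
    """Hypothetical record for one bug-triggering test program."""
    name: str
    commit: str                  # failure-inducing commit found by bisection
    optimization: Optional[str]  # bug-triggering optimization, if one was identified


Signature = Tuple[str, Optional[str]]


def signature(test: FailingTest) -> Signature:
    # Primary criterion: the failure-inducing commit.
    # Secondary criterion: the bug-triggering optimization, which separates
    # tests that share a commit but may expose different bugs.
    return (test.commit, test.optimization)


def prioritize(tests: List[FailingTest]) -> List[FailingTest]:
    """Put one representative per signature first, then the remaining tests."""
    groups: Dict[Signature, List[FailingTest]] = {}
    for t in tests:
        groups.setdefault(signature(t), []).append(t)
    # Dicts preserve insertion order, so representatives keep input order.
    representatives = [g[0] for g in groups.values()]
    duplicates = [t for g in groups.values() for t in g[1:]]
    return representatives + duplicates


tests = [
    FailingTest("t1", "commit_a", "opt_x"),
    FailingTest("t2", "commit_a", "opt_x"),
    FailingTest("t3", "commit_a", "opt_y"),
    FailingTest("t4", "commit_b", None),
]
print([t.name for t in prioritize(tests)])  # ['t1', 't3', 't4', 't2']
```

If the signature used the commit alone, `t3` would be ranked as a duplicate of `t1`. If the two tests actually expose different bugs introduced by the same commit, that bug would be missed, which is the kind of false negative the optimization criterion is meant to reduce.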
