Validity-Preserving Delta Debugging via Generator Trace Reduction
Luyao Ren, Xing Zhang, Ziyue Hua, Yanyan Jiang, Xiao He, Yingfei Xiong, Tao Xie
TL;DR
This work tackles the validity problem that arises in delta debugging when test inputs must conform to rich specifications. It introduces GReduce, a generator-based delta debugging framework that reduces the execution trace of a test input generator, rather than the test input itself, to produce smaller, valid inputs that still trigger the bug. The approach combines trace instrumentation, trace-aware reduction based on loop and selection patterns, and trace-aligned re-execution to synthesize reduced inputs efficiently. An empirical evaluation on graphs, deep learning models, JavaScript programs, SymPy, and algebraic data types shows that GReduce substantially outperforms state-of-the-art syntax-based reducers and other baselines (including SmartCheck) in both effectiveness and efficiency, with modest instrumentation overhead. The results indicate that GReduce is broadly applicable, robust, and practical for reducing domain-specific test inputs governed by complex specifications.
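The three components named above (instrumentation, trace-aware reduction, re-execution) can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not GReduce's implementation: `Source`, `candidates`, `reduce_trace`, `gen_graph`, and `has_self_loop` are hypothetical names, the trace format (`begin`/`iter`/`end` events for loops, `select` events for choices) is invented, and the bug oracle is a stand-in. Instrumentation is modeled by routing every generator choice through `Source`; reduction removes one loop iteration or resets one selection to its first option; re-execution replays the reduced trace through the unmodified generator, so every candidate input is valid by construction.

```python
import random
from typing import Callable, Iterator, List, Optional, Tuple

# A trace event is (kind, site, value): loops emit "begin"/"iter"/"end" events,
# and a "select" event stores the index of the chosen option.
Event = Tuple[str, str, int]
Trace = List[Event]


class Source:
    """Supplies a generator's choices and records them as a fresh trace.
    Without `replay`, choices are random; with `replay`, aligned replayed events are
    reused and misaligned choice points fall back to minimal defaults (end loop, option 0)."""
    def __init__(self, replay: Optional[Trace] = None, seed: int = 0):
        self.replay, self.pos, self.trace = replay, 0, []
        self.rng = random.Random(seed)

    def _take(self, kind: str, site: str) -> Optional[int]:
        # Consume the next replayed event only if it is aligned with (kind, site).
        if self.replay is not None and self.pos < len(self.replay):
            k, s, v = self.replay[self.pos]
            if (k, s) == (kind, site):
                self.pos += 1
                return v
        return None

    def _continue(self, site: str, p_continue: float) -> bool:
        if self.replay is None:
            return self.rng.random() < p_continue  # random loop length
        return self._take("iter", site) is not None  # replay: iterate while aligned

    def loop(self, site: str, p_continue: float = 0.7, max_iters: int = 20) -> Iterator[int]:
        self._take("begin", site)
        self.trace.append(("begin", site, 0))
        i = 0
        while i < max_iters and self._continue(site, p_continue):
            self.trace.append(("iter", site, 0))
            yield i
            i += 1
        self._take("end", site)
        self.trace.append(("end", site, 0))

    def select(self, site: str, n_options: int) -> int:
        if self.replay is None:
            value = self.rng.randrange(n_options)
        else:
            recorded = self._take("select", site)
            # Keep the recorded choice only if it is still a valid option here.
            value = recorded if recorded is not None and recorded < n_options else 0
        self.trace.append(("select", site, value))
        return value


def candidates(trace: Trace) -> Iterator[Trace]:
    """Yield smaller traces: drop one loop iteration, or reset one selection to 0."""
    for i, (kind, site, value) in enumerate(trace):
        if kind == "iter":
            # Loop pattern: the iteration ends at the next iter/end of the same loop.
            depth, j = 0, i + 1
            while j < len(trace) and not (depth == 0 and trace[j][0] in ("iter", "end")):
                depth += {"begin": 1, "end": -1}.get(trace[j][0], 0)
                j += 1
            yield trace[:i] + trace[j:]
        elif kind == "select" and value > 0:
            # Selection pattern: switch to the minimal option.
            yield trace[:i] + [("select", site, 0)] + trace[i + 1:]


def size(trace: Trace) -> Tuple[int, int]:
    return len(trace), sum(v for _, _, v in trace)


def reduce_trace(generator: Callable[[Source], object], trace: Trace,
                 triggers_bug: Callable[[object], bool]) -> Tuple[object, Trace]:
    """Greedily keep any candidate whose re-execution still triggers the bug."""
    output = generator(Source(replay=trace))
    improved = True
    while improved:
        improved = False
        for cand in candidates(trace):
            src = Source(replay=cand)
            new_output = generator(src)  # valid by construction: the generator enforces the spec
            if triggers_bug(new_output) and size(src.trace) < size(trace):
                output, trace, improved = new_output, src.trace, True
                break
    return output, trace


def gen_graph(src: Source) -> Tuple[int, List[Tuple[int, int]]]:
    """Hypothetical generator: every edge endpoint is an existing node id."""
    n = 1 + sum(1 for _ in src.loop("node"))
    edges = [(src.select("u", n), src.select("v", n)) for _ in src.loop("edge")]
    return n, edges


def has_self_loop(graph) -> bool:  # stand-in oracle: any self-loop "triggers the bug"
    return any(u == v for u, v in graph[1])
```

For example, picking a seed whose generated graph contains a self-loop and passing that run's `src.trace` to `reduce_trace(gen_graph, trace, has_self_loop)` should shrink it to a single node with one self-loop edge, `(1, [(0, 0)])`. Each accepted step is re-recorded from a fresh execution, so later reductions always operate on a trace that matches what the generator actually did. The fall-back-to-default rule for misaligned choice points is a simplification and is not necessarily how GReduce aligns traces.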
Abstract
Reducing test inputs that trigger bugs is crucial for efficient debugging. Delta debugging is the most popular approach for this purpose. When test inputs need to conform to certain specifications, existing delta debugging practice encounters a validity problem: it blindly applies reduction rules, producing a large number of invalid test inputs that do not satisfy the required specifications. This problem, which diminishes overall effectiveness and efficiency, becomes even more pronounced when the specifications extend beyond syntactic structures. Our key insight is that we should leverage input generators, which are aware of these specifications, to generate valid reduced inputs, rather than straightforwardly performing reduction on test inputs. In this paper, we propose a generator-based delta debugging method, called GReduce, which derives validity-preserving reducers. Specifically, given a generator and an execution of it that shows how the bug-inducing test input was generated, GReduce searches for other executions of the generator that yield reduced, valid test inputs. The evaluation results on five benchmarks (i.e., graphs, DL models, JavaScript programs, SymPy, and algebraic data types) show that GReduce substantially outperforms state-of-the-art syntax-based reducers, including Perses and T-PDD, and also outperforms QuickCheck, SmartCheck, and the state-of-the-art choice-sequence-based reducer Hypothesis, demonstrating the effectiveness, efficiency, and versatility of GReduce.
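To make the validity problem concrete, here is a small sketch using a hypothetical toy input format (not one of the paper's benchmarks): a straight-line program whose specification requires every variable to be defined before it is used. Deleting a single statement, a typical syntax-blind reduction step, always leaves a well-formed program, yet most such candidates violate this rule, which goes beyond syntax. The names `is_valid` and `single_statement_deletions` are illustrative assumptions.

```python
from typing import List


def is_valid(program: List[str]) -> bool:
    """Toy specification beyond syntax: each variable is defined before it is used."""
    defined = set()
    for stmt in program:
        target, expr = (part.strip() for part in stmt.split("="))
        for token in (t.strip() for t in expr.split("+")):
            if not token.isdigit() and token not in defined:
                return False  # use of an undefined variable
        defined.add(target)
    return True


def single_statement_deletions(program: List[str]) -> List[List[str]]:
    """Syntax-blind reduction step: drop one statement at a time."""
    return [program[:i] + program[i + 1:] for i in range(len(program))]


program = ["a = 1", "b = a + 2", "c = b + a", "d = c + 3"]
deletions = single_statement_deletions(program)
invalid = [d for d in deletions if not is_valid(d)]
print(f"{len(invalid)} of {len(deletions)} candidates are invalid")  # prints "3 of 4 candidates are invalid"
```

Only removing the last statement yields a valid candidate here. A generator that always emits definitions before uses could never produce the other three, which is the intuition behind searching over generator executions instead of editing the inputs directly.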
