Adaptive Random Testing with Q-grams: The Illusion Comes True
Matteo Biagiola, Robert Feldt, Paolo Tonella
TL;DR
The paper tackles the scalability bottleneck of Adaptive Random Testing (ART) that arises from its quadratic number of distance computations. It introduces a general ART framework that replaces pairwise distances with an incremental aggregation of past executions. The framework is instantiated with q-gram counts, whose entropy measures diversity, so that diversity computations take linear rather than quadratic time. Theoretical analysis and experiments show that ART with q-gram aggregation substantially improves fault detection and coverage, especially for programs with low failure rates and medium-to-high complexity, while keeping runtimes practical. The findings suggest that ART can be applied in practice to real-world web applications, and the authors point to future work on alternative diversity measures and broader testing tasks.
Abstract
Adaptive Random Testing (ART) has faced criticism, particularly for its computational inefficiency, as highlighted by Arcuri and Briand. Their analysis showed that ART requires a number of distance computations that grows quadratically with the number of test executions, which limits its scalability in scenarios where extensive testing is needed to uncover faults. Simulation results support this, showing that the computational overhead of these distance calculations often outweighs ART's benefits. While various ART variants have attempted to reduce these costs, they frequently do so at the expense of fault detection, lack complexity guarantees, or are restricted to specific input types, such as numerical or discrete data. In this paper, we introduce a novel framework for adaptive random testing that replaces pairwise distance computations with a compact aggregation of past executions, such as counts of the q-grams observed in previous runs. Test case selection then uses this aggregated data to measure diversity (e.g., the entropy of the q-gram distribution), reducing the computational complexity from quadratic to linear. Experiments on a benchmark of six web applications show that ART with q-grams covers, on average, 4x more unique targets than random testing and 3.5x more than ART using traditional distance-based methods.
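The abstract describes the aggregation idea only at a high level, so the following is a minimal sketch of how an entropy-based selection step over aggregated q-gram counts could look; it is not the authors' implementation. Several choices are assumptions made for illustration: tests are modeled as sequences of abstract tokens (e.g., UI actions), candidates are drawn as a fixed-size random pool and the one that maximizes the entropy of the aggregated counts is kept, and `QGramArchive`, `select_next`, and `art_qgram` are hypothetical helper names. Entropy is maintained incrementally using H = log N - S/N, where N is the total number of q-grams seen and S is the sum of c*log(c) over their counts c, so scoring a candidate costs time proportional to the candidate's own length rather than to the number of previously executed tests.

```python
import math
import random
from collections import Counter


def qgrams(seq, q):
    """Counter of the contiguous q-grams (as tuples) in a token sequence."""
    return Counter(tuple(seq[i:i + q]) for i in range(len(seq) - q + 1))


def xlogx(c):
    """c * log(c), with the convention 0 * log(0) = 0."""
    return c * math.log(c) if c > 0 else 0.0


class QGramArchive:
    """Hypothetical aggregate of the q-gram counts of all executed tests."""

    def __init__(self, q):
        self.q = q
        self.counts = Counter()
        self.total = 0        # N: total number of q-grams aggregated so far
        self.sum_xlogx = 0.0  # S: sum of c * log(c) over the aggregated counts

    def entropy_with(self, grams):
        """Entropy of the archive if `grams` were added; O(len(grams)) time."""
        n = self.total + sum(grams.values())
        if n == 0:
            return 0.0
        s = self.sum_xlogx
        for g, d in grams.items():
            c = self.counts[g]  # Counter returns 0 for unseen q-grams
            s += xlogx(c + d) - xlogx(c)
        return math.log(n) - s / n

    def add(self, grams):
        """Fold an executed test's q-grams into the aggregate."""
        for g, d in grams.items():
            c = self.counts[g]
            self.sum_xlogx += xlogx(c + d) - xlogx(c)
            self.counts[g] = c + d
        self.total += sum(grams.values())


def select_next(archive, candidates):
    """Assumed criterion: keep the candidate that maximizes aggregate entropy."""
    return max(candidates, key=lambda t: archive.entropy_with(qgrams(t, archive.q)))


def art_qgram(num_tests, num_candidates, q, gen, seed=0):
    """Hypothetical ART loop: no pairwise distances to past tests are computed."""
    rng = random.Random(seed)
    archive = QGramArchive(q)
    executed = []
    for _ in range(num_tests):
        candidates = [gen(rng) for _ in range(num_candidates)]
        best = select_next(archive, candidates)
        # Executing `best` against the system under test would happen here.
        archive.add(qgrams(best, q))
        executed.append(best)
    return executed, archive


if __name__ == "__main__":
    # Illustrative generator: random action sequences over a toy alphabet.
    def random_actions(rng):
        return "".join(rng.choice("abcd") for _ in range(6))

    tests, _ = art_qgram(num_tests=10, num_candidates=5, q=2, gen=random_actions)
    print(tests)
```

Because each selection touches only the candidates' own q-grams and a running sum, the total cost in this sketch grows linearly with the number of executed tests, whereas a distance-based scheme that compares every candidate with every previous test grows quadratically.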
