Adaptive Random Testing with Q-grams: The Illusion Comes True

Matteo Biagiola, Robert Feldt, Paolo Tonella

TL;DR

The paper tackles the scalability bottleneck of Adaptive Random Testing (ART) arising from quadratic distance computations. It introduces a general ART framework that replaces pairwise distances with an incremental aggregation, instantiated via q-gram counts and entropy to measure diversity, achieving linear-time diversity computations. Theoretical analysis and experiments show that ART with q-gram aggregation substantially improves fault detection and coverage, especially for low-failure-rate and medium-to-high-complexity programs, while maintaining practical runtimes. The findings demonstrate significant practical impact for applying ART to real-world web applications and point to future work on alternative diversity measures and broader testing tasks.

Abstract

Adaptive Random Testing (ART) has faced criticism, particularly for its computational inefficiency, as highlighted by Arcuri and Briand. Their analysis clarified how ART requires a quadratic number of distance computations as the number of test executions increases, which limits its scalability in scenarios requiring extensive testing to uncover faults. Simulation results support this, showing that the computational overhead of these distance calculations often outweighs ART's benefits. While various ART variants have attempted to reduce these costs, they frequently do so at the expense of fault detection, lack complexity guarantees, or are restricted to specific input types, such as numerical or discrete data. In this paper, we introduce a novel framework for adaptive random testing that replaces pairwise distance computations with a compact aggregation of past executions, such as counting the q-grams observed in previous runs. Test case selection then leverages this aggregated data to measure diversity (e.g., entropy of q-grams), allowing us to reduce the computational complexity from quadratic to linear. Experiments with a benchmark of six web applications show that ART with q-grams covers, on average, 4x more unique targets than random testing and 3.5x more than ART using traditional distance-based methods.
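To make the aggregation idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a test is a sequence of tokens (the page names used here are hypothetical), keeps a single counter of the q-grams seen so far, and picks the candidate that maximizes the Shannon entropy of the merged counts. The function names `qgrams`, `entropy`, and `select_test` are illustrative. For clarity the entropy is recomputed from a copy of the counter for each candidate; an incremental update is not shown.

```python
import math
from collections import Counter


def qgrams(test, q=2):
    """Return the q-grams (as tuples) of a test given as a sequence of tokens."""
    return [tuple(test[i:i + q]) for i in range(len(test) - q + 1)]


def entropy(counts):
    """Shannon entropy (in bits) of the distribution induced by the counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_test(candidates, seen, q=2):
    """Pick the candidate whose q-grams maximize the entropy of the aggregate."""
    def score(test):
        merged = seen.copy()            # copy only to keep the sketch simple
        merged.update(qgrams(test, q))  # hypothetically add the candidate
        return entropy(merged)
    return max(candidates, key=score)


# Hypothetical token sequences; no past test is compared pairwise.
seen = Counter()
seen.update(qgrams(["home", "owners", "find"]))
candidates = [["home", "owners", "find"], ["home", "vets", "list"]]
chosen = select_test(candidates, seen)
seen.update(qgrams(chosen))  # fold the executed test into the aggregate
```

In this sketch the cost of scoring a candidate depends on the number of distinct q-grams in the aggregate, not on how many tests have already been executed, which is the property that removes the pairwise distance computations.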

Paper Structure

This paper contains 21 sections, 1 equation, 3 figures, 5 tables, 2 algorithms.

Figures (3)

  • Figure 1: Portion of the navigation model of the petclinic web application. Nodes represent pages of the web application, while edges represent the actions that bring the application from one page to another. The home page of the web application is highlighted in blue, while the path highlighted in bold represents an example of a feasible navigation path.
  • Figure 2: Coverage over number of executed tests for three subjects and all techniques Rand, Dist, q-grams_s, q-grams_{s+i}. Solid lines represent the average over five repetitions, while the shaded areas around them represent the standard error of the mean. Curves are padded with the respective last values, such that all techniques have curves with the same number of points. The left-hand side shows the coverage trend considering all executed test cases, with a zoomed-in view of the second half of the executed test cases on the right. Best viewed in color.
  • Figure 3: Length of the selected tests over the number of executed tests for three subjects and all techniques Rand, Dist, q-grams_s, q-grams_{s+i}. The length (y-axis) is expressed as number of statements. Solid lines represent the average over five repetitions, while the shaded areas around them represent the standard error of the mean. Each point is smoothed with a window size of 100, such that the trend is more visible. Curves are truncated at the minimum number of executed tests across techniques, to display the same number of points for all techniques. Best viewed in color.