Using Fourier Analysis and Mutant Clustering to Accelerate DNN Mutation Testing

Ali Ghanbari, Sasan Tavakkol

TL;DR

This paper addresses the high cost of DNN mutation testing by proposing DM#, a technique that uses the fast Fourier transform (FFT) to characterize mutant behavior from only a few data points. DM# computes FFT spectra of mutant outputs, uses them to measure similarity between mutants, and clusters similar mutants so that only one representative per cluster is tested and its result is reused for the rest. Evaluated on 14 models against several baselines, DM# accelerates mutation testing by 28.38% on average at an average cost of 0.72% error in mutation score. It incurs 11.78, 15.16, and 114.36 times less mutation score error than random mutant selection, boundary sample selection, and random sample selection, respectively, while generally offering comparable speed-up.

Abstract

Deep neural network (DNN) mutation analysis is a promising approach to evaluating test set adequacy. Due to the large number of generated mutants that must be tested on large datasets, mutation analysis is costly. In this paper, we present a technique, named DM#, for accelerating DNN mutation testing using Fourier analysis. The key insight is that DNN outputs are real-valued functions suitable for Fourier analysis that can be leveraged to quantify mutant behavior using only a few data points. DM# uses the quantified mutant behavior to cluster the mutants so that the ones with similar behavior fall into the same group. A representative from each group is then selected for testing, and the result of the test, e.g., whether the mutant is killed or survived, is reused for all other mutants represented by the selected mutant, obviating the need for testing other mutants. 14 DNN models of sizes ranging from thousands to millions of parameters, trained on different datasets, are used to evaluate DM# and compare it to several baseline techniques. Our results provide empirical evidence on the effectiveness of DM# in accelerating mutation testing by 28.38%, on average, at the average cost of only 0.72% error in mutation score. Moreover, on average, DM# incurs 11.78, 15.16, and 114.36 times less mutation score error compared to random mutant selection, boundary sample selection, and random sample selection techniques, respectively, while generally offering comparable speed-up.
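The pipeline described in the abstract (quantify each mutant's behavior with a Fourier spectrum, cluster similar mutants, test one representative per cluster, reuse its result) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the Euclidean distance between magnitude spectra, the threshold-based single-linkage grouping, the choice of the first cluster member as representative, and the toy sine-wave "mutant outputs" are all assumptions made for illustration.

```python
import numpy as np

def spectral_signature(outputs, n_bins=8):
    """Magnitudes of the first n_bins FFT coefficients of a 1-D output sequence."""
    spectrum = np.abs(np.fft.rfft(np.asarray(outputs, dtype=float)))
    return spectrum[:n_bins]

def cluster_mutants(signatures, threshold):
    """Group mutants linked by a chain of signature distances <= threshold.

    Assumption: Euclidean distance and single-linkage grouping (union-find);
    the paper's actual similarity measure and clustering may differ.
    """
    n = len(signatures)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(signatures[i] - signatures[j]) <= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def propagate_results(clusters, test_mutant):
    """Test one representative per cluster and reuse its verdict for all members."""
    results = {}
    tested = 0
    for members in clusters:
        killed = test_mutant(members[0])  # assumption: first member represents the cluster
        tested += 1
        for m in members:
            results[m] = killed
    return results, tested

# Toy data: each "mutant" is observed on only 16 sample points (hypothetical outputs).
x = np.linspace(0.0, 1.0, 16, endpoint=False)
mutant_outputs = [
    np.sin(2 * np.pi * x),             # mutant 0
    np.sin(2 * np.pi * x) + 0.01,      # mutant 1: nearly identical to mutant 0
    np.sin(2 * np.pi * 3 * x),         # mutant 2: different dominant frequency
    1.01 * np.sin(2 * np.pi * 3 * x),  # mutant 3: nearly identical to mutant 2
]
signatures = [spectral_signature(o) for o in mutant_outputs]
clusters = cluster_mutants(signatures, threshold=0.5)

# Hypothetical test oracle: pretend the full test set kills only mutant 0.
results, tested = propagate_results(clusters, test_mutant=lambda m: m == 0)
mutation_score = sum(results.values()) / len(results)
```

In this toy run, only two of the four mutants are executed against the (pretend) test set, and the verdicts of the two representatives are copied to their cluster mates. The savings in a real setting depend on how many mutants collapse into each cluster, and the score error depends on how often a representative's verdict differs from that of the mutants it stands in for.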

Paper Structure

This paper contains 23 sections, 1 equation, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Row 1: outputs 2 and 5 of a DNN classifier trained on MNIST dataset in columns (a) and (b), respectively. Rows 2-4: Fourier approximation of the two functions using 5, 10, and 100 terms, respectively. Last row: bar charts for the first 20 frequency buckets of the FFT for the two functions. The bar charts are annotated with gray guidelines to aid visual comparison of the heights of the bars between two diagrams.
  • Figure 2: A DNN mutation analysis workflow involving DM#. Sharp-edged rectangles denote processes; rounded ones denote data/artifacts. Arrows denote control flow. Gear and lightning bolt icons annotate user-configurable and non-deterministic processes, respectively. Processes and artifacts inside the area marked by the dashed line are part of DM#.
  • Figure 3: Mutant reduction rate vs. linkage threshold when $x=1$
  • Figure 4: Box-plot visualizing the mutant reduction rates that result in no more than 5% mutation score error and at least 10% mutation testing speed gain
  • Figure : Parameter search procedure in DM#

Theorems & Definitions (1)

  • Definition 1: Mutant Similarity Graph