GAN-enhanced Simulation-driven DNN Testing in Absence of Ground Truth

Mohammed Attaoui, Fabrizio Pastore

TL;DR

This work addresses the oracle problem in simulator-driven DNN testing by introducing ORBIT, a ground-truth-free (GT-free) framework that combines GAN-enhanced input generation with oracle-inspired fitness functions (Flipping, Noise, Surprise Adequacy (SA), and MCD) and NSGA-II search. By replacing ground-truth-based guidance with transformation-consistency and uncertainty metrics, ORBIT generates diverse test inputs and can drive retraining effectively, approaching the performance of the GT-based DESIGNATE in many settings. Empirical results on a Mars segmentation task show that GT-free fitnesses, particularly Flipping and SA, yield strong testing signals and meaningful retraining gains, with CycleGAN providing realism without ground truth and without hurting outcomes. The findings highlight a practical path to cost-effective, GT-free testing and suggest that diffusion models and LLMs could further enhance GT-free testing in the future.

Abstract

The generation of synthetic inputs via simulators driven by search algorithms is essential for cost-effective testing of Deep Neural Network (DNN) components for safety-critical systems. However, in many applications, simulators are unable to produce the ground-truth data needed for automated test oracles and to guide the search process. To tackle this issue, we propose an approach for the generation of inputs for computer vision DNNs that integrates a generative network to ensure simulator fidelity and employs heuristic-based search fitnesses that leverage transformation consistency, noise resistance, surprise adequacy, and uncertainty estimation. We compare the performance of our fitnesses with that of a traditional fitness function leveraging ground truth; further, we assess how the integration of a GAN that does not leverage the ground truth affects test and retraining effectiveness. Our results suggest that leveraging transformation consistency is the best option to generate inputs for both DNN testing and retraining; it maximizes input diversity, spots the inputs leading to the worst DNN performance, and leads to the best DNN performance after retraining. Besides enabling simulator-based testing in the absence of ground truth, our findings pave the way for testing solutions that replace costly simulators with diffusion and large language models, which might be more affordable than simulators but cannot generate ground-truth data.
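
To make the idea of a transformation-consistency fitness concrete, the following is a minimal sketch of how a flipping-based score could be computed for a segmentation model without any ground truth. It is an illustration under stated assumptions, not the paper's definition of the Flipping metric: the function name flipping_inconsistency, the use of a horizontal flip, the pixel-disagreement measure, and the two toy models are all hypothetical.

```python
import numpy as np


def flipping_inconsistency(predict, image):
    """Fraction of pixels whose predicted class changes under a horizontal flip.

    Assumption: `predict` maps an (H, W) image to an (H, W) integer mask.
    A higher score means the model is less consistent on this input.
    """
    mask = predict(image)
    # Predict on the mirrored image, then mirror the mask back for comparison.
    restored = predict(image[:, ::-1])[:, ::-1]
    return float(np.mean(mask != restored))


def threshold_model(image):
    """Toy model (hypothetical): per-pixel threshold, so it commutes with flips."""
    return (image > 0.5).astype(np.int64)


def left_biased_model(image):
    """Toy model (hypothetical): always labels the left half as class 1."""
    mask = (image > 0.5).astype(np.int64)
    mask[:, : image.shape[1] // 2] = 1
    return mask


rng = np.random.default_rng(0)
image = rng.random((8, 8))

equivariant_score = flipping_inconsistency(threshold_model, image)
biased_score = flipping_inconsistency(left_biased_model, image)
print(equivariant_score, biased_score)
```

In a search-based setting, a score of this kind could serve as one objective to maximize, since inputs on which the original and flipped predictions disagree are candidates for failure-inducing inputs even though no ground-truth mask is available.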

Paper Structure

This paper contains 41 sections, 8 equations, 8 figures, 12 tables.

Figures (8)

  • Figure 1: Overview of DESIGNATE.
  • Figure 2: Examples of a Mars simulator's simulated image, its ground truth, and the realistic image generated from it by Pix2PixHD.
  • Figure 3: Overview of ORBIT.
  • Figure 4: Fitness computation using the Flipping metric.
  • Figure 5: Fitness computation using the Noise metric.
  • ...and 3 more figures