Search-based DNN Testing and Retraining with GAN-enhanced Simulations

Mohammed Oualid Attaoui; Fabrizio Pastore; Lionel Briand

TL;DR

This work addresses the fidelity gap between simulators and real-world data in safety-critical DNN testing by introducing DESIGNATE, a framework that couples meta-heuristic search with GAN-enhanced simulations to generate realistic, diverse test inputs. It uses two fitness objectives, accuracy (based on IoU) and diversity (computed on image features extracted with a ResNet50), and employs NSGA-II with an archive to explore diverse failure-inducing scenarios. Empirical results on urban road segmentation and Martian terrain detection show that DESIGNATE outperforms baselines and state-of-the-art testing methods both in discovering failure cases and in enabling more effective retraining, with GAN-generated inputs yielding the most substantial gains. The findings support integrating GANs with simulators for robust DNN testing and retraining in vision-based safety-critical systems, and suggest broader applications in automotive, robotics, and space domains where high-fidelity realism is crucial.
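The two fitness objectives can be illustrated with a minimal sketch. This summary does not give the paper's exact formulas, so the code below makes several assumptions: `iou` is the standard per-class Intersection over Union on label maps, `diversity` is taken to be the Euclidean distance to the nearest feature vector in the archive, and an empty union is scored as 1.0. The function names and the plain-list feature vectors (standing in for ResNet50 features) are hypothetical.

```python
from math import sqrt


def iou(pred, truth, label):
    """Per-class Intersection over Union for two equally sized 2D label maps."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            in_pred, in_truth = p == label, t == label
            inter += in_pred and in_truth
            union += in_pred or in_truth
    # Assumption: a class absent from both maps counts as a perfect match.
    return inter / union if union else 1.0


def diversity(features, archive):
    """Assumed diversity score: distance to the closest archived feature vector."""
    if not archive:
        return float("inf")  # nothing archived yet, so any input is novel
    return min(
        sqrt(sum((a - b) ** 2 for a, b in zip(features, other)))
        for other in archive
    )


# Toy data: 2x2 label maps where class 1 stands for "car".
predicted = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [0, 0]]
car_iou = iou(predicted, ground_truth, label=1)

# Toy feature vectors in place of real ResNet50 embeddings.
novelty = diversity([0.0, 0.0], [[3.0, 4.0], [6.0, 8.0]])
```

Under these assumptions, a search would favour inputs with low `car_iou` (poor DNN accuracy) and high `novelty` (far from inputs already archived).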

Abstract

In safety-critical systems (e.g., autonomous vehicles and robots), Deep Neural Networks (DNNs) are becoming a key component for computer vision tasks, particularly semantic segmentation. Further, since DNN behavior cannot be assessed through code inspection and analysis, test automation has become an essential activity to gain confidence in the reliability of DNNs. Unfortunately, state-of-the-art automated testing solutions largely rely on simulators, whose fidelity is always imperfect, thus affecting the validity of test results. To address such limitations, we propose to combine meta-heuristic search, used to explore the input space using simulators, with Generative Adversarial Networks (GANs), which transform the data generated by simulators into realistic input images. Such images can be used both to assess the DNN performance and to retrain the DNN more effectively. We applied our approach to a state-of-the-art DNN performing semantic segmentation and demonstrated that it outperforms a state-of-the-art GAN-based testing solution and several baselines. Specifically, it produces the largest number of diverse images that lead to the worst DNN performance. Further, the images generated with our approach lead to the highest improvement in DNN performance when used for retraining. In conclusion, we suggest always integrating GAN components when performing search-driven, simulator-based testing.

Paper Structure (26 sections, 10 equations, 5 figures, 13 tables, 2 algorithms)

Introduction
Background
Terminology
Semantic Segmentation
Simulated Environments
AirSim
MarsSim
GAN-based Image-to-image Translation
The DESIGNATE Approach
GAN-based input generation
Fitness Functions
Accuracy fitness
Diversity fitness
Search algorithm
Empirical Evaluation
...and 11 more sections

Figures (5)

Figure 1: Examples of images from the Cityscapes dataset showing the same situation (a turning road, buildings on the side along with parked cars).
Figure 2: Examples of a simulator image, its ground truth (i.e., the segmentation map provided by the simulator), and the realistic image generated by Pix2PixHD from the AirSim ground truth.
Figure 3: Examples of a simulated image from the Mars simulator, its ground truth, and the realistic image generated from it by Pix2PixHD.
Figure 4: Overview of DESIGNATE.
Figure 5: Examples of images generated by TACTIC for various weather conditions (right), along with the original Cityscapes image (left) and the median $IoU_{car}$ of the generated images for each respective weather condition.
