Search-based DNN Testing and Retraining with GAN-enhanced Simulations

Mohammed Oualid Attaoui, Fabrizio Pastore, Lionel Briand

TL;DR

This work addresses the fidelity gap between simulators and real-world data in safety-critical DNN testing by introducing DESIGNATE, a framework that couples meta-heuristic search with GAN-enhanced simulations to generate realistic, diverse test inputs. It uses two fitness objectives, accuracy (based on IoU) and diversity (based on image features extracted with ResNet50), and employs NSGA-II with an archive to explore diverse failure-inducing scenarios. Empirical results on urban road segmentation and Martian terrain detection show that DESIGNATE outperforms baselines and state-of-the-art testing methods both in discovering failure cases and in enabling more effective retraining, with GAN-generated inputs yielding the largest gains. The findings support integrating GANs with simulators for DNN testing and retraining in vision-based safety-critical systems, and suggest broader applications in automotive, robotics, and space domains where realistic inputs are crucial.
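
The two objectives named above can be illustrated with a minimal sketch. This summary does not give the exact fitness formulas, so the following are assumptions: diversity is taken as the distance to the nearest archived feature vector, both objectives are expressed as values to minimize, and plain lists of floats stand in for ResNet50 features. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def iou(pred: Sequence[int], truth: Sequence[int], cls: int) -> float:
    """Intersection over Union for one class over flattened label masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, truth) if p == cls or t == cls)
    # Assumption: a class absent from both masks counts as a perfect match.
    return inter / union if union else 1.0


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def diversity(features: Sequence[float], archive: List[Sequence[float]]) -> float:
    """Distance to the nearest archived feature vector (assumed measure)."""
    if not archive:
        return math.inf  # nothing archived yet: maximally novel
    return min(euclidean(features, f) for f in archive)


def fitness(
    pred: Sequence[int],
    truth: Sequence[int],
    cls: int,
    features: Sequence[float],
    archive: List[Sequence[float]],
) -> Tuple[float, float]:
    """Return both objectives as values to minimize: IoU and negated diversity."""
    return iou(pred, truth, cls), -diversity(features, archive)
```

A multi-objective search such as NSGA-II would then favor candidates with a low IoU (the DNN segments them poorly) and a large distance from the inputs already archived.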

Abstract

In safety-critical systems (e.g., autonomous vehicles and robots), Deep Neural Networks (DNNs) are becoming a key component for computer vision tasks, particularly semantic segmentation. Further, since DNN behavior cannot be assessed through code inspection and analysis, test automation has become an essential activity to gain confidence in the reliability of DNNs. Unfortunately, state-of-the-art automated testing solutions largely rely on simulators, whose fidelity is always imperfect, thus affecting the validity of test results. To address such limitations, we propose to combine meta-heuristic search, used to explore the input space using simulators, with Generative Adversarial Networks (GANs), used to transform the data generated by simulators into realistic input images. Such images can be used both to assess the DNN performance and to retrain the DNN more effectively. We applied our approach to a state-of-the-art DNN performing semantic segmentation and demonstrated that it outperforms a state-of-the-art GAN-based testing solution and several baselines. Specifically, it yields the largest number of diverse images that lead to the worst DNN performance. Further, the images generated with our approach lead to the highest improvement in DNN performance when used for retraining. In conclusion, we suggest always integrating GAN components when performing search-driven, simulator-based testing.

Paper Structure (26 sections, 10 equations, 5 figures, 13 tables, 2 algorithms)

Figures (5)

  • Figure 1: Examples of images from the Cityscapes dataset showing the same situation (a turning road with buildings on the side and parked cars).
  • Figure 2: Examples of a simulator image, its ground truth (i.e., the segmentation map provided by the simulator), and the realistic image generated by Pix2PixHD from the AirSim ground truth.
  • Figure 3: Examples of a simulated image from a Mars simulator, its ground truth, and the realistic image generated from it by Pix2PixHD.
  • Figure 4: Overview of DESIGNATE.
  • Figure 5: Examples of images generated by TACTIC for various weather conditions (right), along with the original Cityscapes image (left) and the median $IoU_{car}$ of the generated images for each weather condition.