Bridging the Gap between Real-world and Synthetic Images for Testing Autonomous Driving Systems
Mohammad Hossein Amini, Shiva Nejati
TL;DR
This work addresses the mismatch between the real-world images used to train autonomous driving DNNs and the synthetic simulator images used to test them. It evaluates three domain-to-domain translators: CycleGAN, neural style transfer, and SAEVAE, the authors' proposed translator. The study assesses their impact on offline and online (simulation-based) testing for two tasks, lane keeping and object detection, using diversity, coverage, and fault-revealing ability metrics. The findings show that translators, particularly SAEVAE, significantly narrow the test-accuracy gap caused by the distribution dissimilarity, without compromising the diversity or coverage of the test data and without revealing fewer faults. Translators also increase the correlation between offline and simulation-based testing results, and SAEVAE adds only negligible simulation-time overhead. These results support integrating SAEVAE into ADS testing workflows, which can help reduce the cost of simulation-based testing. The authors provide replication materials to enable broader reuse.
Abstract
Deep Neural Networks (DNNs) for Autonomous Driving Systems (ADS) are typically trained on real-world images and tested using synthetic simulator images. This approach results in training and test datasets with dissimilar distributions, which can potentially lead to erroneously decreased test accuracy. To address this issue, the literature suggests applying domain-to-domain translators to test datasets to bring them closer to the training datasets. However, translating images used for testing may unpredictably affect the reliability, effectiveness and efficiency of the testing process. Hence, this paper investigates the following questions in the context of ADS: Could translators reduce the effectiveness of images used for ADS-DNN testing and their ability to reveal faults in ADS-DNNs? Can translators result in excessive time overhead during simulation-based testing? To address these questions, we consider three domain-to-domain translators: CycleGAN and neural style transfer, from the literature, and SAEVAE, our proposed translator. Our results for two critical ADS tasks -- lane keeping and object detection -- indicate that translators significantly narrow the gap in ADS test accuracy caused by distribution dissimilarities between training and test data, with SAEVAE outperforming the other two translators. We show that, based on the recent diversity, coverage, and fault-revealing ability metrics for testing deep-learning systems, translators do not compromise the diversity and the coverage of test data, nor do they lead to revealing fewer faults in ADS-DNNs. Further, among the translators considered, SAEVAE incurs a negligible overhead in simulation time and can be efficiently integrated into simulation-based testing. Finally, we show that translators increase the correlation between offline and simulation-based testing results, which can help reduce the cost of simulation-based testing.
