Representation Improvement in Latent Space for Search-Based Testing of Autonomous Robotic Systems
Dmytro Humeniuk, Foutse Khomh
TL;DR
RILaST introduces a latent-space representation strategy for search-based testing (SBST) of autonomous robotic systems: it trains a variational autoencoder (VAE) on a dataset of test scenarios and runs a genetic algorithm (GA) search in the resulting latent space. Across two use cases (UAV obstacle avoidance and LKAS road topology), the approach finds significantly more failure-revealing tests than baseline methods while maintaining test diversity. The study also provides insights into latent dimension interpretability and hyperparameter effects. It demonstrates that latent-space optimization can reduce reliance on hand-crafted search operators and can incorporate lightweight surrogate guidance to steer data collection. The main overheads are dataset collection and VAE training, whereas inference remains fast and parallelizable, which makes RILaST attractive for scalable SBST pipelines in practice. The results underscore the potential of learned representations to improve the efficiency and effectiveness of SBST for complex, real-time robotic systems.
Abstract
Testing autonomous robotic systems, such as self-driving cars and unmanned aerial vehicles, is challenging due to their interaction with highly unpredictable environments. A common practice is to first conduct simulation-based testing, which, despite reducing real-world risks, remains time-consuming and resource-intensive due to the vast space of possible test scenarios. A number of search-based approaches have been proposed to generate test scenarios more efficiently. A key aspect of any search-based test generation approach is the choice of representation used during the search process. However, existing methods for improving test scenario representation remain limited. We propose RILaST (Representation Improvement in Latent Space for Search-Based Testing), an approach that enhances the test representation by mapping it to the latent space of a variational autoencoder. We evaluate RILaST on two use cases: an autonomous drone and an autonomous lane-keeping assist system. The results show that RILaST finds between 3 and 4.6 times more failures than baseline approaches while achieving a high level of test diversity.
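The core idea, searching in a learned latent space rather than over the raw scenario encoding, can be illustrated with a minimal sketch. This is not the authors' implementation. The decoder below is a fixed random map standing in for a trained VAE decoder, the fitness function is a toy stand-in for a simulator-based failure score, and the names `decode`, `fitness`, `latent_ga`, `LATENT_DIM`, and `SCENARIO_DIM`, as well as all parameter values, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 4     # assumed latent size (hypothetical)
SCENARIO_DIM = 10  # assumed length of a decoded scenario vector (hypothetical)

# Stand-in for a trained VAE decoder: a fixed random linear map plus tanh.
W = rng.normal(size=(LATENT_DIM, SCENARIO_DIM))


def decode(z):
    """Map latent vectors (n, LATENT_DIM) to scenario vectors (n, SCENARIO_DIM)."""
    return np.tanh(z @ W)


def fitness(scenarios):
    """Toy stand-in for a simulator score; higher means 'closer to a failure'."""
    return np.abs(np.diff(scenarios, axis=1)).sum(axis=1)


def latent_ga(pop_size=20, generations=30, sigma=0.3):
    """Elitist GA whose individuals are latent vectors, not raw scenarios."""
    # Initial population sampled from the standard normal prior of the VAE.
    pop = rng.normal(size=(pop_size, LATENT_DIM))
    history = []
    for _ in range(generations):
        scores = fitness(decode(pop))
        history.append(float(scores.max()))
        # Truncation selection: the best half survives unchanged (elitism).
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]
        n_children = pop_size - len(parents)
        idx_a = rng.integers(0, len(parents), size=n_children)
        idx_b = rng.integers(0, len(parents), size=n_children)
        # Uniform crossover followed by Gaussian mutation, both in latent space.
        mask = rng.random((n_children, LATENT_DIM)) < 0.5
        children = np.where(mask, parents[idx_a], parents[idx_b])
        children = children + rng.normal(scale=sigma, size=children.shape)
        pop = np.vstack([parents, children])
    scores = fitness(decode(pop))
    best = int(np.argmax(scores))
    return pop[best], float(scores[best]), history


best_z, best_score, history = latent_ga()
print("best latent vector:", best_z)
print("decoded scenario:", decode(best_z[None, :])[0])
```

Because crossover and mutation act on continuous latent vectors, generic operators suffice in this sketch, and every candidate is turned back into a scenario by the decoder before it is scored. In a real pipeline the toy `fitness` would be replaced by a simulation run, which is where most of the evaluation cost would lie.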
