Boundary State Generation for Testing and Improvement of Autonomous Driving Systems
Matteo Biagiola, Paolo Tonella
TL;DR
This work addresses dependability testing of autonomous driving systems (ADSs) under resource constraints by searching for boundary conditions within a fixed environment. It introduces GENBO, a boundary-state generator that mutates the ego-vehicle state (position, velocity and orientation) to locate boundary state pairs, and uses a binary-search-based strategy to cross the decision boundary efficiently. The authors show that boundary state pairs exist even for well-trained systems, that these pairs discriminate model quality, and that retraining with expert-labeled boundary data raises the success rate on a separate set of evaluation tracks (on average, up to 3x higher than the original model). Because it mutates driving conditions rather than the environment, GENBO offers a data-efficient way to expose hidden failure modes and to improve generalization through targeted retraining.
Abstract
Recent advances in Deep Neural Networks (DNNs) and sensor technologies are enabling autonomous driving systems (ADSs) with an ever-increasing level of autonomy. However, assessing their dependability remains a critical concern. State-of-the-art ADS testing approaches modify the controllable attributes of a simulated driving environment until the ADS misbehaves. In such approaches, environment instances in which the ADS is successful are discarded, despite the possibility that they could contain hidden driving conditions in which the ADS may misbehave. In this paper, we present GENBO (GENerator of BOundary state pairs), a novel test generator for ADS testing. GENBO mutates the driving conditions of the ego vehicle (position, velocity and orientation), collected in a failure-free environment instance, and efficiently generates challenging driving conditions at the behavior boundary (i.e., where the model starts to misbehave) in the same environment instance. We use such boundary conditions to augment the initial training dataset and retrain the DNN model under test. Our evaluation results show that the retrained model has, on average, up to 3x higher success rate on a separate set of evaluation tracks with respect to the original DNN model.
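To make the idea of a boundary state pair concrete, the following is a minimal sketch and not GENBO's actual implementation. It assumes a single mutated attribute (velocity) and a hypothetical oracle `succeeds` that stands in for running the ADS in the simulator from a given ego state. The names `EgoState`, `find_boundary_pair`, `step`, `tol` and `max_doublings` are illustrative assumptions. The sketch first grows the mutation until the oracle reports a failure, then bisects between the last passing and the first failing value.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class EgoState:
    """Hypothetical ego-vehicle state; field meanings and units are assumptions."""
    position: float     # e.g. lateral offset from the lane centre
    velocity: float     # e.g. speed in m/s
    orientation: float  # e.g. heading error in degrees


def find_boundary_pair(
    seed: EgoState,
    succeeds: Callable[[EgoState], bool],
    step: float = 1.0,
    tol: float = 0.01,
    max_doublings: int = 20,
) -> Optional[Tuple[EgoState, EgoState]]:
    """Return (passing, failing) states whose velocities differ by at most tol.

    Returns None if no failing state is found within max_doublings expansions.
    """
    if not succeeds(seed):
        raise ValueError("the seed state must come from a failure-free run")

    lo = seed.velocity  # highest velocity known to pass
    hi = lo + step      # candidate velocity that may fail

    # Phase 1: enlarge the mutation until the oracle reports a failure.
    for _ in range(max_doublings):
        if not succeeds(replace(seed, velocity=hi)):
            break
        lo = hi
        step *= 2
        hi = lo + step
    else:
        return None  # never crossed the behavior boundary

    # Phase 2: bisect between the passing and the failing velocity.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if succeeds(replace(seed, velocity=mid)):
            lo = mid
        else:
            hi = mid

    return replace(seed, velocity=lo), replace(seed, velocity=hi)
```

In practice the oracle would be an expensive simulation, so the number of oracle calls matters: the bisection phase needs roughly log2(gap / tol) calls, which is one reason a binary-search-style strategy is attractive here. Extending the sketch to mutate position and orientation as well would require choosing a search direction in the three-dimensional state space, which this example deliberately leaves out.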
