Learning Ensembles of Vision-based Safety Control Filters
Ihab Tabbara, Hussein Sibai
TL;DR
The paper addresses safety in vision-based control by proposing ensembles of learned safety filters. Ensemble members are built on pre-trained vision representations (CLIP and VC1) and fuse multi-camera inputs with attention; their outputs are combined by one of three aggregation strategies (weighted averaging, majority voting, or consensus) to classify states and actions as safe or unsafe. Experiments on the DeepAccident dataset show that diverse ensembles achieve higher accuracy and better out-of-distribution generalization than both their individual member models and larger single-model baselines, with multi-backbone ensembles and majority voting often performing best. The work suggests ensembles as a practical path toward more reliable vision-based safety filters, while acknowledging that formal verification of such learned filters remains an open problem.
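To make the three aggregation strategies named above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: each ensemble member is assumed to output a probability that the input is unsafe, the 0.5 threshold is hypothetical, ties in the vote are resolved toward "unsafe", and "consensus" is read conservatively as "safe only if every member votes safe". The paper may define these rules differently.

```python
from typing import Sequence

# Assumed decision threshold on the "unsafe" probability (not from the paper).
UNSAFE_THRESHOLD = 0.5


def weighted_average(probs: Sequence[float], weights: Sequence[float]) -> bool:
    """Return True (unsafe) if the weight-normalized mean probability reaches the threshold."""
    if len(probs) != len(weights) or not probs:
        raise ValueError("probs and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(probs, weights)) / total
    return score >= UNSAFE_THRESHOLD


def majority_vote(probs: Sequence[float]) -> bool:
    """Return True (unsafe) if at least half of the members vote unsafe.

    Resolving ties toward "unsafe" is an assumption made for this sketch.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    unsafe_votes = sum(1 for p in probs if p >= UNSAFE_THRESHOLD)
    return 2 * unsafe_votes >= len(probs)


def consensus(probs: Sequence[float]) -> bool:
    """Return True (unsafe) unless every member votes safe.

    This conservative reading of "consensus" is an assumption; the opposite
    convention (unsafe only if all members agree) is equally easy to write.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    return any(p >= UNSAFE_THRESHOLD for p in probs)


if __name__ == "__main__":
    member_probs = [0.9, 0.4, 0.3]    # hypothetical per-member unsafe probabilities
    member_weights = [0.6, 0.2, 0.2]  # hypothetical weights, e.g. from validation accuracy
    print(weighted_average(member_probs, member_weights))
    print(majority_vote(member_probs))
    print(consensus(member_probs))
```

With the hypothetical inputs in the example, the three rules disagree: the weighted average and the conservative consensus rule flag the input as unsafe, while the majority vote does not, since only one of three members votes unsafe. This shows why the choice of aggregation rule can change the filter's decision on the same member outputs.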
Abstract
Safety filters in control systems correct nominal controls that violate safety constraints. Designing such filters as functions of visual observations in uncertain and complex environments is challenging. Several deep learning-based approaches to tackle this challenge have been proposed recently. However, formally verifying that the learned filters satisfy critical properties that enable them to guarantee the safety of the system is currently beyond reach. Instead, in this work, motivated by the success of ensemble methods in reinforcement learning, we empirically investigate the efficacy of ensembles in enhancing the accuracy and the out-of-distribution generalization of such filters, as a step towards more reliable ones. We experiment with diverse pre-trained vision representation models as filter backbones, training approaches, and output aggregation techniques. We compare the performance of ensembles with different configurations against each other, their individual member models, and large single-model baselines in distinguishing between safe and unsafe states and controls in the DeepAccident dataset. Our results show that diverse ensembles have better state and control classification accuracies compared to individual models.
