Learning Ensembles of Vision-based Safety Control Filters

Ihab Tabbara, Hussein Sibai

TL;DR

The paper tackles the challenge of ensuring safety in vision-based control by proposing ensemble-based safety filters. It leverages pre-trained vision representations (CLIP and VC1) to build diverse member models, fuses multi-camera inputs with attention, and applies three aggregation strategies (weighted averaging, majority voting, and consensus) to improve state and action safety classification. Experimental results on the DeepAccident dataset show that diverse ensembles achieve higher accuracy and better out-of-distribution generalization than individual models and larger single models, with multi-backbone ensembles and majority voting often delivering the best performance. This work suggests ensembles as a practical path toward more reliable vision-based safety filters, while acknowledging that formal verification of such learned filters remains a challenging open problem.
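To make the three aggregation strategies concrete, here is a minimal sketch, not the authors' implementation. It assumes each ensemble member outputs a score in [0, 1] read as the probability that the state or action is safe. The function names, the 0.5 threshold, and the reading of "consensus" as unanimous agreement on safety are illustrative assumptions.

```python
from typing import Sequence


def weighted_average(scores: Sequence[float], weights: Sequence[float],
                     threshold: float = 0.5) -> bool:
    """Label 'safe' if the weight-normalized mean score reaches the threshold."""
    total = sum(weights)
    mean = sum(s * w for s, w in zip(scores, weights)) / total
    return mean >= threshold


def majority_vote(scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Label 'safe' if strictly more than half of the members vote 'safe'."""
    votes = sum(1 for s in scores if s >= threshold)
    return 2 * votes > len(scores)


def consensus(scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Label 'safe' only if every member votes 'safe' (assumed definition)."""
    return all(s >= threshold for s in scores)


# Hypothetical member outputs for one observation.
scores = [0.9, 0.6, 0.2]
weights = [1.0, 1.0, 1.0]
print(weighted_average(scores, weights), majority_vote(scores), consensus(scores))
```

Under these assumptions, consensus is the most conservative rule: a single dissenting member is enough to flag the input as unsafe.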

Abstract

Safety filters in control systems correct nominal controls that violate safety constraints. Designing such filters as functions of visual observations in uncertain and complex environments is challenging. Several deep learning-based approaches to tackle this challenge have been proposed recently. However, formally verifying that the learned filters satisfy critical properties that enable them to guarantee the safety of the system is currently beyond reach. Instead, in this work, motivated by the success of ensemble methods in reinforcement learning, we empirically investigate the efficacy of ensembles in enhancing the accuracy and the out-of-distribution generalization of such filters, as a step towards more reliable ones. We experiment with diverse pre-trained vision representation models as filter backbones, training approaches, and output aggregation techniques. We compare the performance of ensembles with different configurations against each other, their individual member models, and large single-model baselines in distinguishing between safe and unsafe states and controls in the DeepAccident dataset. Our results show that diverse ensembles have better state and control classification accuracies compared to individual models.

Paper Structure

This paper contains 22 sections, 1 theorem, 3 equations, 2 tables.

Key Result

Theorem 3.3

Any Lipschitz continuous control policy $\pi: \mathcal{D} \rightarrow \mathcal{U}$ such that, for all $x \in \mathcal{D}$, $\pi(x) \in \{u \in \mathcal{U} : \nabla B(x)(f(x) + g(x)u) + \gamma(B(x)) \geq 0\}$ renders $B_{\geq 0}$ forward invariant.
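The condition in the theorem can be checked numerically for a given state and control. The sketch below does this for a toy scalar control-affine system that is not taken from the paper: $f(x) = 0$, $g(x) = 1$, $B(x) = 1 - x^2$, and $\gamma(r) = r$ are all illustrative assumptions, as are the function names and the fallback control.

```python
def f(x: float) -> float:
    return 0.0           # assumed drift term


def g(x: float) -> float:
    return 1.0           # assumed control gain


def B(x: float) -> float:
    return 1.0 - x * x   # assumed barrier; B(x) >= 0 on [-1, 1]


def grad_B(x: float) -> float:
    return -2.0 * x      # derivative of the assumed barrier


def gamma(r: float) -> float:
    return r             # assumed gamma (identity)


def satisfies_cbf_condition(x: float, u: float) -> bool:
    """Check grad B(x) * (f(x) + g(x) u) + gamma(B(x)) >= 0."""
    return grad_B(x) * (f(x) + g(x) * u) + gamma(B(x)) >= 0.0


def filter_control(x: float, u_nominal: float, u_fallback: float = 0.0) -> float:
    """Keep the nominal control if it meets the condition, else use the fallback.

    For this toy system u = 0 meets the condition whenever B(x) >= 0.
    """
    return u_nominal if satisfies_cbf_condition(x, u_nominal) else u_fallback


print(satisfies_cbf_condition(0.5, 0.1))  # mild control well inside the set
print(satisfies_cbf_condition(0.9, 1.0))  # pushing outward near the boundary
print(filter_control(0.9, 1.0))
```

In the vision-based setting the paper studies, $B$ and the dynamics are not available in closed form like this, which is why the filters are learned from observations instead.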

Theorems & Definitions (3)

  • Definition 3.1: Control-affine control systems
  • Definition 3.2: Control barrier functions (cbf_overview_2018)
  • Theorem 3.3 (cbf_overview_2018)