COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems against Semantic Attacks

Zijian Huang, Wenda Chu, Linyi Li, Chejian Xu, Bo Li

TL;DR

COMMIT addresses the lack of certified robustness for multi-sensor fusion systems (MSFs) in autonomous driving by introducing an anisotropic smoothing framework and a grid-based partitioning strategy to certify MSFs against semantic transformations such as rotation and shifting. It provides rigorous lower bounds on detection confidence and IoU for large MSF models by integrating median smoothing with heterogeneous multi-modal inputs, enabling pre-deployment certification on CARLA-based benchmarks. Empirical results show that MSFs achieve substantially higher certified robustness than single-modal models, with IoU certification gains of up to 53.23% under rotation and corresponding improvements in detection robustness, validating the practical value of certifiable robustness in autonomous-vehicle (AV) perception. The framework is designed to be architecture-agnostic and is accompanied by a benchmark for evaluating future MSF models, advancing toward provably robust autonomous driving perception systems.
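The median-smoothing step over heterogeneous inputs can be sketched as a Monte Carlo estimate. This is a minimal sketch under stated assumptions, not the authors' implementation: `g` is a hypothetical detector-confidence callable returning a value in [0, 1], and the separate noise scales `sigma_img` (camera image) and `sigma_pts` (LiDAR point cloud) stand in for one simple form of anisotropic noise.

```python
import numpy as np


def smoothed_confidence(g, image, points, sigma_img, sigma_pts, q=0.5,
                        n_samples=1000, seed=0):
    """Monte Carlo estimate of the q-quantile of g under per-modality Gaussian noise.

    g:         hypothetical callable (image, points) -> confidence in [0, 1]
    sigma_img: assumed noise standard deviation for the camera image
    sigma_pts: assumed noise standard deviation for the point cloud
    """
    rng = np.random.default_rng(seed)
    scores = np.empty(n_samples)
    for i in range(n_samples):
        # Anisotropic noise: each modality is perturbed with its own scale.
        noisy_img = image + sigma_img * rng.standard_normal(image.shape)
        noisy_pts = points + sigma_pts * rng.standard_normal(points.shape)
        scores[i] = g(noisy_img, noisy_pts)
    # q = 0.5 gives the median. A rigorous certificate would replace this
    # point estimate with a confidence-adjusted order statistic.
    return float(np.quantile(scores, q))
```

Choosing `q=0.5` yields median smoothing; the neighboring quantile levels are what a certified bound compares against when the input is perturbed.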

Abstract

Multi-sensor fusion systems (MSFs) play a vital role as the perception module in modern autonomous vehicles (AVs). Ensuring their robustness against common and realistic adversarial semantic transformations in the physical world, such as rotation and shifting, is therefore crucial for AV safety. Although empirical evidence suggests that MSFs are more robust than single-modal models, they remain vulnerable to adversarial semantic transformations. Empirical defenses have been proposed, but several works show that they can be broken again by new adaptive attacks. So far, no certified defense has been proposed for MSFs. In this work, we propose COMMIT, the first framework to certify the robustness of multi-sensor fusion systems against semantic attacks. In particular, we propose a practical anisotropic noise mechanism that applies randomized smoothing to multi-modal data, together with a grid-based splitting method that characterizes complex semantic transformations. We also propose efficient algorithms that compute the certification in terms of object detection accuracy and IoU for large-scale MSF models. Empirically, we evaluate the efficacy of COMMIT in different settings and provide a comprehensive benchmark of certified robustness for different MSF models on the CARLA simulation platform. We show that the certification for MSF models is up to 48.39% higher than that of single-modal models, which validates the advantages of MSF models. We believe our certification framework and benchmark are an important step toward certifiably robust AVs in practice.
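The grid-based splitting idea can be illustrated on a one-dimensional rotation interval. The sketch below is an illustration under stated assumptions, not COMMIT's algorithm: it splits an angle interval into equal cells, assumes a hypothetical constant `lip` such that any input transformed with an angle inside a cell lies within whitened distance `lip * half_width` of the input transformed with the cell's center angle, and takes the minimum of a caller-supplied bound `cert_at(center, radius)` over all cells. A bound that holds for every cell then holds for every angle in the interval.

```python
import numpy as np


def grid_certify(cert_at, lo, hi, n_cells, lip):
    """Lower-bound a certified score over all transformation parameters in [lo, hi].

    cert_at: hypothetical callable (center, radius) -> certified lower bound that
             is valid for any input within whitened distance `radius` of the
             input transformed with parameter `center`
    lip:     assumed bound on whitened input distance per unit parameter change
    """
    edges = np.linspace(lo, hi, n_cells + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    half_width = 0.5 * (hi - lo) / n_cells
    # Every parameter in a cell is within half_width of that cell's center.
    radius = lip * half_width
    # The certificate for the whole interval is the worst cell's certificate.
    return float(min(cert_at(c, radius) for c in centers))
```

Under this model, a finer grid shrinks each cell's radius, and hence the per-cell loss in the bound, at the cost of more smoothed evaluations.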

Paper Structure (32 sections, 8 theorems, 36 equations, 9 figures, 6 tables, 4 algorithms)

Key Result

Theorem 3.1

Let $T=\{T_x, T_p\}$ be a transformation with parameter space $\mathcal{Z}$. Suppose $\mathcal{S} \subseteq \mathcal{Z}$ and $\{\alpha_i\}_{i=1}^M \subseteq \mathcal{S}$. For a detection confidence function $g:\mathcal{X} \times \mathcal{P} \to [0,1]$, let $h_q(\bm{x}, \bm{p})$ be the median smoothing of $g$ …
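The statement of Theorem 3.1 is truncated here, so it is not reproduced. As background, the snippet below sketches the standard Gaussian median-smoothing bound that certificates of this kind typically build on, extended to per-modality noise by whitening. Assuming image and point-cloud perturbations with $\ell_2$ norms `dx` and `dp` and hypothetical noise scales `sigma_x` and `sigma_p`, the whitened radius is $r = \sqrt{(d_x/\sigma_x)^2 + (d_p/\sigma_p)^2}$, and the smoothed value $h_q$ at the perturbed input is at least $h_{\underline{q}}$ at the clean input, where $\underline{q} = \Phi(\Phi^{-1}(q) - r)$. The function names are illustrative only.

```python
from math import hypot
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian; provides cdf and inv_cdf


def whitened_radius(dx, dp, sigma_x, sigma_p):
    """l2 size of an (image, point-cloud) perturbation after dividing by each noise scale."""
    return hypot(dx / sigma_x, dp / sigma_p)


def lower_quantile(q, radius):
    """Quantile level q_low such that h_q(perturbed input) >= h_{q_low}(clean input)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(q) - radius)
```

With `q = 0.5` and `radius = 0` the level stays at 0.5; as the radius grows, the level falls toward 0, so the certified confidence becomes a lower quantile of the noisy scores at the clean input.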

Figures (9)

  • Figure 1: Overview of COMMIT, the first framework that provides certified robustness for multi-sensor fusion systems against semantic transformations.
  • Figure 2: Visualization of the rotation and shifting transformations. The x-, y-, and z-axes point left, down, and forward, respectively, and the origin is at the center of the bottom face of the ego vehicle's bounding box.
  • Figure 3: Certified and empirical robustness on detection rate and IoU against the rotation transformation (smoothing $\sigma=0.25$) under different thresholds. Solid lines represent the certified bounds, and dashed lines show the empirical performance under PGD attacks. The $x$-axis is the threshold on the confidence score ($\text{TH}_\text{conf}$) or IoU score ($\text{TH}_\text{IoU}$), and the $y$-axis is the fraction of detections whose confidence (or IoU) score exceeds that threshold.
  • Figure 4: Distribution of image $\ell_2$ distances under rotation and shift transformations. Images are from spawn points 15, 30, 43, 46, 57, and 86 of the dataset with buildings and without pedestrians. The scatter plots show the $\ell_2$ distance between randomly chosen pairs within randomly chosen large intervals; the black line is the $\ell_2$ distance between the endpoints of each large interval.
  • Figure 5: Illustration of $C(\underline x,\underline z,\underline r, \bar{x},\bar{z},\bar{r}, w,l)$ on the $x$-$z$ plane.
  • ...and 4 more figures
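Figure 5 concerns regions on the $x$-$z$ (bird's-eye) plane. One simple, possibly loose way to turn per-coordinate certified intervals into an IoU lower bound is sketched below for axis-aligned boxes only; it ignores the rotation $r$ that Figure 5 accounts for and is not claimed to be the paper's procedure. The box format `(x_min, z_min, x_max, z_max)` and the interval inputs are assumptions. Because the innermost feasible box lies inside every feasible prediction and the outermost feasible box contains every one, the intersection with the ground truth is at least that of the inner box and the union is at most that of the outer box.

```python
def box_area(b):
    """Area of an axis-aligned box (x_min, z_min, x_max, z_max); empty boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection_area(a, b):
    """Area of overlap between two axis-aligned boxes."""
    return box_area((max(a[0], b[0]), max(a[1], b[1]),
                     min(a[2], b[2]), min(a[3], b[3])))


def iou_lower_bound(lower, upper, gt):
    """Lower-bound the IoU with `gt` of any box whose coordinates lie in [lower, upper].

    lower, upper: per-coordinate bounds (x_min, z_min, x_max, z_max), e.g. from
                  hypothetical certified intervals on each predicted coordinate
    """
    # Innermost box: contained in every feasible prediction.
    inner = (upper[0], upper[1], lower[2], lower[3])
    # Outermost box: contains every feasible prediction.
    outer = (lower[0], lower[1], upper[2], upper[3])
    min_inter = intersection_area(inner, gt)
    max_union = box_area(outer) + box_area(gt) - intersection_area(outer, gt)
    return min_inter / max_union if max_union > 0 else 0.0
```

For example, if every coordinate of a 2 x 2 ground-truth box is certified to within 0.1, this bound gives 3.24 / 4.84, roughly 0.67.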

Theorems & Definitions (15)

  • Theorem 3.1
  • Remark 3.2
  • Lemma 3.3
  • Remark 3.4
  • Theorem 3.5
  • Theorem 3.6
  • Proof sketch
  • Remark A.2
  • Restated theorem (thm:detection)
  • Proof
  • ...and 5 more