Pixel-wise Smoothing for Certified Robustness against Camera Motion Perturbations

Hanjiang Hu, Zuxin Liu, Linyi Li, Jiacheng Zhu, Ding Zhao

TL;DR

A novel, efficient, and practical framework for certifying the robustness of 3D-2D projective transformations against camera motion perturbations that leverages a smoothing distribution over the 2D pixel space instead of in the 3D physical space, eliminating the need for costly camera motion sampling and significantly enhancing the efficiency of robustness certifications.

Abstract

Deep learning-based visual perception models lack robustness when faced with camera motion perturbations in practice. The current certification process for assessing robustness is costly and time-consuming due to the extensive number of image projections required for Monte Carlo sampling in the 3D camera motion space. To address these challenges, we present a novel, efficient, and practical framework for certifying the robustness of 3D-2D projective transformations against camera motion perturbations. Our approach leverages a smoothing distribution over the 2D pixel space instead of in the 3D physical space, eliminating the need for costly camera motion sampling and significantly enhancing the efficiency of robustness certifications. With the pixel-wise smoothed classifier, we are able to fully upper bound the projection errors using a technique of uniform partitioning in camera motion space. Additionally, we extend our certification framework to a more general scenario where only a single-frame point cloud is required in the projection oracle. Through extensive experimentation, we validate the trade-off between effectiveness and efficiency enabled by our proposed method. Remarkably, our approach achieves approximately 80% certified accuracy while utilizing only 30% of the projected image frames. The code is available at https://github.com/HanjiangHu/pixel-wise-smoothing.
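The two ingredients named in the abstract, smoothing in pixel space and uniform partitioning of the camera motion interval, can be illustrated with a minimal sketch. It is not the authors' certification procedure (see the linked repository for that). It assumes Gaussian pixel noise with standard deviation `sigma`, and the names `toy_classifier`, `smoothed_predict`, and `uniform_partition` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def toy_classifier(image):
    """Hypothetical stand-in for a trained model: class 1 if the image is bright."""
    return int(image.mean() > 0.5)


def smoothed_predict(base_classifier, image, sigma, n_samples, num_classes, rng):
    """Majority vote of base_classifier over noisy copies of one projected image.

    Assumption: the noise is i.i.d. Gaussian per pixel. Sampling happens in
    2D pixel space, so no additional image projections are needed here.
    """
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n_samples):
        noisy = image + rng.normal(0.0, sigma, size=image.shape)
        counts[base_classifier(noisy)] += 1
    return int(np.argmax(counts)), counts


def uniform_partition(radius, n_parts):
    """Endpoints of n_parts equal sub-intervals covering [-radius, radius].

    Each endpoint is a camera motion value along one axis at which a
    projected frame would be rendered.
    """
    return np.linspace(-radius, radius, n_parts + 1)


# Usage: a bright 8x8 RGB image stands in for one projected frame.
rng = np.random.default_rng(0)
frame = np.full((8, 8, 3), 0.9)
label, votes = smoothed_predict(toy_classifier, frame, 0.25, 50, 2, rng)
motion_grid = uniform_partition(0.1, 4)
```

In this sketch the number of projected frames grows with `n_parts`, not with `n_samples`, which is the efficiency argument the abstract makes for smoothing in pixel space.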

Paper Structure

This paper contains 32 sections, 15 theorems, 139 equations, 5 figures, and 13 tables.

Key Result

Lemma 4.4

Given the projection from entire 3D point $V\in\mathcal{V}: \mathbb{P}\times[0, 1]^K$ along one-axis translation or rotation and the consistent camera motion interval $\mathbb{U}_{P,r,s}$ for any $P\in\mathbb{P}$ projected on $(r,s)$, define the interval $\Delta^{r,s}$ as, then for any projection on $(r,s)$ under camera motion $u\in\bigcup_{P\in\mathbb{P}}\mathbb{U}_{P,r,s}$, we have $\forall ~

Figures (5)

  • Figure 1: Overview of robustness certification using pixel-wise smoothing (green), which avoids inefficient sampling in the camera motion space that requires too many projected frames (red).
  • Figure 2: Image projection and coordinate framework in camera motion space.
  • Figure 3: Uniform partitions of $\Delta$ in camera motion $\alpha$ to fully cover all the pixel values $O(V,\alpha)_{r,s}$.
  • Figure 4: Projection from the one-frame point $P'$ and from the unknown point $P$ in the entire point cloud, under $\delta$-convexity.
  • Figure 5: Certified accuracy of ResNet50 with smoothing variance $\sigma=0.25, 0.5, 0.75$ under different radii along $T_z, T_x, T_y, R_z, R_x, R_y$.

Theorems & Definitions (35)

  • Definition 3.1: 3D-2D position projection
  • Definition 3.2: 3D-2D $K$-channel pixel-wise projection
  • Definition 4.1: $\varepsilon$-smoothed classifier with 2D image projection
  • Remark 4.2
  • Definition 4.3: Consistent camera motion interval
  • Lemma 4.4: Upper bound of fully-covered motion interval
  • Theorem 4.5: Certification with fully-covered partitions
  • Lemma 4.6: Approximated upper bound of fully-covered interval
  • Theorem 4.7: Certification with approximated partitions
  • Definition 4.8: 3D projection from one-frame point cloud with $\delta$-convexity
  • ...and 25 more