Adversarial Attacks on Event-Based Pedestrian Detectors: A Physical Approach

Guixu Lin, Muyao Niu, Qingtian Zhu, Zhengwei Yin, Zhuoxiao Li, Shengfeng He, Yinqiang Zheng

TL;DR

The paper studies the vulnerability of event-based pedestrian detectors to physical adversarial attacks. It introduces an end-to-end digital framework that optimizes a 2D adversarial texture map, applies it to a 3D human model, and converts the rendered motion video into events $\tilde{E}$ using differentiable rendering and a differentiable V2E step. The texture is optimized to minimize the detector's combined confidence, $\min (f_{\text{obj}}(\tilde{E}) + f_{\text{cls}}(\tilde{E}))$, through the adversarial loss $L_{\text{adv}}=\lambda_1 L_{\text{obj}}+\lambda_2 L_{\text{cls}}$ with $\lambda_1=\lambda_2=10{,}000$. Key contributions include (i) the first study of physical adversarial attacks on event-based pedestrian detectors, (ii) an end-to-end pipeline that turns clothing textures on a 3D human model into adversarial events, and (iii) validation of digital and physical attacks showing degraded detection performance, with analyses of body-part coverage and environmental conditions. The findings show that event-based pedestrian detectors are vulnerable to white-box physical adversarial perturbations, which motivates defenses and sensor-fusion strategies for safe real-world deployment.
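To make the weighted loss $L_{\text{adv}}=\lambda_1 L_{\text{obj}}+\lambda_2 L_{\text{cls}}$ concrete, here is a minimal sketch in plain Python. The summary does not say how $L_{\text{obj}}$ and $L_{\text{cls}}$ are reduced over candidate detections, so taking the mean confidence is an assumption, as is the function name `adversarial_loss`. A real attack would compute this in an autodiff framework so gradients can flow back to the texture.

```python
from typing import Sequence

# Weights quoted in the summary: lambda_1 = lambda_2 = 10,000.
LAMBDA_OBJ = 10_000.0
LAMBDA_CLS = 10_000.0


def adversarial_loss(
    obj_conf: Sequence[float],
    cls_conf: Sequence[float],
    lambda_obj: float = LAMBDA_OBJ,
    lambda_cls: float = LAMBDA_CLS,
) -> float:
    """Weighted sum of objectness and class confidence (hypothetical helper).

    obj_conf / cls_conf hold the detector's per-candidate confidences for the
    adversarial events. ASSUMPTION: each term is the mean over candidates;
    the reduction actually used in the paper is not stated in this summary.
    """
    if len(obj_conf) == 0 or len(obj_conf) != len(cls_conf):
        raise ValueError("need two non-empty confidence lists of equal length")
    l_obj = sum(obj_conf) / len(obj_conf)  # L_obj under the mean assumption
    l_cls = sum(cls_conf) / len(cls_conf)  # L_cls under the mean assumption
    return lambda_obj * l_obj + lambda_cls * l_cls


if __name__ == "__main__":
    # Lower confidences give a lower loss, which is what the attack minimizes.
    print(adversarial_loss([0.9, 0.8], [0.95, 0.85]))
    print(adversarial_loss([0.1, 0.05], [0.2, 0.1]))
```

Minimizing this value pushes both confidence terms toward zero, which matches the stated goal of making the detector miss the person.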

Abstract

Event cameras, known for their low latency and high dynamic range, show great potential in pedestrian detection applications. However, while recent research has primarily focused on improving detection accuracy, the robustness of event-based visual models against physical adversarial attacks has received limited attention. For example, adversarial physical objects, such as specific clothing patterns or accessories, can exploit inherent vulnerabilities in these systems, leading to misdetections or misclassifications. This study is the first to explore physical adversarial attacks on event-driven pedestrian detectors, specifically investigating whether certain clothing patterns worn by pedestrians can cause these detectors to fail, effectively rendering them unable to detect the person. To address this, we developed an end-to-end adversarial framework in the digital domain, framing the design of adversarial clothing textures as a 2D texture optimization problem. By crafting an effective adversarial loss function, the framework iteratively generates optimal textures through backpropagation. Our results demonstrate that the textures identified in the digital domain possess strong adversarial properties. Furthermore, we translated these digitally optimized textures into physical clothing and tested them in real-world scenarios, successfully demonstrating that the designed textures significantly degrade the performance of event-based pedestrian detection models. This work highlights the vulnerability of such models to physical adversarial attacks.

Paper Structure

This paper contains 13 sections, 18 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Demonstration of a physical adversarial attack: A person wearing adversarial clothing evades detection by an event-based pedestrian detector during movement, while a pedestrian in normal clothing is accurately detected. Bounding boxes indicate successful pedestrian detection.
  • Figure 2: Demonstration of our method. (a) Adversarial Texture Map Generation: The input $z$ is fed into a texture map generation network, producing a grayscale texture map $\hat{U}$. After applying a masking operation, the adversarial texture map $\tilde{U}$ is obtained. (b) 3D Human Rendering: The adversarial texture map $\tilde{U}$ is combined with 3D human model shape and pose parameters, and a differentiable renderer is used to generate 2D videos of continuous human motion. (c) Adversarial Event Attack: From these generated 2D videos, events $\tilde{E}$ are created using the differentiable V2E method and are used to attack event-based pedestrian detectors $f$, where the neural network parameters of $f$ remain frozen. By applying the adversarial loss $L_{adv}$, the entire end-to-end pipeline is updated through backpropagation, ultimately resulting in the optimal adversarial texture map $\tilde{U}$.
  • Figure 3: Illustration of the texture map generation network. (a) represents the network structure of the texture map generation network. (b) shows a demo of a texture map $\hat{U}$. The output $\hat{U}$ consists of an $n \times n$ grid of white or black blocks, where each block is $c\times c$ pixels in size.
  • Figure 4: The masks used for all textures and the basic texture maps for comparison.
  • Figure 5: Visualization of the optimal texture map (grid size 10$\times$10 pixels) and the corresponding rendered human.
  • ...and 6 more figures
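The block layout in the Figure 3 caption (an $n \times n$ grid of black or white blocks, each $c \times c$ pixels) and the masking step in the Figure 2 caption can be sketched in plain Python. This shows only the forward layout, not the generator network or the differentiable pipeline. The helper names `expand_blocks` and `apply_mask`, the mask convention (a truthy value keeps the pattern), and the neutral fill value are assumptions made for illustration.

```python
from typing import List, Sequence

Image = List[List[float]]


def expand_blocks(grid: Sequence[Sequence[float]], c: int) -> Image:
    """Expand an n x n grid of block values into an (n*c) x (n*c) pixel map.

    Each grid entry (0.0 = black, 1.0 = white) becomes one c x c block.
    """
    if c <= 0:
        raise ValueError("block size c must be positive")
    image: Image = []
    for row in grid:
        # Repeat every value c times horizontally ...
        pixel_row = [float(v) for v in row for _ in range(c)]
        # ... and repeat the resulting pixel row c times vertically.
        image.extend(list(pixel_row) for _ in range(c))
    return image


def apply_mask(texture: Image, mask: Sequence[Sequence[int]], fill: float = 0.5) -> Image:
    """Keep texture pixels where mask is truthy, use `fill` elsewhere.

    ASSUMPTION: the convention and the fill value are illustrative; the
    captions only say that a masking operation is applied.
    """
    if len(texture) != len(mask) or any(len(t) != len(m) for t, m in zip(texture, mask)):
        raise ValueError("texture and mask must have the same shape")
    return [
        [t if m else fill for t, m in zip(t_row, m_row)]
        for t_row, m_row in zip(texture, mask)
    ]


if __name__ == "__main__":
    u_hat = expand_blocks([[1, 0], [0, 1]], c=2)  # 2 x 2 grid, 2 x 2 pixel blocks
    mask = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
    for row in apply_mask(u_hat, mask):
        print(row)
```

Working at block level rather than pixel level keeps the number of free texture parameters at $n^2$, independent of the texture map's resolution.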