Searching Realistic-Looking Adversarial Objects For Autonomous Driving Systems

Shengxiang Sun, Shenzhe Zhu

TL;DR

This work addresses the realism gap in physically realizable adversarial objects for autonomous driving by introducing a Judge that scores object realism and guides gradient-based texture optimization within a NeRF-based renderer. The Judge assigns a realism score $S_t$ at each timestep, and the average score $S$ scales the adversarial loss: $\min_{\theta} J(\theta) = \frac{1}{S+\varepsilon} \sum_{t=0}^T C(x_t)$ subject to the constraint $G(x_{t-1}, x_t, \theta) = 0$. The objective thereby balances attack effectiveness against perceptual realism. The paper analyzes four strategies for building the Judge (off-the-shelf vision-language models, fine-tuned open-source vision-language models, neurosymbolic systems, and traditional image processing) and finds the fine-tuning and neurosymbolic approaches most promising, while off-the-shelf models and traditional image processing yield less reliable outcomes. The approach aims to make adversarial testing for autonomous driving more realistic by producing adversarial objects that are both effective and visually plausible, with implications for robustness assessment and safety.
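To make the role of the realism score concrete, the objective above can be evaluated numerically as in the following minimal sketch. It assumes the per-timestep costs $C(x_t)$ and Judge scores $S_t$ are already available as plain numbers and that the costs are nonnegative. The function name `realism_weighted_loss` and its arguments are hypothetical, and the sketch omits both the dynamics constraint $G(x_{t-1}, x_t, \theta) = 0$ and the gradient computation that the actual method relies on.

```python
from typing import Sequence


def realism_weighted_loss(
    costs: Sequence[float],
    realism_scores: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """Evaluate J = (1 / (S + eps)) * sum_t C(x_t).

    costs          -- per-timestep costs C(x_t) (assumed precomputed)
    realism_scores -- per-timestep Judge scores S_t in [0, 1] (assumed)
    eps            -- small constant that keeps the division finite when S = 0
    """
    if not realism_scores:
        raise ValueError("at least one realism score is required")
    # S is the average of the per-timestep realism scores S_t.
    s_mean = sum(realism_scores) / len(realism_scores)
    # The summed cost is scaled by the inverse of the average realism score.
    return sum(costs) / (s_mean + eps)


if __name__ == "__main__":
    costs = [1.0, 2.0, 3.0]
    print(realism_weighted_loss(costs, [0.9, 0.9, 0.9]))  # realistic texture
    print(realism_weighted_loss(costs, [0.1, 0.1, 0.1]))  # unrealistic texture
```

With nonnegative costs, the same trajectory cost yields a larger loss when the Judge rates the texture as less realistic, which is how the objective discourages implausible textures.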

Abstract

Numerous studies on adversarial attacks targeting self-driving policies fail to incorporate realistic-looking adversarial objects, limiting real-world applicability. Building upon prior research that facilitated the transition of adversarial objects from simulations to practical applications, this paper discusses a modified gradient-based texture optimization method to discover realistic-looking adversarial objects. While retaining the core architecture and techniques of the prior research, the proposed addition involves an entity termed the 'Judge'. This agent assesses the texture of a rendered object, assigning a probability score reflecting its realism. This score is integrated into the loss function to encourage the NeRF object renderer to concurrently learn realistic and adversarial textures. The paper analyzes four strategies for developing a robust 'Judge': 1) Leveraging cutting-edge vision-language models. 2) Fine-tuning open-sourced vision-language models. 3) Pretraining neurosymbolic systems. 4) Utilizing traditional image processing techniques. Our findings indicate that strategies 1) and 4) yield less reliable outcomes, pointing towards strategies 2) or 3) as more promising directions for future research.

Paper Structure

This paper contains 15 sections, 3 equations, 2 figures, and 1 table.

Figures (2)

  • Figure 1: A modified algorithm flowchart, illustrating the integration of the Judge within the process. The primary alterations to the algorithm involve the loss function and the inclusion of the Judge, tasked with evaluating the realism of rendered objects.
  • Figure 2: The custom dataset generation process. Step ① selects image frames of traffic objects from datasets such as ApolloScape [apolloscape2024] or nuScenes [nuscenes]. Step ② edits the textures in these frames using InstructPix2Pix [brooks2023instructpix2pix] or MagicBrush [zhang2023magicbrush]. Step ③ assigns a realism probability score to each edited frame through manual labeling. Finally, the prompt given in the appendix, the image from ②, and the text from ③ are combined into triplets to create a custom dataset for fine-tuning open-source models.
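
The triplet assembly described in Figure 2 could be sketched as follows. This is an illustrative sketch only: the `JudgeExample` type, its field names, the two-decimal text format for the label, and the file paths are hypothetical, and the paper's actual prompt (given in its appendix) is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class JudgeExample:
    """One fine-tuning triplet: prompt, edited image, and label text."""
    prompt: str       # instruction shown to the vision-language model
    image_path: str   # edited frame produced in step 2
    label_text: str   # manually assigned realism score from step 3, as text


def build_triplets(
    prompt: str, labeled_frames: List[Tuple[str, float]]
) -> List[JudgeExample]:
    """Combine a shared prompt with (image_path, realism_score) pairs."""
    examples = []
    for image_path, score in labeled_frames:
        # Realism labels are probabilities, so reject out-of-range values.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {image_path} must lie in [0, 1]")
        examples.append(JudgeExample(prompt, image_path, f"{score:.2f}"))
    return examples


if __name__ == "__main__":
    # Placeholder prompt and file names, used only for illustration.
    frames = [("frame_0001_edited.png", 0.9), ("frame_0002_edited.png", 0.25)]
    for example in build_triplets("<realism-rating prompt>", frames):
        print(example)
```

Storing the label as text mirrors the figure's description of pairing an image with a text answer, which suits fine-tuning a model that generates its score as text.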