Comparative Analysis of Patch Attack on VLM-Based Autonomous Driving Architectures

David Fernandez, Pedram MohajerAnsari, Amir Salarpour, Long Cheng, Abolfazl Razi, Mert D. Pesé

TL;DR

This paper presents a systematic framework for comparative adversarial evaluation across three VLM architectures: Dolphins, OmniDrive (Omni-L), and LeapVAD, revealing severe vulnerabilities across all architectures, sustained multi-frame failures, and critical object detection degradation.

Abstract

Vision-language models are emerging for autonomous driving, yet their robustness to physical adversarial attacks remains unexplored. This paper presents a systematic framework for comparative adversarial evaluation across three VLM architectures: Dolphins, OmniDrive (Omni-L), and LeapVAD. Using black-box optimization with semantic homogenization for fair comparison, we evaluate physically realizable patch attacks in CARLA simulation. Results reveal severe vulnerabilities across all architectures, sustained multi-frame failures, and critical object detection degradation. Our analysis exposes distinct architectural vulnerability patterns, demonstrating that current VLM designs inadequately address adversarial threats in safety-critical autonomous driving applications.

Paper Structure

This paper contains 17 sections, 7 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of Attack Framework: (1) Black-box NES optimization generates adversarial patches; (2) CARLA scenarios capture frame video sequences; (3) VLM architectures process scenes: Dolphins, OmniDrive (Omni-L), and LeapVAD; (4) Semantic homogenization layer projects all VLM outputs into a unified embedding space; (5) Multi-dimensional evaluation.
  • Figure 2: Scenario 1: Bus Shelter Crosswalk Attack temporal sequence. The attack scenario demonstrates how the adversarial patch suppresses pedestrian detection as the ego vehicle approaches the bus ad shelter.
  • Figure 3: Distance-dependent attack success rates across VLM architectures for both scenarios. The shaded red region indicates the critical decision-making range. Omni-L (purple) exhibits the highest vulnerability across distances, while LeapVAD (orange) shows superior robustness, particularly at close ranges where its explicit critical object attention provides maximal benefit.
  • Figure 4: Temporal attack persistence across VLM architectures. Each row shows representative trials for the crosswalk and highway scenarios. The models show similar temporal consistency patterns, with LeapVAD showing slightly longer attack persistence (7.8$\pm$1.4 frames, highway) than OmniDrive (6.9$\pm$1.3 frames).
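
Step (1) of Figure 1 names black-box NES optimization as the patch generator. The page gives no further detail, so the following is only a minimal sketch of a generic NES (natural evolution strategies) gradient estimator, not the authors' implementation. It assumes the patch is a flat list of values in [0, 1] and that the attacker can only query a scalar loss. The names `nes_gradient`, `optimize_patch`, `toy_loss`, and `TARGET`, and all hyperparameters, are hypothetical. The quadratic `toy_loss` stands in for whatever score a real attack would obtain by querying the model.

```python
import random


def nes_gradient(loss_fn, patch, sigma, n_pairs, rng):
    """Estimate the gradient of loss_fn at patch using only function queries.

    Uses antithetic Gaussian sampling: each noise vector is evaluated at
    patch + sigma * noise and patch - sigma * noise.
    """
    grad = [0.0] * len(patch)
    for _ in range(n_pairs):
        noise = [rng.gauss(0.0, 1.0) for _ in patch]
        plus = loss_fn([p + sigma * e for p, e in zip(patch, noise)])
        minus = loss_fn([p - sigma * e for p, e in zip(patch, noise)])
        # Finite-difference weight for this noise direction.
        scale = (plus - minus) / (2.0 * sigma * n_pairs)
        for i, e in enumerate(noise):
            grad[i] += scale * e
    return grad


def optimize_patch(loss_fn, patch, steps=200, lr=0.05, sigma=0.1,
                   n_pairs=20, seed=0):
    """Minimize loss_fn by descending the NES estimate; values stay in [0, 1]."""
    rng = random.Random(seed)
    patch = list(patch)
    for _ in range(steps):
        grad = nes_gradient(loss_fn, patch, sigma, n_pairs, rng)
        # Descent step followed by clipping to the valid pixel range.
        patch = [min(1.0, max(0.0, p - lr * g)) for p, g in zip(patch, grad)]
    return patch


# Hypothetical stand-in for a black-box attack objective (lower is better).
TARGET = [0.2, 0.8, 0.5, 0.1]


def toy_loss(patch):
    return sum((p - t) ** 2 for p, t in zip(patch, TARGET))


start = [0.5, 0.5, 0.5, 0.5]
result = optimize_patch(toy_loss, start)
```

The estimator never touches model internals. It needs only `2 * n_pairs` loss queries per step, which is what makes this family of methods usable in a black-box setting. A real attack would replace `toy_loss` with a score computed from the target model's response to the rendered patch.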