Beyond Crash: Hijacking Your Autonomous Vehicle for Fun and Profit

Qi Sun, Ahmed Abdo, Luis Burbano, Ziyang Li, Yaxing Yao, Alvaro Cardenas, Yinzhi Cao

TL;DR

JackZebra reveals a long-horizon, route-level hijacking threat for vision-based autonomous vehicles by using a rear-mounted display to influence end-to-end driving decisions through a closed-loop interaction. The approach combines offline min–max patch optimization to build a robust patch bank with an online interactive adjustment loop that selects patches based on real-time deviations, sustaining influence across time. In CARLA/Bench2Drive simulations, JackZebra hijacked 34 of 39 routes (average completion around $122.2$ m) while maintaining largely plausible driving behavior and traffic-rule compliance under varied environmental conditions. These findings recast route integrity as a critical security property and motivate defenses such as temporal anomaly detection and multi-sensor corroboration to guard against stealthy, long-horizon manipulation of autonomous driving systems.

Abstract

Autonomous Vehicles (AVs), especially vision-based AVs, are rapidly being deployed without human operators. Because AVs operate in safety-critical environments, understanding their robustness in adversarial settings is an important research problem. Prior physical adversarial attacks on vision-based autonomous vehicles predominantly target immediate safety failures (e.g., a crash, a traffic-rule violation, or a transient lane departure) by inducing a short-lived perception or control error. This paper shows a qualitatively different risk: a long-horizon route integrity compromise, where an attacker gradually steers a victim AV away from its intended route and to an attacker-chosen destination while the victim continues to drive "normally." This poses a danger not only to the victim vehicle itself but also to any passengers inside it. In this paper, we design and implement the first adversarial framework, called JackZebra, that performs route-level hijacking of a vision-based end-to-end driving stack using a physically plausible attacker vehicle with a reconfigurable display mounted on the rear. The central challenge is temporal persistence: adversarial influence must remain effective under changing viewpoints, lighting, weather, traffic, and the victim's continual replanning, all without triggering conspicuous failures. Our key insight is to treat route hijacking as a closed-loop control problem and to convert adversarial patches into steering primitives that can be selected online via an interactive adjustment loop. Our adversarial patches are also carefully designed against worst-case background and sensor variations so that their adversarial impact on the victim remains robust. Our evaluation shows that JackZebra can successfully hijack victim vehicles, causing them to deviate from their original routes and stop at adversarial destinations, with a high success rate.
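
The phrase "designed against worst-case background and sensor variations" suggests a min-max objective. The following is only a generic sketch of how such an objective is commonly written, not the paper's own formulation; every symbol here is an assumption introduced for illustration. Let $f$ be the victim driving model, $A(x, \delta; t)$ the scene $x$ with patch $\delta$ rendered under a background or sensor variation $t \in \mathcal{T}$, $a_k$ the $k$-th target steering behavior, and $\mathcal{L}$ a loss measuring the distance between the model output and that target:

$$
\delta_k^{\star} = \arg\min_{\delta \in \Delta} \; \max_{t \in \mathcal{T}} \; \mathcal{L}\big(f(A(x, \delta; t)),\, a_k\big)
$$

Solving this once per target behavior $a_k$ would yield one patch per steering primitive, which together form the patch bank that the online stage selects from.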

Paper Structure (31 sections, 8 equations, 7 figures, 5 tables, 1 algorithm)

Figures (7)

  • Figure 1: An overall workflow of an autonomous driving agent: Sensor observations (e.g., images), navigational commands, and language prompts are first provided as inputs (Step 1). A backbone encoder (Step 2) converts these multimodal signals into a compact embedding representation (e.g., visual features and instruction tokens). The resulting embeddings are then consumed by a decision-making model (e.g., AV-based Vision Language Action model) that fuses perception with intent and contextual information to infer driving-relevant internal representations (Step 3). Its outputs are passed to a decoder and planning head (Step 4), which maps the fused representation into actionable predictions. Finally, the system produces two outputs (Step 5): a planned trajectory for vehicle control and a natural-language explanation that summarizes the rationale underlying the driving decision.
  • Figure 2: An illustration of JackZebra's motivation: A victim car is supposed to follow the blue route to a safe location, but is hijacked by an SUV and led to a different location. The left part shows the benign route (blue) and the adversarial route (red). The right part shows the scenario at a particular intersection: the victim is supposed to turn left, but is hijacked and induced to go straight, following the adversarial SUV.
  • Figure 3: System Architecture. JackZebra has two major stages: (i) an offline attack generation stage that optimizes a bank of patches for different hijacking purposes (e.g., turning angles), and (ii) an online attack stage that adjusts the victim car's behavior by displaying a chosen patch, selected based on three types of sensor input: front- and back-facing cameras and GPS locations. Both the adversarial and the victim vehicles are under the influence of JackZebra: the adversarial vehicle is instructed by JackZebra to follow traffic rules, and the victim vehicle is influenced by the chosen patch on the adversarial vehicle.
  • Figure 4: An illustration of the adversarial and victim vehicles and their positions relative to the adversarial route. JackZebra uses this information to choose a patch to display on the back of the adversarial vehicle (i.e., the red vehicle).
  • Figure 5: An illustration of two case studies, one success (a) and one failure (b). The red and blue dots show the victim vehicle's predicted trajectories: red dots are path waypoints depicting the victim's future positions, and blue dots are speed waypoints depicting the victim's future speed.
  • ...and 2 more figures
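
The online stage described in the captions for Figures 3 and 4 (choose a patch from the bank based on where the victim sits relative to the adversarial route) can be pictured as a simple feedback rule. The sketch below is a hypothetical toy illustration, not the paper's algorithm: it assumes the victim's pose has already been estimated from the camera and GPS inputs, that the bank is keyed by nominal turning angle in degrees (positive = left), and that a proportional rule with made-up gains `k_lat` and `k_head` is adequate. The names `Pose`, `signed_lateral_offset`, and `select_patch` are all invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    x: float        # metres, x to the right
    y: float        # metres, y up
    heading: float  # radians, counter-clockwise positive


def wrap_angle(a: float) -> float:
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def signed_lateral_offset(victim: Pose, route_point: tuple, route_heading: float) -> float:
    """Signed distance from the route line; positive means the victim is left of it."""
    dx = victim.x - route_point[0]
    dy = victim.y - route_point[1]
    # 2-D cross product of the route direction with the displacement vector.
    return math.cos(route_heading) * dy - math.sin(route_heading) * dx


def select_patch(patch_bank: dict, victim: Pose, route_point: tuple,
                 route_heading: float, k_lat: float = 0.1, k_head: float = 1.0):
    """Pick the bank entry whose nominal angle is closest to the desired correction.

    The gains k_lat and k_head are illustrative assumptions, not values from the paper.
    """
    offset = signed_lateral_offset(victim, route_point, route_heading)
    heading_err = wrap_angle(route_heading - victim.heading)
    # Steer toward the route heading and against the lateral offset.
    desired_deg = math.degrees(k_head * heading_err - k_lat * offset)
    angle = min(patch_bank, key=lambda a: abs(a - desired_deg))
    return angle, patch_bank[angle]


# Hypothetical bank: nominal turning angle (degrees) -> patch identifier.
bank = {-20: "right_20", -10: "right_10", 0: "straight", 10: "left_10", 20: "left_20"}

on_route = select_patch(bank, Pose(0.0, 0.0, 0.0), (0.0, 0.0), 0.0)
drifted_left = select_patch(bank, Pose(0.0, 2.0, 0.0), (0.0, 0.0), 0.0)
route_turns_left = select_patch(bank, Pose(0.0, 0.0, 0.0), (0.0, 0.0), math.radians(25))
```

Re-running `select_patch` at every control step with a fresh pose estimate is what makes the loop closed: each selection reacts to the deviation that the previous patch did or did not correct.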