Table of Contents
Fetching ...

FuncPoison: Poisoning Function Library to Hijack Multi-agent Autonomous Driving Systems

Yuzhen Long, Songze Li

TL;DR

FuncPoison targets the trusted function call interface in LLM-based multi-agent autonomous driving systems by embedding adversarial prompts into function descriptions. The attack hijacks function selection and, once invoked, propagates manipulated outputs through memory, reasoning, and planning components, causing cross-agent misbehavior. Evaluations on AgentDriver and AgentThink with nuScenes show ASR above 86% and significant trajectory deviation and collisions, outperforming baselines and persisting under several defenses. The work highlights a critical, under-addressed supply-chain vulnerability in function libraries and motivates new defenses that verify function-call provenance and integrity.

Abstract

Autonomous driving systems increasingly rely on multi-agent architectures powered by large language models (LLMs), where specialized agents collaborate to perceive, reason, and plan. A key component of these systems is the shared function library, a collection of software tools that agents use to process sensor data and navigate complex driving environments. Despite its critical role in agent decision-making, the function library remains an under-explored vulnerability. In this paper, we introduce FuncPoison, a novel poisoning-based attack targeting the function library to manipulate the behavior of LLM-driven multi-agent autonomous systems. FuncPoison exploits two key weaknesses in how agents access the function library: (1) agents rely on text-based instructions to select tools; and (2) these tools are activated using standardized command formats that attackers can replicate. By injecting malicious tools with deceptive instructions, FuncPoison manipulates one agent s decisions--such as misinterpreting road conditions--triggering cascading errors that mislead other agents in the system. We experimentally evaluate FuncPoison on two representative multi-agent autonomous driving systems, demonstrating its ability to significantly degrade trajectory accuracy, flexibly target specific agents to induce coordinated misbehavior, and evade diverse defense mechanisms. Our results reveal that the function library, often considered a simple toolset, can serve as a critical attack surface in LLM-based autonomous driving systems, raising elevated concerns on their reliability.

FuncPoison: Poisoning Function Library to Hijack Multi-agent Autonomous Driving Systems

TL;DR

FuncPoison targets the trusted function call interface in LLM-based multi-agent autonomous driving systems by embedding adversarial prompts into function descriptions. The attack hijacks function selection and, once invoked, propagates manipulated outputs through memory, reasoning, and planning components, causing cross-agent misbehavior. Evaluations on AgentDriver and AgentThink with nuScenes show ASR above 86% and significant trajectory deviation and collisions, outperforming baselines and persisting under several defenses. The work highlights a critical, under-addressed supply-chain vulnerability in function libraries and motivates new defenses that verify function-call provenance and integrity.

Abstract

Autonomous driving systems increasingly rely on multi-agent architectures powered by large language models (LLMs), where specialized agents collaborate to perceive, reason, and plan. A key component of these systems is the shared function library, a collection of software tools that agents use to process sensor data and navigate complex driving environments. Despite its critical role in agent decision-making, the function library remains an under-explored vulnerability. In this paper, we introduce FuncPoison, a novel poisoning-based attack targeting the function library to manipulate the behavior of LLM-driven multi-agent autonomous systems. FuncPoison exploits two key weaknesses in how agents access the function library: (1) agents rely on text-based instructions to select tools; and (2) these tools are activated using standardized command formats that attackers can replicate. By injecting malicious tools with deceptive instructions, FuncPoison manipulates one agent s decisions--such as misinterpreting road conditions--triggering cascading errors that mislead other agents in the system. We experimentally evaluate FuncPoison on two representative multi-agent autonomous driving systems, demonstrating its ability to significantly degrade trajectory accuracy, flexibly target specific agents to induce coordinated misbehavior, and evade diverse defense mechanisms. Our results reveal that the function library, often considered a simple toolset, can serve as a critical attack surface in LLM-based autonomous driving systems, raising elevated concerns on their reliability.

Paper Structure

This paper contains 30 sections, 4 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Existing vs. Our New Attack Surfaces.
  • Figure 2: Function Call pipeline: (1) Function Choose, where the agent selects a tool from the Function Library based on prompt intent; (2) Function Output, where the agent executes the selected function in a structured template and receives results.
  • Figure 3: Architecture of two representative Autonomous Driving Multi-Agent Systems: AgentDriver is composed of four specialized agents—Perception, Memory, Reasoning, and Planning—that collaborate through a sequential information pipeline. In contrast, AgentThink adopts an alternating chain of Thinking and Function agents to accomplish decision-making in a chain-of-thought (CoT) manner.
  • Figure 4: Overview of FuncPoison: Our attack injects forged function calls into the prompt and manipulates specific functions in the function library, enabling control over downstream agents.
  • Figure 5: Attack performance on AgentDriver (L2 threshold=3).
  • ...and 5 more figures