Agent2Agent Threats in Safety-Critical LLM Assistants: A Human-Centric Taxonomy

Lukas Stappen, Ahmet Erkan Turan, Johann Hagerer, Georg Groh

TL;DR

The paper addresses security risks posed by LLM-based in-vehicle agents that communicate with other agents via the Agent-to-Agent (A2A) protocol, where malicious prompts can propagate and compromise driver safety. It introduces AgentHeLLM, a threat-modeling framework that separates asset identification (human-centric, harm-driven) from attack-path analysis (graph-based, poison vs. trigger paths), supported by an open-source Attack Path Generator. A formal, two-dimensional taxonomy combines victim-centered asset categories inspired by the Universal Declaration of Human Rights (UDHR) with a graph of Actors and Datasources to enumerate and analyze multi-stage threats. The approach aims to provide rigorous threat anticipation for safety-critical systems, aligned with standards like ISO/SAE 21434 and UNECE R155, and is extensible to other safety-critical deployments of agentic AI.

Abstract

The integration of Large Language Model (LLM)-based conversational agents into vehicles creates novel security challenges at the intersection of agentic AI, automotive safety, and inter-agent communication. As these intelligent assistants coordinate with external services via protocols such as Google's Agent-to-Agent (A2A), they establish attack surfaces where manipulations can propagate through natural language payloads, potentially causing severe consequences ranging from driver distraction to unauthorized vehicle control. Existing AI security frameworks, while foundational, lack the rigorous "separation of concerns" standard in safety-critical systems engineering by co-mingling the concepts of what is being protected (assets) with how it is attacked (attack paths). This paper addresses this methodological gap by proposing a threat modeling framework called AgentHeLLM (Agent Hazard Exploration for LLM Assistants) that formally separates asset identification from attack path analysis. We introduce a human-centric asset taxonomy derived from harm-oriented "victim modeling" and inspired by the Universal Declaration of Human Rights, and a formal graph-based model that distinguishes poison paths (malicious data propagation) from trigger paths (activation actions). We demonstrate the framework's practical applicability through an open-source attack path suggestion tool, the AgentHeLLM Attack Path Generator, that automates multi-stage threat discovery using a bi-level search strategy.

Paper Structure (29 sections, 1 equation, 7 figures, 1 table)

Figures (7)

  • Figure 1: Framework overview: Separation of asset taxonomy (WHAT) from attack path formalization (HOW).
  • Figure 2: Graph primitives for modeling agentic systems. Actors (entities with agency) interact via communicate/respond edges, while Datasources (passive stores) are accessed via read/write edges.
  • Figure 3: Memory poisoning: attacker manipulates agent to store malicious text, then triggers consumption.
  • Figure 4: Privilege escalation: manipulated WhatsApp message causes driver to issue malicious prompt.
  • Figure 6: Structure of an attack path as a sequence of attack steps. Each step may contain nested trigger chains for edge activation or consumption, creating a recursive "attack within attack" structure.
  • ...and 2 more figures
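
The graph primitives named in the Figure 2 caption (Actors linked by communicate/respond edges, Datasources accessed via read/write edges) can be illustrated with a small sketch. This is not the paper's Attack Path Generator and does not reproduce its bi-level search: it is a plain breadth-first search over a hypothetical graph loosely modeled on the Figure 4 scenario. The node names, the `ALLOWED` table, the choice to orient every edge in the direction data flows, and the functions `validate` and `shortest_flow_path` are all assumptions made for illustration. Trigger paths are not modeled here.

```python
from collections import deque

# Node types taken from the Figure 2 caption.
ACTOR, DATASOURCE = "actor", "datasource"

# Hypothetical example system (assumption, not from the paper).
NODES = {
    "attacker": ACTOR,
    "driver": ACTOR,
    "assistant": ACTOR,
    "whatsapp_inbox": DATASOURCE,
    "memory": DATASOURCE,
}

# Edges as (source, target, kind), oriented in the direction data flows.
# Assumption: a "read" edge therefore points from the Datasource to the reader.
EDGES = [
    ("attacker", "whatsapp_inbox", "write"),
    ("whatsapp_inbox", "driver", "read"),
    ("driver", "assistant", "communicate"),
    ("assistant", "driver", "respond"),
    ("assistant", "memory", "write"),
    ("memory", "assistant", "read"),
]

# (source type, target type) that each edge kind may connect.
ALLOWED = {
    "communicate": (ACTOR, ACTOR),
    "respond": (ACTOR, ACTOR),
    "write": (ACTOR, DATASOURCE),
    "read": (DATASOURCE, ACTOR),
}


def validate(nodes, edges):
    """Raise ValueError if an edge connects node types its kind does not allow."""
    for src, dst, kind in edges:
        if (nodes[src], nodes[dst]) != ALLOWED[kind]:
            raise ValueError(f"invalid {kind} edge: {src} -> {dst}")


def shortest_flow_path(edges, start, goal):
    """Return the shortest data-flow path as a list of edges, or None."""
    adjacency = {}
    for src, dst, kind in edges:
        adjacency.setdefault(src, []).append((kind, dst))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for kind, nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, kind, nxt)]))
    return None


validate(NODES, EDGES)
path = shortest_flow_path(EDGES, "attacker", "assistant")
for src, kind, dst in path:
    print(f"{src} --{kind}--> {dst}")
```

In this toy graph the attacker has no direct edge to the assistant, so the only route for malicious text runs through the message inbox and the driver, which mirrors the indirect manipulation described in the Figure 4 caption.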