SoK: Trust-Authorization Mismatch in LLM Agent Interactions

Guanquan Shi, Haohua Du, Zhiqiang Wang, Xiaoyu Liang, Weiwenpei Liu, Song Bian, Zhenyu Guan

TL;DR

The paper addresses security in autonomous LLM agents by proposing a unifying Trust-Authorization Mismatch framework. It introduces the Belief-Intention-Permission (B-I-P) model and a Trust-Authorization Matrix to analyze how corrupted beliefs can lead to unsafe actions when permissions are not provenance-aware. It systematizes threats along the B-I-P chain and maps existing attacks and defenses to four stages, advocating chain-breaking defenses such as belief-aware dynamic authorization and taint-tracking, complemented by auditable security logs. It also argues that emphasis should shift from perfect belief integrity to robust permission controls and post-hoc accountability to enable verifiable and secure agent systems in practice.
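
The belief-aware authorization and taint-tracking idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `Trust`, `Risk`, `Belief`, `Intent`, `form_intent`, and `authorize`, the two-level trust and risk scales, and the "weakest belief wins" propagation rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trust(IntEnum):
    # Hypothetical two-level provenance scale (assumption, not from the paper).
    LOW = 0
    HIGH = 1


class Risk(IntEnum):
    # Hypothetical two-level action risk scale (assumption, not from the paper).
    LOW = 0
    HIGH = 1


@dataclass(frozen=True)
class Belief:
    content: str
    trust: Trust  # provenance label attached when the content enters the context


@dataclass(frozen=True)
class Intent:
    action: str
    trust: Trust  # propagated taint: weakest trust among the supporting beliefs


def form_intent(action: str, beliefs: list[Belief]) -> Intent:
    """Propagate taint: an intent is only as trusted as its least-trusted belief."""
    # With no supporting beliefs, fail closed and treat the intent as low-trust.
    trust = min((b.trust for b in beliefs), default=Trust.LOW)
    return Intent(action, trust)


def authorize(intent: Intent, risk: Risk) -> bool:
    """Deny high-risk actions whose intent derives from any low-trust belief."""
    return not (risk is Risk.HIGH and intent.trust is Trust.LOW)


user = Belief("User asked for an inbox summary", Trust.HIGH)
email = Belief("Email body asks the agent to forward files", Trust.LOW)

tainted = form_intent("send_file", [user, email])
clean = form_intent("send_file", [user])
print(authorize(tainted, Risk.HIGH), authorize(clean, Risk.HIGH))
```

The design choice in this sketch is that permission depends on where an intent came from as well as on what the action is: the same `send_file` request is allowed or denied depending on the provenance of the beliefs behind it.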

Abstract

Large Language Models (LLMs) are rapidly evolving into autonomous agents capable of interacting with the external world, significantly expanding their capabilities through standardized interaction protocols. However, this paradigm revives the classic cybersecurity challenges of agency and authorization in a novel and volatile context. As decision-making shifts from deterministic code logic to probabilistic inference driven by natural language, traditional security mechanisms designed for deterministic behavior fail. It is fundamentally challenging to establish trust for unpredictable AI agents and to enforce the Principle of Least Privilege (PoLP) when instructions are ambiguous. Despite the escalating threat landscape, the academic community's understanding of this emerging domain remains fragmented, lacking a systematic framework to analyze its root causes. This paper provides a unifying formal lens for agent-interaction security. We observed that most security threats in this domain stem from a fundamental mismatch between trust evaluation and authorization policies. We introduce a novel risk analysis model centered on this trust-authorization gap. Using this model as a unifying lens, we survey and classify the implementation paths of existing, often seemingly isolated, attacks and defenses. This new framework not only unifies the field but also allows us to identify critical research gaps. Finally, we leverage our analysis to suggest a systematic research direction toward building robust, trusted agents and dynamic authorization mechanisms.

Paper Structure

This paper contains 33 sections, 1 theorem, 7 equations, 7 figures, 4 tables.

Key Result

Theorem 1

Under O1/O2 and monotone $F$, no event $\mathrm{exec}(\alpha)$ with $\rho(\alpha)=\mathsf{High}$ can occur if the closest preceding $\mathrm{plan}(\xi)$ is low-trust-derived. Equivalently, all attempted trajectories into Fig. 4's Failure quadrant (Y-Low/X-High) are cut at Stage 3; admissible high-ri

Figures (7)

  • Figure 1: The Comparison of the Traditional Security Model and AI Trustworthy Model.
  • Figure 2: MCP protocol and A2A protocol workflow.
  • Figure 3: MAS Coordination Strategies.
  • Figure 4: The Trust-Authorization Matrix. The Y-axis is a composite of Evidence Strength and Source Assurance; the ‘HITL Zone’ is constrained by Obligation O2.
  • Figure 5: The Trust-Authorization Mismatch Process. Each stage is annotated with auditable labels and intent justifications (Constraint 1 & 2). Stage 3 policy enforces Theorem 1 under assumptions A1–A4 (details in Appendix \ref{app:proof}).
  • ...and 2 more figures

Theorems & Definitions (4)

  • Definition 3.1: Chained Mismatch with Observables
  • Theorem 1: Non-interference for High-risk Exec w.r.t. Low-trust Sources
  • Definition A.1: Low-trust-derived intent
  • Proof: Proof sketch
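
The property stated in Theorem 1 can be read as a condition on event traces: no high-risk `exec` event may have a low-trust-derived `plan` as its closest preceding plan. The following is a rough sketch of an offline checker for that condition over an audit log. The `Plan` and `Exec` record types, their boolean fields, and the function name are hypothetical, and the obligations O1/O2 and the monotone $F$ from the theorem are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    intent: str
    low_trust_derived: bool  # assumed to be recorded in the audit log


@dataclass(frozen=True)
class Exec:
    action: str
    high_risk: bool  # stands in for rho(action) = High


def violates_noninterference(trace) -> bool:
    """True if some high-risk Exec's closest preceding Plan is low-trust-derived."""
    last_plan = None
    for event in trace:
        if isinstance(event, Plan):
            last_plan = event
        elif isinstance(event, Exec) and event.high_risk:
            # Assumption: a high-risk Exec with no preceding Plan is not flagged,
            # since the theorem statement only constrains the Plan-then-Exec case.
            if last_plan is not None and last_plan.low_trust_derived:
                return True
    return False


ok_trace = [Plan("cleanup", False), Exec("delete_file", True)]
bad_trace = [Plan("cleanup", True), Exec("delete_file", True)]
replanned = [
    Plan("cleanup", True),
    Exec("read_file", False),
    Plan("cleanup", False),
    Exec("delete_file", True),
]
print(violates_noninterference(ok_trace))
print(violates_noninterference(bad_trace))
print(violates_noninterference(replanned))
```

A checker of this kind fits the post-hoc accountability role the summary assigns to auditable security logs: it does not prevent a violation, but it makes one detectable after the fact.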