Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design-A2A, AP2, ERC-8004, and Beyond

Botao 'Amber' Hu, Helena Rong

TL;DR

The paper addresses the challenge of trust in an agentic web composed of billions of autonomous agents. It adopts a unified framework of six trust models (Brief, Claim, Proof, Stake, Reputation, and Constraint) and analyzes how open protocols such as Google A2A, AP2, and Ethereum ERC-8004 instantiate them, highlighting LLM-specific fragilities such as prompt injection, sycophancy, hallucination, deception, and misalignment. It contributes a structured taxonomy, a cross-protocol evaluation, and a forward-looking design agenda that advocates tiered trust (levels $T_0$–$T_3$), identity/brief-based discovery, hybrid-by-default configurations, and continuous auditability to mitigate social and technical risks. The practical impact is a set of actionable guidelines for safer, interoperable, and scalable inter-agent economies, supporting the orchestration of autonomous agents at scale across diverse domains.

Abstract

As the "agentic web" takes shape, with billions of AI agents (often LLM-powered) autonomously transacting and collaborating, trust shifts from human oversight to protocol design. In 2025, several inter-agent protocols crystallized this shift, including Google's Agent-to-Agent (A2A), Agent Payments Protocol (AP2), and Ethereum's ERC-8004 "Trustless Agents," yet their underlying trust assumptions remain under-examined. This paper presents a comparative study of trust models in inter-agent protocol design: Brief (self- or third-party verifiable claims), Claim (self-proclaimed capabilities and identity, e.g., AgentCard), Proof (cryptographic verification, including zero-knowledge proofs and trusted execution environment attestations), Stake (bonded collateral with slashing and insurance), Reputation (crowd feedback and graph-based trust signals), and Constraint (sandboxing and capability bounding). For each, we analyze assumptions, attack surfaces, and design trade-offs, with particular emphasis on LLM-specific fragilities (prompt injection, sycophancy/nudge-susceptibility, hallucination, deception, and misalignment) that render purely reputational or claim-only approaches brittle. Our findings indicate that no single mechanism suffices. We argue for trustless-by-default architectures anchored in Proof and Stake to gate high-impact actions, augmented by Brief for identity and discovery and Reputation overlays for flexibility and social signals. We comparatively evaluate A2A, AP2, ERC-8004, and related historical variations in academic research under metrics spanning security, privacy, latency/cost, and social robustness (Sybil/collusion/whitewashing resistance). We conclude with hybrid trust model recommendations that mitigate reputation gaming and misinformed LLM behavior, and we distill actionable design guidelines for safer, interoperable, and scalable agent economies.

Paper Structure

This paper contains 37 sections, 1 table.