On Protecting Agentic Systems' Intellectual Property via Watermarking

Liwen Wang, Zongjie Li, Yuchong Xie, Shuai Wang, Dongdong She, Wei Wang, Juergen Rahmel

TL;DR

The paper tackles IP protection for agentic systems deployed in grey-box settings where only tool-usage trajectories are visible. It introduces AGENTWM, a distribution-level watermarking framework that biases the choice among semantically equivalent action segments within visible trajectories, preserving performance while embedding verifiable signals. The method includes five complementary watermark schemes, an automated Generator/Verifier pipeline that produces watermark passes, and a statistical verification procedure using Jensen-Shannon Divergence to detect theft and localize attackers. Across three domains, AGENTWM achieves near-perfect detection and attribution with minimal impact on utility and strong robustness to removal attempts, offering a practical defense for proprietary agentic capabilities in real-world deployments.

Abstract

The evolution of Large Language Models (LLMs) into agentic systems that perform autonomous reasoning and tool use has created significant intellectual property (IP) value. We demonstrate that these systems are highly vulnerable to imitation attacks, where adversaries steal proprietary capabilities by training imitation models on victim outputs. Crucially, existing LLM watermarking techniques fail in this domain because real-world agentic systems often operate as grey boxes, concealing the internal reasoning traces required for verification. This paper presents AGENTWM, the first watermarking framework designed specifically for agentic models. AGENTWM exploits the semantic equivalence of action sequences, injecting watermarks by subtly biasing the distribution of functionally identical tool execution paths. This mechanism allows AGENTWM to embed verifiable signals directly into the visible action trajectory while remaining indistinguishable to users. We develop an automated pipeline to generate robust watermark schemes and a rigorous statistical hypothesis testing procedure for verification. Extensive evaluations across three complex domains demonstrate that AGENTWM achieves high detection accuracy with negligible impact on agent performance. Our results confirm that AGENTWM effectively protects agentic IP against adaptive adversaries, who cannot remove the watermarks without severely degrading the stolen model's utility.
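
To make the core idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single hypothetical equivalence group of three interchangeable tool-call segments, an assumed uniform "natural" preference over them, and an assumed biased preference standing in for a watermark pass. A suspect model is flagged when the Jensen-Shannon divergence between its observed segment frequencies and the watermark distribution falls below an illustrative threshold. The segment names, weights, sample size, and threshold are all assumptions chosen for illustration, and the paper's actual test statistic and hypothesis test may differ.

```python
import math
import random
from collections import Counter

# Hypothetical equivalence group: three tool-call segments assumed to be
# functionally identical (names are illustrative, not from the paper).
EQUIV_GROUP = ["search->open", "open_url", "fetch->parse"]

NATURAL = [1 / 3, 1 / 3, 1 / 3]  # assumed unwatermarked preference
WATERMARK = [0.6, 0.3, 0.1]      # assumed biased preference (the "pass")
THRESHOLD = 0.02                 # illustrative decision threshold, in bits


def sample_segment(rng, weights):
    """Pick one segment from the equivalence group under the given weights."""
    return rng.choices(EQUIV_GROUP, weights=weights, k=1)[0]


def empirical_dist(samples):
    """Normalised frequency of each segment, in EQUIV_GROUP order."""
    counts = Counter(samples)
    return [counts[s] / len(samples) for s in EQUIV_GROUP]


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


rng = random.Random(0)
n = 2000

# Trajectories from a model that inherited the biased preference,
# and from an independent model that follows the natural preference.
watermarked = [sample_segment(rng, WATERMARK) for _ in range(n)]
clean = [sample_segment(rng, NATURAL) for _ in range(n)]

d_suspect = js_divergence(empirical_dist(watermarked), WATERMARK)
d_clean = js_divergence(empirical_dist(clean), WATERMARK)

print(f"JSD(watermarked, pass) = {d_suspect:.4f}")
print(f"JSD(clean, pass)       = {d_clean:.4f}")
print("flag watermarked:", d_suspect < THRESHOLD)
print("flag clean:      ", d_clean < THRESHOLD)
```

Because every segment in the group is assumed to reach the same outcome, skewing the choice among them leaves task results unchanged, which is why the signal can live entirely in the visible action trajectory. In this toy setting the only evidence needed for verification is the frequency of each segment in the suspect's observed trajectories.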

Paper Structure (25 sections, 1 equation, 6 figures, 14 tables)

Figures (6)

  • Figure 1: The paradigm shift in agentic system architectures.
  • Figure 2: Overview of the imitation attack and mitigation enabled by AGENTWM.
  • Figure 3: Overview of AGENTWM. (a) Offline watermark preparation: the Generator mines candidate equivalence sets from the tool library with five WM schemes, and the Verifier validates them to produce watermark passes. (b) Online protection: AGENTWM assigns user-specific passes, applies watermarks to trajectories, and detects IP theft through statistical divergence tests.
  • Figure 4: Taxonomy of five watermark schemes. Each defines a distinct class of equivalence groups over action segments, enabling distribution-level watermark embedding.
  • Figure 5: Detection performance across different threshold configurations.
  • ...and 1 more figure