CoTGuard: Using Chain-of-Thought Triggering for Copyright Protection in Multi-Agent LLM Systems

Yan Wen; Junfeng Guo; Heng Huang

TL;DR

CoTGuard tackles copyright protection in multi-agent LLM systems by exploiting intermediate Chain-of-Thought (CoT) traces rather than final outputs. It introduces a trigger-based CoT framework that embeds task-specific trigger patterns into prompts via a mapping $T(k,t)$, so that the watermark propagates through inter-agent reasoning. A leakage detector $D(\hat{\mathcal{R}}, \mathcal{K})$ parses, compares, and aggregates reasoning traces into a leakage score $\delta \in [0,1]$, which it checks against a threshold $\theta$; the comparison is designed to be robust to paraphrase. Empirical results across mathematics, logic, and planning tasks show high leakage detection rates with minimal impact on task performance, demonstrating that the approach is practical for IP protection in collaborative LLM workflows.
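The detector $D(\hat{\mathcal{R}}, \mathcal{K})$ can be pictured as a parse, compare, aggregate pipeline followed by a threshold test. The Python sketch below is illustrative only and is not the authors' implementation. It assumes $\mathcal{K}$ is a list of expected trigger patterns (strings), splits a trace into steps on line breaks and sentence punctuation, uses token-set Jaccard similarity as a stand-in for the paper's comparison, and aggregates by averaging each pattern's best step-level match. The function names (`parse_steps`, `leakage_score`, `detect`) and the default $\theta = 0.5$ are hypothetical.

```python
import re


def parse_steps(trace: str) -> list[str]:
    """Parse: split a reasoning trace into non-empty steps (lines or sentences)."""
    parts = re.split(r"[\n.;]+", trace)
    return [p.strip() for p in parts if p.strip()]


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a set, so word order does not matter."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Compare: token-set overlap in [0, 1] (stand-in similarity, an assumption)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def leakage_score(trace: str, patterns: list[str]) -> float:
    """Aggregate: delta = mean over patterns of the best-matching step's similarity."""
    steps = [tokens(s) for s in parse_steps(trace)]
    if not patterns or not steps:
        return 0.0
    best = [max(jaccard(tokens(p), s) for s in steps) for p in patterns]
    return sum(best) / len(best)


def detect(trace: str, patterns: list[str], theta: float = 0.5) -> tuple[float, bool]:
    """Return (delta, leaked), flagging leakage when delta >= theta (hypothetical rule)."""
    delta = leakage_score(trace, patterns)
    return delta, delta >= theta


if __name__ == "__main__":
    patterns = ["verify the lemma using the amber lattice"]
    trace = "First, compute the sum.\nNext, using the amber lattice, verify the lemma."
    print(detect(trace, patterns))
```

Token-set overlap tolerates reordering and small wording changes but not genuine paraphrase; replacing `jaccard` with an embedding-based similarity would be the natural way to approximate the paraphrase robustness described above.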

Abstract

As large language models (LLMs) evolve into autonomous agents capable of collaborative reasoning and task execution, multi-agent LLM systems have emerged as a powerful paradigm for solving complex problems. However, these systems pose new challenges for copyright protection, particularly when sensitive or copyrighted content is inadvertently recalled through inter-agent communication and reasoning. Existing protection techniques focus primarily on detecting content in final outputs, overlooking the richer and more revealing reasoning processes within the agents themselves. In this paper, we introduce CoTGuard, a novel copyright protection framework that leverages trigger-based detection within Chain-of-Thought (CoT) reasoning. Specifically, by embedding trigger queries into agent prompts, we activate specific CoT segments and monitor intermediate reasoning steps for unauthorized content reproduction. This approach enables fine-grained, interpretable detection of copyright violations in collaborative agent scenarios. Extensive experiments on various benchmarks show that CoTGuard effectively uncovers content leakage with minimal interference to task performance. Our findings suggest that reasoning-level monitoring offers a promising direction for safeguarding intellectual property in LLM-based agent systems.
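To make the trigger-embedding step concrete, here is a minimal Python sketch of one way a mapping $T(k,t)$ could work, assuming $k$ is a secret key and $t$ a task identifier; the paper's actual construction may differ. It derives a trigger phrase deterministically from HMAC-SHA256 of the task identifier under the key, draws the words from a hypothetical vocabulary `VOCAB`, and appends an instruction asking the agent to reference that phrase in its chain of thought. `trigger`, `embed_trigger`, `VOCAB`, and the instruction wording are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical trigger vocabulary; a real deployment would choose its own.
VOCAB = ["amber", "lattice", "cobalt", "meridian", "quartz", "harbor", "sable", "vertex"]


def trigger(key: bytes, task_id: str, n_words: int = 3) -> str:
    """Hypothetical T(k, t): derive a task-specific trigger phrase from a secret key."""
    digest = hmac.new(key, task_id.encode("utf-8"), hashlib.sha256).digest()
    # Each of the first n_words digest bytes selects one vocabulary word.
    words = [VOCAB[b % len(VOCAB)] for b in digest[:n_words]]
    return " ".join(words)


def embed_trigger(prompt: str, key: bytes, task_id: str) -> tuple[str, str]:
    """Append an instruction asking the agent to mention the trigger while reasoning.

    Returns the augmented prompt and the trigger phrase (kept for later detection).
    """
    phrase = trigger(key, task_id)
    instruction = (
        f"While reasoning step by step, refer to the '{phrase}' check before concluding."
    )
    return f"{prompt}\n\n{instruction}", phrase


if __name__ == "__main__":
    augmented, phrase = embed_trigger("Plan a three-city trip.", b"secret-key", "planning-42")
    print(phrase)
    print(augmented)
```

Because the phrase is a keyed hash of the task identifier, the protector can regenerate it at detection time without storing it, while an observer without the key cannot predict it.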
