CoTGuard: Using Chain-of-Thought Triggering for Copyright Protection in Multi-Agent LLM Systems

Yan Wen, Junfeng Guo, Heng Huang

TL;DR

CoTGuard addresses copyright protection in multi-agent LLM systems by inspecting intermediate Chain-of-Thought (CoT) traces rather than final outputs alone. It introduces a trigger-based CoT framework that embeds task-specific trigger patterns into prompts via a mapping $T(k,t)$ from a trigger key $k$ and a task type $t$, so that the watermark propagates through inter-agent reasoning. A leakage detector $D(\hat{\mathcal{R}}, \mathcal{K})$ parses, compares, and aggregates reasoning traces to compute a leakage score $\delta \in [0,1]$, which is compared against a threshold $\theta$; the detector is designed to remain robust to paraphrasing. Experiments on mathematics, logic, and planning tasks report high leakage detection rates with minimal impact on task performance, supporting the method's practicality for IP protection in collaborative LLM workflows.
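To make the trigger mapping concrete, the following is a minimal sketch of a deterministic $T(k,t)$ and of embedding its output into a prompt. Everything in it is an assumption made for illustration: the names `trigger_pattern` and `embed_trigger`, the `TEMPLATES` table, and the hash-derived tag are hypothetical and are not taken from the paper, which may construct its trigger patterns quite differently.

```python
import hashlib

# Hypothetical per-task templates; the paper's real trigger patterns are not shown here.
TEMPLATES = {
    "math": "Before solving, restate the problem using the marker phrase '{tag}'.",
    "logic": "Begin your reasoning by labelling the first premise '{tag}'.",
    "planning": "Name the first step of your plan '{tag}'.",
}


def trigger_pattern(key: str, task_type: str) -> str:
    """Illustrative stand-in for T(k, t).

    Deterministic: the same (key, task_type) pair always yields the same pattern.
    """
    digest = hashlib.sha256(f"{key}:{task_type}".encode("utf-8")).hexdigest()
    tag = f"ref-{digest[:8]}"  # short tag derived from the key and task type
    return TEMPLATES[task_type].format(tag=tag)


def embed_trigger(prompt: str, key: str, task_type: str) -> str:
    """Append the trigger pattern to the original agent prompt."""
    return f"{prompt}\n\n{trigger_pattern(key, task_type)}"


if __name__ == "__main__":
    print(embed_trigger("Solve: 12 * 7 + 5", key="owner-key-1", task_type="math"))
```

Deriving the tag from a hash is only one way to obtain a deterministic mapping; a lookup table keyed by $(k, t)$ would serve the same purpose.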

Abstract

As large language models (LLMs) evolve into autonomous agents capable of collaborative reasoning and task execution, multi-agent LLM systems have emerged as a powerful paradigm for solving complex problems. However, these systems pose new challenges for copyright protection, particularly when sensitive or copyrighted content is inadvertently recalled through inter-agent communication and reasoning. Existing protection techniques focus primarily on detecting content in final outputs and overlook the richer, more revealing reasoning processes within the agents themselves. In this paper, we introduce CoTGuard, a novel framework for copyright protection that leverages trigger-based detection within Chain-of-Thought (CoT) reasoning. Specifically, by embedding trigger queries into agent prompts, we can activate particular CoT segments and monitor intermediate reasoning steps for unauthorized content reproduction. This approach enables fine-grained, interpretable detection of copyright violations in collaborative agent scenarios. In extensive experiments on various benchmarks, we show that CoTGuard effectively uncovers content leakage with minimal interference to task performance. Our findings suggest that reasoning-level monitoring offers a promising direction for safeguarding intellectual property in LLM-based agent systems.

Paper Structure

This paper contains 46 sections, 1 theorem, 4 equations, 1 figure, 10 tables, 4 algorithms.

Key Result

Theorem 1

Let $\mathcal{K}$ denote a finite set of known trigger keys, and let $T(k, t)$ be a deterministic function mapping each pair $(k, t)$ of trigger key and task type to a trigger pattern $\tau$. Let $\mathcal{R} = \{r_1, \dots, r_n\}$ denote the original set of Chain-of-Thought traces produced by agent such that if $\delta > \theta$ for some threshold $\theta$, we infer with high confidence that $\ha
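The theorem statement is cut off in this listing, but the decision rule it names (infer leakage when $\delta > \theta$) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's detector: the function names `normalise`, `parse_steps`, `leakage_score`, and `detect_leakage` are hypothetical, the score is defined here as the fraction of known trigger patterns found in the observed traces, and matching is by plain substring, which does not provide the paraphrase robustness the paper describes.

```python
import re
from typing import Iterable, List


def normalise(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip().lower()


def parse_steps(trace: str) -> List[str]:
    """Split one reasoning trace into normalised, non-empty steps (one per line)."""
    return [normalise(line) for line in trace.splitlines() if line.strip()]


def leakage_score(traces: Iterable[str], patterns: Iterable[str]) -> float:
    """Assumed score: fraction of known patterns seen in any step, so it lies in [0, 1]."""
    known = [normalise(p) for p in patterns]
    if not known:
        return 0.0
    steps = [step for trace in traces for step in parse_steps(trace)]
    hits = sum(1 for p in known if any(p in step for step in steps))
    return hits / len(known)


def detect_leakage(traces: Iterable[str], patterns: Iterable[str], theta: float = 0.5) -> bool:
    """Flag leakage when the score strictly exceeds the threshold theta."""
    return leakage_score(traces, patterns) > theta


if __name__ == "__main__":
    observed = ["Step 1: label the premise ref-ab12cd34\nStep 2: conclude"]
    print(leakage_score(observed, ["ref-ab12cd34", "ref-ffffffff"]))
```

A paraphrase-tolerant variant would replace the substring test with a similarity measure over steps; the choice of measure and of $\theta$ is left open here because the listing does not specify them.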

Figures (1)

  • Figure 1: General pipeline of CoTGuard. The process starts by augmenting the original prompt with a trigger that guides the reasoning. After the complete answer is generated, the reasoning steps are summarized to capture the key points. The result is then validated for accuracy and detectability, yielding a final answer that meets the task's requirements.

Theorems & Definitions (1)

  • Theorem 1: Trigger-Based Leakage Detection