Agentic Copyright Watermarking against Adversarial Evidence Forgery with Purification-Agnostic Curriculum Proxy Learning

Erjin Bao; Ching-Chun Chang; Hanrui Wang; Isao Echizen

TL;DR

The paper addresses ownership protection for AI models by proposing a self-authenticating black-box watermarking protocol and by defending against adversarial evidence forgery with a purification step. It introduces purification-agnostic curriculum proxy learning, which employs a surrogate function $Q_{\eta}$ and a Fourier low-pass proxy with radius $R$, together with Gaussian dynamic sampling with rejection to manage purification strength. Experiments on CIFAR-10 with a PreActResNet18 backbone show watermark verification accuracy of roughly $0.94$–$0.99$ on genuine watermarks and strong suppression of forged (adversarial) evidence after purification, while proxy learning preserves inference performance on purified data. These results suggest that black-box watermarking can retain model utility while resisting adversarial tampering and evidence forgery across a range of purification strengths.
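
The TL;DR names the Fourier low-pass proxy and Gaussian dynamic sampling with rejection but gives no formulas. The following is a hypothetical illustration, not the authors' implementation. It assumes two things. First, the proxy keeps only the 2D DFT coefficients within distance $R$ of the zero-frequency term. Second, $R$ is drawn from a normal distribution whose mean follows a curriculum schedule, and draws outside an allowed range are rejected. The names `fourier_low_pass`, `curriculum_mean`, and `sample_radius`, the linear schedule, and all numeric bounds are assumptions. The surrogate $Q_{\eta}$ is not modeled.

```python
import numpy as np


def fourier_low_pass(images: np.ndarray, radius: float) -> np.ndarray:
    """Low-pass proxy (assumed form): zero every 2D DFT coefficient farther than
    `radius` from the zero-frequency term. images has shape (N, H, W, C)."""
    _, h, w, _ = images.shape
    # Centre the spectrum so the zero-frequency term sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(images, axes=(1, 2)), axes=(1, 2))
    yy, xx = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2, indexing="ij")
    mask = (np.sqrt(yy**2 + xx**2) <= radius)[None, :, :, None]
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask, axes=(1, 2)), axes=(1, 2))
    return filtered.real.astype(images.dtype)  # imaginary part is ~0 for real input


def curriculum_mean(epoch: int, total_epochs: int, r_start: float, r_end: float) -> float:
    """Hypothetical linear schedule moving the mean radius from r_start to r_end."""
    t = epoch / max(total_epochs - 1, 1)
    return r_start + t * (r_end - r_start)


def sample_radius(rng: np.random.Generator, mean: float, std: float,
                  r_min: float, r_max: float, max_tries: int = 100) -> float:
    """Gaussian sampling with rejection: redraw until R lies in [r_min, r_max]."""
    for _ in range(max_tries):
        r = rng.normal(mean, std)
        if r_min <= r <= r_max:
            return float(r)
    return float(np.clip(mean, r_min, r_max))  # fallback if every draw is rejected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.random((4, 32, 32, 3), dtype=np.float32)  # stand-in for CIFAR-10 images
    for epoch in range(3):
        mean = curriculum_mean(epoch, 3, r_start=16.0, r_end=4.0)
        r = sample_radius(rng, mean, std=2.0, r_min=2.0, r_max=16.0)
        proxy = fourier_low_pass(batch, r)
        print(f"epoch {epoch}: R = {r:.2f}, mean |x - proxy| = {np.abs(batch - proxy).mean():.4f}")
```

Rejection yields a truncated normal over $[R_{\min}, R_{\max}]$. Clipping would instead pile probability mass onto the two bounds. A smaller $R$ removes more high-frequency content, so in this sketch it plays the role of a stronger purification.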

Abstract

With the proliferation of AI agents across domains, protecting the ownership of AI models has become crucial given the significant investment in their development. Unauthorized use and illegal distribution of these models pose serious threats to intellectual property, necessitating effective copyright protection. Model watermarking has emerged as a key technique for this problem: it embeds ownership information within a model so that the owner can assert rightful ownership during copyright disputes. This paper makes several contributions to model watermarking:

- a self-authenticating black-box watermarking protocol based on hashing;
- a study of evidence forgery attacks using adversarial perturbations;
- a defense that applies a purification step to counter such attacks;
- a purification-agnostic curriculum proxy learning method that improves both watermark robustness and model performance.

Experimental results demonstrate that these approaches improve the security, reliability, and performance of watermarked models.
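
The abstract does not spell out how hashing makes the protocol self-authenticating. The construction below is one plausible possibility, offered purely as an assumption, in which the trigger set is bound to the owner's secret key:

- the owner publishes a SHA-256 commitment to the key at registration;
- every trigger image and label is regenerated deterministically from a hash of the key;
- the verifier sees the suspect model only through its predictions (black-box).

The helper names, the 100-trigger set, and the 0.9 acceptance threshold are hypothetical.

```python
import hashlib

import numpy as np


def commit(key: bytes) -> str:
    """Public commitment published at registration (assumed: SHA-256 of the key)."""
    return hashlib.sha256(key).hexdigest()


def derive_triggers(key: bytes, n: int = 100, shape=(32, 32, 3), num_classes: int = 10):
    """Hypothetical helper: deterministically regenerate n trigger images and labels."""
    images, labels = [], []
    for i in range(n):
        digest = hashlib.sha256(key + b"|trigger|" + i.to_bytes(4, "big")).digest()
        rng = np.random.default_rng(int.from_bytes(digest, "big"))  # hash-seeded RNG
        images.append(rng.random(shape, dtype=np.float32))          # pixels in [0, 1)
        labels.append(int(rng.integers(num_classes)))               # watermark label
    return np.stack(images), np.array(labels)


def verify(predict, key: bytes, commitment: str, threshold: float = 0.9, n: int = 100) -> bool:
    """Black-box check: the revealed key must match the commitment, and the suspect
    model's top-1 predictions on the regenerated triggers must reach `threshold`."""
    if commit(key) != commitment:
        return False  # revealed key does not match the registered commitment
    images, labels = derive_triggers(key, n)
    preds = np.asarray(predict(images))
    return float(np.mean(preds == labels)) >= threshold


if __name__ == "__main__":
    key = b"owner-secret-key"
    registered = commit(key)
    imgs, labs = derive_triggers(key)
    # Toy stand-in for a watermarked model: it has memorised the trigger labels.
    table = {img.tobytes(): lab for img, lab in zip(imgs, labs)}
    watermarked = lambda x: [table.get(img.tobytes(), 0) for img in x]
    unrelated = lambda x: [-1] * len(x)  # toy model that never outputs a watermark label
    print(verify(watermarked, key, registered))        # True
    print(verify(unrelated, key, registered))          # False
    print(verify(watermarked, b"forged", registered))  # False
```

In this sketch, hash binding fixes which triggers count as evidence. On its own, however, it does not address adversarially perturbed queries, which is the threat the abstract's purification step is aimed at.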

Paper Structure

This paper contains 9 sections, 6 equations, 4 figures, and 1 table.

Introduction
Methodology
Black-Box Watermarking Protocol
Adversarial Evidence Forgery
Purification-Agnostic Proxy Learning
Fourier Low-Pass Transform
Gaussian Dynamic Sampling with Rejection
Experiment
Conclusion

Figures (4)

Figure 1: Overview of watermark embedding and verification.
Figure 2: Adversarial evidence forgery and purification.
Figure 3: Purification-agnostic curriculum proxy learning.
Figure 4: Demonstration of adversarial attack and defense.