Agentic Copyright Watermarking against Adversarial Evidence Forgery with Purification-Agnostic Curriculum Proxy Learning
Erjin Bao, Ching-Chun Chang, Hanrui Wang, Isao Echizen
TL;DR
The paper addresses ownership protection for AI models with a self-authenticating black-box watermarking protocol and a purification step that defends against adversarial evidence forgery. It introduces purification-agnostic curriculum proxy learning, which uses a surrogate function $Q_{\eta}$ and a Fourier low-pass proxy of radius $R$, together with Gaussian dynamic sampling with rejection to control purification strength. On CIFAR-10 with a PreActResNet18 backbone, watermark verification accuracy is roughly $0.94$–$0.99$ on genuine watermarks, forged (adversarial) evidence is strongly suppressed after purification, and proxy learning preserves inference performance on purified data. These results point to a black-box watermarking approach that maintains model utility while improving resistance to adversarial tampering and evidence forgery across purification regimes.
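As an illustration of the Fourier low-pass proxy with radius $R$ and the Gaussian sampling with rejection mentioned above, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `fourier_low_pass` and `sample_radius`, the circular mask shape, and the bounds `r_min` and `r_max` are assumptions made for this example, and the surrogate $Q_{\eta}$ is not modelled.

```python
import numpy as np

def fourier_low_pass(image, radius):
    """Keep only the frequencies within `radius` of the spectrum centre.

    `image` is an (H, W) or (H, W, C) float array; a circular mask is an
    assumption of this sketch, not a detail taken from the paper.
    """
    h, w = image.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    # After fftshift the zero-frequency bin sits at index (h // 2, w // 2).
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    if image.ndim == 3:
        mask = mask[:, :, None]  # broadcast the same mask over channels
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask, axes=(0, 1)), axes=(0, 1))
    return filtered.real

def sample_radius(rng, mean, std, r_min, r_max):
    """Draw a radius from a Gaussian, rejecting draws outside [r_min, r_max]."""
    while True:
        r = rng.normal(mean, std)
        if r_min <= r <= r_max:
            return r

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))           # stand-in for a CIFAR-10 sized input
R = sample_radius(rng, mean=8.0, std=4.0, r_min=2.0, r_max=16.0)
purified_proxy = fourier_low_pass(image, R)
```

A smaller $R$ removes more high-frequency content, so in this sketch it plays the role of a stronger purification; the mean and standard deviation of the sampler are placeholders rather than values from the paper.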
Abstract
With the proliferation of AI agents in various domains, protecting the ownership of AI models has become crucial due to the significant investment in their development. Unauthorized use and illegal distribution of these models pose serious threats to intellectual property, necessitating effective copyright protection measures. Model watermarking has emerged as a key technique to address this issue, embedding ownership information within models to assert rightful ownership during copyright disputes. This paper presents several contributions to model watermarking: a self-authenticating black-box watermarking protocol using hash techniques, a study on evidence forgery attacks using adversarial perturbations, a proposed defense involving a purification step to counter adversarial attacks, and a purification-agnostic curriculum proxy learning method to enhance watermark robustness and model performance. Experimental results demonstrate the effectiveness of these approaches in improving the security, reliability, and performance of watermarked models.
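To make the idea of a hash-based, self-authenticating black-box check concrete, the following sketch derives each trigger's expected label from a hash of its content, so that anyone holding the trigger can recompute the label and compare it with the model's prediction. This is only one plausible reading of the abstract: the helper names `derive_label` and `verification_accuracy`, the use of SHA-256, the `owner_id` prefix, and the modulo mapping to class indices are all assumptions, not details confirmed by the paper.

```python
import hashlib

def derive_label(trigger_bytes, owner_id, num_classes=10):
    """Map a trigger's raw bytes (plus a hypothetical owner identifier) to a class index."""
    digest = hashlib.sha256(owner_id + trigger_bytes).digest()
    return int.from_bytes(digest, "big") % num_classes

def verification_accuracy(triggers, predictions, owner_id, num_classes=10):
    """Fraction of triggers whose predicted label equals the hash-derived label."""
    matches = sum(
        derive_label(t, owner_id, num_classes) == p
        for t, p in zip(triggers, predictions)
    )
    return matches / len(triggers)

# Toy usage: byte strings stand in for serialized trigger images, and the
# "predictions" simulate a model that has learned the watermark perfectly.
owner_id = b"hypothetical-owner"
triggers = [bytes([i]) * 16 for i in range(5)]
predictions = [derive_label(t, owner_id) for t in triggers]
accuracy = verification_accuracy(triggers, predictions, owner_id)
```

Because the expected label is a deterministic function of the trigger, evidence that has been perturbed no longer maps to the same hash-derived label unless the forger also controls the model's output on it, which is the kind of forgery the purification step is meant to counter.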
