ComMark: Covert and Robust Black-Box Model Watermarking with Compressed Samples
Yunfei Yang, Xiaojun Chen, Zhendong Zhao, Yu Zhou, Xiaoyan Gu, Juan Cao
TL;DR
ComMark introduces a covert, robust black-box watermarking framework that embeds triggers via frequency-domain compression inspired by JPEG quantization. It constructs watermark samples globally in the frequency domain, trains with simulated attacks and a similarity loss to strengthen feature-space alignment, and verifies ownership purely through black-box queries. Extensive experiments across vision tasks show high watermark success with minimal accuracy loss and strong robustness to extraction, removal, and evasion attacks, along with excellent covertness. The method also extends beyond image recognition to speech, text, video, image generation, and image captioning tasks, underscoring broad practical applicability for IP protection of deep models.
Abstract
The rapid advancement of deep learning has turned models into highly valuable assets due to their reliance on massive data and costly training processes. However, these models are increasingly vulnerable to leakage and theft, highlighting the critical need for robust intellectual property protection. Model watermarking has emerged as an effective solution, with black-box watermarking gaining significant attention for its practicality and flexibility. Nonetheless, existing black-box methods often fail to balance covertness (hiding the watermark to prevent detection and forgery) and robustness (ensuring the watermark resists removal), two essential properties for real-world copyright verification. In this paper, we propose ComMark, a novel black-box model watermarking framework that leverages frequency-domain transformations to generate compressed, covert, and attack-resistant watermark samples by filtering out high-frequency information. To further enhance watermark robustness, our method incorporates simulated attack scenarios and a similarity loss during training. Comprehensive evaluations across diverse datasets and architectures demonstrate that ComMark achieves state-of-the-art performance in both covertness and robustness. Furthermore, we extend its applicability beyond image recognition to tasks including speech recognition, sentiment analysis, image generation, image captioning, and video recognition, underscoring its versatility and broad applicability.
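
To make the idea of a compressed watermark sample concrete, the following is a minimal sketch of global frequency-domain compression on a single grayscale image. It is an illustration under stated assumptions, not the paper's exact procedure: the function names `dct_matrix` and `compress_sample`, the diagonal low-pass mask, and the parameters `keep` and `step` are all hypothetical choices made here. The sketch applies a 2D DCT to the whole image, discards high-frequency coefficients, coarsely quantizes the rest (loosely in the spirit of JPEG quantization), and transforms back.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale
    return m


def compress_sample(img: np.ndarray, keep: float = 0.5, step: float = 8.0) -> np.ndarray:
    """Hypothetical compressed-sample generator (assumed, not the paper's).

    img  : 2D array with pixel values in [0, 255]
    keep : fraction of the normalized frequency range retained (assumed)
    step : uniform quantization step for retained coefficients (assumed)
    """
    h, w = img.shape
    dh, dw = dct_matrix(h), dct_matrix(w)

    # Global 2D DCT over the whole image (no 8x8 blocks).
    coeffs = dh @ img.astype(np.float64) @ dw.T

    # Low-pass mask: keep coefficients whose normalized frequency sum is small.
    u = np.arange(h)[:, None] / h
    v = np.arange(w)[None, :] / w
    mask = (u + v) <= keep

    # Drop high frequencies, then quantize what remains.
    coeffs = np.round(coeffs * mask / step) * step

    # Inverse 2D DCT (the basis is orthonormal, so the inverse is the transpose).
    out = dh.T @ coeffs @ dw
    return np.clip(out, 0.0, 255.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
    watermark_sample = compress_sample(image, keep=0.5, step=8.0)
    print(watermark_sample.shape)
```

In this sketch, a smaller `keep` or a larger `step` removes more detail and makes the perturbation more visible, so both would need tuning against whatever covertness criterion is used.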
