
Finding needles in a haystack: A Black-Box Approach to Invisible Watermark Detection

Minzhou Pan, Zhenting Wang, Xin Dong, Vikash Sehwag, Lingjuan Lyu, Xue Lin

TL;DR

This paper addresses the challenge of detecting invisible image watermarks in a black-box setting, without access to decoding algorithms or labeled examples. It introduces WaterMark Detection (WMD), a self-supervised detector that uses distribution offsets between a detection dataset and a clean reference dataset, together with an asymmetric loss and iterative pruning, to separate watermarked from non-watermarked images. Across multiple datasets and watermarking methods, including post-processing and generative watermarks, WMD achieves high detection performance, with AUC frequently exceeding 0.9 for single-watermark datasets and remaining above 0.7 in more challenging multi-watermark scenarios. The authors highlight the method's potential to improve accountability and trust in AI-generated content, acknowledge limitations such as sensitivity to distribution mismatch and the need for hyperparameter tuning, and outline future directions such as domain adaptation.
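The offset-learning idea can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each image has already been reduced to a feature vector, substitutes plain logistic regression for the paper's detector, and uses a hypothetical `ref_weight` class weight as a stand-in for the asymmetric loss, whose exact form is not given here. The classifier learns to separate the detection dataset (label 1) from the clean reference (label 0); since only the watermarked samples shift the distribution, they tend to receive the highest scores. Each round then prunes the most clean-looking detection samples, condensing the set toward watermarked images. The names `offset_detect`, `train_logreg`, `rounds`, and `prune_rate` are illustrative.

```python
import numpy as np


def train_logreg(x, y, sample_weight, epochs=300, lr=0.5):
    """Weighted logistic regression fitted by full-batch gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    total = sample_weight.sum()
    for _ in range(epochs):
        z = np.clip(x @ w + b, -30.0, 30.0)  # clip logits to avoid exp overflow
        p = 1.0 / (1.0 + np.exp(-z))
        g = sample_weight * (p - y)  # gradient of weighted BCE w.r.t. the logits
        w -= lr * (x.T @ g) / total
        b -= lr * g.sum() / total
    return w, b


def offset_detect(detection, reference, rounds=3, prune_rate=0.3, ref_weight=2.0):
    """Toy offset-learning detector with iterative pruning (hypothetical sketch).

    detection: (n, d) features of the dataset to inspect (clean + watermarked mix)
    reference: (m, d) features of a known-clean reference dataset
    Returns one score per detection sample; higher means more likely watermarked.
    """
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    keep = np.arange(len(detection))  # indices of detection samples still in play
    for r in range(rounds):
        x = np.vstack([detection[keep], reference])
        y = np.concatenate([np.ones(len(keep)), np.zeros(len(reference))])
        # Assumed asymmetric weighting: misclassifying clean reference images costs more.
        sw = np.concatenate([np.ones(len(keep)), np.full(len(reference), ref_weight)])
        w, b = train_logreg(x, y, sw)
        if r < rounds - 1:
            # Prune the lowest-scoring (most clean-looking) detection samples.
            s = detection[keep] @ w + b
            n_keep = max(1, int(len(keep) * (1.0 - prune_rate)))
            keep = keep[np.argsort(s)[::-1][:n_keep]]
    # Score every original detection sample with the final model.
    return detection @ w + b
```

On synthetic features where a hypothetical watermark shifts one coordinate, the watermarked samples should end up with higher scores than the clean ones. A larger `prune_rate` condenses the set faster but risks discarding watermarked samples along the way, the trade-off examined in Figure 3.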

Abstract

In this paper, we propose WaterMark Detection (WMD), the first invisible watermark detection method under a black-box and annotation-free setting. Using a clean non-watermarked dataset as a reference, WMD can detect arbitrary watermarks within a given detection dataset without relying on specific decoding methods or prior knowledge of the watermarking techniques. We build WMD on offset learning: the clean non-watermarked reference dataset enables us to isolate the influence of only the watermarked samples in the detection dataset. Our comprehensive evaluations demonstrate the effectiveness of WMD, which significantly outperforms naive detection methods that yield AUC scores of only around 0.5. In contrast, WMD consistently achieves high detection AUC scores, surpassing 0.9 on most single-watermark datasets and exceeding 0.7 in more challenging multi-watermark scenarios across diverse datasets and watermarking methods. As invisible watermarks become increasingly prevalent while the specific decoding techniques remain undisclosed, our approach provides a versatile solution and establishes a path toward increasing accountability, transparency, and trust in digital visual content.
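For readers less familiar with the metric: ROC AUC equals the probability that a randomly chosen watermarked image receives a higher detection score than a randomly chosen clean one, so a detector whose scores carry no watermark signal lands near 0.5, the level reported for the naive baselines. A minimal NumPy sketch follows; the helper name `auc` is our own, and `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
import numpy as np


def auc(scores_pos, scores_neg):
    """ROC AUC via pairwise comparison: P(positive score > negative score), ties count 1/2."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())


# A perfect separator scores 1.0; uninformative random scores hover around 0.5.
print(auc([0.9, 0.8], [0.1, 0.85]))  # 3 of 4 pairs ranked correctly -> 0.75
rng = np.random.default_rng(0)
print(auc(rng.random(1000), rng.random(1000)))
```

The pairwise form is quadratic in memory, which is fine for illustration; for large evaluation sets a rank-based implementation is the usual choice.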

Paper Structure

This paper contains 26 sections, 9 equations, 5 figures, 10 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Detecting invisibly watermarked images in a given dataset. Because the watermarks are invisible, human inspection and existing anomaly detection methods fail to distinguish watermarked images from clean ones within a dataset. To address this challenge, we propose WMD, the first invisible watermark detection method capable of accurately identifying invisibly watermarked samples in the black-box setting, which requires no prior knowledge of the watermarking techniques or decoding methods.
  • Figure 2: Illustration of the iterative pruning process. As the number of pruning iterations increases, the detection dataset is gradually condensed by removing clean samples while retaining most of the watermarked samples.
  • Figure 3: Impact of pruning rate on watermark detection and training overhead. (a) Detection performance, measured by AUC, decreases as the pruning rate increases, since higher pruning removes more watermarked images during training. (b) The number of pruned watermarked images increases with higher pruning rates throughout the training process. (c) Training time overhead increases substantially with higher pruning rates.
  • Figure 4: Watermark detection performance (measured by AUC) with varying sizes of the clean reference dataset and different numbers of clean samples used during training. Larger reference datasets and more clean training samples generally lead to better detection performance, with diminishing returns after a certain point.
  • Figure 5: Visual examples of original images and their watermarked counterparts under different watermarking methods. The top row shows the original images. PSNR values are given for each post-processing watermarked image; a lower PSNR indicates a higher level of distortion introduced by the watermarking process.
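The PSNR values in Figure 5 follow the standard definition, PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value (255 for 8-bit images) and MSE is the mean squared error between the original and the watermarked image. A minimal sketch; the helper name `psnr` is ours, and the paper's own evaluation code is not shown here.

```python
import numpy as np


def psnr(original, watermarked, max_value=255.0):
    """Peak signal-to-noise ratio in dB; lower values mean more distortion."""
    # Cast to float first so uint8 subtraction cannot wrap around.
    diff = original.astype(np.float64) - watermarked.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


img = np.full((64, 64, 3), 128, dtype=np.uint8)
light = img + 1  # every pixel off by 1 -> MSE = 1
heavy = img + 4  # every pixel off by 4 -> MSE = 16
print(round(psnr(img, light), 2))  # 48.13
print(round(psnr(img, heavy), 2))  # 36.09
```

Because PSNR is logarithmic in MSE, quadrupling the per-pixel error (a 16x larger MSE) lowers PSNR by about 12 dB.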