
Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious?

Pei Yang, Hai Ci, Yiren Song, Mike Zheng Shou

TL;DR

This work categorizes watermarking algorithms into content-adaptive and content-agnostic ones and demonstrates that averaging a collection of watermarked images can reveal the underlying watermark pattern. It also proposes security guidelines that call for content-adaptive watermarking strategies and for security evaluation against steganalysis.

Abstract

Digital watermarking techniques are crucial for copyright protection and source identification of images, especially in the era of generative AI models. However, many existing watermarking methods, particularly content-agnostic approaches that embed fixed patterns regardless of image content, are vulnerable to steganalysis attacks that can extract and remove the watermark with minimal perceptual distortion. In this work, we categorize watermarking algorithms into content-adaptive and content-agnostic ones, and demonstrate how averaging a collection of watermarked images could reveal the underlying watermark pattern. We then leverage this extracted pattern for effective watermark removal under both graybox and blackbox settings, even when the collection contains multiple watermark patterns. For some algorithms like Tree-Ring watermarks, the extracted pattern can also forge convincing watermarks on clean images. Our quantitative and qualitative evaluations across twelve watermarking methods highlight the threat posed by steganalysis to content-agnostic watermarks and the importance of designing watermarking techniques resilient to such analytical attacks. We propose security guidelines calling for using content-adaptive watermarking strategies and performing security evaluation against steganalysis. We also suggest multi-key assignments as potential mitigations against steganalysis vulnerabilities.
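
The averaging attack described in the abstract can be illustrated with a minimal sketch. It assumes a purely additive, content-agnostic watermark (watermarked image = clean image + fixed pattern), which is a simplification chosen here for illustration and not necessarily the paper's exact formulation. The function names, the toy pattern, and the synthetic "images" (flat lists of pixel values) are all hypothetical. Averaging many images cancels the varying content, so the difference between the mean watermarked image and the mean clean image approximates the fixed pattern, with paired images standing in for the graybox setting and unpaired images for the blackbox setting.

```python
import random

def mean_image(images):
    """Pixel-wise mean of equally sized images stored as flat lists of floats."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]

def extract_pattern(watermarked, clean):
    """Estimate an additive pattern as mean(watermarked) - mean(clean)."""
    mean_w = mean_image(watermarked)
    mean_c = mean_image(clean)
    return [w - c for w, c in zip(mean_w, mean_c)]

def remove_pattern(image, pattern):
    """Subtract the estimated pattern and clip to the valid pixel range."""
    return [min(255.0, max(0.0, p - w)) for p, w in zip(image, pattern)]

def random_image(rng, size):
    """Synthetic stand-in for image content (assumption: uniform pixel values)."""
    return [rng.uniform(64.0, 192.0) for _ in range(size)]

rng = random.Random(0)
size, n = 64, 1000
# Toy fixed pattern, identical for every image (content-agnostic by construction).
true_pattern = [rng.choice([-4.0, 4.0]) for _ in range(size)]

# Graybox stand-in: each watermarked image has its clean counterpart.
clean_paired = [random_image(rng, size) for _ in range(n)]
marked_paired = [[p + w for p, w in zip(img, true_pattern)] for img in clean_paired]
graybox_estimate = extract_pattern(marked_paired, clean_paired)

# Blackbox stand-in: the clean images are unrelated to the watermarked ones.
clean_unpaired = [random_image(rng, size) for _ in range(n)]
blackbox_estimate = extract_pattern(marked_paired, clean_unpaired)

# Largest per-pixel error of the paired estimate.
graybox_error = max(abs(e - t) for e, t in zip(graybox_estimate, true_pattern))
# Inner product with the true pattern; positive means the estimate points the same way.
blackbox_alignment = sum(e * t for e, t in zip(blackbox_estimate, true_pattern))
# Removal: subtract the estimate from one watermarked image.
restored = remove_pattern(marked_paired[0], graybox_estimate)
```

In this toy setup the paired estimate recovers the pattern up to floating-point error, while the unpaired estimate is noisier because the two averages cover different content and its error shrinks only as more images are averaged. A content-adaptive watermark would break the assumption that the same pattern is added to every image, which is why this sketch does not apply to such methods.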

Paper Structure

This paper contains 39 sections, 2 equations, 14 figures, and 11 tables.

Figures (14)

  • Figure 1: Watermark pattern extraction, removal, and forgery under the Simple Linear Assumption. Two groups of paired (graybox) or unpaired (blackbox) images are each averaged, and the averages are then subtracted to reveal the watermark pattern. The extracted pattern is then used for watermark removal or forgery.
  • Figure 2: Performance of watermark detectors under steganalysis-based removal. Performance metrics include AUC (watermark verification AUC) and Bit Acc (bit accuracy of decoded watermark information). The plots also illustrate the corresponding PSNR as a measure of image quality degradation. The left/right columns show content-agnostic/content-adaptive methods, respectively. NR denotes the case without removal, reflecting the decoder's inherent performance.
  • Figure 3: Visualization of watermark patterns extracted from Tree-Ring. Top: Pattern extracted from DDIM-inverted latents without subtracting $x_\varnothing$. The first and second rows are Fourier transform pairs. Bottom: Pattern extracted in image space under graybox and blackbox settings, akin to Figure \ref{fig:pattern_vis_more_comprehensive}.
  • Figure 4: Tree-Ring detection AUC against quality metrics for steganalysis-based removals ($n=5000$) and image distortions. Steganalysis-based removals (blue) cluster bottom-left, indicating effective watermark removal with comparatively low quality degradation.
  • Figure 5: Histograms of distance to the reference watermarking pattern for Tree-Ring watermark removal (top) and forgery (bottom). For removal, averaging more images pushes the watermark-removed images (green) away from the true watermarked images (orange). For forgery, conversely, averaging more images increases the similarity of forged images (green) to true watermarked images (orange). Red dashed lines are thresholds $\tau$ at 1% FPR.
  • ...and 9 more figures
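
The captions above refer to PSNR, bit accuracy, and a detection threshold $\tau$ at 1% FPR. The sketch below shows one conventional way to compute these quantities; it is an illustration under stated assumptions, not the paper's evaluation code. PSNR uses the standard definition with a peak value of 255, images are flat lists of pixel values, and the threshold rule assumes an image is flagged as watermarked when its distance to the reference pattern falls below $\tau$. All function names are hypothetical.

```python
import math

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

def bit_accuracy(decoded_bits, true_bits):
    """Fraction of decoded watermark bits that match the embedded message."""
    matches = sum(1 for d, t in zip(decoded_bits, true_bits) if d == t)
    return matches / len(true_bits)

def threshold_at_fpr(clean_distances, fpr=0.01):
    """Pick tau so that at most `fpr` of clean images have distance < tau.

    Assumption: detection fires when the distance to the reference
    pattern is strictly below tau, so small distances mean "watermarked".
    """
    ordered = sorted(clean_distances)
    k = int(fpr * len(ordered))  # number of clean images allowed below tau
    return ordered[k]

# Toy usage with made-up numbers.
example_psnr = psnr([100.0, 100.0, 100.0, 100.0], [101.0, 99.0, 101.0, 99.0])
example_acc = bit_accuracy([1, 0, 1, 1], [1, 1, 1, 0])
clean_distances = [float(d) for d in range(100)]
tau = threshold_at_fpr(clean_distances, fpr=0.01)
false_positive_rate = sum(1 for d in clean_distances if d < tau) / len(clean_distances)
```

Under this convention, a successful removal attack moves watermarked images to distances above $\tau$ while keeping PSNR high, and a successful forgery moves clean images below $\tau$.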