Invisible Watermarks: Attacks and Robustness

Dongjun Hwang, Sungwon Woo, Tom Gao, Raymond Luo, Sunghwan Baek

TL;DR

The paper tackles the robustness of invisible image watermarks for differentiating real vs. AI-generated content. It introduces a remover network to enable stacking of two watermarking methods (Tree-Ring and StegaStamp) without mutual interference, and a GradCAM-guided Localized Blurring Attack (LBA) to attack watermark regions while preserving image quality. Across MS-COCO validation images, the remover improves detection after stacking and LBA achieves better image quality than uniform blur, though regeneration remains highly effective at watermark removal with less perceptual degradation. These findings inform practical watermark design and suggest future work on broader attack-defense combinations and decoder-access considerations to limit targeted attacks.

Abstract

As generative AI becomes increasingly accessible, the case for robust detection of generated images to combat misinformation is stronger than ever. Invisible watermarking methods act as identifiers of generated content, embedding image- and latent-space messages that are robust to many forms of perturbation. Most current research investigates full-image attacks against images carrying a single watermarking method. We introduce novel improvements to watermarking robustness, as well as an attack strategy that minimizes image-quality degradation. First, we examine applying both image-space and latent-space watermarking methods to a single image, and we propose a custom watermark remover network that preserves one watermarking modality while completely removing the other during decoding. Then, we investigate localized blurring attacks (LBA) on watermarked images, guided by the GradCAM heatmap obtained from the watermark decoder, in order to reduce degradation of the target image. Our evaluation suggests that 1) using the watermark remover to preserve one watermark modality while decoding the other slightly improves on baseline performance, and 2) LBA degrades the image significantly less than uniform blurring of the entire image. Code is available at: https://github.com/tomputer-g/IDL_WAR
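
The decoding order implied by the remover design can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `generate_with_tree_ring`, `embed_stegastamp`, `remover`, `detect_tree_ring`, and `decode_stegastamp` are hypothetical placeholders for the real models, and, following the caption of Figure 3, it assumes the remover strips the StegaStamp residual so that Tree-Ring is detected on the cleaned image while StegaStamp is decoded from the stacked image directly.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Placeholder type: real models would pass image tensors around.
Image = Any


@dataclass
class StackedDetection:
    tree_ring_detected: bool  # latent-space watermark found?
    stegastamp_message: str   # message decoded from the image-space watermark


def embed_stacked(
    generate_with_tree_ring: Callable[[str], Image],
    embed_stegastamp: Callable[[Image, str], Image],
    prompt: str,
    message: str,
) -> Image:
    """Assumed order: Tree-Ring is embedded in the initial latent during
    generation, then StegaStamp encodes `message` into the output pixels."""
    image = generate_with_tree_ring(prompt)
    return embed_stegastamp(image, message)


def detect_stacked(
    image: Image,
    remover: Callable[[Image], Image],
    detect_tree_ring: Callable[[Image], bool],
    decode_stegastamp: Callable[[Image], str],
) -> StackedDetection:
    # StegaStamp is decoded directly from the stacked image.
    message = decode_stegastamp(image)
    # Hypothetical remover: strips the StegaStamp residual so it does not
    # interfere with Tree-Ring detection. Passing an identity function here
    # reproduces naive stacking.
    cleaned = remover(image)
    return StackedDetection(detect_tree_ring(cleaned), message)
```

Passing `lambda x: x` as `remover` corresponds to the naive stacking of Figure 1, so the same harness can compare both configurations with everything else held fixed.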

Paper Structure

This paper contains 30 sections, 5 equations, 7 figures, 12 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Overview of the naive pipeline for stacking Tree-Ring and StegaStamp watermarks.
  • Figure 2: Overview of the pipeline for stacking the modified Tree-Ring and StegaStamp watermarks with the remover network.
  • Figure 3: Loss plot for training the StegaStamp remover network.
  • Figure 4: Example of StegaStamp residuals on a watermarked image. The residuals are mostly applied around the kayak and its two occupants. Image taken from the StegaStamp paper.
  • Figure 5: Localized Blurring Attack pipeline.
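
Figures 4 and 5 suggest the intuition behind the localized attack: if the watermark signal and the decoder's attention concentrate on a few regions, blurring only those regions should disturb the watermark while leaving the rest of the image untouched. The sketch below illustrates this under stated assumptions and is not the authors' code: it takes a precomputed GradCAM-style heatmap as input (computing GradCAM is out of scope), applies a simple box blur as a stand-in for whatever kernel the paper uses, and uses a hard binary mask; `threshold` and `radius` are hypothetical parameters. PSNR is included only as a simple stand-in quality metric for comparing the localized and uniform variants.

```python
import numpy as np


def box_blur(image: np.ndarray, radius: int) -> np.ndarray:
    """Mean-filter an H x W x C image with a (2r+1) x (2r+1) window (edge padding)."""
    image = image.astype(np.float64)
    if radius <= 0:
        return image
    k = 2 * radius + 1
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    # Summed-area table with a leading zero row/column gives O(1) window sums.
    sat = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0), (0, 0)))
    h, w = image.shape[:2]
    window = sat[k:k + h, k:k + w] - sat[:h, k:k + w] - sat[k:k + h, :w] + sat[:h, :w]
    return window / (k * k)


def localized_blur(image, heatmap, threshold=0.5, radius=3):
    """Blur only pixels whose min-max normalized heatmap value is >= threshold."""
    image = image.astype(np.float64)
    heat = heatmap.astype(np.float64)
    span = heat.max() - heat.min()
    heat = (heat - heat.min()) / span if span > 0 else np.zeros_like(heat)
    mask = (heat >= threshold)[..., None]  # H x W x 1, broadcasts over channels
    return np.where(mask, box_blur(image, radius), image)


def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64, 3))
    heat = np.zeros((64, 64))
    heat[16:32, 16:32] = 1.0  # stand-in for a GradCAM "hot" region
    local = localized_blur(img, heat)
    uniform = box_blur(img, 3)
    print(f"PSNR localized: {psnr(img, local):.2f} dB, uniform: {psnr(img, uniform):.2f} dB")
```

Because every pixel the localized variant changes is changed exactly as the uniform blur would change it, its squared error is a subset of the uniform blur's, so its PSNR can never be lower. Whether the watermark is still removed is a separate question that needs the actual decoder. A soft, heatmap-weighted blend instead of the hard mask would be an equally plausible variant.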