Invisible Watermarks: Attacks and Robustness
Dongjun Hwang, Sungwon Woo, Tom Gao, Raymond Luo, Sunghwan Baek
TL;DR
The paper studies the robustness of invisible image watermarks used to distinguish real from AI-generated content. It introduces a remover network that allows two watermarking methods (Tree-Ring and StegaStamp) to be stacked without mutual interference, and a GradCAM-guided Localized Blurring Attack (LBA) that targets watermark regions while preserving image quality. On MS-COCO validation images, the remover improves detection after stacking, and LBA achieves better image quality than uniform blur, though regeneration attacks remain highly effective at removing watermarks while causing less perceptual degradation. These findings inform practical watermark design and suggest future work on broader attack-defense combinations and on restricting decoder access to limit targeted attacks.
Abstract
As generative AI becomes more accessible, the case for robust detection of generated images to combat misinformation is stronger than ever. Invisible watermarking methods act as identifiers of generated content, embedding image-space and latent-space messages that are robust to many forms of perturbation. Most current research investigates full-image attacks against images carrying a single watermark. We introduce improvements to watermark robustness and a way to reduce image-quality degradation during attacks. First, we examine applying both an image-space and a latent-space watermarking method to a single image, and propose a custom watermark remover network that preserves one watermarking modality while completely removing the other during decoding. Second, we investigate localized blurring attacks (LBA) on watermarked images, guided by the GradCAM heatmap obtained from the watermark decoder, in order to reduce the degradation of the target image. Our evaluation suggests that 1) using the watermark remover model to preserve one watermark modality while decoding the other slightly improves on baseline performance, and that 2) LBA degrades the image significantly less than uniform blurring of the entire image. Code is available at: https://github.com/tomputer-g/IDL_WAR
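The idea behind a localized blurring attack can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `box_blur` and `localized_blur`, the box filter, the hard threshold, and the toy 4x4 grayscale image are all assumptions made for illustration, and the heatmap is taken as given rather than computed with GradCAM from a real watermark decoder.

```python
# Illustrative sketch only (hypothetical names, not from the paper's code).
# Images and heatmaps are nested lists of floats, indexed [row][col].

def box_blur(img, radius=1):
    """Mean filter over a (2*radius+1)^2 window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            rows = range(max(0, y - radius), min(h, y + radius + 1))
            cols = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in rows for i in cols]
            out[y][x] = sum(vals) / len(vals)
    return out


def localized_blur(img, heatmap, threshold=0.5, radius=1):
    """Blur only pixels whose heatmap value reaches the threshold.

    `heatmap` is assumed to be normalized to [0, 1] and to have the same
    shape as `img`; all other pixels are copied through unchanged.
    """
    blurred = box_blur(img, radius)
    return [
        [b if m >= threshold else p for p, b, m in zip(prow, brow, mrow)]
        for prow, brow, mrow in zip(img, blurred, heatmap)
    ]


# Toy example: one bright pixel, and a heatmap that flags only that pixel.
img = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 9.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

out = localized_blur(img, heatmap)       # blurs only the flagged pixel
uniform = box_blur(img)                  # blurs every pixel
```

In this toy case the localized variant changes a single pixel, while the uniform blur also alters its neighbors, which is the intuition behind the reported image-quality advantage. A practical attack would more likely use a Gaussian kernel and a soft blend weighted by the heatmap instead of a hard threshold; those choices are not specified in the text above.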
