UnMarker: A Universal Attack on Defensive Image Watermarking

Andre Kassis, Urs Hengartner

TL;DR

UnMarker is the first practical attack on semantic watermarks, which have been deemed the future of defensive watermarking, and shows that defensive watermarking is not a viable defense against deepfakes.

Abstract

Reports regarding the misuse of Generative AI (GenAI) to create deepfakes are frequent. Defensive watermarking enables GenAI providers to hide fingerprints in their images and use them later for deepfake detection. Yet, its potential has not been fully explored. We present UnMarker -- the first practical universal attack on defensive watermarking. Unlike existing attacks, UnMarker requires no detector feedback, no unrealistic knowledge of the watermarking scheme or similar models, and no advanced denoising pipelines that may not be available. Instead, UnMarker builds on an in-depth analysis of the watermarking paradigm, which reveals that robust schemes must construct their watermarks in the spectral amplitudes, and employs two novel adversarial optimizations to disrupt the spectra of watermarked images, erasing the watermarks. Evaluations against SOTA schemes demonstrate UnMarker's effectiveness. It not only defeats traditional schemes while retaining superior quality compared to existing attacks but also breaks semantic watermarks that alter an image's structure, reducing the best detection rate to $43\%$ and rendering them useless. To our knowledge, UnMarker is the first practical attack on semantic watermarks, which have been deemed the future of defensive watermarking. Our findings show that defensive watermarking is not a viable defense against deepfakes, and we urge the community to explore alternatives.

Paper Structure (34 sections, 8 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Spectral analysis of two images. Both depict worldly objects of high consistency. Thus, their collective spectral magnitudes are distributed similarly, since the low frequencies (center), which correspond to gradual variations, are always far more dominant. Phases determine the spatial arrangement of the pixel value shifts that constitute these magnitudes to shape the content, so the phases of the two images are extremely different.
  • Figure 2: Images watermarked by StableSignature (non-semantic, top) and StegaStamp (semantic, bottom). The rightmost figures display the differences between the original and watermarked images that correspond to the changes encoding the watermarks. StableSignature's modifications are restricted to existing (high-frequency) edges such as wrinkles, hair, mustache, and intersections of multiple components. StegaStamp's watermark is distributed across the image. The magnified area shows how it manipulates the consistency (texture), injecting gradual (low-frequency) changes that manifest as wrinkles at this location.
  • Figure 3: Operation of known filters.
  • Figure 4: Outputs of the three removal attacks. The VAEAttack over-smooths images, resulting in them losing crucial information and occasionally looking cartoonish. Compared to the DiffusionAttack, UnMarker's outputs better resemble the watermarked images. The DiffusionAttack also introduces semantic incoherences such as additional teeth in the case of PTW and omits identifying details such as freckles on the forehead for StegaStamp. PTW's watermarked image itself is of inferior quality with an unnatural patch. The DiffusionAttack retains this artifact while UnMarker eliminates it, enhancing the image.
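
The magnitude/phase distinction described in the Figure 1 caption can be illustrated with a small NumPy sketch. This is not the paper's code or analysis pipeline; it is a toy example under stated assumptions. The helper names (`spectrum`, `low_freq_energy_fraction`, `gaussian_blob`) are hypothetical, and a synthetic Gaussian blob stands in for a smooth natural image. Circularly shifting the blob rearranges its content spatially, which leaves the spectral magnitudes unchanged and alters only the phases.

```python
import numpy as np

def spectrum(image):
    """Return (magnitude, phase) of the centered 2-D DFT of a grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(image))  # move the zero frequency to the center
    return np.abs(f), np.angle(f)

def low_freq_energy_fraction(magnitude, radius):
    """Fraction of spectral energy within `radius` bins of the center."""
    h, w = magnitude.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    energy = magnitude ** 2
    return energy[mask].sum() / energy.sum()

def gaussian_blob(size=64, sigma=8.0):
    """Hypothetical stand-in for a smooth image: one gradually varying blob."""
    yy, xx = np.mgrid[:size, :size]
    c = size // 2
    return np.exp(-((yy - c) ** 2 + (xx - c) ** 2) / (2 * sigma ** 2))

# Two "images" with the same content in a different spatial arrangement.
image = gaussian_blob()
shifted = np.roll(image, shift=(5, 9), axis=(0, 1))

mag_a, phase_a = spectrum(image)
mag_b, phase_b = spectrum(shifted)

# Magnitudes match, phases do not: the phase carries the spatial arrangement.
same_magnitude = np.allclose(mag_a, mag_b)
same_phase = np.allclose(phase_a, phase_b)

# Smooth content concentrates its energy in the low frequencies (center).
low_fraction = low_freq_energy_fraction(mag_a, radius=8)

# Magnitude and phase together determine the image exactly.
rebuilt = np.fft.ifft2(np.fft.ifftshift(mag_a * np.exp(1j * phase_a))).real

print(same_magnitude, same_phase, low_fraction)
```

Real photographs are not shifted copies of one another, so their magnitudes only resemble each other statistically rather than match exactly; the sketch isolates the underlying mechanism in its simplest form.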