Pixel Seal: Adversarial-only training for invisible image and video watermarking

Tomáš Souček, Pierre Fernandez, Hady Elsahar, Sylvestre-Alvise Rebuffi, Valeriu Lacatusu, Tuan Tran, Tom Sander, Alexandre Mourachko

TL;DR

Pixel Seal removes the reliance on proxy perceptual losses (such as MSE and LPIPS) and stabilizes watermark training through an adversarial-only objective and a three-stage training schedule, complemented by high-resolution adaptation that closes the resolution gap. It achieves state-of-the-art robustness and imperceptibility for image watermarking and extends to fast, scalable video watermarking via temporal watermark pooling, without retraining. Experiments and ablations across diverse transformations and high-resolution content support its use for the provenance of real-world images and videos. The release of the model weights further enables deployment and research in robust, invisible watermarking.

Abstract

Invisible watermarking is essential for tracing the provenance of digital content. However, training state-of-the-art models remains notoriously difficult, with current approaches often struggling to balance robustness against true imperceptibility. This work introduces Pixel Seal, which sets a new state-of-the-art for image and video watermarking. We first identify three fundamental issues of existing methods: (i) the reliance on proxy perceptual losses such as MSE and LPIPS that fail to mimic human perception and result in visible watermark artifacts; (ii) the optimization instability caused by conflicting objectives, which necessitates exhaustive hyperparameter tuning; and (iii) reduced robustness and imperceptibility of watermarks when scaling models to high-resolution images and videos. To overcome these issues, we first propose an adversarial-only training paradigm that eliminates unreliable pixel-wise imperceptibility losses. Second, we introduce a three-stage training schedule that stabilizes convergence by decoupling robustness and imperceptibility. Third, we address the resolution gap via high-resolution adaptation, employing JND-based attenuation and training-time inference simulation to eliminate upscaling artifacts. We thoroughly evaluate the robustness and imperceptibility of Pixel Seal on different image types and across a wide range of transformations, and show clear improvements over the state-of-the-art. We finally demonstrate that the model efficiently adapts to video via temporal watermark pooling, positioning Pixel Seal as a practical and scalable solution for reliable provenance in real-world image and video settings.
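The adversarial-only paradigm, together with the training setup in Figure 2, keeps just two kinds of terms: an adversarial loss and a message loss. The NumPy sketch below computes such losses from precomputed discriminator and extractor logits, with no MSE or LPIPS term. The hinge form of the adversarial loss, the weights `lambda_adv` and `lambda_msg`, and every function name here are assumptions made for illustration, not the paper's exact formulation. The sketch also leaves out backpropagation through the simulated attack, which requires an autodiff framework.

```python
import numpy as np


def bce_with_logits(logits, bits):
    """Mean binary cross-entropy between extractor logits and {0, 1} message bits."""
    logits = np.asarray(logits, dtype=float)
    bits = np.asarray(bits, dtype=float)
    # Numerically stable form of -[b*log(sigmoid(z)) + (1 - b)*log(1 - sigmoid(z))]
    return float(np.mean(np.maximum(logits, 0.0) - logits * bits
                         + np.log1p(np.exp(-np.abs(logits)))))


def discriminator_loss(d_real, d_fake):
    """Assumed hinge loss: original images should score >= 1, watermarked ones <= -1."""
    return float(np.mean(np.maximum(0.0, 1.0 - np.asarray(d_real, dtype=float)))
                 + np.mean(np.maximum(0.0, 1.0 + np.asarray(d_fake, dtype=float))))


def embedder_loss(d_fake, msg_logits, msg_bits, lambda_adv=1.0, lambda_msg=1.0):
    """Embedder/extractor objective: adversarial + message terms only (no MSE or LPIPS)."""
    # Adversarial term: raise discriminator scores on watermarked images.
    adv = -float(np.mean(np.asarray(d_fake, dtype=float)))
    # Message term: the extractor (applied after the simulated attack) recovers the bits.
    msg = bce_with_logits(msg_logits, msg_bits)
    return lambda_adv * adv + lambda_msg * msg


if __name__ == "__main__":
    bits = np.array([1, 0, 1, 1])
    print(embedder_loss(d_fake=[0.5, 1.5], msg_logits=[2.0, -2.0, 2.0, 2.0], msg_bits=bits))
```

Under this objective, imperceptibility comes only from the discriminator, which must separate original from watermarked images, rather than from a pixel-wise distance to the original.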

Paper Structure

This paper contains 15 sections, 10 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Imperceptibility and robustness of multi-bit image watermarking methods. The Pixel Seal family of models sets new state-of-the-art results for multi-bit image watermarking, both in watermark imperceptibility and robustness. The figure shows average values across 1000 test images generated by Meta AI. The robustness is measured for a combined attack of brightness change (0.5), crop (50% area), and JPEG compression (quality 40). Imperceptibility of Video Seal 0.0 is heavily skewed due to its small but very visible artifacts. Each Pixel Seal model is trained with a different watermark boosting factor $\beta$ (see the method section on adversarial training for more details).
  • Figure 2: Our training setup. Pixel Seal is trained using only an adversarial loss and a message loss, which results in highly imperceptible watermarks. The message loss is backpropagated through the simulated attack (image augmentation) to ensure the embedded watermarks are robust to common user edits.
  • Figure 3: Training and inference of Pixel Seal on high-resolution images. The input image is first resized to the model resolution (256$\times$256) and the raw watermark is computed using the Pixel Seal embedder. Then, this watermark is resized to the original input resolution and pixel-wise multiplied by the Just-Noticeable Difference (JND) map to obtain the final high-resolution watermark.
  • Figure 4: The embedder with temporal watermark pooling enabled. During inference, a temporal average pooling and un-pooling layer is inserted into the embedder. This modification results in a significant speedup for video watermarking, with no impact on imperceptibility or robustness.
  • Figure 5: Comparison with related work on an AI-generated image. We show both the watermarked image (top) and the predicted watermark brightened for clarity (bottom). Many related methods leave visible artifacts in areas with a single color. In contrast, Pixel Seal does not leave visible artifacts in such areas while being more robust to various transformations. More examples are available in the appendix.
  • ...and 4 more figures
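Figure 3 describes how a model operating at 256×256 is applied to high-resolution images. The NumPy sketch below follows those steps: downscale the input, compute a raw watermark, upscale the watermark to the original resolution, attenuate it pixel-wise with a JND map, and add it to the image. `toy_embedder` is a hypothetical random-pattern stand-in for the learned embedder. `jnd_proxy` is a simplified gradient-based assumption, not a real JND model. Computing the JND map on the original-resolution image is also an assumption.

```python
import numpy as np


def resize_bilinear(img, out_h, out_w):
    """Bilinear resize of an (H, W, C) array using half-pixel sample centers."""
    in_h, in_w = img.shape[:2]
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, in_h - 1), np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def jnd_proxy(img, base=0.02, gain=0.5):
    """Hypothetical JND map: small in flat regions, larger near edges and texture."""
    gy, gx = np.gradient(img.mean(axis=2))
    return (base + gain * np.sqrt(gx ** 2 + gy ** 2))[..., None]  # (H, W, 1)


def toy_embedder(img_small, bits):
    """Hypothetical stand-in for the learned embedder: a bit-seeded pattern in (-1, 1)."""
    seed = int("".join(str(int(b)) for b in bits), 2)
    return np.tanh(np.random.default_rng(seed).standard_normal(img_small.shape))


def watermark_high_res(img, bits, model_res=256, strength=1.0):
    """img: (H, W, C) in [0, 1]. Returns the watermarked image and the final watermark."""
    h, w = img.shape[:2]
    small = resize_bilinear(img, model_res, model_res)  # 1. resize to model resolution
    raw_wm = toy_embedder(small, bits)                   # 2. raw low-resolution watermark
    wm = resize_bilinear(raw_wm, h, w)                   # 3. upscale the watermark only
    wm = wm * jnd_proxy(img)                             # 4. pixel-wise JND attenuation
    return np.clip(img + strength * wm, 0.0, 1.0), wm


if __name__ == "__main__":
    image = np.random.default_rng(1).random((720, 1280, 3))
    marked, watermark = watermark_high_res(image, bits=[1, 0, 1, 1, 0, 0, 1, 0])
    print(marked.shape, float(np.abs(watermark).max()))
```

In this sketch, only the watermark is upscaled, never the image, so the original pixels are left untouched apart from the added residual. The JND multiplication suppresses the watermark in uniform regions, where the Figure 5 caption notes that visible artifacts tend to appear.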
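Figure 4 states that, during inference, a temporal average pooling and un-pooling layer is inserted into the embedder. The NumPy sketch below illustrates this idea under simplifying assumptions. Frames are averaged in non-overlapping groups of `k`, here before the embedder rather than at an internal layer. The embedder then runs once per group, and its output is repeated back onto every frame of that group. The names `watermark_video`, `embed_fn`, `k`, and `make_counting_embedder` are hypothetical.

```python
import numpy as np


def temporal_pool(frames, k):
    """Average non-overlapping groups of k frames; the last group may be shorter."""
    return np.stack([frames[i:i + k].mean(axis=0) for i in range(0, frames.shape[0], k)])


def temporal_unpool(pooled, k, num_frames):
    """Repeat each pooled item k times, then trim to the original frame count."""
    return np.repeat(pooled, k, axis=0)[:num_frames]


def watermark_video(frames, embed_fn, k=4):
    """frames: (T, H, W, C) in [0, 1]; embed_fn maps a batch of frames to watermarks."""
    pooled = temporal_pool(frames, k)                     # (ceil(T / k), H, W, C)
    wm = embed_fn(pooled)                                 # one embedder pass per group
    wm_full = temporal_unpool(wm, k, frames.shape[0])     # back to (T, H, W, C)
    return np.clip(frames + wm_full, 0.0, 1.0)


def make_counting_embedder(strength=0.01):
    """Hypothetical stand-in embedder that records how many frames it processes."""
    calls = []

    def embed(batch):
        calls.append(batch.shape[0])
        return np.full_like(batch, strength)

    return embed, calls


if __name__ == "__main__":
    embed, calls = make_counting_embedder()
    video = np.random.default_rng(0).random((10, 8, 8, 3)) * 0.9
    out = watermark_video(video, embed, k=4)
    print(out.shape, calls)
```

With `k = 4`, the embedder in this simplified form processes roughly a quarter as many frames as the video contains, which illustrates where the speedup for video watermarking comes from. The trade-off is that all frames in a group share one watermark.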