Learning Generalizable and Efficient Image Watermarking via Hierarchical Two-Stage Optimization

Ke Liu, Xuanhan Wang, Qilong Zhang, Lianli Gao, Jingkuan Song

TL;DR

The paper tackles the challenge of learning-based image watermarking that is simultaneously invisible, robust, and broadly applicable. It introduces Hierarchical Watermark Learning (HiWL), a two-stage framework: distribution alignment fuses watermark messages with cover images in a latent space, and generalized watermark representation learning via RGB residuals enables one-shot embedding across diverse images. Empirical results show that HiWL achieves about 7.6% higher watermark extraction accuracy than prior methods and can process 1000 images in 1 second, while maintaining high invisibility (PSNR around $37.86$ dB, SSIM around $0.969$) and strong robustness under 18 distortion types and cross-domain transfers. This two-stage design provides a scalable, low-latency solution for practical watermarking across datasets and transformation scenarios.
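The two headline metrics, PSNR for invisibility and bit accuracy for extraction, can be sketched as follows. This is a minimal illustration using the standard definitions, not the authors' evaluation code: the function names, the 8-bit pixel range, and the zero decision threshold on decoder outputs are assumptions.

```python
import numpy as np


def psnr(cover, marked, max_val=255.0):
    """Peak signal-to-noise ratio in dB between a cover and a watermarked image.

    Assumes both arrays share a shape and a pixel range of [0, max_val].
    """
    cover = np.asarray(cover, dtype=np.float64)
    marked = np.asarray(marked, dtype=np.float64)
    mse = np.mean((cover - marked) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no distortion at all
    return float(10.0 * np.log10(max_val ** 2 / mse))


def bit_accuracy(sent_bits, decoded_scores, threshold=0.0):
    """Fraction of message bits recovered correctly.

    decoded_scores are raw decoder outputs; a score above `threshold`
    is read as bit 1 (the threshold is an assumption of this sketch).
    """
    sent = np.asarray(sent_bits).astype(bool)
    decoded = np.asarray(decoded_scores) > threshold
    return float(np.mean(sent == decoded))
```

A higher PSNR means the watermarked image is closer to the cover, so the reported value of about $37.86$ dB corresponds to a small mean squared pixel error.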

Abstract

Deep image watermarking, which refers to enabling imperceptible watermark embedding and reliable extraction in cover images, has been shown to be effective for copyright protection of image assets. However, existing methods face limitations in simultaneously satisfying three essential criteria for generalizable watermarking: (1) invisibility (imperceptible hiding of watermarks), (2) robustness (reliable watermark recovery under diverse conditions), and (3) broad applicability (low latency in the watermarking process). To address these limitations, we propose a Hierarchical Watermark Learning (HiWL) framework, a two-stage optimization that enables a watermarking model to simultaneously achieve all three criteria. In the first stage, distribution alignment learning is designed to establish a common latent space with two constraints: (1) visual consistency between watermarked and non-watermarked images, and (2) information invariance across watermark latent representations. In this way, multimodal inputs -- including watermark messages (binary codes) and cover images (RGB pixels) -- can be effectively represented, ensuring both the invisibility of watermarks and robustness in the watermarking process. In the second stage, we employ generalized watermark representation learning to separate a unique representation of the watermark from the marked image in RGB space. Once trained, the HiWL model effectively learns generalizable watermark representations while maintaining broad applicability. Extensive experiments demonstrate the effectiveness of the proposed method. Specifically, it achieves 7.6% higher accuracy in watermark extraction compared to existing methods, while maintaining extremely low latency (processing 1000 images in 1 second).
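The low-latency claim is easier to picture with a small sketch of one-shot embedding. It assumes that, once training is done, the watermark for a given message can be treated as a single RGB residual that is simply added to each cover image. The function name `embed_batch`, the array shapes, and the random stand-in residual are hypothetical and only illustrate the idea; they are not the HiWL implementation.

```python
import numpy as np


def embed_batch(covers, residual):
    """Add one precomputed RGB residual to every cover image in a batch.

    covers:   uint8 array of shape (N, H, W, 3)
    residual: float array of shape (H, W, 3), reused for all N images
    """
    marked = covers.astype(np.float32) + residual[None, ...]
    # Round and clip back to the valid 8-bit pixel range.
    return np.clip(np.rint(marked), 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
covers = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
# Hypothetical stand-in for a learned residual; a real one would come from the trained model.
residual = rng.uniform(-2.0, 2.0, size=(32, 32, 3)).astype(np.float32)
marked = embed_batch(covers, residual)
```

Under this assumption, embedding costs one broadcast addition per batch with no per-image network pass, which is consistent with the throughput the abstract reports.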

Paper Structure

This paper contains 16 sections, 10 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: Deep watermarking paradigms, including the latent-based and the single-shot paradigms. Existing methods do not satisfy invisibility, robustness, and broad applicability simultaneously.
  • Figure 2: Visualization of generated watermarked images, produced separately by UDH, MuST, and ours. "Vanilla" indicates the cover image.
  • Figure 3: Overview of the proposed HiWL in the training phase (left) and inference phase (right). Training involves a two-stage optimization. In the first stage, the image reconstruction loss $\mathcal{L}_I$, adversarial loss $\mathcal{L}_A$, and message reconstruction loss $\mathcal{L}_M$ are jointly used to align cover images and watermark messages. In the second stage, multi-image adaptation is designed to learn generalized RGB watermarks.
  • Figure 4: Visualization of gradient heatmaps in the decoder $\mathcal{F}_{de}(\cdot)$.
  • Figure 5: The structure diagram of the proposed method (HiWL), which consists of an encoder, a decoder, and a discriminator.
  • ...and 6 more figures
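
The Figure 3 caption says the three first-stage losses are used jointly. One common way to combine such terms is a weighted sum, sketched below. The weights $\lambda_I$, $\lambda_A$, and $\lambda_M$ and the label $\mathcal{L}_{\text{stage 1}}$ are assumptions introduced for illustration; this page does not state how the paper balances the terms.

$$\mathcal{L}_{\text{stage 1}} = \lambda_I \mathcal{L}_I + \lambda_A \mathcal{L}_A + \lambda_M \mathcal{L}_M$$

Under this reading, $\mathcal{L}_I$ and $\mathcal{L}_A$ push the watermarked image toward the cover image (invisibility), while $\mathcal{L}_M$ pushes the decoded message toward the embedded one (robust recovery).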