END$^2$: Robust Dual-Decoder Watermarking Framework Against Non-Differentiable Distortions

Nan Sun, Han Fang, Yuxing Lu, Chengxin Zhao, Hefei Ling

TL;DR

END$^2$ addresses non-differentiable distortions in DNN-based image watermarking with a dual-decoder framework: gradients reach the encoder only through a Teacher Decoder, while a Student Decoder learns to handle distorted inputs. The two decoders' latent features are aligned on a hypersphere by maximizing their cosine similarity, and swapping learning plus momentum updating keep the representations consistent across decoders. Empirically, END$^2$ outperforms state-of-the-art methods under non-differentiable distortions, including real JPEG compression and black-box style-transfer distortions, and remains competitive with pipelines that use a differentiable noise layer. This makes watermarking more practical and generalizable in real-world scenarios where the distortion mechanism is unknown.

Abstract

DNN-based watermarking methods have rapidly advanced, with the "Encoder-Noise Layer-Decoder" (END) framework being the most widely used. To ensure end-to-end training, the noise layer in the framework must be differentiable. However, real-world distortions are often non-differentiable, which makes end-to-end training difficult. Existing solutions treat the distortion perturbation only as additive noise, which does not fully integrate the effect of the distortion into training. To better incorporate non-differentiable distortions into training, we propose a novel dual-decoder architecture (END$^2$). Unlike the conventional END architecture, our method employs two structurally identical decoders: the Teacher Decoder, which processes pure watermarked images, and the Student Decoder, which handles distortion-perturbed images. The gradient is backpropagated only through the Teacher Decoder branch to optimize the encoder, thus bypassing the problem of non-differentiability. To ensure resistance to arbitrary distortions, we enforce alignment of the two decoders' feature representations by maximizing the cosine similarity between their intermediate vectors on a hypersphere. Extensive experiments demonstrate that our scheme outperforms state-of-the-art algorithms under various non-differentiable distortions. Moreover, even without the differentiability constraint, our method surpasses baselines with a differentiable noise layer. Our approach is effective and easily implementable across all END architectures, enhancing practicality and generalizability.
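
The alignment idea in the abstract can be illustrated with a minimal sketch. It assumes the loss takes the form of one minus the cosine similarity between L2-normalized feature vectors, which is a common choice but is not confirmed by the abstract. The function names `l2_normalize` and `cosine_alignment_loss` are hypothetical, and plain Python lists stand in for the decoders' intermediate vectors. In an autograd framework, the distorted image fed to the Student Decoder would be treated as a constant, so no gradient would need to pass through the distortion.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Project a feature vector onto the unit hypersphere (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    # eps guards against division by zero for an all-zero vector.
    return [x / max(norm, eps) for x in vec]


def cosine_alignment_loss(teacher_feat, student_feat):
    """Assumed loss form: 1 - cos(teacher, student).

    Minimizing this value maximizes the cosine similarity between the two
    decoders' intermediate vectors. It is 0 when the vectors point the same
    way, 1 when they are orthogonal, and 2 when they are opposite.
    """
    t = l2_normalize(teacher_feat)
    s = l2_normalize(student_feat)
    cos_sim = sum(a * b for a, b in zip(t, s))
    return 1.0 - cos_sim


if __name__ == "__main__":
    # Illustrative values only, not taken from the paper.
    teacher = [0.5, 1.0, -0.25]   # features from the clean watermarked image
    student = [0.4, 1.1, -0.20]   # features from the distorted image
    print(cosine_alignment_loss(teacher, student))
```

Because both vectors are normalized first, the loss depends only on their direction, not their magnitude, which is what alignment "on a hypersphere" suggests.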

Paper Structure

This paper contains 16 sections, 11 equations, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: The structures of the different methods. (a) The standard END structure, which requires a differentiable noise layer to maintain joint optimization of the model. (b) Our proposed END$^2$ structure. Green arrows show the direction in which the gradient propagates during backpropagation, and the red arrow marks a gradient-free process.
  • Figure 2: Invisibility and robustness of our model under real JPEG compression and four style transfer distortions. The second row depicts the original image, while the third row shows the embedded watermarked image. The fourth row illustrates the watermarked image after being subjected to non-differentiable distortion. The final row shows the residual image, which is the difference between the watermarked and original images, magnified by a factor of $10$ for enhanced visibility.
  • Figure 3: Average decoding accuracy of different models under four style transfer distortions as PSNR varies.