END$^2$: Robust Dual-Decoder Watermarking Framework Against Non-Differentiable Distortions
Nan Sun, Han Fang, Yuxing Lu, Chengxin Zhao, Hefei Ling
TL;DR
END$^2$ addresses the challenge of non-differentiable distortions in DNN-based image watermarking by introducing a dual-decoder framework that routes gradients through a Teacher Decoder while a Student Decoder learns to handle distorted inputs. It aligns the two decoders' latent representations on a hypersphere by maximizing cosine similarity, and employs swapping learning plus momentum updating to keep the representations consistent across decoders. Empirically, END$^2$ outperforms state-of-the-art methods under non-differentiable distortions and remains competitive with differentiable-noise pipelines, including strong performance on real JPEG and black-box style-transfer distortions. This approach enhances the practicality and generalizability of watermarking in real-world scenarios with unknown distortion mechanisms.
Abstract
DNN-based watermarking methods have rapidly advanced, with the ``Encoder-Noise Layer-Decoder'' (END) framework being the most widely used. To ensure end-to-end training, the noise layer in the framework must be differentiable. However, real-world distortions are often non-differentiable, which makes end-to-end training difficult. Existing solutions treat the distortion perturbation only as additive noise, which does not fully integrate the effect of the distortion into training. To better incorporate non-differentiable distortions into training, we propose a novel dual-decoder architecture (END$^2$). Unlike the conventional END architecture, our method employs two structurally identical decoders: the Teacher Decoder, which processes pure watermarked images, and the Student Decoder, which handles distortion-perturbed images. The gradient is backpropagated only through the Teacher Decoder branch to optimize the encoder, thus bypassing the problem of non-differentiability. To ensure resistance to arbitrary distortions, we enforce alignment of the two decoders' feature representations by maximizing the cosine similarity between their intermediate vectors on a hypersphere. Extensive experiments demonstrate that our scheme outperforms state-of-the-art algorithms under various non-differentiable distortions. Moreover, even without the differentiability constraint, our method surpasses baselines with a differentiable noise layer. Our approach is effective and easily implementable across all END architectures, enhancing practicality and generalizability.
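
The feature-alignment idea described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `l2_normalize` and `cosine_alignment_loss`, the loss form $1 - \cos$, and the toy vectors are all assumptions made for illustration. The sketch only shows what maximizing cosine similarity on a hypersphere means for two intermediate vectors; it does not model the decoders or the gradient routing through the Teacher Decoder.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length, i.e. project it onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def cosine_alignment_loss(teacher_feat, student_feat):
    """Assumed loss form: 1 - cosine similarity.

    Minimizing this value is equivalent to maximizing the cosine similarity
    between the two feature vectors; it is 0 when their directions coincide.
    """
    t = l2_normalize(teacher_feat)
    s = l2_normalize(student_feat)
    cos = sum(a * b for a, b in zip(t, s))
    return 1.0 - cos


# Hypothetical intermediate vectors (toy values, not taken from the paper).
teacher = [0.5, -1.0, 2.0]
student_aligned = [1.0, -2.0, 4.0]    # same direction as teacher, different scale
student_orthogonal = [2.0, 1.0, 0.0]  # dot product with teacher is exactly 0

loss_aligned = cosine_alignment_loss(teacher, student_aligned)
loss_orthogonal = cosine_alignment_loss(teacher, student_orthogonal)
print(loss_aligned, loss_orthogonal)
```

Because both vectors are normalized before the dot product, the loss depends only on direction and not on magnitude, which is why the rescaled student vector incurs essentially zero loss while the orthogonal one incurs a loss of about 1.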
