Sim-to-Real: An Unsupervised Noise Layer for Screen-Camera Watermarking Robustness
Yufeng Wu, Xin Liao, Baowei Wang, Han Fang, Xiaoshuai Wu, Mingyue Chen, Guiling Wang
TL;DR
This work tackles information leakage through unauthorized screen capture by improving the robustness of screen-camera (SC) resistant watermarking. It introduces Simulation-to-Real (S2R), an unsupervised noise layer that bridges simulated and real SC noise in two stages. First, a mathematical model $T$ produces noise in a certain (explicitly modeled) domain. Then an unpaired image-to-image map $G$ aligns this noise with the real domain, yielding $y^u = G(T(x^s))$, i.e., $F_\mathcal{U}(\cdot) = G \circ T$. A theoretical feasibility proof shows that the complex SC noise distribution can be decomposed into a multiplicative/additive form and approximated by learned biases $k_\delta$ and $n_\delta$, which simplifies the distribution-alignment task. The framework trains $G$ with a differentiable noise model together with adversarial and perceptual losses, enabling robust, content-preserving noise refinement without paired data. Experiments show that S2R outperforms state-of-the-art methods in both watermark robustness and image quality across diverse devices, distances, and viewpoints, and that it integrates in a scalable, plug-and-play manner with different noise models and resolutions. The approach thus offers a practical, generalizable path toward real-world SC watermark protection, with reduced data requirements and improved generalization.
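The two-stage composition $F_\mathcal{U} = G \circ T$ can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: `simulate_t`, `make_refiner`, and `s2r_noise_layer` are hypothetical names; $T$ is reduced to a single gain plus Gaussian noise; and $G$, which in S2R is a network trained adversarially on unpaired captures, is replaced by fixed gain and offset corrections (`gain_delta`, `offset_delta`) that play a role analogous to $k_\delta$ and $n_\delta$.

```python
import numpy as np

def simulate_t(x, k, n_sigma, rng):
    # Hypothetical stand-in for the mathematical noise model T:
    # a multiplicative gain k plus additive Gaussian noise (std n_sigma).
    # The T used in the paper models far more SC effects than this.
    n = rng.normal(0.0, n_sigma, size=x.shape)
    return np.clip(k * x + n, 0.0, 1.0)

def make_refiner(gain_delta, offset_delta):
    # Hypothetical stand-in for the learned map G. Rather than regenerating
    # the image, it only applies small multiplicative/additive corrections,
    # i.e. it models the gap between simulated and real noise.
    # In S2R these corrections would come from a trained network.
    def g(y_sim):
        return np.clip((1.0 + gain_delta) * y_sim + offset_delta, 0.0, 1.0)
    return g

def s2r_noise_layer(x_s, g, k=0.9, n_sigma=0.02, rng=None):
    # F_U = G ∘ T: simulate first, then refine toward the real domain.
    rng = np.random.default_rng(0) if rng is None else rng
    return g(simulate_t(x_s, k, n_sigma, rng))

# Usage: a toy "watermarked image" with values in [0, 1].
x = np.full((4, 4), 0.5)
g = make_refiner(gain_delta=0.1, offset_delta=0.02)
y_u = s2r_noise_layer(x, g, k=0.8, n_sigma=0.0)
# With n_sigma=0, each pixel is 0.8 * 0.5 * 1.1 + 0.02, i.e. about 0.46.
```

The design point the sketch tries to convey is that $G$ sees the output of $T$ and only corrects it, so the refinement stays close to the simulated image instead of having to reconstruct image content from scratch.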
Abstract
Unauthorized screen capturing and dissemination pose severe security threats such as data leakage and information theft. Several studies propose robust watermarking methods to track the copyright of Screen-Camera (SC) images, facilitating post-hoc certification against infringement. These techniques typically employ either heuristic mathematical modeling or supervised neural network fitting as the noise layer to enhance watermarking robustness against SC distortions. However, neither strategy can fundamentally achieve an effective approximation of SC noise. Mathematical simulation suffers from biased approximations because the noise is decomposed incompletely and the interdependence among noise components is ignored. Supervised networks require paired data to train the noise-fitting model, and even then it is difficult for the model to learn all the features of the noise. To address these issues, we propose Simulation-to-Real (S2R). Specifically, its unsupervised noise layer uses unpaired data to learn the discrepancy between the modeled simulated noise distribution and the real-world SC noise distribution, rather than directly learning the mapping from sharp images to real-world images. Learning this transformation from simulation to reality is inherently simpler, as it primarily involves bridging the gap between noise distributions instead of the complex task of reconstructing fine-grained image details. Extensive experimental results validate the efficacy of the proposed method, demonstrating superior watermark robustness and generalization compared to state-of-the-art methods.
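Why learning the discrepancy is simpler can be made concrete with an illustrative multiplicative/additive model. This is a sketch under assumed notation, not the paper's proof: suppose a real SC capture of $x^s$ behaves as $y^{\mathrm{real}} = k \odot x^s + n$, and the simulation gives $y^{\mathrm{sim}} = T(x^s) = k^{\mathrm{sim}} \odot x^s + n^{\mathrm{sim}}$, where $k$, $n$, $k^{\mathrm{sim}}$, and $n^{\mathrm{sim}}$ are hypothetical symbols introduced here. Subtracting the two expressions term by term gives

$$
y^{\mathrm{real}} = y^{\mathrm{sim}} + \underbrace{(k - k^{\mathrm{sim}})}_{k_\delta} \odot x^s + \underbrace{(n - n^{\mathrm{sim}})}_{n_\delta} = y^{\mathrm{sim}} + k_\delta \odot x^s + n_\delta .
$$

Under this assumption, the refinement only has to model the residual $k_\delta \odot x^s + n_\delta$, the biases named in the TL;DR, which is small whenever $T$ is a reasonable approximation, rather than synthesizing $y^{\mathrm{real}}$ from $x^s$ directly.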
