Robust Invisible Video Watermarking with Attention
Kevin Alex Zhang, Lei Xu, Alfredo Cuesta-Infante, Kalyan Veeramachaneni
TL;DR
The paper tackles robust, invisible video watermarking by introducing RivaGAN, an end-to-end architecture that uses a per-pixel attention mechanism to embed a D-bit watermark into video frames. The encoder–decoder pair is augmented with an attention module that guides bit embedding at the pixel level, and it is trained jointly with two adversarial networks: a critic that evaluates video quality and an adversary that pushes the watermark toward robustness. Differentiable noise layers simulating scaling, cropping, and compression are applied during training to enforce resilience. In experiments on the Hollywood2 dataset, the method achieves high decoding accuracy with minimal perceptual distortion, outperforms concatenation-based baselines, and remains decodable after common video processing operations. These results, along with analyses of bit-level influence and temporal consistency, indicate practical viability for secure, blind watermark recovery. The authors provide public code for replication.
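To make the idea of per-pixel attention over a D-bit message concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: in RivaGAN the attention scores and the embedding are produced by learned networks, whereas here the scores are random numbers and the embedding rule (`embed`, with a hypothetical `strength` parameter) is a hand-written stand-in. All function names are illustrative assumptions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_bits(logits, bits):
    """Collapse a D-bit message into one value per pixel.

    logits: H x W x D nested lists of per-pixel scores (assumed to come from
            an attention network; random in the demo below).
    bits:   list of D values in {0, 1}.
    Returns (attention, attended): the per-pixel weights over the D bits and
    the H x W map of attention-weighted bit values.
    """
    attention = [[softmax(pixel) for pixel in row] for row in logits]
    attended = [
        [sum(w * b for w, b in zip(pixel, bits)) for pixel in row]
        for row in attention
    ]
    return attention, attended


def embed(frame, attended, strength=0.01):
    """Add a small residual to an H x W grayscale frame with values in [0, 1].

    Assumption: a fixed linear rule replaces the learned encoder.
    """
    return [
        [min(1.0, max(0.0, px + strength * (2.0 * a - 1.0)))
         for px, a in zip(frame_row, att_row)]
        for frame_row, att_row in zip(frame, attended)
    ]


rng = random.Random(0)
H, W, D = 4, 4, 8
bits = [rng.randint(0, 1) for _ in range(D)]
logits = [[[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(W)]
          for _ in range(H)]
frame = [[rng.random() for _ in range(W)] for _ in range(H)]

attention, attended = attend_bits(logits, bits)
watermarked = embed(frame, attended)
max_change = max(
    abs(a - b)
    for row_a, row_b in zip(frame, watermarked)
    for a, b in zip(row_a, row_b)
)
print(f"max per-pixel change: {max_change:.4f}")
```

Because the softmax weights at each pixel sum to one, each pixel carries a convex combination of the message bits, so different pixels can specialize in different bits. The small `strength` bounds the per-pixel change, which is the sketch's crude proxy for invisibility.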
Abstract
The goal of video watermarking is to embed a message within a video file in a way such that it minimally impacts the viewing experience but can be recovered even if the video is redistributed and modified, allowing media producers to assert ownership over their content. This paper presents RivaGAN, a novel architecture for robust video watermarking which features a custom attention-based mechanism for embedding arbitrary data as well as two independent adversarial networks which critique the video quality and optimize for robustness. Using this technique, we are able to achieve state-of-the-art results in deep learning-based video watermarking and produce watermarked videos which have minimal visual distortion and are robust against common video processing operations.
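Robustness to common video processing operations is usually pursued by distorting watermarked frames during training before they reach the decoder. The sketch below illustrates two such distortions, cropping and scaling, on a single grayscale frame in plain Python. It is an assumption-laden illustration rather than the paper's noise layers: the function names, the fixed half-size crop, and the 2x average-pool downscale are all hypothetical choices, no gradients are computed, and compression is omitted because it requires a differentiable approximation not described here.

```python
import random


def crop(frame, top, left, height, width):
    """Return the height x width window starting at (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def downscale_2x(frame):
    """Halve both dimensions by averaging each 2x2 block.

    Assumes the frame has an even number of rows and columns.
    """
    return [
        [(frame[r][c] + frame[r][c + 1]
          + frame[r + 1][c] + frame[r + 1][c + 1]) / 4.0
         for c in range(0, len(frame[0]), 2)]
        for r in range(0, len(frame), 2)
    ]


def random_distortion(frame, rng):
    """Apply one randomly chosen distortion: a half-size crop or a 2x downscale."""
    if rng.random() < 0.5:
        h, w = len(frame), len(frame[0])
        ch, cw = h // 2, w // 2
        top = rng.randint(0, h - ch)    # randint is inclusive on both ends
        left = rng.randint(0, w - cw)
        return crop(frame, top, left, ch, cw)
    return downscale_2x(frame)


rng = random.Random(0)
frame = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
distorted = random_distortion(frame, rng)
print(len(distorted), "x", len(distorted[0]))
```

In a training loop, a decoder would be asked to recover the message from `distorted` rather than from the clean watermarked frame, so that it cannot rely on features that cropping or rescaling destroys.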
