VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models
Xuming Hu, Hanqian Li, Jungang Li, Yu Huang, Shuliang Liu, Qi Zheng, Junhao Chen, Aiwei Liu
TL;DR
VideoMark introduces a distortion-free, robust watermarking framework for diffusion-based video generation by embedding watermarks into per-frame pure pseudorandom Gaussian noise using PRC codes and a fixed watermark sequence with a random start index. A Temporal Matching Module (TMM) based on edit distance aligns decoded messages with the embedded sequence, enabling resilience to temporal attacks such as frame deletion. The approach is training-free, maintains video quality comparable to watermark-free generation, and achieves higher decoding accuracy than prior in-processing and post-processing methods while remaining imperceptible without the secret key. This work enables practical content attribution for diffusion-based video generation with strong invisibility and robustness across attack scenarios.
Abstract
This work introduces VideoMark, a distortion-free robust watermarking framework for video diffusion models. As diffusion models excel in generating realistic videos, reliable content attribution is increasingly critical. However, existing video watermarking methods often introduce distortion by altering the initial distribution of diffusion variables and are vulnerable to temporal attacks, such as frame deletion, due to variable video lengths. VideoMark addresses these challenges by employing a pure pseudorandom initialization to embed watermarks, avoiding distortion while ensuring uniform noise distribution in the latent space to preserve generation quality. To enhance robustness, we adopt a frame-wise watermarking strategy with pseudorandom error correction (PRC) codes, using a fixed watermark sequence with randomly selected starting indices for each video. For watermark extraction, we propose a Temporal Matching Module (TMM) that leverages edit distance to align decoded messages with the original watermark sequence, ensuring resilience against temporal attacks. Experimental results show that VideoMark achieves higher decoding accuracy than existing methods while maintaining video quality comparable to watermark-free generation. The watermark remains imperceptible to attackers without the secret key, offering superior invisibility compared to other frameworks. VideoMark provides a practical, training-free solution for content attribution in diffusion-based video generation. Our code and data are available at https://github.com/KYRIE-LI11/VideoMark.
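
The temporal alignment idea behind the TMM can be illustrated with a minimal sketch. It is not the released implementation: the function names `edit_distance` and `match_window`, the use of integer message identifiers in place of decoded PRC messages, and the exhaustive search over start indices are all assumptions made for illustration. The sketch takes the per-frame messages decoded from a possibly attacked video and finds the window of the fixed watermark sequence with the smallest edit distance to them.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x from a
                            curr[j - 1] + 1,      # insert y into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def match_window(decoded, sequence, num_frames):
    """Return (start_index, distance) of the best-matching window.

    `sequence` is the fixed watermark sequence, `num_frames` is the number of
    frames originally generated, and `decoded` holds the messages recovered
    from the (possibly shortened) video. Hypothetical helper, not the
    authors' API.
    """
    best_start, best_dist = None, None
    for start in range(len(sequence) - num_frames + 1):
        window = sequence[start:start + num_frames]
        dist = edit_distance(decoded, window)
        if best_dist is None or dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist


# Toy example: 32 distinct messages, an 8-frame video starting at index 5,
# with the fourth frame deleted by an attacker.
sequence = list(range(32))
embedded = sequence[5:13]
attacked = embedded[:3] + embedded[4:]
print(match_window(attacked, sequence, num_frames=8))  # (5, 1)
```

In this toy case a single deleted frame costs one edit against the true window and at least two against any shifted window, which is why an edit-distance criterion tolerates frame deletion where a position-by-position comparison would not.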
