VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models

Xuming Hu, Hanqian Li, Jungang Li, Yu Huang, Shuliang Liu, Qi Zheng, Junhao Chen, Aiwei Liu

TL;DR

VideoMark introduces a distortion-free, robust watermarking framework for diffusion-based video generation by embedding watermarks into per-frame pure pseudorandom Gaussian noise using PRC codes and a fixed watermark sequence with a random start index. A Temporal Matching Module (TMM) based on edit distance aligns decoded messages with the embedded sequence, enabling resilience to temporal attacks such as frame deletion. The approach is training-free, maintains video quality comparable to watermark-free generation, and achieves higher decoding accuracy than prior in-processing and post-processing methods while remaining imperceptible without the secret key. This work enables practical content attribution for diffusion-based video generation with strong invisibility and robustness across attack scenarios.
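The embedding idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: plain pseudorandom bit lists stand in for PRC codewords, the helper names (`select_frame_messages`, `embed_bits_in_noise`, `recover_bits`) are hypothetical, and both the cyclic reading of the sequence and the sign-based embedding are assumptions. The sketch relies on one fact: if the bits are uniformly random, a half-normal magnitude multiplied by a bit-determined sign is again standard Gaussian.

```python
import random


def select_frame_messages(sequence, num_frames, rng):
    """Pick a random start index and return it with one message per frame.

    Assumption: the fixed sequence is read cyclically past its end.
    """
    n = len(sequence)
    start = rng.randrange(n)
    return start, [sequence[(start + i) % n] for i in range(num_frames)]


def embed_bits_in_noise(bits, rng):
    """Sign-based embedding (an assumption, not confirmed by the source).

    The magnitude is drawn as |N(0, 1)| and the sign is set by the bit,
    so uniformly random bits yield samples distributed as N(0, 1).
    """
    return [abs(rng.gauss(0.0, 1.0)) * (1.0 if b else -1.0) for b in bits]


def recover_bits(noise):
    """Read the bits back from the signs of a noiseless latent."""
    return [1 if x > 0 else 0 for x in noise]


rng = random.Random(0)
# Hypothetical fixed watermark sequence: 8 stand-in codewords of 16 bits each.
sequence = [[rng.getrandbits(1) for _ in range(16)] for _ in range(8)]
start, messages = select_frame_messages(sequence, 4, rng)
latents = [embed_bits_in_noise(m, rng) for m in messages]
recovered = [recover_bits(latent) for latent in latents]
```

In a real pipeline the latents would pass through the diffusion model and an inversion step before decoding, so the recovered signs would be noisy; that is the gap the error-correcting code is meant to close.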

Abstract

This work introduces VideoMark, a distortion-free robust watermarking framework for video diffusion models. As diffusion models excel in generating realistic videos, reliable content attribution is increasingly critical. However, existing video watermarking methods often introduce distortion by altering the initial distribution of diffusion variables and are vulnerable to temporal attacks, such as frame deletion, due to variable video lengths. VideoMark addresses these challenges by employing a pure pseudorandom initialization to embed watermarks, avoiding distortion while ensuring uniform noise distribution in the latent space to preserve generation quality. To enhance robustness, we adopt a frame-wise watermarking strategy with pseudorandom error correction (PRC) codes, using a fixed watermark sequence with randomly selected starting indices for each video. For watermark extraction, we propose a Temporal Matching Module (TMM) that leverages edit distance to align decoded messages with the original watermark sequence, ensuring resilience against temporal attacks. Experimental results show that VideoMark achieves higher decoding accuracy than existing methods while maintaining video quality comparable to watermark-free generation. The watermark remains imperceptible to attackers without the secret key, offering superior invisibility compared to other frameworks. VideoMark provides a practical, training-free solution for content attribution in diffusion-based video generation. Our code and data are available at https://github.com/KYRIE-LI11/VideoMark.
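To make the role of edit distance in temporal matching concrete, the following is a minimal sketch under stated assumptions, not the paper's TMM. Decoded per-frame messages are treated as opaque labels, the fixed sequence is assumed to be read cyclically, and `match_start_index` is a hypothetical helper that scans every candidate start index and keeps the window with the smallest Levenshtein distance to the decoded messages.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of comparable items."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute x with y (free on match)
            ))
        prev = curr
    return prev[-1]


def match_start_index(decoded, sequence, num_frames):
    """Return (start, distance) for the best-matching window.

    Assumption: windows wrap around the end of the fixed sequence.
    Ties are broken in favor of the smallest start index.
    """
    n = len(sequence)
    best = None
    for start in range(n):
        window = [sequence[(start + i) % n] for i in range(num_frames)]
        d = edit_distance(decoded, window)
        if best is None or d < best[1]:
            best = (start, d)
    return best


# Hypothetical fixed sequence of 8 messages; the video embeds 5 of them
# starting at index 3, and an attacker then deletes the second frame.
sequence = ["m0", "m1", "m2", "m3", "m4", "m5", "m6", "m7"]
decoded = ["m3", "m5", "m6", "m7"]
result = match_start_index(decoded, sequence, 5)
print(result)  # (3, 1)
```

Because a frame deletion costs a single edit operation, the correct window stays the closest match even though the decoded sequence is shorter than the embedded one; a position-by-position comparison would instead misalign every frame after the deletion.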

Paper Structure

This paper contains 16 sections, 12 equations, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: VideoMark outperforms VideoShield across three key metrics: message length, robustness, and invisibility.
  • Figure 2: The overall framework of VideoMark. During the watermark embedding phase, $\epsilon$ denotes the standard Gaussian noise sampled randomly. In the I2V task, the first video frame prompts the prediction of initial noise during watermark extraction.
  • Figure 3: The binary classification results under different watermarking algorithms.
  • Figure 4: The binary classification results under different watermarking algorithms.
  • Figure 5: The extraction accuracy and robustness of VideoMark against spatial tampering for varying message lengths.
  • ...and 2 more figures