Video Signature: Implicit Watermarking for Video Diffusion Models

Yu Huang; Junhao Chen; Shuliang Liu; Hanqian Li; Jungang Li; Qi Zheng; Aiwei Liu; Yi R. Fung; Xuming Hu

TL;DR

VidSig is an implicit watermarking method for video diffusion models: it fine-tunes a subset of the latent decoder so that multibit watermarks are embedded during video generation. It combines Perturbation-Aware Suppression (PAS), which pre-identifies and freezes perceptually sensitive layers, with a Temporal Alignment (TA) module that enforces inter-frame coherence, achieving high watermark extraction accuracy with minimal perceptual loss. The method outperforms post-generation baselines and naively extended image-based in-generation approaches in both watermark reliability and video quality, while adding little latency. It is also robust to tampering, remains stable across different frame counts and resolutions, and transfers to new models. In practice, VidSig offers a scalable, plug-in solution for ownership verification and provenance tracking of AI-generated videos in real-world deployments.

Abstract

The rapid development of Artificial Intelligence Generated Content (AIGC) has led to significant progress in video generation, but it has also raised serious concerns about intellectual property protection and reliable content tracing. Watermarking is a widely adopted solution to this issue, yet existing methods for video generation mainly follow a post-generation paradigm, which often fails to balance video quality against watermark extraction accuracy. Meanwhile, current in-generation methods that embed the watermark into the initial Gaussian noise usually incur substantial additional computation. To address these issues, we propose Video Signature (VidSig), an implicit watermarking method for video diffusion models that enables imperceptible and adaptive watermark integration during video generation with almost no extra latency. Specifically, we partially fine-tune the latent decoder, where Perturbation-Aware Suppression (PAS) pre-identifies and freezes perceptually sensitive layers to preserve visual quality. Beyond spatial fidelity, we further enhance temporal consistency by introducing a lightweight Temporal Alignment module that guides the decoder to generate coherent frame sequences during fine-tuning. Experimental results show that VidSig achieves the best trade-off among watermark extraction accuracy, video quality, and watermark latency. It also demonstrates strong robustness against both spatial and temporal tampering, and remains stable across different video lengths and resolutions, highlighting its practicality in real-world scenarios.
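
To make the idea of pre-identifying and freezing sensitive layers concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy "decoder" built from scalar affine layers, a fixed additive weight perturbation `eps`, and mean squared output change as the sensitivity score; the names `decode`, `layer_sensitivity`, and `split_layers` are hypothetical, and the abstract does not specify how PAS actually measures perceptual sensitivity.

```python
from typing import List, Tuple

Layer = Tuple[float, float]  # (weight, bias) of a toy scalar layer (assumption)


def decode(layers: List[Layer], latent: float) -> float:
    """Toy stand-in for a latent decoder: a chain of scalar affine layers."""
    h = latent
    for weight, bias in layers:
        h = weight * h + bias
    return h


def layer_sensitivity(layers: List[Layer], latents: List[float],
                      eps: float = 1e-2) -> List[float]:
    """Mean squared output change when each layer's weight is nudged by eps."""
    baseline = [decode(layers, z) for z in latents]
    scores = []
    for i, (weight, bias) in enumerate(layers):
        perturbed = list(layers)
        perturbed[i] = (weight + eps, bias)  # perturb one layer at a time
        outputs = [decode(perturbed, z) for z in latents]
        mse = sum((out - ref) ** 2 for out, ref in zip(outputs, baseline))
        scores.append(mse / len(latents))
    return scores


def split_layers(layers: List[Layer], latents: List[float],
                 num_frozen: int, eps: float = 1e-2):
    """Return (frozen, trainable) layer indices; the most sensitive are frozen."""
    scores = layer_sensitivity(layers, latents, eps)
    ranked = sorted(range(len(layers)), key=lambda i: scores[i], reverse=True)
    frozen = sorted(ranked[:num_frozen])
    trainable = sorted(ranked[num_frozen:])
    return frozen, trainable


# Hypothetical three-layer decoder and two probe latents.
toy_decoder = [(2.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
frozen, trainable = split_layers(toy_decoder, latents=[1.0, -1.0], num_frozen=1)
print(frozen, trainable)
```

In this toy setup the middle layer produces the largest output change under perturbation, so it is frozen and the remaining two layers would be the ones fine-tuned to carry the watermark. A real decoder would presumably need a perceptual metric on decoded frames rather than a scalar squared error.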
