LVMark: Robust Watermark for Latent Video Diffusion Models

MinHyuk Jang; Youngdong Jang; JaeHyeok Lee; Feng Yang; Gyeongrok Oh; Jongheon Jeong; Sangpil Kim

TL;DR

LVMark tackles ownership protection for video diffusion models by embedding imperceptible watermarks that remain robust under video distortions and model attacks. It fuses low-frequency information from a 3D discrete wavelet transform with RGB video features using cross-attention, and embeds watermarks by selectively modulating a subset of latent-decoder weights. A distortion layer and a composite training loss balance visual quality and bit accuracy, achieving up to 512-bit capacity with robust decoding. Empirically, LVMark outperforms existing approaches in temporal consistency and robustness, enabling reliable ownership tracking without compromising video fidelity.
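As a rough illustration of what "low-frequency information from a 3D discrete wavelet transform" refers to, the sketch below computes the single-level low-pass (LLL) subband of a grayscale video volume. It assumes an orthonormal Haar wavelet and even-sized dimensions. The summary does not state which wavelet family or decomposition level LVMark uses, and the function name `haar_lll_3d` is hypothetical, so this should be read as a toy example rather than the authors' implementation.

```python
import numpy as np


def haar_lll_3d(video: np.ndarray) -> np.ndarray:
    """Single-level 3D Haar low-frequency (LLL) subband of a (T, H, W) array.

    Assumption: orthonormal Haar filters; the wavelet actually used by
    LVMark is not specified in the text above.
    """
    if video.ndim != 3:
        raise ValueError("expected a (T, H, W) array")
    if any(dim % 2 for dim in video.shape):
        raise ValueError("all dimensions must be even for this sketch")

    out = video.astype(np.float64)
    # Apply the Haar low-pass filter [1/sqrt(2), 1/sqrt(2)] with stride 2
    # along time, height, and width in turn.
    for axis in range(3):
        even = np.take(out, np.arange(0, out.shape[axis], 2), axis=axis)
        odd = np.take(out, np.arange(1, out.shape[axis], 2), axis=axis)
        out = (even + odd) / np.sqrt(2.0)
    return out


# Toy input: 8 frames of 16x16 pixels, all ones.
video = np.ones((8, 16, 16))
lll = haar_lll_3d(video)
```

Each output coefficient summarizes a 2x2x2 block spanning two adjacent frames, which is why this subband carries temporally smoothed content. For a constant input, all signal energy ends up in the LLL subband.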

Abstract

Rapid advancements in video diffusion models have enabled the creation of realistic videos, raising concerns about unauthorized use and driving demand for techniques that protect model ownership. Existing watermarking methods, while effective for image diffusion models, do not account for temporal consistency, which leads to degraded video quality and reduced robustness against video distortions. To address this issue, we introduce LVMark, a novel watermarking method for video diffusion models. We propose a new watermark decoder tailored for generated videos that learns the consistency between adjacent frames. It ensures accurate message decoding, even under malicious attacks, by combining the low-frequency components of the 3D wavelet domain with the RGB features of the video. Additionally, our approach minimizes video quality degradation by using an importance-based weight modulation strategy to embed watermark messages in layers that have minimal impact on visual appearance. We optimize both the watermark decoder and the latent decoder of the diffusion model, effectively balancing the trade-off between visual quality and bit accuracy. Our experiments show that our method embeds invisible watermarks into video diffusion models and achieves robust decoding accuracy at 512-bit capacity, even under video distortions.
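Bit accuracy, the metric traded off against visual quality above, is conventionally the fraction of message bits that the decoder recovers correctly. The following is a minimal sketch under that assumption. The helper name `bit_accuracy` and the toy 512-bit message are illustrative and are not taken from the paper.

```python
def bit_accuracy(embedded, decoded):
    """Fraction of positions where the decoded bit equals the embedded bit."""
    if len(embedded) != len(decoded):
        raise ValueError("messages must have the same length")
    if not embedded:
        raise ValueError("messages must be non-empty")
    matches = sum(int(a == b) for a, b in zip(embedded, decoded))
    return matches / len(embedded)


# Toy 512-bit message (alternating 0/1) and a decoded copy with 4 flipped bits.
message = [i % 2 for i in range(512)]
decoded = list(message)
for i in (3, 100, 257, 511):
    decoded[i] ^= 1

acc = bit_accuracy(message, decoded)  # 508 of 512 bits match
```

A value of 1.0 means the full message was recovered, while a value near 0.5 is what random guessing would produce on a balanced message.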
