Investigating Memorization in Video Diffusion Models

Chen Chen; Enhuai Liu; Daochang Liu; Mubarak Shah; Chang Xu

TL;DR

This work addresses the privacy risk of memorization in video diffusion models (VDMs) by formulating disentangled definitions of content memorization and motion memorization and by introducing a dedicated metric for each. It develops Generalized SSCD (GSSCD) for frame-level content memorization and Optical Flow Similarity (OFS-k) for motion memorization, augmented with Natural Motion Filtering (NMF) to discount natural motions. A systematic, prompt-driven analysis across open-source VDMs on WebVid-10M reveals widespread memorization of training data, including image training data in addition to video data, highlighting privacy risks even in open-source models. The authors propose inference-time detection strategies that leverage content and motion magnitudes to efficiently flag memorized outputs, offering a practical foundation for privacy-preserving VDM deployment and future improvements.

Abstract

Diffusion models, widely used for image and video generation, face a significant limitation: the risk of memorizing and reproducing training data during inference, potentially generating unauthorized copyrighted content. While prior research has focused on image diffusion models (IDMs), video diffusion models (VDMs) remain underexplored. To address this gap, we first formally define the two types of memorization in VDMs (content memorization and motion memorization) in a practical way that focuses on privacy preservation and applies to all generation types. We then introduce new metrics specifically designed to separately assess content and motion memorization in VDMs. Additionally, we curate a dataset of text prompts that are most prone to triggering memorization when used as conditioning in VDMs. By leveraging these prompts, we generate diverse videos from various open-source VDMs, successfully extracting numerous training videos from each tested model. Through the application of our proposed metrics, we systematically analyze memorization across various pretrained VDMs, including text-conditional and unconditional models, on a variety of datasets. Our comprehensive study reveals that memorization is widespread across all tested VDMs and that VDMs can memorize image training data in addition to video datasets. Finally, we propose efficient and effective detection strategies for both content and motion memorization, offering a foundational approach for improving privacy in VDMs.
