Video Unlearning via Low-Rank Refusal Vector
Simone Facchiano, Stefano Saravalle, Matteo Migliarini, Edoardo De Matteis, Alessio Sampieri, Andrea Pilzer, Emanuele Rodolà, Indro Spinelli, Luca Franco, Fabio Galasso
TL;DR
This paper tackles the risk of unsafe content in text-conditioned video diffusion by introducing a training-free, permanent unlearning method. It derives a low-rank refusal vector $\mathbf{r}^l$ from five safe/unsafe prompt pairs and injects a closed-form weight update, refined via contrastive PCA to minimize collateral forgetting. The approach, applicable to both text and image conditioning in video models, is implemented as weight edits in targeted layers and validated on Open-Sora and ZeroScopeT2V across T2VSafetyBench and SafeSora, achieving substantial reductions in unsafe generations while preserving FVD and MM-Notox scores. Compared with filtering or fine-tuning baselines, this method offers a scalable, data-efficient, and inference-light solution for safe video generation with permanent unlearning. The work also demonstrates robustness through cPCA refinements and multi-vector unlearning, enabling practical deployment in open-weight video generation systems.
Abstract
Video generative models achieve high-quality synthesis from natural-language prompts by leveraging large-scale web data. However, this training paradigm inherently exposes them to unsafe biases and harmful concepts, introducing the risk of generating undesirable or illicit content. To mitigate unsafe generations, existing machine unlearning approaches either rely on filtering, and can therefore be bypassed, or update model weights, through either costly fine-tuning or training-free closed-form edits. We propose the first training-free weight update framework for concept removal in video diffusion models. From five paired safe/unsafe prompts, our method estimates a refusal vector and integrates it into the model weights as a closed-form update. A contrastive low-rank factorization further disentangles the target concept from unrelated semantics, ensuring selective concept suppression without harming generation quality. Our approach reduces unsafe generations on the Open-Sora and ZeroScopeT2V models across the T2VSafetyBench and SafeSora benchmarks, with average reductions of 36.3% and 58.2%, respectively, while preserving prompt alignment and video quality. This establishes an efficient and scalable solution for safe video generation without retraining or any inference overhead. Project page: https://www.pinlab.org/video-unlearning.
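To make the pipeline described above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the refusal vector is the normalized difference between mean activations for unsafe and safe prompts, that the contrastive factorization takes the top eigenvectors of a covariance difference, and that the closed-form update projects the chosen direction out of a layer's output. The exact update rule is not stated in this section, so the function names, the `alpha` weight, and the projection formula are all illustrative assumptions.

```python
import numpy as np


def refusal_vector(unsafe_acts, safe_acts):
    """Unit-norm difference of mean activations (unsafe minus safe).

    Assumption: activations are stacked as (n_prompts, d) arrays,
    e.g. five rows each for the five paired prompts.
    """
    r = unsafe_acts.mean(axis=0) - safe_acts.mean(axis=0)
    return r / np.linalg.norm(r)


def contrastive_directions(unsafe_acts, safe_acts, alpha=1.0, k=1):
    """Top-k eigenvectors of C_unsafe - alpha * C_safe (contrastive PCA).

    Assumption: `alpha` trades off variance specific to the unsafe
    set against variance shared with the safe set.
    """
    c_unsafe = np.cov(unsafe_acts, rowvar=False)
    c_safe = np.cov(safe_acts, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(c_unsafe - alpha * c_safe)
    order = np.argsort(eigvals)[::-1][:k]  # largest eigenvalues first
    return eigvecs[:, order]  # shape (d, k), orthonormal columns


def ablate_direction(W, r):
    """Return W' = (I - r r^T) W for a layer computing y = W x.

    After the edit, the layer's output has no component along r.
    Assumption: W has shape (d_out, d_in) and r has shape (d_out,).
    """
    r = r / np.linalg.norm(r)
    return W - np.outer(r, r) @ W


# Hypothetical usage with random stand-ins for real activations.
rng = np.random.default_rng(0)
unsafe = rng.normal(size=(5, 8)) + 1.0  # five "unsafe" prompt activations
safe = rng.normal(size=(5, 8))          # five paired "safe" activations
W = rng.normal(size=(8, 16))            # stand-in for one layer's weights

r = refusal_vector(unsafe, safe)
W_edited = ablate_direction(W, r)
```

Because the edit is applied once to the stored weights, a model modified this way runs the same forward pass as before, which is consistent with the abstract's claim of no inference overhead.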
