Video Unlearning via Low-Rank Refusal Vector

Simone Facchiano, Stefano Saravalle, Matteo Migliarini, Edoardo De Matteis, Alessio Sampieri, Andrea Pilzer, Emanuele Rodolà, Indro Spinelli, Luca Franco, Fabio Galasso

TL;DR

This paper tackles the risk of unsafe content in text-conditioned video diffusion by introducing a training-free, permanent unlearning method. It derives a low-rank refusal vector $\mathbf{r}^l$ from five safe/unsafe prompt pairs and applies it as a closed-form weight update, refined via contrastive PCA (cPCA) to minimize collateral forgetting. The approach applies to both text and image conditioning in video models and is implemented as weight edits in targeted layers. It is validated on Open-Sora and ZeroScopeT2V across T2VSafetyBench and SafeSora, achieving substantial reductions in unsafe generations while preserving FVD and MM-Notox scores. Compared with filtering or fine-tuning baselines, the method is scalable and data-efficient, and it adds no inference overhead. The work also demonstrates robustness through cPCA refinements and multi-vector unlearning, enabling practical deployment in open-weight video generation systems.

Abstract

Video generative models achieve high-quality synthesis from natural-language prompts by leveraging large-scale web data. However, this training paradigm inherently exposes them to unsafe biases and harmful concepts, introducing the risk of generating undesirable or illicit content. To mitigate unsafe generations, existing machine unlearning approaches either rely on filtering, which can be bypassed, or update model weights, through costly fine-tuning or training-free closed-form edits. We propose the first training-free weight update framework for concept removal in video diffusion models. From five paired safe/unsafe prompts, our method estimates a refusal vector and integrates it into the model weights as a closed-form update. A contrastive low-rank factorization further disentangles the target concept from unrelated semantics, ensuring selective concept suppression without harming generation quality. Our approach reduces unsafe generations on the Open-Sora and ZeroScopeT2V models across the T2VSafetyBench and SafeSora benchmarks, with average reductions of 36.3% and 58.2%, respectively, while preserving prompt alignment and video quality. This establishes an efficient and scalable solution for safe video generation without retraining or any inference overhead. Project page: https://www.pinlab.org/video-unlearning.
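To make the idea of a closed-form, refusal-vector weight edit concrete, here is a minimal NumPy sketch. It is not the paper's exact update: it assumes the refusal vector is the normalized mean difference between paired unsafe and safe activations at one layer, and that the edit is a rank-one projection applied to that layer's weight matrix. The function names, the toy dimensions, and the synthetic activations are all hypothetical, and the contrastive low-rank refinement is omitted.

```python
import numpy as np


def refusal_direction(unsafe_acts, safe_acts):
    """Unit-norm mean difference between paired unsafe and safe activations.

    Assumption: both arrays have shape (n_pairs, d_out), one row per prompt.
    """
    diff = (unsafe_acts - safe_acts).mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablate_direction(weight, direction):
    """Return W' = (I - r r^T) W, so outputs of W' have no component along r."""
    r = direction.reshape(-1, 1)
    return weight - r @ (r.T @ weight)


# Toy data standing in for layer activations (hypothetical, not from the paper).
rng = np.random.default_rng(0)
d_out, d_in, n_pairs = 8, 6, 5

safe = rng.normal(size=(n_pairs, d_out))
concept = np.zeros(d_out)
concept[0] = 3.0  # synthetic "unsafe concept" offset along one axis
unsafe = safe + concept + 0.1 * rng.normal(size=(n_pairs, d_out))

r = refusal_direction(unsafe, safe)

# One-off edit of the layer weights; nothing extra runs at inference time.
W = rng.normal(size=(d_out, d_in))
W_edit = ablate_direction(W, r)

# Any output of the edited layer is orthogonal to r (up to rounding error).
x = rng.normal(size=d_in)
residual = float(r @ (W_edit @ x))
print(f"component along refusal direction after edit: {residual:.2e}")
```

Because the edit is folded into the weights once, the edited layer has the same shape and cost as the original, which is consistent with the abstract's claim of no inference overhead.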

Paper Structure

This paper contains 31 sections, 17 equations, 12 figures, 11 tables.

Figures (12)

  • Figure 1: From five input pairs, our method derives a low-rank refusal vector to suppress unwanted concepts (e.g., logos, nudity). Our method preserves video quality and the capability to generate all other concepts without retraining or degrading model capabilities.
  • Figure 2: Qualitative results for the five unsafe categories of T2VSafetyBench. Top: original uncensored frames. Bottom: corrected outputs with our method.
  • Figure 3: On the left, the figure illustrates the behavior of the censorship rate as a function of the cPCA rank (see Eq. \ref{eq:fwd_proj}). On the right, the figure shows a decreasing trend in the censorship rate as the value of $\lambda$ increases (Eq. \ref{eq:projection_pca}).
  • Figure 4: Generation quality decreases as $\lambda$ increases.
  • Figure 5: Qualitative results for the Copyright and Trademarks class. The top row shows uncensored video frames, while the bottom row shows corrected versions with our method.
  • ...and 7 more figures