
TEAR: Temporal-aware Automated Red-teaming for Text-to-Video Models

Jiaming He, Guanyu Hou, Hongwei Li, Zhicong Huang, Kangjie Chen, Yi Yu, Wenbo Jiang, Guowen Xu, Tianwei Zhang

TL;DR

TEAR addresses a critical safety gap in Text-to-Video (T2V) generation: vulnerabilities introduced by temporal dynamics. It introduces a temporal-aware automated red-teaming framework that combines a generator-refine loop with online preference learning to craft prompts that are textually safe but induce unsafe videos. The framework is validated through extensive experiments on open-source and commercial T2V systems. It achieves attack success rates above 80%, well above the roughly 57% of the best prior baseline. Its prompts also transfer strongly across models, exposing temporal safety gaps in commercial APIs. TEAR gives developers a scalable auditing tool for proactively identifying and mitigating temporal safety risks in T2V models, supporting safer deployment of diffusion-based video generation.
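The headline figures are attack success rates (ASR). A minimal formalization follows, under an assumption, since this summary does not spell out the exact definition. A prompt $p$ in a test set $D$ is assumed to count as a success exactly when it meets the criterion in the Figure 2 caption. The prompt must pass the prompt judge $\Phi_P$, and the video $\mathcal{M}(p)$ produced by the target model must be flagged by the video judge $\Phi_V$:

```latex
\mathrm{ASR} \;=\; \frac{1}{|D|} \sum_{p \in D}
  \mathbb{1}\!\left[\, \Phi_P(p) = 0 \;\wedge\; \Phi_V\big(\mathcal{M}(p)\big) = 1 \,\right]
```

Under this reading, for instance, 41 successes out of 50 test prompts would give an ASR of 0.82.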

Abstract

Text-to-Video (T2V) models can synthesize high-quality, temporally coherent video content, but this generative diversity also introduces critical safety challenges. Existing safety evaluation methods, which focus on static image and text generation, cannot capture the complex temporal dynamics of video generation. To address this, we propose TEAR, a TEmporal-aware Automated Red-teaming framework designed to uncover safety risks specifically linked to the temporal dynamics of T2V generation. TEAR employs a temporal-aware test generator optimized in two stages (initial generator training, then temporal-aware online preference learning). The generator crafts textually innocuous prompts that exploit temporal dynamics to elicit policy-violating video output. A refine model then iteratively improves the prompts' stealthiness and adversarial effectiveness. Extensive experiments demonstrate the effectiveness of TEAR across open-source and commercial T2V systems, with an attack success rate above 80%, a substantial improvement over the prior best of 57%.
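The generate-judge-refine cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not TEAR's implementation. The prompt judge, T2V model, video judge and refiner are all hypothetical callables passed in by the caller. The refiner is assumed to receive both judge verdicts as feedback. The toy stand-ins at the bottom use neutral placeholder tokens only. Names such as `red_team_loop`, `RedTeamResult` and `max_rounds` are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

# Minimal sketch of a generate-judge-refine loop. Every component passed in is a
# hypothetical stand-in: the summary does not specify TEAR's actual generator,
# judges, refine model, or target T2V model.

PromptJudge = Callable[[str], int]        # Phi_P(p): 1 if the prompt is flagged unsafe
T2VModel = Callable[[str], Any]           # M(p): returns a generated "video"
VideoJudge = Callable[[Any], int]         # Phi_V(v): 1 if the video is flagged unsafe
Refiner = Callable[[str, int, int], str]  # refine(p, prompt_flag, video_flag) -> new p


@dataclass
class RedTeamResult:
    red_team_set: List[str] = field(default_factory=list)  # plays the role of D_R
    rounds_used: List[int] = field(default_factory=list)   # refinement rounds per success


def red_team_loop(seed_prompts: List[str], prompt_judge: PromptJudge,
                  t2v_model: T2VModel, video_judge: VideoJudge,
                  refine: Refiner, max_rounds: int = 3) -> RedTeamResult:
    result = RedTeamResult()
    for prompt in seed_prompts:
        for round_idx in range(max_rounds + 1):
            prompt_flag = prompt_judge(prompt)
            # A prompt blocked by the text filter never reaches the video model.
            video_flag = video_judge(t2v_model(prompt)) if prompt_flag == 0 else 0
            if prompt_flag == 0 and video_flag == 1:
                # Success: the prompt passes the text judge, the video is flagged.
                result.red_team_set.append(prompt)
                result.rounds_used.append(round_idx)
                break
            if round_idx < max_rounds:
                # Feed both judge verdicts back to the (hypothetical) refiner.
                prompt = refine(prompt, prompt_flag, video_flag)
    return result


# Toy stand-ins using neutral placeholder tokens, purely for illustration.
def toy_prompt_judge(p: str) -> int:
    return int("BANNED" in p)

def toy_t2v(p: str) -> str:
    return p  # the "video" is just the prompt text in this toy

def toy_video_judge(video: str) -> int:
    return int("escalates" in video)

def toy_refine(p: str, prompt_flag: int, video_flag: int) -> str:
    if prompt_flag:
        return p.replace("BANNED", "benign")   # make the prompt pass the text judge
    if not video_flag:
        return p + " and the scene escalates"  # push the temporal development
    return p


if __name__ == "__main__":
    res = red_team_loop(["a BANNED scene", "a calm street"], toy_prompt_judge,
                        toy_t2v, toy_video_judge, toy_refine)
    print(res.red_team_set, res.rounds_used)
```

With these toys, the first seed needs two refinement rounds: one to pass the prompt judge and one to trigger the video judge. The second seed needs only one. Capping rounds with `max_rounds` mirrors the kind of trade-off that Figure 4 studies, but the cap value here is arbitrary.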


Paper Structure

This paper contains 35 sections, 6 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Textually safe prompts can generate temporally harmful videos; each prompt is shown below its video frames.
  • Figure 2: Overview of the TEAR framework. Our approach has two phases. (a) Test-case Generator Optimization: a generator is trained in three stages (Dataset Construction, Initial Training, Temporal-aware Optimization) using rule-based construction and maximization of temporal-aware rewards ($R_{pmt}$, $R_{con}$). (b) Red-teaming Test Case Generation: the optimized generator produces a prompt ($P_t$) that aims to pass the Prompt Judge System ($\Phi_P(p)=0$) while producing an unsafe video flagged by the Video Judge System ($\Phi_V(\mathcal{M}(p))=1$). A Refine Model uses this feedback to populate the final red-teaming set ($D_R$).
  • Figure 3: The effectiveness of TEAR on commercial T2V services.
  • Figure 4: The impact of refining rounds on ASR and NSFW Filter Pass Rate.
  • Figure 5: Diversity of prompts generated by TEAR for different categories.
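Figure 2 describes the temporal-aware optimization stage as maximizing two rewards, $R_{pmt}$ and $R_{con}$, whose definitions are not given in this summary. One common way to turn scalar rewards into online preference data is best-versus-worst pairing, as used in online DPO-style training. The sketch below swaps in that technique as an assumption; it is not necessarily TEAR's procedure. It scores sampled candidate prompts with a hypothetical weighted sum $R_{pmt} + w \cdot R_{con}$. It then returns a (chosen, rejected) pair, or `None` when the candidates cannot be told apart. The function name, `weight` and `min_margin` are illustrative.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical sketch: build one preference pair from sampled candidates.
# r_pmt and r_con stand in for the two rewards named in Figure 2; their real
# definitions, and how they are combined, are assumptions here.

def build_preference_pair(
    candidates: List[str],
    r_pmt: Callable[[str], float],
    r_con: Callable[[str], float],
    weight: float = 1.0,       # hypothetical mixing weight w for r_con
    min_margin: float = 1e-6,  # skip pairs whose scores are (nearly) tied
) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) by combined reward, or None if no clear winner."""
    if len(candidates) < 2:
        return None
    scored = [(r_pmt(c) + weight * r_con(c), c) for c in candidates]
    # Highest combined reward first; sort is stable, so ties keep input order.
    scored.sort(key=lambda sc: sc[0], reverse=True)
    (best_score, chosen), (worst_score, rejected) = scored[0], scored[-1]
    if best_score - worst_score < min_margin:
        return None
    return chosen, rejected


if __name__ == "__main__":
    # Toy rewards: longer candidates score higher under r_pmt, r_con is neutral.
    print(build_preference_pair(["a", "bb", "ccc"], len, lambda c: 0.0))
```

In an online loop, the generator would sample fresh candidates for each seed and score them with the current judges. It would then be updated on the resulting pairs. The `weight` and `min_margin` values are illustrative knobs, not values from the paper.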