T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models

Yibo Miao, Yifan Zhu, Yinpeng Dong, Lijia Yu, Jun Zhu, Xiao-Shan Gao

TL;DR

T2VSafetyBench introduces a comprehensive safety benchmark for text-to-video models by defining 12 safety aspects, constructing a 4,400-prompt malicious dataset from real prompts, GPT-4 generation, and jailbreak attacks, and evaluating safety with GPT-4 alongside human judges. The study reveals that no model excels across all aspects, highlights a strong correlation between GPT-4 and human assessments, and identifies a trade-off between model usability and safety, with temporal risk emerging as a critical concern as video-generation capabilities rise. The benchmark informs targeted safety improvements and encourages robust evaluation practices for deploying T2V systems. Overall, it provides a framework for quantitatively and qualitatively assessing the complex, temporal safety risks inherent in video generation.

Abstract

The recent development of Sora has opened a new era in text-to-video (T2V) generation, and with it comes rising concern about security risks. Generated videos may contain illegal or unethical content, yet there is no comprehensive quantitative understanding of their safety, which poses a challenge to their reliability and practical deployment. Previous evaluations focus primarily on the quality of video generation. While some evaluations of text-to-image models have considered safety, they cover fewer aspects and do not address the unique temporal risk inherent in video generation. To bridge this research gap, we introduce T2VSafetyBench, a new benchmark designed for conducting safety-critical assessments of text-to-video models. We define 12 critical aspects of video generation safety and construct a malicious prompt dataset including real-world prompts, LLM-generated prompts, and jailbreak attack-based prompts. Based on our evaluation results, we draw several important findings: 1) no single model excels in all aspects, with different models showing different strengths; 2) the correlation between GPT-4 assessments and manual reviews is generally high; 3) there is a trade-off between the usability and safety of text-to-video generative models. The third finding indicates that as the field of video generation rapidly advances, safety risks are set to surge, highlighting the urgency of prioritizing video safety. We hope that T2VSafetyBench can provide insights for better understanding the safety of video generation in the era of generative AI.
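
The two quantitative ideas mentioned above, a per-aspect rate of unsafe generations and the agreement between GPT-4 and human judgments, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. It assumes each generated video receives a binary unsafe/safe label from each judge, and it uses Pearson correlation over per-aspect rates as one plausible agreement measure; the statistic actually used in the paper is not stated in this section. The function names and all labels in the demo are hypothetical.

```python
import math
from collections import defaultdict


def nsfw_rate_by_aspect(records):
    """Return {aspect: fraction of videos labeled unsafe}.

    `records` is an iterable of (aspect, is_unsafe) pairs. The binary
    label format is an assumption made for this sketch.
    """
    totals = defaultdict(int)
    unsafe = defaultdict(int)
    for aspect, is_unsafe in records:
        totals[aspect] += 1
        unsafe[aspect] += int(bool(is_unsafe))
    return {aspect: unsafe[aspect] / totals[aspect] for aspect in totals}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def judge_agreement(gpt_rates, human_rates):
    """Correlate two judges' per-aspect rates over their shared aspects."""
    shared = sorted(set(gpt_rates) & set(human_rates))
    return pearson([gpt_rates[a] for a in shared],
                   [human_rates[a] for a in shared])


if __name__ == "__main__":
    # Hypothetical labels for illustration only; not results from the paper.
    gpt_labels = [("Violence", True), ("Violence", True), ("Violence", False),
                  ("Temporal Risk", True), ("Temporal Risk", False),
                  ("Pornography", False), ("Pornography", False)]
    human_labels = [("Violence", True), ("Violence", True), ("Violence", True),
                    ("Temporal Risk", True), ("Temporal Risk", False),
                    ("Pornography", False), ("Pornography", True)]
    gpt_rates = nsfw_rate_by_aspect(gpt_labels)
    human_rates = nsfw_rate_by_aspect(human_labels)
    print(gpt_rates)
    print(human_rates)
    print(judge_agreement(gpt_rates, human_rates))
```

Computing the rates per aspect rather than pooling all prompts keeps the comparison aligned with finding 1): a model can look safe on average while still being weak on one aspect.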

Paper Structure (21 sections, 4 equations, 17 figures, 5 tables)

Figures (17)

  • Figure 1: Overview of 12 critical aspects for video generation safety with visual examples. We apply masking to "Pornography" and blurring to "Violence", "Gore" and "Disturbing Content" for publication purposes.
  • Figure 2: We show two examples related to the aspect of Temporal Risk. While individual frames of these two generated videos appear innocuous, the sequence as a whole reveals unsafe content through the continuity between frames. This is a unique security risk for text-to-video models.
  • Figure 3: Example prompt used to evaluate the safety of a generated video with GPT-4. The Human turn provides a prompt describing the video security assessment task; the Assistant turn delivers the corresponding output.
  • Figure 4: Visualization of the NSFW rate in GPT-4 evaluation across various aspects. Public Figures, Violence, Illegal Activities, and Misinformation and Falsehoods present higher safety risks. Additionally, no single model excels in all aspects, and different models exhibit distinct strengths.
  • Figure 5: Visualization examples of Pika [pika2024], Gen2 [esser2023structure], Stable Video Diffusion [blattmann2023stable] and Open-Sora [opensora2024].
  • ...and 12 more figures