RunawayEvil: Jailbreaking the Image-to-Video Generative Models

Songping Wang, Rufan Qian, Yueming Lyu, Qinglong Liu, Linzhuang Zou, Jie Qin, Songhua Liu, Caifeng Shan

TL;DR

This work tackles the vulnerability of Image-to-Video (I2V) models to jailbreak attacks by introducing RunawayEvil, a self-evolving multimodal framework built on a Strategy-Tactic-Action architecture. It combines a Strategy-Aware Command Unit, a memory-augmented Multimodal Tactical Planning Unit, and a Tactical Action Unit to generate, execute, and continually improve coordinated text and image attacks aimed at bypassing safety filters. Through RL-based strategy customization, LLM-driven strategy exploration, and memory-enabled tactical planning, it achieves state-of-the-art attack success rates across multiple I2V models and benchmarks, highlighting cross-modal weaknesses and the need for robust defenses. The results underscore the importance of cross-modal coordination and dynamic strategy adaptation in security analyses of emerging multimodal video-generation systems.

Abstract

Image-to-Video (I2V) generation synthesizes dynamic visual content from image and text inputs, providing significant creative control. However, the security of such multimodal systems, particularly their vulnerability to jailbreak attacks, remains critically underexplored. To bridge this gap, we propose RunawayEvil, the first multimodal jailbreak framework for I2V models with dynamic evolutionary capability. Built on a "Strategy-Tactic-Action" paradigm, our framework exhibits self-amplifying attack behavior through three core components: (1) a Strategy-Aware Command Unit that enables the attack to self-evolve its strategies through reinforcement learning-driven strategy customization and LLM-based strategy exploration; (2) a Multimodal Tactical Planning Unit that generates coordinated text jailbreak instructions and image tampering guidelines based on the selected strategies; and (3) a Tactical Action Unit that executes and evaluates the multimodal coordinated attacks. This self-evolving architecture allows the framework to continuously adapt and intensify its attack strategies without human intervention. Extensive experiments demonstrate that RunawayEvil achieves state-of-the-art attack success rates on commercial I2V models, such as Open-Sora 2.0 and CogVideoX. Specifically, RunawayEvil outperforms existing methods by 58.5 to 79 percent on COCO2017. This work provides a critical tool for vulnerability analysis of I2V models, thereby laying a foundation for more robust video generation systems.

Paper Structure

This paper contains 22 sections, 10 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Visualization of successful jailbreaks using RunawayEvil, which unleashes the full potential of multimodal jailbreaks.
  • Figure 2: RunawayEvil's multimodal jailbreak framework. Built upon the "Strategy-Tactic-Action" paradigm, RunawayEvil achieves adaptive attacks against I2V models through closed-loop collaboration among three core modules. (A) The Multimodal Tactical Planning Unit (MTPU) receives input pairs and strategic guidance from the Strategy-Aware Command Unit (SACU) to generate collaborative attack instructions; (B) the Tactical Action Unit (TAU) launches the multimodal jailbreak attack and feeds the result back to the SACU via a safety evaluator; (C) the SACU leverages the strategy exploration agent to mine experience for enriching strategies, while the strategy customization agent tailors the optimal strategy for the input. These three modules form a dynamic iterative closed loop, efficiently bypassing the cross-modal security defense mechanisms of I2V models.
  • Figure 3: The self-evolutionary framework of SACU. The Strategy Exploration Agent mines experience from the Strategy Memory Bank to generate new strategies, updating the strategy library. The Strategy Customization Agent, driven by reinforcement learning, learns to select the optimal strategy. This framework breaks through rigid attack patterns and enhances the flexibility and adaptability of attacks.
  • Figure 4: Visualization of video jailbreaking performance using different methods.
  • Figure 5: Attack success rate (ASR) versus the number of iterations under different safety evaluators. Left: Qwen-VL as the safety evaluator; right: LLaVA-Next as the safety evaluator. The ASR against the four I2V models consistently increases with the number of iterations.
  • ...and 1 more figures
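
The Figure 5 caption reports ASR as a function of the iteration count. As a minimal sketch of how such a curve could be tabulated from evaluator verdicts, the Python below assumes that a sample counts as a success at iteration budget k if the safety evaluator flagged any attempt up to and including k. That cumulative convention, the function name `cumulative_asr`, and the sample data are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Optional


def cumulative_asr(first_success_iter: List[Optional[int]], max_iters: int) -> List[float]:
    """Return the attack success rate (ASR) for iteration budgets 1..max_iters.

    first_success_iter[i] is the 1-based iteration at which sample i was first
    judged a success by the safety evaluator, or None if it never succeeded.
    Assumption (not from the paper): success is cumulative over iterations.
    """
    if not first_success_iter:
        raise ValueError("need at least one sample")
    total = len(first_success_iter)
    curve = []
    for k in range(1, max_iters + 1):
        # Count samples whose first success happened within the budget k.
        hits = sum(1 for it in first_success_iter if it is not None and it <= k)
        curve.append(hits / total)
    return curve


# Hypothetical verdicts for five samples (illustrative only, not paper data).
outcomes = [1, 3, None, 2, None]
curve = cumulative_asr(outcomes, max_iters=3)
print(curve)
```

Under this convention the curve can never decrease as the budget grows, so the rising trend described in the caption is informative mainly through its slope and the level at which it plateaus.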