RunawayEvil: Jailbreaking the Image-to-Video Generative Models
Songping Wang, Rufan Qian, Yueming Lyu, Qinglong Liu, Linzhuang Zou, Jie Qin, Songhua Liu, Caifeng Shan
TL;DR
This work addresses the vulnerability of Image-to-Video (I2V) models to jailbreak attacks by introducing RunawayEvil, a self-evolving multimodal framework built on a Strategy-Tactic-Action architecture. It combines a Strategy-Aware Command Unit, a memory-augmented Multimodal Tactical Planning Unit, and a Tactical Action Unit to generate, execute, and continually improve coordinated text and image attacks aimed at bypassing safety filters. Through RL-based strategy customization, LLM-driven strategy exploration, and memory-enabled tactical planning, it achieves state-of-the-art attack success rates across multiple I2V models and benchmarks, highlighting cross-modal weaknesses and the need for robust defenses. The results underscore the importance of cross-modal coordination and dynamic strategy adaptation in security analyses of emerging multimodal video-generation systems.
Abstract
Image-to-Video (I2V) generation synthesizes dynamic visual content from image and text inputs, providing significant creative control. However, the security of such multimodal systems, particularly their vulnerability to jailbreak attacks, remains critically underexplored. To bridge this gap, we propose RunawayEvil, the first multimodal jailbreak framework for I2V models with dynamic evolutionary capability. Built on a "Strategy-Tactic-Action" paradigm, our framework achieves self-amplifying attacks through three core components: (1) a Strategy-Aware Command Unit that enables the attack to self-evolve its strategies through reinforcement learning-driven strategy customization and LLM-based strategy exploration; (2) a Multimodal Tactical Planning Unit that generates coordinated text jailbreak instructions and image tampering guidelines based on the selected strategies; and (3) a Tactical Action Unit that executes and evaluates the multimodal coordinated attacks. This self-evolving architecture allows the framework to continuously adapt and intensify its attack strategies without human intervention. Extensive experiments demonstrate that RunawayEvil achieves state-of-the-art attack success rates on commercial I2V models, such as Open-Sora 2.0 and CogVideoX. Specifically, RunawayEvil outperforms existing methods by 58.5 to 79 percent on COCO2017. This work provides a critical tool for vulnerability analysis of I2V models, thereby laying a foundation for more robust video generation systems.
