JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Haibo Jin, Leyang Hu, Xinnuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, Haohan Wang
TL;DR
This survey provides a unified framework for jailbreaking LLMs and VLMs, categorizing attack methods into seven types and surveying corresponding defenses. It links textual and multimodal security perspectives, highlighting evaluation gaps and suggesting future directions for robust, aligned, and secure AI systems. The study catalogs detailed attack and defense taxonomies, analyzes existing benchmarks, and identifies cross-modal vulnerabilities requiring coordinated defenses. The results offer a roadmap for researchers and practitioners to enhance safety and reliability in next-generation language models.
Abstract
The rapid evolution of artificial intelligence (AI) through developments in Large Language Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements across various technological domains. While these models enhance capabilities in natural language processing and visual interactive tasks, their growing adoption raises critical concerns regarding security and ethical alignment. This survey provides an extensive review of the emerging field of jailbreaking (deliberately circumventing the ethical and operational boundaries of LLMs and VLMs) and the consequent development of defense mechanisms. Our study categorizes jailbreaks into seven distinct types and elaborates on defense strategies that address these vulnerabilities. Through this comprehensive examination, we identify research gaps and propose directions for future studies to enhance the security frameworks of LLMs and VLMs. Our findings underscore the necessity for a unified perspective that integrates both jailbreak strategies and defensive solutions to foster a robust, secure, and reliable environment for the next generation of language models. More details can be found on our website: https://chonghan-chen.com/llm-jailbreak-zoo-survey/.
