Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety

Xingjun Ma, Yifeng Gao, Yixu Wang, Ruofan Wang, Xin Wang, Ye Sun, Yifan Ding, Hengyuan Xu, Yunhao Chen, Yunhan Zhao, Hanxun Huang, Yige Li, Yutao Wu, Jiaming Zhang, Xiang Zheng, Yang Bai, Zuxuan Wu, Xipeng Qiu, Jingfeng Zhang, Yiming Li, Xudong Han, Haonan Li, Jun Sun, Cong Wang, Jindong Gu, Baoyuan Wu, Siheng Chen, Tianwei Zhang, Yang Liu, Mingming Gong, Tongliang Liu, Shirui Pan, Cihang Xie, Tianyu Pang, Yinpeng Dong, Ruoxi Jia, Yang Zhang, Shiqing Ma, Xiangyu Zhang, Neil Gong, Chaowei Xiao, Sarah Erfani, Tim Baldwin, Bo Li, Masashi Sugiyama, Dacheng Tao, James Bailey, Yu-Gang Jiang

TL;DR

This survey maps the safety landscape of six families of large models (VFMs, LLMs, VLPs, VLMs, DMs, and Agents), presenting a comprehensive taxonomy of threats (adversarial, backdoor, poisoning, jailbreak, prompt injection, energy-latency, data/model/memory extraction, and agent-specific risks) and corresponding defenses. It systematically reviews attacks and defenses across model types, highlights widely used datasets and benchmarks, and identifies gaps in evaluation, proactive defense, and collaboration. The paper argues for scalable, practical defenses, standardized safety evaluations, and global, cross-domain cooperation to responsibly deploy AI at scale. It also emphasizes the need for proactive safety APIs, open-source platforms, and risk-aware governance to manage evolving threats as models integrate more deeply into society.

Abstract

The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-powered Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review the defense strategies proposed for each type of attack, where available, and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.

Paper Structure

This paper contains 194 sections, 3 figures, and 14 tables.

Figures (3)

  • Figure 1: Left: The number of surveyed technical papers on attacks, defenses, and benchmarks/datasets. Middle: Distribution of surveyed technical papers by model type. Right: Distribution of surveyed technical papers by attack and defense type.
  • Figure 2: Left: The quarterly trend in the number of surveyed safety papers across different models. Middle: Proportional distribution of attack and defense studies associated with large models. Right: Annual trend in the number of surveyed safety papers on various attacks and defenses, ordered from most to least studied.
  • Figure 3: A road map of this survey.