AI Risk Management Should Incorporate Both Safety and Security

Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal

TL;DR

This work argues that AI risk management must unambiguously integrate both safety and security perspectives, because their objectives and threat models have historically diverged. It introduces a unified reference framework with four dimensions (objectives of protection, threat models, problem framing, and governance) to clarify the differences between the two fields and foster cross-disciplinary collaboration. The authors contrast safety's probabilistic, risk-minimization framing with security's minimax, adversarial framing, and illustrate their interplay through case studies on malicious use and the transparency of AI-generated content. The result is a concrete taxonomy and practical guidance intended to support holistic evaluation, governance, and risk mitigation for frontier AI systems, with implications for policy and industry standards.
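The contrast between a probabilistic risk-minimization objective and a minimax objective can be made concrete with a toy calculation. The sketch below is a hypothetical illustration, not the paper's formalization: the harm scores, the input distribution, and the names `HARM`, `NATURAL_DIST`, `expected_risk`, `worst_case_risk`, and `pick` are all invented for this example. The safety-style objective averages harm over inputs drawn from natural use; the security-style objective assumes an adversary picks the most harmful input, so the defender minimizes the maximum.

```python
"""Toy contrast between a safety-style expected-risk objective and a
security-style worst-case (minimax) objective.

All numbers and names are hypothetical illustrations, not from the paper.
"""

# Hypothetical harm scores in [0, 1]: HARM[model][input].
HARM = {
    "model_a": {"benign_1": 0.0, "benign_2": 0.1, "jailbreak": 0.9},
    "model_b": {"benign_1": 0.2, "benign_2": 0.2, "jailbreak": 0.3},
}

# Hypothetical natural-use distribution: jailbreak-style prompts are rare.
NATURAL_DIST = {"benign_1": 0.5, "benign_2": 0.45, "jailbreak": 0.05}


def expected_risk(model: str) -> float:
    """Safety framing: average harm under the natural input distribution."""
    return sum(p * HARM[model][x] for x, p in NATURAL_DIST.items())


def worst_case_risk(model: str) -> float:
    """Security framing: harm on the input an adversary would choose."""
    return max(HARM[model].values())


def pick(objective) -> str:
    """Return the model that minimizes the given risk objective.

    With worst_case_risk this is argmin over models of max over inputs,
    i.e. a minimax choice.
    """
    return min(HARM, key=objective)


if __name__ == "__main__":
    for m in HARM:
        print(m, round(expected_risk(m), 3), worst_case_risk(m))
    print("safety-style pick:", pick(expected_risk))
    print("security-style pick:", pick(worst_case_risk))
```

With these made-up numbers, the expected-risk objective prefers `model_a` (0.09 versus 0.205), while the worst-case objective prefers `model_b` (0.3 versus 0.9). The same system can therefore look acceptable under one framing and unacceptable under the other, which is the kind of divergence the framework asks stakeholders to account for.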

Abstract

The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this paper, we advocate that stakeholders in AI risk management should be aware of the nuances, synergies, and interplay between safety and security, and unambiguously take into account the perspectives of both disciplines in order to devise the most effective and holistic risk mitigation approaches. Unfortunately, this vision is often obscured, as the definitions of the basic concepts of "safety" and "security" themselves are often inconsistent and lack consensus across communities. With AI risk management being increasingly cross-disciplinary, this issue is particularly salient. In light of this conceptual challenge, we introduce a unified reference framework to clarify the differences and interplay between AI safety and AI security, aiming to facilitate a shared understanding and effective collaboration across communities.

Paper Structure

This paper contains 37 sections and 1 figure.

Figures (1)

  • Figure 1: We present a reference framework to systematically examine the differing considerations that underpin AI safety and AI security, and discuss their interplay. We elaborate on four dimensions: objectives of protection (Section \ref{subsec:diff-objectives-of-protection}), threat models (Section \ref{subsec:diff-nauture-of-risks}), problem framing (Section \ref{subsec:diff-problem-framing}), and governance and liability (Section \ref{subsec:diff-governance-structure}).