SoK: The Security-Safety Continuum of Multimodal Foundation Models through Information Flow and Global Game-Theoretic Analysis of Asymmetric Threats
Ruoxi Sun, Jiamin Chang, Hammond Pearce, Chaowei Xiao, Bo Li, Qi Wu, Surya Nepal, Minhui Xue
TL;DR
This paper addresses the intertwined safety and security challenges of multimodal foundation models (MFMs) with an information-theoretic SoK that maps information flow through these models onto channel concepts: channel capacity, signal, noise, and bandwidth. It develops a deterministic minimax defense framework and a Defense Coverage Index (DCI) to evaluate 15 defenses against a broad taxonomy of model- and system-level threats, framed around six information flows. The study shows that system-level bandwidth constraints and architectural compartmentalization provide more general and robust protection than model-only defenses, and it formalizes a self-destruction threshold that serves as the activation rule for a last-resort circuit-breaker safeguard. Overall, the work establishes principled foundations for analyzing MFM vulnerabilities and guiding future defenses, highlighting the need for cross-cutting, architecture-aware protections in high-stakes deployments.
Abstract
Multimodal foundation models (MFMs) integrate diverse data modalities to support complex and wide-ranging tasks. However, this integration also introduces distinct safety and security challenges. In this paper, we unify the concepts of safety and security in the context of MFMs by identifying critical threats that arise from both model behavior and system-level interactions. We propose a taxonomy grounded in information theory, evaluating risks through the concepts of channel capacity, signal, noise, and bandwidth. This perspective provides a principled way to analyze how information flows through MFMs and how vulnerabilities can emerge across modalities. Building on this foundation, we introduce a deterministic minimax formulation to analyze defense mechanisms and expose structural vulnerabilities in multimodal systems. Our framework projects attacks onto the noise, signal, and bandwidth axes, collapsing the defense search space and mitigating defender asymmetry. Across 15 defenses, we find that system-level bandwidth and behavior constraints generalize substantially better than brittle model-only methods. Finally, we formalize an MFM "self-destruction threshold" that specifies when termination should be triggered, providing a concrete activation rule for circuit-breaker safeguards within multimodal systems.
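
To make the idea of a deterministic minimax over defenses concrete, the following is a minimal sketch of a generic pure-strategy minimax: for each defense, take the worst-case loss over the three attack axes named above (noise, signal, bandwidth), then pick the defense whose worst case is smallest. It is not the paper's formulation. The defense names, the `LOSS` table, and every number in it are made-up placeholders for illustration only, not results from the study.

```python
from typing import Dict, Tuple

# Hypothetical loss table (assumption, not from the paper):
# LOSS[defense][attack_axis] is the attacker's payoff in [0, 1].
# All names and values are illustrative placeholders.
LOSS: Dict[str, Dict[str, float]] = {
    "model_only_filter": {"noise": 0.2, "signal": 0.7, "bandwidth": 0.9},
    "bandwidth_limit": {"noise": 0.5, "signal": 0.4, "bandwidth": 0.1},
    "compartmentalized": {"noise": 0.3, "signal": 0.3, "bandwidth": 0.2},
}


def worst_case(losses: Dict[str, float]) -> Tuple[str, float]:
    """Return the attack axis that maximizes loss against one defense."""
    axis = max(losses, key=losses.get)
    return axis, losses[axis]


def minimax_defense(loss: Dict[str, Dict[str, float]]) -> Tuple[str, str, float]:
    """Pick the defense that minimizes the attacker's best (maximum) payoff.

    Returns (defense, worst-case attack axis, worst-case loss).
    """
    best: Tuple[str, str, float] = ("", "", float("inf"))
    for defense, row in loss.items():
        axis, value = worst_case(row)
        if value < best[2]:
            best = (defense, axis, value)
    return best


if __name__ == "__main__":
    defense, axis, value = minimax_defense(LOSS)
    print(f"minimax defense: {defense} (worst case: {axis}, loss {value})")
```

Because the attacker is reduced to three axes rather than an open-ended list of individual attacks, the inner maximization stays small and fully enumerable, which is one way to read the claim that the projection collapses the defense search space.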
