UniSAFE: A Comprehensive Benchmark for Safety Evaluation of Unified Multimodal Models

Segyu Lee; Boryeong Cho; Hojung Jung; Seokhyun An; Juhyeong Kim; Jaehyun Kwak; Yongjin Yang; Sangwon Jang; Youngrok Park; Wonjun Chang; Se-Young Yun

Abstract

Unified Multimodal Models (UMMs) offer powerful cross-modality capabilities but introduce new safety risks not observed in single-task models. Although UMMs are now widely available, existing safety benchmarks remain fragmented across tasks and modalities, which limits comprehensive evaluation of complex system-level vulnerabilities. To address this gap, we introduce UniSAFE, the first comprehensive benchmark for system-level safety evaluation of UMMs across 7 I/O modality combinations, spanning conventional tasks and novel multimodal-context image generation settings. UniSAFE is built with a shared-target design that projects common risk scenarios across task-specific I/O configurations, enabling controlled cross-task comparisons of safety failures. UniSAFE comprises 6,802 curated instances, and we use it to evaluate 15 state-of-the-art UMMs, both proprietary and open-source. Our results reveal critical vulnerabilities across current UMMs, including elevated safety violations in multi-image composition and multi-turn settings, with image-output tasks consistently more vulnerable than text-output tasks. These findings highlight the need for stronger system-level safety alignment for UMMs. Our code and data are publicly available at https://github.com/segyulee/UniSAFE.

Paper Structure (107 sections, 13 equations, 29 figures, 10 tables)

Introduction
Related Works
Unified Multimodal Models.
Multimodal safety benchmarks.
UniSAFE: a comprehensive safety benchmark for unified models
Unified tasks
Tasks based on I/O modalities.
Safety taxonomy.
Data construction pipeline
Step 1: Extracting unsafe triggers.
Step 2: Constructing target description.
Step 3: Scenario generation for each task.
Shared risk scenario.
Curation by human experts.
Data statistics
...and 92 more sections

Figures (29)

Figure 1: Examples of outputs generated by UniSAFE. Our benchmark consists of risk scenarios centered on a common target across 7 distinct task types, enabling evaluation across diverse risk settings.
Figure 2: Overview of the UniSAFE three-step data construction pipeline: (1) collect unsafe triggers across threat categories, (2) expand them into contextual target descriptions, and (3) instantiate shared, multimodal task-specific risk scenarios for safety evaluation of UMMs.
Figure 3: Taxonomy of safety categories for image and text modalities.
Figure 4: Refusal rates for commercial UMMs across different tasks. Refusal rates are further decomposed into system-level and model-level refusal rates.
Figure 5: Safety risk across tasks and modalities in commercial UMMs. For GPT-5, Gemini-2.5, and Qwen-image, the bars show the proportions of test samples that produce harmful content (moderate- and high-risk) across 7 task types. Image-output tasks (text-to-image, image editing, image composition, multi-turn) consistently exhibit higher harmful content rates than text-output tasks (text-to-text, image-to-text, multimodal understanding), revealing strong modality-dependent bias in safety alignment.
...and 24 more figures

Theorems & Definitions (3)

Definition 3.1: Characterizing unified tasks
Definition 3.2: Self-Awareness Score (SAS)
Definition B.1: Generalized Multi-Turn and Multi-Modal Task
