OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning

Boyu Zhu, Xiaofei Wen, Wenjie Jacky Mo, Tinghui Zhu, Yanan Xie, Peng Qi, Muhao Chen

TL;DR

OmniGuard introduces the first unified omni-modal guardrails with deliberate reasoning to moderate safety across text, image, video, and audio. It builds a large omni-modal safety dataset and uses targeted distillation from expert models, followed by mission-focused instruction tuning, to train OmniGuard-7B and OmniGuard-3B. The approach achieves state-of-the-art or competitive performance across 15 safety benchmarks, with strong cross-modal generalization and interpretable reasoning for safety judgments. The work advances robust, explainable safety moderation for multi-modal AI systems and highlights areas for efficiency improvements and richer cross-modal data. This framework lays groundwork for scalable, policy-grounded, cross-modal safeguarding in next-generation omnimodal models.

Abstract

Omni-modal Large Language Models (OLLMs) that process text, images, videos, and audio introduce new challenges for safety and value guardrails in human-AI interaction. Prior guardrail research largely targets unimodal settings and typically frames safeguarding as binary classification, which limits robustness across diverse modalities and tasks. To address this gap, we propose OmniGuard, the first family of omni-modal guardrails that performs safeguarding across all modalities with deliberate reasoning ability. To support the training of OmniGuard, we curate a large, comprehensive omni-modal safety dataset comprising over 210K diverse samples, with inputs that cover all modalities through both unimodal and cross-modal samples. Each sample is annotated with structured safety labels and carefully curated safety critiques from expert models through targeted distillation. Extensive experiments on 15 benchmarks show that OmniGuard achieves strong effectiveness and generalization across a wide range of multimodal safety scenarios. Importantly, OmniGuard provides a unified framework that enforces policies and mitigates risks across all modalities, paving the way toward building more robust and capable omni-modal safeguarding systems.

Paper Structure

This paper contains 29 sections, 2 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Overview of OmniGuard's training process. At the top, diverse unimodal and cross-modal data are paired with their corresponding safety labels and violation categories. Expert models then generate detailed reasoning critiques, which are subsequently used to fine-tune OmniGuard through targeted distillation. In contrast to existing guardrail systems (bottom left), which are typically modality-specific and limited to simple binary classification, OmniGuard supports unified omni-modal safety judgment across text, image, video, and audio domains, while additionally providing comprehensive safety reasoning to justify its decisions (bottom right).
  • Figure 2: Collected datasets and the distribution of the constructed dataset.
  • Figure 3: Performance comparison of OmniGuard and baseline models on cross-modal safety benchmarks. The performance is evaluated in Accuracy (ACC).
  • Figure 4: Comparison of performance between label-only SFT and critique-augmented training across both unimodal and cross-modal settings. The upper four subplots show average performance on unimodal benchmarks (Text, Image, Video, Audio), evaluated by F1 score (%). The bottom four subplots present cross-modal results on MM-SafetyBench (Image-Text), Video-SafetyBench (Video-Text), and AIAH (Audio-Text), along with the average performance, reported in accuracy (ACC, %).
  • Figure 5: Prompt template used for targeted distillation from teacher models when generating safety critiques. Provided with the safety label (safe or unsafe) and the corresponding violation categories, the teacher models are instructed to produce a detailed explanation describing the rationale behind the safety assessment.
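
The critique-distillation step described in the captions for Figures 1 and 5 can be illustrated with a minimal Python sketch. It assumes a labeled sample is turned into a teacher prompt, and the returned critique is then stored alongside the label as a fine-tuning target. The template wording, the `SafetySample` fields, and the record layout are all hypothetical; the paper's actual prompt and data schema are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical template wording: the real prompt is shown in Figure 5 of the
# paper and is NOT reproduced here.
CRITIQUE_PROMPT = (
    "The following content has been labeled '{label}'.\n"
    "Violation categories: {categories}.\n"
    "Explain in detail the rationale behind this safety assessment."
)


@dataclass
class SafetySample:
    """One labeled sample (field names are illustrative assumptions)."""
    modality: str       # e.g. "text", "image", "video", "audio", "image-text"
    content_ref: str    # the text itself, or a path/URI to the media
    label: str          # "safe" or "unsafe"
    categories: List[str] = field(default_factory=list)


def build_critique_prompt(sample: SafetySample) -> str:
    """Fill the teacher prompt with the known label and violation categories."""
    if sample.label not in ("safe", "unsafe"):
        raise ValueError(f"unexpected label: {sample.label!r}")
    categories = ", ".join(sample.categories) if sample.categories else "none"
    return CRITIQUE_PROMPT.format(label=sample.label, categories=categories)


def build_training_record(sample: SafetySample, critique: str) -> Dict:
    """Pair the input with label, categories, and the teacher's critique."""
    return {
        "modality": sample.modality,
        "input": sample.content_ref,
        "target": {
            "label": sample.label,
            "categories": list(sample.categories),
            "critique": critique,
        },
    }
```

In this sketch the teacher model call itself is left out: `build_critique_prompt` produces the text that would be sent to a teacher, and `build_training_record` shows how the returned critique could sit next to the structured label, which is the difference between critique-augmented training and label-only SFT in Figure 4.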