SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations

Zhaorun Chen; Francesco Pinto; Minzhou Pan; Bo Li

TL;DR

SafeWatch tackles the need for scalable, policy-driven video guardrails by combining parallel policy encoding and policy-aware token pruning to reduce latency and mitigate bias, while delivering grounded explanations. It introduces PEPE to process safety policies in parallel and PAP to focus computation on policy-relevant video tokens, enabling zero-shot generalization to new policies. The SafeWatch-Bench dataset, with over 2M videos and a multi-agent consensus annotation pipeline, supports robust training and evaluation across real-world and generative AI content. Empirical results show that SafeWatch outperforms state-of-the-art baselines on SafeWatch-Bench and on existing benchmarks, with improved explainability and reduced inference costs, signaling a practical path toward robust, transparent video moderation.

Abstract

With the rise of generative AI and the rapid growth of high-quality video generation, video guardrails have become more crucial than ever to ensure safety and security across platforms. Current video guardrails, however, either are overly simplistic, relying on pure classification models trained on simple policies with limited unsafe categories and lacking detailed explanations, or prompt multimodal large language models (MLLMs) with long safety guidelines, which is inefficient and impractical for guardrailing real-world content. To bridge this gap, we propose SafeWatch, an efficient MLLM-based video guardrail model designed to follow customized safety policies and provide multi-label video guardrail outputs with content-specific explanations in a zero-shot manner. In particular, unlike traditional MLLM-based guardrails that encode all safety policies autoregressively, causing inefficiency and bias, SafeWatch encodes each policy chunk in parallel and eliminates their position bias so that all policies are attended to simultaneously with equal importance. In addition, to improve efficiency and accuracy, SafeWatch incorporates a policy-aware visual token pruning algorithm that adaptively selects the most relevant video tokens for each policy, discarding noisy or irrelevant information. This allows for more focused, policy-compliant guardrailing with significantly reduced computational overhead. Considering the limitations of existing video guardrail benchmarks, we propose SafeWatch-Bench, a large-scale video guardrail benchmark comprising over 2M videos spanning six safety categories and covering over 30 tasks to ensure comprehensive coverage of potential safety scenarios. SafeWatch outperforms SOTA by 28.2% on SafeWatch-Bench and 13.6% on existing benchmarks, cuts inference costs by 10%, and delivers top-tier explanations validated by LLM and human reviews.
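
To make the idea of policy-aware visual token pruning more concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not say how token relevance is scored or how many tokens are kept, so the cosine-similarity score, the `keep_ratio` parameter, and the function names `prune_tokens` and `prune_per_policy` are all assumptions made here for illustration. Each policy independently selects the video tokens most similar to its own embedding.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune_tokens(video_tokens, policy_embedding, keep_ratio=0.5):
    """Return indices of the tokens kept for one policy, in original order.

    Assumption: relevance is the cosine similarity between a token and the
    policy embedding; the paper may use a different relevance signal.
    """
    k = max(1, math.ceil(len(video_tokens) * keep_ratio))
    scores = [cosine(tok, policy_embedding) for tok in video_tokens]
    ranked = sorted(range(len(video_tokens)), key=lambda i: scores[i], reverse=True)
    # Restore temporal/spatial order among the kept tokens.
    return sorted(ranked[:k])


def prune_per_policy(video_tokens, policies, keep_ratio=0.5):
    """Apply pruning separately for each policy (hypothetical name -> embedding map)."""
    return {
        name: prune_tokens(video_tokens, emb, keep_ratio)
        for name, emb in policies.items()
    }


# Toy example with 2-dimensional embeddings (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
policies = {"violence": [1.0, 0.0], "hate": [0.0, 1.0]}
kept = prune_per_policy(tokens, policies, keep_ratio=0.5)
```

In this toy setup, each policy keeps only the half of the tokens that point in its own direction, which mirrors the abstract's description of discarding noisy or irrelevant information on a per-policy basis.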
