BLM-Guard: Explainable Multimodal Ad Moderation with Chain-of-Thought and Policy-Aligned Rewards
Yiran Yang, Zhaowei Liu, Yuan Yuan, Yukun Song, Xiong Ma, Yinghao Song, Xiangji Zeng, Lu Sun, Yulu Wang, Hai Zhou, Shuai Cui, Zhaohan Gong, Jiefei Zhang
TL;DR
The paper addresses policy-driven moderation of multimodal short-video ads, where deceptive visuals, audio, and subtitles require precise, explainable checks that combine step-by-step reasoning with policy guidance. It introduces BLM-Guard, a framework that combines Interleaved-modal Chain-of-Thought (ICoT) reasoning with rule-based policy priors and a self-adaptive GRPO reinforcement learning loop to align outputs with platform guidelines. A dedicated BLM-Guard Benchmark provides a three-level taxonomy (Severity, Scenario, Violation Type) and a data synthesis pipeline to support policy-grounded evaluation; the reported results show improvements in accuracy, consistency, and generalization over strong baselines. The approach targets practical moderation on short-video ad platforms by producing explainable decisions and handling cross-modal mismatches and policy drift.
Abstract
Short-video platforms now host vast multimodal ads whose deceptive visuals, speech and subtitles demand finer-grained, policy-driven moderation than community safety filters provide. We present BLM-Guard, a content-audit framework for commercial ads that fuses Chain-of-Thought reasoning with rule-based policy principles and a critic-guided reward. A rule-driven ICoT data-synthesis pipeline jump-starts training by generating structured scene descriptions, reasoning chains and labels, cutting annotation costs. Reinforcement learning then refines the model using a composite reward balancing causal coherence with policy adherence. A multitask architecture models intra-modal manipulations (e.g., exaggerated imagery) and cross-modal mismatches (e.g., subtitle-speech drift), boosting robustness. Experiments on real short-video ads show BLM-Guard surpasses strong baselines in accuracy, consistency and generalization.
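To make the reward idea concrete, the sketch below shows one way a composite reward could feed a GRPO-style update. It is an illustration under stated assumptions, not the paper's implementation: the function names, the linear blend, and the weight `w_policy` are hypothetical, and the self-adaptive and critic-guided components are not modeled. The only standard ingredient is the group-relative advantage, which normalizes each sampled response's reward against the mean and standard deviation of its own sampling group.

```python
from statistics import mean, pstdev


def composite_reward(coherence: float, policy: float, w_policy: float = 0.5) -> float:
    """Blend a causal-coherence score and a policy-adherence score.

    Both scores are assumed to lie in [0, 1]. The linear blend and the
    weight w_policy are hypothetical choices made for this illustration.
    """
    if not 0.0 <= w_policy <= 1.0:
        raise ValueError("w_policy must be in [0, 1]")
    return (1.0 - w_policy) * coherence + w_policy * policy


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps). The
    population standard deviation is used here as a simplifying choice.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical (coherence, policy) scores for four sampled responses to one ad.
    scored = [(0.9, 0.8), (0.4, 0.9), (0.7, 0.2), (0.5, 0.5)]
    rewards = [composite_reward(c, p, w_policy=0.6) for c, p in scored]
    for reward, adv in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={reward:.3f} advantage={adv:+.3f}")
```

Responses with above-average composite reward in their group receive positive advantages and are reinforced; those below average are discouraged. Raising `w_policy` shifts the balance toward policy adherence over reasoning coherence.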
