VModA: An Effective Framework for Adaptive NSFW Image Moderation

Han Bao; Qinying Wang; Zhi Chen; Qingming Li; Xuhong Zhang; Changjiang Li; Zonghui Wang; Shouling Ji; Wenzhi Chen

TL;DR

VModA addresses the need for adaptable and robust NSFW image moderation by integrating Vision-Language Models (VLMs) with LLM-driven reasoning in a domain-aware, multi-stage framework. It introduces preprocessing, region-of-interest (ROI) zoom-in, chain-of-thought (CoT) semantic reasoning, and iterative output aggregation to overcome limitations in capturing fine image details and interpreting complex semantics. Across six real-world NSFW datasets and multiple backbones, VModA achieves accuracy gains of up to 54.3% and adapts well to category- and scenario-based rules, while also revealing label inconsistencies in public benchmarks. The approach reduces reliance on large training datasets, enables flexible policy enforcement, and shows practical value for real-world moderation, with acknowledged limitations and avenues for future work.

Abstract

Not Safe/Suitable for Work (NSFW) content is rampant on social networks and causes serious harm to users, especially minors. Current detection methods rely mainly on deep learning-based image recognition and classification. However, NSFW images are now presented in increasingly sophisticated ways, often using image details and complex semantics to obscure their true nature or attract more views. Although still understandable to humans, these images often evade existing detection methods, posing a significant threat. Further complicating the issue, regulations vary across platforms and regions, which leads to detection bias and reduced accuracy. To address this, we propose VModA, a general and effective framework that adapts to diverse moderation rules and handles complex, semantically rich NSFW content across categories. Experimental results show that VModA significantly outperforms existing methods, achieving an accuracy improvement of up to 54.3% across NSFW types, including those with complex semantics. Further experiments demonstrate that our method adapts well across categories, scenarios, and base VLMs. We also identified samples with inconsistent or controversial labels in public NSFW benchmark datasets, re-annotated them, and submitted corrections to the original maintainers. The maintainers of two datasets have confirmed the updates so far. Additionally, we evaluate VModA in real-world scenarios to demonstrate its practical effectiveness.
