VModA: An Effective Framework for Adaptive NSFW Image Moderation

Han Bao, Qinying Wang, Zhi Chen, Qingming Li, Xuhong Zhang, Changjiang Li, Zonghui Wang, Shouling Ji, Wenzhi Chen

TL;DR

VModA addresses the need for adaptable and robust NSFW image moderation by integrating Vision-Language Models (VLMs) with LLM-driven reasoning in a domain-aware, multi-stage framework. It introduces preprocessing, region-of-interest (ROI) zoom-in, chain-of-thought (CoT) semantic reasoning, and iterative output aggregation to overcome limitations in capturing fine details and understanding advanced semantics. Across six real-world NSFW datasets and multiple backbones, VModA achieves accuracy gains of up to 54.3% and adapts well to category- and scenario-based rules, while also revealing label inconsistencies in public benchmarks. The approach reduces reliance on large training datasets, enables flexible policy enforcement, and shows practical value for real-world moderation, albeit with acknowledged limitations and avenues for future enhancement.
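
The summary above names the stages but not how they are implemented. As a rough illustration only, the sketch below shows one way an ROI zoom-in and an output-aggregation step could fit together. Everything in it is an assumption made for the example rather than the authors' method: the names `expand_roi`, `aggregate_votes`, and `moderate`, the `margin` and `rounds` parameters, the `"safe"`/`"unsafe"` labels, and the use of a majority vote with ties resolved toward `"unsafe"`. The `judge` callable stands in for a VLM call.

```python
from collections import Counter
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def expand_roi(box: Box, image_size: Tuple[int, int], margin: float = 0.2) -> Box:
    """Grow a region of interest by `margin` of its size on each side,
    clamped to the image bounds (hypothetical zoom-in helper)."""
    left, top, right, bottom = box
    width, height = image_size
    pad_x = int((right - left) * margin)
    pad_y = int((bottom - top) * margin)
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(width, right + pad_x),
        min(height, bottom + pad_y),
    )


def aggregate_votes(verdicts: List[str]) -> str:
    """Majority vote over repeated verdicts.
    Assumption: ties resolve to the more cautious label, 'unsafe'."""
    counts = Counter(verdicts)
    return "unsafe" if counts["unsafe"] >= counts["safe"] else "safe"


def moderate(
    image_size: Tuple[int, int],
    roi: Box,
    judge: Callable[[Box], str],
    rounds: int = 3,
) -> str:
    """Zoom in on the ROI, query the judge several times, and aggregate."""
    crop = expand_roi(roi, image_size)
    verdicts = [judge(crop) for _ in range(rounds)]
    return aggregate_votes(verdicts)
```

In this sketch the caller would crop the image to the returned box before passing it to a real model; repeating the query and voting is one simple way to smooth out run-to-run variation in model output.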

Abstract

Not Safe/Suitable for Work (NSFW) content is rampant on social networks and causes serious harm to citizens, especially minors. Current detection methods mainly rely on deep learning-based image recognition and classification. However, NSFW images are now presented in increasingly sophisticated ways, often using image details and complex semantics to obscure their true nature or attract more views. Although still understandable to humans, these images often evade existing detection methods, posing a significant threat. Further complicating the issue, varying regulations across platforms and regions create additional challenges for effective moderation, leading to detection bias and reduced accuracy. To address this, we propose VModA, a general and effective framework that adapts to diverse moderation rules and handles complex, semantically rich NSFW content across categories. Experimental results show that VModA significantly outperforms existing methods, achieving an accuracy improvement of up to 54.3% across NSFW types, including those with complex semantics. Further experiments demonstrate that our method adapts well across categories, scenarios, and base VLMs. We also identified samples with inconsistent or controversial labels in public NSFW benchmark datasets, re-annotated them, and submitted corrections to the original maintainers. The maintainers of two datasets have confirmed the updates so far. Additionally, we evaluate VModA in real-world scenarios to demonstrate its practical effectiveness.

Paper Structure

This paper contains 47 sections, 4 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Three examples highlighting challenges VLMs face in pornographic content moderation. The first image and response show that the VLM failed to capture details of the harmful content. The second shows that the VLM misunderstood the advanced semantics of the image. The third shows that the VLM refused to moderate NSFW content.
  • Figure 2: The overview of VModA. The top of the diagram represents the VLM and LLM based on different system prompt strategies. The colored steps in the flowchart indicate the use of the corresponding prompt strategies and models during execution.
  • Figure 3: An example of the region of interest zoom-in.
  • Figure 4: Contributions of each module to the moderation of NSFW datasets.
  • Figure 5: Samples with incorrect original labels in the porn dataset.
  • ...and 5 more figures