Multi-Agent VLMs Guided Self-Training with PNU Loss for Low-Resource Offensive Content Detection

Han Wang, Deyi Ji, Junyu Lu, Lanyun Zhu, Hailong Zhang, Haiyang Wu, Liqun Liu, Peng Shu, Roy Ka-Wei Lee

TL;DR

The paper tackles the challenge of offensive content detection under low-resource supervision by introducing a self-training framework that couples a lightweight classifier with Multi-Agent Vision-Language Models (MA-VLMs) to verify pseudo-labels. A socially-informed prompting scheme simulates moderator and user perspectives, producing Agreed-Unknown and Disagreed-Unknown categories to guide label propagation. The authors propose a novel Positive-Negative-Unlabeled (PNU) loss that integrates labeled data, Agreed-Unknown, and Disagreed-Unknown samples, balancing PN, PU, and NU components via a tunable parameter $\gamma$. Experiments on four benchmark datasets show that MA-VLM guided self-training with PNU loss outperforms baselines in low-resource settings and rivals large-scale models, demonstrating strong robustness and potential for scalable, fair moderation across modalities and languages.

Abstract

Accurate detection of offensive content on social media demands high-quality labeled data; however, such data is often scarce due to the low prevalence of offensive instances and the high cost of manual annotation. To address this low-resource challenge, we propose a self-training framework that leverages abundant unlabeled data through collaborative pseudo-labeling. Starting with a lightweight classifier trained on limited labeled data, our method iteratively assigns pseudo-labels to unlabeled instances with the support of Multi-Agent Vision-Language Models (MA-VLMs). Unlabeled samples on which the classifier and MA-VLMs agree are designated as the Agreed-Unknown set, while conflicting samples form the Disagreed-Unknown set. To enhance label reliability, MA-VLMs simulate dual perspectives, moderator and user, capturing both regulatory and subjective viewpoints. The classifier is optimized using a novel Positive-Negative-Unlabeled (PNU) loss, which jointly exploits labeled, Agreed-Unknown, and Disagreed-Unknown data while mitigating pseudo-label noise. Experiments on benchmark datasets demonstrate that our framework substantially outperforms baselines under limited supervision and approaches the performance of large-scale models.
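
The agreement-based split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `partition_unlabeled`, the two prediction callables, and the binary label convention (1 = offensive, 0 = not offensive) are assumptions made for illustration, and the way the moderator and user agents are combined into a single MA-VLM verdict is not specified here, so it is hidden behind one hypothetical callable.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def partition_unlabeled(
    samples: Sequence[Hashable],
    classifier_predict: Callable[[Hashable], int],
    vlm_predict: Callable[[Hashable], int],
) -> Tuple[List[Tuple[Hashable, int]], List[Hashable]]:
    """Split unlabeled samples by classifier / MA-VLM agreement.

    Both callables are hypothetical stand-ins that return 1 (offensive)
    or 0 (not offensive). How the MA-VLM verdict is derived from the
    moderator and user perspectives is assumed to happen inside
    `vlm_predict`.
    """
    agreed: List[Tuple[Hashable, int]] = []  # Agreed-Unknown: (sample, pseudo-label)
    disagreed: List[Hashable] = []           # Disagreed-Unknown: no trusted label
    for sample in samples:
        clf_label = classifier_predict(sample)
        vlm_label = vlm_predict(sample)
        if clf_label == vlm_label:
            agreed.append((sample, clf_label))
        else:
            disagreed.append(sample)
    return agreed, disagreed


# Toy usage with lookup tables standing in for the two models.
toy_classifier = {"meme_a": 1, "meme_b": 0, "meme_c": 1}
toy_vlm = {"meme_a": 1, "meme_b": 1, "meme_c": 1}
agreed_set, disagreed_set = partition_unlabeled(
    list(toy_classifier), toy_classifier.__getitem__, toy_vlm.__getitem__
)
```

In this toy run, `meme_a` and `meme_c` land in the Agreed-Unknown set with pseudo-label 1, while `meme_b` falls into the Disagreed-Unknown set. Both sets would then feed the PNU loss alongside the labeled data.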

Paper Structure

This paper contains 22 sections, 5 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Comparison of our approach (bottom) with supervised-only (top) and traditional self-training (middle).
  • Figure 2: MA-VLMs guided self-training pipeline using PNU loss.
  • Figure 3: MA-VLMs with hate meme detection example.
  • Figure 4: M-F1 of top $k$ pseudo-labeled samples per round on FHM ($n=100$); only the first 10 rounds are shown. GT = Ground Truth.