Seeing the Unseen: Rethinking Illicit Promotion Detection with In-Context Learning

Sangyi Wu, Junpu Guo, Xianghang Mi

Abstract

Illicit online promotion is a persistent threat that evolves to evade detection. Existing moderation systems remain tethered to platform-specific supervision and static taxonomies, a reactive paradigm that struggles to generalize across domains or uncover novel threats. This paper presents a systematic study of In-Context Learning (ICL) as a unified framework for illicit promotion detection. Through rigorous analysis, we show that properly configured ICL achieves performance comparable to that of fine-tuned models while using 22x fewer labeled examples. We demonstrate three key capabilities: (1) Generalization to unseen threats: ICL generalizes to new illicit categories without category-specific demonstrations, with a performance drop of less than 6% for most evaluated categories. (2) Autonomous discovery: A novel two-stage pipeline distills 2,900 free-form labels into coherent taxonomies, surfacing eight previously undocumented illicit categories such as usury and illegal immigration. (3) Cross-platform generalization: Deployed on 200,000 real-world samples from search engines and Twitter without adaptation, ICL achieves 92.6% accuracy. Furthermore, 61.8% of its uniquely flagged samples correspond to borderline or obfuscated content missed by existing detectors. Our findings position ICL as a new paradigm for content moderation, combining the precision of specialized classifiers with cross-platform generalization and autonomous threat discovery. By shifting to inference-time reasoning, ICL offers a path toward proactively adaptive moderation systems.
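To make the "properly configured" ICL setup concrete, the sketch below shows one plausible way to assemble a few-shot prompt for binary illicit-promotion detection: labeled demonstrations are retrieved by semantic similarity to the test sample and prepended to the query. The demonstration pool, labels, and embedding model are illustrative assumptions, not the paper's actual data, retriever, or prompt template.

```python
# Minimal sketch of few-shot ICL prompt assembly with semantic demonstration
# retrieval. The demo pool, labels, and embedding model are illustrative
# assumptions; the paper's exact data, retriever, and LLM are not shown here.
from sentence_transformers import SentenceTransformer, util

# Tiny stand-in demonstration pool of (text, label) pairs -- hypothetical examples.
DEMO_POOL = [
    ("Cheap replica watches, DM for catalog and worldwide shipping", "illicit"),
    ("Join our weekend photography workshop, beginners welcome", "benign"),
    ("Guaranteed loans, no credit check, contact on Telegram", "illicit"),
    ("New espresso machine review: pros, cons, and pricing", "benign"),
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
demo_texts = [text for text, _ in DEMO_POOL]
demo_embs = encoder.encode(demo_texts, convert_to_tensor=True)

def build_prompt(query: str, k: int = 2) -> str:
    """Retrieve the k most semantically similar labeled demonstrations
    and assemble a binary-classification ICL prompt."""
    q_emb = encoder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, demo_embs)[0]          # cosine similarity to each demo
    top_idx = scores.argsort(descending=True)[:k].tolist()

    lines = [
        "Decide whether each post contains illicit promotion.",
        "Answer with 'illicit' or 'benign'.",
        "",
    ]
    for i in top_idx:
        text, label = DEMO_POOL[i]
        lines += [f"Post: {text}", f"Label: {label}", ""]
    lines += [f"Post: {query}", "Label:"]
    return "\n".join(lines)

if __name__ == "__main__":
    prompt = build_prompt("Fresh IDs and passports, fast delivery, wickr me")
    print(prompt)  # this prompt would then be sent to the LLM under study (not shown)
```

In this configuration, only the k retrieved demonstrations need labels at inference time, which is consistent with the paper's emphasis on matching fine-tuned baselines with far fewer labeled examples.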

Paper Structure

This paper contains 27 sections, 10 figures, and 4 tables.

Figures (10)

  • Figure 1: Impact of Demonstration Quantity. F1-scores of different model families as a function of the number of shots for (a) binary and (b) multiclass classification tasks.
  • Figure 2: Efficacy of Demonstration Selection Strategies. Comparison of Random, Lexical (BM25), and Semantic (Embedding-based) selection strategies across varying shot counts using the Mistral model.
  • Figure 3: Instruction Tuning Impact. Performance comparison between Instruction-Tuned and Base variants across Llama, Mistral, and Qwen families (32-shot, semantic retrieval).
  • Figure 4: Necessity of Explicit Labels. Impact of removing labels from demonstrations compared to Zero-shot and Standard (Labeled) Few-shot baselines on FPR and F1 Score.
  • Figure 5: Sensitivity to Label Verbalization. Comparison of Original (Semantic), Inverted, and Abstract (Symbolic) label mappings for (a) binary and (b) multiclass tasks.
  • ...and 5 more figures