SLM-Mod: Small Language Models Surpass LLMs at Content Moderation
Xianyang Zhan, Agam Goyal, Yilun Chen, Eshwar Chandrasekharan, Koustuv Saha
TL;DR
This work investigates whether open-source small language models (SLMs) fine-tuned with LoRA can outperform larger closed- and open-source LLMs at community-specific content moderation. Using a large Reddit-based dataset, the authors demonstrate that SLMs achieve higher accuracy and recall on in-domain moderation and maintain strong performance under imbalanced data, while LLMs offer higher precision. They further show that SLMs transfer reasonably well across domains, sometimes outperforming LLMs in cross-domain settings, and that SLMs are more stable than closed-source models. The study argues for a shift toward specialist SLMs for scalable, cost-efficient moderation and outlines hybrid deployment strategies that combine automated triage with human review. It also highlights limitations and directions for future work, including continual updating to track evolving community norms and extending to multimodal moderation.
Abstract
Large language models (LLMs) have shown promise in many natural language understanding tasks, including content moderation. However, these models can be expensive to query in real time and do not allow for a community-specific approach to content moderation. To address these challenges, we explore the use of open-source small language models (SLMs) for community-specific content moderation tasks. We fine-tune and evaluate SLMs (fewer than 15B parameters) by comparing their performance against much larger open- and closed-source models in both zero-shot and few-shot settings. Using 150K comments from 15 popular Reddit communities, we find that SLMs outperform zero-shot LLMs at content moderation, with 11.5% higher accuracy and 25.7% higher recall on average across all communities. Moreover, few-shot in-context learning yields only a marginal increase in LLM performance, which still falls short of that of SLMs. We further show the promise of cross-community content moderation, which has implications for new communities and the development of cross-platform moderation techniques. Finally, we outline directions for future work on language-model-based content moderation. Code and models can be found at https://github.com/AGoyal0512/SLM-Mod.
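To make the reported metrics concrete, the following is a minimal sketch of how accuracy, precision, and recall could be computed for a binary moderation task. It is not the authors' evaluation code: the function name `moderation_metrics`, the label convention (1 = comment removed, 0 = comment kept), and the toy predictions are all assumptions chosen for illustration, and the numbers are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class ModerationMetrics:
    accuracy: float
    precision: float
    recall: float


def moderation_metrics(y_true, y_pred):
    """Compute binary metrics, assuming label 1 = removed and 0 = kept."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed violations
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly kept

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return ModerationMetrics(accuracy, precision, recall)


# Hypothetical toy labels, NOT data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
aggressive_pred = [1, 1, 1, 1, 1, 0, 0, 0]    # flags more: catches every violation
conservative_pred = [1, 1, 0, 0, 0, 0, 0, 0]  # flags less: never wrong when it flags

aggressive = moderation_metrics(y_true, aggressive_pred)
conservative = moderation_metrics(y_true, conservative_pred)
print(aggressive)
print(conservative)
```

In this toy setup the aggressive classifier has higher recall and accuracy while the conservative one has higher precision, which mirrors the kind of trade-off the summary describes between fine-tuned SLMs and zero-shot LLMs.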
