System Report for CCL25-Eval Task 10: Prompt-Driven Large Language Model Merge for Fine-Grained Chinese Hate Speech Detection
Binglin Wu, Jiaxiu Zou, Xianneng Li
TL;DR
The paper tackles fine-grained Chinese hate speech detection, where implicit rhetoric and evolving slang hinder traditional models. It introduces a three-stage LLM framework (Domain-specific Prompt Engineering, Task-oriented Supervised Fine-tuning, and Dynamic LLM Merge) and validates it on the STATE-ToxiCN benchmark. The reported results show state-of-the-art performance, strong generalization to cross-context cases, and gains from both targeted prompting and model merging. The work points to a viable path toward robust, fine-grained hate speech detection in Chinese, with practical impact for safer online spaces.
Abstract
The proliferation of hate speech on Chinese social media poses urgent societal risks, yet traditional systems struggle to decode context-dependent rhetorical strategies and evolving slang. To bridge this gap, we propose a novel three-stage LLM-based framework: Prompt Engineering, Supervised Fine-tuning, and LLM Merging. First, context-aware prompts are designed to guide LLMs in extracting implicit hate patterns. Next, task-specific features are integrated during supervised fine-tuning to enhance domain adaptation. Finally, merging fine-tuned LLMs improves robustness against out-of-distribution cases. Evaluations on the STATE-ToxiCN benchmark validate the framework's effectiveness, demonstrating superior performance over baseline methods in detecting fine-grained hate speech.
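To make the merging stage concrete, one common and very simple form of model merging is a weighted average of the parameters of several fine-tuned checkpoints that share the same architecture. The sketch below assumes that form purely for illustration; the abstract does not specify the merge rule the authors use, and the function name `merge_state_dicts`, the toy checkpoints, and the weights are all hypothetical. Parameters are represented as plain lists of floats so the example runs without any deep learning library.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def merge_state_dicts(state_dicts: Sequence[StateDict],
                      weights: Sequence[float]) -> StateDict:
    """Weighted average of checkpoints with identical parameter names and shapes.

    Illustrative assumption only: this is linear weight averaging,
    not necessarily the merge rule used in the paper.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    if len(state_dicts) != len(weights):
        raise ValueError("one weight per checkpoint is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize so the merged parameters stay on the same scale.
    norm = [w / total for w in weights]

    names = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != names:
            raise ValueError("checkpoints must share the same parameter names")

    merged: StateDict = {}
    for name in names:
        size = len(state_dicts[0][name])
        if any(len(sd[name]) != size for sd in state_dicts):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(norm, state_dicts))
            for i in range(size)
        ]
    return merged


# Toy usage: two hypothetical fine-tuned checkpoints, merged with equal weight.
ckpt_a = {"layer.weight": [1.0, 3.0], "layer.bias": [0.0]}
ckpt_b = {"layer.weight": [3.0, 5.0], "layer.bias": [2.0]}
merged = merge_state_dicts([ckpt_a, ckpt_b], weights=[1.0, 1.0])
print(merged)
```

With real models, the same loop would run over tensors from each checkpoint's parameter dictionary, and the weights could be tuned on a validation set; whether the paper's "dynamic" merge works this way is not stated here.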
