C-MTCSD: A Chinese Multi-Turn Conversational Stance Detection Dataset

Fuqiang Niu, Yi Yang, Xianghua Fu, Genan Dai, Bowen Zhang

TL;DR

The paper introduces C-MTCSD, the largest Chinese multi-turn conversational stance detection (CSD) dataset, collected from Sina Weibo to enable context-aware stance analysis in Chinese. It details data collection, annotation, and quality assurance, and delivers 24,264 labeled text-target pairs with conversation depths of up to 6 turns across five topics. The authors benchmark traditional models and LLM-based approaches, showing that state-of-the-art systems achieve only 64.07% F1 in the zero-shot setting and that performance declines as conversation depth increases, which highlights the difficulty of implicit stance and coreference in dialogue. C-MTCSD provides a challenging benchmark for advancing Chinese CSD and motivates future methods that make better use of discourse context and implicit cues.

Abstract

Stance detection has become an essential tool for analyzing public discussions on social media. Current methods face significant challenges, particularly in Chinese language processing and multi-turn conversational analysis. To address these limitations, we introduce C-MTCSD, the largest Chinese multi-turn conversational stance detection dataset, comprising 24,264 carefully annotated instances from Sina Weibo, which is 4.2 times larger than the only prior Chinese conversational stance detection dataset. Our comprehensive evaluation using both traditional approaches and large language models reveals the complexity of C-MTCSD: even state-of-the-art models achieve only 64.07% F1 score in the challenging zero-shot setting, while performance consistently degrades with increasing conversation depth. Traditional models particularly struggle with implicit stance detection, achieving below 50% F1 score. This work establishes a challenging new benchmark for Chinese stance detection research, highlighting significant opportunities for future improvements.

Paper Structure

This paper contains 5 sections and 8 tables.