CNSocialDepress: A Chinese Social Media Dataset for Depression Risk Detection and Structured Analysis

Jinyuan Xu; Tian Lan; Xintao Yu; Xue He; Hezhi Zhang; Ying Wang; Pierre Magistry; Mathieu Valette; Lei Li

TL;DR

CNSocialDepress introduces the first public Chinese-language depression-risk dataset that pairs binary labels with expert-annotated six-dimensional analyses. It combines a manually curated CNSD Gold standard with an automated CNSD Silver pipeline, enabling scalable labeling and structured analysis generation for depression signals on Chinese social media. Through extensive experiments across data generation, structured summarization, and classification using multiple LLMs and baselines, the work demonstrates strong generation quality and competitive classification performance, highlighting the utility of structured psychological profiling for mental health applications in Chinese. The dataset and pipeline offer practical tools for early detection and intervention while acknowledging platform biases, annotation costs, and ethical considerations.

Abstract

Depression is a pressing global public health issue, yet publicly available Chinese-language resources for risk detection remain scarce and are mostly limited to binary classification. To address this limitation, we release CNSocialDepress, a benchmark dataset for depression risk detection from Chinese social media posts. The dataset contains 44,178 texts from 233 users, in which psychological experts annotated 10,306 depression-related segments. CNSocialDepress provides binary risk labels together with structured multi-dimensional psychological attributes, enabling interpretable and fine-grained analysis of depressive signals. Experimental results demonstrate its utility across a wide range of NLP tasks, including structured psychological profiling and fine-tuning of large language models for depression detection. Comprehensive evaluations highlight the dataset's effectiveness and practical value for depression risk identification and psychological analysis, and offer insights for mental health applications tailored to Chinese-speaking populations.
