Fine-grained Speech Sentiment Analysis in Chinese Psychological Support Hotlines Based on Large-scale Pre-trained Model

Zhonglong Chen, Changwei Song, Yining Chen, Jianqiang Li, Guanghui Fu, Yongsheng Tong, Qing Zhao

TL;DR

This work tackles the problem of automatic negative-emotion detection in Chinese psychological support hotline speech by leveraging large-scale pre-trained models (Wav2Vec 2.0, HuBERT, Whisper) and fine-tuning a compact classifier for both binary and fine-grained tasks on 20,630 segments from 105 callers of the Beijing hotline. The authors find strong performance for binary negative emotion recognition (F1 up to 76.96%) but substantially weaker results for the 11-class fine-grained multi-label task (weighted F1 up to 41.74%), underscoring the challenge of nuanced emotion understanding in crisis contexts and the importance of data scale and labeling quality. The study provides a valuable baseline for integrating SER into hotline systems and for large-scale psychometric analyses, while highlighting the need for more diverse data, contextual modeling, and methods tailored to closely related emotions. Overall, the work demonstrates both the feasibility and limitations of current PTMs for clinical speech emotion analysis in Chinese crisis hotlines, guiding future improvements in data, modeling, and deployment scenarios.

Abstract

Suicide and suicidal behaviors remain significant challenges for public policy and healthcare. In response, psychological support hotlines have been established worldwide to provide immediate help to individuals in mental crises. The effectiveness of these hotlines largely depends on accurately identifying callers' emotional states, particularly underlying negative emotions indicative of increased suicide risk. However, the high demand for psychological interventions often results in a shortage of professional operators, highlighting the need for an effective speech emotion recognition model. This model would automatically detect and analyze callers' emotions, facilitating integration into hotline services. Additionally, it would enable large-scale data analysis of psychological support hotline interactions to explore psychological phenomena and behaviors across populations. Our study utilizes data from the Beijing psychological support hotline, the largest suicide hotline in China. We analyzed speech data from 105 callers containing 20,630 segments and categorized them into 11 types of negative emotions. We developed a negative emotion recognition model and a fine-grained multi-label classification model using a large-scale pre-trained model. Our experiments indicate that the negative emotion recognition model achieves a maximum F1-score of 76.96%. However, the approach shows limited efficacy in the fine-grained multi-label classification task, with the best model achieving only a 41.74% weighted F1-score. We conducted an error analysis for this task, discussed potential future improvements, and considered the clinical application possibilities of our study. All code is publicly available.
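
The abstract reports a weighted F1-score for the multi-label task without defining it. As a minimal sketch, assuming the common support-weighted convention (per-label F1 averaged with weights equal to each label's number of true occurrences), the metric could be computed as below. The function names and the toy three-label data are illustrative assumptions, not taken from the paper or its released code.

```python
from typing import List


def per_label_f1(y_true: List[List[int]], y_pred: List[List[int]], label: int) -> float:
    """F1 for one label column of a multi-label indicator matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 1 and p[label] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 0 and p[label] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 1 and p[label] == 0)
    denom = 2 * tp + fp + fn
    # Assumption: a label with no true or predicted positives scores 0.
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Average of per-label F1, weighted by each label's support."""
    n_labels = len(y_true[0])
    supports = [sum(row[j] for row in y_true) for j in range(n_labels)]
    total = sum(supports)
    if total == 0:
        return 0.0
    weighted_sum = sum(
        per_label_f1(y_true, y_pred, j) * supports[j] for j in range(n_labels)
    )
    return weighted_sum / total


# Toy data (hypothetical): 4 segments, 3 emotion labels, one row per segment.
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(round(weighted_f1(y_true, y_pred), 4))
```

Under this convention frequent labels dominate the score, so a low weighted F1 on an 11-label task suggests that even the most common emotion categories are hard to separate.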

Paper Structure (14 sections, 2 figures, 6 tables)

This paper contains 14 sections, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Label co-occurrence relationships between categories in the fine-grained emotion multi-label classification task.
  • Figure 2: Distribution of speech segment durations. Due to space constraints, we only show the top 11 categories with the highest percentages.