
DepressionEmo: A novel dataset for multilabel classification of depression emotions

Abu Bakar Siddiqur Rahman, Hoang-Thang Ta, Lotfollah Najjar, Azad Azadmanesh, Ali Saffet Gönül

TL;DR

DepressionEmo introduces a novel Reddit-derived multilabel dataset for eight depression-related emotions, created from 6037 long posts via majority voting on zero-shot model outputs and validated against human annotators and ChatGPT. The paper provides a thorough dataset creation pipeline, including data crawling, preprocessing, emotion definitions, and a robust annotation framework with interrater reliability assessments. It benchmarks six classifiers (three ML and three DL) and finds that deep learning models, particularly BART, best capture the emotional content, with BART reaching an F1-Macro of 0.76 and an F1-Micro of 0.80. Beyond dataset construction, DepressionEmo offers analyses of emotion distribution, temporal trends, and inter-emotion correlations, illustrating its potential for improving automated detection of depression-related emotions in text and for guiding future data augmentation and model development.
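The TL;DR reports both F1-Macro and F1-Micro. The following minimal sketch, using hypothetical toy labels rather than DepressionEmo data, shows how the two averages differ for multilabel predictions: F1-Macro is the unweighted mean of per-label F1 scores, so a rare emotion that the model misses pulls it down sharply, while F1-Micro pools true positives, false positives, and false negatives across all labels before computing a single F1.

```python
from typing import List, Tuple

Labels = List[List[int]]  # rows = posts, columns = emotion labels (0/1)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def per_label_counts(y_true: Labels, y_pred: Labels) -> List[Tuple[int, int, int]]:
    """Return (TP, FP, FN) for each label column."""
    counts = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] and p[j])
        fp = sum(1 for t, p in zip(y_true, y_pred) if not t[j] and p[j])
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] and not p[j])
        counts.append((tp, fp, fn))
    return counts


def f1_macro(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of the per-label F1 scores."""
    counts = per_label_counts(y_true, y_pred)
    return sum(f1(*c) for c in counts) / len(counts)


def f1_micro(y_true: Labels, y_pred: Labels) -> float:
    """Single F1 over TP/FP/FN pooled across all labels."""
    counts = per_label_counts(y_true, y_pred)
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return f1(tp, fp, fn)


# Hypothetical toy data: 4 posts x 3 labels; the third label is rare and missed.
y_true = [[1, 0, 0], [1, 1, 0], [1, 1, 0], [0, 1, 1]]
y_pred = [[1, 0, 0], [1, 1, 0], [1, 1, 0], [0, 1, 0]]

macro = f1_macro(y_true, y_pred)
micro = f1_micro(y_true, y_pred)
print(f"F1-Macro = {macro:.3f}, F1-Micro = {micro:.3f}")  # prints: F1-Macro = 0.667, F1-Micro = 0.923
```

In this toy case the model ignores one rare emotion yet still posts a high F1-Micro, while F1-Macro exposes the miss; on an imbalanced label set, reporting both gives a fuller picture.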

Abstract

Emotions are integral to human social interactions, and different situational contexts elicit diverse emotional responses. In particular, the prevalence of negative emotional states has been correlated with negative mental health outcomes, which calls for a comprehensive analysis of how often these states occur and how they affect individuals. In this paper, we introduce DepressionEmo, a novel dataset of 6037 long Reddit user posts designed for detecting 8 emotions associated with depression. The dataset was labeled by a majority vote over the zero-shot classifications of pre-trained models, and its quality was validated by human annotators and ChatGPT, with an acceptable level of interrater reliability between annotators. We also analyze the correlations between emotions and their distribution over time, and conduct a linguistic analysis of DepressionEmo. In addition, we benchmark text classification methods from two groups: machine learning methods such as SVM, XGBoost, and LightGBM; and deep learning methods such as BERT, GAN-BERT, and BART. The pretrained BART model bart-base achieves the highest F1-Macro of 0.76, outperforming the other methods evaluated in our analysis. Among all emotions, suicide intent achieves the highest F1-Macro, suggesting that the dataset is useful for identifying emotions in individuals with depression symptoms through text analysis. The curated dataset is publicly available at: https://github.com/abuBakarSiddiqurRahman/DepressionEmo.
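The abstract states that labels were produced by a majority vote over zero-shot classifications from pre-trained models. The exact procedure (number of models, score thresholds, tie handling) is not given here, so the following is only a minimal sketch under stated assumptions: each model yields a per-emotion score, a hypothetical threshold of 0.5 turns those scores into a label set, and an emotion is kept when strictly more than half of the models assign it. The scores are hard-coded stand-ins for classifier outputs, and the emotion names other than suicide intent are illustrative.

```python
from collections import Counter
from typing import Dict, List, Set


def labels_from_scores(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Turn one model's per-label zero-shot scores into a label set.
    The 0.5 threshold is an assumption, not taken from the paper."""
    return {label for label, score in scores.items() if score >= threshold}


def majority_vote(model_label_sets: List[Set[str]]) -> Set[str]:
    """Keep a label only if strictly more than half of the models assigned it."""
    n_models = len(model_label_sets)
    votes = Counter(label for labels in model_label_sets for label in labels)
    return {label for label, count in votes.items() if count > n_models / 2}


# Hypothetical scores from three zero-shot models for a single post.
model_scores = [
    {"sadness": 0.91, "loneliness": 0.62, "suicide intent": 0.20},
    {"sadness": 0.85, "loneliness": 0.40, "suicide intent": 0.55},
    {"sadness": 0.70, "loneliness": 0.58, "suicide intent": 0.30},
]

final_labels = majority_vote([labels_from_scores(s) for s in model_scores])
print(sorted(final_labels))  # prints: ['loneliness', 'sadness']
```

Here "suicide intent" passes the threshold for only one of three models and is dropped, which illustrates how a vote filters out labels that a single model assigns idiosyncratically.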

Paper Structure (18 sections, 6 equations, 9 figures, 10 tables)

Figures (9)

  • Figure 1: Distribution of emotions, by percentage and number of examples, across the training, validation, and test sets
  • Figure 2: Emotion distribution.
  • Figure 3: The heatmap shows the Pearson correlation of emotion pairs.
  • Figure 4: Total emotions and "suicide intent" emotions by day of the week.
  • Figure 5: Total emotions and "suicide intent" emotions by hour of the day.
  • ...and 4 more figures
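Figure 3 shows the Pearson correlation of emotion pairs. Assuming each emotion is stored as a 0/1 column over posts (an assumption about the data layout, not taken from the paper), the values behind such a heatmap could be computed as in this sketch; on binary columns, Pearson correlation is the same as the phi coefficient. The label values are hypothetical, and the emotion names other than suicide intent are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[int], y: List[int]) -> float:
    """Pearson correlation; on 0/1 vectors this equals the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when a label column is constant
    return cov / (sx * sy)


def pairwise_correlations(columns: Dict[str, List[int]]) -> Dict[Tuple[str, str], float]:
    """Correlation for every unordered pair of emotion columns."""
    return {(a, b): pearson(columns[a], columns[b]) for a, b in combinations(columns, 2)}


# Hypothetical binary labels for six posts (1 = emotion present).
columns = {
    "sadness":        [1, 1, 1, 0, 1, 0],
    "loneliness":     [1, 1, 0, 0, 1, 0],
    "suicide intent": [0, 0, 1, 1, 0, 1],
}

for pair, r in pairwise_correlations(columns).items():
    print(pair, round(r, 3))
```

A full heatmap would fill a symmetric 8 x 8 matrix from these pairwise values, with 1.0 on the diagonal.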