A Multimodal Framework for Depression Detection during Covid-19 via Harvesting Social Media: A Novel Dataset and Method

Ashutosh Anshul, Gumpili Sai Pranav, Mohammad Zia Ur Rehman, Nagendra Kumar

TL;DR

The paper addresses depression detection on social media during Covid-19 by introducing MFEL, a multimodal framework that fuses extrinsic cues from URLs, intrinsic textual and visual features, and user-specific signals. It introduces the Visual Neural Network (VNN) for image embeddings and a novel Covid-19 Twitter dataset to evaluate pandemic-specific depression signals. MFEL combines Logistic Regression, XGBoost, and a Neural Network with max voting and achieves state-of-the-art accuracy on the Tsinghua benchmark (93.1%) and robust performance on the Covid-19 dataset (91.7%), outperforming several baseline and transformer-based methods. Ablation studies quantify the contribution of each modality and external data, demonstrating the value of OCR, URL headings, and multimodal fusion for reliable, interpretable depression detection with potential for early intervention.
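The max-voting step mentioned above can be illustrated with a minimal sketch. It assumes each of the three base models has already produced a hard label per user (1 = depressed, 0 = non-depressed). The function name `max_vote` and the sample predictions are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def max_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Return the most common label per user across all models.

    predictions[i][j] is the label that model i assigns to user j.
    """
    n_users = len(predictions[0])
    if any(len(p) != n_users for p in predictions):
        raise ValueError("every model must predict for the same users")

    voted = []
    for user_votes in zip(*predictions):
        # With three models and two classes a strict majority always exists.
        label, _count = Counter(user_votes).most_common(1)[0]
        voted.append(label)
    return voted


# Hypothetical hard predictions for four users (illustrative values only).
lr_pred = [1, 0, 1, 0]   # stand-in for Logistic Regression output
xgb_pred = [1, 1, 0, 0]  # stand-in for XGBoost output
nn_pred = [0, 1, 1, 0]   # stand-in for Neural Network output

print(max_vote([lr_pred, xgb_pred, nn_pred]))  # [1, 1, 1, 0]
```

Using an odd number of voters on a binary task avoids ties, which is one plausible reason to combine exactly three models this way.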

Abstract

The recent coronavirus disease (Covid-19) has become a pandemic and has affected the entire globe. During the pandemic, we have observed a spike in cases related to mental health, such as anxiety, stress, and depression. Depression significantly influences most diseases worldwide, and mental health conditions are difficult to detect because people are often unaware of them or unwilling to consult a doctor. Nowadays, however, people extensively use online social media platforms to express their emotions and thoughts, so these platforms are becoming a large data source that can be utilized for detecting depression and mental illness. Existing approaches, though, often overlook data sparsity in tweets and the multimodal aspects of social media. In this paper, we propose a novel multimodal framework that combines textual, user-specific, and image analysis to detect depression among social media users. To provide enough context about the user's emotional state, we (i) propose an extrinsic feature that harnesses the URLs present in tweets and (ii) extract the textual content present in images posted in tweets. We also extract five sets of features belonging to different modalities to describe a user. Additionally, we introduce a Deep Learning model, the Visual Neural Network (VNN), to generate embeddings of user-posted images, which are used to create the visual feature vector for prediction. We contribute a curated Covid-19 dataset of depressed and non-depressed users for research purposes and demonstrate the effectiveness of our model in detecting depression during the Covid-19 outbreak. Our model outperforms existing state-of-the-art methods on a benchmark dataset by 2%-8% and produces promising results on the Covid-19 dataset. Our analysis highlights the impact of each modality and provides valuable insights into users' mental and emotional states.
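As a rough illustration of the extrinsic URL feature, the sketch below pulls URLs out of a tweet and collects heading text from a page's HTML. It rests on several assumptions: the regular expression, the choice of `<title>` and `<h1>` as "headings", and the names `extract_urls`, `HeadingParser`, and `headings_from_html` are all hypothetical, and the page is passed in as a string rather than fetched over the network. The paper's actual pipeline may differ.

```python
import re
from html.parser import HTMLParser

# Simple pattern for http(s) links; an assumption, not the paper's rule.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(tweet: str) -> list[str]:
    """Return every http(s) URL found in the tweet text."""
    return URL_PATTERN.findall(tweet)


class HeadingParser(HTMLParser):
    """Collects the text found inside <title> and <h1> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._capture = False
        self.headings: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._capture = True

    def handle_endtag(self, tag):
        if tag in ("title", "h1"):
            self._capture = False

    def handle_data(self, data):
        if self._capture and data.strip():
            self.headings.append(data.strip())


def headings_from_html(html: str) -> list[str]:
    """Parse an HTML document and return its title and h1 text."""
    parser = HeadingParser()
    parser.feed(html)
    parser.close()
    return parser.headings


# Hypothetical tweet and page content, for illustration only.
tweet = "Could not sleep again https://example.com/article worth a read"
page = (
    "<html><head><title>Coping with lockdown</title></head>"
    "<body><h1>Ten tips</h1><p>Body text</p></body></html>"
)
print(extract_urls(tweet))
print(headings_from_html(page))
```

The recovered heading text could then be appended to the tweet text before feature extraction, which is one way short tweets might gain additional context.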

Paper Structure

This paper contains 24 sections, 4 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: System architecture of the proposed model
  • Figure 3: Sample images posted by depressed users containing textual content
  • Figure 4: Effect of number of topics in Topic-Based features over model performance
  • Figure 5: Effect of number of components in PCA for reducing the dimension of Lexicon categories
  • Figure 6: Effect of the dimension of Visual features over model performance