Harnessing Large Language Models Over Transformer Models for Detecting Bengali Depressive Social Media Text: A Comprehensive Study

Ahmadul Karim Chowdhury, Md. Saidur Rahman Sujon, Md. Shirajus Salekin Shafi, Tasin Ahmmad, Sifat Ahmed, Khan Md Hasib, Faisal Muhammad Shah

TL;DR

This study investigates Bengali depressive text detection across deep learning (DL) models, transformer-based pretrained language models (PLMs), and large language models (LLMs), using a newly created Bengali Social Media Depressive Dataset (BSMDD) built from Reddit and X posts translated into Bengali. By evaluating DL models (LSTM/GRU variants), PLMs (BanglaBERT family, sahajBERT), and LLMs (GPT-3.5, GPT-4, DepGPT, Alpaca LoRA 7B) under zero-shot and few-shot prompts, the authors demonstrate the superior performance of DepGPT among LLMs (accuracy 0.9796, F1 0.9804) and reveal trade-offs between model types, including cost and explainability. The work provides detailed architecture and prompting analyses, a transparent evaluation framework, and a public Bengali depressive dataset, offering actionable insights for deploying multilingual mental-health classifiers and guiding future cross-linguistic and ethical considerations in AI-assisted screening. Overall, the results underscore the versatility and effectiveness of LLMs for rapid, context-aware depression detection in Bengali social media, while highlighting the need for robust data, interpretability, and responsible deployment in real-world settings.

Abstract

In an era where the silent struggle of underdiagnosed depression pervades globally, our research delves into the crucial link between mental health and social media. This work focuses on early detection of depression, particularly in extroverted social media users, using LLMs such as GPT-3.5, GPT-4, and our proposed fine-tuned GPT-3.5 model, DepGPT, as well as advanced deep learning models (LSTM, Bi-LSTM, GRU, BiGRU) and transformer models (BERT, BanglaBERT, SahajBERT, BanglaBERT-Base). The study categorized Reddit and X datasets into "Depressive" and "Non-Depressive" segments, which were translated into Bengali by native speakers with expertise in mental health, resulting in the creation of the Bengali Social Media Depressive Dataset (BSMDD). Our work provides full architecture details for each model and a methodical way to assess their performance in Bengali depressive text categorization using zero-shot and few-shot learning techniques. It demonstrates the superiority of SahajBERT and of Bi-LSTM with FastText embeddings in their respective domains, tackles explainability issues with transformer models, and emphasizes the effectiveness of LLMs, especially DepGPT, which shows flexibility and competence in a range of learning contexts. According to the experimental results, the proposed model, DepGPT, outperformed not only Alpaca LoRA 7B in zero-shot and few-shot scenarios but also every other model, achieving a near-perfect accuracy of 0.9796 and an F1-score of 0.9804, along with high recall and exceptional precision. Although competitive, GPT-3.5 Turbo and Alpaca LoRA 7B are comparatively less effective in zero-shot and few-shot settings. The work emphasizes the effectiveness and flexibility of LLMs in a variety of linguistic circumstances, providing insightful information about the complex field of depression detection models.
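For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1 are conventionally computed for a two-class task with "Depressive" as the positive class. It is not the authors' evaluation code: the function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are illustrative assumptions, and the numbers it produces are unrelated to the paper's results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "Depressive",
) -> BinaryMetrics:
    """Compute standard binary classification metrics (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with `positive` as the positive class.
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels for illustration only; these are NOT data from BSMDD.
D, N = "Depressive", "Non-Depressive"
toy_true = [D, D, D, N, N, N, D, N]
toy_pred = [D, D, N, N, N, D, D, N]
toy = binary_metrics(toy_true, toy_pred)
print(toy)
```

Because F1 is the harmonic mean of precision and recall, a model can only reach a high F1 when both are high, which is why F1 is typically reported alongside accuracy for screening tasks like this one.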

Paper Structure (26 sections, 8 figures, 18 tables)

This paper contains 26 sections, 8 figures, 18 tables.

Figures (8)

  • Figure 1: Proposed Methodology
  • Figure 2: Design of Zero Shot Example
  • Figure 3: Design of Few Shot Example
  • Figure 4: A true positive sample example
  • Figure 5: A true negative sample example
  • ...and 3 more figures