
A Comparative Analysis of Transformer and LSTM Models for Detecting Suicidal Ideation on Reddit

Khalid Hasan, Jamil Saquer

TL;DR

This study benchmarks transformer-based models (BERT variants) and LSTM-based architectures for detecting suicidal ideation (SI) in Reddit posts, using a large annotated dataset of 37,821 posts collected via the Pushshift API from multiple subreddits. RoBERTa achieves the best performance, with an accuracy of 93.22% and an F1 score of 93.14%, while an LSTM with attention and BERT embeddings follows closely at 92.65% accuracy and 92.69% F1. All transformer models show strong SI detection capability, underscoring the potential of NLP techniques to support mental health monitoring and timely intervention on social media. The work also shows the benefit of contextual BERT embeddings for downstream LSTM models and highlights practical considerations, such as DistilBERT's training efficiency, along with future directions including ensemble methods and multilingual studies.
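
Both headline metrics above come from the same confusion-matrix counts, and they need not agree, since F1 ignores true negatives. The sketch below computes accuracy and binary F1 with the SI class as the positive label. It is an illustration under an assumption: this summary does not say whether the reported F1 is binary, macro-, or weighted-averaged, and the function name `accuracy_and_f1` is hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (accuracy, binary F1), treating `positive` as the SI class (assumed label 1)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive (SI) class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Accuracy counts every correct prediction, including true negatives.
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall; true negatives do not enter it.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy labels: 4 SI posts, 6 non-SI posts.
acc, f1 = accuracy_and_f1([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 0, 0, 1, 0, 0, 0, 0, 0])
print(acc, round(f1, 4))  # 0.7 0.5714
```

On this imbalanced toy set, accuracy (0.70) is noticeably higher than F1 (about 0.57), because the model misses half of the SI posts. Accuracy and F1 that sit close together, as in the reported results, suggest the errors are reasonably balanced across classes.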

Abstract

Suicide is a critical global health problem, claiming more than 700,000 lives each year, with young adults particularly affected. Many people express suicidal thoughts on social media platforms such as Reddit. This paper evaluates how effectively the transformer-based deep learning models BERT, RoBERTa, DistilBERT, ALBERT, and ELECTRA, along with various Long Short-Term Memory (LSTM) based models, detect suicidal ideation in user posts on Reddit. To this end, we curated an extensive dataset from diverse subreddits and conducted linguistic, topic modeling, and statistical analyses to ensure data quality. Our results indicate that every model reaches high accuracy and F1 scores, with RoBERTa emerging as the most effective at an accuracy of 93.22% and an F1 score of 93.14%. An LSTM model that uses attention and BERT embeddings performed second best, with an accuracy of 92.65% and an F1 score of 92.69%. Our findings show that transformer-based models have the potential to improve suicidal ideation detection, providing a path toward robust mental health monitoring tools for social media. This research therefore underlines the promise of advanced Natural Language Processing (NLP) techniques for strengthening suicide prevention efforts.
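
The second-best model pairs an LSTM with attention. One common way to do this is attention pooling: score each per-token LSTM output h_t against a learned context vector u, normalize the scores with a softmax into weights a_t, and feed the weighted sum c = Σ a_t h_t to the classifier instead of only the final hidden state. The sketch below shows that pooling step in plain Python. It is an assumption, not the authors' formulation: the summary does not specify the attention variant, and `attention_pool` and the context vector are hypothetical. In the paper's setup the LSTM inputs would be BERT token embeddings; here the hidden states are tiny hand-made vectors.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: List[Vector], context: Vector) -> Tuple[Vector, List[float]]:
    """Pool per-token LSTM outputs into one vector via dot-product attention.

    Hypothetical sketch: scores s_t = h_t . u, weights a_t = softmax(s)_t,
    pooled c = sum_t a_t * h_t. In a trained model, u would be a learned parameter.
    """
    scores = [sum(h_i * u_i for h_i, u_i in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


# Three toy "token" states; the context vector favors the first dimension.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(states, context=[10.0, 0.0])
print([round(w, 3) for w in weights])  # tokens 1 and 3 get almost all the weight
```

With a zero context vector every token receives equal weight, which reduces to mean pooling; a learned context vector lets the classifier emphasize the most informative tokens in a long post rather than depending on whatever the LSTM retained in its last step.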

Paper Structure

This paper contains 17 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: A summary of our research framework
  • Figure 2: Results of Topic Modeling with 8 Topics