Risk prediction of pathological gambling on social media

Angelina Parfenova, Marianne Clausel

TL;DR

This work tackles risk prediction of pathological gambling from Reddit posts by leveraging temporal dynamics and emotional cues. The authors compare baseline text- and sequence-based models and propose a dual-encoder architecture that injects EmoBERTa-derived emotions and a time-decay layer into LSTM processing, followed by attention and classification. They demonstrate that sequential models with time-aware weighting and emotion-informed features outperform concatenation-based approaches, achieving a high $F1$-score of $0.95$ on the small eRisk PG dataset, and show interpretability via attention. The study highlights the importance of temporal and emotional processing for mental health prediction on social media, while noting limitations in data size and the need for broader validation and early-risk detection in future work.

Abstract

This paper addresses risk prediction on social media data, specifically the classification of Reddit users as having a pathological gambling disorder. To tackle this problem, it incorporates temporal and emotional features into the model. The preprocessing phase handles the time irregularity of posts by padding sequences. Two baseline architectures are used for preliminary evaluation: a BERT classifier on concatenated posts per user, and a GRU with LSTM on sequential data. The experimental results show that the sequential models outperform the concatenation-based model. They also show that incorporating a time decay (TD) layer and passing the outputs of the emotion classification layer (EmoBERTa) through an LSTM improves performance significantly. Adding a self-attention layer did not significantly improve performance, but it did provide easily interpretable attention scores. The developed architecture with the EmoBERTa and TD layers achieved a high F1 score, beating existing benchmarks on the pathological gambling dataset. Future work may involve early prediction of risk factors associated with pathological gambling disorder and testing the models on other datasets. Overall, this research highlights the value of processing posts sequentially with temporal and emotional features to boost predictive power, and of adding an attention layer for interpretability.
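The two preprocessing ideas named in the abstract, sequence padding and time-decay weighting, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the form of the TD layer, so the exponential decay, the `rate` parameter, and the function names `pad_sequences` and `time_decay_weights` are all assumptions made for illustration.

```python
import math


def pad_sequences(seqs, max_len, pad_value=0.0):
    """Keep each user's most recent max_len items and right-pad the rest.

    Returns the padded sequences and a 0/1 mask marking real items,
    so a downstream model can ignore the padding.
    """
    padded, masks = [], []
    for seq in seqs:
        kept = list(seq[-max_len:])
        n_pad = max_len - len(kept)
        padded.append(kept + [pad_value] * n_pad)
        masks.append([1] * len(kept) + [0] * n_pad)
    return padded, masks


def time_decay_weights(timestamps, rate=0.1):
    """Assumed exponential decay: older posts get smaller weights.

    timestamps are in days; the weight of a post is exp(-rate * age),
    where age is measured back from the user's latest post.
    """
    latest = max(timestamps)
    return [math.exp(-rate * (latest - t)) for t in timestamps]


# Hypothetical per-post scores for two users with different post counts.
scores = [[0.2, 0.5, 0.9], [0.4]]
padded, masks = pad_sequences(scores, max_len=2)
weights = time_decay_weights([0.0, 10.0])
```

The mask matters as much as the padding itself: without it, padded positions would be treated as real posts by the recurrent layers and by attention.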

Paper Structure

This paper contains 26 sections, 3 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Illustration of the data
  • Figure 2: The number of posts per user, demonstrating the variation addressed by sequence padding
  • Figure 3: Model architecture
  • Figure 4: Attention weights for several observations