Mental Disorder Classification via Temporal Representation of Text

Raja Kumar, Kishan Maharaj, Ashita Saxena, Pushpak Bhattacharyya

TL;DR

This work tackles the challenge of predicting mental disorders from long, chronologically ordered social-media posts, where the traditional approach of chunking text for large language models loses temporal and inter-post dependencies. It introduces a temporal representation framework built around disorder-specific anchor embeddings and a time series of cosine similarities that captures how a subject's posts align with a condition over time, followed by time-series classification. The approach yields a 5% absolute improvement in F1 across anorexia, depression, and self-harm over SOTA baselines, with larger gains for self-harm and depression, and substantially reduces computational cost (up to ~330x fewer FLOPs) compared to chunking-based methods. A cross-domain transfer study indicates overlapping linguistic cues among certain disorders, suggesting potential for leveraging data across conditions and extending the framework to additional modalities and disorders. The work demonstrates that preserving temporal structure and global context is crucial for effective mental-disorder classification from long-form text.
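
The temporal representation described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the 2-D vectors stand in for real post embeddings, the anchor is taken as given, and the function name `temporal_representation` is hypothetical.

```python
import numpy as np

def temporal_representation(post_embeddings, anchor):
    """Map chronologically ordered post embeddings to a 1-D series of
    cosine similarities with a disorder-specific anchor embedding."""
    posts = np.asarray(post_embeddings, dtype=float)
    anchor = np.asarray(anchor, dtype=float)
    dots = posts @ anchor
    norms = np.linalg.norm(posts, axis=1) * np.linalg.norm(anchor)
    return dots / norms

# Toy data (assumption): three posts in posting order, 2-D embeddings.
anchor = np.array([1.0, 0.0])
post_embeddings = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])

series = temporal_representation(post_embeddings, anchor)
print(series)  # one similarity value per post, in chronological order
```

The resulting series has one value per post, so a subject with thousands of posts is reduced to a sequence of thousands of numbers that a time-series classifier can consume whole.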

Abstract

Mental disorders pose a global challenge, aggravated by the shortage of qualified mental health professionals. Predicting mental disorders from social media posts with current LLMs is challenging due to the complexities of sequential text data and the limited context length of language models. Current language model-based approaches split a single data instance into multiple chunks to compensate for the limited context size. The predictive model is then applied to each chunk individually, and the majority-voted output is selected as the final prediction. This results in the loss of inter-post dependencies and important time-variant information, leading to poor performance. We propose a novel framework that first compresses the large sequence of chronologically ordered social media posts into a series of numbers. We then use this time-variant representation for mental disorder classification. We demonstrate the generalization capabilities of our framework by outperforming the current SOTA on three different mental conditions: depression, self-harm, and anorexia, with an absolute improvement of 5% in the F1 score. We investigate the situation where current data instances fall within the context length of language models and present empirical results highlighting the importance of the temporal properties of textual data. Furthermore, we use the proposed framework for a cross-domain study, exploring commonalities across disorders and the possibility of inter-domain data usage.
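
The chunk-and-vote baseline that the abstract criticizes can be sketched as below. This is an illustrative sketch under stated assumptions: `toy_chunk_classifier` is a hypothetical keyword rule standing in for a language model, and the posts, chunk size, and label names are invented for the example.

```python
from collections import Counter

def split_into_chunks(posts, chunk_size):
    """Split an ordered list of posts into consecutive fixed-size chunks."""
    return [posts[i:i + chunk_size] for i in range(0, len(posts), chunk_size)]

def majority_vote(chunk_labels):
    """Return the most frequent label among the per-chunk predictions."""
    return Counter(chunk_labels).most_common(1)[0][0]

def toy_chunk_classifier(chunk):
    """Hypothetical stand-in for a language model applied to one chunk."""
    return "condition" if any("hopeless" in post for post in chunk) else "control"

posts = ["great day", "feeling tired", "hopeless again",
         "can't sleep", "hopeless", "no energy"]

chunks = split_into_chunks(posts, 2)
labels = [toy_chunk_classifier(c) for c in chunks]
prediction = majority_vote(labels)

# The vote only counts labels, so reordering the chunks cannot change it.
assert majority_vote(labels[::-1]) == prediction
print(labels, prediction)
```

Because each chunk is classified in isolation and the vote ignores chunk order, nothing in this pipeline can use how a subject's posts change over time, which is the limitation the proposed framework targets.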

Paper Structure

This paper contains 37 sections, 5 figures, 10 tables.

Figures (5)

  • Figure 1: An example of four posts made by a person on social media. The intensity of the red colour indicates the extent to which a post indicates depression.
  • Figure 2: Overall pipeline of our approach. Here, (1) shows the generation of the anchor embedding from RMHD, (2) shows the creation of temporal representations of an individual's social media posts, and (3) depicts the classification of the temporal representations as control or condition. The anchor embedding is generated first, the posts are then represented temporally, and these temporal representations are used to train the time-series classification model to detect the presence of a disorder.
  • Figure 3: Temporal representations of a depressed and a non-depressed subject. The Y-axis is the cosine similarity with the anchor embedding, and the X-axis is the posts arranged by time of posting.
  • Figure 4: Results of the temporal analysis: comparison of condition-class F1 scores between permuted and ordered input data for the three disorders.
  • Figure 5: F1 scores of the condition class in three eRisk tasks when considering context lengths of up to 2k.
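
The ordered-versus-permuted comparison in Figure 4 rests on the idea that shuffling a series destroys temporal information while leaving order-free statistics intact. The sketch below illustrates only that idea and is not the paper's experiment: the similarity values are invented, and `trend_slope` is a hypothetical order-sensitive feature chosen for the example.

```python
import numpy as np

def trend_slope(series):
    """Least-squares slope of the series against post index (order-sensitive)."""
    t = np.arange(len(series))
    return float(np.polyfit(t, series, 1)[0])

# Hypothetical similarity series that drifts toward the anchor over time.
ordered = np.array([0.10, 0.15, 0.22, 0.30, 0.41, 0.55, 0.63, 0.78])

rng = np.random.default_rng(0)  # fixed seed so the shuffle is repeatable
permuted = rng.permutation(ordered)

# The mean ignores order, so it is unchanged by the shuffle.
print("mean  (ordered, permuted):", ordered.mean(), permuted.mean())
# The slope depends on order; an increasing series has the largest possible slope.
print("slope (ordered, permuted):", trend_slope(ordered), trend_slope(permuted))
```

A classifier that relies only on order-free statistics would score the same on both inputs, so a gap between the ordered and permuted F1 scores points to the model using temporal structure.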