Mental Disorder Classification via Temporal Representation of Text
Raja Kumar, Kishan Maharaj, Ashita Saxena, Pushpak Bhattacharyya
TL;DR
This work tackles the challenge of predicting mental disorders from long, chronologically ordered sequences of social-media posts, where the usual practice of chunking text to fit a large language model's context window loses temporal and inter-post dependencies. It introduces a temporal representation framework built around disorder-specific anchor embeddings: each post is scored by its cosine similarity to the anchor, and the resulting time series captures how a subject's posts align with a condition over time. A time-series classifier then makes the prediction. The approach yields a 5% absolute improvement in F1 across anorexia, depression, and self-harm over SOTA baselines, with larger gains for self-harm and depression, and substantially reduces computational cost (up to ~330x fewer FLOPs) compared to chunking-based methods. A cross-domain transfer study indicates overlapping linguistic cues among certain disorders, suggesting potential for leveraging data across conditions and for extending the framework to additional modalities and disorders. The work demonstrates that preserving temporal structure and global context is crucial for effective mental-disorder classification from long-form text.
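The core of the temporal representation, one cosine similarity per post against a disorder-specific anchor, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `cosine_similarity_series`, the 2-dimensional toy vectors, and the choice to score a zero vector as 0 are all assumptions. In practice the embeddings would come from a sentence encoder, which the summary above does not specify.

```python
import numpy as np

def cosine_similarity_series(post_embeddings, anchor):
    """Return one cosine similarity per post, in chronological order.

    post_embeddings: array-like of shape (num_posts, dim), oldest post first.
    anchor: array-like of shape (dim,), a hypothetical disorder-specific anchor.
    """
    posts = np.asarray(post_embeddings, dtype=float)
    anchor = np.asarray(anchor, dtype=float)
    dots = posts @ anchor
    denom = np.linalg.norm(posts, axis=1) * np.linalg.norm(anchor)
    # Assumption: a zero-norm post embedding gets similarity 0 instead of NaN.
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)

# Toy data (hypothetical): three posts embedded in 2 dimensions.
toy_anchor = [1.0, 0.0]
toy_posts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
series = cosine_similarity_series(toy_posts, toy_anchor)
print(series)
```

The resulting array has one value per post, so a subject with thousands of posts is reduced to a short numeric sequence that a time-series classifier can consume whole, without splitting it into chunks.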
Abstract
Mental disorders pose a global challenge, aggravated by the shortage of qualified mental health professionals. Predicting mental disorders from social media posts with current LLMs is difficult because of the complexities of sequential text data and the limited context length of language models. Current language model-based approaches split a single data instance into multiple chunks to compensate for the limited context size. The predictive model is then applied to each chunk individually, and the majority-voted output is selected as the final prediction. This results in the loss of inter-post dependencies and important time-variant information, leading to poor performance. We propose a novel framework that first compresses the long sequence of chronologically ordered social media posts into a series of numbers. We then use this time-variant representation for mental disorder classification. We demonstrate the generalization capabilities of our framework by outperforming the current SOTA on three different mental conditions: depression, self-harm, and anorexia, with an absolute improvement of 5% in the F1 score. We also investigate the case where data instances fit within the context length of language models and present empirical results highlighting the importance of the temporal properties of textual data. Furthermore, we use the proposed framework for a cross-domain study, exploring commonalities across disorders and the possibility of inter-domain data usage.
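For contrast, the chunk-and-vote baseline described above can be sketched in a few lines. Everything here is illustrative: `chunk_posts`, `majority_vote`, and the keyword-based `toy_classifier` are hypothetical stand-ins for a real language-model classifier, and the posts are made up.

```python
from collections import Counter

def chunk_posts(posts, chunk_size):
    """Split a chronological list of posts into consecutive fixed-size chunks."""
    return [posts[i:i + chunk_size] for i in range(0, len(posts), chunk_size)]

def majority_vote(labels):
    """Return the most frequent label among the per-chunk predictions."""
    return Counter(labels).most_common(1)[0][0]

def toy_classifier(chunk):
    """Hypothetical stand-in for a model: flags a chunk containing a keyword."""
    return int(any("hopeless" in post for post in chunk))

# Made-up posts, oldest first.
posts = [
    "went for a run", "feeling fine",
    "everything feels hopeless", "slept badly",
    "saw friends", "work was ok",
]
chunks = chunk_posts(posts, chunk_size=2)
chunk_labels = [toy_classifier(chunk) for chunk in chunks]
final_label = majority_vote(chunk_labels)
print(chunk_labels, final_label)
```

Each chunk is classified in isolation, so the order of the chunks and any relationship between posts in different chunks never reaches the model. This is the loss of inter-post and time-variant information that the abstract attributes to chunking.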
