Exploring Social Media Posts for Depression Identification: A Study on Reddit Dataset
Nandigramam Sai Harshit, Nilesh Kumar Sahu, Haroon R. Lone
TL;DR
The paper addresses predicting depressive states from social media text, focusing on Reddit data. It collects top Reddit posts made in 2022 from depression-related subreddits, labels them as depressive or non-depressive using the UMLS Metathesaurus, and evaluates Bag-of-Words features with classical classifiers, among which Random Forest achieves 92.28% accuracy. The study demonstrates the feasibility of depression detection from short user posts and highlights the need for larger-scale datasets and cross-platform validation. The results point to a data-driven approach for passive mental health monitoring, though such monitoring raises privacy considerations.
Abstract
Depression is one of the most common mental disorders, affecting an individual's personal and professional life. In this work, we investigated the possibility of using social media posts to identify depression in individuals. To this end, we conducted a preliminary study in which we extracted and analyzed the top Reddit posts made in 2022 from depression-related forums. The collected data were labeled as depressive or non-depressive using the UMLS Metathesaurus. The pre-processed data were then fed to classical machine learning models, which achieved an accuracy of 92.28% in classifying posts as depressive or non-depressive.
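
To make the Bag-of-Words representation mentioned in the TL;DR concrete, the following is a minimal sketch of how posts might be turned into word-count vectors before being passed to a classifier. It is not the authors' implementation: the tokenizer, the helper names (`tokenize`, `build_vocabulary`, `vectorize`), and the two toy posts are assumptions made purely for illustration, and the study's actual pre-processing steps are not described here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(posts):
    """Map every distinct token in the corpus to a column index.

    Tokens are sorted so that the column order is deterministic.
    """
    tokens = {tok for post in posts for tok in tokenize(post)}
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}


def vectorize(post, vocabulary):
    """Return the word-count vector of a post; out-of-vocabulary tokens are ignored."""
    counts = Counter(tokenize(post))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical toy posts, not taken from the study's dataset.
posts = [
    "I feel tired and hopeless",
    "Feeling hopeless, so hopeless today",
]

vocabulary = build_vocabulary(posts)
features = [vectorize(post, vocabulary) for post in posts]

for post, vector in zip(posts, features):
    print(vector, "<-", post)
```

Each row of `features` has one column per vocabulary word, so word order within a post is discarded and only frequencies remain. Vectors of this kind, paired with the depressive or non-depressive labels, could then be given to a classical classifier such as a Random Forest (for example, scikit-learn's `RandomForestClassifier`), which is the model family the TL;DR reports as performing best.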
