Predicting Human Depression with Hybrid Data Acquisition utilizing Physical Activity Sensing and Social Media Feeds
Mohammad Helal Uddin, Sabur Baidya
TL;DR
This study addresses depression prediction by fusing mobile sensor data with social media signals. It introduces a hybrid data acquisition model and an end-to-end analytics pipeline that combines CNN-based activity recognition with Naive Bayes sentiment analysis, followed by SVM-based classification of depression level. Evaluated on data from 33 participants, the approach achieves high activity recognition accuracy (~98%), high sentiment classification accuracy (~95.6%), and robust depression-detection performance (overall accuracy of around 89–90% across depression levels), and it shows meaningful correlations between physical activity, online sentiment, and GDS scores. The work advances privacy-preserving, multi-modal mental health monitoring and outlines practical directions for validation, wearables integration, and model optimization.
Abstract
Mental disorders, including depression, anxiety, and other neurological disorders, pose a significant global challenge, particularly among individuals who exhibit social avoidance tendencies. This study proposes a hybrid approach to evaluating an individual's depression level that combines smartphone sensor data on daily physical activities with an analysis of the individual's social media (Twitter) interactions. Using CNN-based deep learning models and Naive Bayes classification, we accurately identify human physical activities and classify user sentiment. A total of 33 participants were recruited for data acquisition, and nine relevant features were extracted and analyzed against the participants' weekly depression scores, measured with the Geriatric Depression Scale (GDS) questionnaire. Of the nine features, six are derived from physical activities, which are recognized with an accuracy of 95%, and three stem from sentiment analysis of Twitter activity, which reaches an accuracy of 95.6%. Notably, several physical activity features exhibit significant correlations with the severity of depression symptoms. To classify depression severity, we employ a support vector machine (SVM)-based algorithm that achieves an accuracy of 94%, outperforming alternative models such as the multilayer perceptron (MLP) and k-nearest neighbor. The approach is simple yet effective for long-term depression monitoring without breaching personal privacy.
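To make the Naive Bayes sentiment step mentioned in the abstract more concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is not the authors' implementation: the function names `train_naive_bayes` and `predict`, the whitespace tokenization, the two-class label set, and the toy tweets are all assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Count class and per-class word frequencies from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for token in text.lower().split():  # naive whitespace tokenizer (assumption)
            word_counts[label][token] += 1
            vocab.add(token)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: fraction of training texts carrying this label.
        score = math.log(class_counts[label] / total_docs)
        # Smoothed denominator: words seen in this class plus vocabulary size.
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in text.lower().split():
            if token in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not taken from the study.
TOY_TWEETS = [
    ("feeling great after a long walk", "positive"),
    ("had a wonderful day with friends", "positive"),
    ("so tired and sad today", "negative"),
    ("cannot sleep feeling hopeless again", "negative"),
]

model = train_naive_bayes(TOY_TWEETS)
print(predict(model, "wonderful walk with friends"))
print(predict(model, "sad and hopeless"))
```

In a pipeline like the one described, per-tweet labels of this kind would presumably be aggregated into the sentiment-derived features that feed the downstream depression classifier; how that aggregation is done is not specified here.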
