DySTAN: Joint Modeling of Sedentary Activity and Social Context from Smartphone Sensors
Aditya Sneh, Nilesh Kumar Sahu, Snehil Gupta, Haroon R. Lone
TL;DR
DySTAN jointly recognizes sedentary activity and social context from smartphone inertial data. It introduces a Dynamic Cross-Stitch Unit and cross-task attention to share representations between the two tasks while preserving task-specific information, and is evaluated on the SSCD dataset, collected via the LogMe app from 57 university students. DySTAN outperforms both single-task and multi-task baselines; ablations indicate that dynamic fusion and attention each contribute, and performance is consistent across folds. The work highlights the importance of modeling co-occurring contexts for accurate, context-aware mobile interventions, while acknowledging the demographic limitations of the sample and proposing broader data collection as future work.
Abstract
Accurately recognizing human context from smartphone sensor data remains a significant challenge, especially in sedentary settings where activities such as studying, attending lectures, relaxing, and eating exhibit highly similar inertial patterns. Furthermore, social context plays a critical role in understanding user behavior, yet is often overlooked in mobile sensing research. To address these gaps, we introduce LogMe, a mobile sensing application that passively collects smartphone sensor data (accelerometer, gyroscope, magnetometer, and rotation vector) and prompts users for hourly self-reports capturing both sedentary activity and social context. Using this dual-label dataset, we propose DySTAN (Dynamic Cross-Stitch with Task Attention Network), a multi-task learning framework that jointly classifies both context dimensions from shared sensor inputs. It integrates task-specific layers with cross-task attention to model subtle distinctions effectively. DySTAN improves sedentary activity macro F1 scores by 21.8% over a single-task CNN-BiLSTM-GRU (CBG) model and by 8.2% over the strongest multi-task baseline, Sluice Network (SN). These results demonstrate the importance of modeling multiple, co-occurring context dimensions to improve the accuracy and robustness of mobile context recognition.
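
To make the idea of sharing representations between the two tasks concrete, the following is a minimal sketch of a cross-stitch style mixing step in plain Python. It is not the authors' implementation: the abstract does not give DySTAN's equations, so the function names (`cross_stitch`, `dynamic_alpha`), the gate parameters (`gate_w`, `gate_b`), and the way the "dynamic" weights are derived from the input are all assumptions made for illustration. The static part follows the standard cross-stitch formulation, in which each task's output is a learned linear combination of both tasks' features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_stitch(feat_a, feat_b, alpha):
    """Mix two task feature vectors with a 2x2 weight matrix.

    alpha = [[a_AA, a_AB], [a_BA, a_BB]]; row 0 produces the new
    task-A features, row 1 the new task-B features.
    """
    out_a = [alpha[0][0] * a + alpha[0][1] * b for a, b in zip(feat_a, feat_b)]
    out_b = [alpha[1][0] * a + alpha[1][1] * b for a, b in zip(feat_a, feat_b)]
    return out_a, out_b


def dynamic_alpha(feat_a, feat_b, gate_w, gate_b):
    """Hypothetical input-dependent mixing weights (an assumption,
    not the paper's formula).

    A linear gate over the mean of each task's features yields four
    logits; each row of two is softmax-normalised so that every output
    is a convex combination of the two inputs.
    """
    summary = [sum(feat_a) / len(feat_a), sum(feat_b) / len(feat_b)]
    logits = [
        gate_w[i][0] * summary[0] + gate_w[i][1] * summary[1] + gate_b[i]
        for i in range(4)
    ]
    return [softmax(logits[0:2]), softmax(logits[2:4])]


# Toy features standing in for the activity and social-context branches.
activity_feat = [1.0, 2.0]
social_feat = [3.0, 4.0]

# With an all-zero gate, every row of alpha is [0.5, 0.5] (equal sharing).
alpha = dynamic_alpha(activity_feat, social_feat, [[0.0, 0.0]] * 4, [0.0] * 4)
mixed_activity, mixed_social = cross_stitch(activity_feat, social_feat, alpha)
```

With an identity `alpha` the two branches stay fully separate, while equal weights average them; a learned, input-dependent `alpha` lets the amount of sharing vary per sample, which is one plausible reading of "dynamic" fusion.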
