DySTAN: Joint Modeling of Sedentary Activity and Social Context from Smartphone Sensors

Aditya Sneh, Nilesh Kumar Sahu, Snehil Gupta, Haroon R. Lone

TL;DR

DySTAN addresses the challenge of jointly recognizing sedentary activity and social context from smartphone inertial data. It introduces a Dynamic Cross-Stitch Unit and cross-task attention to share representations across the two tasks while preserving task-specific information. The model is evaluated on the SSCD dataset, collected with the LogMe app from 57 university students. DySTAN outperforms both single-task and multi-task baselines and performs consistently across folds, and ablations show the value of dynamic fusion and cross-task attention. The work highlights the importance of modeling co-occurring contexts for accurate, context-aware mobile interventions, while acknowledging the demographic limitations of the sample and proposing broader data collection in future work.
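To make the idea of "dynamic" fusion concrete, the sketch below shows one plausible form of an input-dependent cross-stitch unit. It is an illustration under stated assumptions, not the authors' implementation: the classic cross-stitch unit learns a fixed 2x2 matrix of mixing scalars, whereas here we assume the matrix is predicted from time-pooled features through a hypothetical gate (`w_gate`, `b_gate`) and row-normalized with a softmax. All function and parameter names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_cross_stitch(x_a, x_b, w_gate, b_gate):
    """Mix two task feature maps with input-dependent weights (hypothetical sketch).

    x_a, x_b: (T, C) feature maps from the activity and social-context branches.
    w_gate: (2*C, 4) and b_gate: (4,) are HYPOTHETICAL gate parameters.
    Returns the mixed maps (y_a, y_b) and the 2x2 mixing matrix alpha.
    """
    # Summarize each branch by averaging over the time axis.
    pooled = np.concatenate([x_a.mean(axis=0), x_b.mean(axis=0)])  # (2C,)
    logits = pooled @ w_gate + b_gate                               # (4,)
    # Row i of alpha: how much task i draws from (branch A, branch B); rows sum to 1.
    alpha = softmax(logits.reshape(2, 2), axis=1)
    y_a = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    y_b = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return y_a, y_b, alpha


# Usage with random features standing in for CNN outputs.
rng = np.random.default_rng(0)
T, C = 50, 16
x_a = rng.standard_normal((T, C))
x_b = rng.standard_normal((T, C))
w = rng.standard_normal((2 * C, 4)) * 0.1
b = np.array([2.0, 0.0, 0.0, 2.0])  # bias toward keeping each task's own features
y_a, y_b, alpha = dynamic_cross_stitch(x_a, x_b, w, b)
print(alpha.round(2))
```

Because the mixing weights depend on the current window, the unit can share more information between tasks for some inputs and less for others, which is the intuition behind making cross-stitching dynamic.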

Abstract

Accurately recognizing human context from smartphone sensor data remains a significant challenge, especially in sedentary settings where activities such as studying, attending lectures, relaxing, and eating exhibit highly similar inertial patterns. Furthermore, social context plays a critical role in understanding user behavior, yet it is often overlooked in mobile sensing research. To address these gaps, we introduce LogMe, a mobile sensing application that passively collects smartphone sensor data (accelerometer, gyroscope, magnetometer, and rotation vector) and prompts users for hourly self-reports capturing both sedentary activity and social context. Using this dual-label dataset, we propose DySTAN (Dynamic Cross-Stitch with Task Attention Network), a multi-task learning framework that jointly classifies both context dimensions from shared sensor inputs. It combines task-specific layers with cross-task attention to capture the subtle distinctions between these contexts. DySTAN improves sedentary activity macro F1 scores by 21.8% over a single-task CNN-BiLSTM-GRU (CBG) model and by 8.2% over the strongest multi-task baseline, Sluice Network (SN). These results demonstrate the importance of modeling multiple, co-occurring context dimensions to improve the accuracy and robustness of mobile context recognition.
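The reported gains are in macro F1, which averages per-class F1 scores without weighting by class frequency, so a rare class such as Eating counts as much as a frequent one such as Studying. The short sketch below computes macro F1 from scratch using the sedentary activity labels (AL, E, R, S). It is a generic illustration of the metric on made-up predictions, not the paper's evaluation code; treating a class with no true positives, false positives, or false negatives as F1 = 0 is our assumption.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); assume 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with the four sedentary activity classes.
labels = ["AL", "E", "R", "S"]
y_true = ["S", "S", "AL", "E", "R", "R"]
y_pred = ["S", "AL", "AL", "E", "S", "R"]
score = macro_f1(y_true, y_pred, labels)
print(round(score, 4))
```

Here the per-class F1 scores are 2/3 (AL), 1 (E), 2/3 (R), and 1/2 (S), giving a macro F1 of 17/24, about 0.7083.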

Paper Structure

This paper contains 13 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: LogMe application flow.
  • Figure 2: DySTAN architecture for joint classification of sedentary activity and social context. The model processes 13-channel IMU sensor data through shared and task-specific CNN layers, dynamically fuses representations with a cross-stitch unit, applies cross-task attention, and uses bidirectional LSTMs followed by separate output heads for each context.
  • Figure 3: Comparison of Context and Social Context metrics across models. Stars indicate the best-performing models.
  • Figure 4: Confusion matrices for Social Context and Sedentary Activity classification for the best-performing version of each model (DySTAN, SN, CBG, AROMA). The Social Context classes are Alone (A), With Someone (Engage in Conversation) (WSEIC), and With Someone (Not Engage in Conversation) (WSNEIC). The Sedentary Activity classes are Attending Lectures (AL), Eating (E), Relaxing (R), and Studying (S).
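The Figure 2 pipeline applies cross-task attention after the cross-stitch stage. The sketch below shows one common way to realize this with scaled dot-product attention: each task's sequence supplies the queries, and the other task's sequence supplies the keys and values, with a residual connection so the task's own features are kept. This is an assumed formulation for illustration only. The projection matrices `w_q`, `w_k`, `w_v` are hypothetical, sharing them between the two directions is a simplification, and the downstream BiLSTMs and output heads are not shown.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_task_attention(h_self, h_other, w_q, w_k, w_v):
    """One task's sequence attends over the other task's sequence (hypothetical sketch).

    h_self, h_other: (T, C) sequences from the two task branches.
    w_q, w_k: (C, d) and w_v: (C, C) are HYPOTHETICAL projections; w_v keeps
    width C so the attended values can be added back residually.
    """
    q = h_self @ w_q                            # (T, d) queries from this task
    k = h_other @ w_k                           # (T, d) keys from the other task
    v = h_other @ w_v                           # (T, C) values from the other task
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (T, T) scaled similarities
    attn = softmax(scores, axis=-1)             # each row sums to 1
    return h_self + attn @ v, attn              # residual preserves task-specific features


# Usage: activity and social branches attend to each other.
rng = np.random.default_rng(1)
T, C, d = 50, 16, 8
h_act = rng.standard_normal((T, C))
h_soc = rng.standard_normal((T, C))
w_q = rng.standard_normal((C, d))
w_k = rng.standard_normal((C, d))
w_v = rng.standard_normal((C, C)) * 0.1
act_out, attn_act = cross_task_attention(h_act, h_soc, w_q, w_k, w_v)
soc_out, attn_soc = cross_task_attention(h_soc, h_act, w_q, w_k, w_v)
```

In a full model, `act_out` and `soc_out` would each feed a task-specific bidirectional LSTM and output head, as described in the Figure 2 caption.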