Quantifying the Privacy-Utility Trade-off in GPS-based Daily Stress Recognition using Semantic Features

Hoang Khang Phan, Nhat Tan Le

TL;DR

This work addresses the privacy risks of GPS-based stress recognition by introducing a privacy-aware framework that semantically encodes locations using a self-hosted OpenStreetMap (OSM) reverse geocoder and an LLM-bootstrapped static map. It quantifies the privacy-utility trade-off using re-identification attacks, mutual information (MI), and multiple validation schemes, and shows that the proposed Privacy-Aware (PA) encoding can match non-private baselines in leave-one-subject-out (LOSO) stress prediction while substantially reducing identity leakage. With Random Forest and XGBoost classifiers, the approach achieves competitive accuracy and F1-scores (e.g., ~67% accuracy and ~64% F1 in the PA setting), and eight of the top ten MI-ranked features can be retained without compromising privacy. Ablation and feature analyses indicate that time-based features, especially deadlines and recreational activity, drive stress predictions, while the privacy-preserving transformations remove highly identifying signals such as class schedules, which makes deployment in educational contexts safer. The findings point to a viable path toward end-to-end privacy-preserving mobile mental-health monitoring, with practical implications for students and educators.
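
To make the mutual information criterion mentioned above concrete, the following is a minimal sketch of a plug-in MI estimate between a discretized feature and a discrete target (for example, the stress label or a user identity). It is an illustration only: the source does not state which MI estimator the authors use, and the function name and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), expressed with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: a binned feature and a binary stress label.
stress = [0, 0, 1, 1]
informative_feature = [0, 0, 1, 1]    # perfectly aligned with the label
uninformative_feature = [0, 1, 0, 1]  # independent of the label

mi_informative = mutual_information(informative_feature, stress)
mi_uninformative = mutual_information(uninformative_feature, stress)
```

Under this reading, a feature with high MI against the stress label and low MI against the user identity would be a good candidate to keep in a privacy-aware feature set.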

Abstract

Psychological stress is a widespread issue that significantly impacts student well-being and academic performance. Effective remote stress recognition is crucial, yet existing methods often rely on wearable devices or on GPS-based clustering techniques that pose privacy risks. In this study, we introduce a novel, end-to-end privacy-enhanced framework for semantic location encoding using a self-hosted OpenStreetMap (OSM) engine and an LLM-bootstrapped static map. We quantify the privacy-utility trade-off and demonstrate, via leave-one-subject-out (LOSO) validation, that our Privacy-Aware (PA) model achieves performance statistically indistinguishable from that of a non-private model, indicating that utility need not come at the cost of privacy. Feature importance analysis highlights that recreational activity time, working time, and travel time play a significant role in stress recognition.
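
The LOSO validation referred to above can be sketched as follows, assuming LOSO stands for leave-one-subject-out: every fold holds out all samples of one student and trains on the remaining students, so the model is always evaluated on a person it has not seen. The function name and the subject identifiers are hypothetical; scikit-learn's `LeaveOneGroupOut` provides an equivalent splitter.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    subjects = sorted(set(subject_ids))
    if len(subjects) < 2:
        raise ValueError("LOSO needs at least two distinct subjects")
    for held_out in subjects:
        # Train on every sample that does not belong to the held-out subject.
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        # Test only on the held-out subject's samples.
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical example: five daily samples from three students.
subject_ids = ["s1", "s1", "s2", "s3", "s3"]
folds = list(loso_splits(subject_ids))
```

Because no student contributes samples to both the training and the test side of a fold, per-fold scores reflect generalization to new users rather than memorization of an individual's routine.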

Paper Structure

This paper contains 35 sections, 1 equation, 6 figures, 10 tables.

Figures (6)

  • Figure 1: Comparison of the common GPS feature extraction method and our method. The red oval represents the clustering technique used to identify frequent locations. In the common method, GPS signals must be stored in a dataset in order to determine the most frequent locations, which poses a privacy threat by exposing those locations. Our method instead transforms the GPS signal to protect the privacy of the user's location and to avoid losing information from less frequently visited addresses.
  • Figure 2: Top-$k$ accuracy of the re-identification attack in the Limited Knowledge scenario. The Raw GPS line (green) reaches high accuracy at a low $k$, while our PA (orange) and AF (blue) methods require a significantly larger $k$ to achieve similar accuracy.
  • Figure 3: SHAP beeswarm summary plot for the stressed class of the Random Forest (RF) classifier in the AF scenario.
  • Figure 4: Line plot of recreational activity time (in seconds) of stressed and non-stressed students by week. Level 1 denotes stressed students; the other level denotes non-stressed students.
  • Figure 5: Line plot of workplace time (in seconds) of stressed and non-stressed students by week. Level 1 denotes stressed students; the other level denotes non-stressed students.
  • ...and 1 more figures
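
The top-$k$ accuracy reported for the re-identification attack in Figure 2 can be illustrated with a short sketch. It assumes the attacker assigns each anonymous trace a score per candidate user and counts a hit when the true user is among the $k$ highest-scored candidates. The score matrix, user names, and function name are hypothetical and do not come from the paper.

```python
def top_k_accuracy(scores, true_ids, candidate_ids, k):
    """Fraction of traces whose true user is among the k highest-scored candidates."""
    if not true_ids or len(scores) != len(true_ids):
        raise ValueError("scores and true_ids must be non-empty and aligned")
    if not 1 <= k <= len(candidate_ids):
        raise ValueError("k must be between 1 and the number of candidates")
    hits = 0
    for row, true_id in zip(scores, true_ids):
        # Rank candidate positions from highest to lowest attacker score.
        ranked = sorted(range(len(candidate_ids)), key=lambda j: row[j], reverse=True)
        top_k = {candidate_ids[j] for j in ranked[:k]}
        hits += true_id in top_k
    return hits / len(true_ids)


# Hypothetical attacker scores: one row per anonymous trace, one column per user.
candidate_ids = ["u1", "u2", "u3"]
scores = [
    [0.70, 0.20, 0.10],
    [0.35, 0.40, 0.25],
    [0.50, 0.30, 0.20],
]
true_ids = ["u1", "u1", "u3"]

curve = [top_k_accuracy(scores, true_ids, candidate_ids, k) for k in (1, 2, 3)]
```

A representation that leaks less identity information should need a larger $k$ before this curve approaches 1, which is the pattern the caption describes for the PA and AF encodings relative to raw GPS.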