
Human Activity Recognition from Wearable Sensor Data Using Self-Attention

Saif Mahmud, M Tanjid Hasan Tonmoy, Kishor Kumar Bhaumik, A K M Mahbubur Rahman, M Ashraful Amin, Mohammad Shoyaib, Muhammad Asif Hossain Khan, Amin Ahsan Ali

TL;DR

This work tackles Human Activity Recognition (HAR) from multi-sensor time-series data by replacing recurrent architectures with a Transformer-inspired self-attention model. It introduces sensor modality attention, multi-head self-attention blocks, and a global temporal attention module to produce discriminative window-level representations without recurrence. Attention follows the Transformer's scaled dot-product mechanism, $\mathrm{softmax}(QK^T/\sqrt{d_k})V$, complemented by positional encoding to preserve sequence order. Across four public HAR datasets (PAMAP2, Opportunity, USC-HAD, Skoda), the approach yields superior window-wise performance and robust Leave-One-Subject-Out generalization, while providing interpretable sensor-attention maps that indicate how relevant each sensor placement is to each activity.
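A minimal NumPy sketch of the core operation $\mathrm{softmax}(QK^T/\sqrt{d_k})V$ applied to one sensor window follows. It shows a single attention head only. The positional encoding assumes the sinusoidal scheme of the original Transformer, because the summary does not say which variant the authors use. The projection matrices `W_q`, `W_k`, `W_v` are random stand-ins for learned weights, and all sizes are hypothetical.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal position encoding (assumed variant, as in the original Transformer)."""
    positions = np.arange(seq_len)[:, None]                      # (T, 1)
    dims = np.arange(d_model)[None, :]                           # (1, d)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                             # (T, d)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                        # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                        # odd dims: cosine
    return pe

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)                      # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V together with the attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)               # (T, T) step-to-step scores
    weights = softmax(scores, axis=-1)                           # each row sums to 1
    return weights @ V, weights

# One window of T time steps, already embedded to d_model features (sizes are arbitrary).
rng = np.random.default_rng(0)
T, d_model = 8, 16
x = rng.standard_normal((T, d_model)) + sinusoidal_positional_encoding(T, d_model)
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))  # stand-ins for learned weights
out, attn = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape, attn.shape)  # (8, 16) (8, 8)
```

A multi-head block would run several such heads on smaller projections and concatenate their outputs. Adding the positional encoding is what lets an otherwise order-agnostic attention layer distinguish early from late time steps within the window.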

Abstract

Human Activity Recognition from body-worn sensor data poses an inherent challenge: capturing the spatial and temporal dependencies of time-series signals. Existing recurrent, convolutional, and hybrid models for activity recognition struggle to capture spatio-temporal context from the feature space of the sensor reading sequence. To address this problem, we propose a self-attention based neural network model that forgoes recurrent architectures and uses different types of attention mechanisms to generate a higher-dimensional feature representation for classification. We performed extensive experiments on four popular, publicly available HAR datasets: PAMAP2, Opportunity, Skoda and USC-HAD. Our model achieves significant performance improvements over recent state-of-the-art models in both the benchmark test-subject and Leave-One-Subject-Out evaluations. We also observe that the sensor attention maps produced by our model are able to capture the importance of the modality and placement of the sensors in predicting the different activity classes.
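The two non-Transformer attention types mentioned above can be illustrated with a small sketch. Sensor modality attention weights the sensor channels at each time step, and global temporal attention pools the per-step features into one window-level vector for classification. The parameterization here (a linear scoring matrix `w_sensor`, a `tanh`-plus-vector scorer `w_time`, and random stand-in weights) is a hypothetical choice for illustration, not the authors' exact layers. The class count of 12 is arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sensor_attention(window, w_sensor):
    """Hypothetical sensor-modality attention.

    window:   (T, S) readings from S sensor channels
    w_sensor: (S, S) hypothetical learned projection giving one score per channel
    Returns the reweighted window and the (T, S) attention map.
    """
    scores = window @ w_sensor                   # (T, S) score per channel per step
    alpha = softmax(scores, axis=-1)             # channel weights sum to 1 at every step
    return window * alpha, alpha

def global_temporal_attention(h, w_time):
    """Hypothetical attention pooling over time.

    h:      (T, d) per-step features (e.g. self-attention block outputs)
    w_time: (d,) hypothetical learned scoring vector
    Returns a single (d,) window-level vector and the (T,) time weights.
    """
    scores = np.tanh(h) @ w_time                 # (T,) one relevance score per step
    beta = softmax(scores, axis=0)               # time weights sum to 1
    return beta @ h, beta                        # weighted sum over time steps

rng = np.random.default_rng(1)
T, S, d, n_classes = 8, 6, 16, 12                # arbitrary sizes
window = rng.standard_normal((T, S))
reweighted, alpha = sensor_attention(window, rng.standard_normal((S, S)))
h = reweighted @ rng.standard_normal((S, d))     # stand-in for the self-attention encoder
z, beta = global_temporal_attention(h, rng.standard_normal(d))
logits = z @ rng.standard_normal((d, n_classes)) # linear classifier on the window vector
```

Averaging `alpha` over the time axis gives one weight per channel, which is the kind of per-sensor summary a sensor-attention map such as Figure 4 visualizes.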

Paper Structure

This paper contains 14 sections, 7 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Attention based model incorporating self-attention and global temporal attention
  • Figure 2: Activity window for the walking activity in the PAMAP2 dataset, with a time span of 1 s
  • Figure 3: Performance measure against different window sizes
  • Figure 4: Attention weights on different sensor modalities based on the predicted class label in the PAMAP2 dataset (e.g. Ironing involves a high attention weight for the hand accelerometer, moderate attention for the chest accelerometer, and relatively low weights for other sensor placements)
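Figures 2 and 3 rely on cutting continuous recordings into fixed-length activity windows. A minimal sliding-window sketch follows. The 100 Hz sampling rate, the 50% overlap, and the majority-vote labeling rule are assumptions for illustration; the summary does not state the paper's exact segmentation settings.

```python
import numpy as np

def sliding_windows(signal, labels, window_len, step):
    """Cut a (N, S) recording into overlapping (window_len, S) windows.

    signal: (N, S) sensor readings; labels: (N,) non-negative integer label per sample.
    Each window gets the most frequent label among its samples (an assumed rule;
    ties go to the smaller label id, per np.argmax).
    """
    xs, ys = [], []
    for start in range(0, len(signal) - window_len + 1, step):
        end = start + window_len
        xs.append(signal[start:end])
        ys.append(np.bincount(labels[start:end]).argmax())   # majority vote
    return np.stack(xs), np.array(ys)

# 1-second windows at a hypothetical 100 Hz rate with 50% overlap.
rate_hz = 100
window_len = 1 * rate_hz
step = window_len // 2
signal = np.zeros((1000, 9))             # 10 s of 9 hypothetical channels
labels = np.repeat([1, 4], 500)          # first 5 s activity 1, last 5 s activity 4
X, y = sliding_windows(signal, labels, window_len, step)
print(X.shape, y.shape)  # (19, 100, 9) (19,)
```

Re-running the segmentation with different `window_len` values is one way to produce a window-size sweep like the one in Figure 3. Shorter windows yield more training samples but less temporal context per sample.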