SETransformer: A Hybrid Attention-Based Architecture for Robust Human Activity Recognition

Yunbo Liu, Xukui Qin, Yifan Gao, Xiang Li, Chengwei Feng

TL;DR

SETransformer tackles human activity recognition (HAR) from wearable sensors by combining Transformer-based temporal modeling with channel-level Squeeze-and-Excitation (SE) attention and a learnable temporal attention pooling mechanism. Trained end-to-end on the WISDM dataset, the model captures long-range temporal dependencies while adaptively weighting sensor channels and time steps. It reaches a validation accuracy of 84.68% and a macro F1-score of 84.64%, outperforming the LSTM, GRU, BiLSTM, and CNN baselines it is compared against. The authors point to applications in mobile and ubiquitous sensing and identify multi-modal data integration and edge-friendly deployment as directions for future work.
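
The channel-level SE attention can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `se_channel_attention`, the reduction ratio `r = 4`, and the choice to apply SE to a (time, channel) feature map with 16 channels (for example, the three accelerometer axes after a linear projection) are all hypothetical, since the summary does not say where the SE block sits or how it is sized.

```python
import numpy as np

def se_channel_attention(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation gating over the channel axis (illustrative sketch).

    x:  (T, C) window of T time steps and C feature channels.
    w1: (C, C // r), b1: (C // r,)  bottleneck layer (r = reduction ratio).
    w2: (C // r, C), b2: (C,)       expansion layer.
    Returns the reweighted window and the per-channel gates s.
    """
    # Squeeze: summarize each channel by its mean over time -> (C,)
    z = x.mean(axis=0)
    # Excitation: bottleneck MLP with ReLU, then a sigmoid gate in (0, 1)
    h = np.maximum(0.0, z @ w1 + b1)
    s = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))
    # Scale: broadcasting multiplies channel c by s[c] at every time step
    return x * s, s

# Hypothetical sizes: 128-step window, 16 hidden channels, reduction ratio 4.
rng = np.random.default_rng(0)
T, C, r = 128, 16, 4
x = rng.standard_normal((T, C))
w1, b1 = 0.1 * rng.standard_normal((C, C // r)), np.zeros(C // r)
w2, b2 = 0.1 * rng.standard_normal((C // r, C)), np.zeros(C)
x_se, gates = se_channel_attention(x, w1, b1, w2, b2)
print(x_se.shape, gates.round(3))
```

Squeezing with the temporal mean makes each gate depend on the whole window, so a channel that carries little signal for the current activity is scaled down uniformly across all time steps.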

Abstract

Human Activity Recognition (HAR) using wearable sensor data has become a central task in mobile computing, healthcare, and human-computer interaction. Despite the success of traditional deep learning models such as CNNs and RNNs, these models often struggle to capture long-range temporal dependencies and contextual relevance across multiple sensor channels. To address these limitations, we propose SETransformer, a hybrid deep neural architecture that combines Transformer-based temporal modeling with channel-wise squeeze-and-excitation (SE) attention and a learnable temporal attention pooling mechanism. The model takes raw triaxial accelerometer data as input and leverages global self-attention to capture activity-specific motion dynamics over extended time windows, while adaptively emphasizing informative sensor channels and critical time steps. We evaluate SETransformer on the WISDM dataset and demonstrate that it significantly outperforms conventional models, including LSTM, GRU, BiLSTM, and CNN baselines. The proposed model achieves a validation accuracy of 84.68% and a macro F1-score of 84.64%, surpassing all baseline architectures by a notable margin. Our results show that SETransformer is a competitive and interpretable solution for real-world HAR tasks, with strong potential for deployment in mobile and ubiquitous sensing applications.
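
To make the two temporal components concrete, here is a NumPy sketch of single-head scaled dot-product self-attention followed by learnable temporal attention pooling. It is an illustration under assumptions, not the paper's model: the pooling uses a common additive form (a tanh projection scored against a learnable context vector `u`), and the function names, the embedding width `d = 32`, and the 6-class linear head are hypothetical choices. Multiple heads, positional encodings, residual connections, layer normalization, and training are omitted.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h, wq, wk, wv):
    """Single-head scaled dot-product self-attention over time.

    h: (T, d). Each time step attends to every other step, so the receptive
    field spans the whole window regardless of its length.
    """
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (T, T) pairwise affinities
    return softmax(scores, axis=-1) @ v      # (T, d_v)

def temporal_attention_pool(h, w, b, u):
    """Additive attention pooling over time (hypothetical formulation).

    h: (T, d); w: (d, d_a); b: (d_a,); u: (d_a,) learnable context vector.
    Returns the pooled (d,) vector and the (T,) time-step weights.
    """
    e = np.tanh(h @ w + b) @ u  # one unnormalized score per time step
    alpha = softmax(e, axis=0)  # weights are positive and sum to 1
    return alpha @ h, alpha

# Hypothetical sizes: 128-step window already embedded to d = 32.
rng = np.random.default_rng(0)
T, d, d_a, n_classes = 128, 32, 16, 6
h = rng.standard_normal((T, d))
wq, wk, wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
z = self_attention(h, wq, wk, wv)
w, b, u = 0.1 * rng.standard_normal((d, d_a)), np.zeros(d_a), rng.standard_normal(d_a)
pooled, alpha = temporal_attention_pool(z, w, b, u)
logits = pooled @ (0.1 * rng.standard_normal((d, n_classes)))  # linear classifier head
print(pooled.shape, int(alpha.argmax()), int(logits.argmax()))
```

Because the pooling weights `alpha` form a distribution over time steps, they can be inspected directly to see which parts of the window contributed most to a prediction, which is one plausible reading of the interpretability claim above.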

Paper Structure

This paper contains 21 sections, 13 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Confusion matrix of the SETransformer model on the test set.
  • Figure 2: Enter Caption
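
Figure 1 reports a confusion matrix, and the headline metrics are accuracy and macro F1. The sketch below shows how both follow from a confusion matrix, assuming macro F1 is the unweighted mean of per-class F1 scores (the summary does not state which definition the authors use). The function name `accuracy_and_macro_f1` and the 2-class toy matrix are hypothetical and unrelated to the paper's data.

```python
import numpy as np

def accuracy_and_macro_f1(cm):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class j, true class differs
    fn = cm.sum(axis=1) - tp  # true class i, predicted as something else
    zeros = np.zeros_like(tp)
    # Guard against classes that are never predicted or never present
    precision = np.divide(tp, tp + fp, out=zeros.copy(), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=zeros.copy(), where=(tp + fn) > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    # Macro F1: every class counts equally, regardless of its support
    return float(accuracy), float(f1.mean())

cm = [[8, 2],
      [1, 9]]
acc, macro_f1 = accuracy_and_macro_f1(cm)
print(round(acc, 4), round(macro_f1, 4))  # prints 0.85 0.8496
```

Because every class is weighted equally, macro F1 penalizes poor performance on rare activities that accuracy can hide; accuracy and macro F1 being close (84.68% vs. 84.64%) is consistent with fairly even per-class performance.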