HARMamba: Efficient and Lightweight Wearable Sensor Human Activity Recognition Based on Bidirectional Mamba

Shuangjian Li, Tao Zhu, Furong Duan, Liming Chen, Huansheng Ning, Christopher Nugent, Yaping Wan

TL;DR

HAR on wearable sensors demands high accuracy with low computational cost for long sequences. HARMamba combines a bidirectional selective State Space Model with patch-based, channel-independent processing and hardware-aware optimization to achieve real-time inference on mobile devices. It demonstrates strong F1 and accuracy on PAMAP2, WISDM, UNIMIB SHAR, and UCI while using significantly fewer parameters and FLOPs than comparable methods, highlighting practical edge deployment potential. The approach offers a scalable HAR backbone with potential extensions to self-supervised learning and cross-domain gesture recognition.

Abstract

Wearable sensor-based human activity recognition (HAR) is a critical research domain in activity perception. However, achieving high efficiency and long-sequence recognition remains a challenge. Although temporal deep learning models such as CNNs, RNNs, and transformers have been investigated extensively, their large parameter counts often impose significant computational and memory costs, making them less suitable for resource-constrained mobile health applications. This study introduces HARMamba, a lightweight and versatile HAR architecture that combines a selective bidirectional State Space Model with a hardware-aware design. To optimize real-time resource consumption in practical scenarios, HARMamba employs linear recursive mechanisms and parameter discretization, allowing it to selectively focus on relevant input sequences while efficiently fusing scan and recompute operations. The model processes sensor data streams as independent channels, dividing each channel into patches and appending classification tokens to the end of the sequence. It uses position embedding to represent the sequence order. The patch sequence is then processed by the HARMamba Block, and the classification head finally outputs the activity category. The HARMamba Block is the fundamental component of the HARMamba architecture, enabling the effective capture of more discriminative activity sequence features. HARMamba outperforms contemporary state-of-the-art frameworks, delivering comparable or better accuracy while significantly reducing computational and memory demands. Its effectiveness has been extensively validated on four publicly available datasets, namely PAMAP2, WISDM, UNIMIB SHAR and UCI. The F1 scores of HARMamba on the four datasets are 99.74%, 99.20%, 88.23% and 97.01%, respectively.
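
The input pipeline described in the abstract (independent channels, patching, a classification token appended at the end, and position embedding) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `patchify_channels` and `build_token_sequence`, the `patch_len` and `stride` parameters, and the omission of the learned patch-to-token projection are all assumptions made for illustration.

```python
from typing import List, Sequence


def patchify_channels(
    signal: Sequence[Sequence[float]], patch_len: int, stride: int
) -> List[List[List[float]]]:
    """Split every sensor channel, independently, into fixed-length patches.

    `signal` holds one list of samples per channel. The result holds one
    patch sequence per channel; trailing samples that do not fill a whole
    patch are dropped (an assumption of this sketch).
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    patched = []
    for channel in signal:
        starts = range(0, len(channel) - patch_len + 1, stride)
        patched.append([list(channel[s:s + patch_len]) for s in starts])
    return patched


def build_token_sequence(
    patches: Sequence[Sequence[float]],
    cls_token: Sequence[float],
    pos_embedding: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Append the classification token at the END, then add position embeddings.

    Each patch is used directly as a token here; a real model would first
    apply a learned linear projection to the embedding dimension.
    """
    tokens = [list(p) for p in patches] + [list(cls_token)]
    if len(pos_embedding) != len(tokens):
        raise ValueError("need one position embedding per token")
    return [
        [value + offset for value, offset in zip(token, pos)]
        for token, pos in zip(tokens, pos_embedding)
    ]
```

With a window of 8 samples, `patch_len=4` and `stride=2`, each channel yields 3 patches, so each channel's token sequence has 4 entries once the classification token is appended.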

Paper Structure (23 sections, 15 equations, 4 figures, 6 tables, 1 algorithm)

This paper contains 23 sections, 15 equations, 4 figures, 6 tables, 1 algorithm.

Figures (4)

  • Figure 1: Schematic of the proposed HARMamba model. The input sensor signal is first handled channel by channel: each channel's sequence is segmented into patches, which are mapped to patch tokens with a position encoding added to each token. The token sequence is then fed into the HARMamba architecture. In contrast to Mamba, which is tailored for text sequences, the HARMamba encoder processes token sequences bidirectionally.
  • Figure 2: t-SNE visualization results. Subfigures (a), (b), (c) and (d) display sample t-SNE visualization results without any training, while subfigures (e), (f), (g) and (h) show the t-SNE visualization results output by the pre-trained model. The t-SNE visualization results show samples from four different datasets: PAMAP2, WISDM, UNIMIB SHAR and UCI. The activity categories are represented by different colours.
  • Figure 3: Confusion matrices on the PAMAP2 dataset for MTHAR [duan2023multi] and HARMamba. (a) MTHAR, (b) HARMamba
  • Figure 4: Memory efficiency of the Transformer and HARMamba models under different patch sizes on the PAMAP2 dataset.
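
The bidirectional processing mentioned in the Figure 1 caption, together with the linear recurrence and parameter discretization named in the abstract, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's HARMamba Block: it uses a diagonal state matrix, a scalar input per step, fixed (non-selective) parameters, a zero-order-hold discretization, and sums the two directions. The names `discretize`, `ssm_scan`, and `bidirectional_scan` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def discretize(
    a_diag: Sequence[float], b: Sequence[float], delta: float
) -> Tuple[List[float], List[float]]:
    """Zero-order-hold discretization of a diagonal continuous-time SSM.

    a_bar = exp(delta * a), b_bar = (a_bar - 1) / a * b, element-wise.
    Entries of `a_diag` must be non-zero (negative values keep it stable).
    """
    a_bar = [math.exp(delta * a) for a in a_diag]
    b_bar = [(ab - 1.0) / a * bi for ab, a, bi in zip(a_bar, a_diag, b)]
    return a_bar, b_bar


def ssm_scan(
    x: Sequence[float],
    a_bar: Sequence[float],
    b_bar: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Linear recurrence h_t = a_bar * h_{t-1} + b_bar * x_t, y_t = c . h_t."""
    h = [0.0] * len(a_bar)
    ys = []
    for x_t in x:
        h = [ab * hi + bb * x_t for ab, hi, bb in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def bidirectional_scan(
    x: Sequence[float],
    a_bar: Sequence[float],
    b_bar: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Scan forward and backward, re-align the backward pass, and sum.

    Sharing parameters across directions and summing the outputs are
    assumptions of this sketch.
    """
    forward = ssm_scan(x, a_bar, b_bar, c)
    backward = ssm_scan(list(x)[::-1], a_bar, b_bar, c)[::-1]
    return [f + b for f, b in zip(forward, backward)]
```

A forward-only scan is causal, so an impulse in the middle of the sequence affects only later outputs. The bidirectional variant lets every position see context from both sides, which suits a classification task where the whole window is available at once.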