MicroBi-ConvLSTM: An Ultra-Lightweight Efficient Model for Human Activity Recognition on Resource Constrained Devices

Mridankan Mandal

TL;DR

MicroBi-ConvLSTM is an ultra-lightweight convolutional-recurrent architecture that averages 11.4K parameters by combining two stage convolutional feature extraction with 4x temporal pooling and a single bidirectional LSTM layer, while preserving linear O(N) complexity.

Abstract

Human Activity Recognition (HAR) on resource constrained wearables requires models that balance accuracy against strict memory and computational budgets. State of the art lightweight architectures such as TinierHAR (34K parameters) and TinyHAR (55K parameters) achieve strong accuracy, but exceed memory budgets of microcontrollers with limited SRAM once operating system overhead is considered. We present MicroBi-ConvLSTM, an ultra-lightweight convolutional-recurrent architecture achieving 11.4K parameters on average through two stage convolutional feature extraction with 4x temporal pooling and a single bidirectional LSTM layer. This represents 2.9x parameter reduction versus TinierHAR and 11.9x versus DeepConvLSTM while preserving linear O(N) complexity. Evaluation across eight diverse HAR benchmarks shows that MicroBi-ConvLSTM maintains competitive performance within the ultra-lightweight regime: 93.41% macro F1 on UCI-HAR, 94.46% on SKODA assembly gestures, and 88.98% on Daphnet gait freeze detection. Systematic ablation reveals task dependent component contributions where bidirectionality benefits episodic event detection, but provides marginal gains on periodic locomotion. INT8 post training quantization incurs only 0.21% average F1-score degradation, yielding a 23.0 KB average deployment footprint suitable for memory constrained edge devices.
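
To make the INT8 post training quantization step more concrete, the following is a minimal sketch of symmetric per-tensor weight quantization in plain Python. The quantization scheme, the function names `quantize_int8` and `dequantize`, and the sample weights are all assumptions made for illustration; the abstract does not state which scheme or toolchain was used.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any positive scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the INT8 codes."""
    return [c * scale for c in codes]


# Hypothetical weights, not taken from the paper.
weights = [0.50, -0.25, 0.125, -1.0, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Weight storage: 4 bytes per FP32 value versus 1 byte per INT8 value.
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)
```

Under this scheme the weight arrays shrink by a factor of four and each weight moves by at most half a quantization step. The 23.0 KB figure in the abstract is the reported deployment footprint and is not derived by this sketch.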

Paper Structure

This paper contains 31 sections, 3 equations, 8 figures, and 7 tables.

Figures (8)

  • Figure 1: MicroBi-ConvLSTM architecture overview. Input sensor signals ($C$ channels $\times$ $T$ timesteps) pass through two convolutional blocks, each with batch normalization, ReLU activation, and 2$\times$ max pooling, achieving 4$\times$ total temporal compression. A single bidirectional LSTM (hidden dimension 24) processes the compressed sequence, with the final timestep representation feeding the classification head. Parameter count varies with input channels and output classes.
  • Figure 2: Distributions of parameters, MACs, FLOPs, model size (in KB), F1-score per million MACs, and F1-score per thousand parameters across architectures and datasets. Box plots show the mean and standard deviation across five random seeds. MicroBi-ConvLSTM (leftmost in each group) maintains competitive variance despite having 2.9$\times$ fewer parameters than TinierHAR.
  • Figure 3: Pareto frontier: F1-score versus MACs (log scale). Each point represents one model-dataset combination. MicroBi-ConvLSTM (blue) occupies the top left region, achieving competitive accuracy at lower computational cost.
  • Figure 4: Efficiency heatmap: F1-score per thousand parameters across datasets. Darker cells indicate higher parameter efficiency. The MicroBi-ConvLSTM advantage is consistent across benchmarks rather than dataset-specific.
  • Figure 5: FP32 versus INT8 F1-scores for MicroBi-ConvLSTM across datasets. The near diagonal alignment demonstrates quantization robustness, with an average degradation of only 0.21%.
  • ...and 3 more figures
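
As a rough illustration of why the parameter count in Figure 1 varies with input channels and output classes, the sketch below counts parameters for a stack of two convolutional blocks, one bidirectional LSTM, and a linear head. Only the hidden dimension of 24, the two 2$\times$ pooling stages, and the final timestep head come from the caption. The convolution width (16), kernel size (5), the LSTM convention of two bias vectors per gate, floor division in pooling, and all function names are assumptions, so the totals are not expected to match the paper's 11.4K average.

```python
def conv1d_params(in_ch, out_ch, kernel):
    """Weights plus one bias per output channel."""
    return in_ch * out_ch * kernel + out_ch


def batchnorm_params(ch):
    """Learnable scale and shift per channel."""
    return 2 * ch


def lstm_params(input_size, hidden, bidirectional=True):
    """Four gates, each with input weights, recurrent weights, and two
    bias vectors (an assumed convention; some frameworks use one bias)."""
    per_direction = 4 * (hidden * input_size + hidden * hidden + 2 * hidden)
    return per_direction * (2 if bidirectional else 1)


def linear_params(in_features, out_features):
    return in_features * out_features + out_features


def count_params(channels, classes, conv_width=16, kernel=5, hidden=24):
    """conv_width and kernel are hypothetical; hidden=24 follows Figure 1."""
    total = conv1d_params(channels, conv_width, kernel) + batchnorm_params(conv_width)
    total += conv1d_params(conv_width, conv_width, kernel) + batchnorm_params(conv_width)
    total += lstm_params(conv_width, hidden)
    # The head sees forward and backward states concatenated: 2 * hidden.
    total += linear_params(2 * hidden, classes)
    return total


def compressed_length(timesteps, pool_stages=2, pool=2):
    """Sequence length after the pooling stages (floor division assumed)."""
    for _ in range(pool_stages):
        timesteps //= pool
    return timesteps


# Hypothetical configuration: 9 sensor channels, 6 classes, 128-step windows.
total = count_params(channels=9, classes=6)
lstm_steps = compressed_length(128)
```

In this sketch only the first convolution depends on the number of input channels and only the head depends on the number of classes, which is consistent with the caption's remark that parameter count varies with both. The pooling stages do not change the parameter count; they shorten the sequence the LSTM has to process.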