BabyMamba-HAR: Lightweight Selective State Space Models for Efficient Human Activity Recognition on Resource Constrained Devices

Mridankan Mandal

TL;DR

This work tackles efficient human activity recognition (HAR) on resource-constrained devices by adopting lightweight selective state space models (SSMs) inspired by Mamba. It introduces two architectures, CI-BabyMamba-HAR and Crossover-BiDir-BabyMamba-HAR, which combine weight-tied bidirectional SSMs with a gated temporal attention head and reach strong accuracy with tens of thousands of parameters and low multiply-accumulate (MAC) counts. Across eight diverse benchmarks, the Crossover variant matches or approaches state-of-the-art TinyML HAR baselines while reducing computation by up to an order of magnitude on high-channel datasets, which makes it practical for wearable and mobile HAR. Systematic ablations identify bidirectionality and gated pooling as key contributors and yield concrete guidelines for selecting the stem architecture based on sensor configuration and inter-channel correlation.

Abstract

Human activity recognition (HAR) on wearable and mobile devices is constrained by memory footprint and computational budget, yet competitive accuracy must be maintained across heterogeneous sensor configurations. Selective state space models (SSMs) offer linear time sequence processing with input dependent gating, presenting a compelling alternative to quadratic complexity attention mechanisms. However, the design space for deploying SSMs in the TinyML regime remains largely unexplored. This paper introduces BabyMamba-HAR, a framework comprising two novel lightweight Mamba inspired architectures optimized for resource constrained HAR: (1) CI-BabyMamba-HAR, which uses a channel independent stem that applies the same shared weight transformations to each sensor channel separately, preventing cross channel noise propagation, and (2) Crossover-BiDir-BabyMamba-HAR, which uses an early fusion stem so that computational complexity is independent of the channel count. Both variants incorporate weight tied bidirectional scanning and lightweight temporal attention pooling. Evaluation across eight diverse benchmarks shows that Crossover-BiDir-BabyMamba-HAR achieves 86.52% average macro F1-score with approximately 27K parameters and 2.21M MACs, matching TinyHAR (86.16%) while requiring 11x fewer MACs on high channel datasets. Systematic ablation studies reveal that bidirectional scanning contributes up to 8.42% F1-score improvement, and gated temporal attention provides up to 8.94% F1-score gain over mean pooling. These findings establish practical design principles for deploying selective state space models as efficient TinyML backbones for HAR.
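To make the contrast between the two stems concrete, the following is a minimal NumPy sketch, not the paper's implementation. The kernel size, the feature width `d`, the window length, and all function names are illustrative assumptions, and normalization and activation layers are omitted. It shows that a channel independent stem produces one feature sequence per sensor channel, whereas an early fusion stem produces a single sequence whose shape does not depend on the channel count.

```python
import numpy as np

def conv1d_same(x, w):
    """1D convolution with zero 'same' padding.

    x: (T, C_in) input window, w: (k, C_in, C_out) kernel.
    Returns an array of shape (T, C_out).
    """
    k = w.shape[0]
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, ((pad_left, pad_right), (0, 0)))
    T = x.shape[0]
    # Accumulate the contribution of each kernel tap.
    return sum(xp[i:i + T] @ w[i] for i in range(k))

def channel_independent_stem(x, w_shared):
    """Apply ONE shared kernel to every sensor channel separately.

    x: (T, C), w_shared: (k, 1, d). Returns (C, T, d), so any backbone
    that consumes this output sees C sequences (cost grows with C).
    """
    C = x.shape[1]
    return np.stack([conv1d_same(x[:, c:c + 1], w_shared) for c in range(C)])

def early_fusion_stem(x, w_fuse):
    """Mix all sensor channels into d features with a single convolution.

    x: (T, C), w_fuse: (k, C, d). Returns (T, d) regardless of C.
    """
    return conv1d_same(x, w_fuse)

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
T, C, d, k = 128, 9, 24, 3
x = rng.normal(size=(T, C))
ci_out = channel_independent_stem(x, rng.normal(size=(k, 1, d)))
ef_out = early_fusion_stem(x, rng.normal(size=(k, C, d)))
print(ci_out.shape, ef_out.shape)
```

In this sketch only the stem of the early fusion variant touches the channel dimension, which is consistent with the abstract's statement that the MAC savings appear on high channel datasets.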

Paper Structure (35 sections, 9 equations, 5 figures, 8 tables)

Introduction
Related Work
Efficient HAR Architectures
State Space Models and Mamba
Methodology
Problem Formulation
Selective State Space Formulation
BabyMamba-HAR Architecture Family
CI-BabyMamba-HAR (Channel Independent)
Crossover-BiDir-BabyMamba-HAR (Early Fusion)
Weight Tied Bidirectional Scanning
Context Gated Temporal Attention Pooling
Computational Complexity
Experimental Setup
Evaluation Protocol
...and 20 more sections

Figures (5)

Figure 1: CI-BabyMamba-HAR architecture. The channel independent stem processes each sensor channel through shared weight Conv1D, BatchNorm, and SiLU layers. Weight tied bidirectional SSM blocks enable forward and backward temporal processing. A context gated temporal attention head aggregates features before classification.
Figure 2: Crossover-BiDir-BabyMamba-HAR architecture. The early fusion stem projects all input channels to $d_{\text{model}}$ features through a single Conv1D operation. Bidirectional SSM blocks with crossover connections process the fused representation. Backbone computational complexity is independent of input channel count.
Figure 3: Weight tied bidirectional selective state space model (SSM) block architecture. Input dependent projections control the discretization step $\Delta_t$ and the state matrices $\mathbf{B}_t$, $\mathbf{C}_t$. The parallel scan implementation maintains linear time complexity $O(N)$.
Figure 4: Performance comparison grid across all eight HAR benchmark datasets showing macro F1-scores for each model-dataset combination.
Figure 5: Combined ablation study results showing $\Delta$F1 relative to baseline configurations. (C) Channel processing ablation across all 8 datasets comparing CI-BabyMamba vs. Crossover architectures with channel independent stem variants. (B) Hyperparameter sensitivity (d_state, d_model, expand factor) on 2 representative datasets. (A) Architecture ablation (bidirectionality A2, pooling A3, stem A4) on 4 datasets. (L) Sequence length scaling (64--512 timesteps) on 4 datasets. Yellow/gold bars denote CI-BabyMamba-HAR and purple bars denote Crossover-BabyMamba-HAR. Baseline variants (A0, B0, C0, L64) are shown with bold outlines.
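
The block described in the Figure 3 caption and the pooling head mentioned in the Figure 1 caption can be illustrated with a small NumPy sketch. It is a sequential reference under stated assumptions, not the paper's code: the parameter names (`W_dt`, `W_B`, `W_C`, `w_score`, `W_gate`), the softplus step size, summing the forward and backward outputs, and the particular form of the gate are all hypothetical choices, and a simple loop stands in for the parallel scan.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)

def selective_scan(x, p):
    """Sequential selective SSM scan. x: (T, d) -> y: (T, d).

    The step size dt and the matrices B_t, C_t are computed from the
    input at each timestep, which is what makes the scan 'selective'.
    """
    T, d = x.shape
    n = p["A"].shape[1]
    h = np.zeros((d, n))                  # hidden state per feature
    y = np.empty((T, d))
    for t in range(T):
        dt = softplus(x[t] @ p["W_dt"])   # (d,) positive step sizes
        B_t = x[t] @ p["W_B"]             # (n,)
        C_t = x[t] @ p["W_C"]             # (n,)
        A_bar = np.exp(dt[:, None] * p["A"])          # discretized A, in (0, 1]
        h = A_bar * h + (dt[:, None] * B_t[None, :]) * x[t][:, None]
        y[t] = h @ C_t
    return y

def bidirectional_scan(x, p):
    """Weight tied: the SAME parameters p scan both time directions."""
    fwd = selective_scan(x, p)
    bwd = selective_scan(x[::-1], p)[::-1]
    return fwd + bwd                      # assumed merge rule (sum)

def gated_attention_pool(h, w_score, W_gate):
    """Softmax attention over time, scaled by a context dependent gate."""
    scores = h @ w_score
    scores = scores - scores.max()        # numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum()           # (T,) attention weights
    context = alpha @ h                   # (d,) weighted temporal summary
    gate = 1.0 / (1.0 + np.exp(-(h.mean(axis=0) @ W_gate)))
    return gate * context, alpha

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
T, d, n = 64, 8, 4
params = {
    "A": -np.exp(rng.normal(size=(d, n))),     # negative entries keep the state stable
    "W_dt": 0.1 * rng.normal(size=(d, d)),
    "W_B": 0.1 * rng.normal(size=(d, n)),
    "W_C": 0.1 * rng.normal(size=(d, n)),
}
x = rng.normal(size=(T, d))
features = bidirectional_scan(x, params)
pooled, alpha = gated_attention_pool(
    features, rng.normal(size=d), 0.1 * rng.normal(size=(d, d))
)
print(features.shape, pooled.shape)
```

Because both directions share one parameter set, the bidirectional block in this sketch adds a second pass over the sequence without adding parameters, and each pass costs time linear in the sequence length.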
