MMA: A Momentum Mamba Architecture for Human Activity Recognition with Inertial Sensors

Thai-Khanh Nguyen, Uyen Vo, Tan M. Nguyen, Thieu N. Vo, Trung-Hieu Le, Cuong Pham

TL;DR

This work targets human activity recognition from inertial sensors, addressing gradient instability and limited long-range modeling in conventional deep models. It introduces Momentum Mamba, a momentum-augmented selective state-space model that injects second-order dynamics via a velocity state, preserving linear-time computation while improving gradient flow and temporal expressiveness. Two extensions, Complex Momentum Mamba for frequency-selective memory and Adam Momentum Mamba for variance-aware adaptivity, are likewise designed to model long-horizon inertial sequences robustly. Empirical results on MuWiGes, UESTC-MMEA-CL, and MMAct show consistent accuracy gains over vanilla Mamba and Transformer baselines, with the complex variant providing the strongest performance at the cost of higher resource usage. The approach offers a scalable, edge-friendly paradigm for HAR and points to broader applicability in sequence modeling tasks requiring stable long-range memory and content-aware processing.

Abstract

Human activity recognition (HAR) from inertial sensors is essential for ubiquitous computing, mobile health, and ambient intelligence. Conventional deep models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers have advanced HAR but remain limited by vanishing or exploding gradients, high computational cost, and difficulty in capturing long-range dependencies. Structured state-space models (SSMs) like Mamba address these challenges with linear complexity and effective temporal modeling, yet they are restricted to first-order dynamics without stable long-term memory mechanisms. We introduce Momentum Mamba, a momentum-augmented SSM that incorporates second-order dynamics to improve stability of information flow across time steps, robustness, and long-sequence modeling. Two extensions further expand its capacity: Complex Momentum Mamba for frequency-selective memory and Adam Momentum Mamba for variance-aware adaptive scaling. Experiments on multiple HAR benchmarks demonstrate consistent gains over vanilla Mamba and Transformer baselines in accuracy, robustness, and convergence speed. With only moderate increases in training cost, momentum-augmented SSMs offer a favorable accuracy-efficiency balance, establishing them as a scalable paradigm for HAR and a promising principled framework for broader sequence modeling applications.

Paper Structure

This paper contains 32 sections, 4 theorems, 29 equations, 5 figures, 5 tables.

Key Result

Proposition 1

Let $\{h_n\}$ be the hidden states of the Mamba architecture defined by where $\overline{A}_n = \exp(\Delta_n A)$ is diagonal with entries $e^{\Delta_n a_{n,i}}$ for $a_{n,i}<0$ and $\Delta_n>0$. Then the gradient of the loss $L$ with respect to $h_t$ satisfies If $\min_i a_{n,i}\ll 0$, then so gradients vanish exponentially with sequence length.
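The display equations of Proposition 1 are not reproduced in the statement above, so the following is only a numerical sketch of its conclusion under stated assumptions. It assumes the standard selective SSM recurrence $h_n = \overline{A}_n h_{n-1} + \overline{B}_n x_n$ (an assumption, not quoted from the text), for which the state-to-state Jacobian $\partial h_T / \partial h_t$ is the product of the diagonal matrices $\overline{A}_n$ and therefore has entries $\exp\big(a_i \sum_n \Delta_n\big)$. For simplicity the sketch uses time-invariant $a_i$; the function name `jacobian_diag` and all numeric values are hypothetical.

```python
import numpy as np

def jacobian_diag(a, deltas):
    """Diagonal of prod_n exp(delta_n * a), one factor per time step.

    a      : (d,) negative diagonal entries of A (assumed time-invariant here)
    deltas : (T,) positive step sizes Delta_n
    """
    a = np.asarray(a, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    diag = np.ones_like(a)
    for delta in deltas:
        # Each step multiplies by exp(delta * a_i), which lies in (0, 1) when a_i < 0.
        diag = diag * np.exp(delta * a)
    return diag

# Illustrative values only: three state channels with increasingly negative a_i.
a = np.array([-0.1, -1.0, -5.0])
for T in (10, 100, 1000):
    print(T, jacobian_diag(a, np.full(T, 0.1)))
```

Each factor is strictly below one, so the product shrinks geometrically in the number of steps, and the channel with the most negative $a_i$ shrinks fastest. This matches the qualitative claim that gradients vanish exponentially with sequence length when $\min_i a_{n,i} \ll 0$.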

Figures (5)

  • Figure 1: Overall architecture of the proposed Momentum Mamba (MMA) framework for inertial HAR. The pipeline consists of three main stages: a lightweight Conv1D front-end for local feature extraction, stacked Momentum Mamba layers for temporal modeling, and a compact classification head for activity recognition. At its core, each Momentum Mamba block augments the standard Mamba recurrence with an auxiliary momentum state $v_n$ that accumulates input-driven updates through learnable parameters $(\alpha, \beta)$. This dual-state design smooths high-frequency fluctuations, stabilizes long-range dynamics, and improves robustness while preserving the linear-time scan efficiency of structured state-space models.
  • Figure 2: Sensor signal examples from three benchmark datasets. The second and third rows display accelerometer and gyroscope signals, respectively, from the MuWiGes [nguyen2023hand], UESTC-MMEA-CL [xu2023towards], and MMAct [kong2019mmact] datasets. These signals reflect the temporal variations of multi-axis motion data captured during human activities.
  • Figure 3: Two-Step Rescaling explanations [tsinterpret]: comparative saliency analysis of the Transformer, Mamba, and Momentum Mamba models on the UESTC dataset. Warmer colors (yellow) indicate higher feature importance and cooler colors (purple) indicate lower importance.
  • Figure 4: $\ell_{2}$ norm of the gradients of the loss $\mathcal{L}$ with respect to the state vector $h_{t}$ at each time step $t$ for vanilla Mamba (left) and Momentum Mamba (right). Momentum Mamba does not suffer from vanishing gradients.
  • Figure 5: Hyperparameter grid search results for the Momentum Mamba model. The heatmap shows test accuracy (%) across combinations of the momentum parameters $\beta$ and $\alpha$. Each cell displays the test accuracy achieved with the corresponding combination. The color scale ranges from lower accuracy (red) to higher accuracy (green). Results are drawn from the MuWiGes [nguyen2023hand], UESTC-MMEA-CL [xu2023towards], and MMAct [kong2019mmact] datasets.
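The dual-state design described in the Figure 1 caption can be sketched as a simple sequential loop. The captions do not give the exact update rule, so the form used here is an assumption: a velocity state $v_n = \beta v_{n-1} + \alpha \overline{B}_n x_n$ feeding a hidden state $h_n = \overline{A}_n h_{n-1} + v_n$, with diagonal $\overline{A}_n$. The function name `momentum_scan`, the array shapes, and the default values of `alpha` and `beta` are all hypothetical, and the loop stands in for the parallel scan rather than implementing it.

```python
import numpy as np

def momentum_scan(x, A_bar, B_bar, alpha=0.5, beta=0.9):
    """Sequential sketch of an assumed momentum-augmented diagonal SSM.

    x     : (T,)   scalar input per time step
    A_bar : (T, d) diagonal state transition per step
    B_bar : (T, d) input projection per step
    Returns the hidden states h with shape (T, d).
    """
    x = np.asarray(x, dtype=float)
    A_bar = np.asarray(A_bar, dtype=float)
    B_bar = np.asarray(B_bar, dtype=float)
    T, d = A_bar.shape
    h = np.zeros(d)  # hidden state h_n
    v = np.zeros(d)  # auxiliary momentum (velocity) state v_n
    out = np.empty((T, d))
    for n in range(T):
        # Velocity accumulates input-driven updates (assumed form).
        v = beta * v + alpha * B_bar[n] * x[n]
        # Hidden state is driven by the velocity instead of the raw input.
        h = A_bar[n] * h + v
        out[n] = h
    return out

# Impulse input on a single channel, illustrative values only.
impulse = np.array([1.0, 0.0, 0.0])
A = np.full((3, 1), 0.5)
B = np.ones((3, 1))
print(momentum_scan(impulse, A, B, alpha=1.0, beta=0.5)[:, 0])
```

Under this assumed form, setting `beta=0` and `alpha=1` recovers the first-order recurrence $h_n = \overline{A}_n h_{n-1} + \overline{B}_n x_n$, while a nonzero `beta` lets an input keep contributing to later states through $v_n$. The cost per step is constant, so the loop stays linear in the sequence length.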

Theorems & Definitions (13)

  • Proposition 1: Vanishing Gradients in Mamba
  • proof
  • Proposition 2: Affine Recurrence Form
  • proof
  • Remark 1: Parallelization and Temporal Smoothing
  • Proposition 3: Gradient Propagation in Momentum Mamba
  • proof
  • Remark 2: Gradient Preservation by Momentum
  • Remark 3: Exploding Gradients
  • Remark 4: Frequency-Aware Filtering
  • ...and 3 more