SkelMamba: A State Space Model for Efficient Skeleton Action Recognition of Neurological Disorders

Niki Martinel, Mariano Serrao, Christian Micheloni

TL;DR

SkelMamba addresses skeleton-based action recognition with an anatomically guided, multi-stream state-space model (SSM). It partitions channel representations into spatial, temporal, and spatio-temporal streams and applies four-way channel-split scanning, so that it captures both local joint dynamics and global motion efficiently; part-based tokens and attentive SSM integration help it pick up fine-grained, disease-relevant patterns. The architecture achieves state-of-the-art results on standard benchmarks (NTU RGB+D, NTU RGB+D 120, and NW-UCLA), and a new gait dataset is used to assess its potential for automated neurological disorder diagnosis, where skeleton data offers privacy-preserving, scalable motion analysis. It combines high accuracy with lower computational complexity than previous leading transformer-based models, which makes it practical for both clinical applications and large-scale action recognition.

Abstract

We introduce a novel state-space model (SSM)-based framework for skeleton-based human action recognition, with an anatomically-guided architecture that improves state-of-the-art performance in both clinical diagnostics and general action recognition tasks. Our approach decomposes skeletal motion analysis into spatial, temporal, and spatio-temporal streams, using channel partitioning to capture distinct movement characteristics efficiently. By implementing a structured, multi-directional scanning strategy within SSMs, our model captures local joint interactions and global motion patterns across multiple anatomical body parts. This anatomically-aware decomposition enhances the ability to identify subtle motion patterns critical in medical diagnosis, such as gait anomalies associated with neurological conditions. On public action recognition benchmarks, i.e., NTU RGB+D, NTU RGB+D 120, and NW-UCLA, our model outperforms current state-of-the-art methods, achieving accuracy improvements up to $3.2\%$ with lower computational complexity than previous leading transformer-based models. We also introduce a novel medical dataset for motion-based patient neurological disorder analysis to validate our method's potential in automated disease diagnosis.
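
To make the idea of channel partitioning with multi-directional scanning more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the choice of four scan orders (time-major and joint-major, each forward and backward), and the toy linear recurrence standing in for a real selective SSM are all assumptions made for illustration.

```python
import numpy as np


def linear_scan(seq, decay=0.9):
    """Toy stand-in for an SSM (assumption, not the paper's model):
    h_k = decay * h_{k-1} + (1 - decay) * x_k, applied along axis 0."""
    out = np.empty_like(seq)
    h = np.zeros(seq.shape[1:], dtype=seq.dtype)
    for k in range(seq.shape[0]):
        h = decay * h + (1.0 - decay) * seq[k]
        out[k] = h
    return out


def four_way_channel_split_scan(x, decay=0.9):
    """Split channels into four groups and scan each group in its own order.

    x has shape (T, J, C): frames, joints, channels, with C divisible by 4.
    Hypothetical scan orders: group 0 time-major forward, group 1 time-major
    backward, group 2 joint-major forward, group 3 joint-major backward.
    """
    T, J, C = x.shape
    if C % 4 != 0:
        raise ValueError("channel count must be divisible by 4")

    outputs = []
    for i, group in enumerate(np.split(x, 4, axis=-1)):
        time_major = i < 2
        reverse = i % 2 == 1

        # Choose which axis varies slowest, then flatten the grid to a sequence.
        grid = group if time_major else group.transpose(1, 0, 2)
        seq = grid.reshape(T * J, -1)
        if reverse:
            seq = seq[::-1]

        y = linear_scan(seq, decay)

        # Undo the reordering so every group is back in (T, J, C/4) layout.
        if reverse:
            y = y[::-1]
        y = y.reshape(grid.shape[0], grid.shape[1], -1)
        if not time_major:
            y = y.transpose(1, 0, 2)
        outputs.append(y)

    return np.concatenate(outputs, axis=-1)


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((16, 25, 8))
    y = four_way_channel_split_scan(x)
    print(y.shape)
```

Because each direction processes only a quarter of the channels, the four scans together cost about as much as a single scan over all channels, which is one plausible reading of why channel splitting keeps the computation low.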

Paper Structure

This paper contains 26 sections, 13 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: The overall framework of our proposed SkelMamba architecture.
  • Figure 2: Channel-Wise Spatio-Temporal SSM (C-2D-SSM).