Motion Matters: Motion-guided Modulation Network for Skeleton-based Micro-Action Recognition
Jihao Gu, Kun Li, Fei Wang, Yanyan Wei, Zhiliang Wu, Hehe Fan, Meng Wang
TL;DR
This paper tackles the challenge of recognizing micro-actions from skeletal data by explicitly modeling subtle motion cues. It introduces the Motion-guided Modulation Network (MMN), which decomposes motion into skeletal-level modulation (MSM) and temporal-level modulation (MTM), and couples them with a motion consistency learning pipeline to fuse multi-scale features. The approach leverages skeleton-aware embeddings, skeleton-temporal positional encoding, and skeletal-temporal formers to produce discriminative spatiotemporal representations. Experiments on MA-52 and iMiGUE demonstrate state-of-the-art performance with strong efficiency, validating the importance of explicit, motion-guided modulation for micro-action recognition.
Abstract
Micro-Actions (MAs) are an important form of non-verbal communication in social interactions, with potential applications in human emotion analysis. However, existing methods for Micro-Action Recognition often overlook the subtle changes inherent in MAs, which limits their accuracy in distinguishing between such actions. To address this issue, we present a novel Motion-guided Modulation Network (MMN) that implicitly captures and modulates subtle motion cues to enhance spatial-temporal representation learning. Specifically, we introduce a Motion-guided Skeletal Modulation module (MSM) to inject motion cues at the skeletal level, acting as a control signal to guide spatial representation modeling. In parallel, we design a Motion-guided Temporal Modulation module (MTM) to incorporate motion information at the frame level, facilitating the modeling of holistic motion patterns in micro-actions. Finally, we propose a motion consistency learning strategy to aggregate the motion cues from multi-scale features for micro-action classification. Experimental results on the Micro-Action 52 and iMiGUE datasets demonstrate that MMN achieves state-of-the-art performance in skeleton-based micro-action recognition, underscoring the importance of explicitly modeling subtle motion cues. The code will be available at https://github.com/momiji-bit/MMN.
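To make the idea of a motion cue acting as a control signal more concrete, the sketch below computes frame-to-frame joint displacements from a skeleton sequence and uses their magnitude to rescale per-joint features. It is only an illustration under stated assumptions and not the authors' MSM or MTM implementation: the function names `joint_motion`, `motion_magnitude`, and `modulate`, the scalar per-joint features, and the `1 + tanh` gate are all hypothetical choices made for this example.

```python
import math


def joint_motion(skeleton):
    """Frame-to-frame displacement of every joint.

    skeleton: list of T frames; each frame is a list of J joints;
    each joint is a tuple of coordinates.
    Returns T-1 frames of per-joint displacement vectors.
    """
    return [
        [tuple(c1 - c0 for c0, c1 in zip(j0, j1)) for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(skeleton, skeleton[1:])
    ]


def motion_magnitude(motion):
    """Euclidean norm of each joint's displacement, per frame."""
    return [[math.sqrt(sum(c * c for c in j)) for j in frame] for frame in motion]


def modulate(features, magnitudes):
    """Rescale scalar per-joint features by a motion-derived gate.

    Assumed gate (hypothetical, for illustration): 1 + tanh(magnitude),
    so a static joint keeps its feature and a moving joint is amplified
    by a factor below 2.
    """
    return [
        [f * (1.0 + math.tanh(m)) for f, m in zip(frame_f, frame_m)]
        for frame_f, frame_m in zip(features, magnitudes)
    ]


# Two frames, two joints in 2D: joint 0 is static, joint 1 moves by (3, 4).
skeleton = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (4.0, 5.0)],
]
magnitudes = motion_magnitude(joint_motion(skeleton))
modulated = modulate([[2.0, 2.0]], magnitudes)
```

In this toy input the static joint's feature is left unchanged while the moving joint's feature is amplified, which is the sense in which motion "guides" the representation here. A learned model would replace the fixed gate with trainable layers.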
