Micro-gesture Online Recognition using Learnable Query Points
Pengyu Liu, Fei Wang, Kun Li, Guoliang Chen, Yanyan Wei, Shengeng Tang, Zhiliang Wu, Dan Guo
TL;DR
This work tackles Micro-gesture Online Recognition by reframing it as a set-prediction problem over learnable query points and query vectors. It extends the PointTAD baseline with a Mamba-MHSA block and a Multi-Level Interactive Module to better model temporal semantics and localize boundaries, and it is evaluated on the SMG dataset. The proposed method achieves $F1=14.34$ and ranks second in the Micro-gesture Online Recognition track of the MiGA challenge, showing improved micro-gesture (MG) discrimination and boundary detection. Ablation studies guide design choices such as $N_q$, window size, decoder depth, and Mamba blocks. Future work includes integrating skeletal data to further improve recognition performance and robustness.
Abstract
In this paper, we briefly introduce the solution developed by our team, HFUT-VUT, for the Micro-gesture Online Recognition track of the MiGA challenge at IJCAI 2024. The task requires identifying the category of each micro-gesture in a video clip and locating its start and end times. Compared with the typical Temporal Action Detection task, it places more emphasis on distinguishing between micro-gestures and on pinpointing the start and end times of actions. Our solution ranks 2nd in the Micro-gesture Online Recognition track.
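To make the task output concrete, the following is a minimal sketch of how predicted micro-gesture segments (start, end, category) might be scored against ground truth with an F1 measure. It is an illustration only: the function names, the greedy matching, and the IoU threshold of 0.3 are assumptions made here, not the challenge's official evaluation protocol, and the score is returned on a 0 to 1 scale.

```python
from typing import List, Tuple

# A segment is (start_time, end_time, category). This layout is an
# assumption for illustration, not a format defined by the paper.
Segment = Tuple[float, float, str]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def detection_f1(preds: List[Segment], gts: List[Segment],
                 iou_threshold: float = 0.3) -> float:
    """F1 over segments using greedy one-to-one matching.

    A prediction counts as a true positive if it has the same category
    as a not-yet-matched ground-truth segment and their temporal IoU is
    at least `iou_threshold` (hypothetical default).
    """
    matched = set()
    true_pos = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            if j in matched or g[2] != p[2]:
                continue
            iou = temporal_iou(p, g)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0 and best_iou >= iou_threshold:
            matched.add(best_j)
            true_pos += 1
    false_pos = len(preds) - true_pos
    false_neg = len(gts) - true_pos
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 0.0


# Toy example with made-up segments and category names.
preds = [(0.0, 10.0, "a"), (20.0, 30.0, "b"), (50.0, 60.0, "a")]
gts = [(1.0, 9.0, "a"), (40.0, 50.0, "b")]
score = detection_f1(preds, gts)
```

In this toy example only the first prediction matches a ground-truth segment, so both a wrong category and a poorly localized boundary cost the prediction its match, which reflects the two aspects the task emphasizes.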
