SiT-MLP: A Simple MLP with Point-wise Topology Feature Learning for Skeleton-based Action Recognition
Shaojie Zhang, Jianqin Yin, Yonghao Dang, Jiajun Fu
TL;DR
The paper addresses skeleton-based action recognition without the predefined human priors that GCN-based methods depend on. It introduces the Spatial Topology Gating Unit (STGU), a gate-based MLP module that learns point-wise, sample-specific spatial topology features, and builds the SiT-MLP model on top of it. Experiments on NTU RGB+D 60, NTU RGB+D 120, and Northwestern-UCLA show competitive accuracy with significantly fewer parameters and improved efficiency. The work suggests that simple, prior-free MLP architectures can model global joint relationships effectively, with potential benefits for generalization and real-time deployment.
Abstract
Graph convolution networks (GCNs) have achieved remarkable performance in skeleton-based action recognition. However, previous GCN-based methods rely excessively on elaborate human priors and construct complex feature aggregation mechanisms, which limits the generalizability and effectiveness of the networks. To solve these problems, we propose a novel Spatial Topology Gating Unit (STGU), an MLP-based variant without extra priors, to capture the co-occurrence topology features that encode the spatial dependency across all joints. In STGU, to learn the point-wise topology features, a new gate-based feature interaction mechanism is introduced that activates the features point-to-point using an attention map generated from the input sample. Based on the STGU, we propose the first MLP-based model, SiT-MLP, for skeleton-based action recognition. Compared with previous methods on three large-scale datasets, SiT-MLP achieves competitive performance while significantly reducing the number of parameters. The code will be available at https://github.com/BUPTSJZhang/SiT-MLP.
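The gating idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the channel split, the dot-product attention, the projection matrices `w_q` and `w_k`, and the function name `stgu_like_gate` are all assumptions made for illustration. The sketch only shows the general pattern of building a joint-to-joint attention map from the input sample itself, with no predefined skeleton adjacency, and using it to gate features element-wise.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def stgu_like_gate(x, w_q, w_k):
    """Hypothetical gating unit over joints (illustrative only).

    x   : (V, C) features for V joints, with C even.
    w_q : (C/2, D) projection used to build the attention map (assumed).
    w_k : (C/2, D) projection used to build the attention map (assumed).
    Returns the gated features (V, C/2) and the attention map (V, V).
    """
    # Split channels into a content branch and a gating branch.
    content, gate_in = np.split(x, 2, axis=-1)

    # Sample-specific attention over all joint pairs; no skeleton graph is used.
    q = gate_in @ w_q
    k = gate_in @ w_k
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)

    # Aggregate the gating branch across joints, then gate element-wise.
    gate = attn @ gate_in
    return content * gate, attn


# Example with 25 joints (the joint count in NTU RGB+D) and 64 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((25, 64))
w_q = rng.standard_normal((32, 16))
w_k = rng.standard_normal((32, 16))
out, attn = stgu_like_gate(x, w_q, w_k)
print(out.shape, attn.shape)
```

Because the attention map is recomputed from each input, every sample gets its own joint-to-joint weighting, which is the sense in which the topology is sample-specific in this sketch. The element-wise product at the end is what makes the interaction point-wise rather than a plain matrix aggregation.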
