DoGCLR: Dominance-Game Contrastive Learning Network for Skeleton-Based Action Recognition
Yanshan Li, Ke Ma, Miaomiao Wei, Linhui Dai
TL;DR
DoGCLR is a Dominance-Game framework for self-supervised skeleton-based action recognition that models positive and negative sample construction as a joint game, balancing semantic preservation against discriminative power. For positive samples, it couples a Spatio-temporal Dual-Weight Localization mechanism (DW-KRM) with Dual-scale Game-based Augmentation (DGA); for negative samples, it uses an Entropy-driven Dominance Game Replacement Queue (EDGRQ). Together these provide region-aware augmentation and entropy-based memory management. The approach achieves state-of-the-art or competitive results on the NTU RGB+D 60/120 and PKU-MMD benchmarks, with improved motion-region modeling, greater hard-negative diversity, and robust generalization across views and setups, yielding stronger representations for downstream recognition tasks.
Abstract
Existing self-supervised contrastive learning methods for skeleton-based action recognition often process all skeleton regions uniformly and store negative samples in a first-in-first-out (FIFO) queue, which leads to loss of motion information and suboptimal negative sample selection. To address these challenges, this paper proposes the Dominance-Game Contrastive Learning network for skeleton-based action Recognition (DoGCLR), a self-supervised framework based on game theory. DoGCLR models the construction of positive and negative samples as a dynamic Dominance Game, in which both sample types interact to reach an equilibrium that balances semantic preservation and discriminative strength. Specifically, a spatio-temporal dual-weight localization mechanism identifies key motion regions and guides region-wise augmentations to enhance motion diversity while maintaining semantics. In parallel, an entropy-driven dominance strategy manages the memory bank by retaining high-entropy (hard) negatives and replacing low-entropy (weak) ones, ensuring consistent exposure to informative contrastive signals. Extensive experiments are conducted on the NTU RGB+D and PKU-MMD datasets. On NTU RGB+D 60 X-Sub/X-View, DoGCLR achieves 81.1%/89.4% accuracy, and on NTU RGB+D 120 X-Sub/X-Set, it achieves 71.2%/75.5% accuracy, surpassing state-of-the-art methods by 0.1%, 2.7%, 1.1%, and 2.3%, respectively. On PKU-MMD Part I/Part II, DoGCLR performs comparably to state-of-the-art methods and achieves 1.9% higher accuracy on Part II, highlighting its robustness in more challenging scenarios.
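To make the contrast with a FIFO queue concrete, the following is a minimal sketch of an entropy-based replacement policy for a fixed-size negative store. It is not the paper's EDGRQ implementation: the abstract does not specify how entropy is computed or how replacement is scheduled, so the class name `EntropyReplacementQueue`, the `push` method, and the choice to score each negative with a caller-supplied Shannon entropy value are all assumptions made for illustration.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class EntropyReplacementQueue:
    """Hypothetical fixed-capacity negative store.

    Unlike a FIFO queue, which evicts the oldest entry, this store evicts
    the entry with the lowest entropy score, and only when the incoming
    sample scores higher. How the score is computed is left to the caller
    (an assumption; the abstract gives no formula).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []  # list of (entropy, sample) pairs

    def push(self, sample, entropy):
        """Insert a sample; return whichever sample was discarded, or None."""
        if len(self.items) < self.capacity:
            self.items.append((entropy, sample))
            return None
        # Locate the weakest (lowest-entropy) stored negative.
        idx = min(range(len(self.items)), key=lambda i: self.items[i][0])
        weakest_entropy, weakest_sample = self.items[idx]
        if entropy > weakest_entropy:
            self.items[idx] = (entropy, sample)
            return weakest_sample
        # The incoming sample is weaker than everything stored: reject it.
        return sample


# Usage: a store of two negatives, scored by illustrative entropy values.
queue = EntropyReplacementQueue(capacity=2)
queue.push("neg_a", entropy=0.9)
queue.push("neg_b", entropy=0.1)
evicted = queue.push("neg_c", entropy=0.5)  # replaces the weakest entry
```

In this sketch, age plays no role in eviction, which is the key difference from FIFO: a hard negative stays in the store for as long as it outscores incoming candidates. A practical version would likely also need to refresh stale entropy scores as the encoder changes, which is omitted here.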
