Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition
Kun Li, Dan Guo, Guoliang Chen, Chunxiao Fan, Jingyuan Xu, Zhiliang Wu, Hehe Fan, Meng Wang
TL;DR
This work addresses the ambiguity inherent in micro-action recognition (MAR) by introducing the Prototypical Calibrating Ambiguous Network (PCAN). PCAN identifies ambiguous samples with a hierarchical action-tree spanning body- and action-level categories, learns prototypes at both levels, and applies hierarchical prototypical calibration together with diversity amplification to separate closely related micro-actions. A prototype-guided rectification module then refines predictions using prototype similarities. The model is trained with a composite loss that combines cross-entropy, hierarchical probability alignment, prototypical contrastive calibration, and prototype diversity terms. Experiments on the MA-52 dataset show that PCAN improves over state-of-the-art methods, especially on hard ambiguous actions, and the ablations support the contribution of the hierarchical structure and the calibration losses. The approach may also apply to other fine-grained recognition problems with similar ambiguity structures.
Abstract
Micro-Action Recognition (MAR) has gained increasing attention due to its crucial role as a form of non-verbal communication in social interactions, with promising applications in human communication and emotion analysis. However, current approaches often overlook the inherent ambiguity in micro-actions, which arises from the wide range of categories and the subtle visual differences between them. This oversight hampers recognition accuracy. In this paper, we propose a novel Prototypical Calibrating Ambiguous Network (PCAN) to expose and mitigate the ambiguity in MAR. First, we employ a hierarchical action-tree to identify ambiguous samples and divide them into sets of false negatives and false positives, considering both body- and action-level categories. Second, we implement an ambiguous contrastive refinement module that calibrates these samples by regulating the distance between each ambiguous sample and its corresponding prototype. This calibration pulls false negative (FN) samples closer to their respective prototypes and pushes false positive (FP) samples away from the prototypes they were wrongly assigned to. In addition, we propose a new prototypical diversity amplification loss that strengthens the model's capacity by amplifying the differences between prototypes. Finally, we propose a prototype-guided rectification that corrects predictions by incorporating the representational ability of the prototypes. Extensive experiments on the benchmark dataset demonstrate the superior performance of our method compared to existing approaches. The code is available at https://github.com/kunli-cs/PCAN.
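The pull/push calibration and the diversity term described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the Euclidean distance, the hinge with a `margin` parameter, and the cosine-similarity diversity measure are all assumptions chosen for illustration, standing in for the contrastive formulation in the paper. It only shows the idea for a single class at a single level of the action-tree.

```python
import numpy as np

def ambiguous_sets(pred, label, cls):
    """Boolean masks of false negatives and false positives for class `cls`."""
    fn = (label == cls) & (pred != cls)  # belongs to cls, predicted as another class
    fp = (label != cls) & (pred == cls)  # predicted as cls, belongs to another class
    return fn, fp

def calibration_loss(feats, prototype, fn, fp, margin=1.0):
    """Toy pull/push objective (assumed hinge form, not the paper's loss).

    FN samples are penalised by their distance to the prototype (pull);
    FP samples are penalised when closer than `margin` to it (push).
    """
    dist = np.linalg.norm(feats - prototype, axis=1)
    pull = dist[fn].sum()
    push = np.maximum(0.0, margin - dist[fp]).sum()
    n = max(int(fn.sum() + fp.sum()), 1)  # avoid division by zero
    return (pull + push) / n

def diversity_loss(prototypes):
    """Mean pairwise cosine similarity between distinct prototypes.

    Minimising this value spreads the prototypes apart (an assumed
    stand-in for the diversity amplification term).
    """
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = p @ p.T
    off_diag = sim[~np.eye(len(p), dtype=bool)]
    return float(off_diag.mean())

# Hypothetical toy data: four samples, 2-D features, class 0 prototype at the origin.
label = np.array([0, 0, 1, 1])
pred = np.array([0, 1, 0, 1])
feats = np.array([[0.0, 0.0], [2.0, 0.0], [0.5, 0.0], [3.0, 3.0]])
prototype = np.array([0.0, 0.0])

fn_mask, fp_mask = ambiguous_sets(pred, label, cls=0)
loss = calibration_loss(feats, prototype, fn_mask, fp_mask)
```

In this toy setup the second sample is a false negative for class 0 and the third is a false positive, so the loss shrinks as the former moves toward the prototype and the latter moves beyond the margin.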
