
Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition

Kun Li, Dan Guo, Guoliang Chen, Chunxiao Fan, Jingyuan Xu, Zhiliang Wu, Hehe Fan, Meng Wang

TL;DR

This work addresses the ambiguity inherent in micro-action recognition (MAR) by introducing the Prototypical Calibrating Ambiguous Network (PCAN). PCAN identifies ambiguous samples via a hierarchical body- and action-level action-tree, learns body- and action-level prototypes, and applies a hierarchical prototypical calibration along with diversity amplification to separate closely related micro-actions. A prototype-guided rectification module refines predictions by leveraging prototype similarities, and the model is trained with a composite loss that includes cross-entropy, hierarchical probability alignment, prototypical contrastive calibration, and prototype diversity terms. Experiments on the MA-52 dataset show that PCAN yields significant improvements over state-of-the-art methods, especially for hard ambiguous actions, and the ablations confirm the effectiveness of the hierarchical structure and calibration losses. The approach provides a principled, scalable way to mitigate action ambiguity in fine-grained, holistic MAR tasks with potential applicability to broader recognition problems with similar ambiguity structures.
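The composite objective described above can be illustrated with a small, self-contained sketch. Everything here is an assumption for illustration, not the released PCAN code. The two-level tree is a toy built from the Figure 1 example, and its "head" branch is hypothetical. Hierarchical probability alignment is modeled as a KL divergence between the body-level distribution obtained by summing action-level probabilities over each body node's children and the body-level head's own prediction. The weights in `lambdas` are placeholders, and the calibration (`l_pcc`) and diversity (`l_pd`) terms are passed in as precomputed numbers.

```python
import math

# Hypothetical two-level action tree (body level -> action level).
# "body-hand" and its children come from the Figure 1 example; "head" is illustrative.
ACTION_TREE = {
    "body-hand": ["touching shoulder", "touching neck"],
    "head": ["nodding", "shaking head"],
}
BODIES = list(ACTION_TREE)
ACTIONS = [a for children in ACTION_TREE.values() for a in children]
PARENT = {a: b for b, children in ACTION_TREE.items() for a in children}


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target, eps=1e-12):
    return -math.log(probs[target] + eps)


def aggregate_to_body(action_probs):
    """Sum action-level probabilities over the children of each body-level node."""
    body = [0.0] * len(BODIES)
    for action, p in zip(ACTIONS, action_probs):
        body[BODIES.index(PARENT[action])] += p
    return body


def kl_divergence(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def composite_loss(action_logits, body_logits, action_target,
                   l_pcc, l_pd, lambdas=(1.0, 1.0, 1.0)):
    """Assumed form: body/action cross-entropy plus weighted HPA, PCC and PD terms."""
    p_action = softmax(action_logits)
    p_body = softmax(body_logits)
    body_target = BODIES.index(PARENT[ACTIONS[action_target]])
    l_ce = cross_entropy(p_action, action_target) + cross_entropy(p_body, body_target)
    # Hierarchical probability alignment (assumed KL form): the body-level head should
    # agree with the body distribution implied by the action-level head.
    l_hpa = kl_divergence(aggregate_to_body(p_action), p_body)
    w_hpa, w_pcc, w_pd = lambdas
    return l_ce + w_hpa * l_hpa + w_pcc * l_pcc + w_pd * l_pd
```

Summing child probabilities is what ties the two levels together in this sketch: an action-level prediction split between "touching shoulder" and "touching neck" still yields a confident "body-hand" prediction at the body level.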

Abstract

Micro-Action Recognition (MAR) has gained increasing attention because micro-actions serve as a crucial form of non-verbal communication in social interactions, with promising applications in human communication and emotion analysis. However, current approaches often overlook the inherent ambiguity of micro-actions, which arises from the wide range of categories and the subtle visual differences between them. This oversight limits recognition accuracy. In this paper, we propose a novel Prototypical Calibrating Ambiguous Network (PCAN) to expose and mitigate the ambiguity in MAR. First, we employ a hierarchical action-tree to identify ambiguous samples and categorize them into distinct sets of false negatives and false positives, considering both body- and action-level categories. Second, we implement an ambiguous contrastive refinement module that calibrates these ambiguous samples by regulating the distance between each ambiguous sample and its corresponding prototype. This calibration pulls false negative (FN) samples closer to their respective prototypes and pushes false positive (FP) samples away from their affiliated prototypes. In addition, we propose a new prototypical diversity amplification loss that strengthens the model's discriminative capacity by enlarging the differences between prototypes. Finally, we propose a prototype-guided rectification that refines predictions by exploiting the representational power of the prototypes. Extensive experiments on the MA-52 benchmark dataset demonstrate that our method outperforms existing approaches. The code is available at https://github.com/kunli-cs/PCAN.
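The identification and calibration steps in the abstract can be sketched as follows, under assumptions that may differ from the paper's exact formulation. A misclassified sample is treated as a false negative for its true class and a false positive for its predicted class. Calibration is a simple cosine-based pull/push (a hinge with a hypothetical `margin`) standing in for the paper's contrastive loss, and diversity amplification is approximated by penalizing the mean pairwise cosine similarity between prototypes. All function names are hypothetical.

```python
import math


def cosine(u, v, eps=1e-12):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def identify_ambiguous(preds, labels):
    """Split misclassified samples into per-class FN and FP index lists.

    Assumed rule: when prediction != label, the sample is a false negative (FN)
    for its true class and a false positive (FP) for its predicted class.
    """
    fn, fp = {}, {}
    for i, (pred, label) in enumerate(zip(preds, labels)):
        if pred != label:
            fn.setdefault(label, []).append(i)
            fp.setdefault(pred, []).append(i)
    return fn, fp


def calibration_loss(feats, prototypes, fn, fp, margin=0.0):
    """Pull FN samples toward their class prototype; push FP samples away from it."""
    pull = [1.0 - cosine(feats[i], prototypes[k])
            for k, idx in fn.items() for i in idx]
    push = [max(0.0, cosine(feats[i], prototypes[k]) - margin)
            for k, idx in fp.items() for i in idx]
    terms = pull + push
    return sum(terms) / len(terms) if terms else 0.0


def diversity_loss(prototypes):
    """Mean pairwise cosine similarity between distinct prototypes (lower = more diverse)."""
    n = len(prototypes)
    sims = [cosine(prototypes[i], prototypes[j])
            for i in range(n) for j in range(i + 1, n)]
    return sum(sims) / len(sims) if sims else 0.0
```

Minimizing `diversity_loss` drives prototypes toward orthogonality, which is one plausible way to amplify the differences between prototypes; the paper's actual loss may be formulated differently.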

Paper Structure

This paper contains 23 sections, 13 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: TOP: Micro-Action Recognition (MAR) aims to recognize micro-actions at both the body level and the action level, which is particularly difficult for ambiguous samples. For example, "touching shoulder" and "touching neck" both belong to the body-level category "body-hand" but exhibit only subtle visual differences. BOTTOM: Our approach is motivated by the need to address these ambiguities in MAR. We first identify ambiguous samples (marked with ✗) that are prone to misclassification. We then construct a prototype for each category at the body and action levels and align the ambiguous samples with their corresponding prototypes in the feature space. Zoom in for details.
  • Figure 2: Overview of the Prototypical Calibrating Ambiguous Network (PCAN). In Ambiguous Samples Identification, we discover ambiguous samples ($\mathbb{FN}$ and $\mathbb{FP}$) from the preliminary prediction scores. In Ambiguous Samples Contrastive Calibration, we use contrastive prototype calibration and prototype diversity amplification losses to calibrate the prototypes and eliminate the influence of ambiguous samples. In Prototype-guided Rectification, we incorporate the similarity between the learned prototypes and the video embedding into the action-level category prediction.
  • Figure 3: Top-1 Accuracy (%) on ambiguous micro-actions for the MA-52 dataset.
  • Figure 4: Comparative analysis of PoseConv3D (duan2022revisiting) and our PCAN. Categories highlighted in green are the micro-action categories under analysis. PCAN classifies ambiguous action categories well and performs robustly at both the body level and the action level.
  • Figure 5: The group-wise Top-1 accuracy improvement (%) of our method compared to PoseConv3D on the MA-52 dataset.
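The prototype-guided rectification described in the Figure 2 caption can be sketched as blending the classifier's action-level probabilities with a softmax over cosine similarities between the video embedding and each action prototype. This is an illustrative assumption rather than the released implementation: prototypes are estimated here as per-class mean embeddings, and the mixing weight `alpha` and temperature `tau` are hypothetical hyperparameters.

```python
import math


def cosine(u, v, eps=1e-12):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def class_mean_prototypes(feats, labels, num_classes):
    """Assumed prototype estimate: the mean embedding of each action class."""
    dim = len(feats[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, label in zip(feats, labels):
        counts[label] += 1
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
    # Classes with no samples keep a zero vector instead of dividing by zero.
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def rectify(logits, embedding, prototypes, alpha=0.5, tau=0.1):
    """Blend classifier probabilities with prototype-similarity probabilities."""
    p_cls = softmax(logits)
    p_proto = softmax([cosine(embedding, p) / tau for p in prototypes])
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(p_cls, p_proto)]
```

With `alpha=0` the function reduces to the plain classifier softmax; increasing `alpha` lets an embedding that lies close to a prototype override a weakly confident classifier.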