FG-SGL: Fine-Grained Semantic Guidance Learning via Motion Process Decomposition for Micro-Gesture Recognition

Jinsheng Wei; Zhaodi Xu; Guanming Lu; Haoyu Chen; Jingjie Yan

Abstract

Micro-gesture recognition (MGR) is challenging because inter-class variations are subtle. Existing methods rely on category-level supervision, which is insufficient for capturing subtle, localized motion differences. This paper therefore proposes a Fine-Grained Semantic Guidance Learning (FG-SGL) framework that integrates fine-grained and category-level semantics to guide vision-language models in perceiving local micro-gesture (MG) motions. The framework comprises two modules: Fine-Grained Semantic Alignment (FG-SA) uses fine-grained semantic cues to guide the learning of local motion features, while Category Prototype Alignment (CP-A) enhances the separability of MG features through category-level semantic guidance. To support fine-grained semantic guidance, this work constructs a human-annotated fine-grained textual dataset that describes the dynamic process of MGs along four refined semantic dimensions. Furthermore, a Multi-Level Contrastive Optimization strategy jointly optimizes both modules in a coarse-to-fine manner. Experiments show that FG-SGL achieves competitive performance, validating the effectiveness of fine-grained semantic guidance for MGR.

Paper Structure (24 sections, 3 equations, 2 figures, 3 tables)

Introduction
Related work
Micro-gesture Datasets
Micro-gesture Recognition
Method
Overall Framework
Structured Semantic Prior: FG-Text
Fine-Grained Semantic Alignment (FG-SA)
Category Prototype Alignment (CP-A)
Multi-Level Contrastive Optimization
Experiments
Experimental Setup
Datasets and protocol
Video preprocessing
Backbone and model configuration
...and 9 more sections

Figures (2)

Figure 1: Comparison between coarse category-level supervision and fine-grained semantic supervision for MGR. While coarse labels provide only global guidance, fine-grained semantics decompose an MG into localized motion attributes, enabling more discriminative supervision.
Figure 2: Overall framework of FG-SGL. The framework introduces fine-grained instance-aware semantic alignment (FG-SA) and category-level prototype alignment (CP-A) on mid-level and high-level video representations, respectively, and jointly optimizes them under a unified learning objective.
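
The two-level objective described in the abstract and in Figure 2 can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the excerpt does not give the paper's loss formulas, so the sketch assumes a symmetric InfoNCE loss for the instance-level alignment (mid-level video features against fine-grained text embeddings) and a softmax cross-entropy over class text prototypes for the category-level alignment (high-level video features). All function names, the `temperature` value, and the `fine_weight` balancing factor are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def instance_alignment_loss(video_mid, text_fine, temperature=0.07):
    """Assumed FG-SA-style loss: symmetric InfoNCE over a batch of
    (video, fine-grained text) pairs, where row i of each array is a pair."""
    v = l2_normalize(video_mid)
    t = l2_normalize(text_fine)
    logits = v @ t.T / temperature          # (B, B) similarity matrix
    idx = np.arange(len(v))
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_v2t + loss_t2v)


def prototype_alignment_loss(video_high, class_prototypes, labels, temperature=0.07):
    """Assumed CP-A-style loss: cross-entropy between each video feature
    and the text prototype of its ground-truth category."""
    v = l2_normalize(video_high)
    p = l2_normalize(class_prototypes)
    logits = v @ p.T / temperature          # (B, C) video-to-prototype scores
    return -log_softmax(logits, axis=1)[np.arange(len(v)), labels].mean()


def multi_level_loss(video_mid, text_fine, video_high, class_prototypes,
                     labels, fine_weight=1.0):
    """Hypothetical joint objective: category-level term plus a weighted
    fine-grained term."""
    coarse = prototype_alignment_loss(video_high, class_prototypes, labels)
    fine = instance_alignment_loss(video_mid, text_fine)
    return coarse + fine_weight * fine
```

One caveat of this sketch: in-batch InfoNCE treats every other sample as a negative, so if the fine-grained descriptions are shared per category, same-class samples in a batch become false negatives and would need masking.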
