Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding

Rong Gao, Xin Liu, Bohao Xing, Zitong Yu, Bjorn W. Schuller, Heikki Kälviäinen

TL;DR

This work presents identity-free micro-gestures (MGs) as privacy-preserving cues for holistic emotion understanding and proposes a Spatial-Temporal-Balanced Dual-stream Contrastive Learning framework to recognize MGs and infer emotions. It describes the iMiGUE dataset with its MG and emotion annotations, MG-specific augmentation techniques, and an adaptive graph learning approach that fuses spatial and temporal information. The authors report state-of-the-art MG recognition among unsupervised methods and results competitive with supervised baselines on large-scale datasets. They also evaluate MGs as auxiliary information for large language models and find that emotion inference improves when MG cues are included. The study offers a practical pathway toward MG-informed Emotion AI and highlights the potential of combining MG analytics with LLM-based reasoning for richer social understanding, including deception detection and interview scenarios.

Abstract

In this work, we focus on a special group of human body language, the micro-gesture (MG). Micro-gestures differ from ordinary illustrative gestures in that they are not intentional behaviors performed to convey information to others, but unintentional behaviors driven by inner feelings. This characteristic raises two questions about micro-gestures that are worth rethinking. The first is whether strategies designed for general action recognition are entirely applicable to micro-gestures. The second is whether micro-gestures, as supplementary data, can provide additional insights for emotional understanding. For micro-gesture recognition, we explore various augmentation strategies that take into account the subtle spatial and brief temporal characteristics of micro-gestures, which are often accompanied by repetitiveness, to determine which augmentation methods are more suitable. Considering the significance of temporal information for micro-gestures, we introduce a simple and efficient plug-and-play spatiotemporal balancing fusion method. We study our method not only on the considered micro-gesture dataset but also on mainstream action datasets. The results show that our approach performs well in micro-gesture recognition and on the other datasets, achieving state-of-the-art performance compared to previous micro-gesture recognition methods. For emotional understanding based on micro-gestures, we construct complex emotional reasoning scenarios. Our evaluation, conducted with large language models, shows that micro-gestures play a significant and positive role in enhancing comprehensive emotional understanding. The scenarios we developed can be extended to other micro-gesture-based tasks such as deception detection and interviews. We believe that these insights contribute to advancing research on micro-gestures and emotional artificial intelligence.
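
To make the recognition pipeline in the abstract more concrete, the following is a minimal sketch of how two augmented views of a skeleton sequence could be produced and compared with a contrastive loss. It is not the authors' implementation. The specific augmentations (a temporal crop-and-resample and a small rotation), the standard InfoNCE loss, the function names, and the `(frames, joints, 3)` array layout are all assumptions chosen for illustration.

```python
import numpy as np

def temporal_crop_resize(seq, ratio, rng):
    """Crop a contiguous window covering `ratio` of the frames and
    linearly resample it back to the original length.
    seq has shape (T, J, 3): frames, joints, xyz (assumed layout)."""
    T = seq.shape[0]
    win = max(2, int(round(T * ratio)))
    start = rng.integers(0, T - win + 1)      # upper bound is exclusive
    clip = seq[start:start + win]
    src = np.linspace(0, win - 1, T)          # fractional source indices
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, win - 1)
    w = (src - lo)[:, None, None]
    return (1 - w) * clip[lo] + w * clip[hi]

def small_rotation(seq, max_deg, rng):
    """Rotate every joint about the z axis by a small random angle,
    so that subtle motion is perturbed only slightly."""
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return seq @ R.T

def info_nce(z1, z2, temperature=0.1):
    """Standard InfoNCE loss: row i of z1 and row i of z2 form a
    positive pair, all other rows in the batch act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.normal(size=(32, 22, 3))        # hypothetical 32-frame, 22-joint clip
    view_a = temporal_crop_resize(seq, 0.5, rng)
    view_b = small_rotation(seq, 10.0, rng)
    print(view_a.shape, view_b.shape)
```

In a real system the two views would pass through an encoder before the loss is computed; the sketch leaves the encoder out and only shows the data flow between augmentation and the contrastive objective.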

Paper Structure (31 sections, 17 equations, 14 figures, 8 tables)

Figures (14)

  • Figure 1: Frames from post-match press conference videos exhibit various identity-free micro-gestures (MGs). The objective of this study is to empower machines to recognize these MGs, comprehend the players' emotional states holistically, and subsequently determine whether the player has won or lost the match (positive or negative emotional states).
  • Figure 2: Previous methods mostly focused on spatial feature mining, and the augmentation techniques used were not designed with the characteristics of micro-gestures in mind. Therefore, it is worth carefully considering how to balance temporal and spatial information and exploring augmentation methods suitable for micro-gestures. Our previous research [Liu_2021_CVPR] has demonstrated to some extent the effectiveness of using only micro-gestures for emotional understanding. Now, let us consider more realistic scenarios: can micro-gestures, in an auxiliary role, provide additional effective information?
  • Figure 3: Category distribution and examples of the iMiGUE dataset: categories of MGs in the iMiGUE dataset, which are based on psychological studies [ekman2009telling, pease2008definitive, navarro2016every]; sample percentages of each category in the iMiGUE dataset; examples (face masked) of MG categories in the iMiGUE dataset.
  • Figure 4: Micro-gesture skeleton data augmentation methods.
  • Figure 5: Example of the repetitiveness of micro-gestures.
  • ...and 9 more figures