Nonverbal Interaction Detection
Jianan Wei, Tianfei Zhou, Yi Yang, Wenguan Wang
TL;DR
Nonverbal interaction understanding is framed as a unified problem that integrates multiple social signals rather than treating them in isolation. The authors introduce the NVI dataset and the NVI-DET task, formalized as detecting triplets $\langle\text{individual},\text{group},\text{interaction}\rangle$, and propose NVI-DEHR, a model built around a dual multi-scale hypergraph that captures high-order individual-to-individual and group-to-group relations. NVI-DEHR significantly improves various baselines on NVI-DET and shows leading performance on HOI-DET, indicating that it generalizes to related tasks. Overall, this work establishes a foundation for holistic social-signal analysis and points to future directions such as temporal dynamics and proximal cues in real-world settings.
Abstract
This work addresses a new challenge: understanding human nonverbal interaction in social contexts. Nonverbal signals pervade virtually every communicative act. Our gestures, facial expressions, postures, gaze, and even physical appearance all convey messages without anything being said. Despite their critical role in social life, nonverbal signals receive far less attention than their linguistic counterparts, and existing solutions typically examine nonverbal cues in isolation. Our study marks the first systematic effort to enhance the interpretation of multifaceted nonverbal signals. First, we contribute a novel large-scale dataset, called NVI, which is meticulously annotated with bounding boxes for humans and their corresponding social groups, along with 22 atomic-level nonverbal behaviors under five broad interaction types. Second, we establish a new task, NVI-DET, for nonverbal interaction detection, which is formalized as identifying triplets of the form $\langle\text{individual},\text{group},\text{interaction}\rangle$ in images. Third, we propose a nonverbal interaction detection hypergraph (NVI-DEHR), a new approach that explicitly models high-order nonverbal interactions using hypergraphs. Central to the model is a dual multi-scale hypergraph that addresses individual-to-individual and group-to-group correlations across varying scales, facilitating interactional feature learning and ultimately improving interaction prediction. Extensive experiments on NVI show that NVI-DEHR significantly improves various baselines on NVI-DET. It also exhibits leading performance on HOI-DET, confirming its versatility in supporting related tasks and its strong generalization ability. We hope that our study will offer the community new avenues to explore nonverbal signals in more depth.
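To make the two central ideas concrete, the following is a minimal sketch, not the authors' implementation. It shows one way to represent an NVI-DET triplet and how a hypergraph can link more than two people at once through an incidence matrix. The class name `NVITriplet`, the `(x1, y1, x2, y2)` box format, the label string `"gaze"`, and the mean-pooling step are all assumptions chosen for illustration; the abstract does not specify how NVI-DEHR builds hyperedges or aggregates features.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

# Assumed box format (x1, y1, x2, y2); the abstract does not fix a convention.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class NVITriplet:
    """Hypothetical container for one <individual, group, interaction> triplet."""
    individual: Box   # box of one person
    group: Box        # box of the social group that person belongs to
    interaction: str  # nonverbal behavior label (placeholder naming)


def incidence_matrix(num_nodes: int, hyperedges: Sequence[Set[int]]) -> List[List[int]]:
    """Build H with H[v][e] = 1 if node v belongs to hyperedge e, else 0."""
    H = [[0] * len(hyperedges) for _ in range(num_nodes)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    return H


def hyperedge_mean(features: List[List[float]], H: List[List[int]]) -> List[List[float]]:
    """Average node features inside each hyperedge (generic node-to-hyperedge step).

    Assumes every hyperedge has at least one member.
    """
    num_nodes, num_edges, dim = len(H), len(H[0]), len(features[0])
    pooled = []
    for e in range(num_edges):
        members = [v for v in range(num_nodes) if H[v][e]]
        pooled.append(
            [sum(features[v][d] for v in members) / len(members) for d in range(dim)]
        )
    return pooled


# Toy scene: three people; one hyperedge per "scale" (a pair and the full trio).
hyperedges = [{0, 1}, {0, 1, 2}]
H = incidence_matrix(3, hyperedges)
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
pooled = hyperedge_mean(features, H)

triplet = NVITriplet(
    individual=(10.0, 20.0, 50.0, 120.0),
    group=(5.0, 15.0, 140.0, 125.0),
    interaction="gaze",
)
```

Unlike an ordinary graph edge, each column of `H` can connect any number of people, which is what allows a hypergraph to express relations within a whole group rather than only between pairs. Using hyperedges of different sizes is one simple reading of "multi-scale"; the paper's actual construction may differ.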
