JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
Simindokht Jahangard, Zhixi Cai, Shiki Wen, Hamid Rezatofighi
TL;DR
JRDB-Social introduces a robot-centered dataset with three levels of annotation for studying human social behavior in varied contexts. It combines annotations of individual attributes, intra-group dynamics, and social-group context, together with text descriptions and a standardized annotation toolbox. The paper benchmarks vision-language models on these tasks. Current models perform well at predicting demographic attributes but struggle with higher-level social reasoning and group context. The dataset and evaluation framework provide a resource for developing socially aware robotic perception and interaction systems.
Abstract
Understanding human social behaviour is crucial in computer vision and robotics. Micro-level observations, such as individual actions, fall short. A thorough understanding requires a comprehensive approach that considers individual behaviour, intra-group dynamics, and the social-group level. To address the limitations of existing datasets, this paper introduces JRDB-Social, an extension of JRDB. JRDB-Social is designed to fill gaps in the understanding of humans across diverse indoor and outdoor social contexts. It provides annotations at three levels: individual attributes, intra-group interactions, and social group context. This dataset aims to enhance our grasp of human social dynamics for robotic applications. Using recent state-of-the-art multi-modal large language models, we evaluated our benchmark to explore their capacity to interpret human social behaviour.
