MBE-ARI: A Multimodal Dataset Mapping Bi-directional Engagement in Animal-Robot Interaction
Ian Noronha, Advait Prasad Jawaji, Juan Camilo Soto, Jiajun An, Yan Gu, Upinder Kaur
TL;DR
The paper addresses the lack of resources for animal-robot interaction (ARI) by introducing MBE-ARI, a multimodal dataset of cow-robot engagements with synchronized RGB-D data and pose annotations, together with a full-body pose estimator for quadrupeds that reaches a mean average precision (mAP) of 92.7%. It describes a controlled experiment design, the data acquisition setup, and the annotation of 39 animal and 12 robot keypoints across defined interaction phases. The pose-estimation framework combines HRNet with Faster R-CNN, using multi-resolution features and ROI pooling for keypoint localization, and is validated against strong baselines. Together, the dataset and model provide a foundation for perception, reasoning, and autonomous control in ARI, with practical implications for animal welfare and environmental conservation, and point to future work in dynamic path planning and real-time adaptation.
Abstract
Animal-robot interaction (ARI) remains an unexplored challenge in robotics, as robots struggle to interpret the complex, multimodal communication cues of animals, such as body language, movement, and vocalizations. Unlike human-robot interaction, which benefits from established datasets and frameworks, animal-robot interaction lacks the foundational resources needed to facilitate meaningful bidirectional communication. To bridge this gap, we present MBE-ARI (Multimodal Bidirectional Engagement in Animal-Robot Interaction), a novel multimodal dataset that captures detailed interactions between a legged robot and cows. The dataset includes synchronized RGB-D streams from multiple viewpoints, annotated with body pose and activity labels across interaction phases, offering an unprecedented level of detail for ARI research. Additionally, we introduce a full-body pose estimation model tailored for quadruped animals, capable of tracking 39 keypoints with a mean average precision (mAP) of 92.7%, outperforming existing baselines in animal pose estimation. The MBE-ARI dataset and our pose estimation framework lay a foundation for advancing research in animal-robot interaction, providing tools for developing the perception, reasoning, and interaction frameworks needed for effective collaboration between robots and animals. The dataset and resources are publicly available at https://github.com/RISELabPurdue/MBE-ARI/.
