QUB-PHEO: A Visual-Based Dyadic Multi-View Dataset for Intention Inference in Collaborative Assembly
Samuel Adebayo, Seán McLoone, Joost C. Dessing
TL;DR
QUB-PHEO addresses the need for rich, multi-view data for inferring human intentions in collaborative assembly by introducing a five-camera dyadic dataset in which a human acts as a robot surrogate. The dataset comprises 70 participants (50 with full video data) and 36 subtasks, with dense visual annotations including facial landmarks, gaze, hand movements, and object bounding boxes, enabling fine-grained intention inference. The authors describe an end-to-end pipeline (calibration with ChArUco boards, gaze mapping inspired by GazeScape, Label Studio-based annotation, and a YOLOv8-based object detector) that yields multi-view data totalling 4.5 million frames and 36 hours of video. They also provide a formal framework for subtask classification and next-subtask inference, and outline pathways to broader computer vision (CV) and human-robot interaction (HRI) applications. The dataset is released under an EULA to foster community contributions and real-world impact.
Abstract
QUB-PHEO introduces a visual-based, dyadic dataset with the potential to advance human-robot interaction (HRI) research in assembly operations and intention inference. The dataset captures rich multimodal interactions between two participants, one acting as a 'robot surrogate', across a variety of assembly tasks that are further broken down into 36 distinct subtasks. It provides rich visual annotations, including facial landmarks, gaze, hand movements, and object localization, for 70 participants, and is offered in two versions: full video data for 50 participants and visual cues for all 70. Designed to improve machine learning models for HRI, QUB-PHEO enables deeper analysis of subtle interaction cues and intentions. The dataset will be available at https://github.com/exponentialR/QUB-PHEO subject to an End-User License Agreement (EULA).
