THÖR-MAGNI Act: Actions for Human Motion Modeling in Robot-Shared Industrial Spaces
Tiago Rodrigues de Almeida, Tim Schreiter, Andrey Rudenko, Luigi Palmieri, Johannes A. Stork, Achim J. Lilienthal
TL;DR
This paper addresses safe human-robot interaction in industrial spaces by introducing THÖR-MAGNI Act, an extension of the THÖR-MAGNI dataset with fine-grained action annotations. It provides 8.3 hours of actions labeled from egocentric video and aligned with motion cues and gaze across five industrial scenarios, with 14 action labels organized under diverse agent classes. It proposes two transformer-based models, one for action-conditioned trajectory prediction and one for multi-task joint trajectory and action prediction, both of which leverage action and agent-class information. Experiments show that incorporating action cues improves trajectory prediction and that the joint model can achieve strong trajectory accuracy with competitive action prediction, underscoring the dataset's utility for advancing predictive human motion modeling in robot-shared environments.
Abstract
Accurate prediction of human activity and trajectories is crucial for safe and reliable human-robot interaction in dynamic environments, such as industrial settings with mobile robots. Datasets with fine-grained action labels for people moving in industrial environments alongside mobile robots are scarce, as most existing datasets focus on social navigation in public spaces. This paper introduces the THÖR-MAGNI Act dataset, a substantial extension of the THÖR-MAGNI dataset, which captures participant movements alongside robots in diverse semantic and spatial contexts. THÖR-MAGNI Act provides 8.3 hours of manually labeled participant actions derived from egocentric videos recorded via eye-tracking glasses. These actions, aligned with the provided THÖR-MAGNI motion cues, follow a long-tailed distribution with diverse acceleration, velocity, and navigation distance profiles. We demonstrate the utility of THÖR-MAGNI Act for two tasks: action-conditioned trajectory prediction and joint action and trajectory prediction. To address these tasks, we propose two efficient transformer-based models that outperform the baselines. These results underscore the potential of THÖR-MAGNI Act for developing predictive models that enhance human-robot interaction in complex environments.
