THÖR-MAGNI Act: Actions for Human Motion Modeling in Robot-Shared Industrial Spaces

Tiago Rodrigues de Almeida, Tim Schreiter, Andrey Rudenko, Luigi Palmieri, Johannes A. Stork, Achim J. Lilienthal

TL;DR

The paper tackles safe human-robot interaction in industrial spaces by introducing THÖR-MAGNI Act, an extension of THÖR-MAGNI with fine-grained action annotations. It provides 8.3 hours of actions labeled from egocentric video, aligned with motion cues and gaze, across five industrial scenarios, with 14 action labels organized under diverse agent classes. It proposes two transformer-based models, one for action-conditioned trajectory prediction and one for multi-task joint trajectory-action prediction, both of which leverage action and agent-class information. Experiments demonstrate that incorporating action cues improves trajectory predictions and that the joint model can achieve strong trajectory accuracy with competitive action prediction, underscoring the dataset's utility for advancing predictive human motion modeling in robot-shared environments.
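Both models are said to leverage action and agent-class information. One common way to feed such labels to a transformer is to append them to every observed step as one-hot vectors. The sketch below assumes that encoding; the label lists, the `build_tokens` helper, and the displacement features are hypothetical illustrations, not the paper's actual input design.

```python
from typing import List, Sequence, Tuple

# Hypothetical, illustrative vocabularies only: the dataset defines 14 action
# labels and its own agent classes, which are not reproduced here.
ACTIONS = ["walk", "carry", "pick", "place"]
AGENT_CLASSES = ["Carrier-Box", "other"]


def one_hot(index: int, size: int) -> List[float]:
    """Return a length-`size` vector with 1.0 at `index` and 0.0 elsewhere."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def build_tokens(
    track: Sequence[Tuple[float, float]],
    actions: Sequence[str],
    agent_class: str,
) -> List[List[float]]:
    """Return one feature vector per observed displacement:
    [dx, dy] + one-hot action at the step's end + one-hot agent class."""
    if len(track) != len(actions):
        raise ValueError("need exactly one action label per position")
    agent_vec = one_hot(AGENT_CLASSES.index(agent_class), len(AGENT_CLASSES))
    tokens = []
    for t in range(1, len(track)):
        dx = track[t][0] - track[t - 1][0]
        dy = track[t][1] - track[t - 1][1]
        action_vec = one_hot(ACTIONS.index(actions[t]), len(ACTIONS))
        tokens.append([dx, dy] + action_vec + agent_vec)
    return tokens


tokens = build_tokens(
    [(0.0, 0.0), (0.4, 0.0), (0.8, 0.1)], ["walk", "walk", "carry"], "Carrier-Box"
)
print(len(tokens), len(tokens[0]))  # 2 8
```

Treating the labels as extra input channels leaves the transformer backbone unchanged; a learned embedding per label would be an equally plausible alternative.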

Abstract

Accurate prediction of human activities and trajectories is crucial for safe and reliable human-robot interaction in dynamic environments, such as industrial settings shared with mobile robots. Datasets with fine-grained action labels for people moving in industrial environments with mobile robots are scarce, as most existing datasets focus on social navigation in public spaces. This paper introduces the THÖR-MAGNI Act dataset, a substantial extension of the THÖR-MAGNI dataset, which captures participant movements alongside robots in diverse semantic and spatial contexts. THÖR-MAGNI Act provides 8.3 hours of manually labeled participant actions derived from egocentric videos recorded with eye-tracking glasses. These actions, aligned with the THÖR-MAGNI motion cues, follow a long-tailed distribution with diverse acceleration, velocity, and navigation distance profiles. We demonstrate the utility of THÖR-MAGNI Act for two tasks: action-conditioned trajectory prediction and joint action and trajectory prediction. To address these tasks, we propose two efficient transformer-based models that outperform the baselines. These results underscore the potential of THÖR-MAGNI Act for developing predictive models that enhance human-robot interaction in complex environments.
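Joint action and trajectory prediction is typically trained with a multi-task objective that adds a trajectory regression term to an action classification term. The sketch below is a minimal illustration, assuming an average-displacement trajectory loss, a single cross-entropy action term, and a hypothetical scalar `weight`; the paper's exact losses and weighting are not given in this summary.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def trajectory_loss(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean Euclidean distance between predicted and ground-truth positions."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return sum(math.dist(p, q) for p, q in zip(pred, target)) / len(pred)


def action_loss(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of one action label under softmax(logits),
    using the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def multitask_loss(
    pred: Sequence[Point],
    target: Sequence[Point],
    logits: Sequence[float],
    label: int,
    weight: float = 1.0,  # hypothetical balance between the two tasks
) -> float:
    """Trajectory term plus weighted action-classification term."""
    return trajectory_loss(pred, target) + weight * action_loss(logits, label)
```

Setting `weight=0` reduces the objective to trajectory prediction alone, which is one simple way to compare the joint setting against a trajectory-only baseline.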

Paper Structure

This paper contains 10 sections, 2 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Action annotations for a 4-minute recording of a person carrying storage bins while interacting with a mobile robot, synchronized with the motion capture data. Inset images display snapshots from gaze-overlaid videos, featuring visualizations of the head orientation vector (red) and the gaze vector (green). The length of the arrows on the map denotes the velocity magnitude.
  • Figure 2: Top: Mapping between agent classes and actions. Grey boxes denote actions; colored boxes represent the agent classes. Bottom: Distribution of action classes on a log scale, sorted in descending order, with colors indicating agent classes.
  • Figure 3: Top: 2D acceleration (mean $\pm$ one standard deviation), where values near zero indicate constant velocity. Middle: 2D velocity (mean $\pm$ one standard deviation), where values near zero correspond to static actions. Bottom: navigation distance (mean $\pm$ one standard deviation), where values near zero indicate static actions and higher values reflect walking actions.
  • Figure 4: Action-conditioned models and multi-task learning methods (additional yellow branch). Dashed arrows indicate methods that use the agent class, while dotted arrows represent baseline models where $\mathbf{S}$ excludes actions in the trajectory prediction task.
  • Figure 5: Prediction examples for a Carrier–Box agent in Scenario 3 with a 4.8 s prediction horizon: our multi-task learning framework for joint trajectory and action prediction ("MTL-OURS", left) and our action-conditioned trajectory prediction model ("ACT-OURS", right).
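Figure 3 summarizes each action by the mean and standard deviation of 2D acceleration, 2D velocity, and navigation distance. The sketch below shows how such profiles could be computed from action-labeled position segments. It assumes a fixed sampling interval `dt`, finite differences for velocity and acceleration, magnitudes of the 2D vectors, population standard deviation, and navigation distance as the path length of each segment; the helper names are hypothetical.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def segment_profile(points: Sequence[Point], dt: float) -> Tuple[List[float], List[float], float]:
    """Finite-difference speeds, acceleration magnitudes and path length
    for one action segment sampled every `dt` seconds."""
    steps = list(zip(points, points[1:]))
    vels = [((b[0] - a[0]) / dt, (b[1] - a[1]) / dt) for a, b in steps]
    speeds = [math.hypot(vx, vy) for vx, vy in vels]
    accels = [
        math.hypot((w[0] - v[0]) / dt, (w[1] - v[1]) / dt)
        for v, w in zip(vels, vels[1:])
    ]
    distance = sum(math.dist(a, b) for a, b in steps)
    return speeds, accels, distance


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation (NaN if there are no samples)."""
    if not values:
        return math.nan, math.nan
    return statistics.fmean(values), statistics.pstdev(values)


def action_profiles(
    segments: Sequence[Tuple[str, Sequence[Point]]], dt: float
) -> Dict[str, Dict[str, Tuple[float, float]]]:
    """Pool samples over all segments of each action and summarize them
    as (mean, std) for speed, acceleration and navigation distance."""
    pooled: Dict[str, Dict[str, List[float]]] = defaultdict(
        lambda: {"speed": [], "accel": [], "distance": []}
    )
    for action, points in segments:
        speeds, accels, distance = segment_profile(points, dt)
        pooled[action]["speed"].extend(speeds)
        pooled[action]["accel"].extend(accels)
        pooled[action]["distance"].append(distance)  # one value per segment
    return {
        action: {name: mean_std(vals) for name, vals in series.items()}
        for action, series in pooled.items()
    }
```

For a constant-speed walking segment this yields zero mean acceleration and a positive navigation distance, while a static segment yields near-zero velocity and distance, matching the interpretation given in the Figure 3 caption.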