
THÖR-MAGNI: A Large-scale Indoor Motion Capture Recording of Human Movement and Robot Interaction

Tim Schreiter, Tiago Rodrigues de Almeida, Yufei Zhu, Eduardo Gutierrez Maestro, Lucas Morillo-Mendez, Andrey Rudenko, Luigi Palmieri, Tomasz P. Kucner, Martin Magnusson, Achim J. Lilienthal

TL;DR

THÖR-MAGNI addresses the lack of richly contextual indoor motion data for human motion analysis and human-robot interaction (HRI) by providing a large-scale, multi-modal dataset collected in varied indoor scenarios. The dataset combines ground-truth motion capture, eye tracking, LiDAR, and robot sensor data across 52 runs with 40 participants and several robot behaviors, enabling factorized studies of goal-directed human motion, social navigation, and HRI. Its contributions include diverse scenario design, explicit context cues, multi-robot interactions, and a companion toolbox (thor-magni-tools) with a visualization dashboard for preprocessing, analyzing, and visualizing the data. The dataset supports long-horizon trajectory prediction, social dynamics research, and proactive robot assistance studies, and could serve as a basis for benchmarks in multi-modal indoor trajectory modeling for real-world workplaces.

Abstract

We present a new large dataset of indoor human and robot navigation and interaction, called THÖR-MAGNI, that is designed to facilitate research on social navigation: e.g., modelling and predicting human motion, analyzing goal-oriented interactions between humans and robots, and investigating visual attention in a social interaction context. THÖR-MAGNI was created to fill a gap in available datasets for human motion analysis and HRI. This gap is characterized by a lack of comprehensive inclusion of exogenous factors and essential target agent cues, which hinders the development of robust models capable of capturing the relationship between contextual cues and human behavior in different scenarios. Unlike existing datasets, THÖR-MAGNI includes a broader set of contextual features and offers multiple scenario variations to facilitate factor isolation. The dataset includes many social human-human and human-robot interaction scenarios, rich context annotations, and multi-modal data, such as walking trajectories, gaze tracking data, and lidar and camera streams recorded from a mobile robot. We also provide a set of tools for visualization and processing of the recorded data. THÖR-MAGNI is, to the best of our knowledge, unique in the amount and diversity of sensor data collected in a contextualized and socially dynamic environment, capturing natural human-robot interactions.

Paper Structure (37 sections, 17 figures, 6 tables)

This paper contains 37 sections, 17 figures, 6 tables.

Figures (17)

  • Figure 1: THÖR-MAGNI data modalities. (1) walking trajectories of participants, in a workplace setting shared with other humans and robots; (2) lidar sweep recorded with a mobile robot; (3) snapshot from an eye tracker's gaze overlay video; (4) fish-eye camera image from the mobile robot, showing object stashes and two goal points from our scenarios.
  • Figure 2: Our dataset provides a comprehensive exploration of human-robot interaction in a shared workplace environment. Left: Participants navigate independently, collaborate in social groups, and engage with a mobile robot. Navigation between goal points is coordinated via card decks at the goal points that assign a participant a new goal point upon drawing a card, as seen on the far left. Right: Equipment utilized in our data collection comprises: (1) bicycle helmets equipped with motion capture tracking markers, (2) eye tracking glasses, and (3) headphones used for receiving spoken instructions.
  • Figure 3: Participants in the role of Carrier transported various objects of different sizes and shapes. (1) Carrier–Box carrying a medium-sized cardboard box with two hands. (2) Carrier–Storage Bin HRI placing the bin at a goal point. (3) Stash of small objects transported by the Carrier–Bucket. (4) Large object (poster stand) moved by two participants in the Carrier–Large Object role.
  • Figure 4: Robot used in and for data collection (the "DARKO" robot), with an omnidirectional mobile base (RB-Kairos) measuring 760 × 665 × 690 mm (5), equipped with two sensor towers: one hosting two Azure Kinect RGB-D cameras (2), and one hosting an Ouster OS0-128 lidar and two Basler fish-eye RGB cameras (4). Additional equipment includes two Sick MicroScan 2D safety lidars (6), mecanum wheels (7), and a NAO robot ("ARMoD") for interaction with participants (3). The robotic arm (1), with a maximum height of 855 mm, was not used in our recordings.
  • Figure 5: Varying environmental layouts for the room configuration of Scenarios 1–3. Right: Sample scene view of the site used for data acquisition of the THÖR-MAGNI dataset, showing the room configuration for Scenarios 1–3 with the environment layout for Scenario 1B. Left: Overview of the room configuration and the scenario-specific layout changes. Bottom: Legend explaining elements of the layout, including driving styles for the robot in Scenario 3, semantic elements specific to Scenario 1 (floor markings, passage), and the positions of goals and obstacles. Some objects were subject to a slight rotation when placed between runs; the layouts account for this with a rotation tolerance.
  • ...and 12 more figures
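The card-deck coordination described in Figure 2 is essentially a randomized goal-assignment protocol. The Python sketch below is a hypothetical illustration, not the authors' procedure: it assumes that each goal point holds its own deck whose cards name the other goal points, that cards are drawn without replacement, and that an empty deck is reshuffled. The names `GoalDeck` and `simulate_route` and the goal labels are invented for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class GoalDeck:
    """Hypothetical card deck placed at one goal point (assumed design)."""
    home: str            # goal point where this deck lies
    targets: list[str]   # goal points named on the cards (never `home`)
    rng: random.Random   # shared seeded generator for reproducibility
    _cards: list[str] = field(default_factory=list)

    def draw(self) -> str:
        # Assumption: once every card has been drawn, refill and reshuffle.
        if not self._cards:
            self._cards = list(self.targets)
            self.rng.shuffle(self._cards)
        return self._cards.pop()


def simulate_route(goals: list[str], start: str, steps: int, seed: int = 0) -> list[str]:
    """Simulate one participant drawing a card at each goal it reaches."""
    if start not in goals:
        raise ValueError(f"unknown start goal: {start}")
    rng = random.Random(seed)
    # One deck per goal point; each deck only points to the *other* goals.
    decks = {g: GoalDeck(g, [t for t in goals if t != g], rng) for g in goals}
    route = [start]
    for _ in range(steps):
        route.append(decks[route[-1]].draw())
    return route
```

For example, `simulate_route(["G1", "G2", "G3", "G4"], "G1", 10)` returns an 11-element route in which consecutive goals always differ. Sharing one seeded `random.Random` across all decks keeps a simulated run reproducible, while drawing without replacement spreads visits evenly over the other goals.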