ENIGMA-360: An Ego-Exo Dataset for Human Behavior Understanding in Industrial Scenarios

Francesco Ragusa, Rosario Leonardi, Michele Mazzamuto, Daniele Di Mauro, Camillo Quattrocchi, Alessandro Passanisi, Irene D'Ambra, Antonino Furnari, Giovanni Maria Farinella

TL;DR

ENIGMA-360 is a new ego-exo dataset acquired in a real industrial scenario. Its egocentric and exocentric videos are temporally synchronized and offer complementary information about the same scene. Baseline results highlight the need for new models capable of robust ego-exo understanding in real-world environments.

Abstract

Understanding human behavior from complementary egocentric (ego) and exocentric (exo) points of view enables the development of systems that can support workers in industrial environments and enhance their safety. However, progress in this area is hindered by the lack of datasets capturing both views in realistic industrial scenarios. To address this gap, we propose ENIGMA-360, a new ego-exo dataset acquired in a real industrial scenario. The dataset is composed of 180 egocentric and 180 exocentric procedural videos that are temporally synchronized and offer complementary information about the same scene. The 360 videos have been labeled with temporal and spatial annotations, enabling the study of different aspects of human behavior in the industrial domain. We provide baseline experiments for three foundational tasks in human behavior understanding: 1) Temporal Action Segmentation, 2) Keystep Recognition, and 3) Egocentric Human-Object Interaction Detection, showing the limits of state-of-the-art approaches in this challenging scenario. These results highlight the need for new models capable of robust ego-exo understanding in real-world environments. We publicly release the dataset and its annotations at https://fpv-iplab.github.io/ENIGMA-360/.

Paper Structure (24 sections, 15 figures, 5 tables)


Figures (15)

  • Figure 1: Overview of the ENIGMA-360 dataset. On the left, we show the available spatial annotations. On the top-right, we report the procedure designed for the acquisition. On the bottom-right, we show synchronized egocentric and exocentric video frames illustrating the multi-view setup.
  • Figure 2: Comparison between state-of-the-art datasets comprising texture-less industrial-like objects (top) and the proposed ENIGMA-360 dataset, which includes realistic industrial objects (bottom).
  • Figure 3: A screenshot of the developed application, captured during the acquisition phase.
  • Figure 4: Participant demographics. Distribution of experience levels (left), gender (center), and age (right) among the 34 participants.
  • Figure 5: Distribution of the data split (a) and number of videos in the ENIGMA-360 dataset by duration (b).
  • ...and 10 more figures