IndEgo: A Dataset of Industrial Scenarios and Collaborative Work for Egocentric Assistants

Vivek Chavan, Yasmina Imgrund, Tung Dao, Sanwantri Bai, Bosong Wang, Ze Lu, Oliver Heimann, Jörg Krüger

TL;DR

IndEgo addresses the scarcity of industrial, collaborative, long-horizon egocentric datasets. It introduces 3,460 egocentric videos (≈197 hours) and 1,092 exocentric videos (≈97 hours) with rich multimodal data (eye gaze, narration, audio, motion), including recordings of two-person collaboration. It provides detailed annotations, task graphs, and benchmarks for procedural and non-procedural task understanding, Mistake Detection (MD), and reasoning-based Question Answering (QA), revealing significant challenges for current multimodal models. The paper reports baseline results for MD, QA, and collaborative task understanding, and its modality ablations show the value of combining egocentric and exocentric views. IndEgo's release aims to spur research in instruction following, human-AI collaboration, and embodied AI for safe, productive industrial operations, with data and code openly accessible on Hugging Face and GitHub.

Abstract

We introduce IndEgo, a multimodal egocentric and exocentric dataset covering common industrial tasks, including assembly/disassembly, logistics and organisation, inspection and repair, woodworking, and others. The dataset contains 3,460 egocentric recordings (approximately 197 hours) and 1,092 exocentric recordings (approximately 97 hours). A key focus of the dataset is collaborative work, in which two workers jointly perform cognitively and physically demanding tasks. The egocentric recordings include rich multimodal data, with additional context from eye gaze, narration, sound, motion, and other signals. We provide detailed annotations (actions, summaries, mistake annotations, narrations), metadata, processed outputs (eye gaze, hand pose, semi-dense point cloud), and benchmarks on procedural and non-procedural task understanding, Mistake Detection, and reasoning-based Question Answering. Baseline evaluations for Mistake Detection, Question Answering, and collaborative task understanding show that the dataset poses a challenge for state-of-the-art multimodal models. Our dataset is available at: https://huggingface.co/datasets/FraunhoferIPK/IndEgo