On the Application of Egocentric Computer Vision to Industrial Scenarios
Vivek Chavan, Oliver Heimann, Jörg Krüger
TL;DR
The paper addresses the gap between industrial digitisation and modern AI by proposing egocentric vision, in which lightweight wearables collect multimodal data from a first-person perspective. It introduces a pipeline in which natural-language observations provided by the user guide the processing of synchronised video, eye-gaze and hand data into rich labels, aided by a language model that generates structured metadata. Contextual cues such as trajectory and location complement this approach. Key contributions include a detailed, automated data collection and labelling workflow and a federated, three-layer continual-learning framework that handles privacy and incremental updates across personal, organisational and global levels. The work aims to reduce digitisation effort, enhance knowledge transfer and enable context-aware models in industrial environments, with practical implications for operator guidance, defect labelling and workflow understanding.
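To make the three-layer idea more concrete, the aggregation step can be sketched as hierarchical federated averaging: personal models are combined into an organisational model, and organisational models into a global one, so that only parameters and sample counts leave the wearer's device. This is a minimal sketch under stated assumptions. The paper does not specify its aggregation rule here; sample-weighted federated averaging (FedAvg) is a swapped-in technique, and `ModelUpdate`, `weighted_average` and `aggregate_hierarchy` are hypothetical names. The continual-learning updates within each layer are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelUpdate:
    """Flattened model parameters plus the number of samples they were trained on."""
    weights: List[float]
    num_samples: int


def weighted_average(updates: List[ModelUpdate]) -> ModelUpdate:
    """Sample-weighted parameter averaging (FedAvg-style; an assumption, not the paper's rule)."""
    if not updates:
        raise ValueError("need at least one update to aggregate")
    total = sum(u.num_samples for u in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(updates[0].weights)
    if any(len(u.weights) != dim for u in updates):
        raise ValueError("all updates must have the same number of parameters")
    avg = [0.0] * dim
    for u in updates:
        share = u.num_samples / total  # contribution proportional to local data size
        for i, w in enumerate(u.weights):
            avg[i] += share * w
    return ModelUpdate(weights=avg, num_samples=total)


def aggregate_hierarchy(personal: Dict[str, List[ModelUpdate]]) -> ModelUpdate:
    """Hypothetical three-layer aggregation: personal -> organisational -> global.

    `personal` maps an organisation name to the personal model updates of its
    operators. Raw egocentric recordings stay on the personal layer; only
    parameters and sample counts are passed upward.
    """
    org_models = [weighted_average(updates) for updates in personal.values()]
    return weighted_average(org_models)
```

For example, two operators at one site and one at another could be aggregated with `aggregate_hierarchy({"plant_a": [...], "plant_b": [...]})`. Because each level weights by sample count, the global model equals a flat sample-weighted average over all personal models; a real deployment might instead weight organisations equally or keep organisation-specific layers local for privacy.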
Abstract
Egocentric vision aims to capture and analyse the world from a first-person perspective. We explore how egocentric wearable devices could enhance industrial use cases with respect to data collection, annotation, labelling and downstream applications. This would simplify data collection and allow users to provide additional context. We envision this approach as a supplement to the traditional industrial Machine Vision workflow. Code, dataset and related resources will be available at: https://github.com/Vivek9Chavan/EgoVis24
