Detection and Pose Estimation of flat, Texture-less Industry Objects on HoloLens using synthetic Training
Thomas Pöllabauer, Fabian Rücker, Andreas Franek, Felix Gorschlüter
TL;DR
The paper tackles real-time 6D pose estimation of flat, texture-less industrial objects on edge devices by leveraging synthetic training data derived from manufacturing documents. It presents a client-server AR pipeline that uses YOLOv5 for detection and CosyPose for pose estimation, with both models trained entirely on synthetic renderings of 3D meshes converted from 2D manufacturing schematics. On real HoloLens 2 data, the approach achieves strong detection recall and competitive pose estimation accuracy; the authors acknowledge the latency added by backend processing and the domain gap between synthetic and real imagery. The modular framework leaves clear avenues for future work: improving on-device inference, incorporating frontend tracking, and expanding the dataset to better handle challenging objects and motion conditions.
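The TL;DR says the 2D manufacturing schematics are converted into 3D meshes for rendering, but does not describe how. For flat parts, one plausible route, assumed here and not confirmed by the source, is to extrude the part's 2D outline by its material thickness. The sketch handles a single convex, counter-clockwise outline without holes or bends; the helpers `extrude_outline` and `signed_volume` are hypothetical, and `signed_volume` only serves as a sanity check that every face is wound outward.

```python
from typing import List, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def extrude_outline(outline: List[Point2], thickness: float) -> Tuple[List[Point3], List[Face]]:
    """Extrude a convex, counter-clockwise 2D outline into a closed triangle mesh.

    Hypothetical helper: assumes a flat plate of uniform thickness with no
    holes or bends, which real schematics may violate.
    """
    n = len(outline)
    if n < 3:
        raise ValueError("outline needs at least 3 vertices")
    if thickness <= 0:
        raise ValueError("thickness must be positive")

    # Bottom ring at z = 0 (indices 0..n-1), top ring at z = thickness (n..2n-1).
    vertices: List[Point3] = [(x, y, 0.0) for x, y in outline]
    vertices += [(x, y, thickness) for x, y in outline]

    faces: List[Face] = []
    # Caps via fan triangulation (only valid for convex outlines).
    for i in range(1, n - 1):
        faces.append((0, i + 1, i))          # bottom cap, wound so its normal points to -z
        faces.append((n, n + i, n + i + 1))  # top cap, normal points to +z
    # Side walls: two triangles per outline edge, normals pointing outward.
    for i in range(n):
        j = (i + 1) % n
        faces.append((i, j, n + j))
        faces.append((i, n + j, n + i))
    return vertices, faces


def signed_volume(vertices: List[Point3], faces: List[Face]) -> float:
    """Divergence-theorem volume; positive when all faces are wound outward."""
    total = 0.0
    for a, b, c in faces:
        (ax, ay, az), (bx, by, bz), (cx, cy, cz) = vertices[a], vertices[b], vertices[c]
        # Scalar triple product a . (b x c)
        total += ax * (by * cz - bz * cy) + ay * (bz * cx - bx * cz) + az * (bx * cy - by * cx)
    return total / 6.0
```

For a 40 mm by 20 mm rectangular plate 2 mm thick, `extrude_outline([(0, 0), (40, 0), (40, 20), (0, 20)], 2.0)` yields 8 vertices and 12 triangles, and `signed_volume` on that mesh returns 1600.0, matching 40 × 20 × 2. Non-convex outlines or parts with holes would need a proper polygon triangulation for the caps.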
Abstract
Current state-of-the-art 6D pose estimation is too compute-intensive to be deployed on edge devices such as Microsoft HoloLens (2) or Apple iPad, both of which are used for a growing number of augmented reality (AR) applications. The quality of AR depends greatly on its ability to detect and overlay geometry within the scene. We propose a synthetically trained, client-server-based AR application that demonstrates state-of-the-art object pose estimation of metallic and texture-less industry objects on edge devices. Synthetic data enables training without real photographs, e.g., for objects that have yet to be manufactured. Our qualitative evaluation on an AR-assisted sorting task, together with quantitative evaluations on both renderings and real-world data recorded on HoloLens 2, sheds light on the application's real-world applicability.
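The abstract describes a client-server split in which the headset captures frames and a backend runs the compute-heavy models. A minimal sketch of the server-side step follows, assuming the detector and pose estimator are exposed as plain Python callables. The names `handle_frame`, `Detection` and `Pose`, the confidence threshold `min_score`, and the JSON reply format are all hypothetical, and no real YOLOv5 or CosyPose API is called; the stubs only stand in for them to show the two-stage flow: detect first, then estimate one 6D pose per detection.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List, Tuple


@dataclass
class Detection:
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Pose:
    label: str
    rotation: List[List[float]]  # 3x3 object-to-camera rotation matrix
    translation: List[float]     # object origin in the camera frame, metres


# Hypothetical stand-ins for the backend models (YOLOv5 and CosyPose in the paper).
Detector = Callable[[bytes], List[Detection]]
PoseEstimator = Callable[[bytes, Detection], Pose]


def handle_frame(frame: bytes, detect: Detector, estimate_pose: PoseEstimator,
                 min_score: float = 0.5) -> str:
    """Detect objects, drop low-confidence hits, estimate one pose per remaining hit.

    Returns a JSON string that the AR client would parse to place its overlays.
    """
    detections = [d for d in detect(frame) if d.score >= min_score]
    poses = [estimate_pose(frame, d) for d in detections]
    return json.dumps([asdict(p) for p in poses])


# Stub models for illustration only.
def fake_detect(frame: bytes) -> List[Detection]:
    return [Detection("bracket", 0.91, (10, 20, 110, 90)),
            Detection("plate", 0.30, (200, 40, 260, 120))]


def fake_pose(frame: bytes, det: Detection) -> Pose:
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return Pose(det.label, identity, [0.0, 0.0, 0.6])


if __name__ == "__main__":
    # Only the "bracket" detection clears the default threshold.
    print(handle_frame(b"frame-bytes", fake_detect, fake_pose))
```

Passing the models in as callables is one way to keep the modularity the TL;DR mentions: swapping the detector, or later moving inference onto the device, changes only what is handed to `handle_frame`.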
