GBOT: Graph-Based 3D Object Tracking for Augmented Reality-Assisted Assembly Guidance
Shiyu Li, Hannah Schieber, Niklas Corell, Bernhard Egger, Julian Kreimeier, Daniel Roth
TL;DR
This work presents GBOT, a real-time graph-based tracker for AR-assisted assembly that maintains multiple object poses across assembly states by linking parts through a multi-state assembly graph. It initializes tracking with fast 6D pose estimation using YOLOv8Pose and continuously updates poses via an assembly-graph constraint mechanism, enabling robust performance under occlusion and state changes. A synthetic GBOT dataset with five assembly assets and domain randomization is introduced, alongside real test data for qualitative evaluation. GBOT demonstrates superior accuracy and real-time capability compared to state-of-the-art baselines, and a public release of the dataset and code is planned. The approach enables context-aware AR guidance by providing stable, state-aware 6D poses to AR devices through a RESTful API, supporting practical medical and industrial assembly scenarios.
Abstract
Guidance for assemblable parts is a promising field for augmented reality. Augmented reality assembly guidance requires 6D object poses of target objects in real time. Especially in time-critical medical or industrial settings, continuous and markerless tracking of individual parts is essential to visualize instructions superimposed on or next to the target object parts. In this regard, occlusions by the user's hand or other objects and the complexity of different assembly states complicate robust and real-time markerless multi-object tracking. To address this problem, we present Graph-based Object Tracking (GBOT), a novel graph-based single-view RGB-D tracking approach. The real-time markerless multi-object tracking is initialized via 6D pose estimation and updates the graph-based assembly poses. Tracking through the various assembly states is achieved by our novel multi-state assembly graph. We update the multi-state assembly graph by utilizing the relative poses of the individual assembly parts. Linking the individual objects in this graph enables more robust object tracking during the assembly process. For evaluation, we introduce a synthetic dataset of publicly available and 3D printable assembly assets as a benchmark for future work. Quantitative experiments on synthetic data and a further qualitative study on real test data show that GBOT can outperform existing work towards enabling context-aware augmented reality assembly guidance. Dataset and code will be made publicly available.
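To make the idea of updating an assembly graph from relative poses more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes each tracked part has a 4x4 homogeneous pose in the camera frame and that each graph edge stores the relative pose two parts should have once joined. The function names, the part names, and the tolerance values (5 mm, 5 degrees) are hypothetical choices made for this example only.

```python
import numpy as np


def make_pose(rotation_z_deg, translation):
    """Build a 4x4 homogeneous pose from a rotation about z and a translation."""
    theta = np.deg2rad(rotation_z_deg)
    c, s = np.cos(theta), np.sin(theta)
    pose = np.eye(4)
    pose[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    pose[:3, 3] = translation
    return pose


def relative_pose(pose_a, pose_b):
    """Pose of part b expressed in the coordinate frame of part a."""
    return np.linalg.inv(pose_a) @ pose_b


def pose_error(actual, expected):
    """Return (translation error, rotation error in degrees) between two poses."""
    delta = np.linalg.inv(expected) @ actual
    trans_err = np.linalg.norm(delta[:3, 3])
    # Rotation angle from the trace of the residual rotation matrix.
    cos_angle = np.clip((np.trace(delta[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return trans_err, np.degrees(np.arccos(cos_angle))


def is_assembled(pose_a, pose_b, expected_rel, max_trans=0.005, max_rot_deg=5.0):
    """Hypothetical check: do two parts match their expected relative pose?"""
    trans_err, rot_err = pose_error(relative_pose(pose_a, pose_b), expected_rel)
    return bool(trans_err <= max_trans and rot_err <= max_rot_deg)


def assembled_edges(poses, graph_edges):
    """Return the set of graph edges whose parts currently appear joined.

    poses: dict mapping part name -> 4x4 pose in the camera frame.
    graph_edges: dict mapping (part_a, part_b) -> expected relative pose.
    """
    return {
        edge
        for edge, expected_rel in graph_edges.items()
        if is_assembled(poses[edge[0]], poses[edge[1]], expected_rel)
    }


# Hypothetical two-part assembly: a clamp sits 2 cm above a base when joined.
expected = make_pose(0.0, [0.0, 0.0, 0.02])
edges = {("base", "clamp"): expected}
base = make_pose(30.0, [0.1, 0.2, 0.5])

joined = assembled_edges({"base": base, "clamp": base @ expected}, edges)
loose = assembled_edges({"base": base, "clamp": make_pose(0.0, [0.3, 0.2, 0.5])}, edges)
```

In this sketch, `joined` contains the edge `("base", "clamp")` while `loose` is empty. A tracker could use such an edge set to decide which assembly state is active, although how GBOT actually performs this update is not specified in the abstract.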
