GBOT: Graph-Based 3D Object Tracking for Augmented Reality-Assisted Assembly Guidance

Shiyu Li, Hannah Schieber, Niklas Corell, Bernhard Egger, Julian Kreimeier, Daniel Roth

TL;DR

This work presents GBOT, a real-time graph-based tracker for AR-assisted assembly that maintains multiple object poses across assembly states by linking parts through a multi-state assembly graph. It initializes tracking with fast 6D pose estimation using YOLOv8Pose and continuously updates poses via an assembly-graph constraint mechanism, which makes tracking more robust under occlusion and state changes. A synthetic GBOT dataset with five assembly assets and domain randomization is introduced, alongside real test data for qualitative evaluation. GBOT demonstrates superior accuracy and real-time capability compared to state-of-the-art baselines, and a public dataset and code release are planned. The approach enables context-aware AR guidance by providing stable, state-aware 6D poses to AR devices through a RESTful API, supporting practical medical and industrial assembly scenarios.

Abstract

Guidance for assemblable parts is a promising field for augmented reality. Augmented reality assembly guidance requires 6D object poses of target objects in real time. Especially in time-critical medical or industrial settings, continuous and markerless tracking of individual parts is essential to visualize instructions superimposed on or next to the target object parts. In this regard, occlusions by the user's hand or other objects and the complexity of different assembly states complicate robust and real-time markerless multi-object tracking. To address this problem, we present Graph-based Object Tracking (GBOT), a novel graph-based single-view RGB-D tracking approach. The real-time markerless multi-object tracking is initialized via 6D pose estimation and updates the graph-based assembly poses. Tracking through various assembly states is achieved by our novel multi-state assembly graph. We update the multi-state assembly graph by utilizing the relative poses of the individual assembly parts. Linking the individual objects in this graph enables more robust object tracking during the assembly process. For evaluation, we introduce a synthetic dataset of publicly available and 3D printable assembly assets as a benchmark for future work. Quantitative experiments on synthetic data and a further qualitative study on real test data show that GBOT can outperform existing work towards enabling context-aware augmented reality assembly guidance. Dataset and code will be made publicly available.

Paper Structure (28 sections, 10 equations, 13 figures, 3 tables)

Figures (13)

  • Figure 1: GBOT framework. We utilize RGB for our initialization and RGB-D to enable our continuous 6D object pose tracking. Our YOLOv8Pose acquires bounding boxes and keypoints after non-maximum suppression, and PnP recovers the 6D object poses. YOLOv8Pose initializes the graph-based tracking. Additionally, we enable GBOT to re-initialize the tracker every 10th frame (DeepIM; Li et al., 2018). The tracker updates the estimated poses and uses links between assembled objects. These poses can be published in real time to an AR head-mounted display (HMD) via a RESTful API.
  • Figure 2: Multi-state assembly graph and criteria for switching assembly states. If the difference between the computed relative pose of two assembly parts and the defined ground-truth relative pose is smaller than the translation and rotation offsets, the algorithm switches to the next assembly state.
  • Figure 3: An overview of all five assembly assets included in the GBOT dataset.
  • Figure 4: Our synthetic training images. Clustered scenes containing the 3D-printed assembly parts are generated. For domain randomization, we add objects from the T-LESS dataset (Hodaň et al., 2017), vary the lighting conditions, and randomize the backgrounds.
  • Figure 5: Synthetic and real scenes with different lighting conditions, motion blur, and occlusion. We conduct ablation studies on different lighting conditions, motion blur, and hand occlusion, which are the constraints expected in real data.
  • ...and 8 more figures
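
The state-switch criterion described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the 4x4 homogeneous-matrix pose representation, the geodesic rotation angle as the rotation error, and the threshold values (5 mm, 5 degrees) are all assumptions made here for illustration. The caption only states that the relative pose of two parts is compared against a defined ground-truth relative pose, with separate translation and rotation offsets.

```python
import numpy as np


def make_pose(tx, ty, tz, yaw_deg=0.0):
    """Build a 4x4 homogeneous pose from a translation and a yaw angle (hypothetical helper)."""
    yaw = np.deg2rad(yaw_deg)
    T = np.eye(4)
    T[:3, :3] = np.array([
        [np.cos(yaw), -np.sin(yaw), 0.0],
        [np.sin(yaw),  np.cos(yaw), 0.0],
        [0.0,          0.0,         1.0],
    ])
    T[:3, 3] = [tx, ty, tz]
    return T


def relative_pose(T_a, T_b):
    """Pose of part b expressed in the coordinate frame of part a."""
    return np.linalg.inv(T_a) @ T_b


def pose_error(T_rel, T_ref):
    """Return (translation distance, rotation angle in radians) between two poses."""
    t_err = float(np.linalg.norm(T_rel[:3, 3] - T_ref[:3, 3]))
    R_delta = T_ref[:3, :3].T @ T_rel[:3, :3]
    # Geodesic angle of the residual rotation; clip guards against rounding.
    cos_angle = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    return t_err, float(np.arccos(cos_angle))


def should_switch_state(T_a, T_b, T_ref, max_trans=0.005, max_rot=np.deg2rad(5.0)):
    """True if the tracked relative pose is within both assumed thresholds.

    max_trans (metres) and max_rot (radians) are illustrative values only.
    """
    t_err, r_err = pose_error(relative_pose(T_a, T_b), T_ref)
    return bool(t_err < max_trans and r_err < max_rot)


if __name__ == "__main__":
    T_a = make_pose(0.10, 0.00, 0.50)          # tracked pose of part a (camera frame)
    T_ref = make_pose(0.00, 0.00, 0.02)        # reference pose of b relative to a
    T_b = T_a @ make_pose(0.00, 0.00, 0.021, yaw_deg=1.0)  # part b nearly in place
    print(should_switch_state(T_a, T_b, T_ref))
```

Checking translation and rotation separately mirrors the caption's wording. A part that sits at the right position but is still rotated away from its mounting orientation would fail the rotation check and keep the graph in its current state.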