Vision6D: 3D-to-2D Interactive Visualization and Annotation Tool for 6D Pose Estimation

Yike Zhang, Eduardo Davalos, Jack Noble

TL;DR

Vision6D is an open-source interactive tool for 6D pose annotation that visualizes 3D models in 2D scenes and supports manual camera-pose annotation when ground-truth data are unavailable. Grounded in the image formation model with intrinsics $K$ and extrinsics $M=[R|t]$, it enables real-time 3D-to-2D alignment through a three-panel UI: Main Panel, 3D Scene Display, and Output Panel. A user study on Linemod and HANDAL reports competitive annotation accuracy (low inter- and intra-personal variability) and an efficient workflow, with favorable NASA-TLX and SUS scores, supporting the approach for rapid dataset generation and model training. The work highlights Vision6D’s potential to reduce reliance on external markers and improve first-frame pose initialization, supporting broader adoption in 6D pose estimation research and real-world robotics.

Abstract

Accurate 6D pose estimation has gained more attention over the years for robotics-assisted tasks that require precise interaction with physical objects. This paper presents an interactive 3D-to-2D visualization and annotation tool to support the 6D pose estimation research community. To the best of our knowledge, the proposed work is the first tool that allows users to visualize and manipulate 3D objects interactively on a 2D real-world scene, along with a comprehensive user study. This system supports robust 6D camera pose annotation by providing both visual cues and spatial relationships to determine object position and orientation in various environments. The annotation feature in Vision6D is particularly helpful in scenarios where the transformation matrix between the camera and world objects is unknown, as it enables accurate annotation of these objects' poses using only the camera intrinsic matrix. This capability serves as a foundational step in developing and training advanced pose estimation models across various domains. We evaluate Vision6D's effectiveness on the widely used open-source pose estimation datasets Linemod and HANDAL by comparing the default ground-truth camera poses with manual annotations. A user study was performed to show that Vision6D generates accurate pose annotations via visual cues in an intuitive 3D user interface. This approach aims to bridge the gap between 2D scene projections and 3D scenes, offering an effective way for researchers and developers to solve problems related to 6D pose annotation. The software is open-source and publicly available at https://github.com/InteractiveGL/vision6D.
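
The image formation model that underlies the tool, with intrinsics $K$ and extrinsics $M=[R|t]$, maps a 3D object point $X$ to a pixel by computing $x \sim K(RX + t)$ and dividing by the last coordinate. The following is a minimal sketch of that standard pinhole projection, not code taken from Vision6D. The function name `project_point` and the numeric values for `K`, `R`, and `t` are illustrative assumptions.

```python
def project_point(K, R, t, X):
    """Project a 3D point X (object frame) to pixel coordinates (u, v).

    K: 3x3 intrinsic matrix, R: 3x3 rotation, t: length-3 translation,
    all given as nested lists. Hypothetical helper for illustration only.
    """
    # Camera-frame coordinates: X_c = R X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point lies behind the camera")
    # Homogeneous pixel coordinates: x = K X_c
    x = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    # Perspective division
    return (x[0] / x[2], x[1] / x[2])


# Assumed example values: 500 px focal length, principal point (320, 240),
# object 2 units in front of the camera with no rotation.
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]

u, v = project_point(K, R, t, [0.1, -0.2, 0.0])
print(u, v)
```

Annotating a pose amounts to adjusting `R` and `t` until the projected model overlaps the object in the image, which is why only the intrinsic matrix needs to be known in advance.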

Paper Structure

This paper contains 28 sections, 5 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: 3D-to-2D Projection. Demonstration of projecting a 3D object point onto a 2D image plane using the camera intrinsic and extrinsic parameters.
  • Figure 2: Vision6D's 3D User Interface. Screenshot and decomposition of essential 2D and 3D features to support the interactive 6D pose annotation workflow.
  • Figure 3: Visualization and Pose Annotation of Vision6D
  • Figure 4: Quantitative Evaluation of Inter-Personal Variability. This figure presents the distributions of Angular Distance, Euclidean Distance, and ADD metrics for assessing inter-personal variability. The results demonstrate the effectiveness of pose annotation using Vision6D.
  • Figure 5: Quantitative Evaluation of Intra-Personal Consistency. This figure presents the distributions of Angular Distance, Euclidean Distance, and ADD metrics for assessing intra-personal consistency. The results highlight the robustness and reproducibility of Vision6D.
  • ...and 2 more figures
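
The captions for Figures 4 and 5 name three pose-error metrics: Angular Distance, Euclidean Distance, and ADD. This section does not give the paper's exact definitions, so the sketch below assumes the commonly used ones: the geodesic angle between two rotations, the norm of the translation difference, and the mean distance between model points transformed by the two poses. All function names and example values are hypothetical.

```python
import math


def angular_distance_deg(R1, R2):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    # trace(R1^T R2) equals the sum of element-wise products of R1 and R2
    trace = sum(R1[i][j] * R2[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against rounding before acos
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def euclidean_distance(t1, t2):
    """Distance between two translation vectors."""
    return math.dist(t1, t2)


def transform(R, t, p):
    """Apply the rigid transform R p + t to a 3D point p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def add_metric(R1, t1, R2, t2, points):
    """Mean distance between model points under two poses (assumed ADD)."""
    total = sum(math.dist(transform(R1, t1, p), transform(R2, t2, p))
                for p in points)
    return total / len(points)


# Assumed example: reference pose vs. a manual annotation.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Rz90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90 deg about z
t_ref = [0.0, 0.0, 1.0]
t_ann = [0.03, 0.04, 1.0]
model_points = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]

rot_err = angular_distance_deg(I3, Rz90)
trans_err = euclidean_distance(t_ref, t_ann)
add_err = add_metric(I3, t_ref, I3, t_ann, model_points)
print(rot_err, trans_err, add_err)
```

When the two poses share the same rotation, as in the `add_err` example, ADD reduces to the translation error, which makes it a convenient sanity check.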