Resolving Symmetry Ambiguity in Correspondence-based Methods for Instance-level Object Pose Estimation

Yongliang Lin, Yongzhi Su, Sandeep Inuganti, Yan Di, Naeem Ajilforoushan, Hanqing Yang, Yu Zhang, Jason Rambach

TL;DR

The paper tackles symmetry-induced ambiguity in RGB-only 6D pose estimation by shifting from traditional one-to-one 2D-3D correspondences to one-to-many correspondences. It introduces SymCode, a symmetry-aware binary surface encoding, and SymNet, an end-to-end network that regresses the pose $(\mathbf{R}, \mathbf{t})$ without PnP or RANSAC, leveraging a CPR module and symmetry-aware losses. The approach is evaluated on highly symmetric datasets (T-LESS and IC-BIN), showing faster inference and competitive accuracy relative to state-of-the-art baselines, with ablations validating the benefits of end-to-end regression, code length, and the one-to-many formulation. This work offers a practical GPU-friendly solution for robust, real-time pose estimation of symmetric and near-symmetric objects using RGB data, with potential applicability to broader symmetry-rich perception tasks.

Abstract

Estimating the 6D pose of an object from a single RGB image is a critical task that becomes additionally challenging when dealing with symmetric objects. Recent approaches typically establish one-to-one correspondences between image pixels and 3D object surface vertices. However, the utilization of one-to-one correspondences introduces ambiguity for symmetric objects. To address this, we propose SymCode, a symmetry-aware surface encoding that encodes the object surface vertices based on one-to-many correspondences, eliminating the problem of one-to-one correspondence ambiguity. We also introduce SymNet, a fast end-to-end network that directly regresses the 6D pose parameters without solving a PnP problem. We demonstrate faster runtime and comparable accuracy achieved by our method on the T-LESS and IC-BIN benchmarks of mostly symmetric objects. Our source code will be released upon acceptance.
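
The ambiguity described in the abstract can be illustrated with a small numeric sketch. It is not taken from the paper: the cube, the pose `R`, `t`, the focal length, and all helper names are assumptions chosen for illustration. The sketch projects the corners of an untextured cube under a pose `R` and under `R` composed with a 90-degree symmetry rotation about the object's z-axis. The two poses produce the same set of pixels, while any individual corner lands on a different pixel, so a one-to-one 2D-3D labeling is not uniquely defined.

```python
import math
from itertools import product

def rot_x(deg):
    """3x3 rotation matrix about the x-axis by `deg` degrees."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(deg):
    """3x3 rotation matrix about the z-axis by `deg` degrees."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def project(rot, trans, point, focal=500.0):
    """Pinhole projection of one object-frame point under pose (rot, trans)."""
    x, y, z = (sum(rot[i][k] * point[k] for k in range(3)) + trans[i]
               for i in range(3))
    return (focal * x / z, focal * y / z)

def same_pixel_set(a, b, tol=1e-6):
    """True if every pixel in `a` has a match in `b` and vice versa."""
    return (all(any(math.dist(p, q) < tol for q in b) for p in a)
            and all(any(math.dist(p, q) < tol for p in a) for q in b))

# Hypothetical setup: cube corners at (+-1, +-1, +-1) and an arbitrary pose.
cube = [tuple(map(float, v)) for v in product((-1, 1), repeat=3)]
R, t = rot_x(35.0), (0.1, -0.2, 6.0)

# A second pose that differs from the first by a symmetry of the cube.
R_alt = matmul(R, rot_z(90.0))

pixels_a = [project(R, t, v) for v in cube]
pixels_b = [project(R_alt, t, v) for v in cube]

print("same image footprint:", same_pixel_set(pixels_a, pixels_b))
print("corner 0 under R:    ", pixels_a[0])
print("corner 0 under R_alt:", pixels_b[0])
```

Because the image alone cannot distinguish the two poses, a network trained to predict one fixed 3D vertex per pixel receives conflicting targets; grouping the symmetric vertices into one correspondence removes that conflict.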

Paper Structure (13 sections, 9 equations, 10 figures, 4 tables)

Figures (10)

  • Figure 1: Comparison of Surface Encoding. (a) Object images. (b) Textureless models. (c) 3D coordinate encoding. (d) ZebraPose encoding [su2022zebrapose]. (e) Our proposed SymCode. The 3D coordinate encoding and ZebraPose encoding, based on one-to-one correspondences, do not consider symmetry. In contrast, SymCode, based on one-to-many correspondences, explicitly preserves symmetry information.
  • Figure 2: Multiple possible one-to-one correspondence sets. Left column: Images of an untextured cube and cylinder. Top: three possible correspondence sets for the cube image. Bottom: three possible correspondence sets for the cylinder image. The color of the models in the right three columns represents the coordinates in the object frame, with red, green, and blue representing the x, y, and z coordinates, respectively. The object frame axes are drawn as colored arrows. The dashed lines show 2D-3D correspondences. Best viewed in color.
  • Figure 3: One-to-many correspondence sets. All 3D points in a one-to-many correspondence are textured with the same color. Left: two specific one-to-many correspondences for corners and centers of side faces, assuming the cube has only 4 symmetries: rotations about the z-axis by 0, 90, 180, and 270 degrees. Invisible parts are connected by dashed lines, and visible parts are connected by solid lines. Right: two specific one-to-many correspondences for the side surface and top surface.
  • Figure 4: The generation process of SymCode and label rendering. (a) Object CAD models. (b) For complex models, we manually partition the model into several parts to ensure higher accuracy in the following process. (c) Testing the symmetry type of a vertex based on symmetry priority. (d) Finding the main vertex of each one-to-many correspondence; continuous symmetry and discrete symmetry are processed differently. (e) Each correspondence is assigned a unique binary code $\mathbf{c}_i$. (f) The surface encoder encodes each vertex in the model with a binary code, which inherently preserves symmetry information. (g) The rendered label is used as an intermediate target for the network. Optional processes are enclosed in dashed boxes.
  • Figure 5: Network Architecture. Given an RGB image, our SymNet takes the zoomed-in Region of Interest (RoI) as input and predicts intermediate features, including masks and binary code maps. The CPR module then directly regresses the 6D object pose. The entire process is an end-to-end procedure, eliminating the need for refinement or RANSAC processes for PnP.
  • ...and 5 more figures
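
The one-to-many grouping in Figure 3 and the code assignment in Figure 4 (d)-(e) can be sketched for the cube example as follows. This is a minimal illustration under stated assumptions, not the authors' SymCode procedure: it handles only the four discrete z-axis rotations named in Figure 3, picks the "main vertex" of each group as the lexicographically smallest member (a hypothetical rule), and numbers the groups in order of discovery. The function names `symmetry_orbits` and `assign_codes` are invented for this sketch.

```python
import math
from itertools import product

def rot_z(deg):
    """3x3 rotation matrix about the z-axis by `deg` degrees."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def key(point, ndigits=6):
    """Rounded, hashable form of a point; `+ 0.0` turns -0.0 into 0.0."""
    return tuple(round(c, ndigits) + 0.0 for c in point)

def apply(rot, point):
    """Rotate a 3D point and return it in rounded, hashable form."""
    return key(sum(rot[i][k] * point[k] for k in range(3)) for i in range(3))

def symmetry_orbits(points, symmetries):
    """Group points that map onto each other under the given symmetries."""
    orbit_of, orbits = {}, []
    for p in points:
        if key(p) in orbit_of:
            continue  # already covered by an earlier group
        orbit = sorted({apply(s, p) for s in symmetries})
        for q in orbit:
            orbit_of[q] = len(orbits)
        orbits.append(orbit)
    return orbits, orbit_of

def assign_codes(orbits):
    """Give every member of a group the same fixed-length binary code."""
    n_bits = max(1, (len(orbits) - 1).bit_length())
    code_of, main_vertex = {}, {}
    for idx, orbit in enumerate(orbits):
        code = format(idx, f"0{n_bits}b")
        main_vertex[code] = orbit[0]  # hypothetical choice of main vertex
        for q in orbit:
            code_of[q] = code
    return code_of, main_vertex

# Cube corners, side-face centers, and top/bottom face centers.
corners = [tuple(map(float, v)) for v in product((-1, 1), repeat=3)]
side_centers = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
caps = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
points = corners + side_centers + caps

symmetries = [rot_z(d) for d in (0, 90, 180, 270)]
orbits, orbit_of = symmetry_orbits(points, symmetries)
code_of, main_vertex = assign_codes(orbits)

for code, vertex in main_vertex.items():
    members = [q for q in code_of if code_of[q] == code]
    print(code, "main vertex", vertex, "group size", len(members))
```

With this grouping, all four top corners share one code, so a pixel showing any of them has a single well-defined target regardless of which symmetric pose generated the image. Continuous symmetries, such as the cylinder in Figure 2, would need a different treatment that this sketch does not cover.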