COG: Confidence-aware Optimal Geometric Correspondence for Unsupervised Single-reference Novel Object Pose Estimation

Yuchen Che, Jingtu Wu, Hao Zheng, Asako Kanezaki

TL;DR

Confidence-aware Optimal Geometric Correspondence (COG) is an unsupervised framework that formulates correspondence estimation as a confidence-aware optimal transport problem. It integrates confidence into both correspondence finding and pose estimation, enabling unsupervised learning.

Abstract

Estimating the 6DoF pose of a novel object from a single reference view is challenging due to occlusions, viewpoint changes, and outliers. A core difficulty lies in finding robust cross-view correspondences, as existing methods often rely on discrete one-to-one matching that is non-differentiable and tends to collapse onto sparse keypoints. We propose Confidence-aware Optimal Geometric Correspondence (COG), an unsupervised framework that formulates correspondence estimation as a confidence-aware optimal transport problem. COG produces balanced soft correspondences by predicting point-wise confidences and injecting them as optimal transport marginals, suppressing non-overlapping regions. Semantic priors from vision foundation models further regularize the correspondences, leading to stable pose estimation. This design integrates confidence into the correspondence finding and pose estimation pipeline, enabling unsupervised learning. Experiments show that unsupervised COG performs comparably to supervised methods, and that supervised COG outperforms them.
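
The confidence-as-marginals idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes entropy-regularized optimal transport solved with Sinkhorn iterations, cosine similarity between per-point features as the affinity, and confidences normalized to unit mass on each side. The function name `confidence_sinkhorn` and the parameters `eps` and `n_iters` are hypothetical choices made for illustration.

```python
import numpy as np

def confidence_sinkhorn(feat_q, feat_r, conf_q, conf_r, eps=0.5, n_iters=500):
    """Soft correspondence plan whose marginals follow point-wise confidences.

    feat_q: (N, D) query features, feat_r: (M, D) reference features.
    conf_q: (N,) and conf_r: (M,) non-negative confidences with positive sum.
    All names and defaults are illustrative assumptions, not the paper's.
    """
    # Cosine similarity between L2-normalized features serves as the affinity.
    fq = feat_q / np.linalg.norm(feat_q, axis=1, keepdims=True)
    fr = feat_r / np.linalg.norm(feat_r, axis=1, keepdims=True)
    kernel = np.exp((fq @ fr.T) / eps)  # (N, M) Gibbs kernel, strictly positive

    # Confidences become the OT marginals; both sides must carry equal mass.
    a = conf_q / conf_q.sum()
    b = conf_r / conf_r.sum()

    # Sinkhorn iterations: alternately rescale rows and columns.
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)

    # Transport plan: row sums approximate a, column sums approximate b.
    return u[:, None] * kernel * v[None, :]
```

In this sketch, a point with near-zero confidence is assigned near-zero total transport mass, which is one way non-overlapping regions can be down-weighted without a hard one-to-one assignment.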

Paper Structure

This paper contains 39 sections, 13 equations, 12 figures, 6 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Given a novel object's query and reference RGB-D images (a), COG outputs point-wise confidence and cross-view soft correspondence (b), to estimate the relative pose between query and reference (c). To achieve this, we formulate correspondence finding as an optimal transport problem, with each point's confidence as target marginals, and the point features' similarity as an affinity kernel (d).
  • Figure 2: Pre-processing pipeline of COG. Given RGB-D inputs, the object is segmented, the depth map is back-projected into a point cloud, and per-point RGB features are extracted with DINO to form feature-augmented inputs.
  • Figure 3: Overview of the COG framework. The pipeline consists of coarse and fine phases, each using a geometric transformer to predict point-wise confidences and features. A Sinkhorn-based OT module computes soft correspondences, and a weighted SVD solver estimates the rigid transformation. The coarse pose is further refined in the fine phase using position embeddings for precise alignment.
  • Figure 4: Qualitative results of unsupervised COG on LM-O, TUD-L, and YCB-V datasets. Blue bounding boxes represent the estimated poses, while white boxes denote ground-truth poses.
  • Figure 5: Visualization of predicted confidence from unsupervised COG. Our method effectively handles non-overlapping regions and outlier points by assigning low confidence to unreliable points. Poses are aligned for visualization.
  • ...and 7 more figures
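
The Figure 3 caption mentions a weighted SVD solver that estimates the rigid transformation from soft correspondences. A minimal sketch of that kind of solver is the weighted Kabsch procedure below. It is an assumption about the general technique, not the paper's code: the helper `soft_correspondence_targets`, which turns a transport plan into one barycentric target per query point, and the function `weighted_rigid_transform` are hypothetical names introduced here.

```python
import numpy as np

def soft_correspondence_targets(plan, ref_points, tiny=1e-12):
    """Barycentric target per query point plus its total mass (assumed scheme)."""
    row_mass = plan.sum(axis=1)                       # (N,) weight per query point
    targets = (plan @ ref_points) / (row_mass[:, None] + tiny)
    return targets, row_mass

def weighted_rigid_transform(src, dst, weights):
    """Weighted Kabsch: R, t minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2."""
    w = weights / weights.sum()
    src_mean = w @ src                                # weighted centroids
    dst_mean = w @ dst
    src_c = src - src_mean
    dst_c = dst - dst_mean

    # Weighted 3x3 cross-covariance between the centered point sets.
    cov = (src_c * w[:, None]).T @ dst_c
    u, _, vt = np.linalg.svd(cov)

    # Flip the last singular direction if needed so that det(R) = +1.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    trans = dst_mean - rot @ src_mean
    return rot, trans
```

A possible usage, under the same assumptions: compute `targets, mass = soft_correspondence_targets(plan, ref_points)` from a soft correspondence plan, then call `weighted_rigid_transform(query_points, targets, mass)` so that low-mass (low-confidence) points contribute little to the pose estimate.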