GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects

Sungphill Moon, Hyeontae Son, Dongcheol Hur, Sangwook Kim

TL;DR

GenFlow targets accurate 6D pose estimation for novel objects with a shape-guided, optical-flow-based iterative refiner. Starting from coarse pose hypotheses, a recurrent GenFlow module predicts dense flow, confidence, and pose updates; a differentiable PnP layer and pose-induced flow lookups keep these predictions consistent with the object's 3D shape. A cascaded multi-scale architecture and a GMM-based coarse-sampling strategy enable coarse-to-fine refinement with strong generalization and efficiency. On the BOP benchmarks, GenFlow achieves state-of-the-art performance for unseen objects in both the RGB and RGB-D settings, while remaining competitive for seen objects without target-specific fine-tuning.

Abstract

Despite the progress of learning-based methods for 6D object pose estimation, the trade-off between accuracy and scalability for novel objects still exists. Specifically, previous methods for novel objects do not make good use of the target object's 3D shape information since they focus on generalization by processing the shape indirectly, making them less effective. We present GenFlow, an approach that enables both accuracy and generalization to novel objects with the guidance of the target object's shape. Our method predicts optical flow between the rendered image and the observed image and refines the 6D pose iteratively. It boosts the performance by a constraint of the 3D shape and the generalizable geometric knowledge learned from an end-to-end differentiable system. We further improve our model by designing a cascade network architecture to exploit the multi-scale correlations and coarse-to-fine refinement. GenFlow ranked first on the unseen object pose estimation benchmarks in both the RGB and RGB-D cases. It also achieves performance competitive with existing state-of-the-art methods for the seen object pose estimation without any fine-tuning.
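The idea of tying optical flow to the object's 3D shape can be made concrete with a pose-induced flow: given the depth map of the rendered image, the flow that a candidate pose implies is fully determined by geometry. The following is a minimal NumPy sketch of that computation, not the authors' implementation. The function name `pose_induced_flow`, the argument layout, the 4x4 object-to-camera pose convention, and the integer pixel-center convention are all assumptions made for illustration.

```python
import numpy as np

def pose_induced_flow(depth, K, T_render, T_new):
    """Flow from each rendered pixel to where the same surface point
    projects under a new pose (hypothetical helper, illustration only).

    depth    : (H, W) depth of the rendered image, 0 where no object.
    K        : (3, 3) camera intrinsics.
    T_render : (4, 4) object-to-camera pose used for rendering.
    T_new    : (4, 4) candidate object-to-camera pose.
    Returns flow (H, W, 2) in pixels and a boolean valid mask (H, W).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # (H, W, 3)

    # Back-project pixels to camera space using the rendered depth.
    pts_cam = (pix @ np.linalg.inv(K).T) * depth[..., None]

    # Camera space -> object space (the inverse of the rendering pose).
    T_c2o = np.linalg.inv(T_render)
    pts_obj = pts_cam @ T_c2o[:3, :3].T + T_c2o[:3, 3]

    # Object space -> camera space under the new pose, then project.
    pts_new = pts_obj @ T_new[:3, :3].T + T_new[:3, 3]
    proj = pts_new @ K.T
    z = proj[..., 2]
    valid = (depth > 0) & (z > 1e-9)
    safe_z = np.where(valid, z, 1.0)                      # avoid divide-by-zero
    uv_new = proj[..., :2] / safe_z[..., None]

    flow = uv_new - pix[..., :2]
    flow[~valid] = 0.0                                    # no flow off the object
    return flow, valid
```

Because every flow vector is derived from one rigid transform applied to the object's surface, a flow field produced this way is consistent with the 3D shape by construction, which is the property an unconstrained per-pixel flow prediction lacks.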

Paper Structure (18 sections, 6 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Examples of 6D pose estimation of novel objects. Our method estimates correspondences between the input image and the rendered image, as well as the 6D object pose.
  • Figure 2: Visualization of the scores predicted by the coarse model. Each dot denotes a sampled rotation, and its size represents the relative score the coarse model assigns to that rotation. The green circles mark the boundaries of the region the refiner is interested in. The figures show that the high-scoring poses cluster in the vicinity of the green circles.
  • Figure 3: Overview of GenFlow refinement. We visualize the process of feature extraction and GenFlow update in the $k^{th}$ refinement. ${T}_{C2O}$ refers to a transformation that maps the coordinates from camera space to object space.
  • Figure 4: Visualization of the outputs of the GRU. Our method factorizes the confidence weights into two terms: certainty and pose sensitivity. Certainty estimation improves robustness to occlusion. Pose sensitivity highlights richly textured regions and the extremities of the object.
  • Figure 5: Cascade design of GenFlow modules. Multiple GenFlow modules are attached to each level of the feature pyramid. The last updated flow $\mathbf{F}_{k}^{M}$ and 6D pose $\mathbf{P}_{k}^{M}$ from the higher-level GenFlow module are used to initialize the flow and 6D pose of the lower-level module. With the cascade architecture, the 6D pose is recovered in a coarse-to-fine manner.
  • ...and 1 more figure
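
The Figure 4 caption describes confidence weights factorized into certainty and pose sensitivity. One way to see why such weights matter is a confidence-weighted Gauss-Newton PnP solver, sketched below in NumPy. This is a generic stand-in and not the paper's differentiable PnP layer: the function names, the assumption that the two terms combine as a simple per-point product, the left-multiplied pose update, and the small damping constant are all choices made for illustration.

```python
import numpy as np

def so3_exp(w):
    """Rodrigues formula: rotation matrix for an axis-angle vector w (3,)."""
    theta = np.linalg.norm(w)
    Wx = np.array([[0.0, -w[2], w[1]],
                   [w[2], 0.0, -w[0]],
                   [-w[1], w[0], 0.0]])
    if theta < 1e-12:
        return np.eye(3) + Wx
    return (np.eye(3) + np.sin(theta) / theta * Wx
            + (1.0 - np.cos(theta)) / theta**2 * (Wx @ Wx))

def weighted_pnp_step(pts_obj, uv_obs, weights, K, R, t):
    """One Gauss-Newton step on the weighted reprojection error."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    Xc = pts_obj @ R.T + t                                # points in camera space
    x, y, z = Xc[:, 0], Xc[:, 1], Xc[:, 2]
    r = np.stack([fx * x / z + cx, fy * y / z + cy], axis=1) - uv_obs

    n = len(pts_obj)
    Jp = np.zeros((n, 2, 3))                              # d(pixel) / d(Xc)
    Jp[:, 0, 0] = fx / z
    Jp[:, 0, 2] = -fx * x / z**2
    Jp[:, 1, 1] = fy / z
    Jp[:, 1, 2] = -fy * y / z**2

    Jx = np.zeros((n, 3, 6))                              # d(Xc) / d[tau, omega]
    Jx[:, :, :3] = np.eye(3)                              # translation part
    Jx[:, 0, 4] = z;  Jx[:, 0, 5] = -y                    # rotation part: -[Xc]_x
    Jx[:, 1, 3] = -z; Jx[:, 1, 5] = x
    Jx[:, 2, 3] = y;  Jx[:, 2, 4] = -x

    J = Jp @ Jx                                           # (n, 2, 6)
    H = np.einsum('n,nij,nik->jk', weights, J, J)
    g = np.einsum('n,nij,ni->j', weights, J, r)
    delta = -np.linalg.solve(H + 1e-9 * np.eye(6), g)     # tiny damping (assumed)

    dR = so3_exp(delta[3:])
    return dR @ R, dR @ t + delta[:3]

def refine_pose(pts_obj, uv_obs, certainty, sensitivity, K, R, t, iters=10):
    """Iterate weighted steps; weights are assumed to be certainty * sensitivity."""
    weights = certainty * sensitivity
    for _ in range(iters):
        R, t = weighted_pnp_step(pts_obj, uv_obs, weights, K, R, t)
    return R, t
```

In this sketch, a correspondence with certainty near zero (for example, one falling on an occluder) contributes nothing to the normal equations, while a high sensitivity value lets pixels whose motion strongly constrains the pose dominate the update.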