
GMFlow: Global Motion-Guided Recurrent Flow for 6D Object Pose Estimation

Xin Liu, Shibei Xue, Dezong Zhao, Shan Ma, Min Jiang

TL;DR

GMFlow is a global motion-guided recurrent flow estimation method for 6D object pose estimation. It resolves local ambiguities caused by occlusion or missing parts by seeking global explanations, using the object's structural information to extend the motion of the rigid body's visible parts to its invisible regions.

Abstract

6D object pose estimation is crucial for robotic perception and precise manipulation. Occlusion and incomplete object visibility are common challenges in this task, but existing pose refinement methods often struggle to handle them effectively. To tackle this problem, we propose GMFlow, a global motion-guided recurrent flow estimation method for pose estimation. GMFlow overcomes local ambiguities caused by occlusion or missing parts by seeking global explanations: we leverage the object's structural information to extend the motion of the visible parts of the rigid body to its invisible regions. Specifically, we capture global contextual information through a linear attention mechanism and use it to guide local motion information toward global motion estimates. Furthermore, we introduce object shape constraints into the flow iteration process, adapting flow estimation to the pose estimation setting. Experiments on the LM-O and YCB-V datasets demonstrate that our method outperforms existing techniques in accuracy while maintaining competitive computational efficiency.
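The abstract states that global context is captured with linear attention and used to guide local motion. The NumPy sketch below illustrates that idea under stated assumptions: it follows the split used by global motion aggregation (GMA) for optical flow, where context features supply queries and keys and local motion features supply values, and it uses the elu(x)+1 feature map from kernelized linear attention. The function names, weight matrices, and residual weight `gamma` are hypothetical and are not the paper's actual module.

```python
import numpy as np


def elu_feature_map(x: np.ndarray) -> np.ndarray:
    """phi(x) = elu(x) + 1, a strictly positive kernel feature map (assumed choice)."""
    # np.minimum avoids overflow in exp for large positive inputs.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Kernelized attention over N pixels without forming the N x N matrix.

    q, k: (N, d) queries and keys; v: (N, c) values. Returns (N, c).
    """
    phi_q = elu_feature_map(q)          # (N, d)
    phi_k = elu_feature_map(k)          # (N, d)
    kv = phi_k.T @ v                    # (d, c): sum_j phi(k_j) v_j^T
    z = phi_k.sum(axis=0)               # (d,):   sum_j phi(k_j)
    num = phi_q @ kv                    # (N, c)
    den = phi_q @ z + eps               # (N,) normalizer per query
    return num / den[:, None]


def global_motion_aggregation(context, motion, w_q, w_k, w_v, gamma=1.0):
    """Hypothetical module: context -> queries/keys, local motion -> values.

    context: (H, W, Cc), motion: (H, W, Cm); w_v must map Cm -> Cm for the residual.
    """
    h, w, _ = context.shape
    ctx = context.reshape(h * w, -1)
    mot = motion.reshape(h * w, -1)
    agg = linear_attention(ctx @ w_q, ctx @ w_k, mot @ w_v)
    # Residual: local motion plus globally aggregated motion.
    return (mot + gamma * agg).reshape(h, w, -1)


# Usage with random features on an 8 x 10 feature grid.
rng = np.random.default_rng(0)
H, W, Cc, Cm, d = 8, 10, 16, 32, 8
context = rng.standard_normal((H, W, Cc))
motion = rng.standard_normal((H, W, Cm))
w_q = 0.1 * rng.standard_normal((Cc, d))
w_k = 0.1 * rng.standard_normal((Cc, d))
w_v = 0.1 * rng.standard_normal((Cm, Cm))
out = global_motion_aggregation(context, motion, w_q, w_k, w_v)
print(out.shape)  # (8, 10, 32)
```

Because `phi_k.T @ v` is computed once and shared by every query, the cost grows linearly with the number of pixels rather than quadratically, which is the usual motivation for linear attention on dense feature maps.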

Paper Structure

This paper contains 15 sections, 6 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Challenges of flow-based methods in the 6D pose estimation task. Flow is commonly used to estimate the pixel motion vector field between two frames of an image sequence; examples are shown in (a) and (b) [janai2017slow]. The objects in such images are not necessarily rigid, and their changes are typically small in magnitude. Pose estimation differs in this regard: although rendered images such as (d) and (f) are complete, objects in the target image may be occluded or incomplete, as in (c) [krull2015learning] and (e) [xiangposecnn].
  • Figure 2: Overview of the proposed method.
  • Figure 3: Structure of the global motion capture module.
  • Figure 4: Intermediate Flow Comparison. Our method demonstrates superior handling of occlusions and better utilization of contextual information in early iterations.
  • Figure 5: Qualitative results. From left to right: target image, intermediate flow with reconstruction, pose-induced flow with reconstruction, and comparison before and after our method. Three rows illustrate cases where the target object is complete, occluded, and partially cropped, respectively.
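Figure 5 refers to a pose-induced flow. Because the object is rigid, a single relative pose assigns a flow vector to every pixel of the rendered object, including pixels hidden in the target image; this is one way to read the claim that the motion of visible parts is extended to invisible regions. The sketch below computes such a flow under stated assumptions: the inputs are a rendered depth map, pinhole intrinsics `K`, and a relative transform `(R, t)` from the rendered camera frame to the target camera frame. The function name and conventions are hypothetical, not taken from the paper.

```python
import numpy as np


def pose_induced_flow(depth, K, R, t):
    """Flow induced on rendered pixels by a rigid motion (hypothetical helper).

    depth: (H, W) rendered depth in the source view, 0 = background.
    K: (3, 3) intrinsics; R: (3, 3), t: (3,) source-to-target rigid transform.
    Returns flow (H, W, 2) in pixels and a boolean valid mask (H, W).
    """
    H, W = depth.shape
    us, vs = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1)   # homogeneous pixels (H, W, 3)
    rays = pix @ np.linalg.inv(K).T                       # K^{-1} [u, v, 1]^T per pixel
    pts = rays * depth[..., None]                         # back-projected 3D points
    pts_t = pts @ R.T + t                                 # X' = R X + t
    z = pts_t[..., 2]
    valid = (depth > 0) & (z > 1e-6)                      # object pixels in front of camera
    z_safe = np.where(valid, z, 1.0)                      # avoid division by zero
    uv_t = (pts_t @ K.T)[..., :2] / z_safe[..., None]     # project into target view
    flow = np.where(valid[..., None], uv_t - pix[..., :2], 0.0)
    return flow, valid


# Usage: a 20 x 20 object patch at depth 2 translated 0.1 along the camera x axis.
depth = np.zeros((48, 64))
depth[10:30, 20:40] = 2.0
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
flow, valid = pose_induced_flow(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
print(flow[valid][0])  # [5. 0.]  since fx * tx / Z = 100 * 0.1 / 2
```

The flow is defined over the whole rendered mask, so occluded object pixels still receive a motion vector consistent with the rigid pose.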