SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow
Qingyuan Wang, Rui Song, Jiaojiao Li, Kerui Cheng, David Ferstl, Yinlin Hu
TL;DR
SCFlow2 addresses the challenge of refining 6D object poses for novel objects without retraining. It integrates a 3D scene flow representation, RGBD depth regularization, and a 3D shape prior of the target within an end-to-end trainable, plug-and-play framework. It constructs a 4D correlation volume from RGB and depth features, uses a GRU-based predictor to estimate a dense SE(3) transformation field, and derives from this field a global pose residual that guides iterative refinement through a pose-induced flow, while the target's shape prior constrains the search. Trained on ShapeNet, Google-Scanned-Objects, and Objaverse, SCFlow2 achieves state-of-the-art accuracy on seven BOP datasets with novel objects and runs fast (about 0.18 s per pose) compared with multi-hypothesis refinement methods. Ablations show that both the shape prior and the 3D scene flow representation are necessary, and the method generalizes well, which makes it broadly useful for real-world pose estimation systems.
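The 4D correlation volume mentioned above can be sketched as an all-pairs dot product between two feature maps, in the style popularized by RAFT. This is a minimal NumPy illustration under stated assumptions: the RGB and depth features are simply concatenated along the channel axis before correlation, and the 1/sqrt(C) scaling is a common convention. The fusion SCFlow2 actually uses is not described here, and the function name correlation_volume_4d is hypothetical.

```python
import numpy as np


def correlation_volume_4d(feat1: np.ndarray, feat2: np.ndarray) -> np.ndarray:
    """All-pairs correlation between two (C, H, W) feature maps.

    Entry [i, j, k, l] of the (H, W, H, W) result is the dot product of
    feat1[:, i, j] and feat2[:, k, l], scaled by 1/sqrt(C).
    """
    c, h, w = feat1.shape
    if feat2.shape != (c, h, w):
        raise ValueError("feature maps must have identical shapes")
    f1 = feat1.reshape(c, h * w)      # flatten pixels: (C, HW)
    f2 = feat2.reshape(c, h * w)      # (C, HW)
    corr = f1.T @ f2 / np.sqrt(c)     # every pixel pair: (HW, HW)
    return corr.reshape(h, w, h, w)   # back to a 4D volume


# Hypothetical fusion (an assumption, not SCFlow2's design):
# concatenate RGB and depth features along the channel axis.
rng = np.random.default_rng(0)
rgb_src, depth_src = rng.standard_normal((64, 8, 8)), rng.standard_normal((32, 8, 8))
rgb_tgt, depth_tgt = rng.standard_normal((64, 8, 8)), rng.standard_normal((32, 8, 8))
fused_src = np.concatenate([rgb_src, depth_src], axis=0)   # (96, 8, 8)
fused_tgt = np.concatenate([rgb_tgt, depth_tgt], axis=0)   # (96, 8, 8)
corr = correlation_volume_4d(fused_src, fused_tgt)
print(corr.shape)  # (8, 8, 8, 8)
```

Memory grows as (H·W)², which is why recurrent matching networks of this kind typically build the volume on downsampled feature maps and, in each GRU iteration, look up only a local window around the current flow estimate.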
Abstract
We introduce SCFlow2, a plug-and-play refinement framework for 6D object pose estimation. Most recent 6D object pose estimation methods rely on refinement to obtain accurate results. However, most existing refinement methods either suffer from noise when establishing correspondences or require retraining for novel objects. SCFlow2 builds on SCFlow, a model designed for refinement with a shape constraint, but additionally incorporates depth as a regularization in the iterations via 3D scene flow for RGBD frames. The key design of SCFlow2 is to introduce geometric constraints into the training of the recurrent matching network by combining the rigid-motion embeddings in 3D scene flow with the 3D shape prior of the target. We train SCFlow2 on a combination of the Objaverse, GSO, and ShapeNet datasets and evaluate it on BOP datasets with novel objects. When our method is applied as a post-processing step, most state-of-the-art methods produce significantly better results without any retraining or fine-tuning. The source code is available at https://scflow2.github.io.
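The idea of rigid-motion embeddings in 3D scene flow can be illustrated with a small numerical sketch, assuming a pinhole camera with intrinsics K and a per-pixel 3D flow predicted from RGBD frames. The sketch lifts the depth map to 3D points, fits a single rigid motion (R, t) to the dense flow with a weighted Kabsch/Procrustes solve, and renders the pose-induced 3D and 2D flow. The Kabsch step is a standard technique swapped in for illustration; it is not necessarily how SCFlow2 turns its dense SE(3) field into a pose residual. All function names (backproject, project, fit_rigid_motion, pose_induced_flow) are hypothetical.

```python
import numpy as np


def backproject(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift an (H, W) depth map to (H*W, 3) camera-frame points (pinhole K)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]                  # pixel rows (v) and columns (u)
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)


def project(points: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Pinhole projection of (N, 3) points to (N, 2) pixel coordinates."""
    u = K[0, 0] * points[:, 0] / points[:, 2] + K[0, 2]
    v = K[1, 1] * points[:, 1] / points[:, 2] + K[1, 2]
    return np.stack([u, v], axis=1)


def fit_rigid_motion(points, scene_flow, weights):
    """Weighted Kabsch (illustrative stand-in, not the paper's layer):
    R, t minimizing sum_i w_i ||R p_i + t - (p_i + f_i)||^2."""
    dst = points + scene_flow
    w = weights / weights.sum()
    mu_src, mu_dst = w @ points, w @ dst        # weighted centroids
    cov = ((points - mu_src) * w[:, None]).T @ (dst - mu_dst)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # flip sign to avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_dst - R @ mu_src
    return R, t


def pose_induced_flow(points, R, t, K):
    """3D scene flow and 2D optical flow induced by one rigid motion (R, t)."""
    moved = points @ R.T + t
    return moved - points, project(moved, K) - project(points, K)


# Synthetic example: a small known motion, noisy dense flow, then a rigid fit.
rng = np.random.default_rng(0)
K = np.array([[500.0, 0.0, 32.0], [0.0, 500.0, 24.0], [0.0, 0.0, 1.0]])
depth = 1.0 + 0.1 * rng.random((48, 64))
points = backproject(depth, K)
a = np.deg2rad(5.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.01, -0.02, 0.03])
flow3d, flow2d = pose_induced_flow(points, R_true, t_true, K)
noisy = flow3d + 0.001 * rng.standard_normal(flow3d.shape)  # stand-in for a prediction
R_est, t_est = fit_rigid_motion(points, noisy, np.ones(len(points)))
```

Because the flow at every pixel is explained by a single (R, t), per-pixel noise largely averages out in the fit. This offers one intuition for why a rigid-motion formulation can regularize dense matching.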
