
Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes

Zhiyuan Yu, Zheng Qin, Lintao Zheng, Kai Xu

TL;DR

MIRETR tackles multi-instance point cloud registration in cluttered scenes by learning instance-aware correspondences through a coarse-to-fine transformer framework. The Instance-aware Geometric Transformer restricts context to per-instance neighborhoods and jointly learns superpoint features and per-instance masks, enabling reliable coarse correspondences that are extended to per-instance candidates for dense, instance-wise registration. A lightweight candidate selection and refinement stage removes duplicates and yields final per-instance poses, avoiding expensive multi-model fitting. Experimental results across Scan2CAD, ROBI, ShapeNet, and ModelNet40 demonstrate substantial accuracy gains, robustness to occlusion, and strong generalization to unseen categories, with a notable 16.6-point F1 improvement on ROBI.

Abstract

Multi-instance point cloud registration estimates the poses of multiple instances of a model point cloud in a scene point cloud. Extracting accurate point correspondences is central to the problem. Existing approaches usually treat the scene point cloud as a whole, overlooking the separation of instances. As a result, point features are easily polluted by points from the background or from other instances, which leads to inaccurate correspondences that ignore instance boundaries, especially in cluttered scenes. In this work, we propose MIRETR, Multi-Instance REgistration TRansformer, a coarse-to-fine approach to the extraction of instance-aware correspondences. At the coarse level, it jointly learns instance-aware superpoint features and predicts per-instance masks. With the instance masks, influence from outside the instance of interest is minimized, so that highly reliable superpoint correspondences can be extracted. The superpoint correspondences are then extended to instance candidates at the fine level according to the instance masks. Finally, an efficient candidate selection and refinement algorithm is devised to obtain the final registrations. Extensive experiments on three public benchmarks demonstrate the efficacy of our approach. In particular, MIRETR outperforms the state of the art by 16.6 points in F1 score on the challenging ROBI benchmark. Code and models are available at https://github.com/zhiyuanYU134/MIRETR.
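
The abstract does not spell out the candidate selection step, so the following is only a minimal sketch of one generic way to remove duplicate instance candidates: a greedy, non-maximum-suppression-style pass over candidate poses. The function names, the dictionary layout (`R`, `t`, `score`), and both thresholds are assumptions made for illustration and are not taken from the MIRETR code.

```python
import math

def rotation_angle_deg(Ra, Rb):
    """Geodesic angle in degrees between two 3x3 rotations given as nested lists."""
    # trace(Ra^T Rb) equals the element-wise product of Ra and Rb, summed.
    trace = sum(Ra[i][j] * Rb[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for safety
    return math.degrees(math.acos(cos_theta))

def select_candidates(candidates, max_angle_deg=15.0, max_dist=0.1):
    """Greedy duplicate removal over candidate poses (hypothetical thresholds).

    Each candidate is a dict with a rotation "R", a translation "t", and a
    "score" (for example, an inlier count). Candidates are visited from the
    highest score down; one is dropped if an already kept pose is close to it
    in both rotation and translation.
    """
    kept = []
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        is_duplicate = any(
            rotation_angle_deg(cand["R"], k["R"]) < max_angle_deg
            and math.dist(cand["t"], k["t"]) < max_dist
            for k in kept
        )
        if not is_duplicate:
            kept.append(cand)
    return kept
```

Under these assumptions, two candidates that land on the same object collapse into the higher-scoring one, while candidates at clearly different positions or orientations are all kept as separate instances.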

Paper Structure (26 sections, 21 equations, 9 figures, 10 tables)


Figures (9)

  • Figure 1: MIRETR significantly improves the multi-instance registration results in cluttered scenes compared to the state-of-the-art GeoTransformer (Qin et al., 2022). Benefiting from instance-aware correspondences, our method generates more accurate registrations (see the yellow boxes) and registers heavily occluded instances with severe geometric deficiency (see the red boxes).
  • Figure 2: Overall pipeline of MIRETR. The backbone progressively downsamples the two point clouds and extracts multi-level features. At the coarse level, the Instance-aware Geometric Transformer module extracts instance-aware superpoint features and establishes reliable superpoint correspondences. At the fine level, the superpoint correspondences are extended to instance candidates, where instance-wise point correspondences are extracted to estimate per-candidate poses. Finally, a simple but effective candidate selection and refinement algorithm generates the final registrations.
  • Figure 3: Comparison of (a) global attention, (b) local attention, and (c) instance-aware attention. The superpoints (patches) participating in the attention computation are color-coded. The anchor superpoints are in red. The $k$-nearest neighbors of the anchor are enclosed by the purple line.
  • Figure 4: Structure of the Instance masking block.
  • Figure 5: Registration results on the ROBI benchmark. We visualize the successfully registered instances in (c) and (d). MIRETR registers more instances in cluttered scenes (the $2^{\text{nd}}$ and $3^{\text{rd}}$ rows) and with incomplete geometry (all three rows). It also extracts more accurate correspondences, benefiting from the instance-aware correspondence learning mechanism.
  • ...and 4 more figures
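
The three attention schemes compared in Figure 3 differ in which superpoints an anchor is allowed to attend to. The sketch below illustrates that difference under simplifying assumptions: instance membership is supplied as known integer labels (`instance_ids`), whereas the paper predicts per-instance masks, and the function name and arguments are hypothetical.

```python
import math

def attention_context(points, instance_ids, anchor, k, mode):
    """Return the indices of superpoints that the anchor attends to.

    mode="global":   every superpoint in the scene.
    mode="local":    the k nearest neighbors of the anchor (anchor included).
    mode="instance": the k nearest neighbors that share the anchor's
                     instance label (labels are assumed known here).
    """
    n = len(points)
    if mode == "global":
        return list(range(n))
    # Sort all superpoints by Euclidean distance to the anchor.
    order = sorted(range(n), key=lambda j: math.dist(points[anchor], points[j]))
    knn = order[:k]
    if mode == "local":
        return knn
    if mode == "instance":
        return [j for j in knn if instance_ids[j] == instance_ids[anchor]]
    raise ValueError(f"unknown mode: {mode}")

# Five superpoints on a line; the first two belong to the anchor's instance.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (10, 0, 0)]
instance_ids = [0, 0, 1, 1, 0]
for mode in ("global", "local", "instance"):
    print(mode, attention_context(points, instance_ids, anchor=0, k=3, mode=mode))
```

In this toy setup the instance-aware context is a subset of the local one: neighbors that fall inside the k-nearest set but carry a different instance label are excluded, which is the filtering effect the figure depicts.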