3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration

Liyuan Zhang, Le Hui, Qi Liu, Bo Li, Yuchao Dai

TL;DR

The paper introduces 3DFMNet, a center-first approach to multi-instance point cloud registration that decomposes the problem into multiple pairwise registrations. It uses a 3D multi-object focusing module to locate object centers and generate proposals, followed by a 3D dual-masking instance matching module to estimate robust pairwise correspondences via instance and overlap masks. The framework employs attention-based feature correlation, ball-query object proposals, and an optimal-transport-based matching mechanism, optimized by dedicated focusing and matching losses. Experiments on Scan2CAD and ROBI demonstrate state-of-the-art performance, with analysis showing the potential upper-bound gains when centers are known, and ablations validating the necessity of both masking components. The work offers practical improvements for scene-CAD alignment in cluttered environments and provides broader insights for tasks like multi-target tracking and map construction.
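The ball-query proposal step mentioned above can be illustrated with a minimal sketch. It assumes the object centers have already been predicted and that a proposal is simply the set of scene points within a fixed radius of a center. The function names, the radius, and the toy coordinates are hypothetical and are not taken from the authors' code.

```python
import math

def ball_query(points, center, radius):
    """Return indices of points within `radius` of `center` (Euclidean distance)."""
    return [i for i, p in enumerate(points) if math.dist(p, center) <= radius]

def generate_proposals(scene_points, centers, radius):
    """Build one proposal (a list of scene point indices) per predicted center."""
    return [ball_query(scene_points, c, radius) for c in centers]

# Toy scene: two small clusters plus one far-away clutter point.
scene = [
    (0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (2.1, 0.1, 0.0),
    (5.0, 5.0, 5.0),
]
# Hypothetical centers, standing in for the focusing module's output.
centers = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

proposals = generate_proposals(scene, centers, radius=0.5)
print(proposals)  # [[0, 1], [2, 3]]
```

Each proposal can then be registered against the model on its own, which is what turns the multi-instance problem into several pairwise ones.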

Abstract

Multi-instance point cloud registration aims to estimate the pose of all instances of a model point cloud in the whole scene. Existing methods all adopt the strategy of first obtaining the global correspondence and then clustering to obtain the pose of each instance. However, due to the cluttered and occluded objects in the scene, it is difficult to obtain an accurate correspondence between the model point cloud and all instances in the scene. To this end, we propose a simple yet powerful 3D focusing-and-matching network for multi-instance point cloud registration by learning multiple pair-wise point cloud registrations. Specifically, we first present a 3D multi-object focusing module to locate the center of each object and generate object proposals. By using self-attention and cross-attention to associate the model point cloud with structurally similar objects, we can locate potential matching instances by regressing object centers. Then, we propose a 3D dual-masking instance matching module to estimate the pose between the model point cloud and each object proposal. It applies an instance mask and an overlap mask to accurately predict the pair-wise correspondence. Extensive experiments on two public benchmarks, Scan2CAD and ROBI, show that our method achieves new state-of-the-art performance on the multi-instance point cloud registration task. Code is available at https://github.com/zlynpu/3DFMNet.
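To make the dual-masking idea concrete, the sketch below filters candidate correspondences with two boolean masks. It is an illustration under stated assumptions, not the authors' implementation: it assumes the instance mask flags proposal points that belong to the object, the overlap mask flags model points that are visible in the proposal, and matches are picked by a simple per-row maximum rather than the paper's matching mechanism. All names, scores, and the threshold are hypothetical.

```python
def masked_matches(similarity, instance_mask, overlap_mask, min_score=0.5):
    """Pick the best model point for each proposal point, honoring both masks.

    similarity[i][j] -- score between proposal point i and model point j
    instance_mask[i] -- True if proposal point i lies on the object (assumed meaning)
    overlap_mask[j]  -- True if model point j is in the overlap region (assumed meaning)
    """
    matches = []
    for i, row in enumerate(similarity):
        if not instance_mask[i]:
            continue  # background or clutter point inside the proposal
        candidates = [(score, j) for j, score in enumerate(row) if overlap_mask[j]]
        if not candidates:
            continue
        score, j = max(candidates)
        if score >= min_score:
            matches.append((i, j))
    return matches

# Toy scores for 3 proposal points against 3 model points.
similarity = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.95],
    [0.7, 0.6, 0.1],
]
instance_mask = [True, True, False]  # third proposal point is clutter
overlap_mask = [True, True, False]   # third model point is not observed

matches = masked_matches(similarity, instance_mask, overlap_mask)
print(matches)  # [(0, 0), (1, 1)]
```

Without the overlap mask, the second proposal point would be paired with the unobserved model point 2 (score 0.95). Without the instance mask, the clutter point would contribute a spurious match.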

Paper Structure

This paper contains 13 sections, 13 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Comparison between our method and existing methods in multi-instance point cloud registration. Our method decomposes multi-instance point cloud registration into multiple pair-wise point cloud registrations.
  • Figure 2: The framework of our 3D focusing-and-matching network for multi-instance point cloud registration. Given the scene point cloud and the CAD model, we first present the 3D multi-object focusing module to localize the centers of the potential objects in the scene. Then, we design the 3D dual-masking instance matching module to learn pair-wise point cloud registration from the localized object proposals.
  • Figure 2: Result related to the 3D multi-object focusing module.
  • Figure 3: Registration results on the test set of the Scan2CAD dataset. We visualize the successfully registered instances of MIRETR [yu2024learning] in (b) and ours in (c). "# Inst" means the number of registered instances. Note that, for a better view, we draw green boxes for the ground truth and red boxes for the predicted correspondences.
  • Figure 4: Per scene time on the ROBI dataset.
  • ...and 2 more figures