ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency

Shaocheng Yan, Pengcheng Shi, Jiayuan Li

TL;DR

ML-SemReg tackles low-overlap point cloud registration by exploiting semantic information at local and scene levels. It combines Group Matching (GM), enforcing Local Semantic Consistency to form LS-consistent groups, with Mask Matching (MM) using Binary Multi-Ring Semantic Signatures (BMR-SS) to enforce Scene Semantic Consistency within groups. The plug-and-play framework delivers higher inlier ratios and robust registration across KITTI and ScanNet, outperforming strong baselines and showing resilience to semantic segmentation noise. These gains extend to both traditional estimators like RANSAC and modern deep pipelines, underscoring practical impact for accurate 3D alignment in challenging real-world scenes.

Abstract

Recent advances in point cloud registration mostly leverage geometric information. Although these methods have yielded promising results, they still struggle in low-overlap scenarios, which limits their practical use. In this paper, we propose ML-SemReg, a plug-and-play point cloud registration framework that fully exploits semantic information. Our key insight is that, once semantic cues are available, mismatches can be categorized into two types, inter-class and intra-class, and both can be addressed by utilizing multi-level semantic consistency. We first propose a Group Matching module to address inter-class mismatching, outputting multiple matching groups that inherently satisfy Local Semantic Consistency. For each group, a Mask Matching module based on Scene Semantic Consistency is then introduced to suppress intra-class mismatching. Benefiting from these two modules, ML-SemReg generates correspondences with a high inlier ratio. Extensive experiments demonstrate the excellent performance and robustness of ML-SemReg; e.g., on hard cases of the KITTI dataset, the Registration Recall of MAC increases by almost 34 percentage points when equipped with ML-SemReg. Code is available at https://github.com/Laka-3DV/ML-SemReg

Paper Structure (22 sections, 16 equations, 12 figures, 11 tables)

Figures (12)

  • Figure 1: Inter-class and intra-class mismatching. We use various shapes (such as rectangles, triangles, etc.) to represent different object categories (semantic labels) and depict them in different colors. Keypoint (inside the red circle) $\boldsymbol{p}_{1}$ shares the same local geometric features with target keypoints $\boldsymbol{q}_{1\sim 5}$, resulting in inter-class mismatching between $\boldsymbol{p}_1$ and $\boldsymbol{q}_{2\sim 3}$, and intra-class mismatching between $\boldsymbol{p}_1$ and $\boldsymbol{q}_{4\sim 5}$.
  • Figure 2: ML-SemReg Framework. For keypoints from the source and target point clouds, the proposed ML-SemReg (1) constructs the BMR-SS and Local-SS of each keypoint to perceive scene and local semantic information, respectively. (2) Simultaneously, the local geometric feature is calculated by a descriptor (e.g., FPFH, Rusu et al., 2009). (3) For each semantic category, the GM module produces a matching group in which the keypoints satisfy LS-Consistency with each other. (4) For each matching group, the MM module constructs a scene consistency mask, outputting sub-correspondence sets with multi-level semantic consistency. (5) Finally, the union of these subsets produces high-quality correspondences, which are used to estimate the rigid transformation matrix.
  • Figure 3: The same locations exhibit scene similarity across different LiDAR scans. Keypoint $\boldsymbol{p}_i$ is connected to the landmarks (object centroids) within its maximum receptive field radius $NL$. For $\mathbf{H}_i^{\mathcal{P}}$, the $t$-th row and $k$-th column correspond to semantic label $s_t$ and the $k$-th ring, respectively. The final scene similarity between $\boldsymbol{p}_i$ and $\boldsymbol{q}_j$ is $14$, since $\varTheta \left( \boldsymbol{p}_i,\boldsymbol{q}_j \right) =\left| \mathbf{H}_{i}^{\mathcal{P}}\odot \mathbf{H}_{j}^{\mathcal{Q}} \right|=14$.
  • Figure 4: Qualitative comparisons on the KITTI dataset of the NN, NN+Ours, OT, and OT+Ours matchers using the FPFH descriptor. The first and second rows depict examples from the easy (10-20 m) and hard (20-30 m) datasets, respectively. Inliers (residuals $<0.5\,\mathrm{m}$) are indicated by green lines. Best viewed on screen.
  • Figure 5: Sensitivity of the inlier ratio (IR) to $N$ and $L$. Each scatter point is the mean of the metric, and $\delta$ is the standard deviation over all samples.
  • ...and 7 more figures
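
The scene similarity in the Figure 3 caption, $\varTheta(\boldsymbol{p}_i,\boldsymbol{q}_j)=|\mathbf{H}_i^{\mathcal{P}}\odot\mathbf{H}_j^{\mathcal{Q}}|$, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the signature is a binary matrix with one row per semantic label and one column per ring, that there are $N$ concentric rings of equal width $L$ (suggested by the radius $NL$), and that an entry is 1 when at least one landmark with that label falls in that ring. The function names `bmr_ss` and `scene_similarity` and all parameter names are hypothetical.

```python
import numpy as np

def bmr_ss(keypoint, landmarks, labels, num_labels, n_rings, ring_width):
    """Build a binary (num_labels x n_rings) signature for one keypoint.

    Assumption: ring k (0-based) covers distances [k*L, (k+1)*L), and
    landmarks beyond the radius N*L are ignored.
    """
    keypoint = np.asarray(keypoint, dtype=float)
    landmarks = np.asarray(landmarks, dtype=float).reshape(-1, keypoint.size)
    labels = np.asarray(labels, dtype=int)

    H = np.zeros((num_labels, n_rings), dtype=np.uint8)
    dists = np.linalg.norm(landmarks - keypoint, axis=1)
    rings = np.floor(dists / ring_width).astype(int)
    inside = rings < n_rings  # keep landmarks within the receptive field
    H[labels[inside], rings[inside]] = 1
    return H

def scene_similarity(H_p, H_q):
    """Count entries that are 1 in both signatures, i.e. |H_p (.) H_q|."""
    return int(np.sum(H_p & H_q))

# Toy example: 2 semantic labels, N = 3 rings of width L = 2.0.
H_p = bmr_ss([0, 0, 0],
             [[1, 0, 0], [0, 3, 0], [0, 0, 7]], [0, 1, 1],
             num_labels=2, n_rings=3, ring_width=2.0)
H_q = bmr_ss([10, 0, 0],
             [[10, 1, 0], [13, 0, 0], [10, 0, 5]], [0, 1, 1],
             num_labels=2, n_rings=3, ring_width=2.0)
print(H_p)
print(H_q)
print(scene_similarity(H_p, H_q))
```

Because the signature depends only on landmark distances and labels, it is unchanged by a rigid motion of the scene under these assumptions, which is why the same location observed in two scans can score a high similarity.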

Theorems & Definitions (2)

  • Proof
  • Definition: Ring-wise Semantic Consistency