PointNorm-Net: Self-Supervised Normal Prediction of 3D Point Clouds via Multi-Modal Distribution Estimation

Jie Zhang, Minghui Nie, Changqing Zou, Jian Liu, Ligang Liu, Junjie Cao

TL;DR

The paper tackles the challenge of estimating surface normals on real-world 3D point clouds without ground-truth annotations, addressing the domain gap between synthetic training data and real scans. It introduces PointNorm-Net, a self-supervised framework that uses a three-stage multi-modal distribution estimation strategy to identify the major mode of candidate normals, enabling robust normal prediction even at sharp features. The method combines a patch-based normal predictor with a candidate-consensus training objective and custom losses, achieving superior generalization on real Kinect, LiDAR, and TLS datasets while remaining efficient at inference. Its ground-truth sampling theory and multi-sample consensus paradigm offer a general approach that can be integrated with optimization-based or learning-based normal estimation and extended to other self-supervised point-cloud tasks.

Abstract

Although supervised deep normal estimators have recently shown impressive results on synthetic benchmarks, their performance deteriorates significantly in real-world scenarios due to the domain gap between synthetic and real data. Building high-quality real training data to boost those supervised methods is not trivial because point-wise annotation of normals for varying-scale real-world 3D scenes is a tedious and expensive task. This paper introduces PointNorm-Net, the first self-supervised deep learning framework to tackle this challenge. The key novelty of PointNorm-Net is a three-stage multi-modal normal distribution estimation paradigm that can be integrated into either deep or traditional optimization-based normal estimation frameworks. Extensive experiments show that our method achieves superior generalization and outperforms state-of-the-art conventional and deep learning approaches across three real-world datasets that exhibit distinct characteristics compared to the synthetic training data.

Paper Structure

This paper contains 27 sections, 2 theorems, 11 equations, 12 figures, 9 tables.

Key Result

Theorem 1

If $E\{\varepsilon_{j}\}=0$, then we have $E\left\{\dot{\mathbf{n}}^{t}_{\theta}\right\}=E\left\{\tilde{\mathbf{n}}^{t}_{\theta}\right\}$.

Figures (12)

  • Figure 1: PointNorm-Net outperforms traditional optimization-based methods and supervised deep normal estimators on real-world point cloud datasets, such as LiDAR sequence 06 of KITTI [geiger2012we] (left) and the PCV Kinect dataset [MultiNormal2019] (right). The normal orientations are color-coded. On the left, the normals estimated by DeepFit and PointNorm-Net are shown, with zoomed-in views in the blue and red boxes, respectively, for quality comparison. Compared with the state-of-the-art deep-learning-based approach DeepFit [DeepFit], PointNorm-Net better preserves sharp features and tiny structures while eliminating scanning noise more effectively. On the right, the accuracy-versus-efficiency plot and four results on the PCV dataset indicate that PointNorm-Net is as fast as supervised deep methods, yet achieves much better accuracy.
  • Figure 2: A 2D illustration of the candidate normal distribution for a point close to a sharp edge.
  • Figure 3: The training pipeline of the proposed PointNorm-Net. PointNorm-Net is a self-supervised and network-agnostic framework. Many patch-based normal estimation networks can be employed as the normal predictor. Note that during inference, the network requires only a single forward pass (i.e., the major mode estimation stage); stages 1 and 2 are not executed.
  • Figure 4: We expect to find normals corresponding to planes with more inlier points (green points). The thickness of each drawn normal is determined by its number of inlier points.
  • Figure 5: The candidate consensus loss function can effectively exclude the influence of disturbing normals (dotted normals). The red normals in (a) and (b) are the average normals of all candidates and of the solid candidates, respectively. The candidates in the green circles are regarded as inliers of the red normals. The darker the color, the higher their contribution to the red normals, which are determined by the candidate consensus loss.
  • ...and 7 more figures
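
The ideas in the captions of Figures 4 and 5 (prefer candidate normals whose planes have more inlier points, then let mutually consistent candidates dominate the result) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the random three-point plane sampling, the Gaussian agreement kernel, and every name and parameter here (`candidate_normals`, `consensus_normal`, `inlier_threshold`, `sigma`) are assumptions made only for illustration, and the paper's actual stages and loss may differ.

```python
import numpy as np


def candidate_normals(patch, num_candidates=100, inlier_threshold=0.05, rng=None):
    """Fit planes to random 3-point subsets of a patch and count inliers per plane.

    Hypothetical helper: RANSAC-style sampling is an assumption, not the paper's method.
    """
    rng = np.random.default_rng() if rng is None else rng
    normals, counts = [], []
    for _ in range(num_candidates):
        idx = rng.choice(len(patch), size=3, replace=False)
        p0, p1, p2 = patch[idx]
        n = np.cross(p1 - p0, p2 - p0)
        length = np.linalg.norm(n)
        if length < 1e-12:  # skip degenerate (collinear) samples
            continue
        n = n / length
        # Point-to-plane distances of every patch point to the sampled plane.
        dist = np.abs((patch - p0) @ n)
        normals.append(n)
        counts.append(int(np.sum(dist < inlier_threshold)))
    return np.array(normals), np.array(counts)


def consensus_normal(normals, weights, sigma=0.2, iterations=10):
    """Iteratively re-weighted average that favors candidates agreeing with the estimate.

    Starts from the best-supported candidate; candidates far from the current
    estimate are down-weighted by a Gaussian kernel (an assumed choice).
    """
    n = normals[np.argmax(weights)]
    for _ in range(iterations):
        cos = normals @ n
        # Normals are unoriented here, so flip candidates into the same hemisphere.
        signs = np.where(cos < 0, -1.0, 1.0)
        aligned = normals * signs[:, None]
        w = weights * np.exp(-(((1.0 - np.abs(cos)) / sigma) ** 2))
        mean = (w[:, None] * aligned).sum(axis=0)
        n = mean / np.linalg.norm(mean)
    return n


# Toy patch near a sharp edge: a densely sampled floor (z = 0) meets a sparser wall (x = 0).
rng = np.random.default_rng(0)
flat = np.column_stack([rng.random(200), rng.random(200), np.zeros(200)])
wall = np.column_stack([np.zeros(60), rng.random(60), rng.random(60)])
patch = np.vstack([flat, wall])

normals, counts = candidate_normals(patch, rng=rng)
normal = consensus_normal(normals, counts.astype(float))
print(normal)
```

In this toy patch the floor has far more points than the wall, so the consensus estimate should align with the floor normal (up to sign) rather than with the blurred average of all candidates, which is the behavior Figure 5 contrasts between panels (a) and (b).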

Theorems & Definitions (2)

  • Theorem 1
  • Corollary 1