Surface Normal Estimation with Transformers

Barry Shichen Hu, Siyun Liang, Johannes Paetzold, Huy H. Nguyen, Isao Echizen, Jiapeng Tang

TL;DR

This work tackles the challenge of estimating surface normals from noisy and density-varying point clouds. It introduces SNEtransformer, a Transformer-based backbone that directly predicts unoriented normals without explicit surface fitting, unifying and simplifying prior approaches. The method achieves state-of-the-art accuracy and faster inference on PCPNet and SceneNN, while demonstrating robustness to noise and improved geometry in Poisson-based reconstructions. Ablation studies confirm the effectiveness of combining enhanced Graph Convolution with global Transformer attention and highlight design choices that drive performance gains.

Abstract

We propose the use of a Transformer to accurately predict normals from point clouds with noise and density variations. Previous learning-based methods utilize PointNet variants to explicitly extract multi-scale features at different input scales, then focus on a surface fitting method by which local point cloud neighborhoods are fitted to a geometric surface approximated by either a polynomial function or a multi-layer perceptron (MLP). However, fitting surfaces to fixed-order polynomial functions can suffer from overfitting or underfitting, and learning MLP-represented hyper-surfaces requires pre-generated per-point weights. To avoid these limitations, we first unify the design choices in previous works and then propose a simplified Transformer-based model to extract richer and more robust geometric features for the surface normal estimation task. Through extensive experiments, we demonstrate that our Transformer-based method achieves state-of-the-art performance on both the synthetic shape dataset PCPNet and the real-world indoor scene dataset SceneNN, exhibiting more noise-resilient behavior and significantly faster inference. Most importantly, we demonstrate that the sophisticated hand-designed modules in existing works are not necessary to excel at the task of surface normal estimation.
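To make the "fixed-order polynomial" limitation concrete, the following is a minimal sketch of a classical polynomial (jet-style) surface fit on a single neighborhood. It is not the method proposed in this paper, nor the exact procedure of any cited baseline. The function name `jet_normal`, the `order` parameter, and the choice to evaluate the normal at the neighborhood centroid are assumptions made for illustration. The polynomial order must be fixed in advance: too low underfits curved regions, too high fits noise.

```python
import numpy as np


def jet_normal(neighbors, order=2):
    """Unoriented normal from a fixed-order polynomial fit (illustrative sketch).

    neighbors: (N, 3) array-like of points in one local neighborhood.
    order: polynomial order of the height function, fixed ahead of time.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    pts = np.asarray(neighbors, dtype=float)
    centered = pts - pts.mean(axis=0)

    # Local frame from PCA; eigh returns eigenvalues in ascending order,
    # so column 0 is the direction of least variance (rough normal).
    _, vecs = np.linalg.eigh(centered.T @ centered)
    frame = np.stack([vecs[:, 2], vecs[:, 1], vecs[:, 0]], axis=1)
    local = centered @ frame
    x, y, z = local[:, 0], local[:, 1], local[:, 2]

    # Least-squares fit of z = sum c_ij * x^i * y^j with i + j <= order.
    terms = [(i, j) for i in range(order + 1) for j in range(order + 1 - i)]
    design = np.stack([(x ** i) * (y ** j) for i, j in terms], axis=1)
    coeffs, *_ = np.linalg.lstsq(design, z, rcond=None)
    c = dict(zip(terms, coeffs))

    # Gradient of the height function at the local origin (the centroid).
    n_local = np.array([-c[(1, 0)], -c[(0, 1)], 1.0])
    n_world = frame @ n_local
    return n_world / np.linalg.norm(n_world)


# Usage: points sampled from the plane z = 0.5 x + 0.2 y.
g = np.linspace(-1.0, 1.0, 5)
xx, yy = np.meshgrid(g, g)
plane = np.stack([xx.ravel(), yy.ravel(), 0.5 * xx.ravel() + 0.2 * yy.ravel()], axis=1)
normal = jet_normal(plane, order=2)
```

The returned vector is unoriented: its sign depends on the PCA frame, so comparisons should use the absolute value of the dot product.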

Paper Structure

This paper contains 35 sections, 10 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: We unify and simplify existing learning-based methods for surface normal estimation by proposing a straightforward Transformer-based model that directly predicts normals without relying on surface fitting. Our greatly simplified method not only achieves state-of-the-art performance but also exhibits significantly faster inference speed than previous works. In the figure, we present the simplified pipelines of existing works for comparison, and visualize the prediction error using a heat map. Inference times are recorded as well.
  • Figure 2: Graph Convolution preserves locality, while the Transformer Encoder extracts multi-scale features. The global attention map assigns larger weights to 'more reliable' points and smaller weights to 'unreliable' ones, thereby functioning as a denoising mechanism.
  • Figure 3: a) Qualitative results on the PCPNet dataset. The point cloud heatmap shows the normal estimation error. b) Visualization of the per-point weight. CSA (AdaFit) favors smaller neighborhoods indiscriminately, while HSurf-Net is trained with weights that prioritize 'on surface' points. Meanwhile, the Transformer acquires optimal global attention weights through training on raw point cloud data.
  • Figure 4: a) Visualization of predicted normals on the Semantic3D dataset. Our method preserves sharper geometric details, as highlighted by the red and green border regions. b) Error visualization of noisy point clouds in the SceneNN dataset. Point colors correspond to the angular error mapped onto a heatmap. SNEtransformer predicts more accurate normals than baselines when the input is affected by noise.
  • Figure 5: Percentage of Good Points (PGP) graphs for the PCPNet and SceneNN datasets. The area under the blue color is enlarged and displayed in a black pane. Our method produces high-quality estimations in noisy settings.
  • ...and 1 more figure
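The captions above refer to per-point angular error and to Percentage of Good Points (PGP) curves. The sketch below shows one common way to compute both for unoriented normals. The exact definitions used in the paper are not given in this excerpt, so the function names, the use of `<=` at the threshold, and the example thresholds are assumptions.

```python
import numpy as np


def unoriented_angle_deg(pred, gt):
    """Per-point angle in degrees between predicted and ground-truth normals.

    The sign of each normal is ignored, so n and -n count as identical.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.abs(np.sum(pred * gt, axis=1))
    # Clip guards against values slightly above 1 from rounding.
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))


def rmse_deg(errors):
    """Root mean squared angular error over all points."""
    return float(np.sqrt(np.mean(np.square(errors))))


def pgp(errors, thresholds):
    """Percentage of points whose error is at most each threshold (degrees)."""
    errors = np.asarray(errors, dtype=float)
    return [float(100.0 * np.mean(errors <= t)) for t in thresholds]


# Usage with four hypothetical points: errors are roughly 0, 0, 90 and 45 degrees.
pred = [[0, 0, 1], [0, 0, -1], [1, 0, 0], [1, 1, 0]]
gt = [[0, 0, 1], [0, 0, 1], [0, 0, 1], [1, 0, 0]]
errors = unoriented_angle_deg(pred, gt)
curve = pgp(errors, thresholds=[10, 50, 95])
```

Plotting `curve` against a dense range of thresholds gives a PGP graph of the kind described for Figure 5; a curve that rises faster indicates more points with small error.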