PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies

Guocheng Qian, Yuchen Li, Houwen Peng, Jinjie Mai, Hasan Abed Al Kader Hammoud, Mohamed Elhoseiny, Bernard Ghanem

TL;DR

The paper shows that the gap between PointNet++ and modern 3D point-cloud models is largely due to training and scaling choices rather than architectural advances alone. By systematically evaluating data augmentation and optimization techniques, the authors raise PointNet++ to competitive accuracy without changing its architecture (for example, from 77.9% to 86.1% overall accuracy on ScanObjectNN). They then introduce PointNeXt, which adds an inverted residual bottleneck design, separable MLPs, and residual connections so that the network scales effectively. Across semantic segmentation, classification, and part segmentation benchmarks, PointNeXt achieves state-of-the-art results while maintaining high throughput. The work argues for studying training and scaling strategies alongside architectural innovation in 3D point cloud research.

Abstract

PointNet++ is one of the most influential neural architectures for point cloud understanding. Although the accuracy of PointNet++ has been largely surpassed by recent networks such as PointMLP and Point Transformer, we find that a large portion of the performance gain is due to improved training strategies, i.e., data augmentation and optimization techniques, and increased model sizes rather than architectural innovations. Thus, the full potential of PointNet++ has yet to be explored. In this work, we revisit the classical PointNet++ through a systematic study of model training and scaling strategies, and offer two major contributions. First, we propose a set of improved training strategies that significantly improve PointNet++ performance. For example, we show that, without any change in architecture, the overall accuracy (OA) of PointNet++ on ScanObjectNN object classification can be raised from 77.9% to 86.1%, even outperforming state-of-the-art PointMLP. Second, we introduce an inverted residual bottleneck design and separable MLPs into PointNet++ to enable efficient and effective model scaling and propose PointNeXt, the next version of PointNets. PointNeXt can be flexibly scaled up and outperforms state-of-the-art methods on both 3D classification and segmentation tasks. For classification, PointNeXt reaches an overall accuracy of 87.7% on ScanObjectNN, surpassing PointMLP by 2.3%, while being 10x faster in inference. For semantic segmentation, PointNeXt establishes a new state-of-the-art performance with 74.9% mean IoU on S3DIS (6-fold cross-validation), being superior to the recent Point Transformer. The code and models are available at https://github.com/guochengqian/pointnext.
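
The two metrics quoted in the abstract, overall accuracy (OA) and mean IoU, can be illustrated with a minimal sketch. The function names and the toy labels below are hypothetical and are not taken from the PointNeXt code base. The sketch assumes the common definitions (OA as the fraction of correct predictions, mean IoU as the per-class intersection over union averaged over the classes that occur), and benchmark-specific details such as aggregation across the S3DIS folds are not modeled.

```python
def overall_accuracy(preds, labels):
    """Fraction of samples whose predicted class equals the ground-truth class."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and equally long")
    correct = sum(p == t for p, t in zip(preds, labels))
    return correct / len(labels)


def mean_iou(preds, labels, num_classes):
    """Average of per-class IoU over classes present in preds or labels."""
    ious = []
    for c in range(num_classes):
        intersection = sum(p == c and t == c for p, t in zip(preds, labels))
        union = sum(p == c or t == c for p, t in zip(preds, labels))
        if union > 0:  # assumption: classes absent from both are skipped
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class occurs in preds or labels")
    return sum(ious) / len(ious)


# Toy example with three classes (made-up data, for illustration only).
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]

oa = overall_accuracy(preds, labels)         # 4 of 6 correct
miou = mean_iou(preds, labels, num_classes=3)  # (1/3 + 2/3 + 1/2) / 3
print(f"OA = {oa:.3f}, mIoU = {miou:.3f}")
```

The toy data shows why the two numbers can differ: OA counts every sample equally, while mean IoU weights every class equally and penalizes both false positives and false negatives for each class.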

Paper Structure (23 sections, 2 equations, 5 figures, 11 tables)

This paper contains 23 sections, 2 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Effects of training strategies and model scaling on PointNet++ (Qi et al., 2017). We show that improved training strategies (data augmentation and optimization techniques) and model scaling can significantly boost PointNet++ performance. The average overall accuracy and mIoU (6-fold cross-validation) are reported on ScanObjectNN (Uy et al., 2019) and S3DIS (Armeni et al., 2016).
  • Figure 2: PointNeXt architecture. PointNeXt shares the same Set Abstraction and Feature Propagation blocks as PointNet++ (Qi et al., 2017), while adding an MLP layer at the beginning and scaling the architecture with the proposed Inverted Residual MLP (InvResMLP) blocks.
  • Figure I: PointNeXt architecture for classification. The classification architecture shares the same encoder as the segmentation architecture.
  • Figure II: Qualitative comparisons of PointNet++ (second column), PointNeXt (third column), and Ground Truth (fourth column) on S3DIS semantic segmentation. The input point cloud is visualized with its original colors in the first column. Differences between PointNet++ and PointNeXt are highlighted with red dashed circles. Zoom in for details.
  • Figure III: Qualitative comparisons of PointNet++ (left), PointNeXt (middle), and Ground Truth (right) on ShapeNetPart part segmentation.
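
The Inverted Residual MLP (InvResMLP) block named in the Figure 2 caption combines the three ingredients listed in the abstract and TL;DR: separable MLPs, an inverted bottleneck, and a residual connection. The NumPy sketch below is a simplified illustration under stated assumptions, not the authors' implementation: it uses k-nearest-neighbor grouping, omits normalization layers, uses a 4x channel expansion, and takes random weights. All function and parameter names are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def knn_indices(xyz, k):
    """Indices of the k nearest points for every point (each point includes itself)."""
    dist2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(axis=-1)  # (N, N)
    return np.argsort(dist2, axis=1)[:, :k]                          # (N, k)


def inv_res_mlp(xyz, feats, w_local, w_expand, w_project, k=8):
    """Simplified InvResMLP-style block (assumed layout, no normalization).

    xyz:       (N, 3) point coordinates
    feats:     (N, C) point features
    w_local:   (C + 3, C) layer applied to neighborhood features
    w_expand:  (C, 4C) pointwise layer that expands the channels
    w_project: (4C, C) pointwise layer that projects back to C channels
    """
    idx = knn_indices(xyz, k)                                 # (N, k)
    rel_xyz = xyz[idx] - xyz[:, None, :]                      # (N, k, 3)
    grouped = np.concatenate([rel_xyz, feats[idx]], axis=-1)  # (N, k, C + 3)

    # Separable MLP, part 1: one layer on neighborhood features, then max reduction.
    local = relu(grouped @ w_local)                           # (N, k, C)
    pooled = local.max(axis=1)                                # (N, C)

    # Separable MLP, part 2: pointwise layers with an inverted bottleneck.
    hidden = relu(pooled @ w_expand)                          # (N, 4C)
    out = hidden @ w_project                                  # (N, C)

    # Residual connection between the block input and output.
    return relu(out + feats)


rng = np.random.default_rng(0)
N, C = 32, 16
xyz = rng.normal(size=(N, 3))
feats = rng.normal(size=(N, C))
w_local = rng.normal(size=(C + 3, C)) * 0.1
w_expand = rng.normal(size=(C, 4 * C)) * 0.1
w_project = rng.normal(size=(4 * C, C)) * 0.1

out = inv_res_mlp(xyz, feats, w_local, w_expand, w_project)
print(out.shape)
```

Because only the first layer operates on the k-times-larger neighborhood tensor, the expanded 4C-channel layers run once per point rather than once per neighbor, which is what makes stacking many such blocks affordable in this sketch.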