Table of Contents
Fetching ...

SLNet: A Super-Lightweight Geometry-Adaptive Network for 3D Point Cloud Recognition

Mohammad Saeid, Amir Salarpour, Pedram MohajerAnsari, Mert D. Pesé

TL;DR

SLNet is presented, a lightweight backbone for 3D point cloud recognition designed to achieve strong performance without the computational cost of many recent attention, graph, and deep MLP based models, and NetScore+, which extends NetScore by incorporating latency and peak memory so that efficiency can be evaluated in a more deployment oriented way.

Abstract

We present SLNet, a lightweight backbone for 3D point cloud recognition designed to achieve strong performance without the computational cost of many recent attention, graph, and deep MLP based models. The model is built on two simple ideas: NAPE (Nonparametric Adaptive Point Embedding), which captures spatial structure using a combination of Gaussian RBF and cosine bases with input adaptive bandwidth and blending, and GMU (Geometric Modulation Unit), a per channel affine modulator that adds only 2D learnable parameters. These components are used within a four stage hierarchical encoder with FPS+kNN grouping, nonparametric normalization, and shared residual MLPs. In experiments, SLNet shows that a very small model can still remain highly competitive across several 3D recognition tasks. On ModelNet40, SLNet-S with 0.14M parameters and 0.31 GFLOPs achieves 93.64% overall accuracy, outperforming PointMLP-elite with 5x fewer parameters, while SLNet-M with 0.55M parameters and 1.22 GFLOPs reaches 93.92%, exceeding PointMLP with 24x fewer parameters. On ScanObjectNN, SLNet-M achieves 84.25% overall accuracy within 1.2 percentage points of PointMLP while using 28x fewer parameters. For large scale scene segmentation, SLNet-T extends the backbone with local Point Transformer attention and reaches 58.2% mIoU on S3DIS Area 5 with only 2.5M parameters, more than 17x fewer than Point Transformer V3. We also introduce NetScore+, which extends NetScore by incorporating latency and peak memory so that efficiency can be evaluated in a more deployment oriented way. Across multiple benchmarks and hardware settings, SLNet delivers a strong overall balance between accuracy and efficiency. Code is available at: https://github.com/m-saeid/SLNet.

SLNet: A Super-Lightweight Geometry-Adaptive Network for 3D Point Cloud Recognition

TL;DR

SLNet is presented, a lightweight backbone for 3D point cloud recognition designed to achieve strong performance without the computational cost of many recent attention, graph, and deep MLP based models, and NetScore+, which extends NetScore by incorporating latency and peak memory so that efficiency can be evaluated in a more deployment oriented way.

Abstract

We present SLNet, a lightweight backbone for 3D point cloud recognition designed to achieve strong performance without the computational cost of many recent attention, graph, and deep MLP based models. The model is built on two simple ideas: NAPE (Nonparametric Adaptive Point Embedding), which captures spatial structure using a combination of Gaussian RBF and cosine bases with input adaptive bandwidth and blending, and GMU (Geometric Modulation Unit), a per channel affine modulator that adds only 2D learnable parameters. These components are used within a four stage hierarchical encoder with FPS+kNN grouping, nonparametric normalization, and shared residual MLPs. In experiments, SLNet shows that a very small model can still remain highly competitive across several 3D recognition tasks. On ModelNet40, SLNet-S with 0.14M parameters and 0.31 GFLOPs achieves 93.64% overall accuracy, outperforming PointMLP-elite with 5x fewer parameters, while SLNet-M with 0.55M parameters and 1.22 GFLOPs reaches 93.92%, exceeding PointMLP with 24x fewer parameters. On ScanObjectNN, SLNet-M achieves 84.25% overall accuracy within 1.2 percentage points of PointMLP while using 28x fewer parameters. For large scale scene segmentation, SLNet-T extends the backbone with local Point Transformer attention and reaches 58.2% mIoU on S3DIS Area 5 with only 2.5M parameters, more than 17x fewer than Point Transformer V3. We also introduce NetScore+, which extends NetScore by incorporating latency and peak memory so that efficiency can be evaluated in a more deployment oriented way. Across multiple benchmarks and hardware settings, SLNet delivers a strong overall balance between accuracy and efficiency. Code is available at: https://github.com/m-saeid/SLNet.
Paper Structure (18 sections, 6 equations, 5 figures, 13 tables)

This paper contains 18 sections, 6 equations, 5 figures, 13 tables.

Figures (5)

  • Figure 1: NetScore and NetScore$^{+}$ across four datasets and two hardware platforms (RTX 3090 and Jetson Orin Nano), spanning 12 metric–dataset–hardware axes. SLNet-S and SLNet-M consistently occupy the outermost positions, establishing Pareto-optimal deployability across all evaluated configurations.
  • Figure 2: SLNet architecture: a NAPE+GMU front end, a four-stage hierarchical encoder with FPS, parameter-free normalization, and lightweight residual refinement, and task-specific heads for classification and segmentation.
  • Figure 3: NAPE: raw 3D coordinates mapped via fused Gaussian RBF and cosine bases with data-driven adaptive bandwidth, yielding a parameter-free geometry encoding.
  • Figure 4: GMU: per-channel affine recalibration of the NAPE output with only $2D$ learnable scalars.
  • Figure 5: Qualitative comparison of saliency on ModelNet40 samples: top row—DGCNN stage-2 EdgeConv gradient saliency; middle and bottom—NAPE gradient saliency (channel widths 16 and 32). NAPE yields sharper, semantically focused responses despite operating at the input layer.