SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds

Yanbo Wang, Wentao Zhao, Chuan Cao, Tianchen Deng, Jingchuan Wang, Weidong Chen

TL;DR

SFPNet introduces Sparse Focal Point Modulation (SFPM) as a generalizable alternative to window-attention for LiDAR semantic segmentation, enabling multi-level contextualization and adaptive fusion without LiDAR-specific inductive bias. The model uses a gate-based aggregation to combine local and global contexts and a channel-wise information query to preserve feature semantics, achieving strong performance across mechanical spinning, solid-state, and hybrid-solid LiDAR datasets. A new large-scale hybrid-solid LiDAR dataset, S.MID, is introduced to evaluate robotic-substation scene understanding, with SFPM delivering state-of-the-art results on this dataset and competitive results on traditional benchmarks. The work demonstrates robust generalization and interpretability of SFPM, and sets a foundation for broader LiDAR data fusion and applications beyond driving scenarios.

Abstract

Although LiDAR semantic segmentation is advancing rapidly, state-of-the-art methods often incorporate specifically designed inductive biases derived from benchmarks that originate from mechanical spinning LiDAR. This can limit model generalizability to other kinds of LiDAR technologies and make hyperparameter tuning more complex. To tackle these issues, we propose a generalized framework that accommodates the various types of LiDAR prevalent in the market by replacing window-attention with our sparse focal point modulation. Our SFPNet extracts multi-level contexts and dynamically aggregates them using a gate mechanism. A channel-wise information query then encodes features that incorporate both local and global contexts. We also introduce a novel large-scale hybrid-solid LiDAR semantic segmentation dataset for robotic applications. SFPNet demonstrates competitive performance on conventional benchmarks derived from mechanical spinning LiDAR, while achieving state-of-the-art results on a benchmark derived from solid-state LiDAR. Additionally, it outperforms existing methods on our novel dataset sourced from hybrid-solid LiDAR. Code and dataset are available at https://github.com/Cavendish518/SFPNet and https://www.semanticindustry.top.
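The gate-based aggregation and channel-wise query described in the abstract can be illustrated with a minimal dense NumPy sketch. This is not the authors' implementation: the function name, the sigmoid gate, and the assumption that the per-level context features are already computed (rather than produced by sparse convolutions) are all hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_multilevel_modulation(query, contexts, gate_logits):
    """Hypothetical sketch of gated multi-level context aggregation.

    query:       (N, C) per-point query features
    contexts:    list of L arrays, each (N, C), ordered local -> global
                 (assumed precomputed; the paper extracts them sparsely)
    gate_logits: (N, L) per-point score for each context level
    """
    gates = sigmoid(gate_logits)              # assumption: sigmoid gating
    modulator = np.zeros_like(query)
    for level, ctx in enumerate(contexts):
        # Each point weights each context level by its own gate value.
        modulator += gates[:, level:level + 1] * ctx
    # Channel-wise (element-wise) query of the aggregated context.
    return query * modulator

# Usage: 4 points, 3 channels, 2 context levels.
q = np.ones((4, 3))
local_ctx = np.ones((4, 3))
global_ctx = 3.0 * np.ones((4, 3))
out = gated_multilevel_modulation(q, [local_ctx, global_ctx], np.zeros((4, 2)))
```

With all gate logits at zero, every gate equals 0.5, so each point receives an equal blend of its local and global context before the element-wise query is applied.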

Paper Structure (46 sections, 6 equations, 13 figures, 12 tables)

Figures (13)

  • Figure 1: Comparison of different types of LiDAR. Panel (a) compares three mainstream types of LiDAR technologies. Unlike camera images, data from different LiDAR types have extremely different point distributions, so networks designed specifically for one LiDAR type generalize poorly. Panel (b) contrasts the cumulative 1-second point clouds of the Mid-360 (employed in our dataset) and the commonly used VLP-32C. The non-repetitive scanning mode of the Mid-360 covers a broader range of the scene, making it more suitable for industrial robots performing scene understanding tasks. Meanwhile, the VLP-32C gathers more detailed road surface information.
  • Figure 1: Sensors and comparison between single frame and cumulative 1-second point clouds for Livox Mid-360. Although the single-frame point cloud is relatively sparse, the cumulative point cloud can better express the scene in the vertical direction. Please also note that only data collected by Livox Mid-360 and the corresponding labels are used in this research and have been released with S.MID.
  • Figure 2: Heuristic comparison with mainstream designs. Based on the point distribution of mechanical spinning LiDAR, cylindrical partition (Zhu et al., 2021) and radial window (Lai et al., 2023) were proposed to extract the features of distant points. Focal neighborhood addresses this problem by aggregating multi-level contexts. Since no special inductive bias is introduced, it can be elegantly applied to various kinds of LiDAR, as shown in Figure 1.
  • Figure 2: Example of maps built in the annotation process.
  • Figure 3: Illustration of sparse focal point modulation.
  • ...and 8 more figures