
FRNet: Frustum-Range Networks for Scalable LiDAR Segmentation

Xiang Xu, Lingdong Kong, Hui Shuai, Qingshan Liu

TL;DR

FRNet introduces a Frustum-Range representation that integrates per-point geometry into a range-view backbone to achieve scalable, end-to-end LiDAR segmentation. The framework has three parts, and none of them needs post-processing:
- a Frustum Feature Encoder (FFE) that extracts per-point features within frustum regions;
- a Frustum-Point (FP) fusion module that hierarchically updates point and frustum features;
- a Fusion Head (FH) that combines multi-level features for the final predictions.

It also adds two lightweight techniques: FrustumMix for data augmentation and RangeInterpolation for range-image reconstruction. These improve robustness and representation quality. Across four major LiDAR benchmarks, FRNet attains competitive accuracy: 73.3% mIoU on the SemanticKITTI test set and 82.5% mIoU on the nuScenes test set. It also runs inference roughly 5x faster than state-of-the-art methods, which makes it well suited to real-time, scalable perception for autonomous driving.
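The summary names FrustumMix but does not spell out its procedure. The sketch below is one plausible reading, consistent with the two-color mixed scene in Figure 5: the 360-degree view is cut into equal azimuth sectors (frustums), and alternate sectors are taken from each scan. Several details are assumptions made for illustration: the azimuth-only split, the sector count, and the names `frustum_mix` and `azimuth`.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the sensor frame


def azimuth(p: Point) -> float:
    """Horizontal angle of a point, wrapped into [0, 2*pi)."""
    return math.atan2(p[1], p[0]) % (2 * math.pi)


def frustum_mix(scene1: List[Point], scene2: List[Point],
                num_sectors: int = 4) -> List[Tuple[Point, int]]:
    """Hypothetical FrustumMix-style mixing along the azimuth direction.

    The 360-degree view is cut into `num_sectors` equal azimuth sectors.
    Even-indexed sectors keep the points of scene 1; odd-indexed sectors keep
    the points of scene 2. Each returned point is tagged with its source (1 or 2).
    """
    width = 2 * math.pi / num_sectors
    mixed: List[Tuple[Point, int]] = []
    for source, scene in ((1, scene1), (2, scene2)):
        for p in scene:
            # min() guards against float wrap-around landing exactly on 2*pi.
            sector = min(int(azimuth(p) // width), num_sectors - 1)
            owner = 1 if sector % 2 == 0 else 2
            if owner == source:
                mixed.append((p, source))
    return mixed


# Usage: both scans contain the same two rays; each sector keeps one scan's point.
mixed = frustum_mix([(1.0, 0.1, 0.0), (-1.0, 0.1, 0.0)],
                    [(1.0, 0.1, 0.0), (-1.0, 0.1, 0.0)])
print(mixed)  # [((1.0, 0.1, 0.0), 1), ((-1.0, 0.1, 0.0), 2)]
```

In a real training pipeline, each point's semantic label would travel with it, just as the source tag does here.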

Abstract

LiDAR segmentation has become a crucial component of advanced autonomous driving systems. Recent range-view LiDAR segmentation approaches show promise for real-time processing. However, they inevitably suffer from corrupted contextual information and rely heavily on post-processing techniques for prediction refinement. In this work, we propose FRNet, a simple yet powerful method that restores the contextual information of range-image pixels using the LiDAR points in each pixel's frustum. First, a frustum feature encoder module extracts per-point features within the frustum region; this preserves scene consistency, which is critical for point-level predictions. Next, a frustum-point fusion module updates per-point features hierarchically, enabling each point to gather more surrounding context from the frustum features. Finally, a fusion head module fuses features from different levels to produce the final semantic predictions. Extensive experiments on four popular LiDAR segmentation benchmarks under various task setups demonstrate the superiority of FRNet. Notably, FRNet achieves 73.3% and 82.5% mIoU on the SemanticKITTI and nuScenes test sets, respectively. While achieving competitive performance, FRNet runs 5 times faster than state-of-the-art approaches. This efficiency opens up new possibilities for more scalable LiDAR segmentation. The code is publicly available at https://github.com/Xiangxu-0103/FRNet.
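The abstract describes restoring range-pixel context from the points that fall in each pixel's frustum. The sketch below illustrates that grouping step under several assumptions:
- each range-image pixel defines one frustum;
- points are placed with a RangeNet++-style spherical projection;
- aggregation is an element-wise max;
- raw (x, y, z, range) values stand in for learned MLP features.

The names `project_to_pixel` and `frustum_encode` and the field-of-view defaults are hypothetical. They are not taken from the FRNet code.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Pixel = Tuple[int, int]  # (row, col) in the range image


def project_to_pixel(p: Point, height: int, width: int,
                     fov_up_deg: float = 3.0, fov_down_deg: float = -25.0) -> Pixel:
    """Spherical projection of one point onto an H x W range image.

    The field-of-view defaults are illustrative assumptions, not FRNet settings.
    """
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    yaw = math.atan2(y, x)                        # horizontal angle in (-pi, pi]
    pitch = math.asin(z / r) if r > 0 else 0.0    # vertical angle
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down     # total vertical field of view
    col = int(0.5 * (1.0 - yaw / math.pi) * width)
    row = int((1.0 - (pitch - fov_down) / fov) * height)
    # Clamp points outside the assumed field of view to the image border.
    return min(max(row, 0), height - 1), min(max(col, 0), width - 1)


def frustum_encode(points: Sequence[Point], height: int, width: int):
    """Group points into frustums (one per pixel), max-pool, and gather back."""
    # Stand-in for a learned per-point MLP: raw (x, y, z, range) features.
    feats = [(x, y, z, math.sqrt(x * x + y * y + z * z)) for x, y, z in points]
    pixels = [project_to_pixel(p, height, width) for p in points]

    # Pool each point into its frustum: element-wise max over points in a pixel.
    pooled: Dict[Pixel, Tuple[float, ...]] = {}
    for pix, f in zip(pixels, feats):
        prev = pooled.get(pix)
        pooled[pix] = f if prev is None else tuple(max(a, b) for a, b in zip(prev, f))

    # Gather back: each point keeps its own feature plus its frustum's feature.
    fused: List[Tuple[float, ...]] = [f + pooled[pix] for f, pix in zip(feats, pixels)]
    return pooled, fused


# Usage: two points on the same ray share a frustum; the third lies elsewhere.
pooled, fused = frustum_encode([(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 10.0, 0.0)],
                               height=64, width=2048)
print(len(pooled))  # 2
```

Keeping both the per-point and the pooled frustum feature is what lets each point see context beyond its own coordinates. A real encoder would learn these features rather than copy raw coordinates.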

Paper Structure

This paper contains 28 sections, 12 equations, 11 figures, and 14 tables.

Figures (11)

  • Figure 1: A study on the scalability of state-of-the-art LiDAR segmentation models on the SemanticKITTI [behley2019semantickitti] leaderboard. Circle size indicates the number of model parameters. FRNet performs competitively with current state-of-the-art models while remaining efficient enough for real-time processing.
  • Figure 2: Pilot study on the performance degradation caused by post-processing in existing range-view methods [milioto2019rangenet++, aksoy2020salsanet, zhao2021fidnet, cheng2022cenet] on the SemanticKITTI [behley2019semantickitti] val set. We use several values of $K$ as the hyperparameter of KNN post-processing. Compared with their 2D performance (i.e., on the range image), the methods drop severely in performance across the different $K$ values.
  • Figure 3: Architecture overview. The proposed FRNet comprises three main components: 1) a Frustum Feature Encoder that embeds per-point features within the frustum region; 2) a Frustum-Point (FP) Fusion Module that updates per-point features hierarchically at each stage of the 2D backbone; 3) a Fusion Head that fuses features from different levels to produce the final predictions.
  • Figure 4: Frustum-point fusion module comprises two steps: 1) A Frustum-to-Point fusion to update per-point features. 2) A Point-to-Frustum fusion to update frustum features.
  • Figure 5: FrustumMix illustration. (a) and (b) show the two original LiDAR scenes. (c) shows the mixed scene generated by the FrustumMix strategy, with points from scene 1 colored green and points from scene 2 colored purple.
  • ...and 6 more figures
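Figure 2 evaluates KNN post-processing. Range-view methods use this step to carry pixel-level predictions back to individual 3D points. The sketch below is a simplified version in which each point takes the majority label among its $K$ nearest points in 3D. This brute-force form is an assumption made for illustration. Practical implementations, such as the one in RangeNet++, typically restrict the search to a window around the point's range-image pixel and apply a distance cutoff. The function name `knn_refine` is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def knn_refine(points: Sequence[Point], labels: Sequence[int], k: int) -> List[int]:
    """Replace each point's label with the majority label of its k nearest points.

    The query point counts as one of its own neighbors. Ties go to the label
    first encountered among the distance-sorted neighbors, because
    Counter.most_common orders equal counts by first appearance.
    """
    refined: List[int] = []
    for p in points:
        # Brute force: sort every point by Euclidean distance to p.
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        votes = Counter(labels[j] for j in order[:k])
        refined.append(votes.most_common(1)[0][0])
    return refined


# Usage: the isolated label 2 on the second point is voted away with k = 3.
refined = knn_refine([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
                      (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
                     labels=[1, 2, 1, 3, 3], k=3)
print(refined)  # [1, 1, 1, 3, 3]
```

$K$ is the hyperparameter varied in Figure 2. Larger values smooth labels over wider neighborhoods, which can also blur object boundaries.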