TFNet: Exploiting Temporal Cues for Fast and Accurate LiDAR Semantic Segmentation
Rong Li, ShiJie Li, Xieyuanli Chen, Teli Ma, Juergen Gall, Junwei Liang
TL;DR
TFNet tackles the many-to-one boundary problem in range-image LiDAR semantic segmentation by introducing a Temporal Cross-Attention (TCA) module to fuse features from previous scans and a Max-Voting Post-Processing (MVP) step to refine predictions during inference. The approach projects LiDAR frames to range images, extracts multi-scale features, and uses TCA to integrate temporal context, while MVP aligns past predictions in a common frame and performs voxel-wise max-voting. Experiments on SemanticKITTI and SemanticPOSS show that TFNet achieves state-of-the-art performance among range-image methods, with MVP providing consistent gains across backbones and maintaining real-time inference. This work demonstrates that temporal coherence effectively resolves occlusions and projection ambiguities, offering a practical, plug-in improvement for LiDAR-based semantic segmentation in autonomous driving settings.
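The max-voting idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`transform_points`, `max_vote`), the 4x4 row-major pose format, and the 0.2 m voxel size are assumptions chosen for illustration only. Points from past scans are moved into a common frame, all points are binned into voxels, and every point takes the most frequent predicted label in its voxel.

```python
import math
from collections import Counter, defaultdict


def transform_points(points, pose):
    """Apply a rigid 4x4 pose (row-major nested lists) to a list of (x, y, z) points."""
    out = []
    for x, y, z in points:
        out.append(tuple(
            pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
            for r in range(3)
        ))
    return out


def voxel_key(point, voxel_size):
    """Integer voxel index of a point on a regular grid."""
    return tuple(math.floor(c / voxel_size) for c in point)


def max_vote(points, labels, voxel_size=0.2):
    """Relabel each point with the most frequent label in its voxel.

    `points` must already be expressed in one common frame and may mix
    the current scan with aligned past scans; `labels` holds the
    per-point predicted class ids. The voxel size is an assumed value.
    """
    votes = defaultdict(Counter)
    for p, label in zip(points, labels):
        votes[voxel_key(p, voxel_size)][label] += 1
    return [votes[voxel_key(p, voxel_size)].most_common(1)[0][0] for p in points]


# Hypothetical toy data: one past-scan point is aligned to the current frame.
shift_x = [[1, 0, 0, 0.05], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
past = transform_points([(0.0, 0.05, 0.05)], shift_x)
current = [(0.1, 0.1, 0.1), (0.15, 0.12, 0.01), (1.0, 1.0, 1.0)]
refined = max_vote(past + current, [1, 1, 2, 3])
```

In this toy example the three points that fall into the same voxel end up with the majority label, while the isolated point keeps its own prediction. A real pipeline would additionally need ego-motion poses for each scan and a policy for ties and for moving objects, which this sketch does not address.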
Abstract
LiDAR semantic segmentation plays a crucial role in enabling autonomous vehicles and robots to understand their surroundings accurately and robustly. A multitude of methods exist within this domain, including point-based, range-image-based, polar-coordinate-based, and hybrid strategies. Among these, range-image-based techniques have gained widespread adoption in practical applications due to their efficiency. However, they face a significant challenge known as the "many-to-one" problem, caused by the range image's limited horizontal and vertical angular resolution: several 3D points can project to the same pixel, so only one of them is represented. As a result, around 20% of the 3D points can be occluded. In this paper, we present TFNet, a range-image-based LiDAR semantic segmentation method that utilizes temporal information to address this issue. Specifically, we incorporate a temporal fusion layer to extract useful information from previous scans and integrate it with the current scan. We then design a max-voting-based post-processing technique to correct false predictions, particularly those caused by the "many-to-one" issue. We evaluated the approach on two benchmarks and demonstrated that the plug-in post-processing technique is generic and can be applied to various networks.
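To make the "many-to-one" problem concrete, the sketch below projects 3D points onto a range image with the spherical projection commonly used by range-image methods and reports which points are hidden behind a nearer point in the same pixel. The image size (64 x 2048) and the vertical field of view (+3 to -25 degrees) are assumed values typical of a 64-beam sensor, not settings taken from the paper, and the function names are hypothetical.

```python
import math


def project(point, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map a non-zero (x, y, z) point to (row, col, range) on a range image."""
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)  # assumes the point is not at the origin
    yaw = math.atan2(y, x)
    pitch = math.asin(z / r)
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    u = 0.5 * (1.0 - yaw / math.pi) * W        # horizontal pixel coordinate
    v = (1.0 - (pitch - fov_down) / fov) * H   # vertical pixel coordinate
    col = min(W - 1, max(0, math.floor(u)))
    row = min(H - 1, max(0, math.floor(v)))
    return row, col, r


def build_range_image(points, **kwargs):
    """Keep the nearest point per pixel; return the pixel map and hidden indices."""
    nearest = {}
    for idx, p in enumerate(points):
        row, col, r = project(p, **kwargs)
        if (row, col) not in nearest or r < nearest[(row, col)][0]:
            nearest[(row, col)] = (r, idx)
    visible = {idx for _, idx in nearest.values()}
    hidden = [i for i in range(len(points)) if i not in visible]
    return nearest, hidden


# Two points on the same ray collide in one pixel; the farther one is hidden.
pts = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
pixels, hidden = build_range_image(pts)
```

Here the point at 20 m shares a pixel with the point at 10 m and therefore receives no prediction of its own from the image. Hidden points of this kind are the ones that temporal fusion and post-processing aim to recover.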
