HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving

R. D. Lin; Pengcheng Weng; Yinqiao Wang; Han Ding; Jinsong Han; Fei Wang

TL;DR

HiLoTs addresses the challenge of semi-supervised LiDAR segmentation by leveraging long-term temporal dynamics through a High Temporal Sensitivity Flow and a Low Temporal Sensitivity Flow, selectively processing distant and nearby regions and fusing them with cross-attention. The method uses cylindrical voxelization and multi-voxel aggregation to enable efficient Transformer-style embedding within a Mean Teacher SSL framework, achieving state-of-the-art results on SemanticKITTI and nuScenes and approaching LiDAR+Camera multimodal performance without camera data. Ablation confirms the benefits of HTSF/LTSF and cross-attention, and robustness analyses show competitive behavior under adverse conditions, highlighting practical impact for autonomous driving systems.
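As a rough illustration of the cylindrical voxelization mentioned above, the sketch below maps a single LiDAR point from Cartesian coordinates to a (radius, angle, height) voxel index. The function name, the range limits, and the grid size are hypothetical values chosen for the example; they are not taken from the paper or its code.

```python
import math


def cylindrical_voxel_index(point, rho_max=50.0, z_min=-4.0, z_max=2.0,
                            grid=(480, 360, 32)):
    """Map one (x, y, z) point to a (radius, angle, height) voxel index.

    rho_max, z_min, z_max and grid are illustrative assumptions,
    not settings reported by the authors.
    """
    x, y, z = point
    n_rho, n_phi, n_z = grid

    rho = math.hypot(x, y)      # horizontal distance from the sensor
    phi = math.atan2(y, x)      # azimuth angle in [-pi, pi]

    # Scale each coordinate to its bin count, then clamp to the valid range
    # so that out-of-range points fall into the outermost voxel.
    i = min(int(rho * n_rho / rho_max), n_rho - 1)
    j = min(int((phi + math.pi) / (2 * math.pi) * n_phi), n_phi - 1)
    k = min(max(int((z - z_min) * n_z / (z_max - z_min)), 0), n_z - 1)
    return i, j, k
```

In a cylindrical grid of this kind, voxels far from the sensor cover more physical volume than voxels close to it, which is one common reason to prefer it over a uniform Cartesian grid for LiDAR data whose density drops with distance.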

Abstract

LiDAR point cloud semantic segmentation plays a crucial role in autonomous driving. In recent years, semi-supervised methods have gained popularity because they significantly reduce annotation labor and time costs. Current semi-supervised methods typically focus on the spatial distribution of point clouds or consider only short-term temporal representations, e.g., two adjacent frames, and often overlook the rich long-term temporal properties inherent in autonomous driving scenarios. In real-world driving, we observe that nearby objects, such as roads and vehicles, remain stable while driving, whereas distant objects exhibit greater variability in category and shape. LiDAR captures this phenomenon as well: it reflects lower temporal sensitivity for nearby objects and higher sensitivity for distant ones. To leverage these characteristics, we propose HiLoTs, which learns high-temporal-sensitivity and low-temporal-sensitivity representations from continuous LiDAR frames. These representations are further enhanced and fused using a cross-attention mechanism. Additionally, we employ a teacher-student framework to align the representations learned by the labeled and unlabeled branches, making effective use of large amounts of unlabeled data. Experimental results on the SemanticKITTI and nuScenes datasets demonstrate that HiLoTs outperforms state-of-the-art semi-supervised methods and achieves performance close to that of LiDAR+Camera multimodal approaches. Code is available at https://github.com/rdlin118/HiLoTs
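The teacher-student framework is described in the TL;DR as a Mean Teacher setup. In the standard Mean Teacher scheme, the teacher's weights are not trained by gradient descent but track the student's weights through an exponential moving average. The sketch below shows that update on a plain dictionary of scalar weights; the function name, the dictionary layout, and the decay value are assumptions made for illustration and may differ from what HiLoTs uses.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights as an exponential moving average.

    teacher, student: dicts mapping parameter names to floats
    (a stand-in for real network parameters).
    decay: illustrative smoothing factor, not a value from the paper.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Usage: the teacher moves a small step toward the student after each update.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, decay=0.9)
```

A decay close to 1 makes the teacher change slowly, which tends to give more stable targets for the unlabeled branch than using the student's own predictions directly.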
