PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds

Barza Nisar, Steven L. Waslander

TL;DR

PSA-SSL addresses the loss of pose and size information in self-supervised learning for LiDAR point clouds by integrating a self-supervised bounding box regression task with a LiDAR beam pattern augmentation strategy. This yields pose- and size-aware, sensor-agnostic representations that improve 3D semantic segmentation under limited labels and enhance cross-LiDAR transfer, while remaining lightweight and model-agnostic. The method demonstrates strong improvements over state-of-the-art SSL baselines on segmentation and competitive results for object detection across Waymo, nuScenes, and SemanticKITTI, and achieves faster pretraining. The work provides practical tools for building transferable 3D perception systems across diverse LiDAR sensors.

Abstract

Self-supervised learning (SSL) on 3D point clouds has the potential to learn feature representations that can transfer to diverse sensors and multiple downstream perception tasks. However, recent SSL approaches fail to define pretext tasks that retain geometric information such as object pose and scale, which can be detrimental to the performance of downstream localization and geometry-sensitive 3D scene understanding tasks, such as 3D semantic segmentation and 3D object detection. We propose PSA-SSL, a novel extension to point cloud SSL that learns object pose and size-aware (PSA) features. Our approach defines a self-supervised bounding box regression pretext task, which retains object pose and size information. Furthermore, we incorporate LiDAR beam pattern augmentation on input point clouds, which encourages learning sensor-agnostic features. Our experiments demonstrate that, with a single pretrained model, our lightweight yet effective extensions achieve significant improvements on 3D semantic segmentation with limited labels across popular autonomous driving datasets (Waymo, nuScenes, SemanticKITTI). Moreover, our approach outperforms other state-of-the-art SSL methods on 3D semantic segmentation (using up to 10 times fewer labels), as well as on 3D object detection. Our code will be released at https://github.com/TRAILab/PSA-SSL.
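
The abstract names two ingredients, beam pattern augmentation and self-supervised box regression, without giving implementation details. The NumPy sketch below is only an illustration of what such components might look like, not the authors' code. It assumes that a beam pattern can be approximated by binning points by elevation angle and keeping every k-th bin, and that pseudo box targets could be fitted to unlabeled point clusters with PCA in the x-y plane. The function names `drop_beams` and `fit_bev_box`, their parameters, and the field-of-view defaults are all hypothetical.

```python
import numpy as np


def drop_beams(points, num_beams=64, keep_every=2, fov_deg=(-25.0, 3.0)):
    """Keep every `keep_every`-th approximate beam of an (N, 3) point cloud.

    Assumption: beams are approximated by binning each point's elevation
    angle into `num_beams` equal-width bins over `fov_deg`. Real sensors
    have non-uniform beam spacing, so this is only a rough stand-in.
    """
    xy_range = np.linalg.norm(points[:, :2], axis=1)
    elevation = np.degrees(np.arctan2(points[:, 2], xy_range))
    lo, hi = fov_deg
    beam_id = np.floor((elevation - lo) / (hi - lo) * num_beams).astype(int)
    beam_id = np.clip(beam_id, 0, num_beams - 1)
    return points[beam_id % keep_every == 0]


def fit_bev_box(cluster):
    """Fit a yaw-oriented box to an (N, 3) cluster (hypothetical pseudo-label).

    Returns an array (cx, cy, cz, length, width, height, yaw). The yaw is
    ambiguous up to pi because a PCA axis has no preferred direction.
    """
    xy = cluster[:, :2]
    mean_xy = xy.mean(axis=0)
    centered = xy - mean_xy
    # The principal axis of the x-y covariance serves as the heading estimate.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    major = eigvecs[:, np.argmax(eigvals)]
    yaw = np.arctan2(major[1], major[0])
    # Rotate points by -yaw into the box frame and measure the extents there.
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, s], [-s, c]])
    local = centered @ rot.T
    lo_xy, hi_xy = local.min(axis=0), local.max(axis=0)
    length, width = hi_xy - lo_xy
    # Map the box-frame centre back to world coordinates.
    center_xy = mean_xy + ((lo_xy + hi_xy) / 2.0) @ rot
    z_lo, z_hi = cluster[:, 2].min(), cluster[:, 2].max()
    return np.array([center_xy[0], center_xy[1], (z_lo + z_hi) / 2.0,
                     length, width, z_hi - z_lo, yaw])
```

In a pretraining loop of this kind, `drop_beams` would be applied to the input scan as an augmentation, while boxes from `fit_bev_box` would act as regression targets for a small prediction head. How PSA-SSL actually forms its clusters and box targets is not stated in the abstract.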

Paper Structure

This paper contains 23 sections, 3 figures, 9 tables.

Figures (3)

  • Figure 1: Comparison of qualitative semantic segmentation results of PSA-DepthContrast and PSA-SegContrast against their original baselines (Zhang et al., 2021; Nunes et al., 2022) on a SemanticKITTI validation scan. Our approach can capture the full extent of objects in the scene, thus exhibiting the least label confusion within a single object.
  • Figure 2: An overview of our self-supervised point cloud representation learning framework.
  • Figure 3: Comparison of qualitative semantic segmentation results of PSA-DepthContrast and PSA-SegContrast against their original baselines (Zhang et al., 2021; Nunes et al., 2022) on a SemanticKITTI validation scan.