SOAP: Cross-sensor Domain Adaptation for 3D Object Detection Using Stationary Object Aggregation Pseudo-labelling

Chengjie Huang, Vahdat Abdelzad, Sean Sedwards, Krzysztof Czarnecki

TL;DR

SOAP addresses cross-sensor domain adaptation for LiDAR-based 3D object detection by using Scene-level Full-sequence Aggregation to densify stationary objects, coupled with Quasi-Stationary Training and Spatial Consistency Post-processing to generate high-quality pseudo-labels. By combining these pseudo-labels with a pre-trained detector, SOAP improves cross-domain detection and complements existing SOTA domain adaptation methods in both unsupervised and semi-supervised settings, closing substantial portions of the domain gap (e.g., over 30% and up to ~90% in some configurations). The approach is validated on nuScenes and Waymo with CenterPoint and VoxelNeXt backbones, showing strong gains at longer ranges (30–50 m) and across evaluation metrics (mAP, NDS, and Waymo AP). SOAP’s results demonstrate practical impact for robust cross-sensor deployment, enabling better detector transfer across evolving sensor hardware in autonomous systems.
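
The summary above quotes percentages of the domain gap being closed without stating how that figure is computed. A minimal sketch follows, assuming the convention common in the domain adaptation literature: the improvement of the adapted model over a source-only model, divided by the gap between the source-only model and a fully supervised target-domain "oracle". The function name `closed_gap` and the example scores are hypothetical and are not taken from the paper.

```python
def closed_gap(adapted: float, source_only: float, oracle: float) -> float:
    """Fraction of the source-only-to-oracle gap recovered by the adapted model.

    Assumed definition (not confirmed by the source text):
        (adapted - source_only) / (oracle - source_only)
    """
    gap = oracle - source_only
    if gap <= 0:
        raise ValueError("oracle score must exceed the source-only score")
    return (adapted - source_only) / gap


# Hypothetical scores for illustration only, not results from the paper.
print(f"{closed_gap(adapted=40.0, source_only=30.0, oracle=50.0):.1%}")  # 50.0%
```

Under this reading, a value of 100% would mean the adapted detector matches one trained with full target-domain labels.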

Abstract

We consider the problem of cross-sensor domain adaptation in the context of LiDAR-based 3D object detection and propose Stationary Object Aggregation Pseudo-labelling (SOAP) to generate high-quality pseudo-labels for stationary objects. In contrast to the current state-of-the-art in-domain practice of aggregating just a few input scans, SOAP aggregates entire sequences of point clouds at the input level to reduce the sensor domain gap. Then, by means of what we call quasi-stationary training and spatial consistency post-processing, the SOAP model generates accurate pseudo-labels for stationary objects, closing at least 30.3% of the domain gap compared to few-frame detectors. Our results also show that state-of-the-art domain adaptation approaches can achieve even greater performance in combination with SOAP, in both the unsupervised and semi-supervised settings.
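
The input-level aggregation described in the abstract can be illustrated with a small sketch. It assumes each frame comes with a 4x4 sensor-to-world pose, and it uses plain Python lists to stay dependency-free; a real pipeline would most likely use an array library. The names `transform_point`, `aggregate_sequence` and `translation` are hypothetical helpers, not the authors' code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Pose = Sequence[Sequence[float]]  # assumed: 4x4 row-major sensor-to-world transform


def transform_point(pose: Pose, p: Point) -> Point:
    """Apply a 4x4 homogeneous transform to a single 3D point."""
    x, y, z = p
    return tuple(
        pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
        for r in range(3)
    )


def aggregate_sequence(frames: List[List[Point]], poses: List[Pose]) -> List[Point]:
    """Merge every frame of a sequence into one cloud in the shared world frame."""
    if len(frames) != len(poses):
        raise ValueError("need exactly one pose per frame")
    merged: List[Point] = []
    for points, pose in zip(frames, poses):
        merged.extend(transform_point(pose, p) for p in points)
    return merged


def translation(tx: float, ty: float, tz: float) -> Pose:
    """Build a pure-translation pose (no rotation) for the toy example."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


# Toy sequence: the sensor moves 1 m along x between two frames while
# observing the same stationary point located at (5, 0, 0) in the world frame.
frames = [[(5.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)]]
poses = [translation(0.0, 0.0, 0.0), translation(1.0, 0.0, 0.0)]
print(aggregate_sequence(frames, poses))
```

In the toy example both observations of the stationary point land on the same world coordinates, which is why stationary objects become denser under aggregation. A moving object would instead be smeared along its trajectory.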

Paper Structure (42 sections, 5 equations, 9 figures, 8 tables)

Figures (9)

  • Figure 1: Scan lines are evident in point clouds when only a few input frames are used (sparse nuScenes and Waymo panels), appearing as obvious modes in the CDF plots, which differ largely because of those modes (sparse CDF panel). Aggregating many more frames removes the visible scan lines (dense nuScenes and Waymo panels) and makes the CDFs for similar objects in different datasets more alike (dense CDF panel).
  • Figure 2: Overview of the Stationary Object Aggregation Pseudo-labelling (SOAP) pipeline. (a) We first perform Scene-level Full-sequence Aggregation (SFA) using pose transforms. (b) We propose Quasi-Stationary Training (QST) to train a SOAP model to detect stationary objects. (c) The predictions are refined via Spatial Consistency Post-processing (SCP). (d) The predictions from a pre-trained single-/few-frame detector and the SOAP model are combined using Weighted Box Fusion (WBF) (Solovyev et al., 2021). (e) The final SOAP pseudo-labels can be used in combination with SOTA methods to fine-tune a target domain detector.
  • Figure 3: Example of a point cloud generated by SFA. Dynamic objects are distorted while stationary objects are densified.
  • Figure 4: Example of a quasi-stationary object. This object reached a maximum speed of 1.4 m/s with a total displacement of 3.9 m, and thus would be eliminated by naive filtering.
  • Figure 5: Cumulative distribution for Vehicle / Car speed in realistic self-driving datasets.
  • ...and 4 more figures
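
The Figure 4 caption notes that naive filtering would discard an object that peaked at 1.4 m/s and moved 3.9 m in total. A minimal sketch of such a naive filter is given below to make that concrete. The `Track` fields, the function name and both thresholds are hypothetical, and the reading of "total displacement" as a single per-track distance is an assumption. The paper's own quasi-stationary criterion is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Track:
    max_speed_mps: float         # highest speed observed over the sequence
    total_displacement_m: float  # assumed: one displacement value per track


def is_strictly_stationary(
    track: Track, speed_tol_mps: float = 0.2, disp_tol_m: float = 0.5
) -> bool:
    """Naive filter: keep only objects that barely move.

    Both tolerances are hypothetical values chosen for illustration.
    """
    return (
        track.max_speed_mps <= speed_tol_mps
        and track.total_displacement_m <= disp_tol_m
    )


parked = Track(max_speed_mps=0.0, total_displacement_m=0.1)
figure4_object = Track(max_speed_mps=1.4, total_displacement_m=3.9)

print(is_strictly_stationary(parked))          # True
print(is_strictly_stationary(figure4_object))  # False
```

With strict tolerances like these, the Figure 4 object is rejected even if it stood still for most of the sequence.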

Theorems & Definitions (1)

  • Definition 1: QSS