SALUDA: Surface-based Automotive Lidar Unsupervised Domain Adaptation

Björn Michele, Alexandre Boulch, Gilles Puy, Tuan-Hung Vu, Renaud Marlet, Nicolas Courty

TL;DR

This paper introduces an unsupervised auxiliary task for lidar domain adaptation: learning an implicit underlying surface representation simultaneously on source and target data. The strategy differs from classical minimization of statistical divergences and from lidar-specific domain adaptation techniques, and the authors report better performance than the current state of the art in both real-to-real and synthetic-to-real scenarios.

Abstract

Learning models on one labeled dataset that generalize well to another domain is a difficult task, as several shifts might happen between the data domains. This is notably the case for lidar data, for which models can exhibit large performance discrepancies due, for instance, to different lidar patterns or changes in acquisition conditions. This paper addresses the corresponding Unsupervised Domain Adaptation (UDA) task for semantic segmentation. To mitigate this problem, we introduce an unsupervised auxiliary task of learning an implicit underlying surface representation simultaneously on source and target data. As both domains share the same latent representation, the model is forced to accommodate discrepancies between the two sources of data. This novel strategy differs from classical minimization of statistical divergences or lidar-specific domain adaptation techniques. Our experiments demonstrate that our method achieves better performance than the current state of the art, both in real-to-real and synthetic-to-real scenarios.

Paper Structure (49 sections, 12 figures, 14 tables, 1 algorithm)

Figures (12)

  • Figure 1: Unsupervised domain adaptation with SALUDA. It leverages annotated source data (e.g., the nuScenes dataset) and unlabeled target data (e.g., SemanticKITTI) for semantic segmentation of the target. The surface, which is a by-product of the approach, is colored according to the semantic predictions.
  • Figure 2: Overview of SALUDA (training stage). Step 1: the backbone $\phi(\cdot)$ is trained alternating between source and target point clouds. With (annotated) source data, it produces point-wise latent vectors that are used both by the segmentation head ${\rm cls}(\cdot)$ to classify each point and yield semantic segments, and by the surface reconstruction head ${\rm surf}(\cdot)$ to estimate occupancy. With (unannotated) target data, the latent vectors are only fed to the surface reconstruction head. Conversely, at test time, only the semantic segmentation head is used. Step 2: the obtained weights are used as an initialization for teacher/student self-training. It is done with true labels for source data and pseudo-labels for target data. The teacher is an exponential moving average (EMA) of the student. The self-training loss $\mathcal{L}_{\mathsf{ST}}$ is defined in Section \ref{sec:traininglosses}.
  • Figure 3: Visibility query point sampling: $q_{\mathsf{sight}}$ and $q_{\mathsf{front}}$ are placed on the line of sight between sensor and observed point $p$ and pseudo-labeled as empty; $q_{\mathsf{behind}}$ is placed just "after" $p$ and pseudo-labeled as full.
  • Figure 4: Visualization of semantic segmentation results. Obtained with SALUDA, source-only, Mixed BN and CoSMix \cite{saltori2022cosmix} in the setting NS$\rightarrow$SK$_{10}$, along with the ground-truth segmentation. Classes: car, drivable surf., pedestrian, sidewalk, terrain, vegetation.
  • Figure 5: t-SNE visualisations of the latent space structure for the source-only method and SALUDA, in the NS$\rightarrow$SK$_{10}$ and SynL$\rightarrow$SK$_{19}$ settings. Colors: source points, target points.
  • ...and 7 more figures
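
The captions of Figures 2 and 3 describe two mechanisms that are easy to picture in code: the visibility-based query sampling and the EMA teacher update. The following is a minimal NumPy sketch of both, not the authors' implementation. The function names, the offset `delta`, the uniform sampling of $q_{\mathsf{sight}}$ along the ray, and the `momentum` value are all assumptions made for illustration; the captions only state where the queries lie and how they are pseudo-labeled.

```python
import numpy as np


def sample_visibility_queries(points, sensor_origin, delta=0.1, rng=None):
    """Sample occupancy queries along lidar rays (illustrative sketch).

    points:        (N, 3) observed lidar points p.
    sensor_origin: (3,) sensor position.
    delta:         hypothetical offset used for q_front and q_behind.
    Returns (queries, occupancy) with shapes (3N, 3) and (3N,),
    ordered as [q_sight, q_front, q_behind].
    """
    rng = np.random.default_rng() if rng is None else rng
    rays = points - sensor_origin                       # sensor -> p
    dist = np.linalg.norm(rays, axis=1, keepdims=True)  # (N, 1)
    dirs = rays / np.maximum(dist, 1e-9)                # unit directions

    # q_sight: somewhere on the segment between the sensor and p (assumed uniform).
    t = rng.uniform(0.0, 1.0, size=dist.shape)
    q_sight = sensor_origin + t * rays

    # q_front: just before p, clipped so it never passes behind the sensor.
    q_front = points - np.minimum(delta, dist) * dirs

    # q_behind: just "after" p along the same ray.
    q_behind = points + delta * dirs

    n = points.shape[0]
    queries = np.concatenate([q_sight, q_front, q_behind], axis=0)
    # Pseudo-labels: 0 = empty (q_sight, q_front), 1 = full (q_behind).
    occupancy = np.concatenate([np.zeros(2 * n), np.ones(n)])
    return queries, occupancy


def ema_update(teacher, student, momentum=0.99):
    """Return teacher weights moved towards the student (EMA).

    teacher, student: dicts mapping parameter names to NumPy arrays.
    momentum:         hypothetical smoothing factor.
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


if __name__ == "__main__":
    pts = np.array([[10.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
    queries, occupancy = sample_visibility_queries(pts, np.zeros(3), delta=0.5)
    print(queries.shape, occupancy)
```

In a training loop along the lines of Figure 2, the queries and pseudo-labels would supervise the surface reconstruction head on both source and target scans, while the EMA update would be applied to the teacher after each student step during self-training.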