COARSE: Collaborative Pseudo-Labeling with Coarse Real Labels for Off-Road Semantic Segmentation

Aurelio Noca, Xianmei Lei, Jonathan Becktor, Jeffrey Edlund, Anna Sabel, Patrick Spieler, Curtis Padgett, Alexandre Alahi, Deegan Atha

TL;DR

Off-road semantic segmentation suffers from scarce dense labels and strong domain gaps. COARSE introduces a semi-supervised domain adaptation framework that combines sparse in-domain coarse labels with densely labeled out-of-domain data via a collaborative pseudo-labeling pipeline built on a DINOv2 backbone and two decoders (PixelDecoder and PatchDecoder). The method achieves mIoU gains of 8.4% on Rellis-3D and 9.7% on RUGD over baselines trained only on coarse labels, and it is also shown to apply to real-world multi-biome driving data. This reduces labeling costs while leveraging unlabeled and simulated data to improve off-road perception for autonomous navigation.

Abstract

Autonomous off-road navigation faces challenges due to diverse, unstructured environments, requiring robust perception with both geometric and semantic understanding. However, scarce densely labeled semantic data limits generalization across domains. Simulated data helps, but introduces domain adaptation issues. We propose COARSE, a semi-supervised domain adaptation framework for off-road semantic segmentation, leveraging sparse, coarse in-domain labels and densely labeled out-of-domain data. Using pretrained vision transformers, we bridge domain gaps with complementary pixel-level and patch-level decoders, enhanced by a collaborative pseudo-labeling strategy on unlabeled data. Evaluations on RUGD and Rellis-3D datasets show significant improvements of 9.7% and 8.4% respectively, versus only using coarse data. Tests on real-world off-road vehicle data in a multi-biome setting further demonstrate COARSE's applicability.
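
The abstract does not spell out how the two decoders' predictions are combined into pseudo-labels. The snippet below is only an illustrative sketch of one common way to fuse two predicted class maps: keep a label where both maps agree and mask the pixel where they disagree. The agreement rule, the `IGNORE` value of 255, and the function names `fuse_pseudo_labels` and `labeled_fraction` are assumptions made for this example and are not taken from the paper.

```python
from typing import List

# Hypothetical "no label" index; 255 is a common convention, not a value from the paper.
IGNORE = 255

LabelMap = List[List[int]]  # per-pixel class indices, one inner list per image row


def fuse_pseudo_labels(pixel_pred: LabelMap, patch_pred: LabelMap,
                       ignore_index: int = IGNORE) -> LabelMap:
    """Keep a class only where both predicted maps agree; mask the rest.

    This agreement rule is an assumption for illustration, not the paper's rule.
    """
    if len(pixel_pred) != len(patch_pred):
        raise ValueError("prediction maps must have the same height")
    fused = []
    for row_a, row_b in zip(pixel_pred, patch_pred):
        if len(row_a) != len(row_b):
            raise ValueError("prediction maps must have the same width")
        fused.append([a if a == b else ignore_index
                      for a, b in zip(row_a, row_b)])
    return fused


def labeled_fraction(labels: LabelMap, ignore_index: int = IGNORE) -> float:
    """Fraction of pixels that received a pseudo-label."""
    total = sum(len(row) for row in labels)
    kept = sum(1 for row in labels for v in row if v != ignore_index)
    return kept / total if total else 0.0


# Toy 2x3 class maps standing in for the argmax outputs of two decoders.
pixel_pred = [[0, 1, 1], [2, 2, 0]]
patch_pred = [[0, 1, 2], [2, 0, 0]]
pseudo = fuse_pseudo_labels(pixel_pred, patch_pred)
```

In a training loop, masked pixels would typically be excluded from the segmentation loss, so only the pixels that received a pseudo-label contribute a training signal from unlabeled images.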

Paper Structure

This paper contains 11 sections, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Polaris RZR dune buggy, outfitted with an extensive sensor suite for autonomous off-road navigation, being driven in the San Diego Grasslands biome.
  • Figure 2: Our COARSE pseudo-labeling approach leverages two decoders -- PatchDecoder and PixelDecoder -- both utilizing the robust semantic features of the DINOv2 encoder. The PixelDecoder further integrates low-level geometric details from the input image. The PatchDecoder is trained on coarse ID data, while the PixelDecoder is trained on a combination of dense OOD and coarse ID data. Pseudo-labels are generated via disagreement of predicted semantic maps.
  • Figure 3: Samples (left) and labels (right) from our multi-biome dataset. Paso Robles Grassland (top), Mojave Desert (middle-top), San Gabriel Canyon (middle-bottom) and synthetic Forest-Sim (bottom).
  • Figure 4: Samples (left) and labels (right) from the Rellis-3D dataset (top two) and RUGD (bottom two), with our custom class mapping.
  • Figure 5: Images (left) and pseudo-labels (right) for the Paso Robles Grassland (top two) and San Gabriel Canyon (bottom two). The first and third samples are generated with a Pixel-Pixel model combination, while the second and fourth are generated with the Pixel-Patch pairing.
  • ...and 1 more figure