Self-Supervised Traversability Learning with Online Prototype Adaptation for Off-Road Autonomous Driving

Yafeng Bu, Zhenping Sun, Xiaohui Li, Jun Zeng, Xin Zhang, Hui Shen

TL;DR

The paper addresses off-road traversability estimation when labeled data is scarce. It introduces a self-supervised framework that takes a Bird’s-Eye View (BEV) representation as input and adapts traversability prototypes online. Self-supervised labels are generated from vehicle trajectories and obstacle detections, and the network learns to separate traversable from non-traversable representations using an InfoNCE loss and multi-scale prototype clustering, complemented by a PSA loss on unlabeled samples. During operation, traversability costs are computed online by comparing features against an updated prototype queue using cosine similarity, which yields a probabilistic, planning-friendly environment map. The system runs in real time at 10 Hz and was demonstrated in a 5.5 km field test. The approach improves generalization across seasons and locations and integrates with downstream motion planning. The authors acknowledge that the assumptions about what counts as traversable are subjective and suggest fusing geometric perception in future work to make the estimate more objective.
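To make the contrastive objective mentioned above concrete, here is a minimal sketch of the standard InfoNCE loss for a single anchor feature. The pairing is an assumption made for illustration (a traversable feature as anchor, a traversable prototype as positive, non-traversable features as negatives), as are the function names and the temperature value; the summary does not specify how the paper forms its pairs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss for one anchor.

    loss = -log( exp(s_pos / t) / (exp(s_pos / t) + sum_j exp(s_neg_j / t)) )
    The temperature of 0.1 is an illustrative assumption, not a value from the paper.
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - pos_logit


if __name__ == "__main__":
    traversable = [1.0, 0.0]
    prototype = [0.9, 0.1]
    obstacles = [[-1.0, 0.2], [0.0, 1.0]]
    print(info_nce(traversable, prototype, obstacles))
```

The loss approaches zero when the anchor is far closer to the positive than to every negative, and grows when a negative is more similar than the positive.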

Abstract

Achieving reliable and safe autonomous driving in off-road environments requires accurate and efficient terrain traversability analysis. However, this task faces several challenges, including the scarcity of large-scale datasets tailored for off-road scenarios, the high cost and potential errors of manual annotation, the stringent real-time requirements of motion planning, and the limited computational power of onboard units. To address these challenges, this paper proposes a novel traversability learning method that leverages self-supervised learning, eliminating the need for manual annotation. For the first time, a Bird’s-Eye View (BEV) representation is used as input, reducing computational burden and improving adaptability to downstream motion planning. During vehicle operation, the proposed method conducts online analysis of traversed regions and dynamically updates prototypes to adaptively assess the traversability of the current environment, effectively handling dynamic scene changes. We evaluate our approach against state-of-the-art benchmarks on both public datasets and our own dataset, covering diverse seasons and geographical locations. Experimental results demonstrate that our method significantly outperforms recent approaches. Additionally, real-world vehicle experiments show that our method operates at 10 Hz, meeting real-time requirements, while a 5.5 km autonomous driving experiment further validates the compatibility of the generated traversability cost maps with downstream motion planning.
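The online step described in the abstract (analyzing traversed regions and updating prototypes during operation) might look roughly like the sketch below. Several simplifications are assumptions made here for illustration: each prototype is the plain mean of the features from recently traversed cells (the paper uses online clustering), the queue length is arbitrary, and cosine similarity is mapped linearly from [-1, 1] to a [0, 1] traversability score. The class and method names are hypothetical.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class PrototypeQueue:
    """Fixed-length queue of prototype vectors; the oldest entry is dropped first."""

    def __init__(self, max_size=8):
        self.prototypes = deque(maxlen=max_size)

    def update(self, traversed_features):
        """Append one prototype: the mean of features from cells the vehicle drove over."""
        count = len(traversed_features)
        dim = len(traversed_features[0])
        mean = [sum(f[i] for f in traversed_features) / count for i in range(dim)]
        self.prototypes.append(mean)

    def traversability(self, feature):
        """Best cosine match against the queue, rescaled from [-1, 1] to [0, 1]."""
        if not self.prototypes:
            raise ValueError("no prototypes yet; call update() first")
        best = max(cosine(feature, p) for p in self.prototypes)
        return (best + 1.0) / 2.0

    def cost(self, feature):
        """Planning cost as the complement of the traversability score."""
        return 1.0 - self.traversability(feature)


if __name__ == "__main__":
    queue = PrototypeQueue(max_size=4)
    queue.update([[1.0, 0.1], [0.9, 0.0]])  # features under the driven path
    print(queue.cost([1.0, 0.0]), queue.cost([-1.0, 0.0]))
```

Applying `cost` to every cell of a BEV feature map would produce a cost map in [0, 1]; bounding the queue keeps memory constant and lets older terrain experience age out as the scene changes.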

Paper Structure

This paper contains 18 sections, 14 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: (Top) Building an environmental hypothesis model based on real-time vehicle experience: Predicting the traversability of unexplored regions by extracting prototype vectors from traveled areas. (Bottom-left) RGB images captured by the onboard front-facing camera. (Bottom-middle) BEV generated by integrating LiDAR point clouds, RGB images, and odometry data. (Bottom-right) Traversability analysis results based on real-time experience.
  • Figure 2: The BEV generation process: For each frame of temporally synchronized point clouds and RGB images, colorize the point clouds using a projection matrix, and accumulate multiple frames of point clouds relying on odometry to complete the BEV generation.
  • Figure 3: Overview of training process. First, we map the vehicle trajectory and obstacle detection results into the BEV space to generate self-supervised labels for each frame of the BEV image. A ResNet-based encoder-decoder architecture is designed, where the encoder extracts multi-level features, and the decoder generates traversability feature maps. Then, we extract traversable and untraversable features, as well as features from unlabeled samples, and assign pseudo-labels to the unlabeled samples through clustering. Finally, by optimizing the combined loss function, the network parameters are updated using the backpropagation algorithm.
  • Figure 4: The online prediction process: First, the feature map is obtained through model inference, and the features of traversable regions are extracted using recorded odometry data. Then, the prototype vectors are analyzed through online clustering, and finally, the traversability map is generated by computing with the feature map.
  • Figure 5: The input BEV, ground truth, and qualitative results from Schmid et al. [schmid2022self], Jung et al. [jung2024v], and our method on both our self-collected dataset and the RELLIS-3D dataset. We employ a probabilistic representation with a range of $[0,1]$, where darker regions indicate lower traversability probabilities and brighter regions correspond to higher traversability probabilities.
  • ...and 2 more figures
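
The label-generation step in the Figure 3 caption (mapping the vehicle trajectory and obstacle detections into BEV space) can be illustrated with a small grid example. This is a sketch under assumptions that are not taken from the paper: the vehicle sits at the grid center, the footprint is a square of `half_width` cells around each trajectory point, each obstacle detection marks a single cell, and obstacle labels override trajectory labels. All names and default values are hypothetical.

```python
import math

UNLABELED, NON_TRAVERSABLE, TRAVERSABLE = -1, 0, 1


def to_cell(x, y, resolution, size):
    """Convert metric BEV coordinates to (row, col); None if outside the grid."""
    col = math.floor(x / resolution) + size // 2
    row = math.floor(y / resolution) + size // 2
    if 0 <= row < size and 0 <= col < size:
        return row, col
    return None


def make_labels(trajectory, obstacles, size=8, resolution=1.0, half_width=1):
    """Build a size x size label grid from trajectory and obstacle points in metres."""
    grid = [[UNLABELED] * size for _ in range(size)]
    # Cells under the driven footprint are assumed traversable.
    for x, y in trajectory:
        cell = to_cell(x, y, resolution, size)
        if cell is None:
            continue
        row0, col0 = cell
        for row in range(row0 - half_width, row0 + half_width + 1):
            for col in range(col0 - half_width, col0 + half_width + 1):
                if 0 <= row < size and 0 <= col < size:
                    grid[row][col] = TRAVERSABLE
    # Detected obstacles are written last so they override the footprint.
    for x, y in obstacles:
        cell = to_cell(x, y, resolution, size)
        if cell is not None:
            grid[cell[0]][cell[1]] = NON_TRAVERSABLE
    return grid


if __name__ == "__main__":
    labels = make_labels([(0.0, 0.0), (1.0, 0.0)], [(3.0, 3.0)])
    for row in labels:
        print(" ".join(f"{v:2d}" for v in row))
```

Cells that are neither driven over nor flagged as obstacles stay unlabeled, which corresponds to the unlabeled samples that receive pseudo-labels through clustering in the caption.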