ViPlanner: Visual Semantic Imperative Learning for Local Navigation

Pascal Roth; Julian Nubert; Fan Yang; Mayank Mittal; Marco Hutter


TL;DR

ViPlanner tackles real-time local navigation in unknown outdoor environments by integrating semantic information into a learning-based planner. It is trained end-to-end with Imperative Learning through a differentiable semantic costmap: a dual-headed network fuses depth, RGB-encoded semantics, and goal information to predict sparse path keypoints and a collision risk. Semantics are encoded as RGB colors over $N=30$ classes with distinct traversal costs. Although the planner is trained entirely in simulation, it transfers robustly to the real world, aided by a pre-trained semantic segmentation module. Results on a quadrupedal platform demonstrate a $38.02\%$ reduction in traversability cost compared to purely geometric baselines, robustness to noisy inputs, and zero-shot sim-to-real transfer; code and models are open source.
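To make the semantic encoding concrete, the sketch below maps each RGB color of a semantic image to the traversal cost of its class. It is a minimal illustration under stated assumptions: the class names, colors, and costs in `SEMANTIC_CLASSES` are hypothetical placeholders (only six of the 30 classes are shown, and the paper's actual table is not reproduced here), and `semantic_image_to_cost` is an illustrative helper, not part of the ViPlanner code base.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# HYPOTHETICAL class table: (name, RGB color, traversal cost in [0, 1]).
# ViPlanner uses N = 30 classes; these entries and values are placeholders.
SEMANTIC_CLASSES: List[Tuple[str, RGB, float]] = [
    ("sidewalk", (0, 255, 0), 0.0),
    ("crosswalk", (0, 200, 100), 0.0),
    ("stairs", (0, 128, 255), 0.2),
    ("grass", (128, 200, 0), 0.5),
    ("road", (255, 128, 0), 0.8),
    ("building", (255, 0, 0), 1.0),
]

# Lookup from an exact RGB color to the cost of the class it encodes.
COLOR_TO_COST: Dict[RGB, float] = {color: cost for _, color, cost in SEMANTIC_CLASSES}


def semantic_image_to_cost(
    image: List[List[RGB]], unknown_cost: float = 1.0
) -> List[List[float]]:
    """Map every pixel of a semantic RGB image to its class traversal cost.

    Colors that are not in the table are treated conservatively as
    `unknown_cost` (an assumption of this sketch).
    """
    return [[COLOR_TO_COST.get(px, unknown_cost) for px in row] for row in image]


if __name__ == "__main__":
    img = [[(0, 255, 0), (255, 0, 0)], [(1, 2, 3), (128, 200, 0)]]
    print(semantic_image_to_cost(img))
```

Running the example prints `[[0.0, 1.0], [1.0, 0.5]]`: sidewalk, building, an unknown color, and grass. Encoding semantics as colors rather than one-hot channels keeps the network input a three-channel image, which is one plausible reason the representation suits a compact perception backbone.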

Abstract

Real-time path planning in outdoor environments still challenges modern robotic systems due to differences in terrain traversability, diverse obstacles, and the necessity for fast decision-making. Established approaches have primarily focused on geometric navigation solutions, which work well for structured geometric obstacles but are limited in their semantic interpretation of different terrain types and their affordances. Moreover, these methods fail to recognize traversable geometric structures, such as stairs. To overcome these issues, we introduce ViPlanner, a learned local path planning approach that generates local plans based on geometric and semantic information. The system is trained using the Imperative Learning paradigm, in which the network weights are optimized end-to-end based on the planning task objective. This optimization uses a differentiable formulation of a semantic costmap, which enables the planner to distinguish between the traversability of different terrains and accurately identify obstacles. The semantic information covers 30 classes, represented in an RGB color space that effectively encodes multiple levels of traversability. We show that the planner can adapt to diverse real-world environments without requiring any real-world training. In fact, the planner is trained purely in simulation, which enables highly scalable generation of training data. Experimental results demonstrate robustness to noise, zero-shot sim-to-real transfer, and a decrease of 38.02% in terms of traversability cost compared to purely geometric-based approaches. Code and models are made publicly available: https://github.com/leggedrobotics/viplanner.
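The differentiable costmap is what lets the planning objective send gradients back to the predicted path. The following is a minimal sketch, assuming a 2D grid costmap indexed in cell units and bilinear interpolation as the lookup; interpolation choice, clamping behavior, and the helper names `bilinear_cost` and `path_cost` are hypothetical illustrations, not the paper's implementation. The gradient is written out by hand so the mechanism is visible without an autodiff framework.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # grid[row][col]; row ~ y, col ~ x, unit cell size


def bilinear_cost(grid: Grid, x: float, y: float) -> Tuple[float, float, float]:
    """Return (cost, dcost/dx, dcost/dy) at a continuous position (x, y).

    Requires a grid of at least 2x2 cells. Positions are clamped into the
    grid; at a clamped position the returned gradient is that of the
    interpolant there, not of the clamped function.
    """
    rows, cols = len(grid), len(grid[0])
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    # Lower-left corner of the 2x2 neighbourhood, kept inside the grid.
    x0 = min(int(math.floor(x)), cols - 2)
    y0 = min(int(math.floor(y)), rows - 2)
    tx, ty = x - x0, y - y0
    c00, c10 = grid[y0][x0], grid[y0][x0 + 1]
    c01, c11 = grid[y0 + 1][x0], grid[y0 + 1][x0 + 1]
    cost = (c00 * (1 - tx) * (1 - ty) + c10 * tx * (1 - ty)
            + c01 * (1 - tx) * ty + c11 * tx * ty)
    # Partial derivatives of the bilinear interpolant.
    dcdx = (c10 - c00) * (1 - ty) + (c11 - c01) * ty
    dcdy = (c01 - c00) * (1 - tx) + (c11 - c10) * tx
    return cost, dcdx, dcdy


def path_cost(grid: Grid, path: List[Tuple[float, float]]):
    """Sum the interpolated cost over path points; return per-point gradients."""
    total = 0.0
    grads: List[Tuple[float, float]] = []
    for x, y in path:
        c, gx, gy = bilinear_cost(grid, x, y)
        total += c
        grads.append((gx, gy))
    return total, grads
```

With `grid = [[0.0, 1.0], [0.0, 1.0]]` (cost rising along x), `bilinear_cost(grid, 0.5, 0.5)` gives cost 0.5 with gradient (1.0, 0.0), so a gradient step on the path points pushes them toward lower x, i.e. toward the cheaper terrain. A nearest-cell lookup would instead give zero gradient almost everywhere, which is why a smooth interpolation is a natural fit for end-to-end training.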

Paper Structure (14 sections, 3 figures, 1 table)

This paper contains 14 sections, 3 figures, 1 table.

Introduction
Related Work
Modular Geometric Approaches
Modular Semantic Approaches
Imitation Learning and Self-Supervised Learning
RL and Imperative Learning
Problem Formulation
Methodology
Semantic Encoding
Perception and Planning Networks
Perception Networks
Combined Feature Embedding
Planning Network
Semantic Costmap

Figures (3)

Figure 1: Quadrupedal navigation in large-scale urban environments requires semantic understanding to successfully follow sidewalks and crosswalks. Four local planning events (A-D) along the autonomously traversed path are shown. The planned path of the proposed semantic imperative planner is projected into the semantic images (middle row), whereas the estimated path of the purely geometric iPlanner [yang_iplanner_2023] is overlaid onto the depth image (bottom row). The traversed path is shown in an environment reconstruction generated by [open3d_slam_2022].
Figure 2: Overview of the integral components of the proposed approach. The perception and planning networks take a depth image, a semantic image, and the desired goal position as input and estimate a coarse plan together with a collision probability. The network weights and final path are jointly optimized as part of a Bi-Level Optimization scheme.
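Figure 2 describes a network that outputs a coarse plan plus a collision probability, with weights and path optimized jointly. The sketch below shows what such a training objective could look like as plain Python scalars. Everything here is an assumption for illustration: linear keypoint interpolation in `densify` stands in for whatever interpolation the planner actually uses, the loss terms (mean traversal cost, distance to goal, path length, binary cross-entropy on the collision head), their weights, and the costmap-derived collision label are hypothetical, and `planning_loss` is not an API of the ViPlanner repository. In practice such a loss would be written in an autodiff framework so gradients reach the network weights.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def densify(keypoints: List[Point], steps_per_segment: int = 5) -> List[Point]:
    """Linearly interpolate sparse keypoints (start included) into a denser path.

    HYPOTHETICAL stand-in for the planner's actual keypoint interpolation.
    """
    dense: List[Point] = []
    for (x0, y0), (x1, y1) in zip(keypoints, keypoints[1:]):
        for k in range(steps_per_segment):
            t = k / steps_per_segment
            dense.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    dense.append(keypoints[-1])
    return dense


def planning_loss(
    keypoints: List[Point],
    collision_prob: float,
    goal: Point,
    cost_at: Callable[[float, float], float],
    w_cost: float = 1.0,  # all weights are hypothetical
    w_goal: float = 1.0,
    w_len: float = 0.1,
    collision_threshold: float = 0.5,
) -> float:
    """Hypothetical upper-level objective for one predicted plan."""
    path = densify(keypoints)
    costs = [cost_at(x, y) for x, y in path]
    traversal = sum(costs) / len(costs)  # mean semantic cost along the path
    ex, ey = path[-1]
    goal_term = math.hypot(goal[0] - ex, goal[1] - ey)  # end-point distance to goal
    length = sum(math.hypot(x1 - x0, y1 - y0)
                 for (x0, y0), (x1, y1) in zip(path, path[1:]))
    # Assumed self-supervised label: 1 if any path point hits a high-cost cell.
    label = 1.0 if max(costs) >= collision_threshold else 0.0
    p = min(max(collision_prob, 1e-7), 1.0 - 1e-7)  # avoid log(0)
    bce = -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
    return w_cost * traversal + w_goal * goal_term + w_len * length + bce
```

For a straight path from (0, 0) to a goal at (2, 0) over free terrain with `collision_prob=0.5`, the loss reduces to `0.1 * 2.0 + log(2)`, about 0.893. If the same path crosses a high-cost region, predicting a low collision probability is penalized more than predicting a high one, which is how the second head can learn collision risk without manual labels under this assumption.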