Traversability-Aware Legged Navigation by Learning from Real-World Visual Data

Hongbo Zhang, Zhongyu Li, Xuanqi Zeng, Laura Smith, Kyle Stachowicz, Dhruv Shah, Linzhu Yue, Zhitao Song, Weipeng Xia, Sergey Levine, Koushil Sreenath, Yun-hui Liu

TL;DR

A novel real-world learning pipeline is introduced that unifies offline demonstrations, online reinforcement learning, and multimodal perception to achieve robust legged navigation through real-world interactions in diverse off-road and unstructured environments.

Abstract

The enhanced mobility brought by legged locomotion empowers quadrupedal robots to navigate through complex and unstructured environments. However, optimizing agile locomotion while accounting for the varying energy costs of traversing different terrains remains an open challenge. Most previous work focuses on planning trajectories with traversability cost estimation based on human-labeled environmental features. However, this human-centric approach is insufficient because it does not account for the varying capabilities of the robot locomotion controllers over challenging terrains. To address this, we develop a novel traversability estimator in a robot-centric manner, based on the value function of the robot's locomotion controller. This estimator is integrated into a new learning-based RGBD navigation framework. The framework employs multiple training stages to develop a planner that guides the robot in avoiding obstacles and hard-to-traverse terrains while reaching its goals. The training of the navigation planner is directly performed in the real world using a sample efficient reinforcement learning method that utilizes both online data and offline datasets. Through extensive benchmarking, we demonstrate that the proposed framework achieves the best performance in accurate traversability cost estimation and efficient learning from multi-modal data (including the robot's color and depth vision, as well as proprioceptive feedback) for real-world training. Using the proposed method, a quadrupedal robot learns to perform traversability-aware navigation through trial and error in various real-world environments with challenging terrains that are difficult to classify using depth vision alone. Moreover, the robot demonstrates the ability to generalize the learned navigation skills to unseen scenarios. Video can be found at https://youtu.be/RSqnIWZ1qks.

Paper Structure

This paper contains 41 sections, 3 equations, 12 figures, 1 algorithm.

Figures (12)

  • Figure 1: Our proposed framework enables a quadrupedal robot to learn, from its own interactions, to avoid obstacles and hard-to-traverse terrains (such as muddy areas covered by leaves) by training directly in the real world. The robot learns to use its onboard color and depth vision sensors to identify challenging terrains and obstacles while navigating toward the goals, marked by the blue dashed circles. Including RGB images allows the robot to identify terrain textures that are difficult to perceive with depth images alone during navigation.
  • Figure 2: The proposed training framework for the high-level traversability-aware planner with RGBD input is divided into four stages. In Stage 1, the robust locomotion controller $\pi^c$, which tracks desired velocity commands $\mathbf{g}^c_k$, and the corresponding traversability estimator $T$ are obtained as elaborated in Fig. 3. During Stage 2, a depth-based goal-reaching planner $\pi^p_{\text{depth}}$ is trained using RL with access to the depth input. This planner provides demonstrations in the next stage. In Stage 3, we collect real-world rollouts into datasets from both the depth-based planner $\pi^p_{\text{depth}}$ and human demonstrators. Data are collected in the form of transition samples, with both color and depth information recorded. Actions $\mathbf{a}^p_{\text{demo},t}$ are recorded from the demonstrators, and the online estimated traversability cost $T$ is included as part of the reward $r^p_t$. Finally, during Stage 4, the proposed planning policy $\pi^p_{\text{rgbd}}$ is trained with RLPD (Ball et al., 2023) using both the samples from the offline dataset and the newly collected transition pairs.
  • Figure 3: The training framework for the locomotion controller and the traversability estimator. (i) The goal-conditioned controller (actor policy $\pi^c$) is trained to perform quadrupedal trotting gaits while tracking velocity commands $\mathbf{g}^c_k$ over various terrains. (ii) After obtaining $\pi^c$, a baseline value function $V^c_{\text{flat}}$ is trained to estimate the return while the robot is walking on flat ground using the same actor policy. (iii) To create an uncertainty-aware estimator ($V^c_{\text{terrain}}$), we add a dropout layer before the output of $V^c$ and fine-tune it on complex terrains. (iv) In real-world deployment, we perform $n$ inferences using $V^c_{\text{terrain}}$ for the same goal $\mathbf{g}^c_k$ and observation $\mathbf{o}^c_k$. The standard deviation $\sigma$ of these inferences measures uncertainty. The mean value is adjusted by subtracting the baseline $V^c_{\text{flat}}$ to reduce the bias induced by the given goal $\mathbf{g}^c_k$. The negative of this difference is then multiplied by an adaptive weight $(a - b\sigma)$, where $a$ and $b$ are tunable parameters, resulting in the traversability estimation $T_{\text{value}}$.
  • Figure 4: The training scene (a) and three testing scenes (b), (c), and (d) in the Gazebo simulator. To create environments with varying traversability, lower friction is assigned to the grey areas representing water pools, while nominal friction is assigned to the green grasslands. In addition, various obstacles are randomly distributed within the scenes. Policies are trained in scene (a) and evaluated in scenes (b), (c), and (d).
  • Figure 5: Comparison of the proposed traversability estimation method against two baseline choices of traversability estimation. The proposed method achieves at least an 18% reduction in traversability cost across the three testing scenes compared to the two baselines. The evaluation uses the ground-truth traversability cost, represented by the time the robot spends walking in water pool areas, which is obtained directly from the simulator.
  • ...and 7 more figures
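
Step (iv) of the Figure 3 caption can be illustrated with a minimal sketch. It assumes the $n$ dropout-enabled outputs of $V^c_{\text{terrain}}$ and the baseline $V^c_{\text{flat}}$ are already available as plain numbers. The function name, the default values of $a$ and $b$, and the use of the population standard deviation are illustrative assumptions, not details taken from the paper.

```python
import statistics
from typing import Sequence


def traversability_cost(
    terrain_value_samples: Sequence[float],
    flat_value: float,
    a: float = 1.0,  # assumed default; the caption only says a is tunable
    b: float = 0.5,  # assumed default; the caption only says b is tunable
) -> float:
    """Sketch of T_value = -(mean(V_terrain) - V_flat) * (a - b * sigma).

    terrain_value_samples: n stochastic (dropout) inferences of V_terrain
        for the same goal and observation.
    flat_value: the baseline V_flat for the same goal.
    """
    if not terrain_value_samples:
        raise ValueError("at least one value sample is required")

    mean_value = statistics.fmean(terrain_value_samples)
    # Spread of the inferences serves as the uncertainty measure.
    # Population standard deviation is an assumption made here.
    sigma = statistics.pstdev(terrain_value_samples)

    # Subtract the flat-ground baseline to reduce goal-induced bias,
    # negate so that a lower expected return means a higher cost,
    # then scale by the adaptive weight (a - b * sigma).
    return -(mean_value - flat_value) * (a - b * sigma)


if __name__ == "__main__":
    # Hypothetical numbers: terrain values below the flat-ground baseline.
    print(traversability_cost([8.0, 10.0], flat_value=12.0))
```

With this weighting, a larger spread among the inferences shrinks the weight $(a - b\sigma)$, so an uncertain estimate contributes less to the cost than a confident one with the same mean.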

Theorems & Definitions (2)

  • Remark 1
  • Remark 2