Obstacle-Aware Quadrupedal Locomotion With Resilient Multi-Modal Reinforcement Learning

I Made Aswin Nahrendra, Byeongho Yu, Minho Oh, Dongkyu Lee, Seunghyun Lee, Hyeonwoo Lee, Hyungtae Lim, Hyun Myung

TL;DR

The paper tackles robust quadrupedal locomotion in cluttered real-world environments by fusing proprioception with exteroception through a resilient multi-modal reinforcement learning framework. It introduces DreamWaQ++, which combines a hierarchical exteroceptive memory, a PointNet-based exteroceptive encoder, and a multi-modal mixer that conditions a PPO-based policy. Training is augmented by estimation, VAE, and contrastive losses, plus a versatility objective that promotes diverse skills. Across stairs, slopes, and deformable terrains, the approach delivers better stair-climbing performance than the baselines, emergent probing behaviors, and rapid adaptation to out-of-distribution (OOD) situations, while remaining sensor-agnostic and transferring from simulation to the real robot with the help of domain randomization. The work also provides interpretable latent factors that modulate gait and shows potential for integration with higher-level planners and active sensing.

Abstract

Quadrupedal robots hold promising potential for applications in navigating cluttered environments with resilience akin to their animal counterparts. However, their floating base configuration makes them vulnerable to real-world uncertainties, yielding substantial challenges in their locomotion control. Deep reinforcement learning has become one of the plausible alternatives for realizing a robust locomotion controller. However, approaches that rely solely on proprioception sacrifice collision-free locomotion because they require front-feet contact to detect the presence of stairs and adapt the locomotion gait. Meanwhile, incorporating exteroception necessitates a precisely modeled map observed by exteroceptive sensors over a period of time. Therefore, this work proposes a novel method to fuse proprioception and exteroception, featuring resilient multi-modal reinforcement learning. The proposed method yields a controller that showcases agile locomotion performance on a quadrupedal robot over a myriad of real-world courses, including rough terrains, steep slopes, and high-rise stairs, while retaining its robustness against out-of-distribution situations.

Paper Structure (64 sections, 16 equations, 24 figures, 10 tables)

This paper contains 64 sections, 16 equations, 24 figures, 10 tables.

Figures (24)

  • Figure 1: Agile locomotion on cluttered terrains. The locomotion controller trained using DreamWaQ++ allows a quadrupedal robot to perform agile and resilient locomotion over various obstacles and terrains. The controller exhibits versatile gaits such as (A) ascending and (B) descending over a flight of stairs, (C) performing a leap motion, (D) probing when faced with an uncertain dip, (E) crossing a gap, (F) adapting to unseen deformable disastrous terrain, (G) balancing on movable platforms, and (H) climbing a $35^\circ$ slope. Note that all these behaviors are embodied in a single neural network without specialized training for a particular scenario.
  • Figure 2: Walking over various stairs. (A) A head-to-head race between the proposed controller and baselines. (B) 3D map visualization of the race environment. (C) Affordance-aware locomotion when ascending stairs with a rise of $25~\textrm{cm}$ on the left and $20~\textrm{cm}$ on the right side of the robot. (D) Emergent behavior to quickly and efficiently climb stairs with a long foot swing motion, compared with a regular case (E) where the robot could not overcome two stair steps at once because the rear foot was located around the middle of the stair step. (F) A quantitative evaluation in simulation against a baseline visual locomotion controller (ViL-teacher [kareer2023vinl]) over stairs with increasing rise levels and two different run levels. (G) The success rate of each algorithm, measured by simulating $1,\!000$ robots and defined as the percentage of robots that reached the last stair within $10~\mathrm{s}$.
  • Figure 3: Probing into uncertain terrains. An emergent probing skill enables the robot to check the upcoming terrain when it poses a high risk and uncertainty. (A) A sequence of the robot's movement to probe the upcoming terrain. (B) Corresponding velocity commands and estimation, showing how the controller resists the given command and allocates time for the robot to check the terrain. (C) Significant knee flexion-extension (KFE) motions indicated by a sudden change in the calf joint angle, revealing the emergent adaptive behavior as a novel probing skill. (D) The learned control policy can also be fine-tuned by further training the policy in a scenario that includes extreme stage height (see section \ref{supp:parkour}), leading to the emergence of a leap motion to safely traverse down a $50~\textrm{cm}$ stage.
  • Figure 4: Adaptation in out-of-distribution scenarios. (A) The robot is externally disturbed by quickly removing the platform it is stepping on. (B) An abrupt change in the robot's perception made the controller rapidly alter the robot's joints at around $t\!=\!2.5~\mathrm{s}$ (A-3) to (C) enlarge the robot's support polygon, ensuring a safe and stable landing. (D) A 2D embedding visualization using pairwise controlled manifold approximation projection (PaCMAP) [wang2021understanding] shows how the multi-modal context dynamically changes over time and captures changes in the environment, providing informative contexts to swiftly adapt the policy. (E) A realistic scenario where the robot can quickly and robustly adapt its locomotion gait when a depth camera is accidentally detached from the robot. (F) Comparison of torque exertions when climbing a $35^\circ$ slope using (G) DreamWaQ and (H) DreamWaQ++. The annotations on top of the boxplot in (F) indicate the significance level measured using a paired $t$-test (see supplementary section \ref{supp:t-test} for more details).
  • Figure 5: Exteroception-aided terrain awareness. (A) Embedding visualization of the multi-modal context encoded by the proposed context encoder in different environments using PaCMAP [wang2021understanding]. The highly disentangled multi-modal context serves as an informative prior that informs the policy about the environment. (B) Boxplots of the multi-modal contexts in an irregular terrain, showing the distribution of embedding activations from the multi-modal context and highlighting the contrast between activations in the exteroceptive context. (C) Scaling four strong embeddings from (B) (the 41st, 42nd, 55th, and 64th) modulates the robot's gait in real time. (D) Heatmap plots of the cross-modal correlation of embedding features visualize the uncertainty measurement of the multi-modal measurements over different terrains.
  • ...and 19 more figures
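
The success-rate metric described in the Figure 2 (G) caption can be sketched in a few lines. This is only an illustrative reading of the caption, not the authors' evaluation code: the function name `success_rate`, the use of `None` for a robot that never reached the last stair, and the choice to count a finish at exactly the time limit as a success are all assumptions.

```python
from typing import Optional, Sequence


def success_rate(
    finish_times_s: Sequence[Optional[float]],
    time_limit_s: float = 10.0,
) -> float:
    """Percentage of robots that reached the last stair within the time limit.

    finish_times_s holds one entry per simulated robot: the time (in seconds)
    at which it reached the last stair, or None if it never did (hypothetical
    encoding, chosen for this sketch).
    """
    if not finish_times_s:
        raise ValueError("at least one robot is required")
    # Assumption: "within 10 s" is treated as inclusive (t <= limit).
    successes = sum(
        1 for t in finish_times_s if t is not None and t <= time_limit_s
    )
    return 100.0 * successes / len(finish_times_s)


# Four hypothetical robots: two finish in time, one never finishes,
# and one finishes too late.
print(success_rate([4.2, 9.9, None, 12.5]))  # 50.0
```

In the setting of the caption, the list would hold 1,000 entries per algorithm, one for each simulated robot.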