BiRoDiff: Diffusion policies for bipedal robot locomotion on unseen terrains

GVS Mothish, Manan Tayal, Shishir Kolathaya

TL;DR

The paper addresses robust bipedal locomotion on unseen terrains by proposing BiRoDiff, a lightweight diffusion-model-based policy trained offline to produce real-time joint-angle actions conditioned on latent observations. It employs a DDPM framework with a two-network setup (latent observation encoder and denoising network) and 60 denoising steps to generate actions, enabling a single policy to exhibit multiple walking behaviors across terrains. Experiments in Isaac Gym with the Stoch-BiRo robot demonstrate generalization to flat ground, slopes, rough terrain, steps, and discrete terrains, with high sampling efficiency and competitive MSE on training and validation data. The work highlights practical significance for disaster response and exploration by offering a scalable, offline-trained, multimodal gait policy, while outlining future directions such as longer-horizon planning, hardware deployment, and vision integration.
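
To make the action-generation step concrete, here is a minimal sketch of standard DDPM ancestral sampling of one action vector over 60 denoising steps, conditioned on a latent observation. It illustrates the general technique under stated assumptions and is not the authors' implementation. The linear noise schedule, the variance choice sigma_t^2 = beta_t, the action and latent dimensions, and the `eps_theta` stand-in (which returns zeros instead of a trained network's noise prediction) are all hypothetical.

```python
import numpy as np

# Assumption: linear beta schedule; the paper's actual schedule is not given here.
NUM_STEPS = 60  # number of denoising steps reported for BiRoDiff
betas = np.linspace(1e-4, 0.02, NUM_STEPS)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def eps_theta(x_t, t, z):
    """Hypothetical stand-in for the trained denoising network.

    A real policy would predict the noise in x_t from (x_t, t, z); returning
    zeros just lets this sketch run end to end.
    """
    return np.zeros_like(x_t)


def sample_action(z, action_dim, rng):
    """DDPM ancestral sampling of one action vector conditioned on latent z."""
    x = rng.standard_normal(action_dim)  # start from pure noise x_T ~ N(0, I)
    for t in reversed(range(NUM_STEPS)):
        eps = eps_theta(x, t, z)
        # Posterior mean: subtract the scaled noise estimate, then rescale.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Assumed variance choice sigma_t^2 = beta_t.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(action_dim)
        else:
            x = mean  # no noise is added at the final step
    return x


# Usage with hypothetical sizes: a 32-dim latent observation, 12 joint angles.
rng = np.random.default_rng(0)
action = sample_action(np.zeros(32), action_dim=12, rng=rng)
print(action.shape)  # (12,)
```

Because each control step requires a full pass over all denoising steps, the step count directly sets the per-action compute cost, which is why it matters for real-time control.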

Abstract

Locomotion on unknown terrains is essential for bipedal robots to handle novel real-world challenges, thus expanding their utility in disaster response and exploration. In this work, we introduce a lightweight framework that learns a single walking controller that yields locomotion on multiple terrains. We have designed a real-time robot controller based on diffusion models, which not only captures multiple behaviours with different velocities in a single policy but also generalizes well to unseen terrains. Our controller learns from offline data, which offers advantages over online learning such as scalability and a simpler training scheme. We have designed and implemented a diffusion model-based policy controller in simulation on our custom-made bipedal robot model, Stoch BiRo. We demonstrate its generalization capability and its ability to generate control steps at high frequency, unlike typical generative models, which require substantial onboard compute.
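
Learning from offline data typically reduces diffusion-policy training to a supervised denoising objective. The sketch below computes the textbook DDPM noise-prediction (epsilon-prediction) loss on a batch of (latent observation, action) pairs. The abstract does not state the exact loss, so this is an assumption rather than the paper's recipe; the linear schedule, the batch and vector sizes, and the `eps_model` signature `(noisy_actions, t, latents)` are hypothetical.

```python
import numpy as np

NUM_STEPS = 60
betas = np.linspace(1e-4, 0.02, NUM_STEPS)  # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)


def q_sample(a0, t, noise):
    """Forward diffusion: corrupt clean actions a0 to step t in closed form."""
    scale = np.sqrt(alpha_bars[t])[:, None]
    sigma = np.sqrt(1.0 - alpha_bars[t])[:, None]
    return scale * a0 + sigma * noise


def diffusion_loss(eps_model, latents, actions, rng):
    """Epsilon-prediction MSE on a batch of offline (latent, action) pairs."""
    batch = actions.shape[0]
    t = rng.integers(0, NUM_STEPS, size=batch)  # one random step per sample
    noise = rng.standard_normal(actions.shape)
    noisy = q_sample(actions, t, noise)
    pred = eps_model(noisy, t, latents)
    return float(np.mean((pred - noise) ** 2))


# Usage with hypothetical data: 1000 logged actions of 12 joints each.
rng = np.random.default_rng(0)
actions = rng.standard_normal((1000, 12))
latents = np.zeros((1000, 32))


def zero_model(x, t, z):
    """Baseline that always predicts zero noise."""
    return np.zeros_like(x)


loss = diffusion_loss(zero_model, latents, actions, rng)
```

A model that always predicts zero noise scores roughly 1.0 here, since the target noise has unit variance; training the denoising network drives the loss below that baseline. No environment interaction is needed, only the logged dataset.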

Paper Structure

This paper contains 18 sections, 6 equations, 5 figures, 3 tables, and 2 algorithms.

Figures (5)

  • Figure 1: Stoch BiRo
  • Figure 2: Biped walking data collected on (a) flat ground and (b) slopes to be used for training the diffusion policy
  • Figure 3: Architecture of the diffusion policy: (a) The neural network $\mathcal{M}$, which has three hidden layers, takes observations as input and outputs latent observations. (b) The diffusion process, in which actions are denoised over multiple steps; at each step, a network $\bm{\epsilon_{\theta}}$ predicts the noise to be removed.
  • Figure 4: Walking behaviour on different terrains
  • Figure 5: Stoch-BiRo navigating across different unseen terrains using the BiRoDiff policy
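
Figure 3(a) describes the observation encoder M as a network with three hidden layers that maps observations to latent observations. A minimal NumPy sketch of such an MLP follows. The layer widths, the ReLU activation, the observation and latent dimensions, and the random initialization are assumptions for illustration, not values taken from the paper.

```python
import numpy as np


def init_mlp(sizes, rng):
    """Create (weight, bias) pairs for each pair of consecutive layer sizes."""
    return [
        (rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def encode(params, obs):
    """Forward pass: ReLU hidden layers (assumed), linear output layer."""
    h = obs
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = params[-1]
    return h @ W + b  # latent observation


# Hypothetical sizes: 45-dim observation, three hidden layers of 256, 32-dim latent.
OBS_DIM, LATENT_DIM = 45, 32
sizes = [OBS_DIM, 256, 256, 256, LATENT_DIM]
rng = np.random.default_rng(0)
params = init_mlp(sizes, rng)
latent = encode(params, rng.standard_normal((8, OBS_DIM)))  # batch of 8
print(latent.shape)  # (8, 32)
```

In a two-network setup like the one in Figure 3, the latent vector produced here is what the denoising network in panel (b) would be conditioned on at every denoising step.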