DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets

Xiaoyu Huang, Yufeng Chi, Ruofeng Wang, Zhongyu Li, Xue Bin Peng, Sophia Shao, Borivoje Nikolic, Koushil Sreenath

TL;DR

DiffuseLoco learns agile, multi-skill legged locomotion from offline data using a diffusion-based policy with a transformer backbone. Delayed inputs and receding horizon control make the policy suitable for real-time control, and it transfers zero-shot to real quadrupedal robots, including a bipedal walking skill, with smooth transitions between skills. Real-world benchmarks show better stability and velocity tracking than RL and non-diffusion behavior cloning (BC) baselines, and ablation studies validate the design choices. The work suggests a scalable path for expanding offline datasets to cover more skills and morphologies, potentially incorporating richer goal conditioning and vision-language data. It also shows that the policy can be deployed on edge computing hardware, pointing to the potential of large-scale, diffusion-based locomotion controllers in real-world robotics.

Abstract

This work introduces DiffuseLoco, a framework for training multi-skill diffusion-based policies for dynamic legged locomotion from offline datasets, enabling real-time control of diverse skills on robots in the real world. Offline learning at scale has led to breakthroughs in computer vision, natural language processing, and robotic manipulation domains. However, scaling up learning for legged robot locomotion, especially with multiple skills in a single policy, presents significant challenges for prior online reinforcement learning methods. To address this challenge, we propose a novel, scalable framework that leverages diffusion models to directly learn from offline multimodal datasets with a diverse set of locomotion skills. With design choices tailored for real-time control in dynamical systems, including receding horizon control and delayed inputs, DiffuseLoco is capable of reproducing multimodality in performing various locomotion skills, transfers zero-shot to real quadrupedal robots, and can be deployed on edge computing devices. Furthermore, DiffuseLoco demonstrates free transitions between skills and robustness against environmental variations. Through extensive benchmarking in real-world experiments, DiffuseLoco exhibits better stability and velocity tracking performance compared to prior reinforcement learning and non-diffusion-based behavior cloning baselines. The design choices are validated via comprehensive ablation studies. This work opens new possibilities for scaling up learning-based legged locomotion controllers by scaling large, expressive models and diverse offline datasets.
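
The two real-time design choices named above, receding horizon control and delayed inputs, can be illustrated with a toy control loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `toy_policy`, `control_loop`, the 1-D integrator dynamics, and the values of `h`, `n_exec`, and `horizon` are all hypothetical. The only detail taken from the paper's description is that, when planning at step $t$, the policy sees states up to $t-1$ and actions up to $t-2$.

```python
from collections import deque
from typing import Callable, List, Sequence

# Hypothetical policy signature: (delayed states, delayed actions, goal) -> action sequence.
Policy = Callable[[Sequence[float], Sequence[float], float], List[float]]


def toy_policy(states: Sequence[float], actions: Sequence[float],
               goal: float, horizon: int = 4) -> List[float]:
    """Stand-in for a learned policy: predicts `horizon` identical actions
    that step toward the goal from the newest state it is allowed to see."""
    return [0.25 * (goal - states[-1])] * horizon


def control_loop(policy: Policy, goal: float, steps: int = 12,
                 h: int = 3, n_exec: int = 2) -> List[float]:
    """Receding horizon control with delayed inputs on a toy 1-D system
    whose dynamics are assumed to be x_{t+1} = x_t + a_t."""
    state = 0.0
    # Buffers hold h + 1 entries so that the newest one can be withheld.
    state_hist = deque([state] * (h + 1), maxlen=h + 1)
    action_hist = deque([0.0] * (h + 1), maxlen=h + 1)
    trajectory = [state]
    plan: List[float] = []
    for _ in range(steps):
        if not plan:
            # Delayed inputs: withhold s_t and a_{t-1}, so the policy sees
            # states up to t-1 and actions up to t-2.
            delayed_states = list(state_hist)[:-1]
            delayed_actions = list(action_hist)[:-1]
            # Receding horizon: keep only the first n_exec predicted actions,
            # then replan from fresher observations.
            plan = policy(delayed_states, delayed_actions, goal)[:n_exec]
        action = plan.pop(0)
        state = state + action
        state_hist.append(state)
        action_hist.append(action)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    print(control_loop(toy_policy, goal=1.0))
```

The loop returns `steps + 1` states. Executing only a prefix of each predicted sequence lets the controller correct for the stale observations it planned from, which is the usual motivation for pairing the two ideas.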

Paper Structure (54 sections, 5 equations, 17 figures, 3 tables, 1 algorithm)

Figures (17)

  • Figure 2: Overview of the three stages of DiffuseLoco. First, we generate or utilize an offline dataset with demonstrations of a set of skills gathered with different methods (left). Then, we train the DiffuseLoco policy with the DDPM loss on trajectories within the dataset (middle). Finally, the DiffuseLoco policy is deployed on robots in the real world and executes a diverse set of agile skills (right).
  • Figure 3: The DiffuseLoco architecture. At time step $t$, it takes in a delayed $h$-step history of proprioceptive states $\mathbf{s}_{t-h-1:t-1}$, goals $\mathbf{g}_{t-h-1:t-1}$, and actions $\mathbf{a}_{t-h-2:t-2}$, and predicts a sequence of $n$ future actions $\mathbf{a}_{t:t+n}$ for the robot's actuators. First, separate MLP encoders map state and goal into embeddings which, with a one-hot diffusion step, are queried by noisy action tokens via $M$ transformer decoder layers for denoising. After $K$ denoising iterations, the predicted action sequence is generated and we feed the executed action back to the model’s input. The model is trained through end-to-end imitation learning.
  • Figure 4: Snapshots of five diverse agile locomotion skills with the DiffuseLoco policy. This represents a leading effort in developing a single policy that can combine an agile bipedal walking skill with other quadrupedal skills and can be deployed on real-world robots.
  • Figure 5: Foot contact map indicating stable walking and skill switching with the DiffuseLoco policy and velocity commands. Red circles denote the legs that are in contact with the ground. The robot initially walks using the trotting skill, indicated by a purple background, then switches to pacing, shown in green, following a command change that involves a sudden stop and resume. We emphasize DiffuseLoco's ability to maintain different modalities for stable walking under the same command, switching modalities only when necessary.
  • Figure 6: Depiction of DiffuseLoco's robustness on different ground conditions and terrains: bipedal walking on (a) turf terrain, (c) vinyl composite floor, and (e) half padded floor, where the ground heights, friction and restitution forces on the two standing legs are different; quadrupedal walking on (b) turf, (d) bare floor, (f) over a thick wooden board as a variation in the terrain height.
  • ...and 12 more figures
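
The captions for Figures 2 and 3 mention training with a DDPM loss and generating an action sequence after $K$ denoising iterations. The sketch below shows the standard DDPM noise-prediction loss and ancestral sampler in plain Python, as a rough illustration only. The linear beta schedule, the function names, and the `oracle_eps_model` stand-in are assumptions; in the paper the noise predictor is a transformer conditioned on state and goal embeddings, which is not reproduced here.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical noise-predictor signature: (noisy actions, step k, conditioning) -> predicted noise.
EpsModel = Callable[[Sequence[float], int, Optional[object]], List[float]]


def make_schedule(K: int, beta_start: float = 1e-4,
                  beta_end: float = 0.02) -> Tuple[List[float], List[float]]:
    """Linear beta schedule (an assumed choice, K >= 2) and cumulative alpha products."""
    betas = [beta_start + (beta_end - beta_start) * k / (K - 1) for k in range(K)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars


def ddpm_loss(eps_model: EpsModel, x0: Sequence[float], cond, k: int,
              alpha_bars: Sequence[float], rng: random.Random) -> float:
    """Mean squared error between the true and the predicted noise at step k."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    ab = alpha_bars[k]
    # Forward process: x_k = sqrt(ab) * x_0 + sqrt(1 - ab) * eps
    xk = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    pred = eps_model(xk, k, cond)
    return sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(x0)


def sample_actions(eps_model: EpsModel, cond, n: int, betas: Sequence[float],
                   alpha_bars: Sequence[float], rng: random.Random) -> List[float]:
    """Run K = len(betas) reverse steps, starting from Gaussian noise."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for k in reversed(range(len(betas))):
        eps_hat = eps_model(x, k, cond)
        coef = betas[k] / math.sqrt(1.0 - alpha_bars[k])
        x = [(xi - coef * e) / math.sqrt(1.0 - betas[k]) for xi, e in zip(x, eps_hat)]
        if k > 0:  # no noise is added on the final step
            sigma = math.sqrt(betas[k])
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x


def oracle_eps_model(target: Sequence[float], alpha_bars: Sequence[float]) -> EpsModel:
    """Stand-in for a trained network: recovers the exact noise for a known target."""
    def model(xk, k, cond):
        ab = alpha_bars[k]
        return [(x - math.sqrt(ab) * t) / math.sqrt(1.0 - ab) for x, t in zip(xk, target)]
    return model


if __name__ == "__main__":
    betas, alpha_bars = make_schedule(K=10)
    target = [0.1, -0.2, 0.3]  # made-up "action sequence"
    model = oracle_eps_model(target, alpha_bars)
    print(sample_actions(model, None, len(target), betas, alpha_bars, random.Random(0)))
```

With the oracle predictor the sampler recovers `target` up to floating-point error, which is a convenient sanity check on the update rule before substituting a learned model.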

Theorems & Definitions (2)

  • Remark 1
  • Remark 2