LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots

Peilin Wu, Weiji Xie, Jiahang Cao, Hang Lai, Weinan Zhang

TL;DR

LoopSR tackles the sim-to-real gap in legged robotics through lifelong policy adaptation that loops real-world data back into simulated training. It uses a transformer-based trajectory encoder to map real-world trajectories into a latent representation, and an autoencoder with contrastive learning to extract robust features of the real-world dynamics, together forming a digital twin of the real world. Environment parameters for continual training are inferred by combining retrieval-based estimates with decoder outputs and are then used to reconfigure the simulator, enabling data-efficient adaptation. Evaluations in IsaacGym and on a real Unitree A1 show that LoopSR outperforms zero-shot baselines and approaches expert performance with limited real-world data, supporting lifelong adaptation as a practical path to robust legged locomotion.
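The parameter-inference step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that "retrieval" means averaging the simulator parameters of the k nearest pre-collected simulation trajectories in latent space, and that the retrieved and decoded estimates are merged by a convex blend weighted by a hypothetical coefficient alpha. The function names, k, alpha, and the example parameters are all illustrative.

```python
import numpy as np

def retrieve_params(z_query, z_bank, param_bank, k=5):
    """Average the simulator parameters of the k pre-collected simulation
    trajectories whose latents are closest (Euclidean) to z_query."""
    dists = np.linalg.norm(z_bank - z_query, axis=1)  # distance to every stored latent
    nearest = np.argsort(dists)[:k]                    # indices of the k nearest neighbours
    return param_bank[nearest].mean(axis=0)

def infer_sim_params(z_query, decoder_pred, z_bank, param_bank, k=5, alpha=0.5):
    """Hypothetical convex blend: alpha weights the decoder prediction,
    (1 - alpha) weights the retrieval-based estimate."""
    retrieved = retrieve_params(z_query, z_bank, param_bank, k)
    return alpha * np.asarray(decoder_pred, dtype=float) + (1.0 - alpha) * retrieved

# Illustrative data: 100 simulated trajectories, each with a 16-D latent
# and 2 simulator parameters (e.g. friction and payload, chosen arbitrarily).
rng = np.random.default_rng(0)
z_bank = rng.normal(size=(100, 16))
param_bank = rng.uniform(0.2, 1.2, size=(100, 2))
z_real = z_bank[3] + 0.01                  # latent of a (hypothetical) real-world trajectory
decoder_pred = np.array([0.6, 0.9])        # decoder's direct parameter estimate
sim_params = infer_sim_params(z_real, decoder_pred, z_bank, param_bank)
```

The resulting vector would be written into the simulator before the next round of continual training. Choosing per parameter which source to trust would be an equally plausible reading of "combining"; the blend is used here only because it is the simplest.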

Abstract

Reinforcement Learning (RL) has shown remarkable and generalizable capability in legged locomotion through sim-to-real transfer. However, while adaptive methods like domain randomization are expected to enhance policy robustness across diverse environments, they can compromise the policy's performance in any specific environment, leading to suboptimal real-world deployment, as the No Free Lunch theorem suggests. To address this, we propose LoopSR, a lifelong policy adaptation framework that continuously refines RL policies in the post-deployment stage. LoopSR employs a transformer-based encoder to map real-world trajectories into a latent space and reconstruct a digital twin of the real world for further improvement. An autoencoder architecture and contrastive learning are adopted to enhance feature extraction of real-world dynamics. Simulation parameters for continual training are derived by combining values predicted by the decoder with parameters retrieved from a pre-collected dataset of simulation trajectories. By leveraging simulated continual training, LoopSR achieves superior data efficiency compared with strong baselines, yielding high performance with limited data in both sim-to-sim and sim-to-real experiments. Please refer to https://peilinwu.site/looping-sim-and-real.github.io/ for videos and code.
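Contrastive learning on trajectory latents is commonly implemented with an InfoNCE objective. The sketch below assumes (the abstract does not specify this) that two trajectories sharing the same underlying dynamics form a positive pair and every other trajectory in the batch serves as a negative. The function name info_nce and the temperature value are illustrative choices, not taken from the paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss over a batch: row i of `positives` is the positive for
    row i of `anchors`; every other row in the batch acts as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)      # L2-normalise
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                        # scaled cosine similarities, (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)   # stability; log-softmax unchanged
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # cross-entropy with diagonal targets

# Two views of 8 trajectories: latents of the same underlying dynamics
# should be close (positives); different dynamics act as negatives.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(8, 16))
z_b = z_a + 0.05 * rng.normal(size=(8, 16))   # noisy second view of each trajectory
loss_matched = info_nce(z_a, z_b)
loss_mismatched = info_nce(z_a, z_b[::-1])    # pairs deliberately misaligned
```

Minimising this loss pulls latents of trajectories with shared dynamics together and pushes the rest apart. In an autoencoder setting it would presumably be added to the reconstruction loss with some weighting coefficient, which is likewise an assumption here.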

Paper Structure (17 sections, 8 equations, 6 figures, 5 tables, 1 algorithm)

Figures (6)

  • Figure 1: Overview of our proposed LoopSR.
  • Figure 2: Illustration of the core network architecture. The whole trajectory is fed into a transformer backbone similar to Decision Transformer (DT), and the latent $z$ is obtained by average pooling over its outputs. Separate decoders are designed to disentangle different traits of the trajectory.
  • Figure 3: Continual training curve compared with the Origin (DreamWaQ) baseline.
  • Figure 4: Visualization of real-world application scenes.
  • Figure 5: Visualization of simulation gaits and contact phase.
  • ...and 1 more figure
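The architecture in the Figure 2 caption (a transformer over the whole trajectory, average pooling to $z$, separate decoders) can be sketched in NumPy with randomly initialized weights. Assumptions, none confirmed by the source: each time step becomes one token built from the concatenated observation and action (Decision Transformer itself interleaves separate return, state, and action tokens), a single self-attention layer with sinusoidal positional encoding stands in for the full backbone, attention is unmasked, and the decoder head names "dynamics" and "terrain" and all sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sinusoidal_positions(T, d_model):
    """Standard sinusoidal positional encoding, shape (T, d_model); d_model must be even."""
    pos = np.arange(T)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((T, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

class TrajectoryEncoder:
    """One single-head self-attention layer over the whole trajectory,
    followed by average pooling over time to obtain the latent z."""

    def __init__(self, obs_dim, act_dim, d_model=32, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.d_model = d_model
        self.embed = rng.normal(scale=0.1, size=(obs_dim + act_dim, d_model))
        self.wq, self.wk, self.wv = [rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3)]

    def __call__(self, obs, act):
        T = obs.shape[0]
        # One token per time step: embedded (obs, act) plus its position.
        x = np.concatenate([obs, act], axis=-1) @ self.embed
        x = x + sinusoidal_positions(T, self.d_model)
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        attn = softmax(q @ k.T / np.sqrt(self.d_model))  # (T, T), no causal mask
        h = x + attn @ v                                  # residual connection
        return h.mean(axis=0)                             # average pooling -> z, shape (d_model,)

class DecoderHeads:
    """Separate linear decoders that each read the shared latent z."""

    def __init__(self, d_model, out_dims, rng=None):
        rng = np.random.default_rng(1) if rng is None else rng
        self.weights = {name: rng.normal(scale=0.1, size=(d_model, dim))
                        for name, dim in out_dims.items()}

    def __call__(self, z):
        return {name: z @ w for name, w in self.weights.items()}

# Example: a 50-step trajectory with 45-D observations and 12-D actions (illustrative sizes).
rng = np.random.default_rng(42)
obs, act = rng.normal(size=(50, 45)), rng.normal(size=(50, 12))
z = TrajectoryEncoder(obs_dim=45, act_dim=12)(obs, act)
outputs = DecoderHeads(32, {"dynamics": 4, "terrain": 3})(z)
```

Average pooling gives a fixed-size $z$ regardless of trajectory length, which is what lets every decoder head share one latent.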