Transferable Latent-to-Latent Locomotion Policy for Efficient and Versatile Motion Control of Diverse Legged Robots

Ziang Zheng, Guojian Zhan, Bin Shuai, Shengtao Qin, Jiangtao Li, Tao Zhang, Shengbo Eben Li

TL;DR

The paper tackles the challenge of data-efficient, cross-robot transfer in legged locomotion by introducing L3P, a latent-to-latent policy that operates in a shared latent space defined by task-specific encoders and decoders. A diffusion-based recovery module enforces informative latent representations, enabling the latent policy to transfer across morphologies with lightweight fine-tuning of interfaces. Empirical results show up to 2x improvements in zero-shot transfer and up to 3x faster adaptation on new platforms, validated in both simulation across multiple robots and real-world stair tasks. This work enables scalable, versatile locomotion control by decoupling universal latent skills from platform-specific components, paving the way for robust multi-robot deployment in diverse environments.

Abstract

Reinforcement learning (RL) has demonstrated remarkable capability in acquiring robot skills, but learning each new skill still requires substantial data collection for training. The pretrain-and-finetune paradigm offers a promising approach for efficiently adapting to new robot entities and tasks. Inspired by the idea that acquired knowledge can accelerate learning new tasks with the same robot and help a new robot master a trained task, we propose a latent training framework in which a transferable latent-to-latent locomotion policy is pretrained alongside diverse task-specific observation encoders and action decoders. This latent-space policy processes encoded latent observations and generates latent actions that are then decoded, giving it the potential to learn general, abstract motion skills. To retain the information essential for decision-making and control, we introduce a diffusion recovery module that minimizes the information reconstruction loss during the pretraining stage. During the fine-tuning stage, the pretrained latent-to-latent locomotion policy remains fixed, while only the lightweight task-specific encoder and decoder are optimized for efficient adaptation. Our method allows a robot to leverage its own prior experience across different tasks, as well as the experience of other morphologically diverse robots, to accelerate adaptation. We validate our approach through extensive simulations and real-world experiments, demonstrating that the pretrained latent-to-latent locomotion policy generalizes effectively to new robot entities and tasks with improved efficiency.
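The data flow described in the abstract (robot-specific encoder, shared latent policy, robot-specific decoder) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the single tanh layers, the latent sizes, the observation and action dimensions, and the names `RobotInterface` and `act` are all assumptions made for the example, and the weights are random rather than trained.

```python
import math
import random

LATENT_OBS_DIM = 8  # hypothetical size of the shared latent observation
LATENT_ACT_DIM = 4  # hypothetical size of the shared latent action


def make_linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) weight matrix stored as nested lists."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


def linear_tanh(weights, x):
    """Apply a linear map followed by tanh to the vector x."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


class RobotInterface:
    """Robot-specific observation encoder and action decoder (assumed form)."""

    def __init__(self, obs_dim, act_dim, rng):
        self.encoder = make_linear(obs_dim, LATENT_OBS_DIM, rng)
        self.decoder = make_linear(LATENT_ACT_DIM, act_dim, rng)


def act(interface, latent_policy, observation):
    """Encode the observation, run the shared latent policy, decode the action."""
    latent_obs = linear_tanh(interface.encoder, observation)
    latent_act = linear_tanh(latent_policy, latent_obs)
    return linear_tanh(interface.decoder, latent_act)


rng = random.Random(0)
# One latent-to-latent backbone shared by every robot.
latent_policy = make_linear(LATENT_OBS_DIM, LATENT_ACT_DIM, rng)

# Two hypothetical robots with different observation and action sizes.
robot_a = RobotInterface(obs_dim=48, act_dim=12, rng=rng)
robot_b = RobotInterface(obs_dim=60, act_dim=16, rng=rng)

action_a = act(robot_a, latent_policy, [0.1] * 48)
action_b = act(robot_b, latent_policy, [0.1] * 60)
print(len(action_a), len(action_b))
```

The point of the sketch is that `latent_policy` never sees robot-specific dimensions, so the same backbone can serve robots whose raw observation and action spaces differ.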

Paper Structure

This paper contains 19 sections, 1 theorem, 8 equations, 8 figures, 2 tables.

Key Result

Theorem 1

Assume that $\mathcal{L}_{policy}(h)$ is a differentiable function that measures policy performance in the latent space, and that $\mathcal{L}_{rec}$ enforces the recoverability of $z$. Then, under joint minimization of $\mathcal{L} = \mathcal{L}_{policy}(h) + \lambda\, \mathcal{L}_{rec}$, the resulting $h^*$ is both discriminative for decision-making and stable in the sense that its reconstructi
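To see how the weight $\lambda$ trades off the two terms in a joint objective of this form, consider a toy scalar case. The quadratic losses below are stand-ins chosen purely for illustration and are not the paper's losses: we assume $\mathcal{L}_{policy}(h) = (h - a)^2$ and $\mathcal{L}_{rec}(h) = (h - b)^2$ with hypothetical constants $a$ and $b$. Setting the derivative $2(h - a) + 2\lambda(h - b)$ to zero gives $h^* = (a + \lambda b)/(1 + \lambda)$, which the sketch checks against plain gradient descent.

```python
def joint_minimizer(a, b, lam):
    """Closed-form minimizer of (h - a)^2 + lam * (h - b)^2."""
    return (a + lam * b) / (1.0 + lam)


def gradient_descent(a, b, lam, lr=0.05, steps=2000):
    """Minimize the same toy joint objective numerically, starting from h = 0."""
    h = 0.0
    for _ in range(steps):
        grad = 2.0 * (h - a) + lam * 2.0 * (h - b)
        h -= lr * grad
    return h


# Hypothetical optima of the two toy terms.
a, b = 1.0, 3.0
for lam in (0.0, 1.0, 4.0):
    print(lam, joint_minimizer(a, b, lam), round(gradient_descent(a, b, lam), 6))
```

With $\lambda = 0$ the minimizer sits at the policy optimum $a$, and as $\lambda$ grows it moves toward the reconstruction optimum $b$.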

Figures (8)

  • Figure 1: Biological inspiration for latent-to-latent locomotion policy.
  • Figure 2: Overview of the L3P framework, which enables transferable locomotion control across diverse legged robots. The framework consists of three key modules: (1) an observation latent encoder that maps raw sensory inputs into a shared latent space, (2) a latent policy backbone that learns a generalizable control strategy, and (3) a latent action decoder that translates latent actions into robot-specific motor commands. Recovery modules for both observation and action ensure latent space consistency. The framework is trained in three stages: (i) latent space definition and alignment, (ii) transfer from a single entity to diverse robots, and (iii) zero-shot generalization from simple to complex terrains.
  • Figure 3: Overview of the simulation playground used for training, implemented in Isaac Sim. The environment is structured such that different tracks are arranged horizontally, while terrain difficulty increases progressively in the vertical direction. As the level rises, the tasks become increasingly challenging.
  • Figure 4: Performance verification on multiple types of legged robots. We conduct experiments on seven classical legged robots (A1, Go1, Go2, Anymal-B, Anymal-C, Anymal-D [hutter2016anymal], Spot) and record the difficulty level reached during training. Each curve is the average over 5 seeds.
  • Figure 5: Performance verification by comparing L3P-Transfer with MLP-Transfer. Each part represents the ability to traverse the corresponding terrain. The value is the final difficulty level the policy reaches after limited training on simple flat terrain only.
  • ...and 3 more figures
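The transfer stage named in the Figure 2 caption relies on the idea, stated in the abstract, that the latent policy stays fixed while only the encoder and decoder are optimized. The toy below illustrates that mechanism under strong simplifying assumptions: every module is a single scalar gain, the new robot's desired behaviour is a hypothetical linear map with gain 3, and training is supervised regression by gradient descent rather than the reinforcement learning used in the paper. The name `fine_tune` and all numeric values are made up for the example.

```python
def fine_tune(policy_w, inputs, target_gain, lr=0.05, steps=500):
    """Fit encoder and decoder gains around a frozen latent policy weight."""
    enc_w, dec_w = 1.0, 1.0  # trainable interface parameters
    n = len(inputs)
    for _ in range(steps):
        grad_enc = 0.0
        grad_dec = 0.0
        for x in inputs:
            # Prediction error of the encoder -> policy -> decoder chain.
            err = dec_w * policy_w * enc_w * x - target_gain * x
            grad_enc += 2.0 * err * dec_w * policy_w * x / n
            grad_dec += 2.0 * err * policy_w * enc_w * x / n
        enc_w -= lr * grad_enc
        dec_w -= lr * grad_dec
        # policy_w is deliberately never updated: the backbone stays frozen.
    return enc_w, dec_w


PRETRAINED_POLICY_W = 0.8  # hypothetical pretrained backbone weight
samples = [-1.0, -0.5, 0.5, 1.0]
enc_w, dec_w = fine_tune(PRETRAINED_POLICY_W, samples, target_gain=3.0)
overall_gain = dec_w * PRETRAINED_POLICY_W * enc_w
print(round(enc_w, 4), round(dec_w, 4), round(overall_gain, 4))
```

Only two interface parameters change during adaptation, yet the end-to-end mapping matches the new target, which is the intuition behind adapting lightweight interfaces around a fixed backbone.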

Theorems & Definitions (2)

  • Theorem 1: Optimal Latent Space Alignment
  • proof