Transferable Latent-to-Latent Locomotion Policy for Efficient and Versatile Motion Control of Diverse Legged Robots
Ziang Zheng, Guojian Zhan, Bin Shuai, Shengtao Qin, Jiangtao Li, Tao Zhang, Shengbo Eben Li
TL;DR
The paper tackles the challenge of data-efficient, cross-robot transfer in legged locomotion by introducing L3P, a latent-to-latent policy that operates in a shared latent space defined by task-specific encoders and decoders. A diffusion-based recovery module enforces informative latent representations, enabling the latent policy to transfer across morphologies with lightweight fine-tuning of interfaces. Empirical results show up to 2x improvements in zero-shot transfer and up to 3x faster adaptation on new platforms, validated in both simulation across multiple robots and real-world stair tasks. This work enables scalable, versatile locomotion control by decoupling universal latent skills from platform-specific components, paving the way for robust multi-robot deployment in diverse environments.
Abstract
Reinforcement learning (RL) has demonstrated remarkable capability in acquiring robot skills, but learning each new skill still requires substantial data collection for training. The pretrain-and-finetune paradigm offers a promising approach for efficiently adapting to new robot entities and tasks. Inspired by the idea that acquired knowledge can accelerate learning new tasks with the same robot and help a new robot master a trained task, we propose a latent training framework in which a transferable latent-to-latent locomotion policy is pretrained alongside diverse task-specific observation encoders and action decoders. This policy operates in latent space: it maps encoded latent observations to latent actions, which are then decoded, giving it the potential to learn general abstract motion skills. To retain essential information for decision-making and control, we introduce a diffusion recovery module that minimizes information reconstruction loss during the pretraining stage. During the fine-tuning stage, the pretrained latent-to-latent locomotion policy remains fixed, while only the lightweight task-specific encoder and decoder are optimized for efficient adaptation. Our method allows a robot to leverage its own prior experience across different tasks, as well as the experience of other morphologically diverse robots, to accelerate adaptation. We validate our approach through extensive simulations and real-world experiments, demonstrating that the pretrained latent-to-latent locomotion policy generalizes effectively to new robot entities and tasks with improved efficiency.
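
The encode, latent-policy, decode structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single-layer linear maps, the `tanh` nonlinearity, the latent sizes, the observation and action dimensions, and every name below (`Interface`, `latent_policy`, `act`, `finetune_params`) are hypothetical stand-ins chosen only to show how one shared latent policy could serve robots with different observation and action spaces, and which parameters would stay trainable during fine-tuning. The diffusion recovery module and the RL training loop are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes of the shared latent observation and action spaces.
LATENT_OBS, LATENT_ACT = 16, 8


def linear(in_dim, out_dim):
    """Random weight matrix standing in for a trained network."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim))


class Interface:
    """Task-specific observation encoder and action decoder for one robot."""

    def __init__(self, obs_dim, act_dim):
        self.enc = linear(obs_dim, LATENT_OBS)   # observation -> latent observation
        self.dec = linear(LATENT_ACT, act_dim)   # latent action -> robot action


# Shared latent-to-latent policy; assumed frozen once pretraining is done.
latent_policy = linear(LATENT_OBS, LATENT_ACT)


def act(interface, obs):
    """Encode the observation, apply the shared policy, decode the action."""
    z_obs = np.tanh(obs @ interface.enc)
    z_act = np.tanh(z_obs @ latent_policy)
    return z_act @ interface.dec


def finetune_params(interface):
    """Arrays an optimizer would update during fine-tuning: interface only."""
    return [interface.enc, interface.dec]


# Two robots with different (made-up) observation and action dimensions.
quadruped = Interface(obs_dim=48, act_dim=12)
biped = Interface(obs_dim=30, act_dim=10)

quadruped_action = act(quadruped, rng.normal(size=48))
biped_action = act(biped, rng.normal(size=30))
```

In this sketch, adapting to a new robot would mean creating a fresh `Interface` and optimizing only the arrays returned by `finetune_params`, while `latent_policy` is reused unchanged.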
