COMPASS: Cross-embodiment Mobility Policy via Residual RL and Skill Synthesis
Wei Liu, Huihua Zhao, Chenran Li, Yuchen Deng, Joydeep Biswas, Soha Pouya, Yan Chang
TL;DR
The paper tackles scalable mobility across diverse robot embodiments by proposing COMPASS, a three-stage pipeline that first learns mobility priors via imitation learning, then specializes these priors for each embodiment with residual RL, and finally distills the specialists into a single generalist policy conditioned on an embodiment embedding. By leveraging a world-model-based latent representation, COMPASS enables data-efficient adaptation and strong cross-embodiment generalization, including zero-shot sim-to-real transfer. Empirical results show large improvements in success rate (SR) and travel efficiency (weighted travel time, WTT) for the embodiment specialists, and a robust generalist policy that performs comparably to the specialists across multiple platforms. The approach also supports applications beyond mobility, such as open-vocabulary navigation and synthetic data generation for downstream Vision-Language-Action models, which highlights its practical value for scalable, adaptable robotic systems.
Abstract
As robots are increasingly deployed in diverse application domains, enabling robust mobility across different embodiments has become a critical challenge. Classical mobility stacks, though effective on specific platforms, require extensive per-robot tuning and do not scale easily to new embodiments. Learning-based approaches, such as imitation learning (IL), offer alternatives, but are limited by the need for high-quality demonstrations for each embodiment. To address these challenges, we introduce COMPASS, a unified framework that enables scalable cross-embodiment mobility using expert demonstrations from only a single embodiment. We first pre-train a mobility policy on a single robot using IL, combining a world model with a policy model. We then apply residual reinforcement learning (RL) to efficiently adapt this policy to diverse embodiments through corrective refinements. Finally, we distill specialist policies into a single generalist policy conditioned on an embodiment embedding vector. This design significantly reduces the burden of collecting data while enabling robust generalization across a wide range of robot designs. Our experiments demonstrate that COMPASS scales effectively across diverse robot platforms while maintaining adaptability to various environment configurations, achieving a generalist policy with a success rate approximately 5X higher than the pre-trained IL policy on unseen embodiments, and further demonstrate zero-shot sim-to-real transfer.
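The three stages described above can be illustrated with a toy sketch. This is not the authors' implementation: the linear `base_policy`, the per-embodiment `residual_gain` values, the two-dimensional latent and action vectors, the embodiment names, and the mean-squared-error distillation loss are all assumptions chosen to keep the example small. It only shows the structure: a frozen prior, a specialist that adds a residual correction to the prior's action, and a generalist that takes an embodiment embedding as an extra input and is scored against the specialists.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Policy = Callable[[Sequence[float]], Vector]


def base_policy(latent: Sequence[float]) -> Vector:
    """Stand-in for the frozen IL-pretrained policy (hypothetical linear map)."""
    return [0.5 * latent[0], 0.1 * latent[1]]


def make_specialist(residual_gain: Sequence[float]) -> Policy:
    """Build a specialist whose action is base action + residual correction.

    `residual_gain` is a placeholder for what residual RL would learn
    for one embodiment; here it is simply fixed by hand.
    """
    def policy(latent: Sequence[float]) -> Vector:
        base = base_policy(latent)
        residual = [g * x for g, x in zip(residual_gain, latent)]
        return [b + r for b, r in zip(base, residual)]
    return policy


def generalist(latent: Sequence[float], embedding: Sequence[float]) -> Vector:
    """Toy generalist conditioned on an embodiment embedding (assumed form)."""
    base = base_policy(latent)
    return [b + e * x for b, e, x in zip(base, embedding, latent)]


def distillation_loss(
    specialists: Dict[str, Policy],
    embeddings: Dict[str, Sequence[float]],
    latents: Sequence[Sequence[float]],
) -> float:
    """Mean squared error between generalist and specialist actions."""
    total, count = 0.0, 0
    for name, specialist in specialists.items():
        for z in latents:
            target = specialist(z)
            pred = generalist(z, embeddings[name])
            total += sum((p - t) ** 2 for p, t in zip(pred, target))
            count += len(target)
    return total / count


# Hypothetical embodiments and hand-picked residual gains.
gains = {"wheeled": [0.0, 0.0], "quadruped": [0.25, -0.05]}
specialists = {name: make_specialist(g) for name, g in gains.items()}
latents = [[1.0, 2.0], [0.5, -1.0]]

# Embeddings that match the gains reproduce the specialists exactly;
# a shared all-zero embedding cannot represent the quadruped's correction.
loss_matched = distillation_loss(specialists, gains, latents)
loss_zero = distillation_loss(
    specialists, {name: [0.0, 0.0] for name in gains}, latents
)
```

In this toy setting the distillation loss vanishes when each embodiment has its own suitable embedding and is strictly positive when all embodiments share one embedding, which is the intuition behind conditioning the generalist on an embodiment vector.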
