Rapid Motor Adaptation for Robotic Manipulator Arms
Yichao Liang, Kevin Ellis, João Henriques
TL;DR
The paper presents Rapid Motor Adaptation for Robotic Manipulator Arms (RMA$^2$), extending RMA to dexterous manipulation by learning geometry-aware priors via category-instance dictionaries and a depth-based adapter that infers environment embeddings from history and depth imagery. The method uses a two-phase training scheme: (i) a policy conditioned on privileged environment parameters learned with PPO, and (ii) an adapter that predicts the embedding from past observations and depth data, enabling deployment without privileged information. Across four ManiSkill2 tasks (Pick & Place on YCB/EGAD, Peg Insertion, Faucet Turning), RMA$^2$ consistently outperforms domain-randomization baselines and ablations, while approaching an Oracle upper bound. The work demonstrates improved generalization and sample efficiency, highlighting the value of geometry-aware embeddings and depth-informed adaptation for robust, real-world manipulation.
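The two-phase scheme summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the phase-one encoder is replaced by a fixed random map (in the paper it is learned jointly with the policy via PPO), the adapter is a linear least-squares fit rather than a neural network, and a single synthetic feature vector stands in for the history and depth inputs. All names and sizes (`encode_privileged`, `adapt`, `PRIV_DIM`, and so on) are hypothetical, and the sketch assumes, as in the original RMA, that the adapter is fit by regressing onto the phase-one embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N, PRIV_DIM, HIST_DIM, EMB_DIM = 512, 6, 32, 4

# Phase 1 stand-in: a frozen encoder maps privileged environment
# parameters e (e.g. mass, friction) to an embedding z. Here it is a
# fixed random map, not a PPO-trained network.
W_enc = 0.3 * rng.normal(size=(PRIV_DIM, EMB_DIM))

def encode_privileged(e):
    """Map privileged parameters to the environment embedding."""
    return np.tanh(e @ W_enc)

# Synthetic data: the observable features (standing in for action and
# proprioceptive history plus depth features) depend on the hidden
# parameters, which is what makes the embedding inferable at all.
e = rng.normal(size=(N, PRIV_DIM))
mixing = rng.normal(size=(PRIV_DIM, HIST_DIM))
history = e @ mixing + 0.01 * rng.normal(size=(N, HIST_DIM))

# Regression targets come from the frozen phase-one encoder.
z_target = encode_privileged(e)

# Phase 2: fit the adapter to predict z from observable features only.
X = np.hstack([history, np.ones((N, 1))])  # append a bias column
W_adapt, *_ = np.linalg.lstsq(X, z_target, rcond=None)

def adapt(history_batch):
    """Predict the embedding without access to privileged parameters."""
    ones = np.ones((len(history_batch), 1))
    return np.hstack([history_batch, ones]) @ W_adapt

z_hat = adapt(history)
mse = float(np.mean((z_hat - z_target) ** 2))
# Error of always predicting the mean embedding, for comparison.
baseline = float(np.mean((z_target - z_target.mean(axis=0)) ** 2))
```

At deployment, only `adapt` would be called, so the policy receives an estimated embedding in place of the privileged one.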
Abstract
Developing generalizable manipulation skills is a core challenge in embodied AI. This includes generalization across diverse task configurations, encompassing variations in object shape, density, friction coefficient, and external disturbances such as forces applied to the robot. Rapid Motor Adaptation (RMA) offers a promising solution to this challenge. It posits that essential hidden variables influencing an agent's task performance, such as object mass and shape, can be effectively inferred from the agent's action and proprioceptive history. Drawing inspiration from RMA in locomotion and in-hand rotation, we use depth perception to develop agents tailored for rapid motor adaptation in a variety of manipulation tasks. We evaluate our agents on four challenging tasks from the ManiSkill2 benchmark, each with customized environment variations: pick-and-place operations with hundreds of objects from the YCB and EGAD datasets, peg insertion with precise position and orientation, and operating a variety of faucets and handles. Empirical results demonstrate that our agents surpass state-of-the-art methods such as automatic domain randomization and vision-based policies, obtaining better generalization performance and sample efficiency.
