ASC: Adaptive Skill Coordination for Robotic Mobile Manipulation
Naoki Yokoyama, Alex Clegg, Joanne Truong, Eric Undersander, Tsung-Yen Yang, Sergio Arnaud, Sehoon Ha, Dhruv Batra, Akshara Rai
TL;DR
Adaptive Skill Coordination (ASC) tackles long-horizon mobile manipulation by learning a set of basic visuomotor skills, a coordination policy that sequences them, and a corrective policy that adapts those skills in out-of-distribution states. Trained entirely in simulation, ASC transfers zero-shot to the Boston Dynamics Spot robot and performs robustly across eight real-world environments without detailed maps or precise object locations. Coordination combined with correction markedly reduces hand-off failures and improves resilience to dynamic obstacles and disturbances, outperforming baselines and the APIs provided by Boston Dynamics in long-range navigation and occluded-object grasping. The work highlights the practical viability of components learned in simulation for real-world, vision-based robotic manipulation with minimal knowledge of the environment.
Abstract
We present Adaptive Skill Coordination (ASC) -- an approach for accomplishing long-horizon tasks like mobile pick-and-place (i.e., navigating to an object, picking it up, navigating to another location, and placing it). ASC consists of three components -- (1) a library of basic visuomotor skills (navigation, pick, place), (2) a skill coordination policy that chooses which skill to use when, and (3) a corrective policy that adapts pre-trained skills in out-of-distribution states. All components of ASC rely only on onboard visual and proprioceptive sensing, without requiring detailed maps with obstacle layouts or precise object locations, which eases real-world deployment. We train ASC in simulated indoor environments and deploy it zero-shot (without any real-world experience or fine-tuning) on the Boston Dynamics Spot robot in eight novel real-world environments (one apartment, one lab, two microkitchens, two lounges, one office space, one outdoor courtyard). In rigorous quantitative comparisons in two environments, ASC achieves near-perfect performance (59/60 episodes, or 98%), while sequentially executing skills succeeds in only 44/60 (73%) episodes. Extensive perturbation experiments show that ASC is robust to hand-off errors, changes in the environment layout, dynamic obstacles (e.g., people), and unexpected disturbances. Supplementary videos are available at adaptiveskillcoordination.github.io.
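
To make the three-component structure concrete, the following is a minimal toy sketch of how a skill library, a coordination policy, and a corrective policy might fit together in a single control step. It is not the authors' implementation: in ASC these components are learned from visual and proprioceptive input, whereas here they are hand-written stand-ins. The names `SkillCoordinator`, `choose_skill`, `correct`, the observation keys `dist_to_goal` and `holding`, the two-element action format, and the choice to combine the corrective output additively with the skill's action are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a real system would consume images and joint states.
Observation = Dict[str, float]
Action = List[float]  # assumed format: [base_velocity, gripper_command]


@dataclass
class SkillCoordinator:
    """Toy controller: pick a skill, run it, add a corrective term."""
    skills: Dict[str, Callable[[Observation], Action]]
    choose_skill: Callable[[Observation], str]
    correct: Callable[[Observation, str], Action]

    def step(self, obs: Observation) -> Tuple[str, Action]:
        name = self.choose_skill(obs)        # coordination: which skill now?
        base = self.skills[name](obs)        # action proposed by that skill
        residual = self.correct(obs, name)   # assumed additive correction
        return name, [b + r for b, r in zip(base, residual)]


# Hand-written placeholder skills (learned visuomotor policies in the paper).
def navigate(obs: Observation) -> Action:
    return [1.0, 0.0]


def pick(obs: Observation) -> Action:
    return [0.0, 1.0]


def place(obs: Observation) -> Action:
    return [0.0, -1.0]


def choose_skill(obs: Observation) -> str:
    # Rule-based placeholder for the learned coordination policy.
    if obs["dist_to_goal"] > 0.5:
        return "navigate"
    return "place" if obs["holding"] > 0.5 else "pick"


def correct(obs: Observation, skill_name: str) -> Action:
    # Placeholder correction: back up slightly if too close to grasp.
    if skill_name == "pick" and obs["dist_to_goal"] < 0.1:
        return [-0.2, 0.0]
    return [0.0, 0.0]


controller = SkillCoordinator(
    skills={"navigate": navigate, "pick": pick, "place": place},
    choose_skill=choose_skill,
    correct=correct,
)
```

Calling `controller.step({"dist_to_goal": 0.05, "holding": 0.0})` selects the `pick` skill and adds the backward correction to its action. The point of the sketch is only the division of labor: skill selection happens at every step rather than in a fixed sequence, and the correction modifies a skill's output without retraining the skill itself.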
