One-Shot Dual-Arm Imitation Learning
Yilong Wang, Edward Johns
TL;DR
ODIL addresses one-shot imitation learning for dual-arm manipulation by combining a dual-arm coordination paradigm with a three-stage visual servoing controller. The controller first aligns the end-effector to a bottleneck, after which the single demonstrated trajectory is replayed. Deep feature matching provides robust visual alignment, and an Unscented Kalman Filter fuses information from a global camera and a wrist camera for precise localization in both 4-DoF and 6-DoF tasks, even with distractors and occlusions. The method outperforms state-of-the-art one-shot imitation baselines on six real-world tasks and is resilient to scene changes, without requiring object models or additional data collection. These results suggest a practical, data-efficient route to dual-arm manipulation in everyday tasks. Future work includes extending to multi-stage tasks, incorporating failure recovery, and generalizing to novel objects.
Abstract
We introduce One-Shot Dual-Arm Imitation Learning (ODIL), which enables dual-arm robots to learn precise and coordinated everyday tasks from a single demonstration. ODIL uses a new three-stage visual servoing (3-VS) method to precisely align the end-effector with the target object, after which replaying the demonstration trajectory is sufficient to perform the task. This is achieved without prior task or object knowledge, and without additional data collection or training after the single demonstration. We also propose a new dual-arm coordination paradigm for learning dual-arm tasks from a single demonstration. ODIL was tested on a real-world dual-arm robot, demonstrating state-of-the-art performance across six precise and coordinated tasks in both 4-DoF and 6-DoF settings, and showing robustness in the presence of distractor objects and partial occlusions. Videos are available at: https://www.robot-learning.uk/one-shot-dual-arm.
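The "align, then replay" idea can be illustrated with a minimal sketch. It is not the ODIL implementation. It assumes that the demonstration is stored as a list of 4x4 end-effector poses in the robot base frame, that the first stored pose is the one reached after alignment, and that the aligned live pose is already known. The helper names make_pose and replay_from_aligned_pose are hypothetical and introduced only for this example.

```python
import numpy as np

def make_pose(yaw, translation):
    """Build a 4x4 homogeneous transform from a yaw angle (rad) and an xyz translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

def replay_from_aligned_pose(demo_poses, live_start):
    """Re-anchor a demonstrated trajectory at a new starting pose.

    Each demonstrated pose is expressed relative to the first demonstrated
    pose, and that relative motion is then applied from the live start pose.
    """
    demo_start_inv = np.linalg.inv(demo_poses[0])
    return [live_start @ demo_start_inv @ T for T in demo_poses]

# Demonstration (illustrative values): start 0.2 m above the object, descend 0.1 m.
demo = [make_pose(0.0, [0.0, 0.0, 0.2]), make_pose(0.0, [0.0, 0.0, 0.1])]

# At test time the object has moved and rotated by 90 degrees about z;
# alignment is assumed to have brought the end-effector to this pose.
live_start = make_pose(np.pi / 2, [0.5, 0.1, 0.2])

replayed = replay_from_aligned_pose(demo, live_start)
```

In this sketch the first replayed pose coincides with the live start pose, and the remaining poses reproduce the demonstrated motion relative to it. This is why precise alignment matters: any alignment error carries over to every replayed pose.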
