HYDRA: Hybrid Robot Actions for Imitation Learning
Suneel Belkhale, Yuchen Cui, Dorsa Sadigh
TL;DR
HYDRA tackles state distribution shift in imitation learning in two ways. First, it introduces a two-level action representation that combines sparse high-level waypoints with dense low-level actions. Second, it relabels actions offline to make the dataset more consistent. A multi-headed architecture predicts the mode, the waypoint, and the dense action, so the policy can switch between coarse and fine-grained control at test time. Across seven long-horizon manipulation tasks in simulation and the real world, HYDRA improves on strong baselines by 30-40%, and ablations highlight the benefits of both action relabeling and the hybrid action space. HYDRA performs robustly on challenging real-world robotics tasks and offers a practical approach to balancing dexterity and data efficiency in imitation learning.
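To make test-time switching concrete, here is a minimal sketch on a toy point-mass "robot" whose state is just a position. It assumes the policy returns a mode flag, a waypoint, and a dense delta action (the hypothetical `PolicyOutput`). It also assumes that waypoint mode is executed by a simple bounded-step controller that tracks the waypoint without re-querying the policy. The names `execute_hybrid`, `step_toward`, `toy_policy`, and the `max_step` parameter are illustrative, not the authors' code.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

WAYPOINT_MODE = 1  # assumed encoding of the mode head: sparse waypoint
DENSE_MODE = 0     # assumed encoding of the mode head: dense low-level action


@dataclass
class PolicyOutput:
    """Hypothetical container for the three prediction heads."""
    mode: int
    waypoint: List[float]  # target position, used in waypoint mode
    action: List[float]    # delta action, used in dense mode


def distance(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def step_toward(pos: List[float], target: List[float], max_step: float) -> List[float]:
    """Move at most max_step toward target; snap exactly onto it once within reach."""
    dist = distance(pos, target)
    if dist <= max_step:
        return list(target)
    scale = max_step / dist
    return [p + (t - p) * scale for p, t in zip(pos, target)]


def execute_hybrid(policy: Callable[[List[float]], PolicyOutput],
                   pos: List[float], num_queries: int,
                   max_step: float = 0.05) -> List[List[float]]:
    """Roll out a hybrid policy for num_queries decisions; return all visited positions."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    pos = list(pos)
    trajectory = [list(pos)]
    for _ in range(num_queries):
        out = policy(pos)
        if out.mode == WAYPOINT_MODE:
            # Coarse control: one prediction covers many control steps.
            while distance(pos, out.waypoint) > 0.0:
                pos = step_toward(pos, out.waypoint, max_step)
                trajectory.append(list(pos))
        else:
            # Fine control: apply a single dense delta action, then re-query.
            pos = [p + a for p, a in zip(pos, out.action)]
            trajectory.append(list(pos))
    return trajectory


def toy_policy(pos: List[float]) -> PolicyOutput:
    # Hypothetical policy: reach x = 1.0 via one waypoint, then nudge y densely.
    if pos[0] < 1.0:
        return PolicyOutput(WAYPOINT_MODE, [1.0, 0.0], [0.0, 0.0])
    return PolicyOutput(DENSE_MODE, [0.0, 0.0], [0.0, 0.01])


traj = execute_hybrid(toy_policy, [0.0, 0.0], num_queries=3, max_step=0.25)
print(len(traj), traj[-1])
```

The sketch shows the intuition behind the hybrid action space. In waypoint mode, a single prediction drives several control steps, so the policy makes fewer decisions where precision is not needed. Dense mode keeps step-by-step control for the dexterous parts of a task.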
Abstract
Imitation Learning (IL) is a sample-efficient paradigm for robot learning from expert demonstrations. However, policies learned through IL suffer from state distribution shift at test time, because compounding errors in action prediction lead to previously unseen states. Choosing an action representation that minimizes this distribution shift is therefore critical in imitation learning. Prior work proposes temporal action abstractions to reduce compounding errors, but these often sacrifice policy dexterity or require domain-specific knowledge. To address these trade-offs, we introduce HYDRA, a method that leverages a hybrid action space with two levels of action abstraction: sparse high-level waypoints and dense low-level actions. HYDRA dynamically switches between these abstractions at test time to enable both coarse and fine-grained control of a robot. In addition, HYDRA employs action relabeling to increase the consistency of actions in the dataset, further reducing distribution shift. HYDRA outperforms prior imitation learning methods by 30-40% on seven challenging simulation and real-world environments, including long-horizon real-world tasks such as making coffee and toasting bread. Videos are available on our website: https://tinyurl.com/3mc6793z
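One way to picture action relabeling is the sketch below. It rests on an illustrative assumption: in waypoint-mode steps, each recorded action is replaced by the action a bounded-step controller would take from the recorded state toward that segment's waypoint. Dense-mode actions are left untouched. Under this assumption, two demonstrations that pass through the same state with the same waypoint end up with identical action labels, which is the kind of dataset consistency described above. `relabel_actions`, `controller_action`, and `max_step` are hypothetical names.

```python
import math
from typing import List

WAYPOINT_MODE = 1  # assumed mode encoding: step belongs to a waypoint segment
DENSE_MODE = 0     # assumed mode encoding: step belongs to a dense segment


def controller_action(state: List[float], waypoint: List[float],
                      max_step: float) -> List[float]:
    """Bounded step from state toward waypoint (an assumed waypoint controller)."""
    diff = [w - s for s, w in zip(state, waypoint)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= max_step:
        return diff
    return [d * max_step / dist for d in diff]


def relabel_actions(states: List[List[float]], actions: List[List[float]],
                    modes: List[int], waypoints: List[List[float]],
                    max_step: float) -> List[List[float]]:
    """Relabel one demonstration; every argument is a per-timestep list."""
    if not (len(states) == len(actions) == len(modes) == len(waypoints)):
        raise ValueError("per-timestep inputs must have equal length")
    relabeled = []
    for s, a, m, w in zip(states, actions, modes, waypoints):
        if m == WAYPOINT_MODE:
            # Replace the recorded (possibly noisy) action with the controller's.
            relabeled.append(controller_action(s, w, max_step))
        else:
            # Dense segments keep the demonstrator's original action.
            relabeled.append(list(a))
    return relabeled


# Two hypothetical demos visit the same state with the same waypoint but
# record different actions; after relabeling, their labels agree.
demo_a = relabel_actions([[0.0, 0.0]], [[0.30, 0.10]], [WAYPOINT_MODE],
                         [[1.0, 0.0]], max_step=0.25)
demo_b = relabel_actions([[0.0, 0.0]], [[0.20, -0.05]], [WAYPOINT_MODE],
                         [[1.0, 0.0]], max_step=0.25)
mixed = relabel_actions(
    states=[[0.0, 0.0], [1.0, 0.0]],
    actions=[[0.30, 0.10], [0.0, 0.02]],
    modes=[WAYPOINT_MODE, DENSE_MODE],
    waypoints=[[1.0, 0.0], [1.0, 0.0]],
    max_step=0.25,
)
print(demo_a, demo_b, mixed)
```

Relabeling against the recorded states leaves the dataset's state distribution unchanged. It only turns the waypoint-mode labels into a deterministic function of state and waypoint, which removes conflicting targets for the policy to imitate.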
