Waypoint-Based Imitation Learning for Robotic Manipulation
Lucy Xiaoyang Shi, Archit Sharma, Tony Z. Zhao, Chelsea Finn
TL;DR
This work tackles compounding errors in long-horizon imitation learning by automatically extracting, via dynamic programming, a minimal sequence of waypoints whose linear interpolation approximates each demonstration. The Automatic Waypoint Extraction (AWE) preprocessing step shortens the decision horizon and can be plugged into behavioral cloning methods such as Diffusion Policy and ACT, yielding consistent performance gains on simulated benchmarks and on real bimanual tasks with limited data. Success rates improve by up to 25% in simulation and by 4–28% on real tasks, and the decision horizon shrinks by up to 10x; the analysis highlights the importance of DP optimization, policy expressivity, and the choice of error budget. The approach is practical and requires no extra supervision, though it has limitations related to its reliance on proprioception and to the precision requirements of certain tasks.
Abstract
While imitation learning methods have seen a resurgence of interest for robotic manipulation, the well-known problem of compounding errors continues to afflict behavioral cloning (BC). Waypoints can help address this problem by reducing the horizon of the learning problem for BC, and thus the errors compounded over time. However, waypoint labeling is underspecified and requires additional human supervision. Can we generate waypoints automatically without any additional human supervision? Our key insight is that if a trajectory segment can be approximated by linear motion, its endpoints can be used as waypoints. We propose Automatic Waypoint Extraction (AWE) for imitation learning, a preprocessing module that decomposes a demonstration into a minimal set of waypoints that, when interpolated linearly, approximate the trajectory up to a specified error threshold. AWE can be combined with any BC algorithm, and we find that AWE can increase the success rate of state-of-the-art algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, reducing the decision-making horizon by up to a factor of 10. Videos and code are available at https://lucys0.github.io/awe/
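
To make the key insight concrete, the following is a minimal sketch of how a shortest waypoint sequence could be selected by dynamic programming. It is not the authors' implementation: the function names `segment_error` and `extract_waypoints`, the parameter name `eta`, and the choice of error measure (the maximum Euclidean distance from each skipped state to the straight segment between two candidate waypoints) are assumptions made for illustration. The sketch also assumes the robot already starts at the first state, so index 0 is not counted as a waypoint.

```python
import numpy as np


def segment_error(traj, i, j):
    """Max distance from the skipped states traj[i+1:j] to the segment traj[i] -> traj[j].

    Assumed error measure for this sketch; other measures could be substituted.
    """
    if j - i < 2:
        return 0.0  # no intermediate states are skipped
    a, b = traj[i], traj[j]
    ab = b - a
    denom = float(ab @ ab)
    pts = traj[i + 1:j]
    if denom == 0.0:
        # Degenerate segment: both endpoints coincide.
        return float(np.linalg.norm(pts - a, axis=1).max())
    # Project each skipped state onto the segment, clamping to its endpoints.
    t = np.clip((pts - a) @ ab / denom, 0.0, 1.0)
    proj = a + t[:, None] * ab
    return float(np.linalg.norm(pts - proj, axis=1).max())


def extract_waypoints(traj, eta):
    """Return indices of a smallest waypoint set whose linear interpolation
    stays within `eta` of the trajectory (under `segment_error`)."""
    traj = np.asarray(traj, dtype=float)
    n = len(traj)
    inf = n + 1
    cost = [inf] * n   # cost[j]: fewest waypoints needed to reach state j
    prev = [-1] * n    # prev[j]: previous waypoint on the best path to j
    cost[0] = 0
    for j in range(1, n):
        for i in range(j):
            if cost[i] + 1 < cost[j] and segment_error(traj, i, j) <= eta:
                cost[j] = cost[i] + 1
                prev[j] = i
    # Backtrack from the final state, which is always a waypoint.
    waypoints = []
    k = n - 1
    while k > 0:
        waypoints.append(k)
        k = prev[k]
    return waypoints[::-1]
```

For an L-shaped path such as `[[0, 0], [1, 0], [2, 0], [2, 1], [2, 2]]` with `eta=0.01`, this sketch returns `[2, 4]`, the corner and the final state. Loosening the threshold to `eta=10.0` returns only `[4]`, which illustrates how the error threshold trades approximation fidelity against horizon length. The double loop evaluates every candidate segment, so the cost grows at least quadratically in the demonstration length.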
