$\texttt{SPIN}$: distilling $\texttt{Skill-RRT}$ for long-horizon prehensile and non-prehensile manipulation
Haewon Jung, Donguk Lee, Haecheol Park, JunHyeop Kim, Beomjoon Kim
TL;DR
SPIN addresses long-horizon prehensile and non-prehensile (PNP) manipulation by distilling a planner, Skill-RRT, into a fast, reactive policy through imitation learning. It introduces connectors to bridge the state gaps between separately trained skills, and uses Lazy Skill-RRT to efficiently generate training problems for those connectors. High-quality plans are then distilled into a diffusion policy, trained on planner trajectories with noise to capture multimodality. The approach achieves zero-shot sim-to-real transfer, with simulated success rates of approximately 95%, 93%, and 98% on Card Flip, Bookshelf, and Kitchen, and real-world success in 17/20, 18/20, and 16/20 trials, while maintaining practical inference times. Combining planning-derived skill chaining, learned connectors, and diffusion-based imitation offers a data-efficient path to robust, real-time manipulation in contact-rich, long-horizon tasks.
Abstract
Current robots struggle with long-horizon manipulation tasks requiring sequences of prehensile and non-prehensile skills, contact-rich interactions, and long-term reasoning. We present $\texttt{SPIN}$ ($\textbf{S}$kill $\textbf{P}$lanning to $\textbf{IN}$ference), a framework that distills a computationally intensive planning algorithm into a policy via imitation learning. We propose $\texttt{Skill-RRT}$, an extension of RRT that incorporates skill applicability checks and intermediate object pose sampling for solving such long-horizon problems. To chain independently trained skills, we introduce $\textit{connectors}$, goal-conditioned policies trained to minimize object disturbance during transitions. High-quality demonstrations are generated with $\texttt{Skill-RRT}$ and distilled through noise-based replay in order to reduce online computation time. The resulting policy, trained entirely in simulation, transfers zero-shot to the real world and achieves over 80% success across three challenging long-horizon manipulation tasks, outperforming state-of-the-art hierarchical RL and planning methods.
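To make the idea of an RRT extended with skill applicability checks and intermediate object pose sampling more concrete, the following is a minimal toy sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the object pose is a single number on a line with a wall at 5, the two skills (`push`, `pick_place`) and their preconditions are hypothetical, each skill is assumed to reach its target pose exactly, and the nearest-node selection rule is borrowed from standard RRT. Connectors and learned skill policies are omitted entirely.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Skill:
    name: str
    # Hypothetical precondition: (current object pose, target object pose) -> bool
    applicable: Callable[[float, float], bool]


@dataclass
class Node:
    pose: float
    parent: Optional["Node"]
    skill: Optional[str]  # skill that produced this node (None for the root)


# Toy skills (assumptions): pushing cannot cross the wall at x = 5;
# pick-and-place can, but only if the object sits in a graspable zone [4, 6].
SKILLS = [
    Skill("push", lambda cur, tgt: (cur < 5.0) == (tgt < 5.0)),
    Skill("pick_place", lambda cur, tgt: 4.0 <= cur <= 6.0),
]


def skill_rrt(
    start: float,
    goal: float,
    skills: List[Skill],
    rng: random.Random,
    max_iters: int = 2000,
    goal_bias: float = 0.2,
    tol: float = 1e-6,
) -> Optional[List[Tuple[str, float]]]:
    """Grow a tree over object poses; return a list of (skill, target pose) steps."""
    tree = [Node(start, None, None)]
    for _ in range(max_iters):
        # Sample an intermediate object pose, or the goal with some probability.
        target = goal if rng.random() < goal_bias else rng.uniform(0.0, 10.0)
        skill = rng.choice(skills)
        nearest = min(tree, key=lambda n: abs(n.pose - target))
        # Skill applicability check: skip extensions the skill cannot execute.
        if not skill.applicable(nearest.pose, target):
            continue
        new = Node(target, nearest, skill.name)  # toy: skill reaches target exactly
        tree.append(new)
        if abs(new.pose - goal) <= tol:
            plan = []
            node = new
            while node.parent is not None:
                plan.append((node.skill, node.pose))
                node = node.parent
            return plan[::-1]
    return None


if __name__ == "__main__":
    plan = skill_rrt(1.0, 9.0, SKILLS, random.Random(0))
    for step in plan or []:
        print(step)
```

In this toy domain no single skill can move the object from pose 1 to pose 9, so any returned plan must pass through at least one sampled intermediate pose, which is the role intermediate object pose sampling plays in the abstract's description.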
