Pretrained Bayesian Non-parametric Knowledge Prior in Robotic Long-Horizon Reinforcement Learning
Yuan Meng, Xiangtong Yao, Kejia Chen, Yansong Wu, Liding Zhang, Zhenshan Bing, Alois Knoll
TL;DR
HELIOS addresses the challenge of learning long-horizon robotic tasks by introducing a Bayesian non-parametric skill prior based on Dirichlet Process Mixtures (DPM) with birth/merge dynamics, which yields a flexible, interpretable repertoire of primitive skills. The method has two phases. Phase I pretrains the non-parametric prior using a VAE/GRU backbone and online DPM updates. Phase II trains a SAC-based upstream policy that infers latent skill embeddings; a fixed decoder turns each embedding into a long action sequence, and a KL term toward the pretrained prior regularizes the policy in a maximum-entropy-like objective. On the Franka Kitchen benchmark, HELIOS outperforms strong baselines in both learning speed and task success, its learned skills form well-separated clusters when visualized, and it shows evidence of zero-shot adaptation to unseen subtasks. The work demonstrates that a richer non-parametric prior improves exploration, skill recombination, and generalization in complex, multi-goal robotic manipulation, offering a scalable path toward transfer learning in long-horizon tasks.
Abstract
Reinforcement learning (RL) methods typically learn new tasks from scratch, often disregarding prior knowledge that could accelerate the learning process. While some methods incorporate previously learned skills, they usually rely on a fixed structure, such as a single Gaussian distribution, to define skill priors. This rigid assumption can restrict the diversity and flexibility of skills, particularly in complex, long-horizon tasks. In this work, we introduce a method that models potential primitive skill motions as having non-parametric properties with an unknown number of underlying features. We utilize a Bayesian non-parametric model, specifically Dirichlet Process Mixtures, enhanced with birth and merge heuristics, to pre-train a skill prior that effectively captures the diverse nature of skills. Additionally, the learned skills are explicitly trackable within the prior space, enhancing interpretability and control. By integrating this flexible skill prior into an RL framework, our approach surpasses existing methods in long-horizon manipulation tasks, enabling more efficient skill transfer and task success in complex environments. Our findings show that a richer, non-parametric representation of skill priors significantly improves both the learning and execution of challenging robotic tasks. All data, code, and videos are available at https://ghiara.github.io/HELIOS/.
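To make "an unknown number of underlying features" concrete, the following minimal sketch samples cluster labels from a Chinese restaurant process, the sequential view of the Dirichlet Process prior over partitions. It is an illustration under stated assumptions, not the HELIOS implementation: the function name `crp_assignments` and the concentration parameter `alpha` are hypothetical, and the sketch does not include the paper's birth and merge heuristics or any VAE/GRU encoder.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels from a Chinese restaurant process.

    Hypothetical helper for illustration only; `alpha` is the
    concentration parameter of the Dirichlet Process.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items currently in cluster k
    labels = []  # labels[i] = cluster index assigned to item i
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_items=200, alpha=1.0)
    print("clusters used:", len(counts))
    print("cluster sizes:", counts)
```

The number of clusters is not fixed in advance: it is determined by the data and `alpha`, in contrast to a single-Gaussian prior, which commits to one component regardless of how diverse the skills are.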
