Pretrained Bayesian Non-parametric Knowledge Prior in Robotic Long-Horizon Reinforcement Learning

Yuan Meng, Xiangtong Yao, Kejia Chen, Yansong Wu, Liding Zhang, Zhenshan Bing, Alois Knoll

TL;DR

HELIOS addresses the challenge of learning long-horizon robotic tasks by introducing a Bayesian non-parametric skill prior based on Dirichlet Process Mixtures (DPM) with birth/merge dynamics, enabling a flexible, interpretable repertoire of primitive skills. The method consists of two phases: Phase I pretrains the non-parametric prior via a VAE/GRU backbone and online DPM updates, while Phase II uses a SAC-based upstream model that infers latent skill embeddings, which drive a fixed decoder to generate long action sequences, guided by a KL-based prior in a maximum-entropy-like objective. Empirical results on the Franka Kitchen benchmark show that HELIOS outperforms strong baselines in both learning speed and task success, with clear visualization of well-clustered skill motifs and evidence of zero-shot adaptation to unseen subtasks. The work demonstrates that a richer non-parametric prior improves exploration, skill recombination, and generalization in complex, multi-goal robotic manipulation, offering a scalable path toward transfer learning in long-horizon tasks.
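
One common way to write a KL-regularized, maximum-entropy-style objective of the kind described above is sketched here. This is an illustrative form under stated assumptions, not necessarily the exact objective used in the paper: the symbols $\alpha$ (temperature), $r$ (reward), $\pi_\theta$ (upstream skill policy), and the mixture parameters $w_k$, $\mu_k$, $\Sigma_k$, $K$ are introduced for illustration only.

$$
J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_{t} r(s_t, z_t) - \alpha\, D_{\mathrm{KL}}\big(\pi_\theta(z_t \mid s_t)\,\|\,p(z_t)\big)\right],
\qquad
p(z) = \sum_{k=1}^{K} w_k\, \mathcal{N}(z;\, \mu_k, \Sigma_k)
$$

In this form the KL term plays the role of the entropy bonus in standard SAC, so the policy is pulled toward skills the pretrained prior considers plausible instead of toward a uniform distribution. The KL divergence between a Gaussian and a mixture generally has no closed form, so a practical implementation would need an approximation; which one the authors use is not stated in this summary.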

Abstract

Reinforcement learning (RL) methods typically learn new tasks from scratch, often disregarding prior knowledge that could accelerate the learning process. While some methods incorporate previously learned skills, they usually rely on a fixed structure, such as a single Gaussian distribution, to define skill priors. This rigid assumption can restrict the diversity and flexibility of skills, particularly in complex, long-horizon tasks. In this work, we introduce a method that models potential primitive skill motions as having non-parametric properties with an unknown number of underlying features. We utilize a Bayesian non-parametric model, specifically Dirichlet Process Mixtures, enhanced with birth and merge heuristics, to pre-train a skill prior that effectively captures the diverse nature of skills. Additionally, the learned skills are explicitly trackable within the prior space, enhancing interpretability and control. By integrating this flexible skill prior into an RL framework, our approach surpasses existing methods in long-horizon manipulation tasks, enabling more efficient skill transfer and task success in complex environments. Our findings show that a richer, non-parametric representation of skill priors significantly improves both the learning and execution of challenging robotic tasks. All data, code, and videos are available at https://ghiara.github.io/HELIOS/.
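
To make the idea of a prior with an unknown number of components more concrete, the following is a minimal sketch of birth and merge heuristics using a DP-means-style hard-assignment clustering. It is a deliberate simplification and not the authors' Dirichlet Process Mixture inference: the function names, the Euclidean distance, and the `birth_radius` and `merge_radius` thresholds are all hypothetical choices made for illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def merge_close(clusters, merge_radius):
    """Merge: fuse any pair of components whose means nearly coincide."""
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ni), (mj, nj) = clusters[i], clusters[j]
                if dist(mi, mj) < merge_radius:
                    total = ni + nj
                    # Count-weighted average of the two means.
                    mean = [(a * ni + b * nj) / total for a, b in zip(mi, mj)]
                    clusters[i] = [mean, total]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters

def birth_merge_cluster(points, birth_radius=1.0, merge_radius=0.5):
    """Stream points through birth/assign/merge; return (mean, count) pairs."""
    clusters = []  # each cluster is [mean (list of floats), count]
    for p in points:
        # Find the nearest existing component, if any.
        best, best_d = None, float("inf")
        for c in clusters:
            d = dist(p, c[0])
            if d < best_d:
                best, best_d = c, d
        if best is None or best_d > birth_radius:
            # Birth: no existing component explains this point well.
            clusters.append([list(p), 1])
        else:
            # Assign: update the running mean of the nearest component.
            best[1] += 1
            best[0] = [m + (x - m) / best[1] for m, x in zip(best[0], p)]
        clusters = merge_close(clusters, merge_radius)
    return [(tuple(m), n) for m, n in clusters]

# Toy 2-D "skill embeddings" forming three well-separated groups.
points = [(0, 0), (0.2, 0), (5, 5), (5.1, 4.9), (0.1, 0.1), (10, 0)]
clusters = birth_merge_cluster(points)
print([n for _, n in clusters])  # → [3, 2, 1]
```

The number of components is never fixed in advance: it grows when new data is poorly explained and shrinks when components become redundant, which is the behavior the non-parametric prior is meant to provide. A full DPM would replace the hard thresholds with posterior probabilities over component assignments.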

Paper Structure

This paper contains 17 sections, 8 equations, 6 figures.

Figures (6)

  • Figure 1: HELIOS framework overview. The training process is divided into two phases. In Phase I, a VAE with GRU modules is used to pre-train a skill representation model from a dataset of action trajectories. The model leverages a DPM to capture the non-parametric nature of skill priors, aiding in learning precise action patterns and subsequent effective task representations. In Phase II, the pre-trained skill decoder and prior are deployed within an RL framework to address long-horizon manipulation tasks. Here, the upstream inference model uses a soft actor-critic structure to learn task-specific reasoning, enabling the successful execution of complex long-horizon tasks.
  • Figure 2: Skill distribution assumption of our proposed framework.
  • Figure 3: The evaluation of Bayesian non-parametric skill prior. a, The total training loss ($\mathcal{L}_{total}$) observed during the skill prior pretraining phase. b, The evolution of the number of generated clusters in the Bayesian non-parametric skill prior space throughout training. We conduct at least five trials and report the mean and standard deviation ($\mu\pm\sigma$). c, t-SNE projection of the DPM-based skill prior in the final epoch.
  • Figure 4: Average reward on the full long-horizon manipulation task. In the Franka Kitchen benchmark, the agent is trained with sparse rewards: it receives a score of 1 for each successfully completed subtask and 0 otherwise. For each model, we run at least five trials and report the mean and standard deviation of the reward ($\mu\pm\sigma$) for comparison.
  • Figure 5: Long-horizon manipulation task performance. a, Snapshots of each subtask, demonstrating that the agent efficiently completes all assigned subtasks. b, The average skill ratios applied across the entire task, based on the results from at least five trials. c, Snapshots of each primitive skill motion. Our Bayesian non-parametric prior captures seven base skills: "pick", "place", "pull", "rotate", "toggle", and "explore" movements (both left and right directions). d, t-SNE projections of each primitive skill motion, showing the skills re-encoded by the pre-trained skill encoder $q(z|\bm{a}_i)$ and their corresponding assignments in the Bayesian non-parametric knowledge space.
  • ...and 1 more figures
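
The sparse reward and the $\mu\pm\sigma$ reporting described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the set-based bookkeeping of completed subtasks, the subtask labels, and the use of the population standard deviation are hypothetical and are not taken from the authors' code.

```python
from statistics import mean, pstdev

def sparse_reward(completed_before, completed_now):
    """Return 1.0 per subtask newly completed in this step, 0.0 otherwise."""
    return float(len(set(completed_now) - set(completed_before)))

def summarize(trial_returns):
    """Mean and (population) standard deviation of returns across trials."""
    return mean(trial_returns), pstdev(trial_returns)

# Hypothetical subtask labels; one new subtask is completed in this step.
r = sparse_reward({"microwave"}, {"microwave", "kettle"})

# Hypothetical episode returns from five independent trials.
mu, sigma = summarize([3, 4, 4, 3, 4])
print(f"reward={r}, return={mu:.2f} ± {sigma:.2f}")
```

Because the agent receives no signal until a subtask is fully completed, exploration guided by a skill prior matters more here than it would under a dense, shaped reward.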