Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation

Carlos Plou; Ana C. Murillo; Ruben Martinez-Cantin

TL;DR

The paper tackles data-efficient learning for robotic manipulation by introducing active exploration in Bayesian model-based RL to learn dynamics with uncertainty. It compares approximate Bayesian inference methods (Laplace, MC dropout, deep ensembles) and a Jensen-Shannon divergence-based exploration utility within a structured exploration/evaluation pipeline evaluated on realistic robotic tasks. Key contributions include a detailed comparison of inference techniques, a practical entropy-based exploration metric, and a scalable workflow that reduces real-world robot interactions while enabling transfer to multiple tasks. The findings demonstrate improved sample efficiency, better uncertainty calibration, and robust performance across continuous control and manipulation settings, advancing toward practical deployment in robotics.
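For reference, one common form of the Jensen-Shannon divergence named above, for $N$ predictive distributions $p_1, \dots, p_N$ with uniform weights, is the entropy of their mixture minus the mean entropy of the components. The symbols are illustrative and may not match the paper's own notation or weighting:

$$
\mathrm{JSD}(p_1, \dots, p_N) = H\!\left(\frac{1}{N}\sum_{i=1}^{N} p_i\right) - \frac{1}{N}\sum_{i=1}^{N} H(p_i)
$$

The value is zero when all $p_i$ coincide and grows as they disagree, which is why it can serve as a novelty signal.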

Abstract

Efficiently tackling multiple tasks within complex environments, such as those found in robot manipulation, remains an ongoing challenge in robotics and an opportunity for data-driven solutions such as reinforcement learning (RL). Model-based RL, by building a dynamic model of the robot, enables data reuse and transfer learning between tasks with the same robot and similar environment. Furthermore, data gathering in robotics is expensive, so we must rely on data-efficient approaches such as model-based RL, where policy learning is mostly conducted on cheaper simulations based on the learned model. Therefore, the quality of the model is fundamental for the performance of the subsequent tasks. In this work, we focus on improving the quality of the model while maintaining data efficiency by performing active learning of the dynamic model during a preliminary exploration phase based on maximizing information gathering. We employ Bayesian neural network models to represent, in a probabilistic way, both the belief and the information encoded in the dynamic model during exploration. With the presented strategies, we actively estimate the novelty of each transition and use it as the exploration reward. We compare several Bayesian inference methods for neural networks, some of which have never been used in a robotics context, and evaluate them in a realistic robot manipulation setup. Our experiments show the advantages of our Bayesian model-based RL approach, which achieves results of similar quality to relevant alternatives while requiring far fewer robot execution steps. Unlike related previous studies, which validated their methods solely on toy problems, our research takes a step towards more realistic setups, tackling robotic arm end-tasks.

Paper Structure (19 sections, 15 equations, 7 figures, 1 table)

INTRODUCTION
RELATED WORK
Model-based Reinforcement Learning
Exploration and Active Learning in DRL
BAYESIAN MODEL-BASED REINFORCEMENT LEARNING
Approximate Bayesian inference
EXPLORATION FOR MBRL
Exploration as experimental design
PIPELINE
Exploration Pipeline
Evaluation Pipeline
Evaluation details
Environments
Experimental details
RESULTS
...and 4 more sections

Figures (7)

Figure 1: Overview of the active exploration problem through Bayesian model-based RL. The Bayesian model is responsible for predicting both the next state distribution and its degree of novelty. Lastly, we exploit the knowledge acquired during exploration to solve different tasks.
Figure 2: Scheme of our Bayesian model in a toy example. The plot shows the difference between aleatoric and epistemic uncertainty. It also shows the predictive distribution for a given pair $(s, a)$.
Figure 3: Differences among BDL methods for approximating the posterior distribution $p(\mathbf{\theta} | \mathcal{D})$. While deep ensembles and MC-dropout are both sampling approaches that draw samples around the different local maxima of the posterior, the Laplace approach estimates a Gaussian distribution around its peak $\mathbf{\theta}_{MAP}$.
Figure 4: Scheme of BDL methods for approximating the predictive distribution in a neural network. While deep ensembles and MC-dropout obtain samples from multiple forward passes, the Laplace method estimates a Gaussian through linearization (in the picture, the Laplace method is applied to a subnetwork).
Figure 5: Summary of the exploration and evaluation pipelines with their key stages. The starting points are marked with Init. In particular, exploration begins by collecting data with a random policy, and evaluation starts from the buffer of trajectories collected during exploration.
...and 2 more figures
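
The aleatoric/epistemic split mentioned in the Figure 2 caption, and the multiple-forward-pass sampling in Figure 4, can be illustrated with a minimal sketch. It assumes each ensemble member (or dropout pass) outputs a Gaussian mean and variance for the next state, and it uses the standard law-of-total-variance decomposition. The function names are hypothetical, and the disagreement-based reward is a simple stand-in, not the paper's Jensen-Shannon utility.

```python
import numpy as np


def decompose_uncertainty(means, variances):
    """Split predictive uncertainty into aleatoric and epistemic parts.

    means, variances: arrays of shape (n_members, state_dim), one row per
    ensemble member or stochastic forward pass (an assumed interface).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    predictive_mean = means.mean(axis=0)
    # Aleatoric: average of the noise variances each member predicts.
    aleatoric = variances.mean(axis=0)
    # Epistemic: spread of the member means (population variance).
    epistemic = means.var(axis=0)
    return predictive_mean, aleatoric, epistemic


def novelty_reward(means, variances):
    """Illustrative exploration reward: total disagreement between members."""
    _, _, epistemic = decompose_uncertainty(means, variances)
    return float(epistemic.sum())
```

Under this sketch, a transition on which all members agree receives zero reward, so exploration is steered towards state-action pairs where the model is still uncertain rather than towards inherently noisy ones.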
