Solving Robotics Tasks with Prior Demonstration via Exploration-Efficient Deep Reinforcement Learning

Chengyandan Shen, Christoffer Sloth

TL;DR

The paper tackles sample-inefficient, demonstration-driven robotics learning by introducing DRLR, an exploration-efficient DRL framework that integrates a calibrated Q-value-based action-selection module with SAC and a reference policy. By replacing TD3 with SAC and calibrating Q-values using demonstrations sampled from a fixed dataset, DRLR mitigates bootstrapping error and state-distribution shifts, achieving robust generalization across reward densities and high-dimensional state-action spaces. Empirical results in bucket loading and open-drawer tasks show significant improvements in exploration efficiency and final performance, with successful sim-to-real deployment on a wheel-loader task and strong robustness to demonstration quality. The work demonstrates practical benefits for real-world industrial robotics by reducing required interactions while maintaining high task performance in diverse environments.

Abstract

This paper proposes an exploration-efficient Deep Reinforcement Learning with Reference policy (DRLR) framework for learning robotics tasks that incorporates demonstrations. The DRLR framework is developed based on an algorithm called Imitation Bootstrapped Reinforcement Learning (IBRL). We propose to improve IBRL by modifying the action selection module. The proposed action selection module provides a calibrated Q-value, which mitigates the bootstrapping error that otherwise leads to inefficient exploration. Furthermore, to prevent the RL policy from converging to a sub-optimal policy, SAC is used as the RL policy instead of TD3. The effectiveness of our method in mitigating bootstrapping error and preventing overfitting is empirically validated by learning two robotics tasks: bucket loading and open drawer, which require extensive interactions with the environment. Simulation results also demonstrate the robustness of the DRLR framework across tasks with both low and high state-action dimensions, and varying demonstration qualities. To evaluate the developed framework on a real-world industrial robotics task, the bucket loading task is deployed on a real wheel loader. The sim2real results validate the successful deployment of the DRLR framework.
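
The action selection idea mentioned in the abstract can be illustrated with a minimal sketch. It shows only the IBRL-style baseline step: both the RL policy and the reference policy propose an action, and the critic's Q-value decides which one is executed. The abstract does not specify how DRLR calibrates the Q-value, so that step is not implemented here. All names (`select_action`, `toy_q`, `toy_rl_policy`, `toy_ref_policy`) are hypothetical stand-ins, not the authors' code.

```python
import numpy as np

def select_action(state, rl_policy, ref_policy, q_fn):
    """Return (action, source), keeping the proposal with the higher Q-value."""
    a_rl = rl_policy(state)    # proposal from the RL actor (SAC in the paper)
    a_ref = ref_policy(state)  # proposal from the reference policy
    # Ties go to the RL policy; this tie-break is a choice made for the sketch.
    if q_fn(state, a_ref) > q_fn(state, a_rl):
        return a_ref, "ref"
    return a_rl, "rl"

# Hypothetical toy components, used only to make the sketch runnable.
def toy_q(state, action):
    # Toy critic: the best action equals the state vector.
    return -float(np.sum((action - state) ** 2))

def toy_rl_policy(state):
    # Untrained actor stand-in: always outputs zeros.
    return np.zeros_like(state)

def toy_ref_policy(state):
    # Reference stand-in: happens to output the optimal toy action.
    return state.copy()

state = np.array([1.0, -0.5])
action, source = select_action(state, toy_rl_policy, toy_ref_policy, toy_q)
print(source, action)
```

In this toy setup the reference proposal wins because the toy critic scores it higher. With a learned critic, an overestimated Q-value for an out-of-distribution RL action could win the comparison instead, which is the bootstrapping-error failure mode the abstract says the calibrated Q-value is meant to mitigate.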

Paper Structure

This paper contains 20 sections, 16 equations, 16 figures, 4 tables, 1 algorithm.

Figures (16)

  • Figure 1: Overview of the proposed exploration-efficient DRLR framework. The proposed framework extends a sample-efficient DRL-Ref method with a simple action selection module to mitigate inefficient exploration caused by (1) bootstrapping error, which leads the RL policy to select out-of-distribution actions, and (2) the Ref policy failing to provide good actions under state distribution shifts.
  • Figure 2: Selected tasks for testing the proposed framework.
  • Figure 3: Exp2: Validate the effectiveness of the proposed new action selection method with the Open Drawer task.
  • Figure 4: Exp3: Validate the effectiveness of the proposed new action selection method with the Bucket Loading task.
  • Figure 5: Exp4: Validate the effectiveness of SAC with the Open Drawer task.
  • ...and 11 more figures