Deep Incremental Model Informed Reinforcement Learning for Continuous Robotic Control

Cong Li

TL;DR

The paper addresses the poor data efficiency of model-free RL in continuous robotic control by introducing a deep incremental model driven by one-step backward data (OSBK). It reformulates the dynamics as $s_{t+1}=s_t+L_t\,\Delta a_t$ with $\Delta a_t=a_t-a_{t-1}$ and learns $L_t$ with a neural network, which enables imagined data generation in a Dyna-style framework and online residual policy fine-tuning. Empirical results on MuJoCo benchmarks show substantially faster learning than SAC and performance competitive with MBPO and MnM, indicating that the approach scales to higher-dimensional robots. This physics-informed approach provides a data-efficient, adaptable MBRL framework that uses control-theoretic structure to improve sample efficiency and support online refinement.
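As a minimal sketch of the incremental form $s_{t+1}=s_t+L_t\,\Delta a_t$, the following pure-Python example computes one predicted step. The names `incremental_step` and `constant_L` are hypothetical, the fixed matrix stands in for the learned neural network, and the choice of the current state and previous action as inputs to $L_t$ is an assumption rather than something stated in the summary.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def incremental_step(
    s_t: Vector,
    a_t: Vector,
    a_prev: Vector,
    L_fn: Callable[[Vector, Vector], Matrix],
) -> Vector:
    """Predict s_{t+1} = s_t + L_t * (a_t - a_{t-1})."""
    # Action increment: delta_a_t = a_t - a_{t-1}.
    delta_a = [a - ap for a, ap in zip(a_t, a_prev)]
    # Assumption: L_t is produced from the current state and previous action.
    L_t = L_fn(s_t, a_prev)
    # Row-by-row matrix-vector product added to the current state.
    return [
        s + sum(l * d for l, d in zip(row, delta_a))
        for s, row in zip(s_t, L_t)
    ]


def constant_L(s: Vector, a_prev: Vector) -> Matrix:
    # Hypothetical stand-in for the learned network: a fixed 2x1 matrix.
    return [[0.5], [2.0]]


# Two-dimensional state, one-dimensional action.
s_next = incremental_step([1.0, 0.0], [0.5], [0.25], constant_L)
print(s_next)
```

In this form the quantity to be learned is only the matrix $L_t$, whose shape is state dimension by action dimension. The remaining terms come directly from the previous step's data.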

Abstract

Model-based reinforcement learning attempts to use an available or learned model to improve the data efficiency of reinforcement learning. This work proposes a one-step lookback approach that jointly learns a deep incremental model and a policy to achieve sample-efficient continuous robotic control, using control-theoretic knowledge to reduce the difficulty of model learning and to make training more efficient. Specifically, we use one-step backward data to construct the deep incremental model, an alternative structured representation of the robot's evolution model that predicts robotic movement accurately with low sample complexity. This is because the deep incremental model reduces model learning to a parametric matrix learning problem, which is especially favourable for high-dimensional robotic applications. Imagined data generated by the learned deep incremental model supplements the training data to improve sample efficiency. Comparative numerical simulations on benchmark continuous robotic control problems validate the efficiency of the proposed one-step lookback approach.
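The idea of supplementing training data with imagined data can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's algorithm: `imagine_transitions`, `toy_model`, and `toy_policy` are hypothetical names, the toy model uses a fixed scalar gain in place of a learned network, and rewards are omitted for brevity.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
Start = Tuple[Vector, Vector]                # (state s_t, previous action a_{t-1})
Transition = Tuple[Vector, Vector, Vector]   # (s_t, a_t, s_{t+1})


def imagine_transitions(
    starts: List[Start],
    policy: Callable[[Vector], Vector],
    model: Callable[[Vector, Vector, Vector], Vector],
    n_samples: int,
    rng: random.Random,
) -> List[Transition]:
    """Generate one-step imagined transitions from real starting points."""
    imagined = []
    for _ in range(n_samples):
        s, a_prev = rng.choice(starts)   # branch from a state seen in real data
        a = policy(s)                    # action chosen by the current policy
        imagined.append((s, a, model(s, a, a_prev)))
    return imagined


def toy_model(s: Vector, a: Vector, a_prev: Vector) -> Vector:
    # Hypothetical incremental model with a fixed gain of 0.5 on the action increment.
    return [si + 0.5 * (a[0] - a_prev[0]) for si in s]


def toy_policy(s: Vector) -> Vector:
    # Hypothetical policy that always outputs a zero action.
    return [0.0]


real_starts = [([1.0], [0.5]), ([2.0], [1.0])]
real_transitions: List[Transition] = [([1.0], [0.5], [1.25])]

imagined = imagine_transitions(real_starts, toy_policy, toy_model, 4, random.Random(0))
# The policy would then be trained on real and imagined data together.
training_batch = real_transitions + imagined
print(len(training_batch))
```

Starting imagined steps from states seen in real data keeps each model prediction to a single step, which limits how far model error can accumulate.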

Paper Structure (13 sections, 9 equations, 2 figures, 1 algorithm)

Figures (2)

  • Figure 1: The MuJoCo benchmark continuous control tasks.
  • Figure 2: The learning curves of SAC, MBPO, MnM, and our approach.

Theorems & Definitions (6)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Remark 6