Guiding Reinforcement Learning with Incomplete System Dynamics

Shuyuan Wang, Jingliang Duan, Nathan P. Lawrence, Philip D. Loewen, Michael G. Forbes, R. Bhushan Gopaluni, Lixian Zhang

TL;DR

This paper addresses the intermediate situation in which there is too little model information for complete controller design, yet enough to suggest that a purely model-free approach is not the best choice. It obtains an embedded controller guided by the partial model, which improves the learning efficiency of an RL-enhanced approach.

Abstract

Model-free reinforcement learning (RL) is inherently a reactive method, operating under the assumption that it starts with no prior knowledge of the system and entirely depends on trial-and-error for learning. This approach faces several challenges, such as poor sample efficiency, poor generalization, and the need for well-designed reward functions to guide learning effectively. On the other hand, controllers based on complete system dynamics do not require data. This paper addresses the intermediate situation where there is not enough model information for complete controller design, but there is enough to suggest that a model-free approach is not the best approach either. By carefully decoupling known and unknown information about the system dynamics, we obtain an embedded controller guided by our partial model and thus improve the learning efficiency of an RL-enhanced approach. A modular design allows us to deploy mainstream RL algorithms to refine the policy. Simulation results show that our method significantly improves sample efficiency compared with standard RL methods on continuous control tasks, and also offers enhanced performance over traditional control approaches. Experiments on a real ground vehicle also validate the performance of our method, including generalization and robustness.

Paper Structure

This paper contains 17 sections, 1 theorem, 16 equations, 6 figures, 3 tables.

Key Result

Proposition 1

Let $P$ be the stabilizing solution of the discrete-time algebraic Riccati equation (DARE), and assume that $Z_1^{-1}$ and $(R + B^\top P B)^{-1}$ exist. Then the Jacobians of the implicit function defined by the DARE are given by [...], where $Z_1, Z_2, Z_3$ are defined by [...] and $M_1, M_2, M_3$ are defined by [...].
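
Proposition 1 concerns the stabilizing DARE solution $P$ and how it changes with the system matrices. The sketch below is a rough numerical illustration only, not the closed-form Jacobians of the proposition and not the authors' implementation. It assumes the standard DARE form $P = Q + A^\top P A - A^\top P B (R + B^\top P B)^{-1} B^\top P A$, a hypothetical double-integrator system, and a plain Riccati recursion as the solver; the function names are made up for this example. It estimates one entry-wise derivative of $P$ by a forward finite difference, which can serve as a sanity check against an analytic Jacobian.

```python
import numpy as np


def solve_dare(A, B, Q, R, iters=10_000, tol=1e-12):
    """Iterate the Riccati recursion from P = Q until it stops changing.

    Assumes (A, B) is stabilizable and Q is positive definite, in which case
    the recursion is expected to converge to the stabilizing solution.
    """
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)  # gain for the current P
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        P_next = 0.5 * (P_next + P_next.T)  # keep the iterate symmetric
        if np.max(np.abs(P_next - P)) <= tol * max(1.0, np.max(np.abs(P_next))):
            return P_next
        P = P_next
    raise RuntimeError("Riccati recursion did not converge")


def dare_residual(P, A, B, Q, R):
    """Right-hand side of the DARE minus P; zero at an exact solution."""
    BtP = B.T @ P
    return Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + BtP @ B, BtP @ A) - P


def lqr_gain(P, A, B, R):
    """State-feedback gain K for the control law u = -K x."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


# Hypothetical system: double integrator discretized with a 0.1 s step.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])

P = solve_dare(A, B, Q, R)
K = lqr_gain(P, A, B, R)

# Forward-difference estimate of dP / dA[0, 1]; its accuracy is limited by
# the step size eps and by the solver tolerance.
eps = 1e-5
A_pert = A.copy()
A_pert[0, 1] += eps
dP_dA01 = (solve_dare(A_pert, B, Q, R) - P) / eps
```

Repeating the perturbation for every entry of $A$, $B$, $Q$, and $R$ gives a full numerical Jacobian, at the cost of one DARE solve per entry. That cost is the usual reason to prefer an implicit-function result such as Proposition 1 when the controller sits inside a trainable policy.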

Figures (6)

  • Figure 1: Dynamics model and partial model knowledge for different tasks: quadruped robot; self-driving vehicle; inverted pendulum. Green elements represent known parameters and structure; red elements represent unknown parameters.
  • Figure 2: Schematic diagram for our policy network with partial knowledge control module inside.
  • Figure 3: Training curves on the control benchmarks. Solid lines show the mean; shaded regions show the standard deviations over five runs.
  • Figure 4: Experiment overview. The field is $6 \times 8\,$m, with $8$ cameras on top of the field capturing the pose and position of the robot. A target for the capturing system is fixed on the side of the robot.
  • Figure 5: Trajectories following our method, SAC, and the reference.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Proposition 1: east2020infinite