Model predictive control-based value estimation for efficient reinforcement learning

Qizhen Wu, Kexin Liu, Lei Chen

TL;DR

An improved RL method based on model predictive control is designed. It models the environment through a data-driven approach and demonstrates higher learning efficiency, faster convergence of strategies toward the local optimal value, and a smaller sample capacity requirement for experience replay buffers.

Abstract

Reinforcement learning is limited in practice primarily by the large number of interactions with virtual environments that it requires. This is a challenging problem because, for many learning methods, a locally optimal strategy cannot plausibly be obtained in only a few attempts. We therefore design an improved reinforcement learning method based on model predictive control that models the environment through a data-driven approach. Using the learned environment model, the method performs multi-step prediction to estimate the value function and optimize the policy. It demonstrates higher learning efficiency, faster convergence of strategies toward the local optimal value, and a smaller sample capacity requirement for experience replay buffers. Experimental results, both on classic benchmarks and in a dynamic obstacle avoidance scenario for an unmanned aerial vehicle, validate the proposed approach.
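The multi-step prediction idea in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes a generic model-based value target in which a learned model is rolled forward for a fixed horizon, predicted rewards are discounted and summed, and a value estimate bootstraps the tail. The names `multi_step_value_target`, `model`, `policy`, `reward_fn`, and `value_fn` are hypothetical, and the toy scalar dynamics stand in for a model that would be fitted from data.

```python
from typing import Callable


def multi_step_value_target(
    state: float,
    policy: Callable[[float], float],
    model: Callable[[float, float], float],
    reward_fn: Callable[[float, float], float],
    value_fn: Callable[[float], float],
    horizon: int,
    gamma: float = 0.99,
) -> float:
    """Roll a learned model forward `horizon` steps and bootstrap at the end.

    Assumption: the target is the discounted sum of predicted rewards plus
    the discounted value estimate of the last predicted state.
    """
    target, discount, s = 0.0, 1.0, state
    for _ in range(horizon):
        a = policy(s)                        # action chosen by the current policy
        target += discount * reward_fn(s, a) # accumulate the predicted reward
        s = model(s, a)                      # predicted next state (no real interaction)
        discount *= gamma
    return target + discount * value_fn(s)  # bootstrap with the value estimate


# Toy stand-ins (hypothetical): scalar dynamics, quadratic cost, linear value.
model = lambda s, a: 0.5 * s + a
policy = lambda s: 0.0
reward_fn = lambda s, a: -s * s
value_fn = lambda s: 8.0 * s

target = multi_step_value_target(
    4.0, policy, model, reward_fn, value_fn, horizon=2, gamma=0.5
)
print(target)
```

Because every step after the first uses the model's prediction rather than a real transition, a target of this kind needs only the starting state from the environment, which is one plausible reading of why such methods can reduce the number of required interactions.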

Paper Structure

This paper contains 9 sections, 19 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: Differences between $n$-TD and our method.
  • Figure 2: Framework of MPC-based RL.
  • Figure 3: Comparison of MPC-based RL with the baseline. (a) Episode Return in CW. (b) Episode Return in CP. (c) Loss Function Value in CP. (d) Episode Return in PD. (e) Loss Function Value in PD. (f) Loss Function Value in HO. (g) Episode Return in HO.
  • Figure 4: The MDP of the UAV dynamic obstacle avoidance problem.
  • Figure 5: Comparison with DDPG in UAV path planning. (a) Episode Return. (b) Loss Function Value. (c) DDPG 7th episode. (d) DDPG 23rd episode. (e) DDPG-MPC 7th episode. (f) DDPG-MPC 23rd episode. (g) Indoor flight verification.