Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control

Sanket Kamthe, Marc Peter Deisenroth

TL;DR

The paper tackles the data inefficiency of reinforcement learning for constrained control by combining Gaussian Process (GP) dynamics models with probabilistic Model Predictive Control (MPC). It reformulates probabilistic MPC as a deterministic problem, using moment matching to propagate model uncertainty, and applies Pontryagin's Maximum Principle to the constrained open-loop planning problem, which enables efficient gradient-based optimization. The resulting planner accounts for state and control constraints over short horizons and updates the GP online to maintain robustness. Empirical results on cart-pole and double-pendulum tasks show better data efficiency and constraint handling than PILCO and a zero-variance MPC baseline.

Abstract

Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms require a large number of interactions with the environment. A large number of interactions may be impractical in many real-world applications, such as robotics, and many practical systems have to obey limitations in the form of state space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency, but is also a principled way for RL in constrained environments.
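
The planning loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a hypothetical 1-D linear-Gaussian model stands in for the learned GP (so moment propagation is exact rather than approximated by moment matching), the constants `A`, `B`, `NOISE_VAR`, `R`, `U_MAX` and `HORIZON` are invented for illustration, and the control bound is enforced by simple projected gradient descent. The backward recursion on `costate` is an adjoint pass in the spirit of Pontryagin's Maximum Principle.

```python
# Toy stand-in: a 1-D linear-Gaussian model replaces the learned GP (assumption).
A, B, NOISE_VAR = 1.0, 0.5, 0.01   # hypothetical dynamics x' = A*x + B*u + w
R, U_MAX, HORIZON = 0.1, 1.0, 5    # control penalty, control bound, horizon


def propagate(mean, var, u):
    """Propagate a Gaussian state belief one step (exact for linear dynamics)."""
    return A * mean + B * u, A * A * var + NOISE_VAR


def expected_cost(mean0, var0, controls):
    """Sum of E[x^2] + R*u^2 = mean^2 + var + R*u^2 over the predicted beliefs."""
    mean, var, total = mean0, var0, 0.0
    for u in controls:
        mean, var = propagate(mean, var, u)
        total += mean * mean + var + R * u * u
    return total


def plan(mean0, var0, iters=200, step=0.05):
    """Open-loop plan: projected gradient descent on the expected cost."""
    controls = [0.0] * HORIZON
    for _ in range(iters):
        # Forward pass: predicted means under the current control sequence.
        means, mean, var = [], mean0, var0
        for u in controls:
            mean, var = propagate(mean, var, u)
            means.append(mean)
        # Backward (adjoint) pass: costate of the predicted mean.
        # In this linear toy model the variance does not depend on the controls,
        # so it contributes nothing to the gradient.
        costate, grad = 0.0, [0.0] * HORIZON
        for t in reversed(range(HORIZON)):
            costate = 2.0 * means[t] + A * costate
            grad[t] = 2.0 * R * controls[t] + B * costate
        # Gradient step, then projection onto the control bound |u| <= U_MAX.
        controls = [max(-U_MAX, min(U_MAX, u - step * g))
                    for u, g in zip(controls, grad)]
    return controls


def run_mpc(x0=2.0, steps=15):
    """Receding horizon: apply only the first planned control, then replan."""
    x, trajectory, applied = x0, [x0], []
    for _ in range(steps):
        u0 = plan(x, 0.0)[0]
        x = A * x + B * u0          # noise-free "true" system, for reproducibility
        trajectory.append(x)
        applied.append(u0)
    return trajectory, applied


if __name__ == "__main__":
    trajectory, applied = run_mpc()
    print("final state:", trajectory[-1])
    print("largest |u|:", max(abs(u) for u in applied))
```

With a GP model the predictive variance generally depends on the controls as well, so the gradient would also flow through the variance terms; the toy model above omits that effect and also skips the online model update.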

Paper Structure

This paper contains 26 sections, 4 theorems, 29 equations, 3 figures, 1 table.

Key Result

Lemma 1

The moment matching mapping $f_{MM}$ is Lipschitz continuous for controls defined over a compact set $\mathcal{U}$.
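
For reference, one way to read this statement is sketched below. The notation is an assumption rather than taken from this page: $z_t$ stands for the parameters (mean and covariance) of the Gaussian state distribution at time $t$, so that $z_{t+1} = f_{MM}(z_t, u_t)$, and $L$ is a Lipschitz constant. Under that reading, Lipschitz continuity in the controls means there exists $L \geq 0$ such that

$$
\| f_{MM}(z_t, u) - f_{MM}(z_t, u') \| \le L \, \| u - u' \| \quad \text{for all } u, u' \in \mathcal{U}.
$$

The norms used and any dependence of $L$ on $z_t$ are left unspecified here; the paper's own statement and proof are authoritative.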

Figures (3)

  • Figure 1: State constraints in RL benchmarks. (a) The position of the cart is constrained on the left side by a wall. (b) The angle of the inner pendulum cannot enter the grey region.
  • Figure 2: Performance of RL algorithms. Error bars represent the standard error. (a) Cart-pole; (b) Double pendulum. GP-MPC (blue) consistently outperforms PILCO (red) and the zero-variance MPC approach (yellow) in terms of data efficiency. While the zero-variance MPC approach works well on the cart-pole task, it fails in the double-pendulum task. We attribute this to the inability to explore the state space sufficiently well.
  • Figure 3: Performance with state-space constraints. Error bars represent the standard error. (a) Cart-pole; (b) Double pendulum. GP-MPC with chance constraints, GP-MPC-Var (blue), is the only method that consistently solves the problem. GP-MPC-Mean (yellow), which constrains expected violations, fails on the cart-pole task. PILCO (red) violates the state constraints and struggles to complete the task.

Theorems & Definitions (9)

  • Remark 1
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Lemma 3