Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
Sanket Kamthe, Marc Peter Deisenroth
TL;DR
The paper tackles the data inefficiency of reinforcement learning for constrained control by combining Gaussian Process (GP) dynamics models with probabilistic Model Predictive Control (MPC). It reformulates probabilistic MPC as a deterministic problem, using moment matching to propagate model uncertainty, and applies Pontryagin's Maximum Principle to constrained open-loop planning, which enables efficient gradient-based optimization. The resulting planner accounts for state and control constraints, works with short horizons, and updates the GP online to remain robust. Empirical results on cart-pole and double-pendulum tasks show better data efficiency and effective constraint handling compared with PILCO and zero-variance baselines, which indicates practical value for learning-based control under real-world constraints.
Abstract
Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms require a large number of interactions with the environment. A large number of interactions may be impractical in many real-world applications, such as robotics, and many practical systems have to obey limitations in the form of state space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency, but is also a principled way to perform RL in constrained environments.
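To make the planning loop described above concrete, the following is a minimal sketch of receding-horizon control on a Gaussian state belief. It is not the method of the paper: it assumes a known scalar linear-Gaussian model in place of a learned GP (so the moment propagation is exact rather than a moment-matching approximation), and it swaps the gradient-based optimisation for simple random shooting over bounded controls. The constants `A`, `B`, `NOISE_VAR` and the functions `propagate`, `expected_cost`, `plan` and `run_mpc` are hypothetical names chosen for illustration.

```python
import numpy as np

# Hypothetical toy dynamics x' = A*x + B*u + w, w ~ N(0, NOISE_VAR); not from the paper.
A, B, NOISE_VAR = 0.9, 0.5, 0.01


def propagate(mean, var, u):
    """Propagate a Gaussian state belief one step through the toy model.

    For a linear model these moments are exact; a GP model would instead
    require an approximation such as moment matching.
    """
    return A * mean + B * u, A**2 * var + NOISE_VAR


def expected_cost(mean, var, target=1.0):
    """Expected quadratic cost E[(x - target)^2] for x ~ N(mean, var)."""
    return (mean - target) ** 2 + var


def plan(mean, var, horizon, u_max, n_candidates, rng):
    """Random shooting: sample bounded control sequences and keep the one
    with the lowest expected long-term cost."""
    controls = rng.uniform(-u_max, u_max, size=(n_candidates, horizon))
    m = np.full(n_candidates, float(mean))
    v = np.full(n_candidates, float(var))
    total = np.zeros(n_candidates)
    for t in range(horizon):
        m, v = propagate(m, v, controls[:, t])
        total += expected_cost(m, v)
    return controls[np.argmin(total)]


def run_mpc(x0=0.0, steps=20, horizon=5, u_max=1.0, seed=0):
    """Receding-horizon loop: plan, apply only the first control, re-plan."""
    rng = np.random.default_rng(seed)
    x, states, applied = x0, [x0], []
    for _ in range(steps):
        u = float(plan(x, 0.0, horizon, u_max, 256, rng)[0])
        # Simulated "true" system; here it matches the planning model.
        x = A * x + B * u + rng.normal(0.0, np.sqrt(NOISE_VAR))
        states.append(x)
        applied.append(u)
    return np.array(states), np.array(applied)
```

Calling `run_mpc()` returns the visited states and the applied controls, all of which respect the bound `u_max` by construction. In this linear toy the predicted variance is identical for every candidate sequence, so it does not change which sequence is chosen; with a GP model the predictive variance depends on the state and control, which is where uncertainty-aware planning would matter.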
