Model-based Lookahead Reinforcement Learning
Zhang-Wei Hong, Joni Pajarinen, Jan Peters
TL;DR
This work addresses the trade-off between the data efficiency of model-based RL and the final performance of model-free RL by unifying their strengths in a Model Predictive Control with Model-Free RL (MPC-MFRL) framework. It jointly learns a policy, a value function, and a forward dynamics model. Within MPC, it samples trajectories guided by the policy, evaluates them with the value function, and selects actions with a soft-greedy rule. Empirical results on MuJoCo tasks show that MPC-MFRL achieves state-of-the-art data efficiency while matching or surpassing model-free performance, particularly on challenging tasks such as Ant and HalfCheetah. The approach shows robust improvements from policy-informed exploration and highlights the practical potential of data-efficient planning for robotics and complex control problems.
Abstract
Model-based Reinforcement Learning (MBRL) enables the data-efficient learning required in real-world applications such as robotics. However, despite its impressive data efficiency, MBRL does not reach the final performance of state-of-the-art Model-free Reinforcement Learning (MFRL) methods. We leverage the strengths of both approaches and propose a method that achieves high performance with a small amount of data. In particular, we combine MFRL and Model Predictive Control (MPC). MFRL's strength in exploration allows us to train a better forward dynamics model for MPC, while MPC improves the performance of the MFRL policy through sampling-based planning. Experimental results on standard continuous control benchmarks show that our approach can reach MFRL's level of performance while being as data-efficient as MBRL.
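The planning loop summarized above combines three steps: policy-guided trajectory sampling, value-based trajectory evaluation, and soft-greedy action selection. The sketch below illustrates these steps under stated assumptions and is not the authors' implementation. `model`, `reward`, `policy`, and `value` are hypothetical toy stand-ins for the learned dynamics model, reward, MFRL policy, and value function on a 1-D problem. The soft-greedy step is assumed here to be Boltzmann (softmax) sampling over trajectory returns, which may differ from the paper's exact rule.

```python
import math
import random

# Hypothetical toy stand-ins for the learned components (assumptions, not from the paper).
def model(s, a):
    """Toy forward dynamics model: next state = s + a."""
    return s + a

def reward(s, a):
    """Toy reward: stay near the origin while using small actions."""
    return -s * s - 0.1 * a * a

def policy(s, rng, noise=0.3):
    """Toy stochastic policy: linear controller plus Gaussian exploration noise."""
    return -0.5 * s + rng.gauss(0.0, noise)

def value(s):
    """Toy value function used to score states beyond the planning horizon."""
    return -2.0 * s * s

def plan(s0, rng, num_traj=64, horizon=5, gamma=0.99, temperature=1.0):
    """Sample trajectories from the policy in the model, score them with
    discounted reward plus a bootstrapped value, and pick a first action."""
    first_actions, returns = [], []
    for _ in range(num_traj):
        s, ret, discount, a0 = s0, 0.0, 1.0, None
        for t in range(horizon):
            a = policy(s, rng)        # policy-guided sampling
            if t == 0:
                a0 = a
            ret += discount * reward(s, a)
            s = model(s, a)           # roll forward in the model
            discount *= gamma
        ret += discount * value(s)    # value-based evaluation of the tail
        first_actions.append(a0)
        returns.append(ret)
    # Assumed soft-greedy rule: softmax weights over returns (max subtracted for stability).
    best = max(returns)
    weights = [math.exp((r - best) / temperature) for r in returns]
    return rng.choices(first_actions, weights=weights, k=1)[0]

# Usage: receding-horizon control, re-planning at every step.
rng = random.Random(0)
state = 3.0
for step in range(10):
    action = plan(state, rng)
    state = model(state, action)  # the toy model doubles as the "real" environment
print(f"final state: {state:.3f}")
```

Only the chosen first action is executed before the planner runs again from the next state. Sampling from the policy concentrates candidate trajectories where the policy already performs well. The value term accounts for returns beyond the short horizon.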
