Robust Model Based Reinforcement Learning Using $\mathcal{L}_1$ Adaptive Control
Minjun Sung, Sambhu H. Karumanchi, Aditya Gahlawat, Naira Hovakimyan
TL;DR
This work addresses robustness gaps in Model-Based Reinforcement Learning (MBRL) by introducing $\mathcal{L}_1$-MBRL, a general add-on that couples an $\mathcal{L}_1$ adaptive controller with any MBRL algorithm. The method takes a first-order Taylor expansion of the learned nonlinear dynamics around a nominal input to obtain a control-affine model, and a switching law with tolerance $\epsilon_a$ governs when this affine approximation is used. This lets the $\mathcal{L}_1$ controller compensate for uncertainties without modifying the base learner. The authors provide a continuous-time theoretical bound on the prediction error, showing that it can be kept within $\epsilon_l+\epsilon_a$ initially and converges to $2\epsilon_a$ as the sampling time $T_s$ decreases. They validate the approach on METRPO and MBMF across MuJoCo environments with noise, reporting improved performance and robustness, especially under aleatoric uncertainty. Overall, $\mathcal{L}_1$-MBRL offers a practical, theoretically grounded way to improve the reliability and sample efficiency of model-based policies in uncertain, real-world-like settings.
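As a rough illustration of the Taylor-expansion step described above, the sketch below linearizes a model $f(x,u)$ in the input around a nominal $\bar{u}$, giving $f(x,u) \approx f_a(x) + g_a(x)\,u$ with $g_a = \partial f/\partial u\,(x,\bar{u})$ and $f_a = f(x,\bar{u}) - g_a\bar{u}$. This is not the authors' implementation: the names `affinize` and `toy_model`, the finite-difference Jacobian, and the numerical value chosen for $\epsilon_a$ are all assumptions made for the example, and the final comparison against $\epsilon_a$ only stands in for the kind of check a switching law might perform.

```python
import numpy as np

def affinize(f, x, u_bar, eps=1e-5):
    """First-order Taylor expansion of f(x, u) in u around u_bar.

    Returns (f_a, g_a) such that f(x, u) ~= f_a + g_a @ u near u_bar.
    The Jacobian is estimated with central finite differences
    (an assumption; a learned network would typically use autodiff).
    """
    x = np.asarray(x, dtype=float)
    u_bar = np.asarray(u_bar, dtype=float)
    f0 = np.asarray(f(x, u_bar), dtype=float)
    g_a = np.zeros((f0.size, u_bar.size))
    for j in range(u_bar.size):
        du = np.zeros_like(u_bar)
        du[j] = eps
        g_a[:, j] = (f(x, u_bar + du) - f(x, u_bar - du)) / (2 * eps)
    f_a = f0 - g_a @ u_bar  # fold the nominal-input offset into the drift term
    return f_a, g_a

def toy_model(x, u):
    # Hypothetical stand-in for a learned transition model,
    # deliberately non-affine in the input u.
    return np.array([x[1], -np.sin(x[0]) + np.tanh(u[0])])

x = np.array([0.3, -0.1])
u_bar = np.array([0.2])           # nominal input used for the expansion
f_a, g_a = affinize(toy_model, x, u_bar)

u = np.array([0.25])              # a nearby input
affine_pred = f_a + g_a @ u
err = np.linalg.norm(toy_model(x, u) - affine_pred)

eps_a = 1e-2                      # illustrative tolerance, not from the paper
within_tolerance = err <= eps_a
print(f"affine approximation error: {err:.2e}, within eps_a: {within_tolerance}")
```

By construction the affine model reproduces the original model exactly at $u = \bar{u}$, and its error grows as $u$ moves away from the nominal input, which is why some tolerance on the approximation is needed.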
Abstract
We introduce $\mathcal{L}_1$-MBRL, a control-theoretic augmentation scheme for Model-Based Reinforcement Learning (MBRL) algorithms. Unlike model-free approaches, MBRL algorithms learn a model of the transition function from data and use it to design a control input. Our approach generates a series of approximate control-affine models of the learned transition function according to the proposed switching law. Using the approximate model, the control input produced by the underlying MBRL algorithm is perturbed by the $\mathcal{L}_1$ adaptive controller, which is designed to enhance the robustness of the system against uncertainties. Importantly, this approach is agnostic to the choice of MBRL algorithm, so the scheme can be used with various MBRL algorithms. MBRL algorithms with $\mathcal{L}_1$ augmentation exhibit enhanced performance and sample efficiency across multiple MuJoCo environments, outperforming the original MBRL algorithms both with and without system noise.
