
Robust Model Based Reinforcement Learning Using $\mathcal{L}_1$ Adaptive Control

Minjun Sung, Sambhu H. Karumanchi, Aditya Gahlawat, Naira Hovakimyan

TL;DR

This work addresses robustness gaps in Model-Based Reinforcement Learning by introducing $\mathcal{L}_1$-MBRL, a general add-on that couples an $\mathcal{L}_1$ adaptive controller with any MBRL algorithm. The method affinizes the learned nonlinear dynamics via a first-order Taylor expansion around a nominal input to produce a control-affine model, and a switching law governed by a tolerance $\epsilon_a$ decides when this affine approximation is used, so the $\mathcal{L}_1$ controller can compensate for uncertainties without modifying the base learner. The authors provide a continuous-time theoretical bound on the prediction error, showing that it can be kept within $\epsilon_l+\epsilon_a$ initially and converges to $2\epsilon_a$ as the sampling time $T_s$ decreases. They validate the approach on METRPO and MBMF across MuJoCo environments with noise, reporting improved performance and robustness, especially under aleatoric uncertainty. Overall, $\mathcal{L}_1$-MBRL offers a practical, theoretically grounded way to improve the reliability and sample efficiency of model-based policies in uncertain, real-world-like settings.
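
As a sketch of the affinization step, the first-order Taylor expansion in the input can be written out as follows. The symbols $\hat{f}$ (learned model), $\bar{u}$ (nominal input), $f_a$ and $g_a$ are notation assumed here for illustration and may differ from the paper's own.

$$
\hat{f}(x,u) \;\approx\; \hat{f}(x,\bar{u}) + \frac{\partial \hat{f}}{\partial u}(x,\bar{u})\,(u-\bar{u}) \;=\; f_a(x) + g_a(x)\,u,
$$

$$
g_a(x) = \frac{\partial \hat{f}}{\partial u}(x,\bar{u}), \qquad f_a(x) = \hat{f}(x,\bar{u}) - g_a(x)\,\bar{u}.
$$

The right-hand side is affine in $u$, which is the form an $\mathcal{L}_1$ adaptive controller expects. The approximation is exact at $u=\bar{u}$ and degrades as $u$ moves away from it, which is why a tolerance such as $\epsilon_a$ is needed to decide when a given expansion can still be trusted.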

Abstract

We introduce $\mathcal{L}_1$-MBRL, a control-theoretic augmentation scheme for Model-Based Reinforcement Learning (MBRL) algorithms. Unlike model-free approaches, MBRL algorithms learn a model of the transition function using data and use it to design a control input. Our approach generates a series of approximate control-affine models of the learned transition function according to the proposed switching law. Using the approximate model, control input produced by the underlying MBRL is perturbed by the $\mathcal{L}_1$ adaptive control, which is designed to enhance the robustness of the system against uncertainties. Importantly, this approach is agnostic to the choice of MBRL algorithm, enabling the use of the scheme with various MBRL algorithms. MBRL algorithms with $\mathcal{L}_1$ augmentation exhibit enhanced performance and sample efficiency across multiple MuJoCo environments, outperforming the original MBRL algorithms, both with and without system noise.
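
The idea of generating a series of control-affine models under a switching law can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `learned_model` is a toy stand-in for a learned transition function, the Jacobian is taken by finite differences rather than through a neural network, and the rule "re-expand around the current input when the affine error exceeds `eps_a`" is an assumed reading of the switching law, not the paper's exact statement.

```python
import math

def learned_model(x, u):
    """Hypothetical stand-in for a learned nonlinear transition model."""
    return [x[1], -math.sin(x[0]) + math.tanh(u[0]) + 0.1 * u[0] ** 2]

def input_jacobian(f, x, u_bar, h=1e-6):
    """Central finite-difference Jacobian of f w.r.t. u (rows: state, cols: input)."""
    n = len(f(x, u_bar))
    jac = [[0.0] * len(u_bar) for _ in range(n)]
    for j in range(len(u_bar)):
        up = list(u_bar); up[j] += h
        dn = list(u_bar); dn[j] -= h
        f_up, f_dn = f(x, up), f(x, dn)
        for i in range(n):
            jac[i][j] = (f_up[i] - f_dn[i]) / (2 * h)
    return jac

def matvec(a, v):
    """Matrix-vector product for nested lists."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def affinize(f, x, u_bar):
    """Return (f_a, g_a) such that f(x, u) ~= f_a + g_a @ u near u_bar."""
    g_a = input_jacobian(f, x, u_bar)
    f_a = [fb - gu for fb, gu in zip(f(x, u_bar), matvec(g_a, u_bar))]
    return f_a, g_a

def affine_predict(f_a, g_a, u):
    """Evaluate the control-affine model f_a + g_a @ u."""
    return [fa + gu for fa, gu in zip(f_a, matvec(g_a, u))]

def switched_prediction(f, x, u, u_bar, eps_a):
    """Assumed switching rule: keep the expansion point u_bar while the
    affine error is within eps_a, otherwise re-expand around u."""
    f_a, g_a = affinize(f, x, u_bar)
    pred = affine_predict(f_a, g_a, u)
    err = math.sqrt(sum((p - q) ** 2 for p, q in zip(f(x, u), pred)))
    if err > eps_a:
        u_bar = list(u)
        f_a, g_a = affinize(f, x, u_bar)
        pred = affine_predict(f_a, g_a, u)
    return pred, u_bar

if __name__ == "__main__":
    x, u_bar = [0.3, -0.2], [0.5]
    for u in ([0.5001], [2.0]):
        pred, new_bar = switched_prediction(learned_model, x, u, u_bar, eps_a=1e-3)
        print(u, "expansion point:", new_bar, "prediction:", pred)
```

In this sketch a small change in the input keeps the existing expansion point, while a large change triggers a new affine model. A smaller `eps_a` would give more accurate affine models at the cost of more frequent switching.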

Paper Structure

This paper contains 24 sections, 1 theorem, 45 equations, 9 figures, 5 tables, 2 algorithms.

Key Result

Theorem 1

Consider the system described by Equation (eq:ground_truth_nonlinear) and its learned control-affine representation in Equation (eq:continuous_control_affine). Additionally, assume that the system is operating under the augmented feedback control presented in Equation (eq:final_augmented_input). Let … where $0< T_s < t_{\max} \leq H < \infty$, and $H$ is the known bounded horizon (see Sec. subsec:M…

Figures (9)

  • Figure 1: $\mathcal{L}_1$-MBRL Framework. The policy box $\pi_\phi(\cdot|x_t)$ includes the policy update and control input sampling for each time step. Although this figure illustrates an on-policy MBRL setting with a parameterized $\pi_\phi$ for simplicity, the framework is not limited to this class and can also be applied to off-policy algorithms or to algorithms without a parameterized policy.
  • Figure 2: Comparison of performance between the fully nonlinear and control-affine models on the Halfcheetah environment using METRPO. The control-affine model failed to learn the Halfcheetah dynamics.
  • Figure 3: Contribution of $\mathcal{L}_1$ in the training and testing phases. The notation $\mathcal{L}_1$ on (off)-on (off) indicates that $\mathcal{L}_1$ is applied (not applied) during training-testing, respectively. The error bars span one standard deviation of the performance. On-on and off-off correspond to our main result in Table \ref{tab:main_result}. As expected, the on-on case achieved the highest performance in most scenarios.
  • Figure 4: The architecture of $\mathcal{L}_1$ adaptive controller.
  • Figure 5: Plots of $\mathcal{L}_1$-METRPO learning curves as a function of episodic steps. The performance is averaged across multiple random seeds: the solid lines indicate the average return at the corresponding timestep, and the shaded regions indicate one standard deviation.
  • ...and 4 more figures

Theorems & Definitions (4)

  • Remark 1
  • Theorem 1
  • Proof of Theorem 1
  • Remark 2