LiRA: Light-Robust Adversary for Model-based Reinforcement Learning in Real World
Taisuke Kobayashi
TL;DR
This paper tackles robustness in model-based reinforcement learning for real-world robotics by introducing LiRA, a light-robust adversary whose strength is automatically tuned during learning via a state-dependent gain. By re-deriving adversarial learning through variational inference and enforcing a per-state light-robust constraint, LiRA balances robustness with conservativeness and mitigates training collapse. The approach is supported by three practical mechanisms—restricted normalizing flows (RNF), hindsight reparameterization gradient (HRG), and midrange-mean balancing (MMB)—and validated in both simulation (worm-type robot) and real-world (quadruped) experiments, achieving moderate robustness with limited data. This work advances safer, more sample-efficient robustness for real-world model-based RL and points to integration with uncertainty-aware MPC for further improvements. Overall, LiRA demonstrates that adaptive adversarial disturbance during learning can yield robust policies without sacrificing real-time performance.
Abstract
Model-based reinforcement learning has attracted much attention due to its high sample efficiency and is expected to be applied to real-world robotic applications. In the real world, unobservable disturbances can lead to unexpected situations, so robot policies should be designed to improve not only control performance but also robustness. Adversarial learning is an effective way to improve robustness, but an excessively strong adversary increases the risk of malfunction and makes the control performance too conservative. Therefore, this study proposes a new adversarial learning framework that makes reinforcement learning moderately robust without being overly conservative. To this end, adversarial learning is first rederived with variational inference. In addition, "light robustness", which allows for maximizing robustness within an acceptable performance degradation, is utilized as a constraint. As a result, the proposed framework, called LiRA, can automatically adjust the adversary level, balancing robustness and conservativeness. The expected behaviors of LiRA are confirmed in numerical simulations. In addition, LiRA succeeds in learning a force-reactive gait control of a quadrupedal robot using only real-world data collected in less than two hours.
