LiRA: Light-Robust Adversary for Model-based Reinforcement Learning in Real World

Taisuke Kobayashi

TL;DR

This paper tackles robustness in model-based reinforcement learning for real-world robotics by introducing LiRA, a light-robust adversary whose strength is automatically tuned during learning via a state-dependent gain. By re-deriving adversarial learning through variational inference and enforcing a per-state light-robust constraint, LiRA balances robustness with conservativeness and mitigates training collapse. The approach is supported by three practical mechanisms: restricted normalizing flows (RNF), hindsight reparameterization gradient (HRG), and midrange-mean balancing (MMB). It is validated in both simulation (worm-type robot) and real-world (quadruped) experiments, achieving moderate robustness with limited data. This work advances safer and more sample-efficient robust learning for real-world model-based RL and points to integration with uncertainty-aware MPC for further improvements. Overall, LiRA demonstrates that adaptive adversarial disturbance during learning can yield robust policies without sacrificing real-time performance.

Abstract

Model-based reinforcement learning has attracted much attention due to its high sample efficiency and is expected to be applied to real-world robotic applications. In the real world, unobservable disturbances can lead to unexpected situations, so robot policies should improve not only control performance but also robustness. Adversarial learning is an effective way to improve robustness, but an excessive adversary increases the risk of malfunction and makes the control performance too conservative. Therefore, this study proposes a new adversarial learning framework that makes reinforcement learning moderately robust without being overly conservative. To this end, adversarial learning is first rederived with variational inference. In addition, \textit{light robustness}, which allows for maximizing robustness within an acceptable performance degradation, is utilized as a constraint. As a result, the proposed framework, called LiRA, can automatically adjust the adversary level, balancing robustness and conservativeness. The expected behaviors of LiRA are confirmed in numerical simulations. In addition, LiRA succeeds in learning force-reactive gait control of a quadrupedal robot using less than two hours of real-world data.
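The idea of maximizing robustness within an acceptable performance degradation can be written as a generic constrained problem. The following is only a sketch in assumed notation, not the formulation used in the paper: $\eta \ge 0$ stands for a hypothetical adversary level, $\mathcal{L}(\eta)$ for the performance loss incurred under that level, $\mathcal{L}(0)$ for the loss without an adversary, and $\delta \ge 0$ for the tolerated degradation.

```latex
\begin{equation*}
  \max_{\eta \ge 0} \; \eta
  \quad \text{subject to} \quad
  \mathcal{L}(\eta) - \mathcal{L}(0) \le \delta
\end{equation*}
```

Under this reading, $\delta = 0$ permits an adversary only where it costs nothing, while a larger $\delta$ trades more nominal performance for a stronger adversary.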

Paper Structure (31 sections, 19 equations, 17 figures, 3 tables, 1 algorithm)

Introduction
Related work
Preliminaries
Model-based reinforcement learning
Adversarial learning
LiRA: Light-robust adversary
Overview
Adversarial learning with variational inference
Integration with light robustness
Implementation tricks
Restricted normalizing flows (RNF)
Hindsight reparameterization gradient (HRG)
Midrange-mean balancing (MMB)
Numerical verification
Task
...and 16 more sections

Figures (17)

Figure 1: Proposed framework, LiRA: a trainable adversary generates a disturbance that deteriorates the predictive performance of the (world) model in an adversarial manner; by limiting the deterioration in predictive performance between the disturbance-marginalized and disturbance-aware models, the adversary level is automatically tuned, reverting to its prior.
Figure 2: Process for calculating the LiRA loss: the dashed lines show the non-backpropagatable signals replayed from the buffer; the blue lines and blocks show per-sample processing, while the green lines and blocks show per-batch processing, which reduces the per-sample values to a scalar loss.
Figure 3: Restricted normalizing flows (RNF): when the base distribution is restricted to be symmetric and the (conditional) invertible transformations are restricted to odd functions, the mean (i.e. the center of the distribution) is carried over to the transformed distribution.
Figure 4: Hindsight reparameterization gradient (HRG): even if $d$ in the replay buffer has no computational graph, the graph can be recovered by passing $d$ back and forth through the normalizing flows.
Figure 5: Midrange-mean balancing (MMB): the asymmetry in the prediction loss is captured and used to determine which term is prioritized for minimization.
...and 12 more figures
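The property stated in the Figure 3 caption, and the back-and-forth pass mentioned in the Figure 4 caption, can be illustrated with a small NumPy sketch. This is not the paper's implementation: the map `odd_flow` (a scaled `sinh`), its `scale` parameter, the value of `center`, and the use of antithetic sample pairs are all assumptions chosen for illustration.

```python
import numpy as np

def odd_flow(z, scale=0.5):
    """Hypothetical odd, invertible map: odd_flow(-z) == -odd_flow(z)."""
    return scale * np.sinh(z)

def odd_flow_inverse(x, scale=0.5):
    """Exact inverse of odd_flow."""
    return np.arcsinh(x / scale)

rng = np.random.default_rng(0)
center = 1.5  # assumed center of the target distribution

# Symmetric base distribution: a standard normal, sampled in antithetic
# pairs (z, -z) so that the sample set itself is symmetric around zero.
z_half = rng.standard_normal(10_000)
z = np.concatenate([z_half, -z_half])

# Shift the odd transform of the base samples by the center.
d = center + odd_flow(z)

# The transformed samples are skew-free around `center`, so their mean
# stays at the center even though the shape of the distribution changed.
mean_shift = abs(d.mean() - center)

# Back-and-forth pass: recover the base noise from a stored sample, then
# regenerate the sample. In an autodiff framework, the second (forward)
# pass is where a computational graph would be rebuilt.
z_recovered = odd_flow_inverse(d - center)
d_regenerated = center + odd_flow(z_recovered)
```

An even component in the transformation (for example, adding `z**2`) would shift the mean away from `center`, which is why the sketch restricts itself to an odd map.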
