Training Verifiably Robust Agents Using Set-Based Reinforcement Learning

Manuel Wendl, Lukas Koller, Tobias Ladner, Matthias Althoff

TL;DR

This work integrates set-based reachability analysis into reinforcement learning to train verifiably robust agents for continuous control. By propagating entire sets of perturbed inputs through the actor and critic and optimizing a set-based loss, the method trains policies to maximize the worst-case reward under an $\ell_\infty$ perturbation model; the resulting agents can then be formally verified via reachability analysis. The authors derive explicit set-based losses and gradients, compare two set-based policy-gradient variants (SA-SC and SA-PC) against standard and adversarial baselines, and demonstrate robustness improvements on four benchmarks. The approach makes neural controllers more applicable in safety-critical settings and offers a pathway toward rigorous verification of learning-based control systems.

Abstract

Reinforcement learning often uses neural networks to solve complex control tasks. However, neural networks are sensitive to input perturbations, which makes their deployment in safety-critical environments challenging. This work lifts recent results from formally verifying neural networks against such disturbances to reinforcement learning in continuous state and action spaces using reachability analysis. While previous work mainly focuses on adversarial attacks for robust reinforcement learning, we train neural networks using entire sets of perturbed inputs and maximize the worst-case reward. The obtained agents are verifiably more robust than agents obtained by related work, making them more applicable in safety-critical environments. This is demonstrated with an extensive empirical evaluation on four different benchmarks.
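
Training on entire sets of perturbed inputs rests on enclosing a network's output for every input in a perturbation set. The following is a minimal sketch of that enclosure step, assuming an $\ell_\infty$ ball of radius `eps` around a state and using plain interval arithmetic rather than the zonotope propagation the paper relies on. The random two-layer network, the function names, and the values of `state` and `eps` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def interval_affine(lower, upper, W, b):
    """Enclose {W @ x + b : lower <= x <= upper} by an axis-aligned box."""
    center = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    out_center = W @ center + b
    out_radius = np.abs(W) @ radius  # worst-case spread per output neuron
    return out_center - out_radius, out_center + out_radius


def interval_relu(lower, upper):
    """ReLU is monotone, so it can be applied to both bounds directly."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)


def enclose_network_output(x, eps, layers):
    """Bound the network output over the l_inf ball of radius eps around x."""
    lower, upper = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        lower, upper = interval_affine(lower, upper, W, b)
        if i < len(layers) - 1:  # no activation after the last layer
            lower, upper = interval_relu(lower, upper)
    return lower, upper


def forward(x, layers):
    """Ordinary point-wise forward pass of the same network."""
    for i, (W, b) in enumerate(layers):
        x = W @ x + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


# Hypothetical actor: 3-dimensional state, 8 hidden neurons, 2-dimensional action.
rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(8, 3)), rng.normal(size=8)),
    (rng.normal(size=(2, 8)), rng.normal(size=2)),
]
state = np.array([0.5, -0.2, 1.0])
eps = 0.1

lo, hi = enclose_network_output(state, eps, layers)

# Sanity check: outputs for sampled perturbed states must lie inside the box.
samples = state + rng.uniform(-eps, eps, size=(1000, 3))
outputs = np.array([forward(s, layers) for s in samples])
print("lower bound:", lo)
print("upper bound:", hi)
print("all samples enclosed:", bool(np.all((outputs >= lo) & (outputs <= hi))))
```

Interval bounds ignore dependencies between neurons, so they are generally looser than a zonotope enclosure; the sketch only illustrates the soundness property that a set-based method needs, not the tightness the paper aims for.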

Paper Structure (26 sections, 3 theorems, 65 equations, 7 figures, 2 tables, 1 algorithm)

Key Result

Proposition 1

Given an input set $\mathcal{X}$, the output set of a neural network can be enclosed as:

Figures (7)

  • Figure 1: Comparison of standard and our novel set-based reinforcement learning on a navigation task. Left: Some trajectories of the standard agent intersect with the obstacle. Right: We can formally verify the safety of our robust agent.
  • Figure 2: Illustration of the structure of the deep deterministic policy gradient algorithm; ➀ and ➁ show the components that are augmented through our set-based training (\ref{ch:setBasedRL}).
  • Figure 3: Probability density function of a zonotope propagated through a neural network with $\operatorname{ReLU}$-activations: Exact density function obtained via sampling (blue), interval enclosure (yellow), and the density of sets obtained using \ref{prop:setBasedForwardProp} with uniformly distributed $\beta_j \sim \mathscr{U}(-1,1)$ (\ref{prop:Zonotope}) (green).
  • Figure 4: Comparison of $\underline{V}_\mu(s_0)$ for the (a) 1D Quadrocopter, (c) Navigation Task, and (d) Inverted Pendulum benchmarks; (b) shows the comparison for the TD3 implementation on the 1D Quadrocopter.
  • Figure 5: Quad. 1D: Comparison of the reachable altitudes $z$ and vertical speeds $\dot z$ for $\epsilon_\text{test}=0.15$.
  • ...and 2 more figures

Theorems & Definitions (11)

  • Definition 1: Neural Network [bishop2006pattern]
  • Definition 2: Zonotope [girard2005reachability]
  • Proposition 1: Neural Network Set Propagation [NEURIPS2018_f2f44698]
  • Proposition 2: Set-Based Regression Loss
  • proof
  • Definition 3: Set-Based Policy Gradient SA-SC
  • Definition 4: Set-Based Policy Gradient SA-PC
  • Proposition 3: Tight Expectation-Preserving Set Propagation
  • proof
  • proof
  • ...and 1 more
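
For orientation, the zonotope named in Definition 2 is usually written in the standard generator form below. This follows the common convention in the reachability literature, and the paper's exact notation may differ. The symbols $c$, $G$, $n$, and $q$ are assumptions introduced here; the factors $\beta_j$ are the ones that appear in the caption of Figure 3.

$$\mathcal{Z} = \Big\{\, c + \sum_{j=1}^{q} \beta_j\, G_{(\cdot,j)} \;\Big|\; \beta_j \in [-1,1] \,\Big\}, \qquad c \in \mathbb{R}^{n},\quad G \in \mathbb{R}^{n \times q}.$$

Under an affine map $x \mapsto Wx + b$, such a set is mapped exactly to the zonotope with center $Wc + b$ and generator matrix $WG$. Its tightest interval enclosure is $[\,c - |G|\mathbf{1},\; c + |G|\mathbf{1}\,]$, with $|G|$ taken elementwise.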