VIBR: Learning View-Invariant Value Functions for Robust Visual Control

Tom Dupuis, Jaonary Rabarisoa, Quoc-Cuong Pham, David Filliat

TL;DR

VIBR addresses the challenge of robust visuomotor control under heavy visual perturbations by learning view-invariant value predictions without auxiliary representation-learning losses. It introduces a multi-view, risk-regularized Bellman residual objective that enforces invariance across observers while maintaining TD-learning updates. Empirical results on the Distracting Control Suite show state-of-the-art performance, strong OOD generalization, and clear benefits from including multiple views and the risk-extrapolation term. The work highlights that invariant prediction can provide a practical and efficient inductive bias for robust reinforcement learning in visually diverse environments, with applicability to sim-to-real settings where multiple views are available during training. Overall, VIBR offers a principled alternative to representation-centric invariance in RL and demonstrates notable gains in both in-distribution and out-of-distribution scenarios.

Abstract

End-to-end reinforcement learning on images has shown significant progress in recent years. Data-based approaches leverage data augmentation and domain randomization, while representation learning methods use auxiliary losses to learn task-relevant features. Yet reinforcement learning still struggles in visually diverse environments full of distractions and spurious noise. In this work, we tackle the problem of robust visual control at its core and present VIBR (View-Invariant Bellman Residuals), a method that combines multi-view training and invariant prediction to reduce the out-of-distribution (OOD) generalization gap for RL-based visuomotor control. Our model-free approach improves baseline performance without the need for additional representation learning objectives and with limited additional computational cost. We show that VIBR outperforms existing methods on complex visuomotor control environments with high visual perturbation. Our approach achieves state-of-the-art results on the Distracting Control Suite benchmark, a challenging benchmark still not solved by current methods, where we evaluate robustness to a number of visual perturbations, as well as OOD generalization and extrapolation capabilities.
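
To make the idea of a multi-view, risk-regularized Bellman residual objective more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation, and the paper's exact objective is not reproduced here. It assumes a V-REx-style penalty (mean risk across views plus a weighted variance of the per-view risks) and a squared TD error as the per-view risk. The function names `bellman_residual_risk` and `vrex_objective`, and the parameters `gamma` and `beta`, are hypothetical and chosen for illustration only.

```python
from statistics import fmean, pvariance


def bellman_residual_risk(q_values, rewards, next_q_values, gamma=0.99):
    """Mean squared TD error for one view (hypothetical helper).

    Each list holds one entry per transition: the predicted Q-value,
    the observed reward, and the bootstrapped next-state value.
    """
    errors = [
        (q - (r + gamma * q_next)) ** 2
        for q, r, q_next in zip(q_values, rewards, next_q_values)
    ]
    return fmean(errors)


def vrex_objective(per_view_risks, beta=1.0):
    """V-REx-style objective: mean risk plus beta times the risk variance.

    With beta = 0 this reduces to plain averaging over views (ERM);
    larger beta pushes the per-view risks toward being equal.
    """
    return fmean(per_view_risks) + beta * pvariance(per_view_risks)


# Two views of the same two transitions (illustrative numbers only).
rewards = [1.0, 0.0]
next_q = [0.0, 0.0]
risk_view_a = bellman_residual_risk([1.0, 0.0], rewards, next_q, gamma=0.5)
risk_view_b = bellman_residual_risk([2.0, 1.0], rewards, next_q, gamma=0.5)

erm_loss = vrex_objective([risk_view_a, risk_view_b], beta=0.0)
vrex_loss = vrex_objective([risk_view_a, risk_view_b], beta=2.0)
print(erm_loss, vrex_loss)
```

In this sketch the variance term is zero whenever all views incur the same risk, so the penalty only activates when the value predictions fit some views better than others. That is one way to read the claim that invariance is enforced across observers while the update remains a TD-style regression.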

Paper Structure

This paper contains 46 sections, 1 theorem, 26 equations, 21 figures, 1 table, 1 algorithm.

Key Result

Proposition 3.1

Let $Q_\theta^\pi$ be a parametrized value function. Then $Q_\theta^\pi \in \mathcal{Q}_\Theta^{\mathrm{inv}}$ if and only if:

Figures (21)

  • Figure 1: Evaluation metrics at the end of training aggregated over all 5 curriculum benchmarks and 6 tasks of the Distracting Control Suite, 21 episodes and 4 seeds each. See the results section for details.
  • Figure 2: (a): Loss landscape of VIBR in observation space. Given two observers $x^k, x^l$ that define training domains in $\mathcal{O}$, VIBR uses V-REx to control the ID (interpolation) and OOD (extrapolation) risks. (b): Toy experiment of the VIBR loss landscape in parameter space. Red points are the individual local minima of each of the 3 training domains. The green star is the individual minimum of the held-out testing domain. The blue square is the global minimum of ERM over the training domains. The white triangle is the global minimum of V-REx over the training domains. See the toy experiment section and appendix for details.
  • Figure 3: (a): Evaluation score (IQM and Generalization Gap) of VIBR and baselines over all 5 evaluation domains. Vertical bars are bootstrapped CI. (b): Effect of training curriculum on generalization.
  • Figure 4: Evaluation score over ablations and variations of VIBR. Shaded areas are bootstrapped CI.
  • Figure 5: (a): Cosine similarity between the RL and auxiliary tasks during training of the representation learning baselines. (b): Distribution and lower Pareto frontier of VIBR loss components during training over 4 seeds. (c): Evolution of the empirical inter-observer variance loss during training.
  • ...and 16 more figures

Theorems & Definitions (5)

  • Definition 2.1
  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Proposition 3.1