Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions

Rui Yang, Jie Wang, Guoping Wu, Bin Li

TL;DR

TRACER is a novel robust variational Bayesian inference method for offline RL. It introduces Bayesian inference for the first time to capture uncertainty from offline data, which makes it robust against all types of data corruptions, and it significantly outperforms several state-of-the-art approaches.

Abstract

Real-world offline datasets are often subject to data corruptions (such as noise or adversarial attacks) due to sensor failures or malicious attacks. Despite advances in robust offline reinforcement learning (RL), existing methods struggle to learn robust agents under high uncertainty caused by the diverse corrupted data (i.e., corrupted states, actions, rewards, and dynamics), leading to performance degradation in clean environments. To tackle this problem, we propose a novel robust variational Bayesian inference for offline RL (TRACER). It introduces Bayesian inference for the first time to capture the uncertainty via offline data for robustness against all types of data corruptions. Specifically, TRACER first models all corruptions as the uncertainty in the action-value function. Then, to capture such uncertainty, it uses all offline data as the observations to approximate the posterior distribution of the action-value function under a Bayesian inference framework. An appealing feature of TRACER is that it can distinguish corrupted data from clean data using an entropy-based uncertainty measure, since corrupted data often induces higher uncertainty and entropy. Based on the aforementioned measure, TRACER can regulate the loss associated with corrupted data to reduce its influence, thereby enhancing robustness and performance in clean environments. Experiments demonstrate that TRACER significantly outperforms several state-of-the-art approaches across both individual and simultaneous data corruptions.
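
To make the entropy-based regulation described above concrete, here is a minimal sketch of down-weighting high-entropy samples in a loss. It is not TRACER's actual objective: the categorical value distribution, the `exp(-entropy / temperature)` weighting, and the names `shannon_entropy`, `entropy_weights`, and `weighted_loss` are all assumptions made for illustration.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in nats) of one discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_weights(entropies, temperature=1.0):
    """Hypothetical weighting rule: higher entropy gives a smaller weight.

    Weights are rescaled so that they average to 1 over the batch.
    """
    raw = [math.exp(-h / temperature) for h in entropies]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]

def weighted_loss(per_sample_losses, entropies, temperature=1.0):
    """Batch mean of the per-sample losses after entropy-based reweighting."""
    weights = entropy_weights(entropies, temperature)
    total = sum(w * l for w, l in zip(weights, per_sample_losses))
    return total / len(per_sample_losses)

# Toy batch: each row stands in for an estimated distribution over
# action-value atoms for one transition (illustrative numbers only).
value_distributions = [
    [0.97, 0.01, 0.01, 0.01],  # confident estimate, low entropy
    [0.90, 0.05, 0.03, 0.02],
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain, e.g. a corrupted sample
]
per_sample_losses = [0.2, 0.3, 5.0]

entropies = [shannon_entropy(d) for d in value_distributions]
weights = entropy_weights(entropies)
plain = sum(per_sample_losses) / len(per_sample_losses)
regulated = weighted_loss(per_sample_losses, entropies)
print("entropies:", entropies)
print("weights:", weights)
print("plain mean loss:", plain, "entropy-weighted loss:", regulated)
```

In this toy batch the uniform distribution has the highest entropy, so its large loss contributes less to the weighted average than to the plain mean. The `temperature` parameter is a hypothetical knob that controls how sharply uncertain samples are suppressed.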

Paper Structure

This paper contains 42 sections, 2 theorems, 39 equations, 6 figures, 11 tables.

Key Result

Lemma A.3

(Performance Difference) For any $\tilde{\pi}$ and $\pi$, we have

Figures (6)

  • Figure 1: Graphical model of decision-making process. Nodes connected by solid lines denote data points in the offline dataset, while the Q values (i.e., action values) connected by dashed lines are not part of the dataset. These Q values are often objectives that offline algorithms aim to approximate.
  • Figure 2: On the left, we report the means and standard deviations on CARLA under random simultaneous corruptions. On the right, we report the results under random simultaneous corruptions at different corruption levels.
  • Figure 3: In the first column, we report the mean and standard deviation to show the superiority of using the entropy-based uncertainty measure. In the second and third columns, we report the results over three seeds to show the higher entropy of corrupted data compared to clean data during training.
  • Figure 4: Architecture of TRACER.
  • Figure 5: We report the smoothed curves of mean of entropy values for each batch in 'Walker2d-medium-replay-v2' and 'Halfcheetah-medium-replay-v2' under adversarial and random simultaneous data corruptions.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Lemma A.3
  • proof
  • Theorem A.4
  • proof