On the Perturbed States for Transformed Input-robust Reinforcement Learning

Tung M. Luu; Haeyong Kang; Tri Ton; Thanh Nguyen; Chang D. Yoo

TL;DR

This paper tackles the vulnerability of reinforcement learning agents to adversarial perturbations in observed states by introducing Transformed Input-robust RL (TIRL), a plug-in defense that applies input transformations to the state before policy evaluation. It formalizes two guiding principles (bounded transformations and autoencoder-styled denoising) and analyzes their impact on the RL performance gap under SA-MDP attacks, deriving a bound that ties robustness to the transformed input discrepancy. The authors instantiate TIRL with three concrete transformations: Bit-Depth Reduction (BDR), Vector Quantization (VQ), and Autoencoder-styled Denoising (AED/VAED), evaluating them with Soft Actor-Critic on five MuJoCo continuous-control tasks. Across gray-box and white-box attack scenarios, VQ generally offers the strongest robustness with manageable trade-offs in natural performance, while BDR provides a lighter defense and AED-based approaches can be vulnerable to gradient-based attackers. The work demonstrates that input transformations can substantially enhance robustness without requiring adversarial training, and highlights future opportunities to combine TIRL with other defenses and to scale to high-dimensional inputs.

Abstract

Reinforcement Learning (RL) agents that perform well in their training environment remain vulnerable to adversarial perturbations of their input observations during deployment, which makes robustness a prerequisite for real-world use. To address this challenge, prior works focus on robust training procedures, either strengthening the robustness of the deep neural network component or subjecting the agent to adversarial training against strong attacks. In this work, we propose Transformed Input-robust RL (TIRL), which explores another avenue for mitigating the impact of adversaries: input transformation-based defenses. Specifically, we introduce two principles for applying transformation-based defenses in learning robust RL agents: (1) autoencoder-styled denoising to reconstruct the original state and (2) bounded transformations (bit-depth reduction and vector quantization (VQ)) to achieve close transformed inputs. The transformations are applied to the state before feeding it into the policy network. Extensive experiments on multiple MuJoCo environments demonstrate that input transformation-based defenses, VQ in particular, defend against several adversaries in the state observations. The official code is available at https://github.com/tunglm2203/tirl
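As a rough illustration of applying a bounded transformation to the state before it reaches the policy, the sketch below snaps each state coordinate to a uniform grid, in the spirit of bit-depth reduction. It is not taken from the official code: the function names, the grid definition (nearest multiple of `bin_width`), and the toy linear policy are all assumptions made for this example.

```python
from typing import Callable, List, Sequence

State = Sequence[float]


def bit_depth_reduction(state: State, bin_width: float) -> List[float]:
    """Snap each coordinate to the nearest multiple of bin_width.

    Assumption: the grid is uniform and anchored at zero. Python's round()
    sends exact ties to the nearest even integer, which is fine for a sketch.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    return [round(x / bin_width) * bin_width for x in state]


def transformed_policy(
    policy: Callable[[State], List[float]],
    transform: Callable[[State], List[float]],
) -> Callable[[State], List[float]]:
    """Compose the policy with an input transformation (policy after transform)."""
    def act(state: State) -> List[float]:
        return policy(transform(state))
    return act


def toy_policy(state: State) -> List[float]:
    """Hypothetical stand-in for a trained policy network."""
    return [-0.5 * x for x in state]


def bdr(state: State) -> List[float]:
    """Bit-depth reduction with an illustrative bin width of 0.5."""
    return bit_depth_reduction(state, bin_width=0.5)


robust_policy = transformed_policy(toy_policy, bdr)

clean = [0.10, -0.70]
perturbed = [0.10 + 0.05, -0.70 - 0.04]  # small perturbation of the clean state

# Both states fall into the same grid cell, so the composed policy
# returns the same action for them, while the raw policy does not.
same_action = robust_policy(clean) == robust_policy(perturbed)
raw_differs = toy_policy(clean) != toy_policy(perturbed)
```

In this toy setting, a perturbation that stays inside one grid cell leaves the action unchanged. A larger `bin_width` absorbs larger perturbations, but it also discards more information about the clean state.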

Paper Structure (21 sections, 1 theorem, 12 equations, 6 figures, 5 tables, 2 algorithms)

This paper contains 21 sections, 1 theorem, 12 equations, 6 figures, 5 tables, 2 algorithms.

Introduction
Related Work
Preliminaries
Reinforcement Learning
Training Soft Actor-Critic
Test-time Adversarial Attacks
Transformed Input-robust RL (TIRL)
Effective Input Transformation Defenses
Input Transformation Principles
Various Input Transformations
Bounded Transformation: Bit-Depth Reduction
Bounded Transformation: Vector Quantization
Autoencoder-styled Denoising
Experiments
Experimental Setup
...and 6 more sections

Key Result

Proposition 1

Consider a $K$-Lipschitz continuous policy $\pi$ parameterized by the Gaussian distribution with a constant variance independent of state for a regular MDP. Let the corresponding value function be $V^{\pi}(s)$. Define $\mathcal{T}_{t}$ and $\mathcal{T}_{d}$ as the transformations applied during trai where, $\zeta$ is a constant independent of $\pi$. Here, $\pi\circ\mathcal{T}_t$ and $\pi\circ\math

Figures (6)

Figure 1: Transformed Input-robust RL (TIRL)-Vector Quantization (VQ): Reinforcement learning with the state adversary at test time. The state $s$ is adversarially perturbed by the adversary $\Psi(s)$ into $\tilde{s}$, which is then transformed by the transformation $\mathcal{T}$ before being fed to the agent.
Figure 2: Top: Illustration of using bit-depth reduction to quantize the 1-D state space. The state is assigned to the closest point. Bottom: The current state and its perturbed versions may be assigned the same value if $\epsilon$ is not too large. A larger bin width ($bW$) leads to greater robustness.
Figure 3: Illustration of using vector quantization to reduce the space of adversarial attacks in a 2-D state space. The green dots represent the centroids of clusters, while the gray dotted lines mark the boundaries between clusters. Each state is assigned to the closest centroid. Fewer clusters result in a sparser state space, leading to greater robustness.
Figure 4: The five MuJoCo environments in OpenAI Gym (Brockman et al., 2016) used to evaluate the robustness of SAC-based TIRL.
Figure 5: Ablation study of different bin widths ($bW$) for the BDR transformation in the Hopper environment. We evaluate the robustness under different attacks with various $\epsilon$ scales.
...and 1 more figures
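
The nearest-centroid assignment described in the Figure 3 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the codebook here is hand-picked rather than learned, and `nearest_centroid` and `vector_quantize` are hypothetical helper names.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(state: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook entry closest to the state."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(state, codebook[i]))


def vector_quantize(state: Vector, codebook: Sequence[Vector]) -> List[float]:
    """Replace the state by its closest centroid."""
    return list(codebook[nearest_centroid(state, codebook)])


# Hand-picked 2-D codebook (assumption; a real codebook would be fitted to data).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

clean = [0.2, 0.9]
small_attack = [0.3, 0.8]   # stays inside the same cluster as `clean`
large_attack = [0.7, 0.9]   # crosses a cluster boundary

q_clean = vector_quantize(clean, codebook)
q_small = vector_quantize(small_attack, codebook)
q_large = vector_quantize(large_attack, codebook)
```

Here `q_clean` and `q_small` coincide because both states lie in the cell of the centroid `[0.0, 1.0]`, whereas `large_attack` lands in a neighboring cell. With fewer centroids the cells grow, so more perturbations are absorbed, at the cost of a coarser view of the state.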

Theorems & Definitions (1)

Proposition 1
