On the Perturbed States for Transformed Input-robust Reinforcement Learning
Tung M. Luu, Haeyong Kang, Tri Ton, Thanh Nguyen, Chang D. Yoo
TL;DR
This paper tackles the vulnerability of reinforcement learning agents to adversarial perturbations in observed states by introducing Transformed Input-robust RL (TIRL), a plug-in defense that applies input transformations to the state before policy evaluation. It formalizes two guiding principles—bounded transformations and autoencoder-styled denoising—and analyzes their impact on the RL performance gap under SA-MDP attacks, deriving a bound that ties robustness to the transformed input discrepancy. The authors instantiate TIRL with three concrete transformations: Bit-Depth Reduction (BDR), Vector Quantization (VQ), and Autoencoder-styled Denoising (AED/VAED), evaluating them with Soft Actor-Critic on five MuJoCo continuous-control tasks. Across gray-box and white-box attack scenarios, VQ generally offers the strongest robustness with manageable trade-offs in natural performance, while BDR provides a lighter defense and AED-based approaches can be vulnerable to gradient-based attackers. The work demonstrates that input transformations can substantially enhance robustness without requiring adversarial training, and highlights future opportunities to combine TIRL with other defenses and to scale to high-dimensional inputs.
Abstract
Reinforcement Learning (RL) agents that perform well in their training environment remain vulnerable to adversarial perturbations of their input observations during deployment. This underscores the importance of building a robust agent before real-world deployment. To address this vulnerability, prior work has focused on robust training procedures, which either strengthen the robustness of the deep neural network component or subject the agent to adversarial training against strong attacks. In this work, we propose a novel method, Transformed Input-robust RL (TIRL), which explores another avenue for mitigating the impact of adversaries: input transformation-based defenses. Specifically, we introduce two principles for applying transformation-based defenses when learning robust RL agents: (1) autoencoder-styled denoising to reconstruct the original state and (2) bounded transformations (bit-depth reduction and vector quantization (VQ)) to keep the transformed inputs of clean and perturbed states close. The transformations are applied to the state before it is fed into the policy network. Extensive experiments on multiple MuJoCo environments demonstrate that input transformation-based defenses, in particular VQ, defend against several adversaries that perturb the state observations. The official code is available at https://github.com/tunglm2203/tirl
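As a rough illustration of the two bounded transformations named in the abstract, the following Python sketch applies bit-depth reduction and vector quantization to a state vector before it would reach a policy. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the uniform rounding grid, the bounds `low` and `high`, the `bits` parameter, and the hand-written codebook are all hypothetical choices made for illustration.

```python
import numpy as np


def bit_depth_reduction(state, low, high, bits=3):
    """Snap each dimension to one of 2**bits evenly spaced levels in [low, high].

    Assumption: a uniform grid with known bounds; the paper may use a
    different discretization.
    """
    steps = 2 ** bits - 1                      # number of intervals between levels
    clipped = np.clip(state, low, high)        # keep the state inside the grid range
    unit = (clipped - low) / (high - low)      # rescale to [0, 1]
    return low + np.round(unit * steps) / steps * (high - low)


def vector_quantize(state, codebook):
    """Replace the state with its nearest codebook vector (Euclidean distance).

    Assumption: the codebook is given; in practice it would be learned.
    """
    dists = np.linalg.norm(codebook - state, axis=1)
    return codebook[np.argmin(dists)]


# Hypothetical clean state and a small perturbation of it.
state = np.array([0.30, -0.72])
perturbed = state + np.array([0.01, -0.01])

# Hypothetical codebook with three entries.
codebook = np.array([[0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]])

bdr_clean = bit_depth_reduction(state, low=-1.0, high=1.0)
bdr_pert = bit_depth_reduction(perturbed, low=-1.0, high=1.0)
vq_clean = vector_quantize(state, codebook)
vq_pert = vector_quantize(perturbed, codebook)
```

In this toy setup the clean and perturbed states fall into the same grid cell and share the same nearest codebook entry, so the policy would receive identical inputs in both cases. A perturbation large enough to cross a cell boundary would not be absorbed this way.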
