State Entropy Regularization for Robust Reinforcement Learning
Yonatan Ashlag, Uri Koren, Mirco Mutti, Esther Derman, Pierre-Luc Bacon, Shie Mannor
TL;DR
This work analyzes state entropy regularization as a robustness-promoting mechanism in reinforcement learning. It shows that state entropy regularization exactly solves a reward-robust RL problem with an explicit uncertainty set and yields nontrivial bounds under transition-kernel uncertainty, whereas policy entropy offers different robustness properties. It also establishes fundamental limits: entropy regularization cannot fully solve the kernel-robust problem and can harm risk-averse performance, and its robustness benefits are sensitive to the rollout budget. Empirically, state entropy regularization improves resilience to spatially structured perturbations in both discrete and continuous tasks, although it requires enough rollouts for reliable entropy estimation. Overall, the paper clarifies when and how state entropy regularization can enhance robustness and where its applicability is limited in practice.
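To make the contrast between the two regularizers concrete, the sketch below computes both for a small tabular MDP. It assumes a discounted setting in which state entropy is the entropy of the normalized discounted occupancy d_pi and policy entropy is the occupancy-weighted average of per-state action entropies; the paper's exact definitions (for example, a finite horizon or a different normalization) may differ. The function names and the two-state example are hypothetical and for illustration only.

```python
import numpy as np

def occupancy(P, pi, mu0, gamma):
    """Normalized discounted state occupancy d_pi(s) = (1 - gamma) * sum_t gamma^t Pr(s_t = s)."""
    # P[s, a, s'] = Pr(s' | s, a); pi[s, a] = pi(a | s); mu0 is the initial state distribution.
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transition matrix under pi
    S = P.shape[0]
    # d^T (I - gamma P_pi) = (1 - gamma) mu0^T, solved in transposed form.
    return np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)

def entropy(p):
    """Shannon entropy (in nats) of a probability vector, ignoring zero entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def regularized_objectives(P, R, pi, mu0, gamma, alpha):
    """Return (state-entropy-regularized, policy-entropy-regularized) objectives."""
    d = occupancy(P, pi, mu0, gamma)
    ret = float(np.sum(d[:, None] * pi * R))  # normalized expected reward under d_pi
    state_ent = entropy(d)                     # H(d_pi): spread of visited states
    policy_ent = float(sum(d[s] * entropy(pi[s]) for s in range(len(d))))  # E_{s~d}[H(pi(.|s))]
    return ret + alpha * state_ent, ret + alpha * policy_ent

# Hypothetical 2-state MDP: action 0 stays in place, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0
R = np.array([[1.0, 0.0], [0.0, 0.0]])  # reward only for staying in state 0
pi = np.full((2, 2), 0.5)               # uniform policy
mu0 = np.array([1.0, 0.0])
print(occupancy(P, pi, mu0, 0.9))
print(regularized_objectives(P, R, pi, mu0, gamma=0.9, alpha=0.1))
```

Under the uniform policy the occupancy is [0.55, 0.45]: the policy entropy term is maximal (log 2 in every state), while the state entropy term depends on how evenly the dynamics spread visitation across states, which is the quantity the robustness analysis is about.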
Abstract
State entropy regularization has been shown empirically to improve exploration and sample complexity in reinforcement learning (RL). However, its theoretical guarantees have not been studied. In this paper, we show that state entropy regularization improves robustness to structured and spatially correlated perturbations. These types of variations are common in transfer learning but are often overlooked by standard robust RL methods, which typically focus on small, uncorrelated changes. We provide a comprehensive characterization of these robustness properties, including formal guarantees under reward and transition uncertainty, as well as settings where the method performs poorly. Much of our analysis contrasts state entropy with the widely used policy entropy regularization, highlighting their different benefits. Finally, from a practical standpoint, we illustrate that the robustness advantages of state entropy are more sensitive than those of policy entropy to the number of rollouts used for policy evaluation.
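One simple reason state entropy can depend on the rollout budget is that it must be estimated from visited states. The sketch below assumes a count-based (plug-in) estimator and treats each sampled state as an i.i.d. draw from a hypothetical uniform occupancy over 20 states; real rollouts produce correlated states, and the paper may use a different estimator (for example, a nearest-neighbor estimator in continuous tasks). Because entropy is concave and the empirical distribution is unbiased, the plug-in estimate is biased downward in expectation, and the bias shrinks as more states are collected. The function names are hypothetical.

```python
import numpy as np

def plugin_entropy(states, num_states):
    """Count-based (plug-in) entropy estimate, in nats, from visited state indices."""
    counts = np.bincount(states, minlength=num_states)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mean_estimate(true_p, num_samples, trials, rng):
    """Average plug-in estimate over independent batches of sampled states (i.i.d. assumption)."""
    S = len(true_p)
    ests = [plugin_entropy(rng.choice(S, size=num_samples, p=true_p), S)
            for _ in range(trials)]
    return float(np.mean(ests))

rng = np.random.default_rng(0)
true_p = np.full(20, 1 / 20)  # hypothetical uniform occupancy over 20 states
true_H = float(np.log(20))
for n in (5, 50, 5000):
    est = mean_estimate(true_p, n, trials=200, rng=rng)
    print(f"samples={n:5d}  mean estimate={est:.3f}  true entropy={true_H:.3f}")
```

With only 5 sampled states the estimate cannot exceed log 5, far below the true value log 20, so a state-entropy bonus computed from few rollouts can misrank policies whose visitation spreads differ.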
