Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher, Onofrio Semeraro, Lionel Mathelin
TL;DR
This work analyses the robustness and generalisation of policies learned via Maximum-Entropy Reinforcement Learning in chaotic PO-MDPs with Gaussian observation noise. It formalises robustness to observation noise through the excess risk under noise and shows that entropy regularisation correlates with improved robustness and a flatter, more regular loss landscape. The study ties robustness to complexity measures from learning theory: norm-based capacity metrics and the trace of the Fisher Information both decrease as entropy increases, indicating a link between regularity and robustness. Experiments on Lorenz and Kuramoto–Sivashinsky dynamics, using PPO with varying entropy levels, provide evidence that the entropy term acts as a regulariser and reduces the average Fisher Information, with practical implications for designing robust entropy-regularised RL algorithms.
Abstract
The generalisation and robustness properties of policies learnt through Maximum-Entropy Reinforcement Learning are investigated on chaotic dynamical systems with Gaussian noise on the observable. First, entropy-regularised policies are observed to be robust to noise contaminating the agent's observations. Second, notions from statistical learning theory, such as complexity measures on the learnt model, are borrowed to explain and predict this phenomenon. The results show a relationship between entropy-regularised policy optimisation and robustness to noise, which can be described by the chosen complexity measures.
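As a rough illustration of one complexity measure named above, the trace of the Fisher Information, the following minimal sketch estimates it for a toy linear Gaussian policy. This is not the paper's experimental setup: the policy form pi(a|s) = N(W s, sigma^2 I), the function names, and the random states are all assumptions made for illustration, and the Fisher Information is taken with respect to the mean weights W only. Under these assumptions the trace equals d_a * E[||s||^2] / sigma^2, so it shrinks as the policy's standard deviation, and hence its entropy, grows.

```python
import numpy as np


def fisher_trace_mc(W, sigma, states, n_actions_per_state=256, seed=0):
    """Monte Carlo estimate of tr(F) for the toy policy pi(a|s) = N(W s, sigma^2 I).

    F is the Fisher Information w.r.t. W only (sigma is held fixed).
    tr(F) = E_s E_a [ || grad_W log pi(a|s) ||^2 ].
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for s in states:
        mean = W @ s
        # Sample actions from the policy at this state.
        noise = sigma * rng.standard_normal((n_actions_per_state, mean.shape[0]))
        actions = mean + noise
        # Score: grad_W log pi = (a - mean) s^T / sigma^2.
        # Its squared Frobenius norm factorises as ||a - mean||^2 * ||s||^2 / sigma^4.
        sq_norm = np.sum((actions - mean) ** 2, axis=1) * np.dot(s, s) / sigma**4
        total += sq_norm.mean()
    return total / len(states)


def fisher_trace_exact(W, sigma, states):
    """Closed form for the same toy policy: d_a * mean(||s||^2) / sigma^2."""
    d_a = W.shape[0]
    return d_a * np.mean(np.sum(states**2, axis=1)) / sigma**2


def gaussian_entropy(d_a, sigma):
    """Differential entropy of N(mu, sigma^2 I) in d_a dimensions."""
    return 0.5 * d_a * np.log(2.0 * np.pi * np.e * sigma**2)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    states = rng.standard_normal((50, 3))  # hypothetical observations
    W = rng.standard_normal((2, 3))        # hypothetical policy weights
    for sigma in (0.5, 1.0, 2.0):
        print(
            f"sigma={sigma}: entropy={gaussian_entropy(2, sigma):.3f}, "
            f"tr(F) MC={fisher_trace_mc(W, sigma, states):.3f}, "
            f"exact={fisher_trace_exact(W, sigma, states):.3f}"
        )
```

In this toy setting the inverse relation between entropy and the Fisher trace holds by construction; whether and how it carries over to neural policies trained with PPO is the empirical question the paper addresses.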
