Reinforcement Learning Using Known Invariances
Alexandru Cioba, Aya Kayal, Laura Toni, Sattar Vakili, Alberto Bernacchia
TL;DR
The paper tackles how to leverage known environmental symmetries in reinforcement learning with nonlinear function approximation by embedding invariances into kernel methods. It introduces a symmetry-aware variant of optimistic least-squares value iteration (LSVI) that uses totally invariant kernels to enforce symmetry in both rewards and transitions. Theoretical contributions include novel bounds on the maximum information gain $\Gamma_{k_G}(T)$ and the $\epsilon$-covering number for invariant RKHSs, translating into symmetry-aware sample complexity gains, along with a regret bound analysis. Empirically, the approach shows substantial sample-efficiency improvements on synthetic MDPs, a Frozen Lake variant with $D_4$ symmetry, and a 2D placement task, supporting the practical value of structural priors in RL. Overall, the work demonstrates that encoding known symmetries via invariant kernels can yield meaningful gains in learning efficiency and generalization for kernel-based RL.
Abstract
In many real-world reinforcement learning (RL) problems, the environment exhibits inherent symmetries that can be exploited to improve learning efficiency. This paper develops a theoretical and algorithmic framework for incorporating known group symmetries into kernel-based RL. We propose a symmetry-aware variant of optimistic least-squares value iteration (LSVI), which leverages invariant kernels to encode invariance in both rewards and transition dynamics. Our analysis establishes new bounds on the maximum information gain and covering numbers for invariant RKHSs, explicitly quantifying the sample efficiency gains from symmetry. Empirical results on a customized Frozen Lake environment and a 2D placement design problem confirm the theoretical improvements, demonstrating that symmetry-aware RL methods achieve significantly better performance than their standard kernel counterparts. These findings highlight the value of structural priors in designing more sample-efficient reinforcement learning algorithms.
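To make the idea of an invariant kernel concrete, the following is a minimal sketch of one common construction: averaging a base kernel over all pairs of group elements. It is an illustration under stated assumptions, not the paper's implementation. The function names (`d4_matrices`, `rbf`, `invariant_kernel`), the choice of an RBF base kernel, and the use of $D_4$ acting on points in the plane are all assumptions made here for illustration; the paper's kernels act on state-action pairs and its exact construction may differ.

```python
import numpy as np


def d4_matrices():
    """Return the 8 orthogonal 2x2 matrices of the dihedral group D_4 (assumed action on R^2)."""
    rot = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotation by 90 degrees
    ref = np.array([[1.0, 0.0], [0.0, -1.0]])  # reflection across the x-axis
    rotations = [np.linalg.matrix_power(rot, i) for i in range(4)]
    return rotations + [r @ ref for r in rotations]


def rbf(x, y, lengthscale=1.0):
    """Base RBF kernel (an illustrative choice, not taken from the paper)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-0.5 * np.dot(diff, diff) / lengthscale**2))


def invariant_kernel(x, y, group, base=rbf):
    """Average the base kernel over all pairs (g, h) of group elements.

    The result is unchanged when either argument is transformed by any
    element of the group, because the double sum is only re-ordered.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    total = sum(base(g @ x, h @ y) for g in group for h in group)
    return total / len(group) ** 2


if __name__ == "__main__":
    G = d4_matrices()
    x = np.array([0.3, -1.2])
    y = np.array([0.7, 0.4])
    # Transforming x by any group element should leave the kernel value unchanged.
    values = [invariant_kernel(g @ x, y, G) for g in G]
    print(np.allclose(values, values[0]))
```

Functions in the RKHS of such a kernel inherit the invariance, which is the mechanism by which a kernel-based value estimate can share data across symmetric states. For an isotropic base kernel and orthogonal group actions, the double average collapses to a single average over the group; the double sum is kept here because it does not rely on that property.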
