
Reinforcement Learning Using known Invariances

Alexandru Cioba, Aya Kayal, Laura Toni, Sattar Vakili, Alberto Bernacchia

TL;DR

The paper tackles how to leverage known environmental symmetries in reinforcement learning with nonlinear function approximation by embedding invariances into kernel methods. It introduces a symmetry-aware variant of optimistic least-squares value iteration (LSVI) that uses totally invariant kernels to enforce symmetry in both rewards and transitions. Theoretical contributions include novel bounds on the maximum information gain $\Gamma_{k_G}(T)$ and the $\epsilon$-covering number for invariant RKHSs, translating into symmetry-aware sample complexity gains, along with a regret bound analysis. Empirically, the approach shows substantial sample-efficiency improvements on synthetic MDPs, a Frozen Lake variant with $D_4$ symmetry, and a 2D placement task, supporting the practical value of structural priors in RL. Overall, the work demonstrates that encoding known symmetries via invariant kernels can yield meaningful gains in learning efficiency and generalization for kernel-based RL.
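A standard way to build a totally invariant kernel is group averaging: $k_G(x, y) = \frac{1}{|G|}\sum_{g \in G} k(x, g y)$, where the base kernel $k$ satisfies $k(gx, gy) = k(x, y)$. This construction is assumed here for illustration and is not taken from the paper. The sketch below uses an RBF base kernel on $\mathbb{R}^2$ with $G = D_4$ acting by rotations and reflections. The lengthscale, $\lambda$, and point set are arbitrary choices, and all function names are hypothetical. It checks that $k_G(gx, hy) = k_G(x, y)$ for all $g, h$, and compares the information gain $\frac{1}{2}\log\det(I + \lambda^{-1}K)$ of the two kernels on one fixed point set. This single-set value is not the maximum $\Gamma_{k_G}(T)$, which takes a supremum over point sets. Even so, the invariant RKHS is a subspace of the base RKHS, so the invariant Gram matrix is dominated by the base one in the positive semidefinite order, and its gain can be no larger.

```python
import numpy as np

# Illustrative sketch; function names, lengthscale and lambda are hypothetical choices.


def d4_matrices():
    """The 8 elements of the dihedral group D4 as 2x2 orthogonal matrices."""
    rot = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotation by 90 degrees
    ref = np.array([[1.0, 0.0], [0.0, -1.0]])  # reflection across the x-axis
    return [np.linalg.matrix_power(rot, k) @ np.linalg.matrix_power(ref, m)
            for m in range(2) for k in range(4)]


def rbf(X, Y, lengthscale=0.5):
    """Base RBF Gram matrix; unchanged by any joint rotation/reflection of x and y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))


def invariant_kernel(X, Y, lengthscale=0.5):
    """Group-averaged Gram matrix k_G(x, y) = mean_g k(x, g y); Y @ g.T maps each row y to g y."""
    G = d4_matrices()
    return sum(rbf(X, Y @ g.T, lengthscale) for g in G) / len(G)


def information_gain(K, lam=1.0):
    """1/2 log det(I + K / lam) for the point set that produced the Gram matrix K."""
    _, logdet = np.linalg.slogdet(np.eye(len(K)) + K / lam)
    return 0.5 * logdet


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
gain_rbf = information_gain(rbf(X, X))
gain_inv = information_gain(invariant_kernel(X, X))
print(f"RBF: {gain_rbf:.2f}  invariant: {gain_inv:.2f}")  # the invariant gain is never larger
```

Averaging over only one argument is enough here. Because the RBF kernel is unchanged by orthogonal maps applied to both inputs, invariance in the second argument carries over to the first.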

Abstract

In many real-world reinforcement learning (RL) problems, the environment exhibits inherent symmetries that can be exploited to improve learning efficiency. This paper develops a theoretical and algorithmic framework for incorporating known group symmetries into kernel-based RL. We propose a symmetry-aware variant of optimistic least-squares value iteration (LSVI), which leverages invariant kernels to encode invariance in both rewards and transition dynamics. Our analysis establishes new bounds on the maximum information gain and covering numbers for invariant RKHSs, explicitly quantifying the sample-efficiency gains from symmetry. Empirical results on a customized Frozen Lake environment and a 2D placement design problem confirm the theoretical improvements, demonstrating that symmetry-aware RL algorithms achieve significantly better performance than their standard kernel counterparts. These findings highlight the value of structural priors in designing more sample-efficient reinforcement learning algorithms.
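The optimistic regression step at the core of kernel-based LSVI can be sketched as follows. The sketch assumes the KOVI-style form $Q_h(z) = \min\{\hat f(z) + \beta\,\sigma(z),\, H - h + 1\}$, with $\hat f(z) = k(z)^\top (K + \lambda I)^{-1} y$ and $\sigma(z) = \lambda^{-1/2}\big(k(z, z) - k(z)^\top (K + \lambda I)^{-1} k(z)\big)^{1/2}$. In this form, making the step symmetry-aware only requires swapping in a group-averaged invariant kernel. For simplicity, each state-action pair $z$ is encoded as a point in $\mathbb{R}^2$ on which $D_4$ acts linearly. The targets $y$ (standing in for $r_h + V_{h+1}(s')$), $\beta$, $\lambda$, the lengthscale, and the cap (playing the role of $H - h + 1$) are hypothetical values, not the paper's settings.

```python
import numpy as np

# Illustrative sketch of one KOVI-style optimistic step; names and constants are hypothetical.


def d4_matrices():
    """The 8 elements of D4 as 2x2 orthogonal matrices acting on R^2."""
    rot = np.array([[0.0, -1.0], [1.0, 0.0]])
    ref = np.array([[1.0, 0.0], [0.0, -1.0]])
    return [np.linalg.matrix_power(rot, k) @ np.linalg.matrix_power(ref, m)
            for m in range(2) for k in range(4)]


def k_inv(X, Y, lengthscale=0.5):
    """Group-averaged RBF Gram matrix k_G(X, Y) under D4."""
    G = d4_matrices()
    out = np.zeros((len(X), len(Y)))
    for g in G:
        GY = Y @ g.T  # apply g to every row of Y
        d2 = ((X[:, None, :] - GY[None, :, :]) ** 2).sum(-1)
        out += np.exp(-d2 / (2 * lengthscale ** 2))
    return out / len(G)


def optimistic_q(Z_query, Z, y, lam=1.0, beta=1.0, cap=1.0):
    """Return min(mean + beta * sigma, cap) and sigma at the query points."""
    A = k_inv(Z, Z) + lam * np.eye(len(Z))
    k_q = k_inv(Z, Z_query)                       # (n, m) cross-covariances
    mean = k_q.T @ np.linalg.solve(A, y)          # kernel ridge regression mean
    prior = np.diag(k_inv(Z_query, Z_query))      # k(z, z) for each query point
    var = (prior - np.sum(k_q * np.linalg.solve(A, k_q), axis=0)) / lam
    sigma = np.sqrt(np.maximum(var, 0.0))         # clip tiny negative round-off
    return np.minimum(mean + beta * sigma, cap), sigma


rng = np.random.default_rng(1)
Z = rng.uniform(-1.0, 1.0, size=(20, 2))  # hypothetical encoded (s, a) pairs
y = rng.uniform(0.0, 1.0, size=20)        # hypothetical targets r_h + V_{h+1}(s')
z = np.array([[0.3, -0.7]])
q, sig = optimistic_q(z, Z, y, cap=2.0)
q_rot, sig_rot = optimistic_q(z @ d4_matrices()[1].T, Z, y, cap=2.0)
print(np.allclose(q, q_rot), np.allclose(sig, sig_rot))  # True True
```

Both the mean and the bonus are invariant, so data observed at $z$ shrinks the bonus equally at every symmetric copy $gz$. This is the mechanism behind the sample-efficiency gains the abstract describes.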

Paper Structure

This paper contains 44 sections, 9 theorems, 51 equations, 5 figures, 1 table, and 2 algorithms.

Key Result

Proposition 1

In finite-horizon MDPs, equivariant policies have invariant value functions $V_\pi(s)$ and $Q_\pi(s,a)$. Moreover, if $Q(s,a)$ is invariant under $G$ and satisfies the Bellman equation, then the greedy policy $\pi(s) = \arg\max_a Q(s,a)$ is equivariant.
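The second claim of Proposition 1 can be checked numerically on a toy example. The sketch assumes a hypothetical $4 \times 4$ grid on which $D_4$ acts by rotating and reflecting cells. The same group acts on the four move actions by rotating or reflecting the move direction. An invariant $Q$ is produced by averaging a random table over the group, $Q_G(s, a) = \frac{1}{|G|}\sum_{g} Q(gs, ga)$. This table does not solve any Bellman equation, so only the invariance-to-equivariance step is illustrated. Invariance forces ties at cells fixed by a reflection: a corner under the diagonal flip, for example, has $Q(s, \text{up}) = Q(s, \text{left})$. For that reason, the greedy policy is compared as a set of maximizing actions.

```python
import numpy as np

# Illustrative sketch on a hypothetical 4x4 grid; all names are made up for this example.
N = 4
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right as (dr, dc)
GROUP = [(k, m) for k in range(4) for m in range(2)]  # D4 element: transpose m times, then rotate k times


def transform(g, p, is_dir):
    """Apply a D4 element to a cell (is_dir=False) or to a move direction (is_dir=True)."""
    k, m = g
    r, c = p
    if m:
        r, c = c, r  # reflection across the main diagonal
    for _ in range(k):
        # 90-degree clockwise turn: cells rotate about the grid centre,
        # directions only pick up the linear part of the map.
        r, c = (c, -r) if is_dir else (c, N - 1 - r)
    return (r, c)


def act_state(g, s):
    return transform(g, s, is_dir=False)


def act_action(g, a):
    return ACTIONS.index(transform(g, ACTIONS[a], is_dir=True))


def symmetrize(Q0):
    """Group-averaged table Q_G(s, a) = mean over g of Q0(g s, g a), which is G-invariant."""
    QG = np.zeros_like(Q0)
    for r in range(N):
        for c in range(N):
            for a in range(len(ACTIONS)):
                vals = []
                for g in GROUP:
                    gr, gc = act_state(g, (r, c))
                    vals.append(Q0[gr, gc, act_action(g, a)])
                QG[r, c, a] = np.mean(vals)
    return QG


def greedy_set(Q, s, tol=1e-9):
    """All maximizing actions at state s (ties are kept rather than broken arbitrarily)."""
    q = Q[s[0], s[1]]
    return {a for a in range(len(ACTIONS)) if q[a] >= q.max() - tol}


def is_equivariant(Q):
    """Check argmax_a Q(g s, a) == g . argmax_a Q(s, a) for every cell s and every g in D4."""
    for r in range(N):
        for c in range(N):
            for g in GROUP:
                lhs = greedy_set(Q, act_state(g, (r, c)))
                rhs = {act_action(g, a) for a in greedy_set(Q, (r, c))}
                if lhs != rhs:
                    return False
    return True


rng = np.random.default_rng(0)
Q0 = rng.normal(size=(N, N, len(ACTIONS)))
QG = symmetrize(Q0)
print(is_equivariant(QG))  # True
print(is_equivariant(Q0))  # False
```

The raw random table fails the check at the corners. There, the diagonal flip fixes the state but maps every action to a different one, so a unique argmax cannot be preserved.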

Figures (5)

  • Figure 1: Comparison of KOVI with invariant kernel vs. standard RBF kernel, across different settings. Cumulative regret is plotted against the number of episodes. (a) Regret for synthetic setting. (b) A rendered frame of Frozen Lake. (c,d) Regret for the Frozen Lake, fixed and random setting, respectively. Regret is averaged over 20 random seeds. Shaded area represents the standard error.
  • Figure 2: (a) Regret comparison of the invariant vs. RBF kernel for SynPl; (b) optimal placement; (c) best placement achieved by KOVI; (d) baseline placement produced by the random policy.
  • Figure 3: Average return computed on evaluation (test) data (a) and training data (b) vs. number of training episodes for the Random Layout Frozen Lake environment. Shaded areas represent standard error.
  • Figure 4: Comparison of KOVI with a standard RBF kernel, KOVI with an invariant kernel, DQN, and SymDQN on the FrozenLake (Fixed) environment. Average episodic return versus number of environment timesteps, averaged over 20 random seeds. The shaded area represents the standard error.
  • Figure 5: Reward function $r(s,a)$ generated by kernel ridge regression using an invariant kernel with lengthscale $0.1$ and regularization parameter $\lambda = 0.01$.

Theorems & Definitions (15)

  • Proposition 1
  • Lemma 1: Brown et al.
  • Theorem 1
  • Theorem 2: Yang et al.
  • Corollary 1
  • Proof
  • Proposition 2
  • Proof
  • Theorem 3
  • Proof of the theorem labeled the:hoeffding
  • ...and 5 more