Approximate Equivariance in Reinforcement Learning

Jung Yeon Park, Sujay Bhatt, Sihan Zeng, Lawson L. S. Wong, Alec Koppel, Sumitra Ganesh, Robin Walters

TL;DR

This work introduces a formal framework for approximately equivariant reinforcement learning by defining $(G,\epsilon_R,\epsilon_P)$-invariant MDPs and proving that the optimal $Q$-function is approximately invariant under symmetry transformations, with theoretical guarantees on the near-invariance of $Q^{*}$. It develops a practical architecture based on relaxed group and steerable convolutions to learn policies and value functions that are robust to symmetry breaking. Empirically, the authors show that approximately equivariant RL achieves strong performance and robustness across continuous control tasks and a real-world stock trading dataset, often outperforming exactly equivariant baselines when symmetry is imperfect. The approach improves sample efficiency and resilience to noise, and it can adapt to symmetry-breaking factors, offering a flexible inductive bias for RL in realistic environments. The authors also release public code to reproduce the results.

Abstract

Equivariant neural networks have shown great success in reinforcement learning, improving sample efficiency and generalization when there is symmetry in the task. However, in many problems, only approximate symmetry is present, which makes imposing exact symmetry inappropriate. Recently, approximately equivariant networks have been proposed for supervised classification and modeling physical systems. In this work, we develop approximately equivariant algorithms in reinforcement learning (RL). We define approximately equivariant MDPs and theoretically characterize the effect of approximate equivariance on the optimal $Q$ function. We propose novel RL architectures using relaxed group and steerable convolutions and experiment on several continuous control domains and stock trading with real financial data. Our results demonstrate that the approximately equivariant network performs on par with exactly equivariant networks when exact symmetries are present, and outperforms them when the domains exhibit approximate symmetry. As an added byproduct of these techniques, we observe increased robustness to noise at test time. Our code is available at https://github.com/jypark0/approx_equiv_rl.

Paper Structure

This paper contains 40 sections, 4 theorems, 38 equations, 9 figures, 4 tables.

Key Result

Theorem 1

Let the rewards $R$ be bounded, $R_{\min} \leq R \leq R_{\max}$, let $0 \leq \gamma < 1$, and let $g \in G$ be an onto mapping. For any state $s$ and action $a$, we have $|Q^{*}(s, a) - Q^{*}(gs, ga)| \leq \alpha$, where $\alpha = \frac{\epsilon_R + \gamma \rho_{\mathscr{F}}(V^*) \epsilon_P}{1 - \gamma}$.
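
To get a feel for how the constant $\alpha$ in Theorem 1 scales, the formula can be evaluated directly. The sketch below is illustrative only: the function name `q_invariance_alpha` and all numeric values are assumptions, not taken from the paper, and $\rho_{\mathscr{F}}(V^*)$ is simply passed in as a known number rather than computed.

```python
def q_invariance_alpha(eps_r, eps_p, gamma, rho_v):
    """Evaluate alpha = (eps_R + gamma * rho_F(V*) * eps_P) / (1 - gamma).

    eps_r : reward symmetry-breaking tolerance (epsilon_R)
    eps_p : transition symmetry-breaking tolerance (epsilon_P)
    gamma : discount factor, 0 <= gamma < 1
    rho_v : the value rho_F(V*), assumed to be given
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return (eps_r + gamma * rho_v * eps_p) / (1.0 - gamma)


# Hypothetical values, chosen only to show the dependence on gamma.
for gamma in (0.5, 0.9, 0.99):
    alpha = q_invariance_alpha(eps_r=0.1, eps_p=0.05, gamma=gamma, rho_v=2.0)
    print(f"gamma={gamma}: alpha={alpha:.3f}")
```

Because of the $1/(1-\gamma)$ factor, $\alpha$ grows quickly as $\gamma$ approaches 1, and it vanishes when both $\epsilon_R$ and $\epsilon_P$ are zero, which corresponds to exact symmetry.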

Figures (9)

  • Figure 1: An approximately equivariant policy $\pi$ on a Reacher domain, where the goal is to determine the torques (green, magenta) to apply on each joint for the fingertip to reach the target (red). Due to wear, the first joint is more responsive to positive torques. When the state is flipped, the policy also flips the actions but can learn to adjust for symmetry breaking factors.
  • Figure 2: Illustration of the approximately $D_2$-equivariant encoder and policy (critic is not shown for space). The $D_2$ group consists of vertical reflections and $\pi$ rotations. Both the encoder and policy consist of relaxed group convolution layers.
  • Figure 3: Total episode reward on selected domains in the DeepMind Control Suite; shaded regions indicate $95\%$ confidence intervals (CI). Compared to an exactly equivariant agent (ExactEquiv), our approximately equivariant agent (ApproxEquiv) outperforms in Acrobot, performs similarly in two domains, and is slightly worse in the Reacher domain. ApproxEquiv can outperform ExactEquiv on some modified variants with inexact symmetry as it can adjust for symmetry breaking. Our agent outperforms all other baselines, including a non-equivariant agent, suggesting that relaxed symmetry is a good inductive bias.
  • Figure 4: Selected domains in DeepMind Control Suite. The domains were modified to remove extrinsic symmetry and to include several types of symmetry-breaking factors, such as repeating or reflecting actions in certain states or modifying gravity.
  • Figure 5: Visualization of relaxed weights for the first layer of the encoder and policy over all runs. Similar weights for each $g$ indicate perfect equivariance while differing values indicate symmetry breaking. The modified variants of most domains exhibit larger differences or increased variance in the relaxed weights compared to the original variant.
  • ...and 4 more figures
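
The captions for Figures 2 and 5 refer to relaxed group convolution layers over $D_2$ and to relaxed weights that are similar across group elements under perfect equivariance. A minimal NumPy sketch of that idea follows. It assumes one common formulation of relaxed group convolution, $\mathrm{out}(g) = \sum_h f(h) \sum_l w_l(h)\, \psi_l(g^{-1}h)$, applied to a scalar signal on the group itself. The function names, the integer encoding of $D_2$, and the random inputs are all hypothetical, and the paper's actual layers may differ.

```python
import numpy as np

# D_2 encoded as integers 0..3 (two bits: rotation by pi, reflection).
# Composition is bitwise XOR, and every element is its own inverse.
GROUP = np.arange(4)


def relaxed_group_conv(f, psi, w):
    """out[g] = sum_h f[h] * sum_l w[l, h] * psi[l, g^{-1} h].

    f   : (4,) signal on the group
    psi : (L, 4) bank of L filters on the group
    w   : (L, 4) relaxed weights; constant along axis 1 gives exact equivariance
    """
    out = np.zeros(4)
    for g in GROUP:
        for h in GROUP:
            # In this encoding g^{-1} h is simply g XOR h.
            out[g] += f[h] * np.sum(w[:, h] * psi[:, g ^ h])
    return out


def act(u, f):
    """Left-regular action of u on a signal: (u . f)[h] = f[u^{-1} h]."""
    return f[u ^ GROUP]


def equivariance_error(f, psi, w):
    """Largest gap between conv(u . f) and u . conv(f) over all u in D_2."""
    base = relaxed_group_conv(f, psi, w)
    return max(
        np.abs(relaxed_group_conv(act(u, f), psi, w) - act(u, base)).max()
        for u in GROUP
    )


rng = np.random.default_rng(0)
f = rng.normal(size=4)
psi = rng.normal(size=(2, 4))
w_exact = np.ones((2, 4))                            # same weight for every h
w_relaxed = w_exact + 0.1 * rng.normal(size=(2, 4))  # weights differ across h

print("exact weights  :", equivariance_error(f, psi, w_exact))
print("relaxed weights:", equivariance_error(f, psi, w_relaxed))
```

With weights that are constant across group elements, the layer reduces to an ordinary group convolution and the error is zero up to floating-point rounding. Letting the weights vary with $h$ gives the layer room to break the symmetry, which matches the reading of Figure 5 where differing relaxed weights indicate symmetry breaking.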

Theorems & Definitions (11)

  • Definition 1: Equivariance Error
  • Definition 2: $\varepsilon$-stabilizer
  • Definition 3: Approximate $G$-Equivariance
  • Definition 4
  • Theorem 1
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • ...and 1 more