Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments

Junwoo Chang, Minwoo Park, Joohwan Seo, Roberto Horowitz, Jongmin Lee, Jongeun Choi

TL;DR

The paper tackles the problem that real-world RL environments rarely exhibit exact group symmetries, causing symmetry-based generalization to degrade when symmetries are locally broken. It introduces the Partially group-Invariant MDP (PI-MDP) framework and develops Partially Equivariant RL (PERL) algorithms that gate between invariant and standard Bellman updates using disagreement-based signals to localize symmetry usage. The proposed approach is instantiated as PE-DQN for discrete control and PE-SAC for continuous control, and evaluated across Grid-World, locomotion, and manipulation tasks, showing improved sample efficiency and robustness over purely invariant or approximate-equivariant baselines. The work provides a principled method to leverage symmetry where it holds while remaining robust to localized symmetry-breaking, with practical implications for robotics and real-world RL systems.
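
The summary says PERL gates between invariant and standard Bellman updates using disagreement-based signals, but does not spell out the signal. The sketch below is a minimal illustration under stated assumptions: the signal is taken to be the absolute gap between a hypothetical equivariant estimate and a hypothetical non-equivariant estimate of the same Q-value, and `tau` and `temperature` are illustrative parameters. `hard_gate` follows the idea named in Remark 1 (hard gating); `soft_gate` is a swapped-in smooth alternative, not something the source describes.

```python
import math

def disagreement(q_equivariant, q_standard):
    """Assumed signal: absolute gap between two estimates of the same Q(s, a)."""
    return abs(q_equivariant - q_standard)

def hard_gate(d, tau):
    """Hard gating: lambda = 1 (use the invariant target) iff disagreement <= tau."""
    return 1.0 if d <= tau else 0.0

def soft_gate(d, tau, temperature=0.1):
    """Smooth alternative (not from the source): sigmoid((tau - d) / temperature)."""
    z = (tau - d) / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch when disagreement is large
    return e / (1.0 + e)

def gated_target(y_invariant, y_standard, lam):
    """Convex combination of the group-invariant and standard Bellman targets."""
    return lam * y_invariant + (1.0 - lam) * y_standard

# Usage: the two estimates agree at one (s, a) pair and disagree at another.
lam_agree = hard_gate(disagreement(1.00, 1.05), tau=0.1)  # 1.0: symmetry trusted
lam_break = hard_gate(disagreement(1.00, 2.00), tau=0.1)  # 0.0: fall back to standard
```

With the hard gate, a pair whose estimates disagree beyond `tau` receives only the standard target, so a locally broken symmetry is not propagated through the invariant backup at that pair.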

Abstract

Group symmetries provide a powerful inductive bias for reinforcement learning (RL), enabling efficient generalization across symmetric states and actions via group-invariant Markov Decision Processes (MDPs). However, real-world environments almost never realize fully group-invariant MDPs; dynamics, actuation limits, and reward design usually break symmetries, often only locally. Under group-invariant Bellman backups, local symmetry-breaking in such environments introduces errors that propagate across the entire state-action space, resulting in global value estimation errors. To address this, we introduce the Partially group-Invariant MDP (PI-MDP), which selectively applies group-invariant or standard Bellman backups depending on where symmetry holds. This framework mitigates error propagation from locally broken symmetries while maintaining the benefits of equivariance, thereby enhancing sample efficiency and generalizability. Building on this framework, we present practical RL algorithms -- Partially Equivariant (PE)-DQN for discrete control and PE-SAC for continuous control -- that combine the benefits of equivariance with robustness to symmetry-breaking. Experiments across Grid-World, locomotion, and manipulation benchmarks demonstrate that PE-DQN and PE-SAC significantly outperform baselines, highlighting the importance of selective symmetry exploitation for robust and sample-efficient RL.
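
The abstract describes PI-MDP only at a high level. A minimal tabular sketch of the error-propagation argument, under assumptions that are ours rather than the paper's: (i) a 5-state chain with a goal in the middle and a mirror reflection g as the symmetry group, (ii) a wall between states 3 and 4 as the symmetry-breaking factor, (iii) an "invariant backup" that averages the ordinary target with the target built from the mirrored transition and mapped back by g, and (iv) an oracle gate equal to 1 exactly where the local symmetry check passes. This is an illustration, not the operator of Theorem 1.

```python
GAMMA = 0.9
N_STATES, GOAL = 5, 2
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)

def step(s, a):
    """Deterministic 5-state chain; a wall between states 3 and 4 breaks the mirror symmetry."""
    if s == GOAL:
        return GOAL, 0.0                                 # absorbing goal, no further reward
    if a == LEFT:
        nxt = s if s == 4 else max(s - 1, 0)             # wall blocks 4 -> 3
    else:
        nxt = s if s == 3 else min(s + 1, N_STATES - 1)  # wall blocks 3 -> 4
    return nxt, (1.0 if nxt == GOAL else 0.0)

def g_state(s):   # reflection about the goal: s -> 4 - s
    return N_STATES - 1 - s

def g_action(a):  # reflection swaps left and right
    return 1 - a

def symmetry_holds(s, a):
    """Local check: the mirrored pair gives the same reward and the mirrored next state."""
    s1, r1 = step(s, a)
    s2, r2 = step(g_state(s), g_action(a))
    return r1 == r2 and s2 == g_state(s1)

def standard_target(Q, s, a):
    s1, r = step(s, a)
    return r + GAMMA * max(Q[s1])

def invariant_target(Q, s, a):
    """Average the ordinary target with the mirrored transition's target, mapped back by g."""
    s2, r2 = step(g_state(s), g_action(a))
    mirrored = r2 + GAMMA * max(Q[g_state(s2)])
    return 0.5 * (standard_target(Q, s, a) + mirrored)

def value_iteration(gate, iters=500):
    """Iterate Q <- lam * T_G Q + (1 - lam) * T Q with lam = gate(s, a)."""
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(iters):
        Q = [[gate(s, a) * invariant_target(Q, s, a)
              + (1.0 - gate(s, a)) * standard_target(Q, s, a)
              for a in ACTIONS] for s in range(N_STATES)]
    return Q

Q_std = value_iteration(lambda s, a: 0.0)                         # standard backups only
Q_inv = value_iteration(lambda s, a: 1.0)                         # invariant backups everywhere
Q_pi = value_iteration(lambda s, a: float(symmetry_holds(s, a)))  # gated by the oracle check
```

State 4 is trapped behind the wall, so its true value is 0. With invariant backups everywhere, the mirrored transition from state 0 leaks value into it (Q at state 4, action left, converges to 0.45/0.55, about 0.82), while gating the invariant backup to symmetric pairs recovers the standard fixed point.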

Paper Structure

This paper contains 59 sections, 4 theorems, 62 equations, 10 figures, 8 tables, 1 algorithm.

Key Result

Lemma 1

For any bounded $Q$ and any $(s,a)\in\mathcal{S}\times \mathcal{A}$,

Figures (10)

  • Figure 1: Overview of partial equivariance in reinforcement learning. Equivariant networks provide strong inductive bias and sample efficiency in environments with exact symmetry. Left: In the symmetric case, the equivariant policy $\pi_E$ exploits this structure and learns an optimal action $a$ to reach the goal. Right: When the agent and goal are rotated by $90^\circ$ but a fixed obstacle (not represented in the agent's state) remains in place, the symmetry of the true dynamics is broken. An exactly equivariant policy is forced to output the rotated action $ga$, which is invalid due to the obstacle in some cases, thereby corrupting training. Our framework introduces a gating function $\lambda$ that detects such symmetry-breaking and activates the non-equivariant policy $\pi_N$, preserving robustness while retaining the sample efficiency benefits of equivariance in symmetric regions.
  • Figure 2: Benchmark environments. We evaluate our method across both discrete and continuous control tasks under symmetry-breaking conditions. Specifically, we use the Grid-World environment for the discrete case, and locomotion and manipulation tasks for the continuous case.
  • Figure 3: Performance comparison in the discrete space (Grid-World) environment. We evaluate the average performance over 100K steps with five random seeds. Shaded regions denote standard error. We vary the number of obstacles, which act as symmetry-breaking factors. PE-DQN consistently outperforms the baselines, and the performance gap widens as symmetry-breaking increases, demonstrating both robustness and sample efficiency.
  • Figure 4: Performance comparison in Grid-World under reward-level symmetry-breaking and complex dynamics. Results are averaged over 100K environment steps with five random seeds; shaded regions denote standard error. (a) Reward-level symmetry-breaking is introduced by making half of the obstacles passable while assigning a negative reward upon traversal, in layouts with 10 and 30 obstacles. (b) Complex dynamics setting with stochastic transitions in a 40-obstacle layout. PE-DQN consistently outperforms the baselines in both settings, indicating robustness to reward-level symmetry-breaking and challenging dynamics.
  • Figure 5: Performance comparison in the continuous-space environments. Results are averaged over 1M training steps for the MuJoCo tasks and over 30K and 500K steps for the Fetch and UR5e Reach environments, respectively, using eight random seeds for the locomotion tasks and five random seeds for the manipulation tasks. Shaded regions denote standard error. For RPP finzi2021residual, we re-ran the official code; discrepancies with the numbers reported in that work arise because RPP reports the maximum over training steps rather than the average performance. PE-SAC consistently outperforms all baselines across these tasks.
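
Figure 1 describes a gate λ that switches between an equivariant policy π_E and a non-equivariant policy π_N. A minimal sketch, assuming a C4 (90° rotation) symmetry acting on 2D agent and goal positions: π_E is built by group averaging an arbitrary map, a standard construction swapped in here for an equivariant network, and the two actions are mixed as λ·π_E + (1 − λ)·π_N. `base_policy` is a hypothetical stand-in for an unconstrained network, and λ is passed in rather than computed, since this section does not specify how it is obtained.

```python
def rot90(v, k=1):
    """Rotate a 2D vector by k * 90 degrees counter-clockwise (k may be negative)."""
    x, y = v
    for _ in range(k % 4):
        x, y = -y, x
    return (x, y)

def rotate_state(state, k):
    """The group acts on the state by rotating both the agent and the goal position."""
    agent, goal = state
    return (rot90(agent, k), rot90(goal, k))

def base_policy(state):
    """Hypothetical non-equivariant map, standing in for an unconstrained network pi_N."""
    (ax, ay), (gx, gy) = state
    return (gx - ax + 0.3 * ay, gy - ay + 0.1)

def equivariant_policy(state):
    """Group averaging: pi_E(s) = 1/4 * sum_k R^{-k} f(R^k s), equivariant by construction."""
    sx = sy = 0.0
    for k in range(4):
        x, y = rot90(base_policy(rotate_state(state, k)), -k)
        sx += x
        sy += y
    return (sx / 4, sy / 4)

def gated_policy(state, lam, non_equivariant=base_policy):
    """lam = 1 gives the equivariant action; lam = 0 the non-equivariant one."""
    a_e, a_n = equivariant_policy(state), non_equivariant(state)
    return tuple(lam * e + (1 - lam) * n for e, n in zip(a_e, a_n))

s = ((1.0, 2.0), (3.0, -1.0))  # (agent position, goal position)
rotated_action = equivariant_policy(rotate_state(s, 1))
expected = rot90(equivariant_policy(s), 1)  # equals rotated_action up to rounding
```

Because π_E(Rs) = Rπ_E(s) holds for every state, a rotated agent-goal pair always yields the rotated action; when an obstacle outside the state makes that action invalid, only the gate can route the state to π_N.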

Theorems & Definitions (11)

  • Definition 1: Per-state–action symmetry-breaking
  • Lemma 1: One-step Bellman error
  • Proposition 1: Value-function gap
  • Definition 2: PI-MDP
  • Theorem 1: Partially group-invariant optimality operator
  • Corollary 1: Proximity bound
  • Remark 1: Hard gating
  • Proof
  • Proof
  • Proof