Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments
Junwoo Chang, Minwoo Park, Joohwan Seo, Roberto Horowitz, Jongmin Lee, Jongeun Choi
TL;DR
The paper addresses the fact that real-world RL environments rarely exhibit exact group symmetries, so symmetry-based generalization degrades when symmetries are locally broken. It introduces the Partially group-Invariant MDP (PI-MDP) framework and develops Partially Equivariant RL (PERL) algorithms that gate between invariant and standard Bellman updates using disagreement-based signals to localize symmetry usage. The approach is instantiated as PE-DQN for discrete control and PE-SAC for continuous control and evaluated on Grid-World, locomotion, and manipulation tasks, where it shows improved sample efficiency and robustness over purely invariant or approximately equivariant baselines. The work thus offers a principled way to leverage symmetry where it holds while remaining robust to localized symmetry-breaking, with practical implications for robotics and real-world RL systems.
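The gating idea can be sketched for a DQN-style minibatch. This is a minimal illustration under our own assumptions, not PE-DQN itself. We assume two critics (a group-invariant one and an unconstrained one), take the gap between their greedy next-state values as the disagreement signal, and use a hard threshold to decide, per sample, which estimate the Bellman target bootstraps from. The function name `gated_targets`, the `threshold` parameter, and this particular disagreement signal are hypothetical.

```python
import numpy as np


def gated_targets(rewards, dones, q_next_inv, q_next_std, gamma=0.99, threshold=0.1):
    """Hypothetical gated Bellman targets for a DQN-style minibatch.

    rewards, dones: shape (batch,); dones is 1.0 for terminal transitions.
    q_next_inv, q_next_std: shape (batch, num_actions), next-state Q-values
        from a group-invariant critic and an unconstrained critic.
    Returns the targets and the per-sample gate (1.0 = invariant estimate used).
    """
    v_inv = q_next_inv.max(axis=1)          # greedy next-state value, invariant critic
    v_std = q_next_std.max(axis=1)          # greedy next-state value, standard critic
    disagreement = np.abs(v_inv - v_std)    # assumed symmetry-breaking signal
    gate = (disagreement < threshold).astype(np.float64)
    # Bootstrap from the invariant critic only where the two critics agree.
    v_next = gate * v_inv + (1.0 - gate) * v_std
    targets = rewards + gamma * (1.0 - dones) * v_next
    return targets, gate


# Usage on a toy batch of three transitions (made-up numbers).
rewards = np.array([1.0, 0.0, 0.5])
dones = np.array([0.0, 0.0, 1.0])
q_inv = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 5.0]])
q_std = np.array([[0.5, 1.95], [1.0, 1.0], [0.0, 0.0]])
targets, gate = gated_targets(rewards, dones, q_inv, q_std, gamma=0.9)
print("targets:", targets, "gate:", gate)
```

A hard threshold keeps the switch easy to inspect; a soft gate, such as a sigmoid of the disagreement, would be an equally plausible variant under the same assumptions.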
Abstract
Group symmetries provide a powerful inductive bias for reinforcement learning (RL), enabling efficient generalization across symmetric states and actions via group-invariant Markov Decision Processes (MDPs). However, real-world environments almost never realize fully group-invariant MDPs; dynamics, actuation limits, and reward design usually break symmetries, often only locally. When group-invariant Bellman backups are applied in such environments, local symmetry-breaking introduces errors that propagate across the entire state-action space, resulting in global value-estimation errors. To address this, we introduce the Partially group-Invariant MDP (PI-MDP), which selectively applies group-invariant or standard Bellman backups depending on where symmetry holds. This framework mitigates error propagation from locally broken symmetries while maintaining the benefits of equivariance, thereby enhancing sample efficiency and generalizability. Building on this framework, we present two practical RL algorithms, Partially Equivariant DQN (PE-DQN) for discrete control and PE-SAC for continuous control, that combine the benefits of equivariance with robustness to symmetry-breaking. Experiments across Grid-World, locomotion, and manipulation benchmarks demonstrate that PE-DQN and PE-SAC significantly outperform baselines, highlighting the importance of selective symmetry exploitation for robust and sample-efficient RL.
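To make the error-propagation argument concrete, here is a small tabular sketch of our own, not an experiment from the paper. It assumes a 7-state chain with a left/right mirror symmetry, a reward of 1 for any move that lands on either end, and a single hypothetical symmetry-breaking change: pushing against the left wall pays 0.5 instead of 1. The "invariant" backup averages each transition with its mirrored counterpart (a data-augmentation view of a group-invariant backup). The "partial" backup does so only where an oracle mask `sym_ok`, computed from the true model, says the local reward and dynamics are symmetric. The paper estimates where symmetry holds from data; the oracle mask and all names here are illustrative stand-ins.

```python
import numpy as np

N, GAMMA = 7, 0.9          # chain length and discount (illustrative values)
LEFT, RIGHT = 0, 1


def step(s, a):
    """Deterministic chain dynamics; moving into a wall leaves the state unchanged."""
    return max(s - 1, 0) if a == LEFT else min(s + 1, N - 1)


def reflect(s, a):
    """Group action of the mirror symmetry: flip the state and swap left/right."""
    return N - 1 - s, 1 - a


# Reward 1 for any move that lands on either end of the chain (mirror-symmetric) ...
R = np.array([[1.0 if step(s, a) in (0, N - 1) else 0.0 for a in (LEFT, RIGHT)]
              for s in range(N)])
# ... except one hypothetical local break: pushing against the left wall pays 0.5.
R[0, LEFT] = 0.5

# Oracle mask: True where (s, a) and its mirror pair have matching reward and dynamics.
sym_ok = np.zeros((N, 2), dtype=bool)
for s in range(N):
    for a in (LEFT, RIGHT):
        gs, ga = reflect(s, a)
        same_next = step(gs, ga) == N - 1 - step(s, a)
        sym_ok[s, a] = same_next and R[s, a] == R[gs, ga]


def backup(Q, mode):
    """One Bellman backup of the whole Q-table: 'standard', 'invariant' or 'partial'."""
    V = Q.max(axis=1)
    Q_new = np.empty_like(Q)
    for s in range(N):
        for a in (LEFT, RIGHT):
            standard = R[s, a] + GAMMA * V[step(s, a)]
            gs, ga = reflect(s, a)
            # Mirror transition, with its next state mapped back by the reflection.
            mirrored = R[gs, ga] + GAMMA * V[N - 1 - step(gs, ga)]
            invariant = 0.5 * (standard + mirrored)
            if mode == "standard":
                Q_new[s, a] = standard
            elif mode == "invariant":
                Q_new[s, a] = invariant
            else:  # "partial": invariant backup only where symmetry holds
                Q_new[s, a] = invariant if sym_ok[s, a] else standard
    return Q_new


def solve(mode, iters=500):
    """Run value iteration on the Q-table until (numerically) converged."""
    Q = np.zeros((N, 2))
    for _ in range(iters):
        Q = backup(Q, mode)
    return Q


Q_std = solve("standard")
err_inv = np.abs(solve("invariant") - Q_std).max(axis=1)   # per-state max error
err_part = np.abs(solve("partial") - Q_std).max(axis=1)
print("invariant backup error per state:", np.round(err_inv, 3))
print("partial backup error per state:  ", np.round(err_part, 3))
```

Only two state-action pairs break the symmetry, yet the fully invariant backup is wrong at every state, while the masked backup matches the standard solution. Because the model here is exact, the partial backup simply recovers the standard one; the benefit of reusing mirrored transitions appears only when values are learned from limited samples, which this sketch does not model.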
