Equivariant Reinforcement Learning under Partial Observability
Hai Nguyen, Andrea Baisero, David Klee, Dian Wang, Robert Platt, Christopher Amato
TL;DR
This work addresses learning under partial observability by exploiting rotational symmetries through equivariant neural architectures within a POMDP framework. It extends group-invariant MDP theory to POMDPs, proving that under the symmetry group $G$ the optimal value function is invariant and an optimal policy can be chosen to be equivariant, and implements this with an equivariant actor and an invariant critic built on an equivariant LSTM. The proposed Equi-RA2C and Equi-RSAC architectures achieve better sample efficiency and final performance on grid-world and robotic manipulation tasks, and transfer zero-shot from simulation to a real UR5 robot. The results show the practical value of embedding symmetry into both the representation and the recurrent memory under partial observability. The authors also note that performance is sensitive to imperfect symmetry and suggest improving robustness to asymmetries as future work.
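Informally, and in our own notation (the symbols below are assumptions, not necessarily the paper's), the invariance and equivariance properties above can be written for histories. Let $h_t = (o_0, a_0, \dots, a_{t-1}, o_t)$ and let $g \in G$ act element-wise, $g \cdot h_t = (g \cdot o_0, g \cdot a_0, \dots, g \cdot o_t)$. Assuming the POMDP's transition, observation, and reward functions are themselves invariant under $G$, the claims read:

$$
V^*(g \cdot h_t) = V^*(h_t), \qquad
Q^*(g \cdot h_t,\, g \cdot a) = Q^*(h_t, a), \qquad
\pi^*(g \cdot a \mid g \cdot h_t) = \pi^*(a \mid h_t), \qquad \forall g \in G,
$$

where the last identity holds for some optimal policy $\pi^*$. An invariant critic targets the first two identities and an equivariant actor the third.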
Abstract
Incorporating inductive biases is a promising way to obtain sample-efficient solutions in challenging robot learning domains. This paper identifies partially observable domains in which symmetries can serve as a useful inductive bias for efficient learning. Specifically, by encoding equivariance to specific group symmetries into the neural networks, our actor-critic reinforcement learning agents can reuse past solutions in related scenarios. Consequently, our equivariant agents significantly outperform non-equivariant approaches in both sample efficiency and final performance, as demonstrated through experiments on a range of robotic tasks in simulation and on real hardware.
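To illustrate how an equivariant recurrent actor "reuses" a solution for a rotated scenario, here is a minimal toy sketch. It assumes NumPy, a square grid observation, the cyclic group C4 of 90° rotations, and a hypothetical four-action ordering (up, right, down, left), and for simplicity it consumes only observations, not past actions. It is not the authors' implementation: instead of their equivariant LSTM, a vanilla tanh recurrent cell is made C4-equivariant by lifting each observation to all four rotations and mixing the group axis only through cyclic shifts. The class name, weight shapes, and action ordering are all hypothetical.

```python
import numpy as np

G = 4  # order of the cyclic rotation group C4 (multiples of 90 degrees)
# Action index k: 0=up, 1=right, 2=down, 3=left (hypothetical ordering).


class EquivariantRecurrentAgent:
    """Toy C4-equivariant recurrent actor with a C4-invariant critic.

    The hidden state has shape (G, d): one feature vector per group element
    (a regular representation), so rotating the input rolls the group axis.
    """

    def __init__(self, grid_size, d=8, seed=0):
        rng = np.random.default_rng(seed)
        self.d = d
        self.W_in = rng.normal(size=(grid_size * grid_size, d)) / grid_size
        # One d x d matrix per group offset: a cyclic convolution over the group axis.
        self.U = rng.normal(size=(G, d, d)) / np.sqrt(d)
        self.b = rng.normal(size=d) * 0.1  # shared across group slots
        self.v_actor = rng.normal(size=d)
        self.w_critic = rng.normal(size=d)

    def initial_state(self):
        # Rolling an all-zero state leaves it unchanged, so it is trivially equivariant.
        return np.zeros((G, self.d))

    def lift(self, obs):
        # f[k] = phi(rot90(obs, k)); rotating obs once shifts f one slot: f'[k] = f[k+1].
        return np.stack([np.rot90(obs, k).ravel() @ self.W_in for k in range(G)])

    def step(self, obs, h):
        f = self.lift(obs)
        # np.roll(h, j, axis=0)[k] == h[(k - j) % G], so this mixing commutes with rolls.
        mix = sum(np.roll(h, j, axis=0) @ self.U[j] for j in range(G))
        h_new = np.tanh(f + mix + self.b)
        logits = h_new @ self.v_actor  # (G,): slot k scores action k
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        value = float(np.mean(h_new @ self.w_critic))  # averaging over G -> invariant
        return probs, value, h_new

    def rollout(self, observations):
        h = self.initial_state()
        probs, value = None, None
        for obs in observations:
            probs, value, h = self.step(obs, h)
        return probs, value


rng = np.random.default_rng(1)
agent = EquivariantRecurrentAgent(grid_size=5)
history = [rng.integers(0, 3, size=(5, 5)).astype(float) for _ in range(6)]
rotated = [np.rot90(o, 1) for o in history]  # every observation rotated 90 deg CCW

p, v = agent.rollout(history)
p_rot, v_rot = agent.rollout(rotated)
# A CCW rotation maps action a to (a - 1) % 4 (e.g. right -> up), so p_rot[a] == p[a + 1].
print(bool(np.allclose(p_rot, np.roll(p, -1))), bool(np.isclose(v, v_rot)))  # True True
```

The design choice that makes equivariance survive across time is the regular-representation hidden state: if the lifted features and the previous hidden state are both rolled by the same group element, the cyclic mixing and the slot-wise nonlinearity produce a hidden state rolled by that element too, so the property holds at every step of the history. Replacing the cyclic mixing with an unconstrained matrix over the flattened (G, d) state would generally break it.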
