Equivariant Action Sampling for Reinforcement Learning and Planning
Linfeng Zhao, Owen Howell, Xupeng Zhu, Jung Yeon Park, Zhewen Zhang, Robin Walters, Lawson L. S. Wong
TL;DR
This work tackles the challenge of preserving symmetry in sampling-based reinforcement learning for continuous control. It introduces a two-step equivariant sampling framework that enforces symmetry in both the energy/benefit evaluation and the action-sampling process, including a $G$-augmented sampling scheme that achieves strong equivariance at finite sample counts. The approach is extended to sampling-based model-predictive planning via an equivariant MPPI/TD-MPC pipeline, which requires $G$-equivariant dynamics, rewards, and policies. Theoretical results establish the equivariance of the Bellman operator under Euclidean symmetry, and experiments show improved generalization and sample efficiency on coordinate regression and continuous control tasks, including 2D/3D PointMass, Reacher, and MetaWorld scenarios. Overall, the paper demonstrates that explicitly preserving symmetry in sampling-based planning yields substantial performance and data-efficiency benefits for symmetric RL problems, with practical implications for robotics.
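The idea behind $G$-augmented sampling can be illustrated with a toy sketch. This is not the authors' implementation: the group (planar rotations by multiples of 90 degrees), the quadratic score `q_value`, and all function names are assumptions chosen for illustration. The sketch closes a finite sample set under the group and then picks the best-scoring sample, so that rotating the state rotates the selected action in the same way, even with a small number of samples.

```python
import random

# Assumed example group: C4, rotations of the plane by multiples of 90 degrees.
def rotate(v, k):
    """Rotate a 2D vector v by k * 90 degrees counter-clockwise."""
    x, y = v
    for _ in range(k % 4):
        x, y = -y, x
    return (x, y)

def q_value(state, action):
    """Hypothetical invariant score: Q(g s, g a) = Q(s, a) for every rotation g."""
    dx, dy = action[0] - state[0], action[1] - state[1]
    return -(dx * dx + dy * dy)

def g_augmented_samples(base_samples):
    """Close the sample set under the group by adding every rotated copy."""
    return [rotate(a, k) for a in base_samples for k in range(4)]

def select_action(state, samples):
    """Score every sample with the invariant function, then pick the best one."""
    return max(samples, key=lambda a: q_value(state, a))

rng = random.Random(0)
base = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(16)]
samples = g_augmented_samples(base)

state = (0.3, -0.7)
best = select_action(state, samples)
# Selecting for the rotated state yields the rotated action.
best_rotated = select_action(rotate(state, 1), samples)
```

Because the augmented set is mapped onto itself by every rotation and the score is invariant, `best_rotated` coincides with `rotate(best, 1)` whenever the maximizer is unique. Without the augmentation step, the same property would hold only approximately, in the limit of many samples.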
Abstract
Reinforcement learning (RL) algorithms for continuous control tasks require accurate sampling-based action selection. Many tasks, such as robotic manipulation, contain inherent problem symmetries. However, correctly incorporating symmetry into sampling-based approaches remains a challenge. This work addresses the challenge of preserving symmetry in sampling-based planning and control, a key component for enhancing decision-making efficiency in RL. We introduce an action sampling approach that enforces the desired symmetry. We apply our proposed method to a coordinate regression problem and show that the symmetry-aware sampling method drastically outperforms the naive sampling approach. We furthermore develop a general framework for sampling-based model-based planning with Model Predictive Path Integral (MPPI). We compare our MPPI approach with standard sampling methods on several continuous control tasks. Empirical demonstrations across multiple continuous control environments validate the effectiveness of our approach, showcasing the importance of symmetry preservation in sampling-based action selection.
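For readers unfamiliar with MPPI, the following minimal sketch shows the generic weighting step that such planners are built on: sampled actions are averaged with weights proportional to the exponentiated return. It illustrates the standard MPPI-style update only, not the equivariant pipeline developed in the paper; the function name `mppi_update` and the `temperature` parameter are hypothetical.

```python
import math

def mppi_update(actions, returns, temperature=1.0):
    """Return the return-weighted mean action and the normalized weights.

    actions: list of equal-length tuples (sampled candidate actions)
    returns: list of floats (estimated return of each candidate)
    """
    # Subtract the maximum return before exponentiating for numerical stability.
    top = max(returns)
    weights = [math.exp((r - top) / temperature) for r in returns]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Weighted average of the candidates, one coordinate at a time.
    dim = len(actions[0])
    mean = tuple(
        sum(w * a[i] for w, a in zip(weights, actions)) for i in range(dim)
    )
    return mean, weights

# Two candidates with equal returns receive equal weight.
mean_action, weights = mppi_update([(0.0, 0.0), (2.0, 4.0)], [1.0, 1.0])
```

A lower `temperature` concentrates the weights on the highest-return candidates, while a higher one moves the result toward the plain average of the samples.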
