Equivariant Action Sampling for Reinforcement Learning and Planning

Linfeng Zhao, Owen Howell, Xupeng Zhu, Jung Yeon Park, Zhewen Zhang, Robin Walters, Lawson L. S. Wong

TL;DR

This work tackles the challenge of preserving symmetry in sampling-based reinforcement learning for continuous control. It introduces a two-step equivariant sampling framework that enforces symmetry in both the energy/benefit evaluation and the action-sampling process, including a $G$-augmented sampling scheme that achieves strong equivariance at finite sample counts. The approach is extended to sampling-based model-predictive planning via an equivariant MPPI/TD-MPC pipeline, which requires $G$-equivariant dynamics, rewards, and policies. Theoretical results establish the equivariance of the Bellman operator under Euclidean symmetry, and experiments show improved generalization and sample efficiency in coordinate regression and continuous control tasks, including 2D/3D PointMass, Reacher, and MetaWorld scenarios. Overall, the paper demonstrates that explicitly preserving symmetry in sampling-based planning yields substantial performance and data-efficiency benefits for symmetric RL problems, with practical implications for robotics.

Abstract

Reinforcement learning (RL) algorithms for continuous control tasks require accurate sampling-based action selection. Many tasks, such as robotic manipulation, contain inherent problem symmetries. However, correctly incorporating symmetry into sampling-based approaches remains a challenge. This work addresses the challenge of preserving symmetry in sampling-based planning and control, a key component for enhancing decision-making efficiency in RL. We introduce an action sampling approach that enforces the desired symmetry. We apply our proposed method to a coordinate regression problem and show that the symmetry-aware sampling method drastically outperforms the naive sampling approach. We furthermore develop a general framework for sampling-based model-based planning with Model Predictive Path Integral (MPPI). We compare our MPPI approach with standard sampling methods on several continuous control tasks. Empirical demonstrations across multiple continuous control environments validate the effectiveness of our approach, showcasing the importance of symmetry preservation in sampling-based action selection.
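To make the idea of symmetry-preserving sampling concrete, here is a minimal sketch under stated assumptions; it is not the paper's implementation. It assumes the group $G$ is C4 (planar rotations by multiples of 90 degrees), uses a toy $G$-invariant energy (squared distance between a 2D action and a 2D state), and selects the lowest-energy candidate. All function names (`rotate`, `energy`, `augment`, `select`, `equivariance_error`) are hypothetical and chosen only for illustration.

```python
import random

def rotate(v, k):
    """Rotate a 2D vector by k * 90 degrees (the assumed group G = C4)."""
    x, y = v
    for _ in range(k % 4):
        x, y = -y, x
    return (x, y)

def energy(state, action):
    """Toy G-invariant energy: E(g.s, g.a) == E(s, a) for every rotation g."""
    return (action[0] - state[0]) ** 2 + (action[1] - state[1]) ** 2

def sample_actions(n, seed=0):
    """Draw n candidate actions from a standard 2D Gaussian."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

def augment(actions):
    """Close the sample set under G: {g.a for a in actions, g in G}."""
    return [rotate(a, k) for a in actions for k in range(4)]

def select(state, candidates):
    """Pick the candidate with the lowest energy."""
    return min(candidates, key=lambda a: energy(state, a))

def equivariance_error(state, candidates, k):
    """Distance between select(g.s) and g.select(s) for g = rotation by k * 90 degrees."""
    lhs = select(rotate(state, k), candidates)
    rhs = rotate(select(state, candidates), k)
    return ((lhs[0] - rhs[0]) ** 2 + (lhs[1] - rhs[1]) ** 2) ** 0.5

if __name__ == "__main__":
    state = (0.75, 0.5)
    naive = sample_actions(16)
    for k in range(1, 4):
        print(k, equivariance_error(state, naive, k),
              equivariance_error(state, augment(naive), k))
```

In this sketch the invariant energy alone does not make selection equivariant, because a fixed finite sample set is generally not closed under rotation. Once the set is augmented with all of its rotated copies, rotating the state simply permutes the candidates' energies, so the selected action rotates with the state (barring exact ties).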

Paper Structure

This paper contains 56 sections, 2 theorems, 41 equations, 12 figures, 1 table.

Key Result

Theorem

The Bellman operator of a geometric MDP is equivariant under the Euclidean group $\mathrm{E}(d)$, which includes $d$-dimensional isometric transformations.
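As a sketch of what this statement says, in notation assumed here rather than taken from the paper: let $g \in \mathrm{E}(d)$ act on states and actions, define the action on value functions by $(g \cdot V)(s) = V(g^{-1} s)$, and suppose the reward and transition density satisfy $R(gs, ga) = R(s, a)$ and $P(gs' \mid gs, ga) = P(s' \mid s, a)$. With discount factor $\gamma$, the Bellman operator and its equivariance property then read:

$$
\mathcal{T}[V](s) = \max_{a} \Big[ R(s, a) + \gamma \int P(s' \mid s, a)\, V(s')\, ds' \Big],
\qquad
\mathcal{T}[g \cdot V] = g \cdot \mathcal{T}[V].
$$

Under these assumptions the property follows from the substitutions $a = g\tilde{a}$ and $s' = g\tilde{s}'$, which leave the integration measure unchanged because $g$ is an isometry:

$$
\mathcal{T}[g \cdot V](s)
= \max_{\tilde{a}} \Big[ R(g^{-1} s, \tilde{a}) + \gamma \int P(\tilde{s}' \mid g^{-1} s, \tilde{a})\, V(\tilde{s}')\, d\tilde{s}' \Big]
= \mathcal{T}[V](g^{-1} s)
= (g \cdot \mathcal{T}[V])(s).
$$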

Figures (12)

  • Figure 1: Illustration of the coordinate regression problem (Sec. \ref{subsec:def-coord-regression}) and its equivariance. (Left) The energy function (EBM) takes an image and coordinate samples and outputs a scalar energy value. (Right) Equivariance in coordinate regression: rotating the image and augmenting the samples results in a rotated coordinate prediction.
  • Figure 2: Results on the coordinate regression problem: the left two columns show training on the entire region, and the right three columns show training only on coordinates in the first quadrant.
  • Figure 3: Equivariance error measured with and without a $G$-invariant energy $E(s,a)$, and with and without augmenting the sampled actions with $G$.
  • Figure 4: The proposed sampling-based planning algorithm $a_0 = \texttt{plan}(s_0)$: if the input state is rotated, the output action should be rotated accordingly. This requires (1) the learned functions to be $G$-equivariant or $G$-invariant networks and (2) a specialized sampling strategy, as introduced in our method.
  • Figure 5: Tasks used in experiments: (1) PointMass in 2D, (2) Reacher, (3) a customized 3D version of PointMass with multiple particles to control, and (4) a MetaWorld task in which a gripper must reach an object.
  • ...and 7 more figures
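The equivariance requirement on $\texttt{plan}$ described for Figure 4 can be illustrated with a one-step, MPPI-style weighted average. This is a simplified sketch under stated assumptions, not the paper's pipeline: it assumes $G$ is C4 (rotations by multiples of 90 degrees), uses a toy $G$-invariant cost in place of learned dynamics and reward rollouts, and uses hypothetical names (`rotate`, `cost`, `augment`, `mppi_action`).

```python
import math

def rotate(v, k):
    """Rotate a 2D vector by k * 90 degrees (the assumed group G = C4)."""
    x, y = v
    for _ in range(k % 4):
        x, y = -y, x
    return (x, y)

def cost(state, action):
    """Toy G-invariant cost standing in for a learned rollout cost."""
    return (action[0] - state[0]) ** 2 + (action[1] - state[1]) ** 2

def augment(actions):
    """Close the candidate set under G by adding all rotated copies."""
    return [rotate(a, k) for a in actions for k in range(4)]

def mppi_action(state, candidates, temperature=1.0):
    """Softmax-weighted mean of the candidates, weighted by exp(-cost / temperature)."""
    costs = [cost(state, a) for a in candidates]
    c_min = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(weights)
    ax = sum(w * a[0] for w, a in zip(weights, candidates)) / total
    ay = sum(w * a[1] for w, a in zip(weights, candidates)) / total
    return (ax, ay)

def plan_error(state, candidates, k):
    """Distance between plan(g.s) and g.plan(s) for g = rotation by k * 90 degrees."""
    lhs = mppi_action(rotate(state, k), candidates)
    rhs = rotate(mppi_action(state, candidates), k)
    return math.hypot(lhs[0] - rhs[0], lhs[1] - rhs[1])

if __name__ == "__main__":
    base = [(1.0, 0.25), (0.5, -2.0)]
    state = (0.75, 0.5)
    print("naive:", plan_error(state, base, 1))
    print("augmented:", plan_error(state, augment(base), 1))
```

With the augmented candidate set, rotating the state permutes the weights among rotated candidates, so the weighted mean rotates with the state up to floating-point rounding. With the unaugmented set, the same invariant cost still produces a rotation-dependent output.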

Theorems & Definitions (3)

  • Definition: Geometric MDP
  • Theorem
  • Proposition