Exploiting Symmetry in Dynamics for Model-Based Reinforcement Learning with Asymmetric Rewards

Yasin Sonmez; Neelay Junnarkar; Murat Arcak

TL;DR

This work tackles sample efficiency in model-based reinforcement learning by enforcing dynamical symmetry through Cartan's moving frame, yielding a reduced, $G$-invariant representation of the dynamics. It formalizes a reduced function $\bar{F}$ on $\mathcal{X}^b \times \mathcal{U}$ and reconstructs the full dynamics via $F(x,u) = \phi_{\gamma(x)}^{-1}(\bar{F}(\rho(x), \psi_{\gamma(x)}(u)))$, ensuring invariance by construction. Empirical results on Parking and Reacher show that symmetry-aware dynamics learning can achieve lower observation error with smaller networks, particularly when parameter budgets are limited, indicating improved data efficiency. This approach broadens the applicability of symmetry techniques in reinforcement learning by allowing symmetry in the dynamics to be exploited independently of reward symmetry.
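The reconstruction formula above can be made concrete with a small numerical sketch. Everything specific here is an assumption chosen for illustration and is not taken from the paper: a single planar car with state $x = (p_x, p_y, \theta, v)$, the group $G = SE(2)$ acting by rotating and translating the pose, body-frame inputs $u = (\text{acceleration}, \text{yaw rate})$ so that $\psi_g$ is the identity, and a hand-written kinematic stand-in for the learned $\bar{F}$. "Invariance" is read in the sense $F(\phi_g(x), \psi_g(u)) = \phi_g(F(x,u))$.

```python
import numpy as np

def rot(alpha):
    """2x2 rotation matrix."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s], [s, c]])

def phi(g, x):
    """Action of g = (a, b, alpha) on x = (px, py, theta, v): rotate, then translate the pose."""
    a, b, alpha = g
    p = rot(alpha) @ x[:2] + np.array([a, b])
    return np.array([p[0], p[1], x[2] + alpha, x[3]])

def inverse(g):
    """Group inverse in SE(2)."""
    a, b, alpha = g
    t = -rot(-alpha) @ np.array([a, b])
    return np.array([t[0], t[1], -alpha])

def gamma(x):
    """Moving frame: the group element that sends the car to the origin with zero heading."""
    t = -rot(-x[2]) @ x[:2]
    return np.array([t[0], t[1], -x[2]])

def rho(x):
    """Reduced coordinates: after moving to the canonical frame, only the speed remains."""
    return phi(gamma(x), x)[3:]

def psi(g, u):
    """Assumed input action: body-frame inputs are unaffected by g."""
    return u

def F_bar(xb, u, dt=0.1):
    """Hypothetical stand-in for the learned reduced model.
    Maps (speed, input) to the next full state expressed in the canonical frame."""
    v = xb[0]
    accel, yaw_rate = u
    return np.array([v * dt, 0.0, yaw_rate * dt, v + accel * dt])

def F(x, u):
    """Full dynamics reconstructed from F_bar: phi_{gamma(x)}^{-1}(F_bar(rho(x), psi_{gamma(x)}(u)))."""
    g = gamma(x)
    return phi(inverse(g), F_bar(rho(x), psi(g, u)))

x = np.array([1.0, -2.0, 0.7, 3.0])
u = np.array([0.5, 0.2])
g = np.array([0.3, 4.0, -1.1])

# The symmetry holds for any F_bar, because it is built into the wrapper F.
print(np.allclose(F(phi(g, x), psi(g, u)), phi(g, F(x, u))))
```

In a learned setting, `F_bar` would be replaced by a neural network over the lower-dimensional input `(rho(x), u)`; the wrapper `F` stays unchanged, which is what makes the symmetry hold by construction rather than through training.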

Abstract

Recent work in reinforcement learning has leveraged symmetries in the model to improve sample efficiency in training a policy. A commonly used simplifying assumption is that the dynamics and reward both exhibit the same symmetry; however, in many real-world environments, the dynamical model exhibits symmetry independent of the reward model. In this paper, we assume only the dynamics exhibit symmetry, extending the scope of problems in reinforcement learning and learning in control theory to which symmetry techniques can be applied. We use Cartan's moving frame method to introduce a technique for learning dynamics that, by construction, exhibit specified symmetries. Numerical experiments demonstrate that the proposed method learns a more accurate dynamical model

Paper Structure (10 sections, 2 theorems, 18 equations, 5 figures)

Introduction
Symmetries in Model Dynamics
Cartan's Moving Frame
Dimension Reduction
Learning Model Dynamics
Experiments
Two Cars Parking Scenario
Reacher
Conclusion
Acknowledgements

Key Result

Lemma 1

For all $g \in G$ and $x \in \mathcal{X}$, $\rho(\phi_g(x)) = \rho(x)$.
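The proof is not reproduced on this page. A short sketch follows under the usual moving-frame assumptions, which are not stated here and should be checked against the paper: $\rho(x) = \phi_{\gamma(x)}(x)$, the moving frame satisfies $\gamma(\phi_g(x)) = \gamma(x)\, g^{-1}$, and $\phi$ is a group action, so $\phi_{gh} = \phi_g \circ \phi_h$.

```latex
\begin{aligned}
\rho(\phi_g(x))
  &= \phi_{\gamma(\phi_g(x))}\bigl(\phi_g(x)\bigr) \\
  &= \phi_{\gamma(x)\, g^{-1}}\bigl(\phi_g(x)\bigr) \\
  &= \phi_{\gamma(x)}\bigl(\phi_{g^{-1}}(\phi_g(x))\bigr) \\
  &= \phi_{\gamma(x)}(x) = \rho(x).
\end{aligned}
```

Under these assumptions, $\rho$ is constant along each group orbit, which is why a model defined on $\rho(x)$ never has to learn the same behavior separately at different points of an orbit.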

Figures (5)

Figure 1: Relationship between $F$ and $\bar{F}$.
Figure 2: Experimental environments: (1) "Reacher" (left), with two controlled rotating joints that exhibit rotational symmetry and the objective of reaching a target point; (2) "Parking" (right), with two controlled vehicles maneuvering to park in designated spots without collision, whose dynamics exhibit both rotational and translational symmetry.
Figure 3: Illustration of translational and rotational invariance in car dynamics. Applying Cartan's moving frame method transforms the coordinates so that the car sits at the origin with a neutral orientation. The function $\rho$ reduces the system's state to a lower-dimensional space without losing essential dynamics information. The dynamics are learned in these transformed coordinates by a smaller neural network (bottom) than the usual one (middle). The network's output is then converted back to the original coordinates using $\phi_{\gamma(x)}^{-1}$. As usual, training is simplified by having the network output the difference between the next and current states, denoted $\Delta F(x, u)$ and $\Delta \bar{F}(x, u)$ in the section "Learning Model Dynamics". The illustration shows a single car. In the experiments, additional symmetries arise from the presence of two cars; the second car is omitted here for clarity.
Figure 4: Comparison of learning the dynamics with and without symmetry in the Parking scenario for different NN architectures. The y-axis (observation error) is the error on the test dataset. Mean and standard deviation are reported over 4 runs.
Figure 5: Comparison of learning the dynamics with and without symmetry in the Reacher environment for different NN architectures. The y-axis (observation error) is the error on the test dataset. Mean and standard deviation are reported over 4 runs.

Theorems & Definitions (7)

Definition 1: Lie Transformation Group
Definition 2: Invariant Dynamics
Lemma 1
proof
Example 1
Theorem 1
proof
