Equivariant Offline Reinforcement Learning

Arsh Tangri; Ondrej Biza; Dian Wang; David Klee; Owen Howell; Robert Platt

TL;DR

This work investigates the use of $SO(2)$-equivariant neural networks for offline RL with a limited number of demonstrations and provides empirical evidence demonstrating how equivariance improves offline learning algorithms in the low-data regime.

Abstract

Sample efficiency is critical when applying learning-based methods to robotic manipulation due to the high cost of collecting expert demonstrations and the challenges of on-robot policy learning through online Reinforcement Learning (RL). Offline RL addresses this issue by enabling policy learning from an offline dataset collected using any behavioral policy, regardless of its quality. However, recent advancements in offline RL have predominantly focused on learning from large datasets. Given that many robotic manipulation tasks can be formulated as rotation-symmetric problems, we investigate the use of $SO(2)$-equivariant neural networks for offline RL with a limited number of demonstrations. Our experimental results show that equivariant versions of Conservative Q-Learning (CQL) and Implicit Q-Learning (IQL) outperform their non-equivariant counterparts. We provide empirical evidence demonstrating how equivariance improves offline learning algorithms in the low-data regime.
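The rotational symmetry described above can be made concrete with a small sketch. It assumes a hypothetical planar state, with one row of $(x, y, \text{yaw})$ per object (gripper in row 0, target in row 1), and a hypothetical action $(dx, dy, dz, d\text{yaw}, \text{gripper})$. Neither representation comes from the paper, and `step` and `reward` are toy stand-ins. In this sketch's notation, a task is rotation-symmetric when rotating both the state and the action about the vertical axis rotates the next state in the same way, $T(gs, ga) = g\,T(s, a)$, and leaves the reward unchanged, $R(gs) = R(s)$.

```python
import numpy as np

def rot2d(theta: float) -> np.ndarray:
    """2x2 counter-clockwise rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def rotate_state(state: np.ndarray, theta: float) -> np.ndarray:
    """Rotate a hypothetical (n_objects, 3) state of (x, y, yaw) rows about the workspace center."""
    out = state.copy()
    out[:, :2] = state[:, :2] @ rot2d(theta).T  # planar positions rotate
    out[:, 2] = state[:, 2] + theta             # each object's yaw turns with the scene
    return out

def rotate_action(action: np.ndarray, theta: float) -> np.ndarray:
    """Rotate a hypothetical (dx, dy, dz, dyaw, gripper) action."""
    out = action.copy()
    out[:2] = rot2d(theta) @ action[:2]  # planar displacement rotates
    # dz, the relative yaw change, and the gripper command are unaffected by a z-axis rotation
    return out

def step(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Toy transition: the gripper (row 0) moves by the planar displacement and yaw change."""
    nxt = state.copy()
    nxt[0, :2] += action[:2]
    nxt[0, 2] += action[3]
    return nxt

def reward(state: np.ndarray) -> float:
    """Toy reward: negative planar distance between gripper (row 0) and target (row 1)."""
    return -float(np.linalg.norm(state[0, :2] - state[1, :2]))

rng = np.random.default_rng(0)
s = rng.normal(size=(2, 3))  # gripper and target
a = rng.normal(size=5)
theta = np.pi / 2
lhs = step(rotate_state(s, theta), rotate_action(a, theta))  # rotate, then act
rhs = rotate_state(step(s, a), theta)                        # act, then rotate
print(np.allclose(lhs, rhs))                                 # True
print(np.isclose(reward(rotate_state(s, theta)), reward(s))) # True
```

Because these toy dynamics commute with any planar rotation, the same check also passes for angles that are not multiples of 90 degrees, such as `theta = 0.3`.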

Paper Structure

This paper contains 23 sections, 28 equations, 4 figures, and 7 tables.

Introduction
Related Work
Background
Equivariant Offline Reinforcement Learning
$SO(2)$-Invariant MDPs
$SO(2)$-Invariant MDP for Robotic-Manipulation
$SO(2)$-Equivariant CQL and IQL
Experiments
Manipulation Tasks
Offline Datasets
Implementation Details
Results
Non-Equivariant Offline RL algorithms
Equivariant Offline RL algorithms
Conclusion and Limitations
...and 8 more sections

Figures (4)

Figure 1: PyBullet Tasks. We show the state upon reset (left) and the respective goal state achieved by an expert (right). The position and rotation of the objects are randomized at each reset.
Figure 2: Performance of Equi-CQL and Non-Equi CQL as a function of gradient steps on the Block-in-Bowl-Medium dataset with 10 episodes.
Figure 3: Commutative Diagram for a $G$-equivariant Function. Let $\Phi: G \times \Omega \rightarrow \Omega$ denote the action of $G$ on $\Omega$, and let $\Phi': G \times \Omega' \rightarrow \Omega'$ denote the action of $G$ on $\Omega'$. The map $\Psi: \Omega \rightarrow \Omega'$ is $G$-equivariant if and only if the diagram commutes for all $g \in G$, i.e., $\Psi(\Phi(g, x)) = \Phi'(g, \Psi(x))$ for all $g \in G$ and $x \in \Omega$.
Figure 4: Illustration of Normalized Learned Q-values. The learned Q-values of the non-invariant critic are either significantly overestimated or underestimated, as indicated by the large standard deviations. The non-invariant critic also fails to assign consistent Q-values to rotated state-action pairs, despite the $SO(2)$-invariant nature of the task. The invariant critic assigns consistent values to rotated $(s, a)$ when the rotation angle is a multiple of 90 degrees, and its overestimation is significantly smaller.
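The commutative diagram of Figure 3 can be checked numerically, and the 90-degree consistency described in Figure 4 can be illustrated with a critic that is invariant under the cyclic group $C_4$ of 90-degree rotations. This is a hedged sketch, not the authors' architecture. The paper builds the symmetry into $SO(2)$-equivariant networks, whereas this sketch obtains invariance by group averaging, $Q_{\text{inv}}(s, a) = \frac{1}{4}\sum_{k=0}^{3} Q(g^k s, g^k a)$, applied to a hypothetical random MLP critic. The state (planar object positions), the action (a planar displacement), and all function names are assumptions. Invariance is the special case of Figure 3 in which $\Phi'$ is the trivial action (`trivial` below), since a scalar Q-value should not change under rotation.

```python
import numpy as np

# Generator of C4: a 90-degree counter-clockwise rotation. Its 0/±1 entries keep
# rotations by multiples of 90 degrees exact in floating point.
R90 = np.array([[0.0, -1.0], [1.0, 0.0]])
C4 = range(4)  # group element k means "rotate by k * 90 degrees"

def phi(k, x):
    """Action of C4 on a hypothetical (state, action) pair.
    state: (n_objects, 2) planar positions; action: (2,) planar displacement."""
    state, action = x
    R = np.linalg.matrix_power(R90, k % 4)
    return state @ R.T, R @ action

def trivial(k, y):
    """Trivial action on Q-values: a scalar is unchanged by every rotation."""
    return y

def commutes(psi, phi_in, phi_out, x, group, tol=1e-8):
    """Check the Figure 3 diagram at x: psi(phi_in(g, x)) == phi_out(g, psi(x)) for all g."""
    return all(np.allclose(psi(phi_in(g, x)), phi_out(g, psi(x)), atol=tol) for g in group)

def make_mlp_critic(n_objects=2, hidden=32, seed=0):
    """Hypothetical unconstrained critic: a random two-layer MLP on the flattened pair."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(hidden, 2 * n_objects + 2))
    w2 = rng.normal(size=hidden)
    def q(x):
        state, action = x
        z = np.concatenate([state.ravel(), action])
        return float(w2 @ np.tanh(W1 @ z))
    return q

def symmetrize(q):
    """Group averaging over C4: the result satisfies Q_inv(g s, g a) = Q_inv(s, a) for g in C4."""
    return lambda x: float(np.mean([q(phi(k, x)) for k in C4]))

rng = np.random.default_rng(1)
x = (rng.normal(size=(2, 2)), rng.normal(size=2))
q = make_mlp_critic()
q_inv = symmetrize(q)
print(commutes(q, phi, trivial, x, C4))      # an unconstrained critic usually fails this check
print(commutes(q_inv, phi, trivial, x, C4))  # True
```

Group averaging needs one critic evaluation per group element and only guarantees invariance for the elements it averages over (here, multiples of 90 degrees), whereas an equivariant architecture builds the constraint into its layers and needs a single forward pass.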
