Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist

Hai Nguyen; Tadashi Kozuno; Cristian C. Beltran-Hernandez; Masashi Hamaya

TL;DR

The paper tackles a symmetry-rich, contact-dominant peg-in-hole task under partial observability by introducing a memory-based reinforcement learning approach that uses haptic and proprioceptive signals. It leverages domain symmetry through data augmentation and auxiliary losses to enforce invariant/equivariant behavior without specialized symmetry networks, enabling efficient learning in simulation and direct hardware training within roughly 3 hours using demonstrations. The proposed RSAC-Aug-Aux method matches or surpasses state-based baselines, generalizes across multiple peg shapes, and demonstrates robust transfer to real robots, contributing a practical framework for symmetry-aware learning in POMDPs. This work has meaningful implications for safe, efficient learning in contact-rich robotic assembly and related tasks, with potential extensions to imperfect symmetry and tactile-rich sensing.

Abstract

This study tackles the representative yet challenging contact-rich peg-in-hole task of robotic assembly, using a soft wrist that can operate more safely and tolerate lower-frequency control signals than a rigid one. Previous studies often use a fully observable formulation, requiring external setups or estimators for the peg-to-hole pose. In contrast, we use a partially observable formulation and deep reinforcement learning from demonstrations to learn a memory-based agent that acts purely on haptic and proprioceptive signals. Moreover, previous works do not incorporate potential domain symmetry and thus must search for solutions in a bigger space. Instead, we propose to leverage the symmetry for sample efficiency by augmenting the training data and constructing auxiliary losses to force the agent to adhere to the symmetry. Results in simulation with five different symmetric peg shapes show that our proposed agent can be comparable to or even outperform a state-based agent. In particular, the sample efficiency also allows us to learn directly on the real robot within 3 hours.
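The symmetry constraints mentioned above can be written compactly. The following is a sketch under assumed notation, not the paper's exact formulation: $h_t$ is the history, $a_t$ the action, $g$ an element of the symmetry group $G$, and $\lambda_Q, \lambda_\pi$ are hypothetical weighting coefficients. An invariant critic and an equivariant actor would satisfy

$$Q(g h_t, g a_t) = Q(h_t, a_t), \qquad \pi(g h_t) = g\,\pi(h_t), \qquad \text{for all } g \in G,$$

and one plausible way to turn these conditions into auxiliary losses is to penalize their violation on sampled data:

$$\mathcal{L}^{\text{aux}}_Q = \lambda_Q\, \mathbb{E}\left[\big(Q(g h_t, g a_t) - Q(h_t, a_t)\big)^2\right], \qquad \mathcal{L}^{\text{aux}}_\pi = \lambda_\pi\, \mathbb{E}\left[\big\lVert \pi(g h_t) - g\,\pi(h_t) \big\rVert^2\right].$$

Data augmentation uses the same transformations in a complementary way: a stored history $h_T$ is replaced by $g h_T$, so the agent also trains on the mirrored or rotated experience.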

Paper Structure (21 sections, 6 equations, 10 figures, 2 tables, 1 algorithm)

INTRODUCTION
RELATED WORKS
Pose Estimation in Soft Robots
Peg-In-Hole with Soft Robots using DRL
Symmetry-aware Policy Learning
BACKGROUND
Partially Observable Markov Decision Processes
Notations
SYMMETRIC PEG-IN-HOLE AS A POMDP
PROPOSED METHOD
Data Augmentation using Domain Symmetry
Constructing Auxiliary Losses using Domain Symmetry
LEARNING IN SIMULATION
Peg-In-Hole with a Simulated Soft Wrist
Results
...and 6 more sections

Figures (10)

Figure 1: We leverage the domain symmetry to augment data (through reflections and rotations) and to regularize (through auxiliary losses) to learn a symmetry-aware agent.
Figure 2: Transforming a 2D point $p_0 = (x, y)$ using X-axis reflection ($F^x$), Y-axis reflection ($F^y$), counter-clockwise rotations of $\{0, \pi/2, \pi, 3\pi/2\}$ around the origin ($R^4$), and their sequential combinations ($F^{xy} = F^x*F^y$ and $F^{xy}*R^4$).
Figure 3: Our simulation setup.
Figure 4: The hole symmetry allows data augmentation on a history $h_T$ through the transformation group $G = F^{xy}$. Moreover, given the current history $h_t$, the Q-function and the policy must satisfy specific properties for all $g \in G$.
Figure 5: We propose leveraging the domain symmetry through data augmentation (yellow boxes) and symmetric auxiliary losses for the actor and the critic (blue boxes), applied to a recurrent SAC agent (Ni et al., 2021).
...and 5 more figures
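
The transformations named in the Figure 2 caption can be illustrated with a small NumPy sketch. It rests on several assumptions that are not confirmed by the text above: "X-axis reflection" is read as mirroring across the X axis, $(x, y) \to (x, -y)$; $F^{xy}$ is treated as the four-element set generated by the two reflections; and the helper names (`rotation`, `orbit`) are hypothetical.

```python
import numpy as np


def rotation(theta):
    """2x2 matrix for a counter-clockwise rotation by `theta` radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


# Assumption: F^x mirrors across the X axis, F^y across the Y axis.
F_X = np.array([[1.0, 0.0], [0.0, -1.0]])
F_Y = np.array([[-1.0, 0.0], [0.0, 1.0]])

# Assumption: F^{xy} is the set generated by the two reflections.
F_XY = [np.eye(2), F_X, F_Y, F_X @ F_Y]

# R^4: counter-clockwise rotations by 0, pi/2, pi, 3*pi/2.
R4 = [rotation(k * np.pi / 2) for k in range(4)]


def orbit(point, group):
    """Sorted list of the distinct images of `point` under `group`."""
    images = set()
    for g in group:
        # Round away floating-point noise; adding 0.0 turns -0.0 into 0.0.
        image = np.round(g @ point, 9) + 0.0
        images.add(tuple(float(v) for v in image))
    return sorted(images)


p0 = np.array([2.0, 1.0])
flips = orbit(p0, F_XY)
flips_and_rotations = orbit(p0, [f @ r for f in F_XY for r in R4])
```

For a generic point such as $(2, 1)$, the reflections alone give four distinct images, while combining them with the quarter-turn rotations gives eight. Applying the same matrices to the planar components of observations and actions is one way such a group could drive data augmentation.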
