Symmetry-aware Reinforcement Learning for Robotic Assembly under Partial Observability with a Soft Wrist
Hai Nguyen, Tadashi Kozuno, Cristian C. Beltran-Hernandez, Masashi Hamaya
TL;DR
The paper tackles a symmetry-rich, contact-dominant peg-in-hole task under partial observability with a memory-based reinforcement learning agent that acts on haptic and proprioceptive signals. It exploits domain symmetry through data augmentation and auxiliary losses that encourage invariant/equivariant behavior without specialized symmetry-aware networks, which enables efficient learning in simulation and, with demonstrations, direct training on hardware within roughly 3 hours. The proposed RSAC-Aug-Aux method matches or surpasses state-based baselines, generalizes across multiple peg shapes, and demonstrates robust transfer to real robots, contributing a practical framework for symmetry-aware learning in POMDPs. The approach is relevant to safe, sample-efficient learning in contact-rich robotic assembly and related tasks, with potential extensions to imperfect symmetry and tactile-rich sensing.
Abstract
This study tackles the representative yet challenging contact-rich peg-in-hole task of robotic assembly, using a soft wrist that can operate more safely and tolerate lower-frequency control signals than a rigid one. Previous studies often use a fully observable formulation, which requires external setups or estimators for the peg-to-hole pose. In contrast, we use a partially observable formulation and deep reinforcement learning from demonstrations to learn a memory-based agent that acts purely on haptic and proprioceptive signals. Moreover, previous works do not exploit potential domain symmetry and thus must search for solutions in a larger space. Instead, we propose to leverage the symmetry for sample efficiency by augmenting the training data and constructing auxiliary losses that force the agent to adhere to the symmetry. Results in simulation with five different symmetric peg shapes show that our proposed agent is comparable to, or even outperforms, a state-based agent. The resulting sample efficiency also allows us to learn directly on the real robot within 3 hours.
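The two symmetry mechanisms named in the abstract, data augmentation and auxiliary losses, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that observations and actions are planar (x, y) vectors, that the symmetry group is the set of rotations by multiples of 90 degrees about the hole axis (as for a square peg), and that the policy maps an observation history to a single planar action. The names `rot2d`, `transform`, `augment`, `equivariance_loss`, and `C4_ANGLES` are hypothetical.

```python
import numpy as np

# Assumed symmetry group: rotations by multiples of 90 degrees (square peg).
C4_ANGLES = [0.0, np.pi / 2, np.pi, 3 * np.pi / 2]


def rot2d(theta):
    """2x2 rotation matrix for a rotation by theta about the hole axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def transform(xy, theta):
    """Rotate one planar vector, or every row of a (T, 2) sequence."""
    return xy @ rot2d(theta).T


def augment(obs_seq, act_seq, thetas=C4_ANGLES):
    """Data augmentation: one rotated copy of the whole history per angle.

    The same group element is applied to every step, so the rotated
    history stays consistent for a memory-based agent.
    """
    return [(transform(obs_seq, t), transform(act_seq, t)) for t in thetas]


def equivariance_loss(policy, obs_seq, thetas=C4_ANGLES):
    """Auxiliary loss: mean squared gap between policy(g.obs) and g.policy(obs)."""
    base = policy(obs_seq)
    gaps = [policy(transform(obs_seq, t)) - transform(base, t) for t in thetas]
    return float(np.mean(np.square(gaps)))
```

In this sketch, a policy that respects the symmetry (for example, one that scales the latest observation by a constant) yields a loss of approximately zero, while a policy that ignores one axis is penalized. An analogous invariance term for a critic would compare Q(g·o, g·a) with Q(o, a). The weighting of such a term against the main RL objective is not specified in the text above and is left open here.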
