Context-Based Meta Reinforcement Learning for Robust and Adaptable Peg-in-Hole Assembly Tasks
Ahmed Shokry, Walid Gomaa, Tobias Zaenker, Murad Dawood, Rohit Menon, Shady A. Maged, Mohammed I. Awad, Maren Bennewitz
TL;DR
This paper tackles robust peg-in-hole assembly under hole-pose uncertainty by adapting context-based meta reinforcement learning. It replaces the reward-based context, which cannot be measured on a real robot, with a measurable motion-to-hole signal, encoded as m in the context c = (o, a, o', m), and extends the framework to incorporate force/torque sensor data, enabling real-world adaptation without reliance on calibrated vision. A dedicated out-of-distribution (OOD) adaptation procedure further enables generalization to large pose deviations by guiding latent-space exploration toward motions that reduce the distance to the hole. Across simulated and real-world experiments with multiple peg/hole shapes, the approach yields superior training and adaptation efficiency, improved robustness to orientation uncertainty, and strong generalization to out-of-distribution tasks, with substantially reduced data requirements compared to prior work.
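To make the context tuple c = (o, a, o', m) concrete, the following is a minimal sketch of how one context entry could be assembled. The definition of m used here (the reduction in Euclidean distance to an estimated hole position over one step) is an assumption chosen for illustration; the summary above only states that m is a measurable motion-to-hole signal. The names `ContextTransition`, `motion_to_hole`, and `flatten` are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ContextTransition:
    """One context entry c = (o, a, o', m)."""
    obs: np.ndarray        # o
    action: np.ndarray     # a
    next_obs: np.ndarray   # o'
    m: float               # motion-to-hole signal, used in place of the reward


def motion_to_hole(pos_before, pos_after, hole_estimate):
    """Assumed definition of m: how much one step reduced the distance
    to a (possibly noisy) hole position estimate. Positive means closer."""
    hole = np.asarray(hole_estimate, dtype=float)
    d_before = np.linalg.norm(hole - np.asarray(pos_before, dtype=float))
    d_after = np.linalg.norm(hole - np.asarray(pos_after, dtype=float))
    return float(d_before - d_after)


def flatten(c):
    """Concatenate (o, a, o', m) into one vector, e.g. as context-encoder input."""
    return np.concatenate([c.obs, c.action, c.next_obs, [c.m]])


# Illustrative values only: the peg moves 1 unit toward a hole 2 units away.
m = motion_to_hole([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0])
c = ContextTransition(
    obs=np.zeros(3), action=np.array([1.0, 0.0]), next_obs=np.ones(3), m=m
)
context_vector = flatten(c)
```

Because m is computed from positions rather than from a task reward, a signal of this kind can in principle be obtained on a real robot, for instance from forward kinematics, without access to the simulator's reward function.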
Abstract
Autonomous assembly is an essential capability for industrial and service robots, with Peg-in-Hole (PiH) insertion being one of the core tasks. However, PiH assembly in unknown environments is still challenging due to uncertainty in task parameters, such as the hole position and orientation, resulting from sensor noise. Although context-based meta reinforcement learning (RL) methods have previously been presented to adapt to unknown task parameters in PiH assembly tasks, their performance depends on a sample-inefficient procedure or on human demonstrations. Thus, to enhance the applicability of meta RL in real-world PiH assembly tasks, we propose to train the agent to use information from the robot's forward kinematics and an uncalibrated camera. Furthermore, we improve the performance by efficiently adapting the meta-trained agent to use data from a force/torque sensor. Finally, we propose an adaptation procedure for out-of-distribution tasks whose parameters are different from those of the training tasks. Experiments on simulated and real robots show that our modifications enhance the sample efficiency during meta training, the real-world adaptation performance, and the generalization of the context-based meta RL agent in PiH assembly tasks compared to previous approaches.
