Context-Based Meta Reinforcement Learning for Robust and Adaptable Peg-in-Hole Assembly Tasks

Ahmed Shokry, Walid Gomaa, Tobias Zaenker, Murad Dawood, Rohit Menon, Shady A. Maged, Mohammed I. Awad, Maren Bennewitz

TL;DR

This paper tackles robust peg-in-hole assembly under hole-pose uncertainty by adapting context-based meta reinforcement learning. It replaces the unmeasurable reward in the context with a measurable motion-to-hole signal $m$, giving the context $c=(o,a,o^\prime,m)$, and extends the framework to incorporate force/torque sensor data, enabling real-world adaptation without relying on calibrated vision. A dedicated out-of-distribution (OOD) adaptation procedure further enables generalization to large pose deviations by guiding latent-space exploration toward motions that reduce the distance to the hole. Across simulated and real-world experiments with multiple peg/hole shapes, the approach improves training and adaptation efficiency, is more robust to orientation uncertainty, and generalizes to OOD tasks, while requiring substantially less data than prior work.

Abstract

Autonomous assembly is an essential capability for industrial and service robots, with Peg-in-Hole (PiH) insertion being one of the core tasks. However, PiH assembly in unknown environments is still challenging due to uncertainty in task parameters, such as the hole position and orientation, resulting from sensor noise. Although context-based meta reinforcement learning (RL) methods have previously been presented to adapt to unknown task parameters in PiH assembly tasks, their performance depends on a sample-inefficient procedure or on human demonstrations. Thus, to enhance the applicability of meta RL in real-world PiH assembly tasks, we propose to train the agent to use information from the robot's forward kinematics and an uncalibrated camera. Furthermore, we improve the performance by efficiently adapting the meta-trained agent to use data from a force/torque sensor. Finally, we propose an adaptation procedure for out-of-distribution tasks whose parameters differ from those of the training tasks. Experiments on simulated and real robots show that our modifications enhance the sample efficiency during meta training, the real-world adaptation performance, and the generalization of the context-based meta RL agent in PiH assembly tasks compared to previous approaches.

Paper Structure (23 sections, 4 equations, 14 figures)

Figures (14)

  • Figure 1: We use context-based meta reinforcement learning to perform peg-in-hole assembly tasks with an unknown hole position. Previous work [metarlinsertion] uses the unmeasurable reward as part of the context data to infer the task, which results in sample-inefficient adaptation. In contrast, we use data from the robot's forward kinematics and an uncalibrated camera to infer the task parameters. Additionally, we adapt the agent to use force/torque sensor data to avoid occlusion problems. Finally, we propose an adaptation procedure for out-of-distribution tasks with large errors in the estimated hole position and demonstrate the superior performance of our methods.
  • Figure 2: The meta RL agent receives the hole position estimated from a noisy external sensor, the current peg position (middle), the distance moved due to the last action, calculated from forward kinematics, the hole and peg features detected in the 2D images captured before and after the action (left), and the force/torque sensor reading from the robot. The context encoder (right) uses these data to estimate the unknown information about the task, i.e., the actual hole position, and adapts the policy, which produces the incremental robot motion.
  • Figure 3: The context encoder neural network of PEARL [pearl] maps the collected context data $c=(o,a,o^\prime,r)$ to a posterior Gaussian distribution representing the agent's belief over the current task, from which latent variables are sampled to adapt the policy.
  • Figure 4: The observation is the peg position relative to the estimated hole. The reward is the negative $L_2$ distance between the peg and the actual hole position. Different actual hole positions yield different rewards but the same observation [metarlinsertion].
  • Figure 5: An example of tasks with the same reward and the same original context data [metarlinsertion], and the corresponding posterior Gaussian distribution. Tasks with the same original context data require opposite actions, resulting in a high-variance posterior Gaussian distribution.
  • ...and 9 more figures
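
The ambiguity described in the captions of Figures 4 and 5 can be illustrated with a small numerical sketch. It is not the authors' code: the function names, the 2D setting, and all coordinates are hypothetical and chosen only for illustration. The only parts taken from the captions are the definitions themselves, i.e., the observation is the peg position relative to the estimated hole, and the reward is the negative $L_2$ distance between the peg and the actual hole.

```python
import numpy as np


def observation(peg_pos, estimated_hole):
    """Peg position relative to the ESTIMATED hole (what the agent observes)."""
    return np.asarray(peg_pos, dtype=float) - np.asarray(estimated_hole, dtype=float)


def reward(peg_pos, actual_hole):
    """Negative L2 distance between the peg and the ACTUAL hole."""
    diff = np.asarray(peg_pos, dtype=float) - np.asarray(actual_hole, dtype=float)
    return -float(np.linalg.norm(diff))


def direction_to_hole(peg_pos, actual_hole):
    """Motion that would bring the peg to the actual hole (unknown to the agent)."""
    return np.asarray(actual_hole, dtype=float) - np.asarray(peg_pos, dtype=float)


# Hypothetical 2D setup (metres): the noisy sensor reports the hole at the origin,
# and the peg currently sits exactly on that estimate.
estimated_hole = np.array([0.0, 0.0])
peg_pos = np.array([0.0, 0.0])

# Three hypothetical tasks that differ only in the actual hole position.
hole_left = np.array([-0.01, 0.0])
hole_right = np.array([0.01, 0.0])
hole_far = np.array([0.03, 0.0])

# The observation does not depend on the actual hole, so it is identical in all tasks.
obs = observation(peg_pos, estimated_hole)

# Figure 4: same observation, different rewards for different actual holes.
r_right = reward(peg_pos, hole_right)
r_far = reward(peg_pos, hole_far)

# Figure 5: two tasks with the same observation AND the same reward,
# yet the motions needed to reach the hole point in opposite directions.
r_left = reward(peg_pos, hole_left)
move_left = direction_to_hole(peg_pos, hole_left)
move_right = direction_to_hole(peg_pos, hole_right)

print("observation:", obs)
print("rewards (left, right, far):", r_left, r_right, r_far)
print("required motions (left, right):", move_left, move_right)
```

In this toy setup, the tuple $(o,a,o^\prime,r)$ cannot distinguish the left task from the right task after a single step, which is consistent with the high-variance posterior described for Figure 5.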