A General Peg-in-Hole Assembly Policy Based on Domain Randomized Reinforcement Learning

Xinyu Liu, Aljaz Kramberger, Leon Bodenhagen

TL;DR

This work tackles generalization for peg-in-hole assembly in six degrees of freedom by learning a general policy (GenPiH) through PPO in a highly domain-randomized, dynamic simulation environment. It introduces a large-scale parallel training setup (8,192 environments) and demonstrates near-100% insertion success across diverse hole poses, followed by sim-to-real validation on a UR10e robot without task-specific tuning. The method relies on a dynamic simulation pipeline, a two-layer neural network policy, and a reward structure combining dense and sparse signals to drive robust 6-DOF alignment and insertion. The results indicate strong generalization and practical viability, with future work aimed at optimizing trajectories to reduce redundant motion and improve efficiency.
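
The summary does not give reward weights, randomization bounds, or observation details. The following is therefore only a minimal Python sketch of the two ideas named above, under stated assumptions. It draws hole poses uniformly at random, one per parallel environment (8,192 as reported). Its reward adds a dense alignment penalty to a sparse success bonus. The names `sample_hole_pose` and `shaped_reward`, the ranges, weights, and target depth, and the roll/pitch/yaw orientation error are all hypothetical placeholders, not GenPiH's actual design.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical randomization bounds (not taken from the paper).
POS_RANGE_M = 0.10                 # +/- 10 cm per axis
ROT_RANGE_RAD = math.radians(30)   # +/- 30 degrees per axis
NUM_ENVS = 8192                    # parallel environment count reported in the summary


@dataclass
class Pose:
    """6-DOF pose: position in metres, orientation as roll/pitch/yaw in radians."""
    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float


def sample_hole_pose(rng: random.Random) -> Pose:
    """Draw one hole pose uniformly within the (assumed) bounds."""
    position = [rng.uniform(-POS_RANGE_M, POS_RANGE_M) for _ in range(3)]
    rotation = [rng.uniform(-ROT_RANGE_RAD, ROT_RANGE_RAD) for _ in range(3)]
    return Pose(*position, *rotation)


def shaped_reward(peg: Pose, hole: Pose, insertion_depth: float,
                  target_depth: float = 0.02, w_pos: float = 1.0,
                  w_rot: float = 0.5, success_bonus: float = 10.0):
    """Return (reward, done): dense alignment penalty plus sparse success bonus.

    All weights and thresholds are hypothetical placeholders.
    """
    # Dense term: Euclidean distance between peg and hole positions.
    pos_err = math.dist((peg.x, peg.y, peg.z), (hole.x, hole.y, hole.z))
    # Dense term: per-axis angle differences wrapped to [-pi, pi]
    # (a simplification; a quaternion distance avoids Euler-angle ambiguities).
    angle_diffs = [math.remainder(a - b, 2 * math.pi) for a, b in
                   ((peg.roll, hole.roll), (peg.pitch, hole.pitch), (peg.yaw, hole.yaw))]
    rot_err = math.sqrt(sum(d * d for d in angle_diffs))
    reward = -(w_pos * pos_err + w_rot * rot_err)

    # Sparse term: one-off bonus once the peg reaches the target depth.
    done = insertion_depth >= target_depth
    if done:
        reward += success_bonus
    return reward, done


# One independently randomized hole pose per parallel environment.
rng = random.Random(0)
hole_poses = [sample_hole_pose(rng) for _ in range(NUM_ENVS)]

# Perfectly aligned and fully inserted peg.
reward, done = shaped_reward(hole_poses[0], hole_poses[0], insertion_depth=0.02)
print(reward, done)  # prints: 10.0 True
```

The dense term gives the policy a learning signal toward alignment on every step, while the sparse bonus marks task completion. Combining the two is one common way to avoid the exploration problem of a purely sparse insertion reward.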

Abstract

Generalization is important for peg-in-hole assembly, a fundamental industrial operation, because it lets the process adapt to dynamic industrial scenarios and improves manufacturing efficiency. While prior work has improved generalization to pose variations, spatial generalization in six degrees of freedom (6-DOF) has received less attention, which limits application in real-world scenarios. This paper addresses this limitation by developing a general policy, GenPiH, using Proximal Policy Optimization (PPO) and dynamic simulation with domain randomization. The policy learning experiment demonstrates the policy's generalization ability with a nearly 100% insertion success rate across more than eight thousand unique hole poses in parallel environments, and sim-to-real validation on a UR10e robot confirms the policy's performance through direct trajectory execution without task-specific tuning.
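
PPO updates the policy by maximizing a clipped surrogate objective. The following minimal Python sketch computes the standard clipped loss for a batch of samples. The function name `ppo_clip_loss` and the clip range ε = 0.2 (a common default for PPO) are assumptions, since the abstract does not report GenPiH's hyperparameters or training code.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negated PPO clipped surrogate objective, averaged over a batch.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates for the same samples.
    clip_eps = 0.2 is an assumed default, not a value from the paper.
    """
    if not advantages or not (len(new_logp) == len(old_logp) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # r_t = pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # PPO maximizes the surrogate, so return its negation as a loss.
    return -total / len(advantages)


# Sample 1: ratio e^0.5 > 1.2 with positive advantage is capped at 1.2.
# Sample 2: ratio 0.5 < 0.8 with negative advantage is penalized at 0.8.
loss = ppo_clip_loss([0.5, math.log(0.5)], [0.0, 0.0], [1.0, -1.0])
print(round(loss, 6))  # prints: -0.2
```

Clipping removes the incentive to push the probability ratio outside [1 - ε, 1 + ε], which keeps each update close to the policy that collected the data.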

Paper Structure

This paper contains 12 sections, 8 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Training pipeline.
  • Figure 2: Policy learning metrics.
  • Figure 3: The assembly process in the real-world experiment and the joint trajectories.