SPARR: Simulation-based Policies with Asymmetric Real-world Residuals for Assembly

Yijie Guo, Iretiayo Akinola, Lars Johannsmeier, Hugo Hadfield, Abhishek Gupta, Yashraj Narang

TL;DR

This work proposes a hybrid approach that combines a simulation-trained base policy with a real-world residual policy to efficiently adapt to real-world variations and achieves near-perfect success rates across diverse two-part assembly tasks.

Abstract

Robotic assembly presents a long-standing challenge due to its requirement for precise, contact-rich manipulation. While simulation-based learning has enabled the development of robust assembly policies, their performance often degrades when deployed in real-world settings due to the sim-to-real gap. Conversely, real-world reinforcement learning (RL) methods avoid the sim-to-real gap, but rely heavily on human supervision and generalize poorly to environmental changes. In this work, we propose a hybrid approach that combines a simulation-trained base policy with a real-world residual policy to efficiently adapt to real-world variations. The base policy, trained in simulation using low-level state observations and dense rewards, provides strong priors for initial behavior. The residual policy, learned in the real world using visual observations and sparse rewards, compensates for discrepancies in dynamics and sensor noise. Extensive real-world experiments demonstrate that our method, SPARR, achieves near-perfect success rates across diverse two-part assembly tasks. Compared to state-of-the-art zero-shot sim-to-real methods, SPARR improves success rates by 38.4% while reducing cycle time by 29.7%. Moreover, SPARR requires no human expertise, in contrast to state-of-the-art real-world RL approaches that depend heavily on human supervision.
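The combination of a state-based base policy with an image-based residual policy can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the additive form of the correction, the `residual_scale` factor, and the clipping limit are all assumptions made for illustration, and the abstract does not specify how the residual is scaled or what inputs beyond images it receives.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def combine_actions(
    base_action: Vector,
    residual_action: Vector,
    residual_scale: float = 0.1,  # assumed: keeps the correction small
    action_limit: float = 1.0,    # assumed: symmetric per-dimension bound
) -> List[float]:
    """Add a scaled residual correction to the base action, then clip."""
    if len(base_action) != len(residual_action):
        raise ValueError("base and residual actions must have the same length")
    combined = [b + residual_scale * r for b, r in zip(base_action, residual_action)]
    return [max(-action_limit, min(action_limit, c)) for c in combined]


def asymmetric_step(
    base_policy: Callable[[Vector], Vector],     # state -> action (trained in simulation)
    residual_policy: Callable[[object], Vector], # image -> correction (learned on the robot)
    state: Vector,
    image: object,
    residual_scale: float = 0.1,
) -> List[float]:
    """One control step: the two policies see different observations."""
    base_action = base_policy(state)          # low-level state observation
    residual_action = residual_policy(image)  # visual observation
    return combine_actions(base_action, residual_action, residual_scale)


if __name__ == "__main__":
    # Toy stand-ins for the two policies (hypothetical values).
    toy_base = lambda state: [0.5, -0.2, 0.0]
    toy_residual = lambda image: [1.0, 1.0, -1.0]
    print(asymmetric_step(toy_base, toy_residual, state=[0.0] * 6, image=None))
```

The "asymmetric" part is that the base policy consumes low-level state while the residual consumes images, so the residual can react to real-world discrepancies that the state estimate does not capture. Scaling the residual is one common way to keep early real-world exploration close to the base policy's behavior.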

Paper Structure (22 sections, 7 figures, 2 tables)


Figures (7)

  • Figure 2: Overview of experimental tasks. (Top) 10 AutoMate [tang2024automate] assembly tasks. AutoMate provides a dataset of 100 assembly tasks with diverse parts. We choose 10 of the 100 tasks for which the specialist policy pre-trained in simulation is near-perfect. (Bottom) 3 tasks on the NIST board. NIST (National Institute of Standards and Technology) provides Assembly Task Boards as performance benchmarks for evaluating robotic assembly technologies. We consider the peg and gear insertion tasks on task board #1.
  • Figure 3: Illustration of asymmetric policy combination. State-based base policies from simulation are combined with image-based residual policies learned in the real world.
  • Figure 4: Performance on 10 AutoMate tasks. We evaluate the success rate ($\uparrow$ higher is better) and cycle time ($\downarrow$ lower is better) averaged over 20 episodes. SERL, AutoMate, and SPARR (Ours) transfer simulation-trained policies to the real world without human effort, where SPARR achieves substantially higher success rates and shorter cycle times. HIL-SERL (Oracle) serves as an upper bound, assuming access to near-optimal human demonstrations and continuous human supervision.
  • Figure 5: Policy deployment on different socket poses for AutoMate task 00731. Each box indicates the (x, y) coordinates of the socket pose and the corresponding success rate (0–1) during evaluation. The training socket pose is at (0.48, 0), and for evaluation, the socket is displaced by 2 cm: up (0.50, 0), down (0.46, 0), left (0.48, –0.02), and right (0.48, 0.02). (a) Base policy from simulation. (b) Base policy with a state-based residual policy. (c) SPARR (Ours): base policy with image-based residual policy. The color bar represents success rate from 0 (yellow) to 1 (green). SPARR achieves higher success rates (darker green) and demonstrates robustness to socket pose variations.
  • Figure 6: Adaptation of simulation policies from AutoMate tasks to NIST tasks. We show images from wrist-mounted cameras here. Fig. 2 (bottom) shows the NIST tasks from the front camera view.
  • ...and 2 more figures