TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction

Yunfan Jiang; Chen Wang; Ruohan Zhang; Jiajun Wu; Li Fei-Fei

TL;DR

TRANSIC tackles sim-to-real transfer for contact-rich manipulation by combining a base policy learned in simulation with a residual policy learned from online human corrections. It trains base policies on 3D point-cloud input and distills them into joint-position controllers, then collects human corrections during real-world execution to train a gated residual that is fused with the base policy at deployment. Across four FurnitureBench tasks, the framework transfers successfully with substantially less real-world data, is robust to multiple types of sim-to-real gap, generalizes to unseen objects, and scales with human effort. Behaviors such as error recovery and safety-aware actions emerge, which support longer-horizon manipulation and safer deployment, with potential relevance to generalist robots.
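
As an illustration of what "a gated residual fused with the base policy" could look like at deployment, the sketch below adds a residual correction to the base action only when a gate fires. The additive fusion rule, the `threshold` value, and the names `fuse_actions` and `gate_prob` are assumptions made for this example; the summary above does not specify them.

```python
from typing import List, Sequence


def fuse_actions(
    base_action: Sequence[float],
    residual_action: Sequence[float],
    gate_prob: float,
    threshold: float = 0.5,  # hypothetical gating threshold
) -> List[float]:
    """Return the base action, plus the residual only when the gate fires.

    Assumption: the gate outputs a probability that a correction is needed,
    and the correction is applied additively to the base action.
    """
    if len(base_action) != len(residual_action):
        raise ValueError("base and residual actions must have the same length")
    if gate_prob < threshold:
        # Gate closed: execute the simulation-trained base policy unchanged.
        return list(base_action)
    # Gate open: apply the learned correction on top of the base action.
    return [b + r for b, r in zip(base_action, residual_action)]


if __name__ == "__main__":
    base = [0.10, 0.20]
    residual = [0.05, -0.10]
    print(fuse_actions(base, residual, gate_prob=0.2))  # gate closed
    print(fuse_actions(base, residual, gate_prob=0.9))  # gate open
```

Keeping the gate separate from the residual means the base policy runs untouched whenever no correction is predicted, so the residual only needs to be accurate in the states where humans actually intervened.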

Abstract

Learning in simulation and transferring the learned policy to the real world has the potential to enable generalist robots. The key challenge of this approach is to address simulation-to-reality (sim-to-real) gaps. Previous methods often require domain-specific knowledge a priori. We argue that a straightforward way to obtain such knowledge is by asking humans to observe and assist robot policy execution in the real world. The robots can then learn from humans to close various sim-to-real gaps. We propose TRANSIC, a data-driven approach to enable successful sim-to-real transfer based on a human-in-the-loop framework. TRANSIC allows humans to augment simulation policies to overcome various unmodeled sim-to-real gaps holistically through intervention and online correction. Residual policies can be learned from human corrections and integrated with simulation policies for autonomous execution. We show that our approach can achieve successful sim-to-real transfer in complex and contact-rich manipulation tasks such as furniture assembly. Through synergistic integration of policies learned in simulation and from humans, TRANSIC is effective as a holistic approach to addressing various, often coexisting sim-to-real gaps. It displays attractive properties such as scaling with human effort. Videos and code are available at https://transic-robot.github.io/
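
The human-in-the-loop step described in the abstract can be sketched as a logging loop: the base policy proposes an action, an operator optionally overrides it, and each step is stored with an intervention flag. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `CorrectionSample`, `collect_corrections`, and `operator` are hypothetical, and defining the residual target as the human action minus the base action is one plausible choice that the abstract does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Policy = Callable[[Sequence[float]], Sequence[float]]
# Hypothetical operator interface: returns a corrected action, or None
# when the human chooses not to intervene at this step.
Operator = Callable[[Sequence[float], Sequence[float]], Optional[Sequence[float]]]


@dataclass
class CorrectionSample:
    observation: List[float]
    base_action: List[float]
    residual_target: List[float]  # assumed: human action minus base action
    intervened: bool              # label a gate could be trained on


def collect_corrections(
    observations: Sequence[Sequence[float]],
    base_policy: Policy,
    operator: Operator,
) -> List[CorrectionSample]:
    """Log one sample per step, recording whether the human intervened."""
    samples: List[CorrectionSample] = []
    for obs in observations:
        base = list(base_policy(obs))
        human = operator(obs, base)
        if human is None:
            # No intervention: the residual target is zero.
            residual = [0.0] * len(base)
        else:
            residual = [h - b for h, b in zip(human, base)]
        samples.append(
            CorrectionSample(list(obs), base, residual, human is not None)
        )
    return samples
```

The `intervened` flag and `residual_target` together give supervised targets for a gate and a residual policy, respectively, under the assumptions above.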

Paper Structure (80 sections, 9 equations, 27 figures, 16 tables, 1 algorithm)

Introduction
Sim-to-Real Policy Transfer by Learning from Online Correction
Preliminaries
Learning Base Policies in Simulation with RL
Policy Learning with 3D Representation
Action Space Distillation
Learning Residual Policies from Online Correction
Human-in-the-Loop Data Collection
Human Correction as Residual Policies
An Integrated Deployment Framework
Implementation Details
Experiments
Tasks, Baselines, and Evaluation Protocol
Results
TRANSIC is effective for sim-to-real transfer and requires significantly less real-world data (Q1).
...and 65 more sections

Figures (27)

Figure 1: TRANSIC for sim-to-real transfer in contact-rich robotic manipulation tasks. a) and b) Naïvely deploying policies trained in simulation usually fails due to various sim-to-real gaps. Here, the robot attempts to first align the light bulb with the base and then insert and screw the light bulb into the base. c) A human operator monitors robot behaviors, intervenes, and provides online correction through teleoperation when necessary. Human data are collected to train a residual policy that tackles various sim-to-real gaps in a holistic manner. d) The simulation and residual policies are integrated at test time to achieve a successful sim-to-real transfer for contact-rich tasks, such as screwing a light bulb into the base.
Figure 2: TRANSIC method overview. a) Base policies are first trained in simulation through action space distillation with demonstrations generated by RL teacher policies. Base policies take point clouds as input to reduce the perception gap. b) The acquired base policies are then deployed with a human operator monitoring the execution. The human intervenes and corrects through teleoperation when necessary. Such correction data are collected to learn residual policies. Finally, both residual policies and base policies are integrated at test time to achieve a successful transfer.
Figure 3: Four tasks benchmarked in this work. They are fundamental skills required to assemble a square table from FurnitureBench (Heo et al., 2023). The task definitions can be found in the Appendix.
Figure 4: Average success rates over four benchmarked tasks. Numerical results are given in the supplementary main-results table.
Figure 5: Robustness to different sim-to-real gaps. Numbers are averaged success rates (%). Polar bars represent performances after training with data collected specifically to address a particular gap. Dashed lines are zero-shot performances. Shaded circles show average performances.
...and 22 more figures
