Empirical Analysis of Sim-and-Real Cotraining of Diffusion Policies for Planar Pushing from Pixels

Adam Wei; Abhinav Agarwal; Boyuan Chen; Rohan Bosworth; Nicholas Pfaff; Russ Tedrake

TL;DR

This work investigates cotraining diffusion-based visuomotor policies on both simulated and real robot data, using planar pushing from pixels as a canonical task. It systematically analyzes how mixing ratios, data scales, and distribution shifts affect real-world performance. Simulated data substantially boosts performance when real data is scarce; the gains follow a power-law trend and plateau unless more real data is added. Reducing physical domain gaps generally yields larger benefits than pursuing high visual fidelity, although some visual discrepancy helps the policy tell the domains apart and aids transfer. By examining mechanisms such as dataset coverage and domain discernibility, the paper offers practical guidance for designing simulators and cotraining pipelines in robotics.

Abstract

Cotraining with demonstration data generated both in simulation and on real hardware has emerged as a promising recipe for scaling imitation learning in robotics. This work seeks to elucidate basic principles of this sim-and-real cotraining to inform simulation design, sim-and-real dataset creation, and policy training. Our experiments confirm that cotraining with simulated data can dramatically improve performance, especially when real data is limited. We show that these performance gains scale with additional simulated data up to a plateau; adding more real-world data increases this performance ceiling. The results also suggest that reducing physical domain gaps may be more impactful than visual fidelity for non-prehensile or contact-rich tasks. Perhaps surprisingly, we find that some visual gap can help cotraining -- binary probes reveal that high-performing policies must learn to distinguish simulated domains from real. We conclude by investigating this nuance and mechanisms that facilitate positive transfer between sim-and-real. Focusing narrowly on the canonical task of planar pushing from pixels allows us to be thorough in our study. In total, our experiments span 50+ real-world policies (evaluated on 1000+ trials) and 250 simulated policies (evaluated on 50,000+ trials). Videos and code can be found at https://sim-and-real-cotraining.github.io/.
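To make the notion of a sim-and-real mixing ratio concrete, the following is a minimal sketch, not the authors' implementation. It assumes each training sample is drawn from the real dataset with probability `alpha` and from the simulated dataset otherwise. The function name, the `alpha` parameter, and the domain labels are all hypothetical and chosen only for illustration.

```python
import random


def sample_cotraining_batch(real_data, sim_data, alpha, batch_size, rng=None):
    """Draw a batch in which each item is real with probability alpha.

    Assumption (not from the paper): the mixing ratio is applied
    independently per sample rather than per batch or per epoch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = rng or random.Random()
    batch = []
    for _ in range(batch_size):
        if rng.random() < alpha:
            # Keep a domain label alongside the sample for later inspection.
            batch.append(("real", rng.choice(real_data)))
        else:
            batch.append(("sim", rng.choice(sim_data)))
    return batch


# Hypothetical usage: 10 real demonstrations, 50 simulated ones.
real_demos = list(range(10))
sim_demos = list(range(100, 150))
batch = sample_cotraining_batch(real_demos, sim_demos, alpha=0.25,
                                batch_size=8, rng=random.Random(0))
```

Sweeping `alpha` while holding the dataset sizes fixed is one simple way to study how the mixture affects performance; the actual ratios and training procedure should be taken from the paper and its released code.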
