Empirical Analysis of Sim-and-Real Cotraining of Diffusion Policies for Planar Pushing from Pixels

Adam Wei, Abhinav Agarwal, Boyuan Chen, Rohan Bosworth, Nicholas Pfaff, Russ Tedrake

TL;DR

This work investigates cotraining diffusion-based visuomotor policies with both simulated and real robot data, focusing on planar pushing from pixels as a canonical task. It systematically analyzes how mixing ratios, data scales, and distribution shifts affect real-world performance, finding that simulated data substantially boosts performance when real data is scarce, with gains following a power-law trend and plateauing unless real data is increased. The study reveals that reducing physical domain gaps generally yields larger benefits than pushing for high visual fidelity, though some visual discrepancy aids domain discernibility and transfer. By examining mechanisms such as dataset coverage and domain discernibility, the paper provides practical insights for designing simulators and cotraining pipelines in robotics.

Abstract

Cotraining with demonstration data generated both in simulation and on real hardware has emerged as a promising recipe for scaling imitation learning in robotics. This work seeks to elucidate basic principles of this sim-and-real cotraining to inform simulation design, sim-and-real dataset creation, and policy training. Our experiments confirm that cotraining with simulated data can dramatically improve performance, especially when real data is limited. We show that these performance gains scale with additional simulated data up to a plateau; adding more real-world data increases this performance ceiling. The results also suggest that reducing physical domain gaps may be more impactful than visual fidelity for non-prehensile or contact-rich tasks. Perhaps surprisingly, we find that some visual gap can help cotraining -- binary probes reveal that high-performing policies must learn to distinguish simulated domains from real. We conclude by investigating this nuance and mechanisms that facilitate positive transfer between sim-and-real. Focusing narrowly on the canonical task of planar pushing from pixels allows us to be thorough in our study. In total, our experiments span 50+ real-world policies (evaluated on 1000+ trials) and 250 simulated policies (evaluated on 50,000+ trials). Videos and code can be found at https://sim-and-real-cotraining.github.io/.
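The mixing ratio $\alpha$ that the study sweeps can be illustrated with a small sampling sketch. This is a minimal illustration and not the authors' implementation. It assumes that $\alpha$ is the probability of drawing a training sample from the real dataset, and that the "natural" mixing ratio is the real fraction obtained by simply concatenating the two datasets. The function names and the list-based datasets are hypothetical.

```python
import random


def natural_mixing_ratio(n_real: int, n_sim: int) -> float:
    """Real-data fraction if both datasets were concatenated (assumed definition)."""
    if n_real < 0 or n_sim < 0 or n_real + n_sim == 0:
        raise ValueError("dataset sizes must be non-negative and not both zero")
    return n_real / (n_real + n_sim)


def sample_cotraining_batch(real, sim, alpha, batch_size, rng):
    """Draw a batch; each element comes from `real` with probability alpha, else from `sim`.

    Assumption: alpha weights the real dataset. `real` and `sim` are plain
    Python sequences standing in for demonstration datasets.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    batch = []
    for _ in range(batch_size):
        # rng.random() is in [0, 1), so alpha=1 always picks real and alpha=0 always picks sim.
        source = real if rng.random() < alpha else sim
        batch.append(rng.choice(source))
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    real_demos = [("real", i) for i in range(50)]    # hypothetical dataset sizes
    sim_demos = [("sim", i) for i in range(2000)]
    alpha = natural_mixing_ratio(len(real_demos), len(sim_demos))
    batch = sample_cotraining_batch(real_demos, sim_demos, alpha, 64, rng)
    print(f"alpha = {alpha:.4f}, real samples in batch = {sum(d == 'real' for d, _ in batch)}")
```

Under these assumptions, raising `alpha` above the natural ratio upweights the scarce real data, which is the kind of reweighting a mixing-ratio sweep explores.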

Paper Structure

This paper contains 34 sections, 6 equations, 17 figures, 2 tables.

Figures (17)

  • Figure 1: Sim-and-real cotraining aims to train visuomotor policies using both simulated and real-world robot data to maximize performance on a real-world objective.
  • Figure 2: An example of planar pushing \cite{graesdal2024tightconvexrelaxationscontactrich}. The circle is the pusher, the black T is the slider, and the yellow T is the goal.
  • Figure 3: Real-world performance of cotrained policies at different data scales and mixing ratios. $\bigstar$ depicts the optimal $\alpha$ and $\blacksquare$ depicts the natural mixing ratio. When real data is limited, cotraining with sim data can improve performance by 2-7x.
  • Figure 4: Pretraining with a cotraining mixture significantly outperforms pretraining with sim-only.
  • Figure 5: A visualization and comparison of the sim2real gap in Section \ref{sec:real_world} and the sim2target gap in Section \ref{sec:sim}.
  • ...and 12 more figures