Flow-based Domain Randomization for Learning and Sequencing Robotic Skills
Aidan Curtis, Eric Li, Michael Noseworthy, Nishad Gothoskar, Sachin Chitta, Hui Li, Leslie Pack Kaelbling, Nicole Carey
TL;DR
This work targets the robustness gap in sim-to-real robot learning by learning the domain-randomization (DR) distribution itself rather than specifying it by hand. The proposed method, GoFlow, couples a normalizing-flow neural sampler with an entropy-regularized objective that maximizes policy performance across diverse environments. This yields more flexible, expressive sampling than fixed or simple parametric DR, and the paper reports better domain coverage in six simulated domains and a real gear-insertion task. The learned distributions are also integrated into a belief-space planning framework, where a privileged value function is used to detect out-of-distribution states and guide information gathering for long-horizon manipulation under partial observability. The authors acknowledge training variance and the need for careful threshold tuning.
Abstract
Domain randomization in reinforcement learning is an established technique for increasing the robustness of control policies trained in simulation. By randomizing environment properties during training, the learned policy can become robust to uncertainties along the randomized dimensions. While the environment distribution is typically specified by hand, in this paper we investigate automatically discovering a sampling distribution via entropy-regularized reward maximization of a normalizing-flow-based neural sampling distribution. We show that this architecture is more flexible and provides greater robustness than existing approaches that learn simpler, parameterized sampling distributions, as demonstrated in six simulated and one real-world robotics domain. Lastly, we explore how these learned sampling distributions, combined with a privileged value function, can be used for out-of-distribution detection in an uncertainty-aware multi-step manipulation planner.
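To make the phrase "entropy-regularized reward maximization" concrete, here is a minimal sketch under stated assumptions. It is not the paper's implementation: the normalizing flow is replaced by a categorical distribution over a small grid of environment parameters, the per-environment returns are made-up numbers, and the objective is assumed to be expected return plus `alpha` times the entropy of the sampling distribution. The names `entropy_regularized_sampler`, `returns`, and `alpha` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    """Shannon entropy (in nats) of a strictly positive distribution."""
    return float(-(p * np.log(p)).sum())

def entropy_regularized_sampler(returns, alpha, lr=0.5, steps=20000):
    """Gradient ascent on logits of a categorical sampling distribution.

    Assumed objective (illustrative, not taken from the paper):
        J(p) = sum_i p_i * returns_i + alpha * H(p)
    The categorical distribution stands in for the normalizing flow.
    """
    logits = np.zeros_like(returns, dtype=float)
    for _ in range(steps):
        p = softmax(logits)
        # dJ/dp_i up to an additive constant that cancels below
        g = returns - alpha * np.log(p)
        # chain rule through the softmax: dJ/dlogit_j = p_j * (g_j - E_p[g])
        logits += lr * p * (g - p @ g)
    return softmax(logits)

# Hypothetical 1-D environment parameter and made-up policy returns:
# the policy succeeds (+1) near the nominal value and fails (-1) far from it.
xi = np.linspace(-2.0, 2.0, 9)
returns = np.where(np.abs(xi) <= 1.0, 1.0, -1.0)

p_low = entropy_regularized_sampler(returns, alpha=0.5)   # weak entropy bonus
p_high = entropy_regularized_sampler(returns, alpha=5.0)  # strong entropy bonus
```

For this toy objective the optimum is known in closed form, `softmax(returns / alpha)`, so a small `alpha` concentrates sampling on environments the policy already solves, while a large `alpha` spreads mass toward the full parameter range. A flow-based sampler would play the same role over continuous, multi-dimensional parameters, with the entropy term estimated from samples rather than computed exactly.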
