Domain Randomization is Sample Efficient for Linear Quadratic Control
Tesshu Fujinami, Bruce D. Lee, Nikolai Matni, George J. Pappas
TL;DR
This work analyzes the sample efficiency of learning-based LQR control under model uncertainty, comparing domain randomization (DR), certainty equivalence (CE), and robust control (RC). It proves that, with an appropriately designed sampling distribution, DR attains the optimal asymptotic $1/N$ rate of decay in the excess cost, matching CE, while RC has a more favorable burn-in in low-data regimes because it can stabilize uncertain systems from coarse parameter estimates. The authors introduce a gradient-based DR algorithm and validate the theoretical trends through numerical experiments on both linear and nonlinear pendulum systems. They also derive tighter RC bounds showing a fundamental trade-off between asymptotic efficiency and conservatism, and discuss extensions to nonlinear systems and misspecification. Overall, the paper provides a principled account of when DR can outperform RC and CE in learning-enabled control and highlights open questions for broader applicability.
Abstract
We study the sample efficiency of domain randomization and robust control for the benchmark problem of learning the linear quadratic regulator (LQR). Domain randomization, which synthesizes controllers by minimizing average performance over a distribution of model parameters, has achieved empirical success in robotics, but its theoretical properties remain poorly understood. We establish that with an appropriately chosen sampling distribution, domain randomization achieves the optimal asymptotic rate of decay in the excess cost, matching certainty equivalence. We further demonstrate that robust control, while potentially overly conservative, exhibits superior performance in the low-data regime due to its ability to stabilize uncertain systems with coarse parameter estimates. We propose a gradient-based algorithm for domain randomization that performs well in numerical experiments, which enables us to validate the trends predicted by our analysis. These results provide insights into the use of domain randomization in learning-enabled control, and highlight several open questions about its application to broader classes of systems.
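To make the contrast between certainty equivalence and domain randomization concrete, the following is a minimal sketch on a scalar system $x_{t+1} = a x_t + b u_t + w_t$ with static feedback $u_t = -k x_t$. It is not the authors' algorithm: the scalar setting, the uniform sampling box around the estimate, the finite-difference gradient, the backtracking step rule, and all names (`lqr_cost`, `ce_gain`, `dr_gain`, `a_hat`, `b_hat`) are assumptions chosen for illustration.

```python
import numpy as np

def lqr_cost(k, a, b, q=1.0, r=1.0):
    """Steady-state cost of u = -k x on x+ = a x + b u + w, with unit-variance w."""
    a_cl = a - b * k
    if abs(a_cl) >= 1.0:
        return np.inf  # gain does not stabilize this model
    return (q + r * k**2) / (1.0 - a_cl**2)

def ce_gain(a, b, q=1.0, r=1.0, iters=500):
    """Certainty-equivalent gain: solve the scalar Riccati recursion for the estimate."""
    p = q
    for _ in range(iters):
        p = q + a**2 * p - (a * b * p) ** 2 / (r + b**2 * p)
    return a * b * p / (r + b**2 * p)

def average_cost(k, a_samples, b_samples):
    """Domain randomization objective: mean cost over sampled models."""
    return float(np.mean([lqr_cost(k, a, b) for a, b in zip(a_samples, b_samples)]))

def dr_gain(a_samples, b_samples, k0, step=0.05, iters=300, eps=1e-6):
    """Gradient descent on the average cost (finite-difference gradient, backtracking).

    Assumes k0 stabilizes every sampled model, so the initial cost is finite.
    """
    k, j = k0, average_cost(k0, a_samples, b_samples)
    for _ in range(iters):
        grad = (average_cost(k + eps, a_samples, b_samples)
                - average_cost(k - eps, a_samples, b_samples)) / (2 * eps)
        s = step
        while s > 1e-12:
            k_try = k - s * grad
            j_try = average_cost(k_try, a_samples, b_samples)
            if j_try < j:  # accept only steps that lower the average cost
                k, j = k_try, j_try
                break
            s *= 0.5
        else:
            break  # no improving step found: treat as converged
    return k

# Hypothetical parameter estimate and a uniform sampling box around it.
a_hat, b_hat = 1.2, 1.0
rng = np.random.default_rng(0)
a_s = a_hat + 0.1 * rng.uniform(-1.0, 1.0, size=20)
b_s = b_hat + 0.1 * rng.uniform(-1.0, 1.0, size=20)

k_ce = ce_gain(a_hat, b_hat)         # optimal for the point estimate only
k_dr = dr_gain(a_s, b_s, k0=k_ce)    # minimizes the sample-average cost
print(k_ce, k_dr, average_cost(k_ce, a_s, b_s), average_cost(k_dr, a_s, b_s))
```

The certainty-equivalent gain is used as the starting point because, in this example, it stabilizes every sampled model. In general the choice of initial gain and of sampling distribution matters, and the sketch above makes no claim about the distribution the analysis requires.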
