Domain Randomization is Sample Efficient for Linear Quadratic Control

Tesshu Fujinami, Bruce D. Lee, Nikolai Matni, George J. Pappas

TL;DR

This work analyzes sample efficiency for learning-based LQR control under model uncertainty, comparing domain randomization (DR), certainty equivalence (CE), and robust control (RC). It proves that, with an appropriately designed sampling distribution, DR attains the optimal asymptotic $1/N$ excess-cost decay that matches CE, while RC yields favorable burn-in in low-data regimes due to robust stabilization. The authors introduce a gradient-based DR algorithm and validate the theoretical trends through numerical experiments on both linear and nonlinear pendulum systems. They also derive tighter RC bounds showing a fundamental trade-off between asymptotic efficiency and conservatism, and discuss extensions to nonlinear systems and misspecification. Overall, the paper provides a principled account of when DR can outperform RC and CE in learning-enabled control and highlights open questions for broader applicability.

Abstract

We study the sample efficiency of domain randomization and robust control for the benchmark problem of learning the linear quadratic regulator (LQR). Domain randomization, which synthesizes controllers by minimizing average performance over a distribution of model parameters, has achieved empirical success in robotics, but its theoretical properties remain poorly understood. We establish that with an appropriately chosen sampling distribution, domain randomization achieves the optimal asymptotic rate of decay in the excess cost, matching certainty equivalence. We further demonstrate that robust control, while potentially overly conservative, exhibits superior performance in the low-data regime due to its ability to stabilize uncertain systems with coarse parameter estimates. We propose a gradient-based algorithm for domain randomization that performs well in numerical experiments, which enables us to validate the trends predicted by our analysis. These results provide insights into the use of domain randomization in learning-enabled control, and highlight several open questions about its application to broader classes of systems.
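As a rough illustration of the synthesis rule described in the abstract (minimizing average performance over a distribution of model parameters), the sketch below works through a scalar LQR instance. It is not the paper's algorithm. The scalar system $x_{t+1} = a x_t + b u_t + w_t$ with unit-variance noise, the sign convention $u_t = -k x_t$, the Gaussian sampling distribution with standard deviation 0.05, the backtracking step rule, and all function names are assumptions made for this example.

```python
import random

def lqr_cost(k, a, b, q=1.0, r=1.0):
    """Steady-state average cost of u = -k*x on x+ = a*x + b*u + w, Var(w) = 1."""
    c = a - b * k                      # closed-loop pole
    if abs(c) >= 1.0:
        return float("inf")            # closed loop is not stable
    return (q + r * k * k) / (1.0 - c * c)

def lqr_cost_grad(k, a, b, q=1.0, r=1.0):
    """Derivative of lqr_cost with respect to k (valid when |a - b*k| < 1)."""
    c = a - b * k
    s = 1.0 - c * c
    return (2.0 * r * k * s - (q + r * k * k) * 2.0 * c * b) / (s * s)

def dr_cost(k, models):
    """Average cost of gain k over the sampled models (the DR objective)."""
    return sum(lqr_cost(k, a, b) for a, b in models) / len(models)

def dr_grad(k, models):
    return sum(lqr_cost_grad(k, a, b) for a, b in models) / len(models)

def domain_randomized_gain(models, k0, steps=200, step_size=0.1):
    """Gradient descent on dr_cost with backtracking (an assumed step rule).

    k0 must stabilize every sampled model so that the initial cost is finite.
    """
    k = k0
    for _ in range(steps):
        g = dr_grad(k, models)
        eta = step_size
        # Halve the step while it would raise the cost or destabilize a sample.
        while eta > 1e-12 and dr_cost(k - eta * g, models) > dr_cost(k, models):
            eta *= 0.5
        if eta <= 1e-12:
            break
        k -= eta * g
    return k

# Hypothetical estimate and sampling distribution (assumptions for illustration).
a_hat, b_hat = 1.2, 1.0
rng = random.Random(0)
models = [(a_hat + rng.gauss(0.0, 0.05), b_hat + rng.gauss(0.0, 0.05))
          for _ in range(50)]

k0 = a_hat / b_hat                                     # deadbeat gain for the estimate
k_ce = domain_randomized_gain([(a_hat, b_hat)], k0)    # certainty equivalence
k_dr = domain_randomized_gain(models, k0)              # domain randomization

print("CE gain:", k_ce, "DR gain:", k_dr)
print("DR objective at CE gain:", dr_cost(k_ce, models))
print("DR objective at DR gain:", dr_cost(k_dr, models))
```

Running the same descent on the single estimated model gives a certainty-equivalent gain, so the two printed objectives show how averaging over sampled models changes the synthesized controller in this toy setting.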

Paper Structure

This paper contains 33 sections, 24 theorems, 131 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

Suppose the dataset $\left\{(X_t^n, U_t^n, X_{t+1}^n)\right\}_{t=1, n=1}^{T,N}$ is collected from $N$ trajectories of the system eq: linear system via a random control input $U_t \sim \mathcal{N}(0, \Sigma_u)$. Let $\hat{\theta}$ be the least squares estimate computed by eq: leas where $L_{\mathsf{DR}}(\theta^\star, \delta) = \mathsf{poly}(d_{\theta}, \max\mathopen{}\left\{1, \
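The data-collection setup in Lemma 1 (several trajectories driven by random Gaussian inputs, followed by a least squares fit) can be sketched for a scalar system as follows. This is only an illustration under stated assumptions: the scalar dynamics $x_{t+1} = a x_t + b u_t + w_t$, the zero initial state, the noise level, and the function names `collect_trajectories` and `least_squares_estimate` are hypothetical and do not reproduce the estimator equation the lemma refers to.

```python
import random

def collect_trajectories(a, b, num_traj, horizon,
                         sigma_u=1.0, sigma_w=1.0, seed=0):
    """Roll out num_traj trajectories with inputs u_t ~ N(0, sigma_u^2).

    Returns a flat list of (x_t, u_t, x_{t+1}) transitions. Each trajectory
    starts from x = 0, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(num_traj):
        x = 0.0
        for _ in range(horizon):
            u = rng.gauss(0.0, sigma_u)
            x_next = a * x + b * u + rng.gauss(0.0, sigma_w)
            data.append((x, u, x_next))
            x = x_next
    return data

def least_squares_estimate(data):
    """Fit (a, b) by regressing x_{t+1} on (x_t, u_t) via the normal equations."""
    sxx = sum(x * x for x, u, y in data)
    sxu = sum(x * u for x, u, y in data)
    suu = sum(u * u for x, u, y in data)
    sxy = sum(x * y for x, u, y in data)
    suy = sum(u * y for x, u, y in data)
    det = sxx * suu - sxu * sxu        # determinant of the 2x2 Gram matrix
    a_hat = (suu * sxy - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat

# Hypothetical true parameters, used only to generate example data.
a_true, b_true = 1.2, 1.0
data = collect_trajectories(a_true, b_true, num_traj=200, horizon=10)
a_hat, b_hat = least_squares_estimate(data)
print("estimate:", a_hat, b_hat)
```

Increasing `num_traj` plays the role of $N$ in the lemma: the printed estimate should concentrate around the true parameters as more trajectories are collected.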

Figures (4)

  • Figure 1: Illustration of the sample efficiency of various synthesis methods.
  • Figure 2: Excess cost of controllers found via three methods using models fit with various amounts of data.
  • Figure 3: Cost of CE and DR controllers based on models fit with various amounts of data.
  • Figure 4: Domain Randomized Policy-Gradient for the Linear Quadratic Regulator

Theorems & Definitions (26)

  • Lemma 1
  • Theorem 1
  • Definition 2: Robust Stabilizability
  • Example 1
  • Theorem 3
  • Lemma 2: Performance Difference Lemma, Lemma 12 of fazel2018global
  • Lemma 3: Lyapunov Perturbation
  • Lemma 4
  • Lemma 5: Riccati Perturbation, Proposition 4 and 6 of simchowitz2020naive
  • Lemma 6: Simplifying inequalities
  • ...and 16 more