On Globally Optimal Stochastic Policy Gradient Methods for Domain Randomized LQR Synthesis

Alex Nguyen-Le; Nikolai Matni

Abstract

Domain randomization is a simple, effective, and flexible scheme for obtaining robust feedback policies aimed at reducing the sim-to-real gap due to model mismatch. While domain randomization methods have yielded impressive demonstrations in the robotics-learning literature, general and theoretically motivated principles for designing optimization schemes that effectively leverage the randomization are largely unexplored. We address this gap by considering a stochastic policy gradient descent method for the domain randomized linear-quadratic regulator (LQR) synthesis problem, a setting simple enough to admit theoretical guarantees. In particular, we demonstrate that stochastic gradient descent, with gradients computed on systems newly sampled at each step, converges to global optima under appropriate hyperparameter choices, and yields better controllers with lower variability than approaches that do not resample. Because sampling is often quick and cheap, computing policy gradients with newly sampled systems at each iteration is preferable to evaluating gradients on a fixed set of systems.
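The resampling scheme described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a discrete-time, infinite-horizon LQR with control law $u=-Kx$, uses the standard exact policy-gradient expression for that setting, and draws systems from a hypothetical distribution (`A_NOM`, `B_NOM`, `spread`, the cost weights, and all function names are invented for illustration).

```python
import numpy as np

def dlyap(M, W):
    """Solve X = M X M^T + W by vectorization (adequate for small matrices)."""
    n = M.shape[0]
    vec_x = np.linalg.solve(np.eye(n * n) - np.kron(M, M), W.reshape(-1))
    return vec_x.reshape(n, n)

def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    """Cost tr(P_K Sigma0) and its gradient in K for u = -K x on one system."""
    A_cl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1.0:
        raise ValueError("K does not stabilize the sampled system")
    P = dlyap(A_cl.T, Q + K.T @ R @ K)   # P = Q + K^T R K + A_cl^T P A_cl
    Sigma = dlyap(A_cl, Sigma0)          # Sigma = Sigma0 + A_cl Sigma A_cl^T
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return float(np.trace(P @ Sigma0)), grad

# Hypothetical system distribution: a nominal model plus bounded perturbations
# (an assumption for this sketch, chosen so that K = 0 stabilizes every sample).
A_NOM = np.array([[0.7, 0.2], [0.0, 0.6]])
B_NOM = np.array([[0.0], [1.0]])
Q, R, SIGMA0 = np.eye(2), np.eye(1), np.eye(2)

def sample_system(rng, spread=0.05):
    A = A_NOM + spread * rng.uniform(-1.0, 1.0, size=A_NOM.shape)
    return A, B_NOM

def dr_sgd(K0, steps, batch, lr, seed=0):
    """Minibatch SGD that draws a fresh batch of systems at every step."""
    rng = np.random.default_rng(seed)
    K = K0.copy()
    for _ in range(steps):
        grads = [lqr_cost_and_grad(K, *sample_system(rng), Q, R, SIGMA0)[1]
                 for _ in range(batch)]
        K = K - lr * np.mean(grads, axis=0)
    return K

def dr_cost(K, n_eval=200, seed=123):
    """Sample-average estimate of the domain randomized cost on held-out systems."""
    rng = np.random.default_rng(seed)
    costs = [lqr_cost_and_grad(K, *sample_system(rng), Q, R, SIGMA0)[0]
             for _ in range(n_eval)]
    return float(np.mean(costs))

K0 = np.zeros((1, 2))
K_final = dr_sgd(K0, steps=200, batch=8, lr=5e-3)
```

Comparing `dr_cost(K0)` with `dr_cost(K_final)` gives a rough check that the iterates reduce the estimated cost. The fixed-set alternative mentioned in the abstract would correspond to calling `sample_system` once before the loop and reusing the same batch at every step.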

Paper Structure (24 sections, 21 theorems, 95 equations, 1 figure, 1 table, 2 algorithms)

Introduction
Related Work: LQR Policy Gradient Methods
Related Work: Domain Randomization
Main Contributions
Problem Description
Convergence Analysis of Gradient Descent for DR-LQR
DR-LQR Cost is Coercive
DR-LQR Cost is Locally L-Smooth
Gradient Dominance of DR-LQR Cost
Convergence Analysis of SGD for DR-LQR
SGD Analysis Setup
One-Step Decomposition
Concentration of Minibatched-Gradient Estimators
Convergence of Stochastic Gradient Descent
Numerical Experiments
...and 9 more sections

Key Result

Lemma 1

Fix $c>0$. The cost $\mathbb{E}[J(K,\theta)]$ is $L_K$-smooth at any $K\in S_c$, with $L_K$ given by the bound in eq:LK_upper_bound, where the operator norm is the maximum singular value. All quantities in eq:LK_upper_bound can be bounded above by polynomials in problem-specific parameters and $c$.
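To indicate why a smoothness constant matters for step-size selection, the standard descent inequality is sketched here. It assumes that the gradient of $f(K) := \mathbb{E}[J(K,\theta)]$ is $L_K$-Lipschitz along the whole segment from $K$ to $K^+$, which is stronger than smoothness at a single point; the symbols $f$, $K^+$, and $\eta$ are notation introduced for this illustration and are not taken from the paper.

$$f(K^+) \le f(K) + \langle \nabla f(K),\, K^+ - K\rangle + \frac{L_K}{2}\,\|K^+ - K\|_F^2 .$$

Substituting the gradient step $K^+ = K - \eta \nabla f(K)$ with $0 < \eta \le 1/L_K$ gives

$$f(K^+) \le f(K) - \eta\left(1 - \frac{L_K \eta}{2}\right)\|\nabla f(K)\|_F^2 \le f(K) - \frac{\eta}{2}\,\|\nabla f(K)\|_F^2 ,$$

so under this assumption each step decreases the cost by an amount proportional to the squared gradient norm.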

Figures (1)

Figure 1: All panels are generated from 1000 independent trials of 10000 gradient steps each. Left: The median and the 25th and 75th percentiles, over the 1000 independent trials, of the domain randomization cost during optimization with varying minibatch size. To approximate the DR-cost we employ the sample average estimator, $J_{DR}(K) \approx \frac{1}{n_\text{dr}}\sum_{i=1}^{n_\text{dr}} J(K,\theta_i)$, with a large number of samples ($n_\text{dr} = 10^5$) so that the descent dynamics of the DR-cost can be visualized. Center: A zoomed-in view of the descent dynamics of the SA synthesized controller of FLMP25B and the DR synthesized controller obtained by \ref{alg:dr-mba} in this work, with $M=8$ for both optimization procedures, together with the 25th and 75th percentiles of the cost trajectories. Right: The empirical distribution (on a logarithmic scale) of the $\ell_2$ norm of the final controller, illustrating the reduced variance of the controller synthesized by DR compared with SA.

Theorems & Definitions (35)

Lemma 1: Policy-Smoothness
proof
Proposition 1: Theorem 1 of Hu et al. HZLMFB23
Lemma 2: Approximate Gradient-Domination
proof
Proposition 2
Theorem 1: Linear Convergence
proof
Proposition 3: DR Feasible Step Size
Lemma 3: Cost Decomposition, FGKM18 Thm. 31
...and 25 more
