Model-Free Learning for the Linear Quadratic Regulator over Rate-Limited Channels

Lintao Ye; Aritra Mitra; Vijay Gupta

TL;DR

This work proposes a new adaptive quantization algorithm, Adaptively Quantized Gradient Descent (AQGD). Provided the bit-rate of the communication channel exceeds a certain finite threshold, AQGD guarantees exponentially fast convergence to the globally optimal policy, with no deterioration of the exponent relative to the unquantized setting.

Abstract

Consider a linear quadratic regulator (LQR) problem being solved in a model-free manner using the policy gradient approach. If the gradient of the quadratic cost is being transmitted across a rate-limited channel, both the convergence and the rate of convergence of the resulting controller may be affected by the bit-rate permitted by the channel. We first pose this problem in a communication-constrained optimization framework and propose a new adaptive quantization algorithm titled Adaptively Quantized Gradient Descent (AQGD). This algorithm guarantees exponentially fast convergence to the globally optimal policy, with no deterioration of the exponent relative to the unquantized setting, above a certain finite threshold bit-rate allowed by the communication channel. We then propose a variant of AQGD that provides similar performance guarantees when applied to solve the model-free LQR problem. Our approach reveals the benefits of adaptive quantization in preserving fast linear convergence rates, and, as such, may be of independent interest to the literature on compressed optimization. Our work also marks a first step towards a more general bridge between the fields of model-free control design and networked control systems.
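
To make the idea of adaptive quantization concrete, here is a minimal Python sketch of gradient descent with a quantized gradient channel. It is an illustration under stated assumptions, not the authors' exact algorithm: the step-size $\alpha = 1/(6L)$ and contraction factor $\gamma = \sqrt{d}/2^b$ are taken from the Key Result below, while the uniform per-coordinate quantizer, the choice to encode the difference between the current gradient and the previous decoded gradient, the range update `r = gamma * r + alpha * L * ||g||`, and all function names are assumptions introduced here. The test problem is a simple strongly convex quadratic rather than an LQR cost.

```python
import math

def quantize(v, r, b):
    """Map v in [-r, r] to one of 2**b bin indices (the transmitted symbol)."""
    levels = 2 ** b
    if r == 0.0:
        return 0
    idx = int((v + r) / (2.0 * r) * levels)
    return min(max(idx, 0), levels - 1)  # clip the boundary case v == r

def dequantize(idx, r, b):
    """Decode a bin index to the center of its bin in [-r, r]."""
    levels = 2 ** b
    return -r + (idx + 0.5) * (2.0 * r / levels)

def aqgd_sketch(grad, x0, L, b, r0, num_iters):
    """Gradient descent driven by b-bit-per-coordinate quantized gradients.

    Assumes grad is L-Lipschitz and r0 >= ||grad(x0)||, so the first
    encoded vector fits inside the initial quantizer range.
    """
    d = len(x0)
    alpha = 1.0 / (6.0 * L)          # step-size from the Key Result
    gamma = math.sqrt(d) / 2 ** b    # contraction factor from the Key Result
    x = list(x0)
    g = [0.0] * d                    # decoded gradient estimate shared by both sides
    r = r0                           # current quantizer range (assumed update rule below)
    for _ in range(num_iters):
        # Worker side: encode the innovation relative to the last decoded gradient.
        innovation = [tg - gi for tg, gi in zip(grad(x), g)]
        symbols = [quantize(v, r, b) for v in innovation]
        # Server side: decode, refresh the gradient estimate, take a step.
        g = [gi + dequantize(s, r, b) for gi, s in zip(g, symbols)]
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        # Both sides shrink or grow the range using only shared quantities.
        g_norm = math.sqrt(sum(gi * gi for gi in g))
        r = gamma * r + alpha * L * g_norm
    return x

# Hypothetical test problem: f(x) = 0.5 * sum(a_i * x_i**2), so L = max(a_i).
A_DIAG = [1.0, 2.0, 3.0, 4.0]

def grad_quadratic(x):
    return [a * xi for a, xi in zip(A_DIAG, x)]

def cost_quadratic(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(A_DIAG, x))

x_final = aqgd_sketch(grad_quadratic, [1.0, -1.0, 2.0, -2.0],
                      L=4.0, b=8, r0=11.0, num_iters=500)
print("final suboptimality gap:", cost_quadratic(x_final))
```

The range update is chosen so that the next innovation provably fits in the quantizer: by $L$-smoothness the gradient moves by at most $\alpha L \|g_t\|$ in one step, and the decoding error of the previous round is at most $\gamma r_t$ in the Euclidean norm. Because the range shrinks along with the gradient, a fixed number of bits per iteration yields ever finer resolution as the iterates approach the optimum.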

Paper Structure (15 sections, 20 theorems, 102 equations, 3 figures, 2 algorithms)

Introduction
Problem Formulation and Preliminaries
Adaptively Quantized Gradient Descent (AQGD) for Communication-Constrained Optimization
The AQGD Algorithm
Convergence Analysis and Results for AQGD
Achieving Minimal Bit-Rates using $\epsilon$-net Coverings
AQGD under Local Assumptions and with Noisy Gradients
Application to the Model-Free LQR
Numerical Results
Conclusions and Future Directions
Proof of Theorem \ref{thm:PL}
Proofs Omitted in Section \ref{sec:noisy AQDG}
Proof of Theorem \ref{thm:noisy gradient}
Proof of Theorem \ref{thm:noisy gradient for LQR}
Auxiliary Lemmas

Key Result

Theorem 3.3

(Convergence of AQGD) Suppose $f:\mathbb{R}^d \rightarrow \mathbb{R}$ is $L$-smooth and $\mu$-strongly convex. Suppose AQGD (Algorithm \ref{algo:AQGD}) is run with step-size $\alpha=1/(6L)$ and contraction factor $\gamma=\sqrt{d}/2^b$. There exists a universal constant $C \geq 1$ such that if the bit-prec then the following is true $\forall t \geq 0$:

Figures (3)

Figure 1: Communication-constrained policy optimization for model-free LQR. At each iteration $t$, the decision-maker sends the current policy $K_t$ to an agent over a noiseless channel of infinite capacity. The agent evaluates and encodes the noisy policy gradient $\widehat{\nabla J(K_t)}$ using $\overline{B}$ bits, and transmits the encoded symbol $\sigma_t$ to the decision-maker over a noiseless rate-limited channel. The decision-maker updates the policy based on the decoded policy gradient $g_t$.
Figure 2: Communication-constrained optimization setup with exact gradients at the worker, where $g_t$ represents the decoded gradient at the server.
Figure 3: The suboptimality gap $J(K_t)-J(K^*)$ versus the iteration $t$ in the AQGD and NAQGD algorithms. In Fig. \ref{fig:convergence}(b), the results are averaged over $10$ experiments and the shaded regions represent quantiles.

Theorems & Definitions (40)

Definition 3.1
Definition 3.2
Theorem 3.3
Theorem 3.4
Lemma 3.5
Lemma 3.6
Definition 3.7
Theorem 3.8
proof
Definition 3.9
...and 30 more
