Learning Decentralized Linear Quadratic Regulators with $\sqrt{T}$ Regret

Lintao Ye; Ming Chi; Ruiquan Liao; Vijay Gupta

TL;DR

This work tackles the online design of a decentralized linear quadratic regulator (LQR) with unknown system dynamics, under information constraints modeled by a directed graph with delays. It introduces a disturbance-feedback controller (DFC) formulation and combines system identification via regularized least squares with an online convex optimization (OCO) method that accommodates memory and delayed feedback, yielding a near-optimal $\tilde{O}(\sqrt{T})$ regret relative to the best decentralized policy. The key contributions include a provable regret bound under a partially nested information pattern, a fully decentralized control policy design that respects the information constraints, and extensions to general information structures and stabilizable systems. The approach balances learning the system against controlling it, using Gaussian inputs to obtain sharp estimation guarantees and a finite-memory DFC class to keep the policy implementable. Numerical results validate the theory and demonstrate favorable performance compared with offline certainty-equivalence baselines across different information patterns.
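To make the identification step concrete, here is a minimal sketch of regularized least-squares estimation from a single trajectory driven by Gaussian inputs. It assumes dynamics of the form $x_{t+1} = A x_t + B u_t + w_t$; the function names, the regularization weight `lam`, the noise scales, and the toy matrices are all hypothetical choices for illustration, not values or code from the paper.

```python
import numpy as np

def simulate(A, B, N, sigma_u=1.0, sigma_w=0.1, seed=0):
    """Roll out x_{t+1} = A x_t + B u_t + w_t with i.i.d. Gaussian inputs and noise."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    xs = np.zeros((N + 1, n))
    us = sigma_u * rng.standard_normal((N, m))
    for t in range(N):
        w = sigma_w * rng.standard_normal(n)
        xs[t + 1] = A @ xs[t] + B @ us[t] + w
    return xs, us

def estimate_dynamics(xs, us, lam=1.0):
    """Ridge estimate of [A B]: minimize sum ||x_{t+1} - Theta z_t||^2 + lam ||Theta||_F^2."""
    n = xs.shape[1]
    z = np.hstack([xs[:-1], us])               # regressors z_t = (x_t, u_t), shape (N, n+m)
    y = xs[1:]                                 # targets x_{t+1}, shape (N, n)
    gram = lam * np.eye(z.shape[1]) + z.T @ z  # regularized Gram matrix
    theta = np.linalg.solve(gram, z.T @ y).T   # closed-form solution, shape (n, n+m)
    return theta[:, :n], theta[:, n:]

# Toy stable system (assumed for illustration only).
A_true = np.array([[0.5, 0.1], [0.0, 0.6]])
B_true = np.eye(2)
xs, us = simulate(A_true, B_true, N=2000)
A_hat, B_hat = estimate_dynamics(xs, us, lam=1.0)
```

The random inputs keep the regressors well excited, which is why the Gram matrix is well conditioned here; the decentralized structure of the problem plays no role in this sketch.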

Abstract

We propose an online learning algorithm that adaptively designs a decentralized linear quadratic regulator when the system model is unknown a priori and new data samples from a single system trajectory become progressively available. The algorithm uses a disturbance-feedback representation of state-feedback controllers coupled with online convex optimization with memory and delayed feedback. Under the assumption that the system is stable or that a stabilizing controller is known, we show that our controller enjoys an expected regret that scales as $\sqrt{T}$ with the time horizon $T$ for a partially nested information pattern. For more general information patterns, the optimal controller is unknown even if the system model is known. In this case, the regret of our controller is established with respect to a linear sub-optimal controller. We validate our theoretical findings using numerical experiments.
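The disturbance-feedback representation can be illustrated with a short sketch. It assumes a finite-memory controller of the form $u_t = \sum_{k=1}^{h} M_k \hat{w}_{t-k}$, where $\hat{w}_t = x_{t+1} - \hat{A} x_t - \hat{B} u_t$ is the disturbance reconstructed from an estimated model. The function names, the identity cost weights, and the toy system are hypothetical. The sketch is centralized and keeps the gains fixed, so it omits both the information constraints and the online update of the gains that the paper addresses.

```python
import numpy as np

def dfc_input(M, w_hist):
    """Finite-memory DFC: u_t = sum_{k=1}^{h} M[k-1] @ w_{t-k}, with w_hist[-k] = w_{t-k}."""
    u = np.zeros(M[0].shape[0])
    for k in range(1, min(len(M), len(w_hist)) + 1):
        u += M[k - 1] @ w_hist[-k]
    return u

def run_dfc(A, B, A_hat, B_hat, M, T, sigma_w=0.1, seed=0):
    """Run the DFC on the true system (A, B), reconstructing disturbances with (A_hat, B_hat)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.zeros(n)
    w_hist, cost = [], 0.0
    for _ in range(T):
        u = dfc_input(M, w_hist)
        cost += x @ x + u @ u                  # quadratic stage cost with Q = I, R = I (assumed)
        x_next = A @ x + B @ u + sigma_w * rng.standard_normal(n)
        w_hist.append(x_next - A_hat @ x - B_hat @ u)  # estimated disturbance
        x = x_next
    return cost / T, w_hist

# Toy example: a stable system, an exact model, and arbitrary small gains (all assumed).
A = np.array([[0.5, 0.1], [0.0, 0.6]])
B = np.eye(2)
M = [-0.2 * np.eye(2), -0.1 * np.eye(2)]
avg_cost, w_hist = run_dfc(A, B, A, B, M, T=500)
```

Because the input is linear in the gains `M`, the cost is convex in them, which is the property that makes an online convex optimization update over the gains possible.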

Paper Structure (50 sections, 29 theorems, 228 equations, 3 figures, 1 table, 3 algorithms)

Introduction
Problem Formulation and Preliminary Results
Decentralized LQR with Sparsity and Delay Constraints
Problem Considered in this Paper and Our Result
Summary of Main Symbols
Disturbance-Feedback Controller
Algorithm Design
Phase I: System Identification
Phase II: Decentralized Online Control
OCO with Memory and Delayed Feedback
Decentralized Control Policy Design
Results for Algorithm \ref{algorithm:control design}
Regret Analysis: Proof of Theorem \ref{thm:regret upper bound}
Properties under a Good Probabilistic Event
Upper Bounds on $R_0$, $R_1$, $R_4$ and $R_5$
...and 35 more sections

Key Result

Lemma 1

\cite{lamperski2015optimal} Suppose Assumptions \ref{ass:info structure}-\ref{ass:pairs} hold. Consider the problem given in \eqref{eqn:dis LQR obj}, and let $\mathcal{P}(\mathcal{U},\mathcal{H})$ be the associated information graph. For all $r\in\mathcal{U}$, define matrices $P_r$ and $K_r$, where for each $r\in\mathcal{U}$, $s\in\mathcal{U}$ is the unique node such that $r\rightarrow s$.

Figures (3)

Figure 1: The directed graph of Example \ref{exp:running example}. Node $i\in\mathcal{V}$ represents a subsystem with state $x_i(t)$, a solid edge $(i,j)\in\mathcal{A}$ is labeled with the information propagation delay from $i$ to $j$, and the dotted edges represent the coupling of the dynamics among the nodes in $\mathcal{V}$.
Figure 2: The information graph of Example \ref{exp:running example}. Each node in the information graph is a subset of the nodes in the directed graph given in Fig. \ref{fig:directed graph}.
Figure 3: Regret results when Algorithm \ref{algorithm:control design} is applied to Example \ref{exp:running example}.

Theorems & Definitions (44)

Example 1
Lemma 1
Lemma 2
Definition 1
Lemma 3
proof
Lemma 4
Lemma 5
Proposition 1
Definition 2
...and 34 more
