Regret Analysis of Multi-task Representation Learning for Linear-Quadratic Adaptive Control

Bruce D. Lee; Leonardo F. Toso; Thomas T. Zhang; James Anderson; Nikolai Matni

TL;DR

The paper studies online fleet learning for multiple linear systems that share a common representation in adaptive linear-quadratic control. It introduces a multi-task certainty-equivalent control framework augmented with continual exploration and a De-bias & Feature Whiten (DFW) representation update, enabling distributed learning of a shared basis $\Phi_*$ and task-specific weights $\theta_*^{(h)}$. The authors derive non-asymptotic regret bounds in two regimes: benign exploration, where regret scales as $\tilde O\left(\sqrt{T/H}\right)$, and difficult exploration, where regret scales as $\tilde O\left(\sqrt{d_u d_\theta}\sqrt{T} + T^{3/4}/H^{1/5}\right)$. Both bounds improve as the number of agents $H$ grows, and sharing a representation can reduce the effective number of task-specific parameters. Numerical experiments on a multi-task cartpole setup validate the theoretical trends and demonstrate substantial regret reductions relative to fully centralized single-task learning. The work advances regret guarantees for online multi-task control with misspecified representations and provides a framework integrating representation learning with online adaptive control.

Abstract

Representation learning is a powerful tool that enables learning over large multitudes of agents or domains by enforcing that all agents operate on a shared set of learned features. However, many robotics or controls applications that would benefit from collaboration operate in settings with changing environments and goals, whereas most guarantees for representation learning are stated for static settings. Toward rigorously establishing the benefit of representation learning in dynamic settings, we analyze the regret of multi-task representation learning for linear-quadratic control. This setting introduces unique challenges. Firstly, we must account for and balance the $\textit{misspecification}$ introduced by an approximate representation. Secondly, we cannot rely on the parameter update schemes of single-task online LQR, for which least-squares often suffices, and must devise a novel scheme to ensure sufficient improvement. We demonstrate that for settings where exploration is "benign", the regret of any agent after $T$ timesteps scales as $\tilde O(\sqrt{T/H})$, where $H$ is the number of agents. In settings with "difficult" exploration, the regret scales as $\tilde O(\sqrt{d_u d_\theta} \sqrt{T} + T^{3/4}/H^{1/5})$, where $d_x$ is the state-space dimension, $d_u$ is the input dimension, and $d_\theta$ is the task-specific parameter count. In both cases, by comparing to the minimax single-task regret $O(\sqrt{d_x d_u^2}\sqrt{T})$, we see a benefit of a large number of agents. Notably, in the difficult exploration case, by sharing a representation across tasks, the effective task-specific parameter count can often be small, $d_\theta < d_x d_u$. Lastly, we provide numerical validation of the trends we predict.
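To make the comparison in the abstract concrete, the following minimal sketch evaluates the three stated rates with all constants and logarithmic factors dropped. The function names and the example values of $T$, $d_x$, $d_u$, and $d_\theta$ are illustrative assumptions, not quantities taken from the paper, so the printed numbers indicate trends in $H$ only.

```python
import math


def single_task_rate(T, d_x, d_u):
    """Minimax single-task rate sqrt(d_x * d_u^2) * sqrt(T), constants dropped."""
    return math.sqrt(d_x * d_u**2) * math.sqrt(T)


def benign_rate(T, H):
    """Benign-exploration rate sqrt(T / H), constants and logs dropped."""
    return math.sqrt(T / H)


def difficult_rate(T, H, d_u, d_theta):
    """Difficult-exploration rate sqrt(d_u * d_theta) * sqrt(T) + T^(3/4) / H^(1/5)."""
    return math.sqrt(d_u * d_theta) * math.sqrt(T) + T**0.75 / H**0.2


# Hypothetical problem sizes, chosen only for illustration.
T, d_x, d_u, d_theta = 10_000, 4, 1, 2
print("single-task:", single_task_rate(T, d_x, d_u))
for H in (1, 10, 100, 1000):
    print(H, benign_rate(T, H), difficult_rate(T, H, d_u, d_theta))
```

Both multi-task expressions decrease as $H$ grows. In the difficult case only the $T^{3/4}/H^{1/5}$ term shrinks, so the $\sqrt{d_u d_\theta}\sqrt{T}$ term remains as a floor.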

Paper Structure (24 sections, 23 theorems, 98 equations, 1 figure, 3 algorithms)

Introduction
Related Work
Contribution
Problem Formulation
System and Data assumptions
Control Objective
Algorithm Description
Representation Error Guarantees
Regret analysis
Not Easily Identifiable
Easily identifiable
Numerical Validation
Conclusion
Outline for Proofs of \ref{thm: regret bound naive exploration} and \ref{thm: regret bound no exploration}
Technical Preliminaries
...and 9 more sections

Key Result

Lemma II.1

Define $\varepsilon^{(h)} \triangleq \tfrac{\|P_\star^{(h)}\|^{-10}}{3000}$. If $\left\| - \right\|_F^2 \leq \varepsilon^{(h)},$ then
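The lemma is listed as Theorem 3 of simchowitz2020naive, a perturbation bound for certainty-equivalent LQR. Its statement is truncated above, so it is not reconstructed here. The sketch below only illustrates the qualitative behavior such bounds describe: in a scalar system, the excess cost of a controller designed from a slightly wrong model grows roughly quadratically in the model error. The system values `a`, `b`, `q`, `r` and all function names are hypothetical.

```python
def riccati_scalar(a, b, q, r, iters=10_000, tol=1e-13):
    """Fixed-point iteration for the scalar discrete-time Riccati equation."""
    p = q
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


def ce_gain(a_hat, b_hat, q, r):
    """Certainty-equivalent gain k (u = k * x) computed from a model estimate."""
    p = riccati_scalar(a_hat, b_hat, q, r)
    return -(a_hat * b_hat * p) / (r + b_hat * b_hat * p)


def lqr_cost(k, a, b, q, r, noise_var=1.0):
    """Average cost of u = k * x on the true system x+ = a x + b u + w."""
    closed = a + b * k
    if abs(closed) >= 1.0:
        return float("inf")  # closed loop is unstable
    return noise_var * (q + r * k * k) / (1.0 - closed * closed)


def excess_cost(eps, a=1.2, b=1.0, q=1.0, r=1.0):
    """Cost gap between the gain designed from (a + eps, b) and the optimal gain."""
    k_star = ce_gain(a, b, q, r)
    k_hat = ce_gain(a + eps, b, q, r)
    return lqr_cost(k_hat, a, b, q, r) - lqr_cost(k_star, a, b, q, r)


for eps in (0.01, 0.02, 0.04):
    print(eps, excess_cost(eps))
```

Doubling `eps` should roughly quadruple the printed gap when `eps` is small, which is the squared-error dependence that makes certainty-equivalent control attractive once the model error is below a threshold.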

Figures (1)

Figure 1: Regret of Algorithm \ref{alg: ce with exploration} with varying number of tasks $H$. We consider $k_{\mathsf{fin}} = 10$ epochs with initial epoch length $\tau_1 = 30$, an exploratory sequence scaling as $\sigma_k^2 \propto \frac{1}{\sqrt{2^k}}$, state and controller bounds $x_b = 25$ and $K_b = 15$, and random $\Phi_0$ with $d(\Phi_0, \Phi_\star) \approx 0.99$.
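The experimental settings in the caption can be tabulated with a short sketch. It assumes that epoch lengths double, $\tau_k = 2^{k-1}\tau_1$, which is a common choice in epoch-based adaptive control but is not stated in the caption. It also assumes a proportionality constant of 1 for $\sigma_k^2$. The function name `epoch_schedule` is hypothetical.

```python
import math


def epoch_schedule(k_fin=10, tau_1=30, sigma_scale=1.0):
    """Return (epoch, start_time, length, sigma_sq) tuples for each epoch.

    Assumes doubling epochs tau_k = tau_1 * 2^(k-1) (not given in the caption)
    and exploration variance sigma_k^2 = sigma_scale / sqrt(2^k).
    """
    schedule = []
    start = 0
    for k in range(1, k_fin + 1):
        length = tau_1 * 2 ** (k - 1)
        sigma_sq = sigma_scale / math.sqrt(2**k)
        schedule.append((k, start, length, sigma_sq))
        start += length
    return schedule


for k, start, length, sigma_sq in epoch_schedule():
    print(f"epoch {k:2d}: start={start:6d} length={length:6d} sigma^2={sigma_sq:.4f}")
```

Under the doubling assumption, $\sigma_k^2$ decays like the inverse square root of the epoch length, so the injected exploration noise shrinks as more data accumulates.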

Theorems & Definitions (31)

Lemma II.1: Theorem 3 of simchowitz2020naive
Definition II.1: stewart1990matrix
Theorem II.1: DFW guarantee, redux
Theorem II.2
Theorem III.1
Theorem III.2
Theorem VII.1: Misspecified LS Est. Error - Formal Version of \ref{thm: ls error informal}, Theorem 5 of lee2023nonasymptotic
Lemma VII.1: Adapted from lee2023nonasymptotic
Lemma VII.2
Proposition VII.1
...and 21 more
