Nonasymptotic Regret Analysis of Adaptive Linear Quadratic Control with Model Misspecification

Bruce D. Lee; Anders Rantzer; Nikolai Matni

TL;DR

The paper investigates adaptive linear quadratic control when the learner relies on a misspecified dynamics representation basis learned from offline data. It introduces a Certainty Equivalent Control algorithm with continual exploration, yielding nonasymptotic regret bounds that combine sublinear terms with a linear-in-$T$ bias term proportional to the representation error. It further shows that, in the zero-misspecification case, logarithmic regret is achievable under sufficient excitation, and demonstrates the practical value of pretraining via simulations and offline learning of representations. The results highlight when pretraining helps in rapid adaptation and quantify the trade-off between misspecification and data efficiency for adaptive control. Overall, the work connects transfer-learning ideas to adaptive control and provides concrete guidance on when pretraining is beneficial in nonasymptotic settings.

Abstract

The strategy of pre-training a large model on a diverse dataset, then fine-tuning for a particular application has yielded impressive results in computer vision, natural language processing, and robotic control. This strategy has vast potential in adaptive control, where it is necessary to rapidly adapt to changing conditions with limited data. Toward concretely understanding the benefit of pre-training for adaptive control, we study the adaptive linear quadratic control problem in the setting where the learner has prior knowledge of a collection of basis matrices for the dynamics. This basis is misspecified in the sense that it cannot perfectly represent the dynamics of the underlying data generating process. We propose an algorithm that uses this prior knowledge, and prove upper bounds on the expected regret after $T$ interactions with the system. In the regime where $T$ is small, the upper bounds are dominated by a term that scales with either $\texttt{poly}(\log T)$ or $\sqrt{T}$, depending on the prior knowledge available to the learner. When $T$ is large, the regret is dominated by a term that grows with $δT$, where $δ$ quantifies the level of misspecification. This linear term arises due to the inability to perfectly estimate the underlying dynamics using the misspecified basis, and is therefore unavoidable unless the basis matrices are also adapted online. However, it only dominates for large $T$, after the sublinear terms arising due to the error in estimating the weights for the basis matrices become negligible. We provide simulations that validate our analysis. Our simulations also show that offline data from a collection of related systems can be used as part of a pre-training stage to estimate a misspecified dynamics basis, which is in turn used by our adaptive controller.
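To make the two regimes concrete, consider an illustrative bound with a $\sqrt{T}$ sublinear term. The constants $c_1, c_2 > 0$ are hypothetical placeholders for the problem-dependent factors and are not taken from the paper:

$$
\mathrm{Regret}(T) \le c_1 \sqrt{T} + c_2\, \delta\, T,
\qquad
c_1 \sqrt{T} \ge c_2\, \delta\, T \iff T \le \left(\frac{c_1}{c_2\, \delta}\right)^{2}.
$$

Under this assumed form, the sublinear term dominates up to a horizon of order $1/\delta^2$, so a smaller misspecification level pushes the linear regime to later times.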

Paper Structure (47 sections, 28 theorems, 177 equations, 3 figures, 2 algorithms)

Introduction
Related Work
Adaptive Control
Nonasymptotic Adaptive LQR
Multi-task Representation Learning
Contributions
Notation
Problem Formulation
System model
Learning objective
Algorithm description
Regret Bounds
Certainty equivalent control with continual exploration
Certainty equivalent control without additional exploration
Proof sketch
...and 32 more sections

Key Result

lemma 1

Define $\varepsilon \triangleq \frac{1}{2916 \|P^\star\|^{10}}$. If $\left\|\begin{bmatrix}\hat{A} & \hat{B}\end{bmatrix} - \begin{bmatrix}A^\star & B^\star\end{bmatrix}\right\|_F^2 \leq \varepsilon$, then $\mathcal{J}(\hat{K}) - \mathcal{J}(K^\star) \leq 142 \|P^\star\|^8 \left\|\begin{bmatrix}\hat{A} & \hat{B}\end{bmatrix} - \begin{bmatrix}A^\star & B^\star\end{bmatrix}\right\|_F^2$.
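The lemma can be checked numerically on a small example. The sketch below is illustrative only and is not the authors' code. It assumes cost matrices $Q = I$ and $R = I$, identity process-noise covariance (so that the average cost of a stabilizing gain is the trace of its Lyapunov solution), the spectral norm for $\|P^\star\|$, and a hypothetical two-state system. The helper names `lqr_gain`, `average_cost`, and `check_lemma` are made up for this example.

```python
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov


def lqr_gain(A, B, Q, R):
    """Return the optimal gain K (for u = K x) and the Riccati solution P."""
    P = solve_discrete_are(A, B, Q, R)
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P


def average_cost(K, A, B, Q, R):
    """Average cost of u = K x on (A, B), assuming identity noise covariance."""
    A_cl = A + B @ K
    if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1.0:
        return np.inf  # K does not stabilize the system
    # P_K solves P_K = A_cl^T P_K A_cl + Q + K^T R K
    P_K = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
    return float(np.trace(P_K))


def check_lemma(A, B, A_hat, B_hat):
    """Compare the certainty equivalent cost gap with the bound in the lemma."""
    n, m = B.shape
    Q, R = np.eye(n), np.eye(m)  # assumed cost matrices
    K_star, P_star = lqr_gain(A, B, Q, R)
    K_hat, _ = lqr_gain(A_hat, B_hat, Q, R)  # gain designed on the estimate
    p = np.linalg.norm(P_star, 2)  # spectral norm (assumed)
    err_sq = np.linalg.norm(np.hstack([A_hat - A, B_hat - B]), "fro") ** 2
    eps = 1.0 / (2916.0 * p ** 10)
    gap = average_cost(K_hat, A, B, Q, R) - average_cost(K_star, A, B, Q, R)
    bound = 142.0 * p ** 8 * err_sq
    return {"eps": eps, "err_sq": err_sq, "gap": gap, "bound": bound}


# Hypothetical true system, chosen only for illustration.
A_true = np.array([[0.5, 0.2], [0.0, 0.3]])
B_true = np.array([[0.0], [1.0]])

# Random perturbation rescaled so its squared Frobenius norm equals eps / 2.
rng = np.random.default_rng(0)
D = rng.standard_normal((2, 3))
eps = check_lemma(A_true, B_true, A_true, B_true)["eps"]
D *= np.sqrt(0.5 * eps) / np.linalg.norm(D, "fro")

result = check_lemma(A_true, B_true, A_true + D[:, :2], B_true + D[:, 2:])
print(result)
```

Because $\varepsilon$ shrinks like $\|P^\star\|^{-10}$, the admissible estimation error is very small even for this well-conditioned system, and the bound is expected to be loose compared with the measured cost gap.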

Figures (3)

Figure 1: Regret of the certainty equivalent control algorithm with continual exploration for various choices of $\hat{\Phi}$.
Figure 2: Regret of the certainty equivalent control algorithm with continual exploration, with $\hat{\Phi}$ describing a lumped parameter model (right), or a lumped parameter model extended such that the persistent excitation assumption is violated (left). In both settings the representations are perturbed, resulting in a misspecification between the true representation $\Phi^\star$ and the representation estimate $\hat{\Phi}$. The regret is compared to that incurred by running the same algorithm with fully unknown $A^\star$ and $B^\star$.
Figure 3: Regret of the certainty equivalent control algorithm with continual exploration, with $\hat{\Phi}$ learned offline from related systems.

Theorems & Definitions (54)

definition 1: stewart1990matrix
lemma 1: Theorem 3 of simchowitz2020naive
theorem 1
theorem 2
definition 2
theorem 3: least squares estimation error
lemma 2: Epoch-wise covariance bounds
proof
lemma 3
proof
...and 44 more
