Robustness and Consistency in Linear Quadratic Control with Untrusted Predictions
Tongxin Li, Ruixiao Yang, Guannan Qu, Guanya Shi, Chenkai Yu, Adam Wierman, Steven H. Low
TL;DR
This work studies learning-augmented linear quadratic control with untrusted predictions and formalizes the robustness-consistency trade-off through competitive ratio bounds. It introduces a $\lambda$-confident policy that blends a fully prediction-trusting policy with a prediction-agnostic policy, and derives a competitive ratio bound that depends on the trust parameter $\lambda$ and the prediction error $\varepsilon$. To avoid fixing the trust level exogenously, the authors propose a self-tuning policy that updates $\lambda_t$ online, achieving regret $O((\mu_{\mathsf{Var}}(\mathbf{w})+\mu_{\mathsf{Var}}(\mathbf{\hat w}))\log T)$ and a competitive ratio of the form $\mathsf{CR}(\varepsilon) \le 1 + 2\|H\| \varepsilon/(\mathsf{OPT} + C\varepsilon) + O(((\mu_{\mathsf{Var}}(\mathbf{w})+\mu_{\mathsf{Var}}(\mathbf{\hat w}))\log T)/\mathsf{OPT})$. The approach converges under mild variation assumptions and is validated on robotics tracking, EV charging, and Cart-Pole tasks, where it performs near-optimally across a range of prediction quality. Overall, the work provides a framework with provable guarantees for safely using black-box AI predictions in control systems.
Abstract
We study the problem of learning-augmented predictive linear quadratic control. Our goal is to design a controller that balances \textit{"consistency"}, which measures the competitive ratio when predictions are accurate, and \textit{"robustness"}, which bounds the competitive ratio when predictions are inaccurate. We propose a novel $\lambda$-confident policy and provide a competitive ratio upper bound that depends on a trust parameter $\lambda\in [0,1]$, set according to the confidence in the predictions, and on the prediction error $\varepsilon$. Motivated by online learning methods, we design a self-tuning policy that adaptively learns the trust parameter $\lambda$, with a competitive ratio that depends on $\varepsilon$ and on the variation of system perturbations and predictions. We show that its competitive ratio is bounded from above by $1+\frac{O(\varepsilon)}{\Theta(1)+\Theta(\varepsilon)}+O(\mu_{\mathsf{Var}})$, where $\mu_{\mathsf{Var}}$ measures the variation of perturbations and predictions. This implies that when the variations of perturbations and predictions are small, the self-tuning scheme, by automatically adjusting the trust parameter online, ensures a competitive ratio that does not scale up with the prediction error $\varepsilon$.
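The role of the trust parameter can be illustrated with a minimal sketch. It assumes a scalar system $x_{t+1} = a x_t + b u_t + w_t$ with stage cost $q x_t^2 + r u_t^2$ and terminal cost $p x_T^2$, where $p$ solves the scalar discrete-time Riccati equation. It also assumes the blended action is the standard LQR feedback plus $\lambda$ times the feedforward term that a fully prediction-trusting controller would apply. The function names, parameter values, and the scalar setting are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def solve_scalar_dare(a, b, q, r, iters=2000):
    """Fixed-point iteration for the scalar discrete-time Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p

def lambda_confident_cost(lam, w, w_hat, a=1.2, b=1.0, q=1.0, r=1.0, x0=0.0):
    """Total cost of a lambda-weighted controller (illustrative assumption).

    lam   : trust parameter in [0, 1]; 0 ignores predictions, 1 fully trusts them
    w     : true perturbations, array of length T
    w_hat : predicted perturbations, array of length T
    """
    p = solve_scalar_dare(a, b, q, r)
    g = b / (r + b * b * p)      # common gain factor
    k = g * p * a                # LQR feedback gain
    f = a - b * k                # closed-loop dynamics under LQR feedback
    T = len(w)
    x, cost = x0, 0.0
    for t in range(T):
        # Feedforward term built from predictions of current and future perturbations.
        weights = f ** np.arange(T - t)
        feedforward = g * p * np.dot(weights, w_hat[t:])
        u = -k * x - lam * feedforward
        cost += q * x * x + r * u * u
        x = a * x + b * u + w[t]
    return cost + p * x * x      # add terminal cost

rng = np.random.default_rng(0)
w = rng.normal(size=50)
for lam in (0.0, 0.5, 1.0):
    print(lam, lambda_confident_cost(lam, w, w), lambda_confident_cost(lam, w, -w))
```

In this sketch, full trust ($\lambda = 1$) gives the lowest cost when the predictions are exact, while the same setting is worse than ignoring predictions ($\lambda = 0$) when the predictions have the wrong sign. This mirrors the consistency-robustness trade-off that the choice of $\lambda$ is meant to balance.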
