Can Transformers Learn Optimal Filtering for Unknown Systems?
Haldun Balim, Zhe Du, Samet Oymak, Necmiye Ozay
TL;DR
This work investigates using transformer-based models to predict the outputs of unknown dynamical systems by meta-learning over a collection of source systems drawn from a common distribution. The proposed meta-output-predictor (MOP) learns to predict each next output from the past outputs, which lets it adapt rapidly to unseen dynamics and even match the optimal performance of the Kalman filter for linear systems, while also showing promise under non-ideal noise and in nonlinear settings such as planar quadrotors. A key theoretical contribution is a generalization bound: under stability and robustness assumptions, the excess risk decays as $\mathcal{O}(1/\sqrt{MT})$, with explicit dependence on the covering number of the transformer class and on the noise bounds. The work also highlights limitations in slow-mixing systems and under distribution shift, motivating future research on robustness and safe deployment, including extensions to closed-loop control.
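To make the meta-learning setup concrete, the sketch below draws several source systems from one shared distribution, rolls each one out, and turns every output trajectory into (past outputs, next output) training pairs of the kind a meta-output-predictor is fit on. It is a minimal pure-Python illustration under loud assumptions: the scalar linear systems, the Gaussian noise, the parameter ranges, and the helper names (`sample_system`, `simulate_outputs`, `make_prefix_pairs`, `build_meta_dataset`) are all hypothetical choices rather than the authors' configuration, and no transformer is trained here.

```python
import random
from typing import List, Tuple

def sample_system(rng: random.Random) -> Tuple[float, float]:
    """Draw a hypothetical scalar system (a, c) from a shared distribution.

    Assumption: |a| < 1 keeps every sampled system stable.
    """
    a = rng.uniform(-0.9, 0.9)
    c = rng.uniform(0.5, 1.5)
    return a, c

def simulate_outputs(a: float, c: float, T: int, rng: random.Random,
                     q: float = 0.1, r: float = 0.1) -> List[float]:
    """Roll out x_{t+1} = a x_t + w_t, y_t = c x_t + v_t for T steps (assumed model)."""
    x = rng.gauss(0.0, 1.0)
    ys = []
    for _ in range(T):
        ys.append(c * x + rng.gauss(0.0, r ** 0.5))  # noisy output y_t
        x = a * x + rng.gauss(0.0, q ** 0.5)         # noisy state transition
    return ys

def make_prefix_pairs(ys: List[float]) -> List[Tuple[List[float], float]]:
    """Pair each past-output prefix y_0..y_{t-1} with the target y_t."""
    return [(ys[:t], ys[t]) for t in range(1, len(ys))]

def build_meta_dataset(M: int, T: int, seed: int = 0):
    """Hypothetical meta-training set: M source systems, T outputs each."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(M):
        a, c = sample_system(rng)  # parameters are never shown to the predictor
        ys = simulate_outputs(a, c, T, rng)
        dataset.extend(make_prefix_pairs(ys))
    return dataset

data = build_meta_dataset(M=4, T=10)
print(len(data))  # 4 systems × 9 prefixes each = 36
```

With M systems and trajectories of length T this yields M(T-1) prefix/target pairs. Because the predictor never sees (a, c), it has to infer each system's dynamics from the output prefix alone, which is what allows it to be applied to unseen systems.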
Abstract
Transformer models have shown great success in natural language processing; however, their potential remains largely unexplored for dynamical systems. In this work, we investigate the optimal output estimation problem using transformers, which predict each output from all past outputs. Specifically, we train the transformer on a variety of distinct systems and then evaluate its performance on unseen systems with unknown dynamics. Empirically, the trained transformer adapts exceedingly well to different unseen systems and even matches the optimal performance given by the Kalman filter for linear systems. In more complex settings with non-i.i.d. noise, time-varying dynamics, and nonlinear dynamics, such as a quadrotor system with unknown parameters, transformers also show promising results. To support our experimental findings, we provide statistical guarantees that quantify the amount of training data required for the transformer to achieve a desired excess risk. Finally, we point out some limitations by identifying two classes of problems that lead to degraded performance, highlighting the need for caution when using transformers for control and estimation.
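The Kalman filter benchmark mentioned above is the optimal one-step output predictor when the linear system and its noise statistics are known; a transformer that only sees outputs can at best approach it. The following is a minimal pure-Python sketch of that benchmark, assuming a scalar system with Gaussian process and measurement noise. The function names and the parameter values are hypothetical, and this is not the authors' code.

```python
import random
from typing import List

def kalman_predict_outputs(ys: List[float], a: float, c: float,
                           q: float, r: float, p0: float = 1.0) -> List[float]:
    """One-step-ahead output predictions for a KNOWN scalar system.

    Assumed model: x_{t+1} = a x_t + w_t, y_t = c x_t + v_t,
    with w_t ~ N(0, q), v_t ~ N(0, r), x_0 ~ N(0, p0).
    Entry t of the result uses only y_0, ..., y_{t-1}.
    """
    x_hat, p = 0.0, p0                    # prior mean and variance of x_t
    preds = []
    for y in ys:
        preds.append(c * x_hat)           # predict y_t before observing it
        s = c * c * p + r                 # innovation variance
        k = p * c / s                     # Kalman gain
        x_hat += k * (y - c * x_hat)      # measurement update
        p *= (1.0 - k * c)
        x_hat, p = a * x_hat, a * a * p + q   # time update to step t+1
    return preds

def simulate(a: float, c: float, q: float, r: float, T: int, seed: int = 0) -> List[float]:
    """Generate T outputs from the assumed scalar model."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)
    ys = []
    for _ in range(T):
        ys.append(c * x + rng.gauss(0.0, r ** 0.5))
        x = a * x + rng.gauss(0.0, q ** 0.5)
    return ys

def mse(preds: List[float], ys: List[float]) -> float:
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

a, c, q, r = 0.9, 1.0, 0.1, 0.1           # hypothetical system and noise levels
ys = simulate(a, c, q, r, T=5000)
kf_mse = mse(kalman_predict_outputs(ys, a, c, q, r), ys)
naive_mse = mse([0.0] + ys[:-1], ys)      # naive "repeat the last output" predictor
print(f"Kalman MSE {kf_mse:.3f} vs. last-output MSE {naive_mse:.3f}")
```

The filter here is given the true (a, c, q, r), so in expectation its error is the floor for this assumed linear-Gaussian system; matching it with a model that must infer the dynamics from outputs alone is what the abstract's optimality claim refers to.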
