Enhanced Transformer architecture for in-context learning of dynamical systems
Matteo Rufolo, Dario Piga, Gabriele Maroni, Marco Forgione
TL;DR
This work enhances in-context learning for dynamical systems by embedding a probabilistic learning objective into a Transformer-based meta-model, so that a Gaussian output provides predictive uncertainty. It introduces non-contiguous context handling and arbitrary initial-condition inputs, plus a recurrent patching scheme to process long sequences efficiently. Empirical results on Wiener–Hammerstein systems show that longer context windows yield RMSEs approaching the noise floor of approximately $0.1$, and that the model quantifies uncertainty through its $\mu$ and $\sigma$ outputs, including under distribution shift, where fine-tuning proves successful. The approach promises practical benefits for meta-state estimation and synthetic-data-based system identification, with ongoing work to scale it and further reduce computation.
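As a rough illustration of the probabilistic objective described above, the following minimal sketch evaluates a Gaussian negative log-likelihood from predicted means and standard deviations. The function name `gaussian_nll` and the use of NumPy are assumptions made for illustration; this summary does not specify the exact loss or implementation used by the authors.

```python
import numpy as np


def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of y under independent N(mu, sigma^2).

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    if np.any(sigma <= 0):
        raise ValueError("sigma must be strictly positive")
    # Per-sample NLL: 0.5 * log(2 * pi * sigma^2) + 0.5 * ((y - mu) / sigma)^2
    per_sample = 0.5 * np.log(2.0 * np.pi * sigma**2) + 0.5 * ((y - mu) / sigma) ** 2
    return float(np.mean(per_sample))


if __name__ == "__main__":
    y_true = np.array([0.9, 1.1, 1.0])
    mu_pred = np.array([1.0, 1.0, 1.0])
    # A standard deviation that is too small for the actual errors is penalized.
    print(gaussian_nll(y_true, mu_pred, np.full(3, 0.1)))
    print(gaussian_nll(y_true, mu_pred, np.full(3, 0.01)))
```

A loss of this form rewards accurate means while penalizing both overconfident and needlessly wide standard deviations, which is what lets $\sigma$ act as an uncertainty estimate.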
Abstract
Recently introduced by some of the authors, the in-context identification paradigm aims at estimating, offline and based on synthetic data, a meta-model that describes the behavior of a whole class of systems. Once trained, this meta-model is fed with an observed input/output sequence (context) generated by a real system to predict its behavior in a zero-shot fashion. In this paper, we enhance the original meta-modeling framework through three key innovations: formulating the learning task within a probabilistic framework; handling non-contiguous context and query windows; and adopting recurrent patching to handle long context sequences effectively. The efficacy of these modifications is demonstrated through a numerical example on the Wiener–Hammerstein system class, highlighting the model's enhanced performance and scalability.
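To make the idea of recurrent patching more concrete, here is a minimal sketch, assuming that a long input/output context is cut into fixed-length patches and that each patch is compressed into a single token by a simple recurrent pass. The function names, the tanh recurrence, the patch length, and the random weights (standing in for learned parameters) are all assumptions for illustration; the abstract does not describe the authors' actual architecture at this level of detail.

```python
import numpy as np


def split_into_patches(seq, patch_len):
    """Reshape a (n_steps, n_feat) sequence into (n_patches, patch_len, n_feat)."""
    seq = np.asarray(seq, dtype=float)
    n_steps, n_feat = seq.shape
    if n_steps % patch_len != 0:
        raise ValueError("sequence length must be a multiple of patch_len")
    return seq.reshape(n_steps // patch_len, patch_len, n_feat)


def encode_patches(patches, w_in, w_rec):
    """Run a plain tanh recurrence over each patch and keep the last hidden state.

    patches: (n_patches, patch_len, n_feat)
    w_in:    (n_feat, d_hidden)   hypothetical input weights
    w_rec:   (d_hidden, d_hidden) hypothetical recurrent weights
    Returns one d_hidden-dimensional token per patch.
    """
    n_patches, patch_len, _ = patches.shape
    hidden = np.zeros((n_patches, w_rec.shape[0]))
    for t in range(patch_len):
        # All patches are processed in parallel along the first axis.
        hidden = np.tanh(patches[:, t, :] @ w_in + hidden @ w_rec)
    return hidden


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    context = rng.normal(size=(1000, 2))       # columns: input u and output y
    patches = split_into_patches(context, 100)
    tokens = encode_patches(
        patches,
        0.1 * rng.normal(size=(2, 16)),
        0.1 * rng.normal(size=(16, 16)),
    )
    print(patches.shape, tokens.shape)
```

Under these assumptions, a 1000-step context reaches the Transformer as only 10 tokens, which is the kind of sequence-length reduction that makes long contexts tractable for attention.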
