Enhanced Transformer architecture for in-context learning of dynamical systems

Matteo Rufolo; Dario Piga; Gabriele Maroni; Marco Forgione

TL;DR

This work enhances in-context learning for dynamical systems by training a Transformer-based meta-model with a probabilistic objective, so that its Gaussian output (mean $\mu$ and standard deviation $\sigma$) conveys predictive uncertainty. It also handles non-contiguous context and query windows by accepting arbitrary initial conditions for the query, and it uses a recurrent patching scheme to process long context sequences efficiently. Numerical results on the Wiener–Hammerstein system class show that longer context windows yield RMSEs approaching the noise floor ($\approx 0.1$) and that the model quantifies its uncertainty through $\mu$ and $\sigma$. A distribution shift from white-noise to PRBS inputs is handled successfully by fine-tuning. The approach promises practical benefits for meta-state estimation and synthetic-data-based system identification, with ongoing work to scale it and further reduce computation.
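The exact training loss is not reproduced on this page. A common way to train a model that outputs $\mu$ and $\sigma$ is to minimize the Gaussian negative log-likelihood, and the sketch below assumes that choice, with an independent Gaussian at each time step. The function name `gaussian_nll` is illustrative and not taken from the paper.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of y under independent N(mu, sigma^2).

    Assumption: one Gaussian per time step, sigma strictly positive.
    """
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    return float(np.mean(0.5 * np.log(2.0 * np.pi * sigma**2)
                         + 0.5 * ((y - mu) / sigma) ** 2))

# A prediction that is off by 1 is penalized far more when the model
# claims sigma = 0.1 (overconfident) than when it claims sigma = 1.
print(gaussian_nll([1.0], [0.0], [0.1]), gaussian_nll([1.0], [0.0], [1.0]))
```

Unlike a plain squared-error loss, this objective rewards a small $\sigma$ only where the mean prediction is actually accurate, which is what makes the predicted $\sigma$ usable as an uncertainty estimate.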

Abstract

Recently introduced by some of the authors, the in-context identification paradigm aims at estimating, offline and based on synthetic data, a meta-model that describes the behavior of a whole class of systems. Once trained, this meta-model is fed with an observed input/output sequence (context) generated by a real system to predict its behavior in a zero-shot learning fashion. In this paper, we enhance the original meta-modeling framework through three key innovations: formulating the learning task within a probabilistic framework; managing non-contiguous context and query windows; and adopting recurrent patching to effectively handle long context sequences. The efficacy of these modifications is demonstrated through a numerical example focusing on the Wiener–Hammerstein system class, highlighting the model's enhanced performance and scalability.
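To give a feel for the recurrent patching idea, the sketch below cuts a context into patches and lets a small recurrent network compress each patch into one embedding, so that the Transformer would see one token per patch instead of one per sample. Everything specific here is an assumption made for illustration: non-overlapping patches, a plain tanh RNN with random weights, the last hidden state used as the patch summary, and all sizes and names.

```python
import numpy as np

def split_into_patches(u, y, patch_len):
    """Stack input/output channels and cut the context into non-overlapping patches."""
    ctx = np.concatenate([u, y], axis=-1)            # (m, n_u + n_y)
    m = ctx.shape[0]
    if m % patch_len != 0:
        raise ValueError("context length must be a multiple of patch_len")
    return ctx.reshape(m // patch_len, patch_len, ctx.shape[1])

def rnn_patch_embeddings(patches, W_in, W_h, b):
    """Run a plain tanh RNN over each patch; keep the last hidden state per patch."""
    n_patches, patch_len, _ = patches.shape
    h = np.zeros((n_patches, W_h.shape[0]))
    for t in range(patch_len):                       # all patches advance in parallel
        h = np.tanh(patches[:, t, :] @ W_in + h @ W_h + b)
    return h                                         # (n_patches, d_model)

# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
m, n_u, n_y, patch_len, d_model = 400, 1, 1, 100, 8
u = rng.standard_normal((m, n_u))
y = rng.standard_normal((m, n_y))

patches = split_into_patches(u, y, patch_len)
W_in = 0.1 * rng.standard_normal((n_u + n_y, d_model))
W_h = 0.1 * rng.standard_normal((d_model, d_model))
b = np.zeros(d_model)
emb = rnn_patch_embeddings(patches, W_in, W_h, b)
print(patches.shape, emb.shape)
```

With these sizes the 400-sample context is reduced to 4 embeddings, so the attention cost downstream depends on the number of patches rather than on the raw context length.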

Paper Structure (12 sections, 8 equations, 5 figures, 1 table)

Introduction
Problem description
Advancing the Transformer Architecture
Learning Probability Distributions
Handling arbitrary initial conditions
Patching for long context sequences
Numerical Example
System Class and Dataset Distribution
Transformer architecture
Results
Out-of-distribution
Conclusions

Figures (5)

Figure 1: Final Transformer architecture to handle probabilistic prediction (top right: decoder's output is a sequence of mean and standard deviation values); non-contiguous context and query (bottom right: initial input/output values of the query fed to the decoder); long context sequences (bottom left: context sequence split into patches and processed by an RNN).
Figure 2: Visual representation of the implemented patching approach. For the sake of visualization, a single-input ($n_u=1$) single-output ($n_y=1$) system is considered, so the context consists of one input sequence (blue) and one output sequence (red).
Figure 3: Root mean square error in validation vs. training iteration number for different context lengths $m$.
Figure 4: Multi-step-ahead simulation with white noise input of the Transformer trained with $m=16k$. Results on 256 randomly sampled systems superimposed (left) and on a particular system (right). Actual output $y$ (black), simulated output mean $\mu$ (blue), and simulation error $y - \mu$ (red). The shaded area (light blue) spans $\pm 3$ standard deviations as provided by the meta-model.
Figure 5: Multi-step-ahead simulation. Meta-model trained with white noise input and tested with PRBS input (top row). Meta-model fine-tuned on PRBS input signals (bottom row). Actual output $\tilde{y}$ (black), simulated output mean $\mu$ (blue), and simulation error $\tilde{y} - \mu$ (red). The shaded area (light blue) spans $\pm 3$ standard deviations as provided by the meta-model.
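
The quantities shown in Figures 3 to 5 (RMSE and the $\pm 3$ standard deviation band) can be computed from a simulated trajectory as sketched below. The data here are synthetic and generated only for illustration, with a noise standard deviation of 0.1 chosen to mirror the noise floor mentioned in the TL;DR; they are not results from the paper. The helper names `rmse` and `coverage` are hypothetical.

```python
import numpy as np

def rmse(y, mu):
    """Root mean square error between actual output y and predicted mean mu."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    return float(np.sqrt(np.mean((y - mu) ** 2)))

def coverage(y, mu, sigma, k=3.0):
    """Fraction of samples that fall inside the band mu +/- k * sigma."""
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    return float(np.mean(np.abs(y - mu) <= k * sigma))

# Synthetic stand-in for a model output: a perfect mean prediction and a
# predicted sigma equal to the true noise level (assumed, not from the paper).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 2000)
mu = np.sin(t)
sigma = np.full_like(t, 0.1)
y = mu + 0.1 * rng.standard_normal(t.size)

print(f"RMSE = {rmse(y, mu):.3f}, coverage = {coverage(y, mu, sigma):.3f}")
```

For a well-calibrated Gaussian prediction, the $\pm 3\sigma$ band should contain roughly 99.7% of the samples. A markedly lower coverage would indicate an overconfident $\sigma$.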

Theorems & Definitions (1)

Remark 1
