Enhanced Transformer architecture for in-context learning of dynamical systems

Matteo Rufolo, Dario Piga, Gabriele Maroni, Marco Forgione

TL;DR

This work enhances in-context learning for dynamical systems by training a Transformer-based meta-model with a probabilistic objective, so that it outputs a Gaussian predictive distribution and thereby quantifies its own uncertainty. It also handles non-contiguous context and query windows by feeding the initial input/output values of the query to the decoder, and it uses a recurrent patching scheme to process long context sequences efficiently. Empirical results on Wiener–Hammerstein systems show that longer context windows yield RMSEs approaching the noise floor ($\approx 0.1$) and that the predicted mean $\mu$ and standard deviation $\sigma$ quantify the uncertainty of the simulation. Under a distribution shift in the input signal (white noise in training, PRBS at test time), the meta-model is successfully fine-tuned on the new input class. The approach promises practical benefits for meta-state estimation and synthetic-data-based system identification, with ongoing work to scale and further reduce computation.

Abstract

Recently introduced by some of the authors, the in-context identification paradigm aims at estimating, offline and based on synthetic data, a meta-model that describes the behavior of a whole class of systems. Once trained, this meta-model is fed with an observed input/output sequence (context) generated by a real system to predict its behavior in a zero-shot learning fashion. In this paper, we enhance the original meta-modeling framework through three key innovations: by formulating the learning task within a probabilistic framework; by managing non-contiguous context and query windows; and by adopting recurrent patching to effectively handle long context sequences. The efficacy of these modifications is demonstrated through a numerical example focusing on the Wiener-Hammerstein system class, highlighting the model's enhanced performance and scalability.
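
The probabilistic formulation mentioned above can be illustrated with a short sketch. It assumes that the Gaussian output is trained with a negative log-likelihood criterion and that the $\pm 3\sigma$ band is used as an uncertainty check; the function names `gaussian_nll` and `three_sigma_coverage`, the sinusoidal test signal, and the noise level of 0.1 are illustrative choices, not taken from the paper.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of y under independent N(mu, sigma^2)."""
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    if np.any(sigma <= 0):
        raise ValueError("sigma must be strictly positive")
    per_sample = 0.5 * np.log(2.0 * np.pi * sigma**2) + (y - mu) ** 2 / (2.0 * sigma**2)
    return float(np.mean(per_sample))

def three_sigma_coverage(y, mu, sigma):
    """Fraction of samples that fall inside the mu +/- 3*sigma band."""
    y, mu, sigma = (np.asarray(a, dtype=float) for a in (y, mu, sigma))
    return float(np.mean(np.abs(y - mu) <= 3.0 * sigma))

# Synthetic check (assumed data): the "true" output is the predicted mean
# plus Gaussian noise with standard deviation 0.1.
rng = np.random.default_rng(0)
mu = np.sin(np.linspace(0.0, 6.0, 500))
sigma = np.full(500, 0.1)
y = mu + sigma * rng.standard_normal(500)

print("NLL with calibrated sigma:   ", gaussian_nll(y, mu, sigma))
print("NLL with overconfident sigma:", gaussian_nll(y, mu, 0.1 * sigma))
print("3-sigma coverage:            ", three_sigma_coverage(y, mu, sigma))
```

With a loss of this form, a model that reports a standard deviation much smaller than its actual error is penalized heavily, which is what makes the predicted $\sigma$ usable as an uncertainty estimate.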

Paper Structure (12 sections, 8 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Final Transformer architecture to handle probabilistic prediction (top right: decoder's output is a sequence of mean and standard deviation values); non-contiguous context and query (bottom right: initial input/output values of the query fed to the decoder); long context sequences (bottom left: context sequence split into patches and processed by an RNN).
  • Figure 2: Visual representation of the implemented patching approach. For the sake of visualization, a single-input ($n_u=1$), single-output ($n_y=1$) system is considered, so the context consists of one input sequence (blue) and one output sequence (red).
  • Figure 3: Root mean square error in validation vs. training iteration number for different context lengths $m$.
  • Figure 4: Multi-step-ahead simulation with white noise input of the Transformer trained with $m=16k$. Results on 256 randomly sampled systems superposed (left) and on a particular system (right). Actual output $y$ (black), simulated output mean $\mu$ (blue), and simulation error $y - \mu$ (red). The shaded area (light blue) spans $\pm 3$ standard deviations, as provided by the meta-model.
  • Figure 5: Multi-step-ahead simulation. Meta-model trained with white noise input and tested with PRBS input (top row). Meta-model fine-tuned on PRBS input signals (bottom row). Actual output $\tilde{y}$ (black), simulated output mean $\mu$ (blue), and simulation error $\tilde{y} - \mu$ (red). The shaded area (light blue) spans $\pm 3$ standard deviations, as provided by the meta-model.
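
The patching idea described in the captions of Figures 1 and 2 (context split into patches, each processed by an RNN) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the plain tanh recurrence, the reset of the hidden state at the start of every patch, the choice of the final hidden state as the patch embedding, and all sizes (`m`, `patch_len`, `d_model`) are hypothetical.

```python
import numpy as np

def split_into_patches(u, y, patch_len):
    """Stack input and output channel-wise and cut the context into
    non-overlapping patches of shape (n_patches, patch_len, n_u + n_y)."""
    uy = np.concatenate([u, y], axis=-1)          # (m, n_u + n_y)
    m = uy.shape[0]
    if m % patch_len != 0:
        raise ValueError("context length must be a multiple of patch_len")
    return uy.reshape(m // patch_len, patch_len, uy.shape[-1])

def rnn_encode_patches(patches, W_in, W_h, b):
    """Run a plain tanh RNN over each patch independently (assumed design)
    and return the final hidden state of every patch."""
    n_patches, patch_len, _ = patches.shape
    h = np.zeros((n_patches, W_h.shape[0]))       # hidden state reset per patch
    for t in range(patch_len):
        h = np.tanh(patches[:, t, :] @ W_in + h @ W_h + b)
    return h                                      # (n_patches, d_model)

# Hypothetical sizes for a single-input, single-output system (n_u = n_y = 1).
rng = np.random.default_rng(0)
m, patch_len, d_model = 16_000, 400, 32
u = rng.standard_normal((m, 1))
y = rng.standard_normal((m, 1))

# Randomly initialized weights stand in for trained RNN parameters.
W_in = 0.1 * rng.standard_normal((2, d_model))
W_h = 0.1 * rng.standard_normal((d_model, d_model))
b = np.zeros(d_model)

patches = split_into_patches(u, y, patch_len)
tokens = rnn_encode_patches(patches, W_in, W_h, b)
print(patches.shape, tokens.shape)  # (40, 400, 2) (40, 32)
```

In this sketch the sequence passed on to the Transformer has 40 elements instead of 16,000, which is the source of the computational saving: the cost of self-attention grows quadratically with the sequence length.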

Theorems & Definitions (1)

  • Remark 1