Transformers as Implicit State Estimators: In-Context Learning in Dynamical Systems

Usman Akram, Haris Vikalo

TL;DR

This work demonstrates that transformers trained on synthetic dynamical trajectories and then frozen can perform latent-state estimation in context, for both linear-Gaussian and nonlinear dynamical systems, without test-time gradient updates. The authors construct Kalman-filter-like operations from transformer primitives via a RAW-like framework, and show that the transformer’s predictions converge toward Kalman-filter behavior as context length and model scale increase. In nonlinear settings, the transformer attains accuracy comparable to the Extended Kalman Filter (EKF) and particle filtering (PF), and in some cases surpasses them. The findings suggest that transformer-based in-context learning can serve as a flexible, non-parametric approach to output prediction and latent-state estimation in dynamical systems, with robustness to missing model information and potential applicability to control tasks.

Abstract

Predicting the behavior of a dynamical system from noisy observations of its past outputs is a classical problem encountered across engineering and science. For linear systems with Gaussian inputs, the Kalman filter -- the best linear minimum mean-square error estimator of the state trajectory -- is optimal in the Bayesian sense. For nonlinear systems, Bayesian filtering is typically approached using suboptimal heuristics such as the Extended Kalman Filter (EKF), or numerical methods such as particle filtering (PF). In this work, we show that transformers, employed in an in-context learning (ICL) setting, can implicitly infer hidden states in order to predict the outputs of a wide family of dynamical systems, without test-time gradient updates or explicit knowledge of the system model. Specifically, when provided with a short context of past input-output pairs and, optionally, system parameters, a frozen transformer accurately predicts the current output. In linear-Gaussian regimes, its predictions closely match those of the Kalman filter; in nonlinear regimes, its performance approaches that of EKF and PF. Moreover, prediction accuracy degrades gracefully when key parameters, such as the state-transition matrix, are withheld from the context, demonstrating robustness and implicit parameter inference. These findings suggest that transformer in-context learning provides a flexible, non-parametric alternative for output prediction in dynamical systems, grounded in implicit latent-state estimation.
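To make the linear-Gaussian baseline concrete, the following is a minimal sketch of a scalar Kalman filter used for one-step output prediction. It is an illustration only, not the authors' implementation. The model form, the symbols `a`, `c`, `q`, `r`, the helper names, and the "repeat the previous output" comparison are all assumptions introduced here.

```python
import random


def kalman_one_step_predictions(ys, a, c, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter returning one-step-ahead output predictions.

    Assumed model (illustrative notation, not taken from the paper):
        x[t+1] = a * x[t] + w[t],   w[t] ~ N(0, q)
        y[t]   = c * x[t] + v[t],   v[t] ~ N(0, r)
    """
    x_pred, p_pred = x0, p0              # prior mean and variance of x[0]
    preds = []
    for y in ys:
        preds.append(c * x_pred)         # predicted output before seeing y
        # Measurement update: fold the new observation into the state estimate.
        s = c * p_pred * c + r           # innovation variance
        k = p_pred * c / s               # Kalman gain
        x_filt = x_pred + k * (y - c * x_pred)
        p_filt = (1.0 - k * c) * p_pred
        # Time update: propagate the estimate through the dynamics.
        x_pred = a * x_filt
        p_pred = a * p_filt * a + q
    return preds


def simulate(n, a, c, q, r, rng):
    """Draw n noisy outputs from the assumed scalar model, with x[0] ~ N(0, 1)."""
    x = rng.gauss(0.0, 1.0)
    ys = []
    for _ in range(n):
        ys.append(c * x + rng.gauss(0.0, r ** 0.5))
        x = a * x + rng.gauss(0.0, q ** 0.5)
    return ys


def mse(preds, ys):
    """Mean-square error between predictions and observed outputs."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


rng = random.Random(0)                   # fixed seed for reproducibility
a, c, q, r = 0.9, 1.0, 0.1, 0.1          # hypothetical system parameters
ys = simulate(2000, a, c, q, r, rng)

kf_preds = kalman_one_step_predictions(ys, a, c, q, r)
kf_mse = mse(kf_preds, ys)
naive_mse = mse([0.0] + ys[:-1], ys)     # baseline: repeat the previous output
print(f"Kalman MSE: {kf_mse:.3f}, naive MSE: {naive_mse:.3f}")
```

In this sketch the filter is handed the true `a`, `c`, `q`, and `r`. The in-context setting described in the abstract differs in that the transformer sees only past input-output pairs and, optionally, some of these parameters, and must infer the rest implicitly.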

Paper Structure

This paper contains 35 sections, 66 equations, 12 figures, 2 tables, 1 algorithm.

Figures (12)

  • Figure 1: Performance of the transformer in explicit state estimation from scalar measurements (Strategy 1). The transformer’s predictions are compared to Kalman filtering, SGD, ridge regression, and OLS.
  • Figure 2: Comparison between transformer and classical estimators in one-step output prediction for scalar measurements.
  • Figure 3: Performance of in-context learning with a transformer that is not provided noise covariances $Q$ and $R$. (a) Mean-square error (MSE) under Strategy 1. (b) Mean-squared prediction difference (MSPD) relative to the baselines under Strategy 1. (c) MSE under Strategy 2. (d) MSPD relative to the baselines under Strategy 2.
  • Figure 4: In-context learning (ICL) with a transformer for systems with 2D measurements. The transformer receives the full system specification and performs one-step output prediction.
  • Figure 5: Performance of in-context learning (ICL) with a transformer under fully missing model parameters (scalar measurements). No information about the state transition matrix or noise covariances is included in the context.
  • ...and 7 more figures