Duality Theory for Non-Markovian Linear Gaussian Models

Aditya Kudre, Heng-Sheng Chang, Prashant G. Mehta

Abstract

This work develops a duality theory for partially observed linear Gaussian models in discrete time. The state process evolves according to a causal but non-Markovian (or higher-order Gauss-Markov) structure, captured by a lower-triangular transition operator. This structure is related to the transformer, with $T$ playing the role of the context length. The main contributions are: (i) a dual control system for the linear Gaussian model, formulated as a backward difference equation (B$\Delta$E); (ii) a duality principle establishing that a specific linear-quadratic optimal control problem for the B$\Delta$E is dual to the filtering problem for the partially observed model; and (iii) an explicit optimal control formula yielding a novel (transformer-like) linear predictor, referred to as the dual filter, whose computational complexity scales linearly in the time horizon $T$, in contrast to the $O(T^3)$ cost of classical smoothing and Wiener-Hopf approaches.

Paper Structure

This paper contains 23 sections, 2 theorems, 41 equations, 4 figures, 1 algorithm.

Key Result

Theorem 1

Consider an estimator where $y$ is the solution of the dual control system eq:dual_BDE for a control input $u\in\mathcal{U}$ with terminal condition $y_T = f\in\mathbb{R}^\mathsf{d}$. Then $\blacktriangleleft$$\blacktriangleleft$

Figures (4)

  • Figure A1: Graphical representation of the non-Markovian linear Gaussian model with order $\tau$ and horizon $T$. The state process $X$ is represented by circles and the observation process $Z$ is represented by squares. The arrows represent the causal dependencies between the processes, including transition matrices $\mathrm{A}_{t,s}$ and $\mathrm{C}_t$. The dashed arrows indicate the prediction task of estimating $Z_T$ given the past observations $Z_{0:T-1}$.
  • Figure D1: Structural correspondence between the dual filter iteration and a self-attention layer in a decoder-only transformer. (left) The proposed iteration propagates the momentum sequence from $p^{(l)}_{\mathcal{T}_-}$ to $p^{(l+1)}_{\mathcal{T}_-}$. (right) A transformer layer maps input embeddings $\rho^{(l)}_{\mathcal{T}_-}$ to updated ones $\rho^{(l+1)}_{\mathcal{T}_-}$.
  • Figure E1: Numerical comparison of the proposed dual filter with classical estimation methods across three dynamical systems. The columns correspond to the tracking, oscillating, and cumulative fractional dynamics, respectively. The first row shows the controls $u_t^{(T)}$ for $T = 16$, $40$, and $64$, obtained from batch smoothing, the causal Wiener-Hopf filter, and the dual filter, which overlap exactly. The second row presents one realization of the predictions $\hat{Z}_{T|T-1}$ alongside the noisy observations $Z_T$, while the third row shows the mean squared error $\text{MSE}_T$ over a batch containing $100$ trajectories; in both cases, the results are indistinguishable across methods. Results are color-coded as follows: observations (black), growing-state Kalman filter (blue), batch smoothing (green), causal Wiener-Hopf filter (orange), and dual filter (red).
  • Figure F1: Computational complexity comparison. FLOPs as a function of horizon $T$ for the growing-state Kalman filter (blue), batch smoothing (green), causal Wiener-Hopf filter (orange), and the proposed dual filter (red). Classical methods exhibit $O\left(T^3\right)$ complexity, whereas the dual filter achieves linear complexity $O\left(T\right)$ for fixed-order settings ($\tau=2$, light red) and quadratic complexity $O\left(T^2\right)$ for full-order settings ($\tau=T$, dark red). Solid lines represent empirical measurements; dashed lines indicate theoretical trends. $^*$Unlike the recursive Kalman filter, the dual filter is a sequential batch-processing algorithm that operates on the full observation sequence, a dependency it shares with other batch methods.
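
The model sketched in Figure A1 can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the state recursion $X_{t+1} = \sum_{s=\max(0,\,t-\tau+1)}^{t} \mathrm{A}_{t,s} X_s + \text{noise}$ and the observation $Z_t = \mathrm{C}_t X_t + \text{noise}$, with isotropic Gaussian noise. The function name `simulate`, the callables `A` and `C`, and the noise scales `sigma_x` and `sigma_z` are hypothetical and do not appear in the paper.

```python
import numpy as np

def simulate(A, C, T, tau, d, sigma_x=0.1, sigma_z=0.1, seed=0):
    """Simulate an order-tau non-Markovian linear Gaussian model (assumed form).

    A(t, s) returns a (d, d) matrix and C(t) returns an (m, d) matrix;
    both callables are illustrative stand-ins for A_{t,s} and C_t.
    """
    rng = np.random.default_rng(seed)
    X = np.zeros((T + 1, d))
    X[0] = rng.standard_normal(d)
    for t in range(T):
        # The next state depends on the last tau states (causal, non-Markovian).
        lo = max(0, t - tau + 1)
        mean = sum(A(t, s) @ X[s] for s in range(lo, t + 1))
        X[t + 1] = mean + sigma_x * rng.standard_normal(d)
    # Noisy linear observations of each state.
    Z = []
    for t in range(T + 1):
        Ct = C(t)
        Z.append(Ct @ X[t] + sigma_z * rng.standard_normal(Ct.shape[0]))
    return X, np.stack(Z)

# Example: order tau = 2 with a hypothetical two-dimensional state.
A = lambda t, s: (0.5 if s == t else 0.2) * np.eye(2)
C = lambda t: np.array([[1.0, 0.0]])
X, Z = simulate(A, C, T=16, tau=2, d=2)
print(X.shape, Z.shape)  # (17, 2) (17, 1)
```

Setting `tau=1` recovers an ordinary Gauss-Markov model, while `tau=T` gives the full-order case in which every past state can influence the next one, corresponding to a full lower-triangular transition operator.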

Theorems & Definitions (7)

  • Theorem 1: Duality principle
  • proof
  • Remark 1: Interpretation of the duality principle
  • Theorem 2: Optimal Control
  • proof
  • Remark 2: Inversion of $T$-dimensional Matrix
  • Remark 3