On Data-Driven Unbiased Predictors using the Koopman Operator
Roland Schurig, Pieter van Goor, Karl Worthmann, Rolf Findeisen
TL;DR
The paper tackles bias in data-driven Koopman predictors that arises when finite data fail to identify the true eigenfunctions, using a moment-based analysis of the multi-step residual in the lifted space. It derives a closed-form expression for the one-step residual and its variance, and establishes conditions for unbiasedness over multiple steps, formulated as a set of constraints $E_{N-1}$ on the predictor matrix $A$. By decomposing $A$ into a fixed component and a free component and minimizing a variance objective through a convex least-squares problem, the authors obtain an unbiased, variance-reduced predictor that can be computed efficiently from finite data. A Duffing oscillator example demonstrates zero-mean multi-step residuals with variance comparable to that of EDMD-based methods, supporting the approach and its potential for uncertainty-aware long-horizon Koopman forecasting and control.
Abstract
The Koopman operator and its data-driven approximations, such as extended dynamic mode decomposition (EDMD), are widely used for analysing, modelling, and controlling nonlinear dynamical systems. However, when the true Koopman eigenfunctions cannot be identified from finite data, multi-step predictions may suffer from structural inaccuracies and systematic bias. To address this issue, we analyse the first and second moments of the multi-step prediction residual. By decomposing the residual into contributions from the one-step approximation error and the propagation of accumulated inaccuracies, we derive a closed-form expression characterising these effects. This analysis enables the development of a novel and computationally efficient algorithm that enforces unbiasedness and reduces variance in the resulting predictor. The proposed method is validated in numerical simulations, showing improved uncertainty properties compared to standard EDMD. These results lay the foundation for uncertainty-aware and unbiased Koopman-based prediction frameworks that can be extended to controlled and stochastic systems.
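To make the quantities in the abstract concrete, the following is a minimal sketch of the standard EDMD baseline and of the multi-step residual in the lifted space. It is not the authors' unbiased predictor. The Duffing parameters, step size, monomial dictionary, sample count, and horizon are all assumptions chosen for illustration, as are the helper names `flow`, `lift`, and `one_step_residual`. The sketch also checks the telescoping identity $\psi(F^N(x)) - A^N \psi(x) = \sum_{k=0}^{N-1} A^{N-1-k}\, r(F^k(x))$ with $r(x) = \psi(F(x)) - A\psi(x)$, which is one standard way to split the multi-step residual into one-step errors propagated by $A$; whether it matches the paper's decomposition term by term is an assumption.

```python
import numpy as np

# Hypothetical unforced Duffing parameters and step size (not taken from the paper).
ALPHA, BETA, DELTA, DT = 1.0, -1.0, 0.5, 0.05


def flow(x):
    """One RK4 step of x1' = x2, x2' = -DELTA*x2 - BETA*x1 - ALPHA*x1**3 (rows = samples)."""
    def f(z):
        return np.stack(
            [z[:, 1], -DELTA * z[:, 1] - BETA * z[:, 0] - ALPHA * z[:, 0] ** 3],
            axis=1,
        )
    k1 = f(x)
    k2 = f(x + 0.5 * DT * k1)
    k3 = f(x + 0.5 * DT * k2)
    k4 = f(x + DT * k3)
    return x + DT / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def lift(x):
    """Assumed dictionary: monomials in (x1, x2) up to total degree 3, constant included."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.stack(
        [np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2,
         x1**3, x1**2 * x2, x1 * x2**2, x2**3],
        axis=1,
    )


# Snapshot data: random initial states and their one-step successors.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(2000, 2))
Y = flow(X)

# Standard EDMD: least-squares fit of A such that lift(Y) is close to lift(X) @ A.T.
K, *_ = np.linalg.lstsq(lift(X), lift(Y), rcond=None)
A = K.T


def one_step_residual(x):
    """r(x) = psi(F(x)) - A psi(x), evaluated row-wise."""
    return lift(flow(x)) - lift(x) @ A.T


# True trajectories over N steps.
N = 10
traj = [X]
for _ in range(N):
    traj.append(flow(traj[-1]))

# Multi-step residual computed directly ...
direct = lift(traj[N]) - lift(X) @ np.linalg.matrix_power(A, N).T

# ... and as a sum of one-step residuals propagated through powers of A.
telescoped = sum(
    one_step_residual(traj[k]) @ np.linalg.matrix_power(A, N - 1 - k).T
    for k in range(N)
)

one_step_mean = one_step_residual(X).mean(axis=0)
multi_step_mean = direct.mean(axis=0)
print("max |mean one-step residual| :", np.max(np.abs(one_step_mean)))
print("max |mean N-step residual|   :", np.max(np.abs(multi_step_mean)))
```

Because the assumed dictionary contains the constant function, the least-squares normal equations force the empirical mean of the one-step residual on the training samples to vanish up to round-off. The $N$-step mean carries no such guarantee for plain EDMD, which is the gap the abstract describes.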
