Data-driven approximation of Koopman operators and generators: Convergence rates and error bounds

Liam Llamazares-Elias; Samir Llamazares-Elias; Jonas Latz; Stefan Klus

TL;DR

This work develops a data-driven, Monte Carlo Galerkin framework to approximate linear transfer operators and their infinitesimal generators for dynamical systems, unifying EDMD and gEDMD under a common theory. It proves almost-sure convergence of the data-driven operator $\widehat{\mathcal{A}}_{NM}$ to the Galerkin projection $\mathcal{A}_N$ and establishes convergence of eigenvalues and eigenfunctions, with explicit rates and robustness to measurement noise. The analysis extends to joint limits in dictionary size $N$ and data size $M$, without requiring Gram-matrix invertibility, and provides explicit error bounds in terms of Gram and structure matrices. Numerical experiments on deterministic and stochastic systems corroborate the theory, showing predictable convergence rates, spectral recovery, and varying robustness to noise across basis choices, thereby validating a versatile, scalable framework that subsumes EDMD and gEDMD.

Abstract

Global information about dynamical systems can be extracted by analysing associated infinite-dimensional transfer operators, such as Perron-Frobenius and Koopman operators as well as their infinitesimal generators. In practice, these operators typically need to be approximated from data. Popular approximation methods are extended dynamic mode decomposition (EDMD) and generator extended dynamic mode decomposition (gEDMD). We propose a unified framework that leverages Monte Carlo sampling to approximate the operator of interest on a finite-dimensional space spanned by a set of basis functions. Our framework contains EDMD and gEDMD as special cases, but can also be used to approximate more general operators. Our key contributions are proofs of the convergence of the approximating operator and its spectrum under non-restrictive conditions. Moreover, we derive explicit convergence rates and account for the presence of noise in the observations. Whilst all these results are broadly applicable, they also refine previous analyses of EDMD and gEDMD. We verify the analytical results with the aid of several numerical experiments.
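
As a rough illustration of the Monte Carlo Galerkin idea described in the abstract, the following minimal sketch builds an empirical Gram matrix and structure matrix from sampled data and combines them through a pseudoinverse, so the Gram matrix is not required to be invertible. The toy map $x \mapsto a x$, the monomial dictionary, and the names `monomials` and `edmd_matrix` are assumptions made for this example and are not taken from the paper. On this toy problem the dictionary spans an invariant subspace, so the recovered matrix should be close to $\mathrm{diag}(1, a, a^2, a^3)$.

```python
import numpy as np

def monomials(x, n):
    """Evaluate the dictionary 1, x, ..., x^(n-1) at the points x; shape (M, n)."""
    return np.vander(x, n, increasing=True)

def edmd_matrix(x, y, n):
    """EDMD-style estimate from snapshot pairs (x_m, y_m) with n monomials.

    Column j holds the coefficients of the image of the j-th basis function.
    """
    psi_x = monomials(x, n)
    psi_y = monomials(y, n)
    m = len(x)
    gram = psi_x.T @ psi_x / m      # empirical Gram matrix
    struct = psi_x.T @ psi_y / m    # empirical structure matrix
    # The pseudoinverse also covers a singular Gram matrix.
    return np.linalg.pinv(gram) @ struct

# Hypothetical toy system: the deterministic linear map x -> a * x.
rng = np.random.default_rng(0)
a = 0.5
x = rng.uniform(-1.0, 1.0, size=200)
y = a * x

A_hat = edmd_matrix(x, y, n=4)
eigs = np.sort(np.linalg.eigvals(A_hat).real)
print(np.round(A_hat, 6))
print(eigs)
```

For this map the Koopman operator sends $x^k$ to $a^k x^k$, which is why the estimated eigenvalues should approximate the powers of $a$.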

Paper Structure (19 sections, 21 theorems, 180 equations, 3 figures, 1 table)


Introduction
A general framework for data-based recovery of dynamics
Notation
Dynamical systems and linear operators
Mathematical framework
Data-driven approximation as a projection
Convergence of the projections
Joint limit in data and dictionary
Accounting for measurement error
Convergence of eigenvalues and eigenfunctions
Numerical experiments
Benchmark problems
Numerical results as the number of data points tends to infinity
Numerical results as the number of dictionary elements tends to infinity
Numerical results with noise
...and 4 more sections

Key Result

Theorem 3.1

Let $\Psi$ satisfy Assumption \ref{continuous}. Then the matrix $\widehat{\bm{A}}_{NM}$, which approximates $\widehat{\mathcal{A}}$, is, with probability $1$, a matrix representation of the projection of $\widehat{\mathcal{A}}$ onto $\widehat{\mathcal{F}}_{NM}$. That is, the operator $\widehat{\mathcal{A}}_{NM}$ that has matrix representation $\widehat{\bm{A}}_{NM}$ on $\widehat{\Psi}$ coincides with this projection.

Figures (3)

Figure 1: Average normalized error $\varepsilon:=\mathbb{E}\left[\left\lVert \widehat{\bm{A}}_{NM}-\bm{A}_N \right\rVert/\left\lVert \bm{A}_N \right\rVert\right]$ as a function of the number of data points $M$ for: the Koopman generator of the ODE \ref{ODE1} in Figure \ref{imgODE}, the Koopman generator and Koopman operator for the double-well potential \ref{double well} in Figures \ref{imgDoubleWell} and \ref{imgDoubleWellEDMD}, and the Koopman generator, the Perron--Frobenius generator and the Koopman operator for the OU process in Figures \ref{imgOU}, \ref{imgOU_PF}, and \ref{imgOU_EDMD}. In all cases, monomials up to order $8$ and the same number of Gaussian observables and FEM basis functions are used. The red and purple lines represent the slopes $-\frac{1}{2}$ and $-1$, respectively. The blue, red and green lines represent the average error over $50$ simulations of the above approximations. The shaded areas represent the 95% confidence intervals for the respective errors.
Figure 2: Average normalized error $\varepsilon:=\mathbb{E}\left[\left\lVert \widehat{\bm{A}}_{NM}-\bm{A}_N \right\rVert/\left\lVert \bm{A}_N \right\rVert\right]$ and the theoretical error bound in Corollary \ref{OC theorem} as a function of the number of observables $N$ for the Koopman generator of the ODE \ref{ODE1} in Figure \ref{imgODEdict}, the Koopman generator and Koopman operator for the double-well system \ref{double well} in Figures \ref{imgDoubleWelldict} and \ref{double_well_EDMDdict}, and the Koopman generator, the Perron--Frobenius, and Koopman operator for the OU process \ref{OU} using up to $1024$ Gaussian functions in Figures \ref{imgOUdict}, \ref{imgOU_PFdict}, and \ref{imgOU_EDMDdict}.
Figure 3: Average normalized error $\varepsilon:=\mathbb{E}\left[\left\lVert \widehat{\bm{A}}_{NM}-\bm{A}_N \right\rVert/\left\lVert \bm{A}_N \right\rVert\right]$ as a function of the number of data points $M$. In Figures \ref{ODE_small_noise}, \ref{ODE_medium_noise}, and \ref{ODE_large_noise}, we take $\sigma =10^{-3},10^{-2},10^{-1}$, respectively, and approximate the Koopman generator of the ODE. In Figures \ref{OU_PF_small_noise}, \ref{OU_PF_medium_noise}, \ref{OU_PF_large_noise} and in \ref{OU_small_noise}, \ref{OU_medium_noise}, \ref{OU_large_noise} we also take $\sigma =10^{-3},10^{-2},10^{-1}$ and now approximate the Perron--Frobenius operator and Koopman generator of \ref{OU}, respectively. In all cases, monomials up to order $8$ and the same number of Gaussian observables and FEM basis functions are used. The red and purple lines represent the slopes $-\frac{1}{2}$ and $-1$, respectively. The blue, red and green lines represent the error averaged over $50$ simulations of the above approximations. The shaded areas represent the 95% confidence intervals for the respective errors.
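
The reference slope $-\frac{1}{2}$ mentioned in the captions is the usual Monte Carlo rate in the number of data points $M$. The sketch below illustrates how such a slope could be estimated on a toy problem. It does not reproduce the paper's experiments: the stochastic linear map $y = a x + s\,\xi$ with standard normal $\xi$, the three-monomial dictionary, the parameter values, and the helper `edmd_matrix` are all assumptions chosen for this example. For this toy model the span of $1, x, x^2$ is invariant, so the target matrix is available in closed form and the normalized error can be measured directly.

```python
import numpy as np

def edmd_matrix(x, y, n):
    """EDMD-style estimate with the monomials 1, x, ..., x^(n-1)."""
    psi_x = np.vander(x, n, increasing=True)
    psi_y = np.vander(y, n, increasing=True)
    gram = psi_x.T @ psi_x / len(x)      # empirical Gram matrix
    struct = psi_x.T @ psi_y / len(x)    # empirical structure matrix
    return np.linalg.pinv(gram) @ struct

# Hypothetical toy model: y = a * x + s * xi, with xi standard normal.
a, s = 0.8, 0.5
# Conditional expectations: E[1] = 1, E[y] = a x, E[y^2] = a^2 x^2 + s^2.
# Column j holds the coefficients of the image of x^j in the monomial basis.
A_exact = np.array([[1.0, 0.0, s**2],
                    [0.0, a,   0.0],
                    [0.0, 0.0, a**2]])

rng = np.random.default_rng(1)
sample_sizes = np.array([100, 1000, 10000])
mean_errors = []
for m in sample_sizes:
    errs = []
    for _ in range(50):  # average over independent repetitions
        x = rng.standard_normal(m)
        y = a * x + s * rng.standard_normal(m)
        A_hat = edmd_matrix(x, y, 3)
        errs.append(np.linalg.norm(A_hat - A_exact) / np.linalg.norm(A_exact))
    mean_errors.append(np.mean(errs))

# Least-squares slope of log(error) against log(M).
slope = np.polyfit(np.log(sample_sizes), np.log(mean_errors), 1)[0]
print(mean_errors, slope)
```

The fitted slope depends on the random seed and the number of repetitions, so it should only be expected to lie near $-\frac{1}{2}$, not to match it exactly.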

Theorems & Definitions (47)

Theorem 3.1: Empirical projection
proof
Corollary 3.2: Exact approximation
proof
Example 3.3: Finite element basis of degree $k$
Lemma 3.4
proof
Theorem 3.5: Convergence in data limit
proof
Corollary 3.6
...and 37 more
