Self-Supervised Evolution Operator Learning for High-Dimensional Dynamical Systems

Giacomo Turri, Luigi Bonati, Kai Zhu, Massimiliano Pontil, Pietro Novelli

TL;DR

The paper addresses learning evolution operators for high-dimensional dynamical systems from data, aiming for interpretable spectral decompositions rather than black-box prediction. It introduces an encoder-only, self-supervised contrastive objective that aligns learned representations with the leading spectral components of the operator and demonstrates scalability to complex systems. A key theoretical insight links the objective to the VAMP-2 score in the Hilbert-Schmidt setting, enabling a practical, covariance-based estimation of the operator on learned features. Empirical results across protein folding, ligand binding, and climate data show interpretable slow modes and transferability, with open-source code enabling reproducibility.
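The covariance-based estimation mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `estimate_operator` and `vamp2_score`, the ridge parameter `reg`, the use of uncentered covariances, and the toy linear system with identity features standing in for a trained encoder are all assumptions made for illustration. The VAMP-2 score is computed in its standard form, the squared Hilbert-Schmidt (Frobenius) norm of the whitened cross-covariance.

```python
import numpy as np


def estimate_operator(phi_t, phi_next, reg=1e-8):
    """Ridge-regularized least-squares estimate of the evolution operator
    restricted to the span of the features (hypothetical helper).

    phi_t, phi_next: arrays of shape (n_samples, n_features) holding the
    features at time t and at time t + 1. Rows are samples, so the estimate
    K satisfies phi_next ~= phi_t @ K.
    """
    n, d = phi_t.shape
    cov = phi_t.T @ phi_t / n          # feature covariance (uncentered)
    cross = phi_t.T @ phi_next / n     # time-lagged cross-covariance
    return np.linalg.solve(cov + reg * np.eye(d), cross)


def _inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v / np.sqrt(w)) @ v.T


def vamp2_score(phi_t, phi_next, reg=1e-8):
    """Squared Frobenius norm of C00^{-1/2} C01 C11^{-1/2} (hypothetical helper)."""
    n, d = phi_t.shape
    c00 = phi_t.T @ phi_t / n + reg * np.eye(d)
    c11 = phi_next.T @ phi_next / n + reg * np.eye(d)
    c01 = phi_t.T @ phi_next / n
    m = _inv_sqrt(c00) @ c01 @ _inv_sqrt(c11)
    return float(np.sum(m ** 2))


# Toy data (assumption): a noiseless linear map with identity features.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2],
              [0.0, 0.5]])             # triangular, so its eigenvalues are 0.9 and 0.5
X = rng.standard_normal((500, 2))      # states at time t
Y = X @ A.T                            # states at time t + 1

K = estimate_operator(X, Y)
eigenvalues = np.sort(np.linalg.eigvals(K).real)[::-1]
score = vamp2_score(X, Y)

print("estimated eigenvalues:", eigenvalues)
print("VAMP-2 score:", score)
```

In this toy setting the estimate recovers the transpose of `A`, so its eigenvalues match those of `A`, and the VAMP-2 score approaches the feature dimension because the dynamics are deterministic and invertible. With a learned encoder, `phi_t` and `phi_next` would instead hold encoder outputs on consecutive snapshots.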

Abstract

We introduce an encoder-only approach to learn the evolution operators of large-scale non-linear dynamical systems, such as those describing complex natural phenomena. Evolution operators are particularly well-suited for analyzing systems that exhibit complex spatio-temporal patterns and have become a key analytical tool across various scientific communities. As terabyte-scale weather datasets and simulation tools capable of running millions of molecular dynamics steps per day are becoming commodities, our approach provides an effective tool to make sense of them from a data-driven perspective. The core of it lies in a remarkable connection between self-supervised representation learning methods and the recently established learning theory of evolution operators. To show the usefulness of the proposed method, we test it across multiple scientific domains: explaining the folding dynamics of small proteins, the binding process of drug-like molecules in host sites, and autonomously finding patterns in climate data. Code and data to reproduce the experiments are made available open source.

Paper Structure

This paper contains 24 sections, 1 theorem, 23 equations, 9 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

The loss function eq:abstract_loss is equivalent to the following operator learning loss:

Figures (9)

  • Figure 1: Forecasting errors and training times for the Lorenz '63 example (averaged over 20 independent runs). RMSE values are scaled by $10^{-2}$.
  • Figure 2: Slow dynamical modes of biomolecular processes. A: Trp-Cage folding. Time series of the leading eigenfunction $\Psi_1$ (red, left axis) alongside RMSD (gray, right axis), capturing transitions between folded (F) and unfolded (U) states. Representative snapshots of each state are shown. In the folded structure, key hydrogen bonds identified as relevant by the LASSO model are highlighted. B: Calixarene binding. Eigenfunctions $\Psi_1$ (left) and $\Psi_2$ (right) capture ligand transitions from unbound (U) to semi-bound (S) and bound (B) states. The model using a representation transferred from other ligands (solid line) closely matches one trained from scratch (dashed). Right: representative structures corresponding to each metastable state.
  • Figure 3: ENSO mode from an encoder trained with Alg. 1. A: Mode associated with the 11th eigenfunction, highlighting dominant activation in the tropical Pacific. Boxes indicate standard ENSO monitoring zones. B: Right eigenfunction corresponding to the 11th eigenvalue, compared to the ONI index (black). The vertical line marks the split between training and validation sets.
  • Figure 4: Leading eigenfunctions computed by our and baseline approaches. Each row corresponds to a different method, and each column shows an eigenfunction ordered by decreasing eigenvalue magnitude.
  • Figure 5: The value of the leading eigenfunction $\Psi_1$ of the evolution operator is highly correlated with the RMSD and Radius of Gyration of the Trp-cage protein.
  • ...and 4 more figures

Theorems & Definitions (2)

  • Lemma 1
  • proof