Koopman Invariants as Drivers of Emergent Time-Series Clustering in Joint-Embedding Predictive Architectures
Pablo Ruiz-Morales, Dries Vanoost, Davy Pissoort, Mathias Verbeke
TL;DR
The paper investigates why Joint-Embedding Predictive Architectures (JEPAs) often cluster time-series representations by underlying dynamical regime. It develops a Koopman operator-based theory showing that, for a system modeled as a finite mixture of ergodic regimes, the JEPA predictive objective promotes learning invariant regime-indicator functions, i.e., eigenfunctions with eigenvalue $1$ of the $\Delta$-step Koopman operator. Under an idealized setup with a linear predictor and EMA-target tracking, the loss decomposes into a mean-prediction term and an invariant term, and zero loss is achieved when the encoder spans the invariant subspace $\mathcal{V}$ and the predictor acts as the identity on that subspace. Empirically, on synthetic data with 18 regimes, JEPA learns regime-aligned latent clusters and a linear predictor $M$ that is near-identity on the learned subspace, which is consistent with the theory and highlights JEPA's potential for interpretable regime identification and anomaly detection in time series. This work bridges modern self-supervised learning and dynamical-systems theory, offering principled insights for designing more robust, interpretable time-series models.
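The eigenfunction claim can be written out explicitly. The following is a sketch in notation that is assumed here and may differ from the paper's own: $x_t$ is the system state, $R_1, \dots, R_r$ are the regimes (each assumed invariant under the dynamics), and $\mathcal{K}^{\Delta}$ is the $\Delta$-step Koopman operator acting on observables $g$.

$$
(\mathcal{K}^{\Delta} g)(x) = \mathbb{E}\left[\, g(x_{t+\Delta}) \mid x_t = x \,\right],
\qquad
\mathcal{K}^{\Delta} \mathbf{1}_{R_k} = \mathbf{1}_{R_k}, \quad k = 1, \dots, r.
$$

The second identity holds because a trajectory that starts in $R_k$ stays in $R_k$, so $\mathbf{1}_{R_k}(x_{t+\Delta}) = \mathbf{1}_{R_k}(x_t)$ with probability one. Under this reading, the invariant subspace is $\mathcal{V} = \mathrm{span}\{\mathbf{1}_{R_1}, \dots, \mathbf{1}_{R_r}\}$, and an encoder whose outputs lie in $\mathcal{V}$ is predicted exactly by the identity map.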
Abstract
Joint-Embedding Predictive Architectures (JEPAs), a powerful class of self-supervised models, exhibit an unexplained ability to cluster time-series data by their underlying dynamical regimes. We propose a novel theoretical explanation for this phenomenon, hypothesizing that JEPA's predictive objective implicitly drives it to learn the invariant subspace of the system's Koopman operator. We prove that an idealized JEPA loss is minimized when the encoder represents the system's regime indicator functions, which are Koopman eigenfunctions. We validate this theory on synthetic data with known dynamics, demonstrating that constraining the JEPA's linear predictor to be a near-identity operator is the key inductive bias that forces the encoder to learn these invariants. We further argue that this constraint is critical for selecting this interpretable solution from a class of mathematically equivalent but entangled optima, revealing the predictor's role in representation disentanglement. This work demystifies a key behavior of JEPAs, provides a principled connection between modern self-supervised learning and dynamical systems theory, and informs the design of more robust and interpretable time-series models.
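The central claim can be checked numerically on a toy system. The sketch below is not the authors' experiment: it assumes a hypothetical five-state Markov chain with two non-communicating ergodic regimes, for which the $\Delta$-step Koopman operator on observables is simply the $\Delta$-th power of the transition matrix. It verifies that the regime indicators are eigenfunctions with eigenvalue $1$, and that an identity predictor on indicator features attains zero prediction loss while the same predictor on raw one-hot state features does not. All names (`P_a`, `P_b`, `expected_loss`) are illustrative.

```python
import numpy as np

# Hypothetical toy system: two ergodic regimes that never communicate.
P_a = np.array([[0.9, 0.1],
                [0.2, 0.8]])
P_b = np.array([[0.5, 0.3, 0.2],
                [0.1, 0.6, 0.3],
                [0.3, 0.3, 0.4]])

# Block-diagonal transition matrix over all 5 states.
P = np.zeros((5, 5))
P[:2, :2] = P_a
P[2:, 2:] = P_b

# Delta-step Koopman operator on observables: (K g)(x) = E[g(x_{t+Delta}) | x_t = x].
delta = 3
K = np.linalg.matrix_power(P, delta)

# Regime indicator functions, one column per regime.
indicators = np.array([[1, 0],
                       [1, 0],
                       [0, 1],
                       [0, 1],
                       [0, 1]], dtype=float)

# Indicators are Koopman eigenfunctions with eigenvalue 1.
residual = np.abs(K @ indicators - indicators).max()

# The eigenvalue-1 eigenspace has one dimension per ergodic regime.
eigvals = np.linalg.eigvals(K)
n_unit = int(np.sum(np.abs(eigvals - 1.0) < 1e-8))


def expected_loss(K, Z, M):
    """Mean over states x of E[ ||M z(x_t) - z(x_{t+Delta})||^2 | x_t = x ]."""
    pred = Z @ M.T                                            # predicted future embedding per state
    sq = ((pred[:, None, :] - Z[None, :, :]) ** 2).sum(-1)    # squared error for every (x, y) pair
    return float((K * sq).sum(axis=1).mean())                 # weight by transition probabilities


# Identity predictor on invariant (indicator) features: zero loss.
loss_invariant = expected_loss(K, indicators, np.eye(2))

# Identity predictor on raw one-hot state features: strictly positive loss.
loss_state = expected_loss(K, np.eye(5), np.eye(5))

print(f"eigenfunction residual: {residual:.2e}")
print(f"unit eigenvalues: {n_unit}")
print(f"loss (indicator features): {loss_invariant:.3f}")
print(f"loss (one-hot state features): {loss_state:.3f}")
```

The uniform average over starting states in `expected_loss` is a simplification; weighting by a stationary distribution would not change which of the two losses is zero.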
