Linearly-scalable and entropy-optimal learning of nonstationary and nonlinear manifolds

Illia Horenko

TL;DR

This work introduces Entropy-Optimal Manifold Clustering (EOMC), a metricized, entropy-regularized extension of PCA-clustering that achieves linear-in-$T$ scalability while robustly recovering nonlinear, nonstationary manifolds. By incorporating loss metrisation and entropy regularization, EOMC yields an efficient, convex optimization with analytic updates for the cluster means, local manifolds, and soft assignments; it further supports simultaneous learning of data reliability through $\gamma_0(t)$. The method is validated on synthetic benchmarks and on Lorenz-96 and modified Hasegawa-Wakatani (mHW) models, where it reveals metastable regime-switching dynamics, extended predictability horizons via transfer-operator analysis, and substantial lossy-data compression gains relative to PCA-based approaches. Visualizing EOMC coordinates with t-SNE demonstrates that the internal probabilities encode meaningful manifold structure, reinforcing the approach’s interpretability and robustness for data-driven fluid-mechanics and geosciences tasks.

Abstract

We propose Entropy-Optimal Manifold Clustering (EOMC) and show that it mitigates the cost-scaling and robustness issues of existing dimensionality reduction and manifold learning tools in nonstationary and nonlinear situations, while retaining the favourable O(T) iteration complexity scaling in the statistics size T and allowing explicit computation of input data reliability. Application to the Lorenz-96 dynamical system in the chaotic regime, as well as to a modified Hasegawa-Wakatani (mHW) model of drift-wave turbulence at the edge of a tokamak plasma, reveals that the essential dynamics of both models is best described as a metastable regime-switching process that makes infrequent transitions between very persistent low-dimensional manifolds. At the same time, the Markovian mean exit times and relaxation times (which bound the predictability horizons for the identified regime-switching process) appear to decrease only very slowly with growing external forcing, indicating approximately two-fold longer prediction horizons than currently anticipated from the analysis of positive Lyapunov exponents, even in very chaotic model regimes. It is also demonstrated that, when applied to lossy compression of the Lorenz-96 and mHW output data in various forcing regimes, EOMC achieves a compression loss several orders of magnitude smaller than that of the common PCA-related linear compression approaches that form the backbone of state-of-the-art lossy data compression tools (such as JPEG, MP3, and others). These findings open new opportunities for EOMC and transfer operator theory, offering possibilities to significantly improve the predictive skill and performance of data-driven tools in fluid mechanics and geosciences applications.

Paper Structure

This paper contains 20 sections, 5 theorems, 18 equations, 8 figures.

Key Result

Lemma 1

Let $\mathcal{T}$ be a real-valued $D \times d$ matrix ($D > d$) with orthonormal columns (i.e., $\mathcal{T}^\dagger \mathcal{T} = I_d$, where $I_d$ is the $d \times d$ identity matrix). The kernel of the operator $B = I_D - \mathcal{T}\mathcal{T}^\dagger$ is non-empty; specifically, it contains no
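The lemma statement is cut off in this excerpt, so the following is only a minimal numerical sketch of the elementary identity behind it: since $\mathcal{T}^\dagger \mathcal{T} = I_d$, we have $B\mathcal{T} = \mathcal{T} - \mathcal{T}(\mathcal{T}^\dagger\mathcal{T}) = 0$, so every vector in the column space of $\mathcal{T}$ lies in the kernel of $B$. The dimensions `D = 5`, `d = 2`, the random seed, and all helper names (`gram_schmidt`, `projector_complement`, `matvec`) are illustrative assumptions and do not come from the paper.

```python
import math
import random


def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            proj = sum(qi * wi for qi, wi in zip(q, w))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def projector_complement(basis, D):
    """Build B = I_D - T T^T, where the columns of T are the vectors in `basis`."""
    return [
        [(1.0 if i == j else 0.0) - sum(q[i] * q[j] for q in basis) for j in range(D)]
        for i in range(D)
    ]


def matvec(B, x):
    """Multiply the matrix B (nested list of rows) by the vector x."""
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in B]


random.seed(0)
D, d = 5, 2  # hypothetical sizes with D > d

# Columns of T: d orthonormal vectors in R^D.
cols = gram_schmidt([[random.gauss(0, 1) for _ in range(D)] for _ in range(d)])
B = projector_complement(cols, D)

# An arbitrary vector in the column space of T.
coeffs = [random.gauss(0, 1) for _ in range(d)]
x = [sum(c * q[i] for c, q in zip(coeffs, cols)) for i in range(D)]

# B should annihilate each column of T and any combination of them.
max_residual = max(abs(v) for q in cols + [x] for v in matvec(B, q))

# B is an orthogonal projector, so its trace equals its rank, D - d.
trace_B = sum(B[i][i] for i in range(D))
```

In this sketch `max_residual` is at the level of floating-point round-off, and `trace_B` equals $D - d$ up to round-off, consistent with $B$ being the orthogonal projector onto the complement of the column space of $\mathcal{T}$, whose kernel therefore has dimension $d$.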

Figures (8)

  • Figure 1: Graphic illustration of the five phases of the EOMC data analysis pipeline. A text description is provided in Sec. \ref{sec:synt_ex1}.
  • Figure 2: Analysis results for the nonstationary data from Example 1 (switching between two-dimensional ball and torus surface manifolds in ten dimensions with noise).
  • Figure 3: Analysis results for the nonstationary data from Example 2 (switching between one-dimensional peace-sign and prism-contour manifolds in one hundred dimensions with noise).
  • Figure 4: Illustration of the modified EOMC learning from Sec. \ref{sec:gamma_0} and eqs. (\ref{eq:eomc_g0}-\ref{eq:gamma0}): co-inference of the data reliability function $\gamma_0$.
  • Figure 5: EOMC analysis results for the data from Lorenz-96 model with the external forcings $F=7,10,12$ (in rows) for the dependence between compression factor and loss as functions of reduced manifold dimensionality $d$ (first column) and number $K$ of local manifolds (second column), as well as the identified trajectories of EOMC internal coordinates $\gamma$ as functions of time (third column).
  • ...and 3 more figures

Theorems & Definitions (10)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Theorem 1
  • proof