Linearly-scalable and entropy-optimal learning of nonstationary and nonlinear manifolds
Illia Horenko
TL;DR
This work introduces Entropy-Optimal Manifold Clustering (EOMC), a metricized, entropy-regularized extension of PCA-clustering that achieves linear-in-$T$ scalability while robustly recovering nonlinear, nonstationary manifolds. By incorporating loss metrisation and entropy regularization, EOMC yields an efficient, convex optimization with analytic updates for the cluster means, local manifolds, and soft assignments; it further supports simultaneous learning of data reliability. The method is validated on synthetic benchmarks and on the Lorenz-96 and modified Hasegawa-Wakatani (mHW) models, where it reveals metastable regime-switching dynamics, extended predictability horizons via transfer-operator analysis, and substantial lossy-data compression gains relative to PCA-based approaches. Visualizing EOMC coordinates with t-SNE demonstrates that the internal probabilities encode meaningful manifold structure, reinforcing the approach’s interpretability and robustness for data-driven fluid-mechanics and geosciences tasks.
Abstract
We propose Entropy-Optimal Manifold Clustering (EOMC) and show that it mitigates the cost-scaling and robustness issues of existing dimensionality reduction and manifold learning tools in nonstationary and nonlinear situations, while retaining the favourable O(T) iteration complexity scaling in the statistics size T and allowing explicit computation of input data reliability. Application to the Lorenz-96 dynamical system in the chaotic regime, as well as to a modified Hasegawa-Wakatani (mHW) model of drift-wave turbulence at the edge of a tokamak plasma, reveals that for both models the essential dynamics is best described as a metastable regime-switching process that makes infrequent transitions between very persistent low-dimensional manifolds. At the same time, the Markovian mean exit times and relaxation times (which bound the predictability horizons for the identified regime-switching process) appear to decrease only very slowly with growing external forcing, indicating approximately two-fold longer prediction horizons than is currently anticipated from the analysis of positive Lyapunov exponents, even in very chaotic model regimes. It is also demonstrated that, when applied to lossy compression of the Lorenz-96 and mHW output data in various forcing regimes, EOMC achieves a compression loss several orders of magnitude smaller than that of the common PCA-related linear compression approaches that form the backbone of state-of-the-art lossy data compression tools (such as JPEG, MP3, and others). These findings open new opportunities for EOMC and transfer operator theory, offering possibilities to significantly improve the predictive skill and performance of data-driven tools in fluid mechanics and geosciences applications.
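To make the two Markovian quantities mentioned above concrete, the following minimal sketch shows how mean exit times and a relaxation time can be read off a discrete-time Markov chain estimated from a sequence of regime labels. It is an illustration under stated assumptions, not the implementation used in this work: the function names (`transition_matrix`, `mean_exit_times`, `relaxation_time`) are hypothetical, the labels are assumed to be hard (most likely) regime assignments, every regime is assumed to be visited at least once before the final time step, and all times are in units of the sampling step. It uses the standard relations that the mean exit time from state $i$ is $1/(1-P_{ii})$ and that the relaxation time is $-1/\ln|\lambda_2|$, where $\lambda_2$ is the eigenvalue of $P$ with the second-largest modulus.

```python
import numpy as np


def transition_matrix(labels, n_states):
    """Maximum-likelihood, row-stochastic transition matrix from a label sequence.

    Assumes every state occurs at least once before the last time step,
    so that no row of the count matrix is empty.
    """
    labels = np.asarray(labels, dtype=int)
    counts = np.zeros((n_states, n_states))
    # Accumulate the observed one-step transitions labels[t] -> labels[t + 1].
    np.add.at(counts, (labels[:-1], labels[1:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)


def mean_exit_times(P):
    """Mean number of steps spent in each state before leaving it: 1 / (1 - P_ii)."""
    stay = np.diag(P)
    with np.errstate(divide="ignore"):
        # An absorbing state (P_ii = 1) yields an infinite exit time.
        return 1.0 / (1.0 - stay)


def relaxation_time(P):
    """Slowest relaxation time scale, -1 / ln|lambda_2|, in units of time steps."""
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    # moduli[0] is 1 for a stochastic matrix; moduli[1] sets the slowest decay.
    return -1.0 / np.log(moduli[1])


# Hypothetical two-regime example with persistent (metastable) states.
P_example = np.array([[0.9, 0.1],
                      [0.2, 0.8]])
print(mean_exit_times(P_example))
print(relaxation_time(P_example))
```

In this picture, strongly metastable regimes correspond to diagonal entries of the transition matrix close to one, which yields long exit times and a long relaxation time, and hence a long horizon over which the regime label remains predictable.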
