Coherent set identification via direct low rank maximum likelihood estimation
Robert Polzin, Ilja Klebanov, Nikolas Nüsken, Péter Koltai
TL;DR
This paper analyzes two low-rank approaches to dynamical data. The first is the classical coherence problem, which seeks partitions of states that remain highly distinguishable under a stochastic transition. The second is Direct Bayesian Model Reduction (DBMR), which directly estimates a low-rank factorization of the transition. The paper shows that DBMR outputs can be expressed as the full transition composed with a projection, $\Lambda = P\Pi$, and derives bounds ensuring that the coherence of the reduced model controls that of the full model. A key theoretical contribution is a bound connecting the Frobenius-norm error between the normalized full and reduced transitions to the relaxed likelihood gap $\hat{\ell}$. This bound rests on a Pinsker-type inequality, and it thereby relates projection-based and likelihood-based estimation and links the Frobenius and KL objectives. Numerically, DBMR provides interpretable, structure-preserving reduced models and, despite possible local maxima of its objective, broadly agrees with the classical coherence approach. This suggests that the two methods are complementary and motivates further study of a symmetrized DBMR and of alternative objective formulations.
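The link between Frobenius and KL objectives that the TL;DR attributes to a Pinsker-type inequality can be illustrated with the classical Pinsker inequality. The sketch below is a generic numerical check, not the paper's bound: it assumes two row-stochastic matrices with strictly positive entries and uses two standard facts. Row by row, the Euclidean error is at most the $\ell_1$ error, and Pinsker's inequality gives $\|p-q\|_1^2 \le 2\,\mathrm{KL}(p\,\|\,q)$. Summing over rows yields $\|P-Q\|_F^2 \le 2\sum_i \mathrm{KL}(P_i\,\|\,Q_i)$. The function names are hypothetical and chosen for illustration only.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) for strictly positive probability vectors p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


def random_stochastic(n, rng):
    """Random n x n row-stochastic matrix with strictly positive entries."""
    m = rng.random((n, n)) + 1e-3
    return m / m.sum(axis=1, keepdims=True)


def frobenius_vs_kl(P, Q):
    """Return (||P - Q||_F^2, 2 * sum_i KL(P_i || Q_i)) for row-stochastic P, Q.

    Row-wise, ||x||_2 <= ||x||_1 and Pinsker gives ||p - q||_1^2 <= 2 KL(p || q),
    so the first returned value never exceeds the second.
    """
    fro_sq = float(np.linalg.norm(P - Q, "fro") ** 2)
    kl_sum = sum(kl_divergence(P[i], Q[i]) for i in range(P.shape[0]))
    return fro_sq, 2.0 * kl_sum


rng = np.random.default_rng(0)
P = random_stochastic(5, rng)
Q = random_stochastic(5, rng)
fro_sq, bound = frobenius_vs_kl(P, Q)
print(f"||P - Q||_F^2 = {fro_sq:.4f} <= 2 * sum KL = {bound:.4f}")
```

The inequality only goes one way: a small Frobenius error does not in general imply a small KL divergence, for example when $Q$ has entries near zero where $P$ does not.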
Abstract
We analyze connections between two low-rank modeling approaches from the last decade for treating dynamical data. The first is the coherence problem (or coherent set approach), which seeks groups of states whose evolution under a stochastic transition matrix is maximally distinguishable from that of other groups. The second is a low-rank factorization approach for stochastic matrices, called Direct Bayesian Model Reduction (DBMR), which estimates the low-rank factors directly from observed data. We show that DBMR yields a low-rank model that is a projection of the full model, and we exploit this insight to derive bounds on a quantitative measure of coherence within the reduced model. Both approaches can be formulated as optimization problems, and we also prove a bound between their respective objectives. More broadly, this work relates the two classical loss functions of nonnegative matrix factorization, namely the Frobenius norm and the generalized Kullback–Leibler divergence, and suggests new links between likelihood-based and projection-based estimation of probabilistic models.
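For readers less familiar with the two nonnegative matrix factorization (NMF) losses mentioned above, the sketch below uses the standard Lee–Seung multiplicative updates to fit $V \approx WH$ under either the squared Frobenius norm or the generalized Kullback–Leibler divergence $\sum_{ij} V_{ij}\log\frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij}$. This is generic NMF, not DBMR: it does not enforce stochasticity of the factors, and the function names, the `EPS` guard, and the random initialization are assumptions made for illustration. Minimizing the generalized KL divergence is, up to constants, equivalent to maximizing a Poisson likelihood of $V$ with mean $WH$, which is one way to see why the KL loss is the likelihood-flavored counterpart of the Frobenius loss.

```python
import numpy as np

EPS = 1e-12  # small constant guarding against division by zero and log(0)


def frobenius_loss(V, W, H):
    """Squared Frobenius norm ||V - WH||_F^2."""
    return float(np.linalg.norm(V - W @ H, "fro") ** 2)


def generalized_kl_loss(V, W, H):
    """Generalized KL divergence D(V || WH); entries with V_ij = 0 contribute (WH)_ij."""
    WH = W @ H + EPS
    return float(np.sum(V * np.log((V + EPS) / WH) - V + WH))


def nmf(V, r, loss="frobenius", n_iter=200, seed=0):
    """Rank-r NMF V ~ W @ H via Lee-Seung multiplicative updates."""
    if loss not in ("frobenius", "kl"):
        raise ValueError("loss must be 'frobenius' or 'kl'")
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 0.1  # strictly positive initialization
    H = rng.random((r, m)) + 0.1
    for _ in range(n_iter):
        if loss == "frobenius":
            H *= (W.T @ V) / (W.T @ W @ H + EPS)
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
        else:
            # KL updates: H_aj *= sum_i W_ia V_ij/(WH)_ij / sum_i W_ia, and symmetrically for W
            H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
            W *= ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


rng = np.random.default_rng(1)
V = rng.random((6, 6))
V /= V.sum(axis=1, keepdims=True)  # a row-stochastic matrix as example data
W_f, H_f = nmf(V, 2, loss="frobenius")
W_k, H_k = nmf(V, 2, loss="kl")
print("Frobenius loss:", frobenius_loss(V, W_f, H_f))
print("Generalized KL loss:", generalized_kl_loss(V, W_k, H_k))
```

Multiplicative updates keep all entries nonnegative automatically, but they only converge to a local optimum, which mirrors the remark in the TL;DR that DBMR may exhibit local maxima.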
