KGMM: A K-means Clustering Approach to Gaussian Mixture Modeling for Score Function Estimation
Ludovico T. Giorgini, Tobias Bischoff, Andre N. Souza
TL;DR
This work tackles the challenge of estimating the score function, the gradient of the log steady-state density, in complex dynamical systems. It introduces KGMM, a two-stage approach that first combines bisecting K-means clustering with a Gaussian Mixture Model to obtain cluster-wise, noise-averaged score targets, and then trains a neural network to interpolate the score across state space. KGMM accurately recovers invariant measures and long-time statistics for potential systems, chaotic Lorenz-type models, and the Kuramoto-Sivashinsky (KS) equation, while offering substantial computational savings over conventional Denoising Score Matching. The method is strongest in moderate dimensions and with large datasets, and the work gives clear guidelines on hyperparameter choice and on the limitations due to dimensionality and finite-$\sigma$ bias. Overall, KGMM advances data-driven, reduced-order stochastic modeling by providing a robust, efficient framework for score-function estimation in complex dynamical systems.
Abstract
We propose a hybrid method for accurately estimating the score function, i.e., the gradient of the log steady-state density, using a Gaussian Mixture Model (GMM) in conjunction with a bisecting K-means clustering step. Our approach, which we call KGMM, offers a systematic way to combine statistical density estimation with a neural-network-based interpolation of the score, leveraging the strengths of both. We demonstrate its ability to accurately reconstruct the long-time statistical properties of several paradigmatic systems, including potential systems, chaotic Lorenz-type models, and the Kuramoto-Sivashinsky equation. Numerical experiments show that KGMM yields robust estimates of the score function, even for small values of the covariance amplitude in the GMM, where standard GMM methods tend to fail because of noise amplification. We compare the performance of KGMM against the conventional Denoising Score Matching (DSM) approach, demonstrating that KGMM achieves more faithful reconstruction of the steady-state distribution for low-dimensional systems at a fraction of the computational cost. These accurate estimates allow us to build effective stochastic reduced-order models that reproduce the invariant measures of the target dynamics.
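To make the clustering stage concrete, here is a minimal one-dimensional sketch. It rests on the standard Gaussian-smoothing identity: if $y = x + \sigma z$ with $z \sim \mathcal{N}(0, 1)$, then the score of the smoothed density is $\nabla_y \log p_\sigma(y) = -\mathbb{E}[z \mid y]/\sigma$, so averaging $z$ within a cluster of perturbed points gives a score target at that cluster's centroid. Whether this matches the authors' implementation in detail is an assumption. The sketch also uses plain Lloyd K-means in 1D rather than bisecting K-means, omits the neural-network interpolation stage, and every function name (`assign_1d`, `cluster_means`, `cluster_score_targets`, `fitted_slope`) is hypothetical.

```python
import bisect
import random


def assign_1d(points, centroids):
    """Nearest-centroid labels in 1D; `centroids` must be sorted ascending."""
    # A point belongs to cluster k when it lies between the midpoints
    # separating centroid k from its two neighbours.
    midpoints = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
    return [bisect.bisect_right(midpoints, p) for p in points]


def cluster_means(values, labels, n_clusters, fallback):
    """Per-cluster mean of `values`; empty clusters take the fallback entry."""
    sums = [0.0] * n_clusters
    counts = [0] * n_clusters
    for v, k in zip(values, labels):
        sums[k] += v
        counts[k] += 1
    means = [sums[k] / counts[k] if counts[k] else fallback[k]
             for k in range(n_clusters)]
    return means, counts


def cluster_score_targets(samples, sigma, n_clusters=16, n_iters=10, seed=0):
    """Return (centroids, scores, counts) for the sigma-smoothed density."""
    rng = random.Random(seed)
    # Perturb each sample with Gaussian noise: y = x + sigma * z.
    z = [rng.gauss(0.0, 1.0) for _ in samples]
    y = [x + sigma * zi for x, zi in zip(samples, z)]

    # Initialise centroids at evenly spaced quantiles of y, then run
    # plain Lloyd iterations (a stand-in for bisecting K-means).
    ordered = sorted(y)
    n = len(ordered)
    centroids = [ordered[(2 * k + 1) * n // (2 * n_clusters)]
                 for k in range(n_clusters)]
    for _ in range(n_iters):
        labels = assign_1d(y, centroids)
        centroids, _counts = cluster_means(y, labels, n_clusters, centroids)

    # Final pass: centroids and noise averages from the same labels.
    labels = assign_1d(y, centroids)
    centroids, counts = cluster_means(y, labels, n_clusters, centroids)
    mean_z, _counts = cluster_means(z, labels, n_clusters, [0.0] * n_clusters)
    # Score target at each centroid: -E[z | cluster] / sigma.
    scores = [-m / sigma for m in mean_z]
    return centroids, scores, counts


def fitted_slope(centroids, scores, counts):
    """Count-weighted least-squares slope of score vs. centroid (no intercept)."""
    num = sum(n * c * s for c, s, n in zip(centroids, scores, counts))
    den = sum(n * c * c for c, n in zip(centroids, counts))
    return num / den


# Example: x ~ N(0, 1). The sigma-smoothed density is N(0, 1 + sigma^2),
# whose exact score is -y / (1 + sigma^2), so the fitted slope should be
# close to -1 / (1 + sigma^2).
data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(40000)]
centroids, scores, counts = cluster_score_targets(data, sigma=0.3)
slope = fitted_slope(centroids, scores, counts)
```

In this toy case the exact score is linear in $y$, so the cluster averages carry no discretisation bias and the remaining error is statistical, scaling roughly like $1/(\sigma\sqrt{n_k})$ for a cluster holding $n_k$ points. This illustrates why small $\sigma$ amplifies noise unless each cluster holds enough samples.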
