Density Estimation via Binless Multidimensional Integration
Matteo Carli, Alex Rodriguez, Alessandro Laio, Aldo Glielmo
TL;DR
Density estimation in high dimensions is challenging because of the curse of dimensionality, and it calls for robust, data-efficient methods. The paper introduces Binless Multidimensional Thermodynamic Integration (BMTI), a nonparametric approach that estimates the negative log-density by measuring log-density differences between neighbouring points and integrating these differences on an adaptive, manifold-aware neighbourhood graph. A maximum-likelihood formulation yields a linear system for the log-density values, and the covariance structure of the differences gives a principled way to quantify uncertainties; approximate and regularised variants handle disconnected graphs. On synthetic and realistic datasets, BMTI shows improved accuracy and smoothness over state-of-the-art estimators across dimensionalities up to at least 20, highlighting its data efficiency and robustness for applications in physics and chemistry where free-energy landscapes are essential.
Abstract
We introduce the Binless Multidimensional Thermodynamic Integration (BMTI) method for nonparametric, robust, and data-efficient density estimation. BMTI estimates the logarithm of the density by first computing log-density differences between neighbouring data points. These differences are then integrated, weighted by their associated uncertainties, using a maximum-likelihood formulation. This procedure can be seen as a multidimensional extension of thermodynamic integration, a technique developed in statistical physics. The method leverages the manifold hypothesis, estimating quantities within the intrinsic data manifold without defining an explicit coordinate map. It does not rely on any binning or space partitioning, but rather on the construction of a neighbourhood graph based on an adaptive bandwidth selection procedure. BMTI mitigates the limitations commonly associated with traditional nonparametric density estimators, effectively reconstructing smooth profiles even in high-dimensional embedding spaces. The method is tested on a variety of complex synthetic high-dimensional datasets, where it is shown to outperform traditional estimators, and is benchmarked on realistic datasets from the chemical physics literature.
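The integration step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the log-density differences and their variances are already given, and it treats the errors on different edges as independent Gaussians, which is a simplification of the covariance structure used in the paper. Under that assumption, maximising the likelihood reduces to a weighted least-squares problem whose normal equations form a linear system built on the weighted graph Laplacian. The function name `integrate_differences` and its arguments are hypothetical.

```python
import numpy as np

def integrate_differences(n_points, edges, deltas, variances, anchor=0):
    """Recover values f (up to an additive constant) from noisy differences.

    Each edge (i, j) carries a measured difference deltas[k] ~ f[j] - f[i]
    with variance variances[k]. Assumes independent Gaussian errors
    (an illustrative simplification) and a connected graph.
    """
    A = np.zeros((n_points, n_points))  # weighted graph Laplacian
    b = np.zeros(n_points)              # weighted divergence of the differences
    for (i, j), d, v in zip(edges, deltas, variances):
        w = 1.0 / v                     # inverse-variance weight
        A[i, i] += w
        A[j, j] += w
        A[i, j] -= w
        A[j, i] -= w
        b[j] += w * d
        b[i] -= w * d
    # The Laplacian is singular (f is defined up to a constant),
    # so fix f[anchor] = 0 and solve for the remaining points.
    keep = [k for k in range(n_points) if k != anchor]
    f = np.zeros(n_points)
    f[keep] = np.linalg.solve(A[np.ix_(keep, keep)], b[keep])
    return f

# Toy check: exact differences of a 1D Gaussian log-density on a small graph.
x = np.array([0.0, 0.5, 1.0, 1.5])
true_f = -0.5 * x**2
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
deltas = [true_f[j] - true_f[i] for i, j in edges]
variances = [0.1, 0.2, 0.1, 0.4, 0.3]
f = integrate_differences(len(x), edges, deltas, variances)
```

With noise-free differences the recovered values match the true log-density up to the anchoring constant. With noisy differences, edges with smaller variance pull the solution more strongly. A disconnected graph makes the reduced system singular, which is the case the regularised variants mentioned in the TL;DR are meant to address.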
