Learning a distance measure from the information-estimation geometry of data
Guy Ohayon, Pierre-Etienne H. Fiquet, Florentin Guth, Jona Ballé, Eero P. Simoncelli
TL;DR
The paper addresses the lack of a principled, unlabeled perceptual distance by deriving the Information-Estimation Metric (IEM) from the geometry of the data density via an information-estimation bridge. It defines the distance between two signals as the integrated mismatch between score fields of the Gaussian-blurred densities across noise levels, and proves that this yields a global metric with a local Riemannian interpretation. For Gaussian priors the IEM reduces to the Mahalanobis distance, while for more complex priors it adapts to the distribution’s geometry; a second-order expansion yields a local metric that reflects curvature of the log-density. Importantly, the IEM can be learned in an unsupervised way by training a denoiser (diffusion-model–style) on unlabeled data and computing the integral, with experiments on ImageNet showing competitive correlation with human perceptual judgments across standard image-quality benchmarks. The framework opens avenues for unsupervised clustering, information retrieval, and improved evaluation of restoration and compression systems, albeit with higher computational cost than some supervised metrics.
Abstract
We introduce the Information-Estimation Metric (IEM), a novel form of distance function derived from an underlying continuous probability density over a domain of signals. The IEM is rooted in a fundamental relationship between information theory and estimation theory, which links the log-probability of a signal with the errors of an optimal denoiser, applied to noisy observations of the signal. In particular, the IEM between a pair of signals is obtained by comparing their denoising error vectors over a range of noise amplitudes. Geometrically, this amounts to comparing the score vector fields of the blurred density around the signals over a range of blur levels. We prove that the IEM is a valid global metric and derive a closed-form expression for its local second-order approximation, which yields a Riemannian metric. For Gaussian-distributed signals, the IEM coincides with the Mahalanobis distance. But for more complex distributions, it adapts, both locally and globally, to the geometry of the distribution. In practice, the IEM can be computed using a learned denoiser (analogous to generative diffusion models) and solving a one-dimensional integral. To demonstrate the value of our framework, we learn an IEM on the ImageNet database. Experiments show that this IEM is competitive with or outperforms state-of-the-art supervised image quality metrics in predicting human perceptual judgments.
