Learning a distance measure from the information-estimation geometry of data

Guy Ohayon, Pierre-Etienne H. Fiquet, Florentin Guth, Jona Ballé, Eero P. Simoncelli

TL;DR

The paper addresses the lack of a principled perceptual distance that requires no labeled data. It derives the Information-Estimation Metric (IEM) from the geometry of the data density, using a relationship that links information theory and estimation theory. The distance between two signals is defined as the integrated mismatch between the score fields of the Gaussian-blurred densities around them, taken across noise levels, and the paper proves that this yields a global metric with a local Riemannian interpretation. For Gaussian priors the IEM reduces to the Mahalanobis distance, while for more complex priors it adapts to the geometry of the distribution; a second-order expansion yields a local metric that reflects the curvature of the log-density. The IEM can be learned without supervision by training a denoiser (as in diffusion models) on unlabeled data and computing a one-dimensional integral. Experiments with an IEM learned on ImageNet show competitive correlation with human perceptual judgments across standard image-quality benchmarks. The framework opens avenues for unsupervised clustering, information retrieval, and improved evaluation of restoration and compression systems, albeit at a higher computational cost than some supervised metrics.

Abstract

We introduce the Information-Estimation Metric (IEM), a novel form of distance function derived from an underlying continuous probability density over a domain of signals. The IEM is rooted in a fundamental relationship between information theory and estimation theory, which links the log-probability of a signal with the errors of an optimal denoiser, applied to noisy observations of the signal. In particular, the IEM between a pair of signals is obtained by comparing their denoising error vectors over a range of noise amplitudes. Geometrically, this amounts to comparing the score vector fields of the blurred density around the signals over a range of blur levels. We prove that the IEM is a valid global metric and derive a closed-form expression for its local second-order approximation, which yields a Riemannian metric. For Gaussian-distributed signals, the IEM coincides with the Mahalanobis distance. But for more complex distributions, it adapts, both locally and globally, to the geometry of the distribution. In practice, the IEM can be computed using a learned denoiser (analogous to generative diffusion models) and solving a one-dimensional integral. To demonstrate the value of our framework, we learn an IEM on the ImageNet database. Experiments show that this IEM is competitive with or outperforms state-of-the-art supervised image quality metrics in predicting human perceptual judgments.
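The Gaussian case mentioned in the abstract can be illustrated with a small numerical sketch. Everything specific here is an assumption made for illustration, not the paper's definition: the observation model `y = gamma * x + sqrt(gamma) * n`, the use of the same noise draw for both signals, the midpoint integration rule, and the function names. Under these assumptions the optimal denoiser for a zero-mean Gaussian prior is linear, and the integrated squared difference of denoising errors works out to a Mahalanobis-type form with covariance `cov + I / big_gamma`, which approaches the squared Mahalanobis distance as `big_gamma` grows.

```python
import numpy as np

def gaussian_denoiser(y, cov, gamma):
    # Posterior mean E[x | y] for x ~ N(0, cov) and y = gamma * x + sqrt(gamma) * n
    # (assumed observation model; the closed form is cov (gamma cov + I)^{-1} y).
    eye = np.eye(cov.shape[0])
    return cov @ np.linalg.solve(gamma * cov + eye, y)

def squared_distance_sketch(x1, x2, cov, big_gamma, n_steps=2000, seed=0):
    # Integrate the squared difference of denoising error vectors over gamma in
    # [0, big_gamma] with a midpoint rule. Hypothetical helper, not the paper's code.
    rng = np.random.default_rng(seed)
    step = big_gamma / n_steps
    total = 0.0
    for k in range(n_steps):
        gamma = (k + 0.5) * step
        # Same noise draw for both signals (an assumption of this sketch).
        noise = np.sqrt(gamma) * rng.standard_normal(x1.shape)
        err1 = x1 - gaussian_denoiser(gamma * x1 + noise, cov, gamma)
        err2 = x2 - gaussian_denoiser(gamma * x2 + noise, cov, gamma)
        total += np.sum((err1 - err2) ** 2) * step
    return total

def closed_form(x1, x2, cov, big_gamma):
    # Value of the integral above for a linear denoiser:
    # (x1 - x2)^T (cov + I / big_gamma)^{-1} (x1 - x2).
    d = x1 - x2
    return float(d @ np.linalg.solve(cov + np.eye(len(d)) / big_gamma, d))

cov = np.array([[2.0, 0.5], [0.5, 1.0]])
x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])

numeric = squared_distance_sketch(x1, x2, cov, big_gamma=50.0)
exact = closed_form(x1, x2, cov, big_gamma=50.0)
mahalanobis_sq = float((x1 - x2) @ np.linalg.solve(cov, x1 - x2))
print(numeric, exact, mahalanobis_sq)
```

Because the Gaussian denoiser is linear, the shared noise cancels in the difference of error vectors, so the result does not depend on the seed. For a learned, nonlinear denoiser this would not hold, and the sketch makes no claim about how the paper handles that case.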

Paper Structure

This paper contains 58 sections, 7 theorems, 84 equations, 9 figures, 2 tables.

Key Result

Theorem 1

For every ${{\Gamma}>0}$, the $\textnormal{IEM}$ is a proper distance metric: it is symmetric, non-negative, equal to zero if and only if ${\bm{x}}_{1}={\bm{x}}_{2}$, and it satisfies the triangle inequality.
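Written out symbolically, the four properties in Theorem 1 read as follows. The shorthand $d_{\Gamma}({\bm{x}}_{1},{\bm{x}}_{2})$ for the IEM between two signals at a given ${\Gamma}$ is introduced here only for readability and may differ from the paper's notation.

$$
\begin{aligned}
d_{\Gamma}({\bm{x}}_{1},{\bm{x}}_{2}) &= d_{\Gamma}({\bm{x}}_{2},{\bm{x}}_{1}) && \text{(symmetry)}\\
d_{\Gamma}({\bm{x}}_{1},{\bm{x}}_{2}) &\geq 0 && \text{(non-negativity)}\\
d_{\Gamma}({\bm{x}}_{1},{\bm{x}}_{2}) &= 0 \iff {\bm{x}}_{1}={\bm{x}}_{2} && \text{(identity of indiscernibles)}\\
d_{\Gamma}({\bm{x}}_{1},{\bm{x}}_{3}) &\leq d_{\Gamma}({\bm{x}}_{1},{\bm{x}}_{2}) + d_{\Gamma}({\bm{x}}_{2},{\bm{x}}_{3}) && \text{(triangle inequality)}
\end{aligned}
$$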

Figures (9)

  • Figure 1: The information-estimation geometry around two points. We show a Gaussian mixture log-density and its gradient vector fields around the points $\gamma{\bm{x}}_{1}$ and $\gamma{\bm{x}}_{2}$ for three different SNR levels $\gamma$. The space is rescaled by $\gamma$ and the distribution collapses to a point at $\gamma=0$. When blurring the density (small $\gamma$), the two modes merge, and the gradients around $\gamma{\bm{x}}_{1}$ point toward either of the modes. When the two modes are far enough apart (large $\gamma$), most gradient vectors point toward their closest mode. Thus, the local gradients around a given point can capture different geometrical features of the distribution, depending on the SNR $\gamma$. The Information-Estimation Metric (IEM, \ref{eq:distance-qv}) between the two points ${\bm{x}}_{1}$ and ${\bm{x}}_{2}$ is the square error between the local gradient fields around them, weighted by a Gaussian window (illustrated by the opacity of the gradients' arrows) and integrated over all levels of SNR $\gamma\in[0,\Gamma]$.
  • Figure 2: Illustrating the global and local geometry of the Information-Estimation Metric (IEM) on three different prior densities. Top row: Equidistant IEM contours relative to an example reference point (white star). When $p_{{\mathbf{x}}}$ is Gaussian (middle column), the IEM coincides with the well-known Mahalanobis distance. For a separable Laplacian prior (left column), the equidistant contours cluster and curve around the axes, following the high-probability ridges. For a Gaussian mixture prior (right column), the contours reflect the shapes of the modes. These examples illustrate how the IEM adapts to the global geometry of the given prior density. Bottom row: Ellipses representing the local discrimination thresholds of the local Riemannian metric ${{\bm{G}}({\bm{x}},{\Gamma})}$ (\ref{eq:local_metric_hessian}). Larger ellipse radii correspond to higher discrimination thresholds, i.e., lower sensitivity to local perturbations. For the Gaussian prior, the local metric is constant across the entire domain (identical to the Mahalanobis metric). For the Laplace (heavy-tailed) prior, the discrimination thresholds are smaller in high-probability regions, consistent with human perception and predictions of efficient coding theories. Moreover, the orientations of the ellipses align with the equiprobable log-density contours, implying that ${\bm{G}}({\bm{x}},{\Gamma})$ is more sensitive to perturbations that yield a larger change in the probability of ${\bm{x}}$. For the Gaussian mixture density, the discrimination thresholds are smaller between the modes, and the major axes of the ellipses align with the direction of larger local variance. Overall, these examples illustrate that ${\bm{G}}({\bm{x}},{\Gamma})$ is more sensitive in regions of higher log-density curvature and to perturbations that induce larger local changes in probability.
  • Figure 3: Illustrating the disagreement between different types of perceptual distance measures. We ranked the distorted images associated with each reference image in the LIVE and CSIQ databases (middle row), according to the IEM and several other metrics. Each column displays the distorted images with the largest positive (bottom row) or negative (top row) rank differences between the IEM and the compared metric (denoted in the title of the column).
  • Figure 4: Spearman's rank correlation coefficient (SRCC) results on full-reference image similarity benchmarks. On TID2013, LIVE, and CSIQ, the IEM performs competitively with previous state-of-the-art supervised methods, but struggles on TQD (texture similarity data), as do most methods. In contrast, the unsupervised $\text{IEM}_{\text{sq.}}$ performs surprisingly well on TQD. Our supervised variant, which only learns $f_{\omega}$, achieves strong results on both types of databases simultaneously.
  • Figure 5: Two-alternative forced choice (2AFC) performance comparison on the different distortion categories in the BAPPS dataset. The unsupervised $\text{IEM}_{\text{sq.}}$ achieves competitive performance in most types of distortion. Our supervised variant, $\text{IEM}_{f_\omega}$, further improves the results.
  • ...and 4 more figures
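The behaviour described in the Figure 1 caption can be reproduced in one dimension with a short sketch. The observation model `y = gamma * x + sqrt(gamma) * n`, the equal-variance two-component mixture, and the function name `blurred_mixture_score` are all assumptions chosen for illustration; they are not taken from the paper. Under this model each mixture component of the observed density has mean `gamma * mean` and variance `gamma**2 * sigma**2 + gamma`, so the score is a responsibility-weighted average of per-component scores.

```python
import numpy as np

def blurred_mixture_score(x, gamma, means, sigma, weights):
    # Score (derivative of the log-density) of y = gamma * x + sqrt(gamma) * n,
    # evaluated at the rescaled point y = gamma * x, for a 1-D Gaussian mixture
    # prior whose components share the standard deviation sigma. Requires gamma > 0.
    means = np.asarray(means, dtype=float)
    weights = np.asarray(weights, dtype=float)
    y = gamma * x
    comp_means = gamma * means
    comp_var = gamma**2 * sigma**2 + gamma
    # Posterior responsibilities of each component (shared variance, so the
    # Gaussian normalization constants cancel); max-subtraction for stability.
    log_resp = np.log(weights) - 0.5 * (y - comp_means) ** 2 / comp_var
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    return float(np.sum(resp * (comp_means - y) / comp_var))

means, sigma, weights = [-2.0, 2.0], 0.5, [0.5, 0.5]

# Point between the centre of the mixture and the right-hand mode.
low_snr_score = blurred_mixture_score(1.0, 0.05, means, sigma, weights)
high_snr_score = blurred_mixture_score(1.0, 10.0, means, sigma, weights)
print(low_snr_score, high_snr_score)
```

In this toy setting the sign of the score at the same point changes with the SNR: at low `gamma` the two components overlap heavily and the score points back toward the centre of the merged density, while at high `gamma` it points toward the nearest mode.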

Theorems & Definitions (15)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Definition 2
  • Definition 3
  • Theorem 2
  • proof
  • Theorem 2
  • proof
  • Proposition 1
  • ...and 5 more