Bregman-Hausdorff divergence: strengthening the connections between computational geometry and machine learning

Tuyen Pham, Hana Dal Poz Kouřimská, Hubert Wagner

TL;DR

This work addresses the problem of comparing sets of probabilistic predictions in a non-metric Bregman geometry. It defines the primal and dual Bregman--Hausdorff divergences and a symmetric Chernoff variant, all grounded in a Legendre-type generator $F$ with $D_F(x\|y)=F(x)-(F(y)+\langle \nabla F(y), x-y\rangle)$. It develops efficient algorithms, notably using decomposable Bregman Kd-trees, to compute $H_{F}(P\|Q)$ and its Chernoff counterpart, with a dual-space mapping for the dual variant, and demonstrates practical scalability when $P$ and $Q$ are large sets of probability vectors. The KL specialization yields information-theoretic interpretations: $H_{KL}(P\|Q)$ and $H'_{KL}(P\|Q)$ quantify the worst-case expected loss in bits when approximating one collection of distributions by another, reinforcing the link between computational geometry and ML loss landscapes. Empirical results on model predictions show that the divergence discriminates meaningfully between predictions from differently trained models, and confirm substantial speedups (up to orders of magnitude) from shell-accelerated Bregman nearest-neighbor (NN) search, indicating the method’s practical utility in ML pipelines that operate in Bregman geometries.
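To make the formula for $D_F$ concrete, the following is a minimal brute-force sketch in Python, not the Kd-tree algorithm the paper develops. It evaluates $D_F(x\|y)$ directly from a generator and its gradient, specializes to KL (in bits) via the negative Shannon entropy, and then computes a directed max-min quantity over two finite sets. The function names are hypothetical. The argument order inside the max-min (here `div(p, q)`) is an assumption and should be checked against the paper's definitions of the primal and dual variants.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def bregman_divergence(F: Callable[[Vector], float],
                       grad_F: Callable[[Vector], List[float]],
                       x: Vector, y: Vector) -> float:
    """D_F(x||y) = F(x) - (F(y) + <grad F(y), x - y>)."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_F(y), x, y))
    return F(x) - (F(y) + inner)


def neg_entropy(x: Vector) -> float:
    """Generator F(x) = sum_i x_i log2 x_i (negative Shannon entropy, in bits)."""
    return sum(xi * math.log2(xi) for xi in x if xi > 0)


def neg_entropy_grad(x: Vector) -> List[float]:
    """Gradient of F; requires strictly positive coordinates."""
    return [math.log2(xi) + 1.0 / math.log(2) for xi in x]


def kl_bits(x: Vector, y: Vector) -> float:
    """KL divergence in bits, obtained as the Bregman divergence of neg_entropy."""
    return bregman_divergence(neg_entropy, neg_entropy_grad, x, y)


def directed_max_min(P: Sequence[Vector], Q: Sequence[Vector],
                     div: Callable[[Vector, Vector], float]) -> float:
    """Brute-force max over p in P of min over q in Q of div(p, q).

    Assumption: this argument order is illustrative only; swap the
    arguments of `div` to obtain the other directed variant.
    Runs in O(|P| * |Q|) divergence evaluations.
    """
    return max(min(div(p, q) for q in Q) for p in P)


# Two small collections of probability vectors (made-up illustrative data).
P = [[0.5, 0.5], [0.9, 0.1]]
Q = [[0.5, 0.5]]

forward = directed_max_min(P, Q, kl_bits)   # P measured against Q
backward = directed_max_min(Q, P, kl_bits)  # Q measured against P
print(forward, backward)
```

Because every point of `Q` also appears in `P`, the second value is zero while the first is not, which illustrates why a directed (asymmetric) construction is needed in this setting. The quadratic-time loop is only practical for small inputs.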

Abstract

The purpose of this paper is twofold. On the technical side, we propose an extension of the Hausdorff distance from metric spaces to spaces equipped with asymmetric distance measures. Specifically, we focus on the family of Bregman divergences, which includes the popular Kullback--Leibler divergence (also known as relative entropy). As a proof of concept, we use the resulting Bregman--Hausdorff divergence to compare two collections of probabilistic predictions produced by different machine learning models trained using the relative entropy loss. The algorithms we propose are surprisingly efficient even for large inputs with hundreds of dimensions. In addition to introducing this technical concept, we provide a survey that outlines the basics of Bregman geometry as well as computational geometry algorithms. We focus on algorithms that are compatible with this geometry and are relevant for machine learning.

Paper Structure

This paper contains 10 sections, 24 equations, 11 figures, 3 tables, 3 algorithms.

Figures (11)

  • Figure 1: A visualization of the relative entropy.
  • Figure 2: Visualization of a Bregman divergence formula for a one-dimensional domain.
  • Figure 3: Left: concentric primal Itakura--Saito balls. Right: concentric primal generalized Kullback--Leibler balls.
  • Figure 4: Geometric interpretation of primal (left) and dual (right) Bregman balls in dimension one.
  • Figure 5: In blue (light): Primal $KL$ balls with centers $x$ and $y$ intersect at the Chernoff point $c$. In magenta (dark): A dual $KL$ ball of radius $D_{KL}(x\|c)$ is drawn about $c$.
  • ...and 6 more figures