Tempered Calculus for ML: Application to Hyperbolic Model Embedding

Richard Nock, Ehsan Amid, Frank Nielsen, Alexander Soen, Manfred K. Warmuth

TL;DR

Tempered Calculus for ML introduces a generalization of integration through $t$-additivity and its calculus, unifying classical and nonextensive perspectives to shape ML distortion measures. It defines a $t$-Riemann integral and a generalized derivative, linked by $\textfrak{g}_t(z) = \log_t(\exp(z))$, enabling tunable distortion properties such as hyperbolicity and metricity. The framework is applied to hyperbolic embeddings, notably for embedding boosted decision trees in the Poincaré disk, supported by Monotonic Decision Trees (MDT) to maintain interpretable, monotone confidence paths and a boosting scheme (logisticBoost) that preserves hyperbolic structure. The work provides both theoretical tools and practical embedding methods, including Lorentz and Poincaré models under tempering, with empirical demonstrations on DT/MDT embeddings and their interpretability in hyperbolic space. Together, these contributions offer a principled way to design and analyze ML distortions with controlled geometric properties for improved encoding, hierarchy capture, and visualization in hyperbolic representations.
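To make the link function concrete, here is a minimal Python sketch of $\textfrak{g}_t(z) = \log_t(\exp(z))$. It assumes the usual tempered (Tsallis) logarithm from nonextensive statistical mechanics, $\log_t(x) = (x^{1-t} - 1)/(1-t)$ for $t \neq 1$ and $\log_1 = \log$; the function names `log_t` and `g_t` are illustrative and not taken from the paper.

```python
import math


def log_t(x: float, t: float) -> float:
    """Tempered logarithm (assumed Tsallis form); equals math.log at t = 1."""
    if x <= 0:
        raise ValueError("log_t is only defined here for x > 0")
    if math.isclose(t, 1.0):
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)


def g_t(z: float, t: float) -> float:
    """Link function g_t(z) = log_t(exp(z)) quoted in the TL;DR."""
    return log_t(math.exp(z), t)


if __name__ == "__main__":
    # Tabulate g_t at a few points for several temperatures t.
    for t in (0.0, 0.5, 1.0, 1.5):
        values = [round(g_t(z, t), 4) for z in (-1.0, 0.0, 1.0)]
        print(f"t = {t}: {values}")
```

Under this assumed form, $\textfrak{g}_1$ is the identity, $\textfrak{g}_t(0) = 0$ for every $t$, and the second derivative $(1-t)\exp((1-t)z)$ makes $\textfrak{g}_t$ convex for $t < 1$ and concave for $t > 1$.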

Abstract

Most mathematical distortions used in ML are fundamentally integral in nature: $f$-divergences, Bregman divergences, (regularized) optimal transport distances, integral probability metrics, geodesic distances, etc. In this paper, we unveil a grounded theory and tools which can help improve these distortions to better cope with ML requirements. We start with a generalization of Riemann integration that also encapsulates functions that are not strictly additive but are, more generally, $t$-additive, as in nonextensive statistical mechanics. Notably, this recovers Volterra's product integral as a special case. We then generalize the Fundamental Theorem of Calculus using an extension of the (Euclidean) derivative. This, along with a series of more specific theorems, serves as a basis for results showing how one can design, alter, or change fundamental properties of distortion measures in a simple way, with a special emphasis on geometric and ML-related properties such as metricity, hyperbolicity, and encoding. We show how to apply it to a problem that has recently gained traction in ML: hyperbolic embeddings with a "cheap" and accurate encoding along the hyperbolic vs. Euclidean scale. We unveil a new application for which the Poincaré disk model has very appealing features, and our theory comes in handy: model embeddings for boosted combinations of decision trees, trained using the log-loss (trees) and logistic loss (combinations).
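The idea of $t$-additive integration can be illustrated numerically. The sketch below assumes the standard $t$-addition from nonextensive statistical mechanics, $a \oplus_t b = a + b + (1-t)ab$, and builds a Riemann-style sum in which the ordinary sum is replaced by $\oplus_t$. That construction, and the names `t_add`, `t_riemann_sum` and `g_t`, are assumptions made for illustration and are not taken from the paper. Since $1 + (1-t)(a \oplus_t b) = (1 + (1-t)a)(1 + (1-t)b)$, such sums approach $\log_t \exp \int_a^b f(u)\,\mathrm{d}u$ as the mesh shrinks, and for $t = 0$ they reduce to a Volterra-style product $\prod_i (1 + f(\xi_i)\Delta)$ minus one.

```python
import math


def t_add(a: float, b: float, t: float) -> float:
    """t-addition a (+)_t b = a + b + (1 - t) * a * b; ordinary addition at t = 1."""
    return a + b + (1.0 - t) * a * b


def t_riemann_sum(f, a: float, b: float, t: float, n: int = 20000) -> float:
    """Midpoint Riemann-style sum where terms are combined with t-addition.

    Illustrative assumption: the classical sum is simply replaced by (+)_t.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        total = t_add(total, h * f(a + (i + 0.5) * h), t)
    return total


def g_t(z: float, t: float) -> float:
    """Closed form of log_t(exp(z)) under the assumed Tsallis logarithm."""
    if math.isclose(t, 1.0):
        return z
    return math.expm1((1.0 - t) * z) / (1.0 - t)


if __name__ == "__main__":
    classical = math.sin(1.0)  # exact value of the integral of cos on [0, 1]
    for t in (0.0, 0.5, 1.0, 2.0):
        approx = t_riemann_sum(math.cos, 0.0, 1.0, t)
        print(f"t = {t}: t-sum = {approx:.6f}, g_t(integral) = {g_t(classical, t):.6f}")
```

The two printed columns should agree up to discretization error, which is one way to see why a function would be integrable in this sense for every $t$ or for none: the tempered value is a fixed transform of the classical one.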

Paper Structure

This paper contains 12 sections, 9 theorems, 26 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Any function is either $t$-Riemann integrable for all $t\in \mathbb{R}$ simultaneously, or for none. In the former case, we have the relationship

Figures (3)

  • Figure 1: Plot of $\textfrak{g}_t(z)$ for different values of $t$ (color map on the right bar), showing where it is convex / concave. The $t=1$ case ($\textfrak{g}_1(z) = z$) is emphasized in red.
  • Figure 2: Suppose $r \stackrel{\mathrm{.}}{=} \|\bm{z}\|$ is the norm of a point $\bm{z}$ in the Poincaré disk $\mathbb{B}_1$. Fix $t\in [0,1]$ (color bar). The plot gives the norm $r^{(t)}$ of a point $\bm{z}^{(t)}$ in the $t$-self such that $d_{\mathbb{B}_1}^{(t)}(\bm{z}^{(t)}, \bm{0}) = d_{\mathbb{B}_1}(\bm{z}, \bm{0})$.
  • Figure 3: A small decision tree (DT) learned on UCI abalone (left) and its corresponding monotonic decision tree (MDT, right) learned using getMDT. Colors (red, green) denote the majority class in each node. In each node, the real-valued prediction is indicated, also in color. Observe that $H$ does not grant path-monotonic classification but $H'$ does (see the definition of MDT). Observe also that in $H'$, some nodes have outdegree 1; also, internal node #6 in the DT, whose prediction is worse than its parent's, disappears in $H'$. One arc in $H'$ is represented in double width because its Boolean test aggregates both tests it takes to go from #3 to #10 in $H$. Finally, the depths of $H$ and $H'$ are the same.

Theorems & Definitions (14)

  • Definition 3.1
  • Theorem 1
  • Theorem 2
  • Lemma 1
  • Definition 4.1
  • Lemma 2
  • Definition 4.2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • ...and 4 more