Fast hyperboloid decision tree algorithms

Philippe Chlenski, Ethan Turok, Antonio Moretti, Itsik Pe'er

TL;DR

The proposed hyperDT eliminates the need for computationally intensive Riemannian optimization, numerically unstable exponential and logarithmic maps, or pairwise comparisons between points by leveraging inner products to adapt Euclidean decision tree algorithms to hyperbolic space.

Abstract

Hyperbolic geometry is gaining traction in machine learning for its effectiveness at capturing hierarchical structures in real-world data. Hyperbolic spaces, where neighborhoods grow exponentially, offer substantial advantages and consistently deliver state-of-the-art results across diverse applications. However, hyperbolic classifiers often grapple with computational challenges. Methods reliant on Riemannian optimization frequently exhibit sluggishness, stemming from the increased computational demands of operations on Riemannian manifolds. In response to these challenges, we present hyperDT, a novel extension of decision tree algorithms into hyperbolic space. Crucially, hyperDT eliminates the need for computationally intensive Riemannian optimization, numerically unstable exponential and logarithmic maps, or pairwise comparisons between points by leveraging inner products to adapt Euclidean decision tree algorithms to hyperbolic space. Our approach is conceptually straightforward and maintains constant-time decision complexity while mitigating the scalability issues inherent in high-dimensional Euclidean spaces. Building upon hyperDT, we introduce hyperRF, a hyperbolic random forest model. Extensive benchmarking across diverse datasets underscores the superior performance of these models, providing a swift, precise, accurate, and user-friendly toolkit for hyperbolic data analysis.
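As a rough illustration of how an inner product can stand in for manifold operations, the following Python sketch tests a single split. It assumes points are stored on the hyperboloid $-x_0^2 + \sum_i x_i^2 = -1$ (unit curvature, $x_0$ timelike) and that a candidate split at angle $\theta$ on feature $d$ is a plane through the ambient origin containing the direction $(\sin\theta)\,e_0 + (\cos\theta)\,e_d$ together with the other spacelike axes, in the spirit of the plane-through-the-origin picture in Figure 1. Under that assumption, the side of the split is the sign of one dot product over two coordinates, with no exponential or logarithmic maps. The helper names `lift_to_hyperboloid`, `split_normal`, and `goes_left` are hypothetical and are not taken from the authors' hyperDT code.

```python
import numpy as np


def lift_to_hyperboloid(y):
    """Hypothetical helper: place Euclidean points y (n, D) on the unit hyperboloid
    -x0^2 + sum_i xi^2 = -1 by solving for the timelike coordinate x0 > 0."""
    y = np.atleast_2d(np.asarray(y, dtype=float))
    x0 = np.sqrt(1.0 + np.sum(y ** 2, axis=1, keepdims=True))
    return np.hstack([x0, y])


def split_normal(theta):
    """Normal, within the (x0, xd) plane, of the plane through the origin that
    contains the direction (sin theta, cos theta). Assumed parameterization."""
    return np.array([np.cos(theta), -np.sin(theta)])


def goes_left(X, d, theta):
    """Split test on feature d at angle theta, for pi/4 < theta < 3*pi/4.

    If a point's (x0, xd) part is r * (sin phi, cos phi), its dot product with
    split_normal(theta) equals r * sin(phi - theta), so the sign tells whether
    phi < theta. Only two coordinates are read, so each decision is constant time.
    """
    return X[:, [0, d]] @ split_normal(theta) < 0


# Usage: 6 random points in H^2 (ambient dimension 3), split on feature 1.
rng = np.random.default_rng(0)
X = lift_to_hyperboloid(rng.normal(size=(6, 2)))
left = goes_left(X, d=1, theta=np.pi / 2)
# At theta = pi/2 the plane contains the x0 axis, so the test reduces to x1 > 0.
print(left, X[:, 1] > 0)
```

In this sketch, the test gives the same answer as comparing $\phi = \operatorname{atan2}(x_0, x_d)$ against $\theta$, which is why thresholds on $\theta$ can be searched like ordinary Euclidean split points.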

Paper Structure (59 sections, 33 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Geodesic partitions in the hyperboloid model $\mathbb{H}^{2,1}$ (left) and Poincaré model $\mathbb{P}^{2,1}$ (right) into two halves (purple/yellow). In $\mathbb{H}^{2,1}$, a geodesic can be expressed as the intersection of the hyperboloid with an angled plane through the origin of the ambient space (transparent white). While these two representations are equivalent, these partitions can be expressed more compactly in $\mathbb{H}^{2,1}$.
  • Figure 2: Learned HyperDT decision boundaries for 2, 3, 4, and 5-class mixtures of wrapped normal distributions visualized on the Poincaré disk. All trees have a maximum depth of 3 and forgo post-training pruning. In the visualization, regions are colored according to their predicted class labels while data points are colored according to their true class labels.
  • Figure 3: Time to run 5-fold cross-validation for each classifier as a function of the number of points, averaged over 10 seeds. Shaded regions are 95% confidence intervals. Split by dataset: (a) wrapped normal mixture, (b) NeuroSEED OTU embeddings, and (c) Polblogs embeddings.
  • Figure 4: Rescaling the basis vector $\mathbf{v^0} = \langle \sin(\theta), \cos(\theta) \rangle$ by $\alpha(\theta, 1) = \sqrt{-\sec(2\theta)}$ produces a point on $\mathbb{H}^{1,1}$ for every $\theta$ strictly between $\pi/4$ and $3\pi/4$.
  • Figure 5: A plot of the function $\delta(\pi / 4 + 0.01, \theta)$ as $\theta$ varies from $\pi/4$ to $3\pi/4$, revealing the nonlinearity of the angle distance function.
  • ...and 3 more figures
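Figure 4's claim can be verified with a short calculation. Taking the first coordinate as timelike, $\langle \mathbf{v^0}, \mathbf{v^0} \rangle_L = -\sin^2\theta + \cos^2\theta = \cos(2\theta)$, which is negative exactly when $\pi/4 < \theta < 3\pi/4$. Multiplying by $\alpha^2 = -\sec(2\theta)$ turns this into $-1$, and the timelike coordinate $\alpha\sin\theta$ stays positive, so the rescaled point lies on the upper sheet of $\mathbb{H}^{1,1}$. The Python check below assumes that coordinate ordering and the unit-curvature convention $\langle x, x \rangle_L = -1$; the names `alpha` and `minkowski_sq` are illustrative.

```python
import numpy as np


def alpha(theta):
    """Scale factor sqrt(-sec(2*theta)) from Figure 4 (unit curvature assumed).
    Real-valued only for pi/4 < theta < 3*pi/4, where cos(2*theta) < 0."""
    return np.sqrt(-1.0 / np.cos(2.0 * theta))


def minkowski_sq(x):
    """Minkowski squared norm -x0^2 + x1^2 for points stored as (x0, x1)."""
    return -x[..., 0] ** 2 + x[..., 1] ** 2


# Sample angles strictly inside (pi/4, 3*pi/4); the endpoints are excluded
# because sec(2*theta) diverges there.
thetas = np.linspace(np.pi / 4 + 1e-3, 3 * np.pi / 4 - 1e-3, 7)
v0 = np.stack([np.sin(thetas), np.cos(thetas)], axis=-1)  # basis vectors v^0
points = alpha(thetas)[:, None] * v0                      # rescaled points

print(np.round(minkowski_sq(points), 12))  # each entry should be -1 up to rounding
```

Near the endpoints $\alpha$ grows without bound, which matches the points running off toward the boundary of $\mathbb{H}^{1,1}$.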