Learning over von Mises-Fisher Distributions via a Wasserstein-like Geometry

Kisung You, Dennis Shung, Mauro Giuffrè

TL;DR

This work expands the statistical toolbox for directional data analysis by introducing a tractable, transport-inspired distance tailored to the geometry of the hypersphere, and develops efficient algorithms for vMF mixture reduction, enabling structure-preserving compression of mixture models in high-dimensional settings.

Abstract

We introduce a novel, geometry-aware distance metric for the family of von Mises-Fisher (vMF) distributions, which are fundamental models for directional data on the unit hypersphere. Although the vMF distribution is widely employed in probabilistic learning tasks involving spherical data, principled tools for comparing vMF distributions remain limited, primarily due to the intractability of normalization constants and the absence of suitable geometric metrics. Motivated by the theory of optimal transport, we propose a Wasserstein-like distance that decomposes the discrepancy between two vMF distributions into two interpretable components: a geodesic term capturing the angular separation between mean directions, and a variance-like term quantifying differences in concentration parameters. The derivation leverages a Gaussian approximation in the high-concentration regime to yield a tractable, closed-form expression that respects the intrinsic spherical geometry. We show that the proposed distance exhibits desirable theoretical properties and induces a latent geometric structure on the space of non-degenerate vMF distributions. As a primary application, we develop efficient algorithms for vMF mixture reduction, enabling structure-preserving compression of mixture models in high-dimensional settings. Empirical results on synthetic datasets and real-world high-dimensional embeddings, including biomedical sentence representations and deep visual features, demonstrate the effectiveness of the proposed geometry in distinguishing distributions and supporting interpretable inference. This work expands the statistical toolbox for directional data analysis by introducing a tractable, transport-inspired distance tailored to the geometry of the hypersphere.
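The two-term decomposition described in the abstract can be illustrated with a short sketch. The closed form used here is an assumption, not a formula quoted from this page: it treats each vMF distribution on the unit sphere in $\mathbb{R}^p$ as approximately an isotropic Gaussian with variance $1/\kappa$ in the $(p-1)$-dimensional tangent space, and then borrows the 2-Wasserstein distance between isotropic Gaussians, giving $\arccos(\langle \boldsymbol{\mu}_1, \boldsymbol{\mu}_2 \rangle)^2 + (p-1)\,(\kappa_1^{-1/2} - \kappa_2^{-1/2})^2$ for the squared dissimilarity. The function name `wl_distance` is hypothetical.

```python
import math


def wl_distance(mu1, kappa1, mu2, kappa2):
    """Illustrative Wasserstein-like dissimilarity between vMF(mu1, kappa1) and vMF(mu2, kappa2).

    ASSUMED form (not confirmed by the text above):
        sqrt( arccos(<mu1, mu2>)^2 + (p - 1) * (1/sqrt(kappa1) - 1/sqrt(kappa2))^2 )
    """
    if len(mu1) != len(mu2):
        raise ValueError("mean directions must have the same dimension")
    if kappa1 <= 0 or kappa2 <= 0:
        raise ValueError("concentration parameters must be strictly positive")

    # Normalize defensively so the inputs need not be exact unit vectors.
    norm1 = math.sqrt(sum(x * x for x in mu1))
    norm2 = math.sqrt(sum(x * x for x in mu2))
    cos_angle = sum(a * b for a, b in zip(mu1, mu2)) / (norm1 * norm2)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error

    # Geodesic term: great-circle angle between the mean directions.
    geodesic = math.acos(cos_angle)

    # Variance-like term: difference of the approximate standard deviations
    # 1/sqrt(kappa), scaled by the tangent-space dimension p - 1.
    p = len(mu1)
    spread = math.sqrt(p - 1) * abs(1.0 / math.sqrt(kappa1) - 1.0 / math.sqrt(kappa2))

    return math.hypot(geodesic, spread)
```

Under this assumed form, two distributions with the same concentration differ only by the angle between their mean directions, and two distributions with the same mean direction differ only through $|\kappa_1^{-1/2} - \kappa_2^{-1/2}|$, which mirrors the geodesic and variance-like components named in the abstract.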

Paper Structure

This paper contains 17 sections, 4 theorems, 37 equations, 8 figures.

Key Result

Theorem 3.1

Let $\lbrace (\boldsymbol{\mu}_i,\kappa_i)\rbrace_{i\in \mathcal{I}}$ be a collection of vMF distributions with $\kappa_i \in (0, \infty)$. Then the $\mathcal{WL}$ dissimilarity is: (1) well-defined, (2) continuous, (3) topologically consistent to induce a meaningful topology, and (4) well-behaved i

Figures (8)

  • Figure 1: Comparison of interpolations between two distinct von Mises–Fisher distributions (left). Interpolation paths are shown for the standard $L_2$ geometry (middle) and the proposed $\mathcal{WL}$ distance metric (right).
  • Figure 2: Representative densities of four von Mises–Fisher (vMF) distribution types formed by combinations of perturbed mean directions and concentration parameters.
  • Figure 3: Visualization of 400 randomly generated vMF distributions (left) and two-dimensional embeddings obtained via multidimensional scaling using the proposed $\mathcal{WL}$ distance (middle) and the standard $L_2$ distance (right). Colors indicate distribution types: north-high (purple), north-low (blue), south-high (green), and south-low (yellow).
  • Figure 4: Simulated example of mixture model reduction. Left-top: density of the ground-truth 4-component vMF mixture model. Left-bottom: 400 randomly generated samples, color-coded by component membership. Right: log-transformed BIC values for independently fitted mixtures and reduced models using greedy and partitional methods across varying numbers of components.
  • Figure 5: Two-dimensional embeddings of 1000 abstracts represented by vMF distributions over sentence embeddings. Embedding techniques include multidimensional scaling (MDS), t-stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Colors correspond to the five ground-truth disease categories.
  • ...and 3 more figures

Theorems & Definitions (8)

  • Theorem 3.1: Well-posedness
  • Proof of Theorem 3.1
  • Lemma 3.2
  • Proof of Lemma 3.2
  • Theorem 3.3
  • Proof of Theorem 3.3
  • Theorem 3.4
  • Proof of Theorem 3.4