A variational framework for modal estimation

Tâm LeMinh; Julyan Arbel; Florence Forbes; Hien Duy Nguyen

Abstract

We approach multivariate mode estimation through Gibbs distributions and introduce GERVE (Gibbs-measure Entropy-Regularised Variational Estimation), a likelihood-free framework that approximates Gibbs measures directly from samples by maximising an entropy-regularised variational objective with natural-gradient updates. GERVE brings together kernel density estimation, mean-shift, variational inference, and annealing in a single platform for mode estimation. It fits Gaussian mixtures that concentrate on high-density regions and yields cluster assignments from responsibilities, with reduced sensitivity to the chosen number of components. We provide theory in two regimes: as the Gibbs temperature approaches zero, mixture components converge to population modes; at fixed temperature, maximisers of the empirical objective exist, are consistent, and are asymptotically normal. We also propose a bootstrap procedure for per-mode confidence ellipses and stability scores. Simulation and real-data studies show accurate mode recovery and emergent clustering, robust to mixture overspecification. GERVE is a practical likelihood-free approach when the number of modes or groups is unknown and full density estimation is impractical.
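The abstract lists mean-shift among the ingredients that GERVE brings together, and the outline includes a variant shown to be equivalent to Gaussian mean-shift. As a rough illustration of sample-based mode seeking, the sketch below runs the classical Gaussian mean-shift fixed-point iteration with NumPy. It is not the GERVE algorithm: the toy data, the bandwidth value, the starting points, and the function name `gaussian_mean_shift` are all assumptions made for this example.

```python
import numpy as np

def gaussian_mean_shift(data, start, bandwidth, n_iter=200, tol=1e-8):
    """Iterate the classical Gaussian mean-shift update from `start`."""
    x = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        # Gaussian kernel weight of every sample relative to the current point.
        sq_dist = np.sum((data - x) ** 2, axis=1)
        w = np.exp(-0.5 * sq_dist / bandwidth**2)
        # Move to the kernel-weighted average of the samples.
        x_new = w @ data / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

rng = np.random.default_rng(0)
# Hypothetical toy data: equal-weight isotropic Gaussians centred at the
# nodes of an equilateral triangle (illustrative values, not from the paper).
centres = np.array([[-np.cos(np.pi / 6), -0.5],
                    [0.0, 1.0],
                    [np.cos(np.pi / 6), -0.5]])
labels = rng.integers(0, 3, size=3000)
data = centres[labels] + 0.2 * rng.standard_normal((3000, 2))

# Start slightly off each centre and climb to a nearby mode of the kernel density estimate.
modes = np.array([gaussian_mean_shift(data, c + 0.1, bandwidth=0.15)
                  for c in centres])
print(modes)
```

Each row of `modes` is a local maximiser of a Gaussian kernel density estimate of the sample, so the result depends on the assumed bandwidth as well as on the starting point.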

Paper Structure (99 sections, 35 theorems, 239 equations, 19 figures, 6 tables, 4 algorithms)

Abstract
Keywords
Introduction
Mode seeking with annealed Gaussian mixtures
Sample-based mode estimation and modal clustering
GERVE algorithm and variants
Variational optimisation with natural gradients
Variant A: Equivalence to Gaussian mean-shift
Variant B: Gaussian mixture GERVE
Complexity and practical guidelines
Bootstrap uncertainty quantification for mode estimation
Bootstrap procedure
Statistical validity
Simulation studies
Clustering example
...and 84 more sections

Key Result

Theorem 2.1

Assume $f \in \mathcal{C}^3(\mathcal{S})$ and bounded on $\mathcal{S}$, and for each mode in $\{\mathbf{x}_i^\star\}_{i=1}^{I}$, $\mathbf{x}_i^\star \in \mathrm{int}(\mathcal{S})$ and $\mathbf{H}_i := -\nabla^2 f(\mathbf{x}_i^\star)\succ 0$. Consider $\mathcal{Q}$ to

Figures (19)

Figure 1: Overspecified clustering ($K=7$) of a triangle Gaussian mixture sample. Data points are colored according to the component with the highest posterior responsibility in the fitted mixture. Ellipses represent the component covariances. Left: GERVE returns 3 effective clusters (the component with dashed trajectory has vanishing weight, the remaining six components form three clusters by grouping equivalent components). Right: GMM-EM returns a partition into 7 clusters.
Figure 2: Mode estimation vs. sample size $N$ for the triangle mixture ($I=3$) example. Curves: means ($\mathsf{MR}_\epsilon$, NN) or medians (HM) over $n_{\text{rep}}=100$; bands: 95% confidence intervals.
Figure 3: Baseline collision hotspots identified by GERVE for the Greater London Area between 2020 and 2024, with stability scores. The normalised coordinates are bounded within the box $[-0.7,0.7]\times[-0.35,0.35]$ (normalised window), but all hotspots lie in the box $[-0.15,0.15]\times[-0.075,0.075]$ (zoomed-in window); see Supplementary Material \ref{app:uk}, Figure \ref{fig:collision_with_ellipses_2024}, for the full study area. Left: hotspot IDs, for reference to Table \ref{tab:hotspots_2024_full} in Supplementary Material \ref{app:uk}. Right: 95% confidence ellipses, in normalised coordinates.
Figure S1: Responsibility regions (shown in blue, pink, and yellow, respectively) and (approximate) modal basins for a Gaussian mixture with 3 components, means at $\boldsymbol{\mu}_1=(-\cos(\pi/6),-0.5)$, $\boldsymbol{\mu}_2=(0,1)$, $\boldsymbol{\mu}_3=(\cos(\pi/6),0)$, mixture weights $\boldsymbol{\pi}=(0.16, 0.80, 0.04)$ and isotropic covariances $\boldsymbol{\Sigma}_k=\sigma_k^2 \mathbf{I}$ with variances $(\sigma_1^2,\sigma_2^2,\sigma_3^2)=(0.30, 0.95, 0.10)$. Covariances are represented as circles of radii $\sigma_k$.
Figure S2: Sample of $N=6000$ points from a three-component Gaussian mixture whose means are located at the nodes of an equilateral triangle. Background shading: Gaussian KDE with bandwidth $h$ selected by Scott’s rule (Scott, 1992), revealing three high-density regions.
...and 14 more figures
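The responsibility regions described in the Figure S1 caption can be reproduced numerically from the parameters listed there. The following is a minimal sketch, assuming the standard definition of responsibilities as posterior component probabilities, $r_k(\mathbf{x}) \propto \pi_k\,\mathcal{N}(\mathbf{x};\boldsymbol{\mu}_k,\sigma_k^2\mathbf{I})$; the function name `responsibilities` and the choice of evaluation points are illustrative and not taken from the paper.

```python
import numpy as np

# Mixture parameters as listed in the Figure S1 caption.
means = np.array([[-np.cos(np.pi / 6), -0.5],
                  [0.0, 1.0],
                  [np.cos(np.pi / 6), 0.0]])
weights = np.array([0.16, 0.80, 0.04])
variances = np.array([0.30, 0.95, 0.10])  # sigma_k^2, isotropic covariances

def responsibilities(x):
    """Posterior component probabilities r_k(x) for 2-D points x of shape (n, 2)."""
    x = np.atleast_2d(x)
    sq_dist = np.sum((x[:, None, :] - means[None, :, :]) ** 2, axis=2)
    # log of pi_k * N(x; mu_k, sigma_k^2 I) in dimension d = 2.
    log_joint = (np.log(weights)
                 - np.log(2 * np.pi * variances)
                 - 0.5 * sq_dist / variances)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # stabilise the softmax
    r = np.exp(log_joint)
    return r / r.sum(axis=1, keepdims=True)

# Evaluate at the three component means and take the hard assignment.
r_at_means = responsibilities(means)
hard_labels = r_at_means.argmax(axis=1)
print(r_at_means.round(3), hard_labels)
```

Evaluating `responsibilities` on a grid and colouring each cell by `argmax` gives the kind of partition that the caption calls responsibility regions; the modal basins would require a separate ascent on the mixture density and are not computed here.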

Theorems & Definitions (72)

Theorem 2.1: Gaussian mixture concentration on global modes
Theorem 3.1: Asymptotics of empirical maximisers
Remark
Proposition 4.1: Equivalence to Gaussian mean-shift
Theorem 5.1: Parameter-level bootstrap validity
Theorem 5.3: Matched-modes bootstrap validity and confidence ellipses
Proposition 5.4: Robustness to inexact maximisation
Proposition S1.1: Truncation gap
Theorem S1.2: Truncation equivalence on a bounded support
Lemma S1.4: Basin containment
...and 62 more