Winner-takes-all learners are geometry-aware conditional density estimators

Victor Letzelter; David Perera; Cédric Rommel; Mathieu Fontaine; Slim Essid; Gael Richard; Patrick Pérez

TL;DR

This work addresses conditional density estimation under task ambiguity by leveraging Winner-takes-all (WTA) predictions, which induce input-dependent centroidal Voronoi tessellations. It introduces two density estimators, Kernel-WTA and, more importantly, Voronoi-WTA; the latter preserves the geometric advantages of WTA while modeling intra-cell density with truncated kernels that share a common scaling factor across cells. Theoretical results establish convergence in distribution of Voronoi-WTA as the number of hypotheses grows and, through a Zador-based analysis, show better asymptotic quantization than static histograms. Empirically, Voronoi-WTA achieves competitive NLL and RMSE against strong baselines (MDN, Histogram) on synthetic, UCI, and audio datasets, is robust to hyperparameter choices, and shows clear advantages in capturing input-dependent uncertainty. The approach thus offers a scalable, geometry-aware framework for uncertainty quantification in ambiguous prediction tasks.
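To make the idea of truncated kernels inside Voronoi cells concrete, here is a minimal one-dimensional sketch. It is not the authors' implementation. The function name `voronoi_wta_density_1d`, the bounded output interval `[low, high]`, the Gaussian kernel, and the weighting of each cell by a score are all assumptions made for illustration. In 1D the Voronoi cells are intervals delimited by midpoints between neighboring hypotheses, so the truncated-kernel normalization has a closed form.

```python
import math
import numpy as np

def _normal_cdf(t):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def voronoi_wta_density_1d(y, centers, scores, h, low, high):
    """Illustrative 1D density: one truncated Gaussian per Voronoi cell.

    Assumptions (not from the source): output space is [low, high], all
    centers lie inside it, and scores are non-negative and sum to 1.
    """
    centers = np.asarray(centers, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(centers)
    centers, scores = centers[order], scores[order]
    if not (low <= y <= high):
        return 0.0
    # In 1D, cell boundaries are the midpoints between neighboring centers.
    edges = np.concatenate(([low], 0.5 * (centers[:-1] + centers[1:]), [high]))
    # Index of the cell that contains y.
    k = int(np.clip(np.searchsorted(edges, y, side="right") - 1, 0, len(centers) - 1))
    left, right = edges[k], edges[k + 1]
    # Kernel mass that falls inside the cell (normalization of the truncation).
    mass = _normal_cdf((right - centers[k]) / h) - _normal_cdf((left - centers[k]) / h)
    kernel = math.exp(-0.5 * ((y - centers[k]) / h) ** 2) / (h * math.sqrt(2.0 * math.pi))
    return scores[k] * kernel / mass

# Hypothetical hypotheses and scores for a single input x.
centers = [0.2, 0.5, 0.9]
scores = [0.5, 0.3, 0.2]
n = 20000
dx = 1.0 / n
grid = (np.arange(n) + 0.5) * dx  # midpoint rule on [0, 1]
values = np.array([voronoi_wta_density_1d(y, centers, scores, 0.1, 0.0, 1.0) for y in grid])
total_mass = float(values.sum() * dx)
first_cell_mass = float(values[grid < 0.35].sum() * dx)
```

Under these assumptions the density integrates to one, and the mass in each cell equals that cell's score regardless of the scaling factor `h`. That is the sense in which this sketch keeps the cell geometry fixed while `h` only shapes the density within each cell.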

Abstract

Winner-takes-all training is a simple learning paradigm that handles ambiguous tasks by predicting a set of plausible hypotheses. Recently, a connection was established between Winner-takes-all training and centroidal Voronoi tessellations, showing that, once trained, hypotheses should optimally quantize the shape of the conditional distribution to predict. However, the best use of these hypotheses for uncertainty quantification is still an open question. In this work, we show how to leverage the appealing geometric properties of Winner-takes-all learners for conditional density estimation, without modifying their original training scheme. We theoretically establish the advantages of our novel estimator both in terms of quantization and density estimation, and we demonstrate its competitiveness on synthetic and real-world datasets, including audio data.
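As a rough illustration of the Winner-takes-all principle mentioned in the abstract, the sketch below computes a loss in which only the hypothesis closest to each target is penalized. It assumes a squared Euclidean distance and a plain NumPy setting; the function name `wta_loss` and the toy arrays are hypothetical, and the paper's actual training objective (for example, how scores are learned) may differ.

```python
import numpy as np

def wta_loss(hypotheses, targets):
    """Mean squared distance between each target and its closest hypothesis.

    hypotheses: array of shape (N, K, D), K predictions per input.
    targets:    array of shape (N, D), one observed target per input.
    Returns (loss, winners), where winners[i] indexes the best hypothesis.
    """
    # Squared Euclidean distance from each target to each of its K hypotheses.
    sq_dist = np.sum((hypotheses - targets[:, None, :]) ** 2, axis=-1)  # (N, K)
    # Only the winning (closest) hypothesis contributes to the loss.
    winners = np.argmin(sq_dist, axis=1)
    loss = float(np.mean(sq_dist[np.arange(len(targets)), winners]))
    return loss, winners

# Toy example: two inputs, two hypotheses each, 2D targets.
hyps = np.array([[[0.0, 0.0], [1.0, 1.0]],
                 [[0.0, 0.0], [1.0, 1.0]]])
tgts = np.array([[0.1, 0.0], [1.0, 0.8]])
loss, winners = wta_loss(hyps, tgts)
```

Because each target only pulls on its nearest hypothesis, repeated updates of this kind move every hypothesis toward the mean of the targets it wins, which is the link to centroidal Voronoi tessellations noted above.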

Paper Structure (49 sections, 13 theorems, 72 equations, 13 figures, 6 tables)


Introduction
Background
Winner-takes-all training
Desirable geometrical properties
Probabilistic interpretation as a mixture model
Limitations of current estimators
Conditional density approximation
Kernel WTA
Voronoi WTA
Likelihood computation and sampling
Theoretical properties
Convergence in distribution independent of $h$
Better asymptotic quantization
Empirical study
Experimental setting
...and 34 more sections

Key Result

Proposition 2.1

A necessary condition for minimizing the underlying optimization problem is that the $\mathcal{Y}_k(z)$ are the Voronoi regions generated by the $z_k$ and, simultaneously, that $\{\mathcal{Y}_k(z)\}$ forms a centroidal Voronoi tessellation generated by $\{z_k\}$.
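The condition in this proposition can be checked numerically on a finite sample: every generator should be the mean of the points in its own Voronoi cell. The sketch below uses Lloyd's algorithm, a standard procedure for reaching such a fixed point, on an empirical distribution. It is a swapped-in illustration rather than the paper's training procedure, and the function name `lloyd` and the toy data are hypothetical.

```python
import numpy as np

def lloyd(samples, centers, n_iter=100):
    """Lloyd iterations: alternate Voronoi assignment and centroid update."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Voronoi step: assign each sample to its nearest center.
        d = np.sum((samples[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(d, axis=1)
        # Centroid step: move each center to the mean of its (non-empty) cell.
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = samples[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # fixed point: centers are the centroids of their own cells
        centers = new_centers
    return centers, labels

# Toy 1D data with two groups; initial centers are deliberately poor.
samples = np.array([[0.0], [0.2], [0.4], [2.0], [2.2], [2.4]])
centers, labels = lloyd(samples, [[0.0], [0.3]])
# Each center should coincide with the mean of the samples in its cell.
cell_means = np.array([samples[labels == k].mean(axis=0) for k in range(len(centers))])
```

At the fixed point, `centers` and `cell_means` agree, which is the empirical analogue of a centroidal Voronoi tessellation.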

Figures (13)

Figure 1: Limitations of Dirac Mixtures. Model predictions for different inputs $x$ (columns) are shown with blue-shaded circles; the colorbar indicates hypothesis scores. Green points depict the target distribution for each input. Black lines mark the boundaries of the Voronoi tessellation associated with the predictions.
Figure 2: Qualitative results. Each panel shows a different dataset: Single Gaussian, Rotated Two Moons, and Changing Damier. Within each panel, columns correspond to predictions made by: MDN, Score-based WTA, and Histogram (left to right). Dots represent predicted (or fixed) hypotheses: means, centroids, and bins. Their colors encode the predicted score or mixture weight for MDN, where darker blue corresponds to higher scores. Red circles represent the MDN's predicted variance for each Gaussian (opacity reflects mixture weight), while WTA figures depict the Voronoi tessellations for predicted hypotheses. 1st row: 16 hypotheses, 2nd row: 49 hypotheses.
Figure 3: Quantitative comparison. Each column corresponds to a dataset, and each row to a different metric detailed in Section \ref{sec:exp-setting}. Dotted lines correspond to theoretical quantization errors from Proposition \ref{th:wta_histogram_risk}. Dirac Voronoi-WTA corresponds to the limit when the scaling factor $h \to 0$ (\ref{eq:mixturedelta}), while Unif. Voronoi-WTA is the limit when $h \to \infty$. Results are averaged over three random seeds, with standard deviations given in the Appendix, Figure \ref{fig:stds_results}. Detailed discussion is given in Section \ref{sec:experiments}.
Figure 4: Impact of the scaling factor. Results on the dataset Uniform to Gaussians with $16$ hypotheses, computed over three random seeds. Unweighted Kernel-WTA corresponds to \ref{eq:kde} with fixed uniform scores $\gamma_{\theta}^k(x) = 1/K$. Truncated-Kernel Histogram is the standard Histogram where truncated kernels are placed on the fixed hypotheses, instead of the uniform kernels (Unif. Histogram) used in Figure \ref{fig:quantitative}. See Appendix \ref{app:additional_results_synth} for more results.
Figure 5: Illustration of the proof of Proposition \ref{th:radius}. On the left, we show that $a_K$ is exactly $r_K$ apart from its closest centroid. On the right, we illustrate the sequence $a_{\varphi\circ\psi(K)}$, from which we define $B_\infty$.
...and 8 more figures

Theorems & Definitions (25)

Definition : Centroidal Voronoi Tessellation
Proposition 2.1: du1999centroidal
Proposition 5.1
Proposition 5.2
Definition : Weak convergence
Definition : Uniform convergence
Theorem 2.1: Zador theorem
Proposition 2.6
proof
Proposition 2.7
...and 15 more
