Simplex-to-Euclidean Bijection for Conjugate and Calibrated Multiclass Gaussian Process

Bernardo Williams, Harsha Vardhan Tetali, Arto Klami, Marcelo Hartmann

Abstract

We propose a conjugate and calibrated Gaussian process (GP) model for multi-class classification by exploiting the geometry of the probability simplex. Our approach uses Aitchison geometry to map simplex-valued class probabilities to an unconstrained Euclidean representation, turning classification into a GP regression problem with fewer latent dimensions than standard multi-class GP classifiers. This yields conjugate inference and reliable predictive probabilities without relying on distributional approximations in the model construction. The method is compatible with standard sparse GP regression techniques, enabling scalable inference on larger datasets. Empirical results show well-calibrated and competitive performance across synthetic and real-world datasets.
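The simplex-to-Euclidean map mentioned in the abstract can be illustrated with an isometric log-ratio (ILR) transform. The sketch below is a minimal NumPy version under stated assumptions: the Helmert-type contrast basis and the function names `ilr_basis`, `ilr`, and `ilr_inv` are illustrative choices, not taken from the paper, which may use a different orthonormal basis.

```python
import numpy as np

def ilr_basis(K):
    """Orthonormal (K-1) x K basis of the sum-zero subspace (assumed Helmert-type contrasts)."""
    V = np.zeros((K - 1, K))
    for i in range(1, K):
        V[i - 1, :i] = 1.0 / i          # equal weight on the first i parts
        V[i - 1, i] = -1.0              # contrasted against part i+1
        V[i - 1] *= np.sqrt(i / (i + 1.0))  # normalize the row to unit length
    return V

def ilr(p, V):
    """Map strictly positive probability vectors (last axis) to R^(K-1)."""
    logp = np.log(p)
    clr = logp - logp.mean(axis=-1, keepdims=True)  # centered log-ratio
    return clr @ V.T

def ilr_inv(z, V):
    """Map points in R^(K-1) back to the simplex (softmax of the clr coordinates)."""
    clr = z @ V
    clr = clr - clr.max(axis=-1, keepdims=True)     # numerical stability
    w = np.exp(clr)
    return w / w.sum(axis=-1, keepdims=True)

K = 3
V = ilr_basis(K)
p = np.array([[0.7, 0.2, 0.1], [1 / 3, 1 / 3, 1 / 3]])
z = ilr(p, V)          # K-1 = 2 unconstrained coordinates per point
p_back = ilr_inv(z, V) # round trip back to the simplex
```

With this construction a $K$-class probability vector is represented by $K-1$ real coordinates, and the uniform distribution maps to the origin.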

Paper Structure

This paper contains 32 sections, 1 theorem, 37 equations, 8 figures, and 3 tables.

Key Result

Proposition 1

Let $K\ge 2$. Define, for each class $k\in\{1,\dots,K\}$, […]. Let […], which equals the corresponding Aitchison distance between the class centers on the simplex by the ILR isometry. Let $\mathcal{V}_k:=\{z\in\mathbb R^{D}:\|z-\boldsymbol m^{(k)}\|\le \|z-\boldsymbol m^{(\ell)}\|\ \forall \ell\}$ be the Voronoi cell of $\boldsymbol m^{(k)}$. For any $\varepsilon\in(0,1)$, if […], then for every $\boldsymbol{x}$ and every $k$, […], where $C$ denotes the cla…
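The proposition relates distances between latent class centers to Aitchison distances on the simplex. As a hedged illustration, the sketch below computes pairwise Aitchison distances between the class centers $\boldsymbol\mu^{(k)}=\lambda\,\mathbf e_k+(1-\lambda)\tfrac{1}{K}\mathbf 1$ (the form given in the Figure 1 caption) using the centered log-ratio. The values of `K` and `lam` and the helper names are assumptions for illustration only; $0<\lambda<1$ is required so that all entries stay strictly positive.

```python
import numpy as np

def clr(p):
    """Centered log-ratio transform along the last axis."""
    logp = np.log(p)
    return logp - logp.mean(axis=-1, keepdims=True)

def class_centers(K, lam):
    """Rows are mu^(k) = lam * e_k + (1 - lam) * (1/K) * 1."""
    return lam * np.eye(K) + (1.0 - lam) / K

K, lam = 4, 0.9                  # illustrative values, not from the paper
mu = class_centers(K, lam)
c = clr(mu)

# Aitchison distance = Euclidean distance between clr coordinates.
dist = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)

# By symmetry every pair of distinct centers is equally far apart:
# d = sqrt(2) * log(a / b), with a the "own class" entry and b the rest.
a = lam + (1.0 - lam) / K
b = (1.0 - lam) / K
closed_form = np.sqrt(2.0) * np.log(a / b)
```

Because the ILR map is an isometry, these are also the Euclidean distances between the latent centers $\boldsymbol m^{(k)}=\varphi(\boldsymbol\mu^{(k)})$, so all Voronoi cells $\mathcal{V}_k$ are separated by the same margin.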

Figures (8)

  • Figure 1: Illustration of Exact-ILR on a $K=3$ toy problem. Top (data): Ground-truth class probabilities (blue, green and red), $\boldsymbol\pi(x)$ over $x\in[-1,1]$, and three test inputs $x_*^{(1:3)}$. Middle (likelihood): Discrete labels are represented by Gaussian pseudo-observations centered at $\boldsymbol m^{(k)}=\varphi(\boldsymbol\mu^{(k)})$ in Euclidean space (Sec. \ref{sec:method}), where $\boldsymbol\mu^{(k)}=\lambda\,\mathbf e_k+(1-\lambda)\tfrac{1}{K}\mathbf 1$. The three panels visualize this class-weighted latent-space likelihood at $x_*^{(1:3)}$. Bottom (posterior): For each $x_*^{(i)}$, we draw posterior samples of the latent GP, map them through $\varphi^{-1}$, and plot the resulting samples of $\boldsymbol\pi_*$. Each point is one sampled probability vector (color = $\arg\max$ class) and the average is $\bar{\boldsymbol\pi}_*^{(i)}$. The two lowermost panels show the posterior over the two latent GP coordinates across $x\in[-1,1]$, with training points overlaid.
  • Figure 2: Likelihood in simplex and Euclidean spaces. The likelihood of observing classes $1$, $2$, or $3$ at input $x$ is a Gaussian mixture with modes at $\boldsymbol m^{(k)}=\varphi(\boldsymbol\mu^{(k)})$ in Euclidean space (right), which induces a pushforward likelihood on the simplex (left). The figure shows the likelihood for $\boldsymbol{\pi}(x) = (1/3,1/3,1/3)$.
  • Figure 3: Error, NLL and ECE for UCI datasets in the exact setting.
  • Figure 4: Sparse models performance on datasets from the UCI repository.
  • Figure 5: Error, NLL and ECE for increasing overlap of the input variables.
  • ...and 3 more figures
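The pipeline described in the Figure 1 caption (labels become Gaussian pseudo-observations at $\boldsymbol m^{(k)}$, a latent GP is fit by conjugate regression, and posterior samples are mapped back through $\varphi^{-1}$) can be sketched roughly as follows. This is a toy reconstruction under assumptions, not the authors' implementation: the RBF kernel, the hyperparameter values, the step-shaped toy labels, the shared kernel with independent latent coordinates, and all function names are illustrative.

```python
import numpy as np

def ilr_basis(K):
    """Orthonormal (K-1) x K contrast basis (assumed Helmert-type)."""
    V = np.zeros((K - 1, K))
    for i in range(1, K):
        V[i - 1, :i], V[i - 1, i] = 1.0 / i, -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def ilr(p, V):
    logp = np.log(p)
    return (logp - logp.mean(axis=-1, keepdims=True)) @ V.T

def ilr_inv(z, V):
    w = np.exp(z @ V - (z @ V).max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

def rbf(a, b, ell=0.3, var=1.0):
    """Squared-exponential kernel on 1-D inputs (assumed kernel choice)."""
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

rng = np.random.default_rng(0)
K, lam, sigma = 3, 0.9, 0.5            # illustrative values, not from the paper
V = ilr_basis(K)
mu = lam * np.eye(K) + (1.0 - lam) / K # class centers on the simplex
m = ilr(mu, V)                         # latent centers m^(k), shape (K, K-1)

# Toy data: the label depends on which third of [-1, 1] the input falls in.
x = np.linspace(-1.0, 1.0, 30)
y = np.digitize(x, [-1 / 3, 1 / 3])    # labels in {0, 1, 2}
Z = m[y]                               # pseudo-observations, shape (N, K-1)

# Conjugate GP regression, one GP per latent coordinate with a shared kernel.
x_star = np.array([-0.8, 0.0, 0.8])
L = np.linalg.cholesky(rbf(x, x) + sigma**2 * np.eye(len(x)))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, Z))
K_star = rbf(x_star, x)
mean = K_star @ alpha                                   # (3, K-1)
v = np.linalg.solve(L, K_star.T)
var_star = rbf(x_star, x_star).diagonal() - np.sum(v**2, axis=0)
std = np.sqrt(np.clip(var_star, 0.0, None))

# Sample the latent posterior marginally at each test input, map to the simplex.
eps = rng.standard_normal((500, len(x_star), K - 1))
samples = mean[None] + std[None, :, None] * eps
pi_bar = ilr_inv(samples, V).mean(axis=0)               # averaged probabilities
```

In this toy setup each test input sits well inside one class region, so `pi_bar.argmax(axis=1)` should recover that region's label, while each row of `pi_bar` remains a valid probability vector.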

Theorems & Definitions (1)

  • Proposition 1: Choice of $\sigma$ for negligible component intersection