Simplex Clustering via sBeta with Applications to Online Adjustment of Black-Box Predictions

Florent Chiaroni, Malik Boudiaf, Amar Mitiche, Ismail Ben Ayed

TL;DR

The paper tackles domain shift in deep-network predictions by proposing a model-agnostic, privacy-preserving approach that clusters softmax outputs on the probability simplex. It introduces sBeta, a unimodal generalization of the Beta density on each simplex coordinate, and the k-sBetas clustering model that uses block-coordinate descent to estimate $\alpha_k,\beta_k$ and assignments $\mathbf U$. Key contributions include a tractable, moment-based parameter estimation (MoM) and unimodality-constrained optimization to avoid bimodality and degeneracy, together with an optimal transport-based cluster-to-class mapping when $K=D$. Empirical results across unsupervised domain adaptation, transductive zero/one-shot CLIP tasks, and real-time road segmentation demonstrate competitive performance and practicality, with public code for reproducibility.
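The moment-based estimation mentioned above can be illustrated with the classical method of moments for the standard Beta density, which matches the sample mean and variance in closed form. The sketch below is only an analogy: it fits a plain Beta, not the paper's sBeta (whose exact parameterization and unimodality constraints are not given in this summary), and the function name `beta_mom` and the guard `eps` are assumptions introduced for illustration.

```python
import numpy as np

def beta_mom(x, eps=1e-8):
    """Method-of-moments estimate of (alpha, beta) for a standard Beta density.

    x: 1-D array of values in (0, 1), e.g. one softmax coordinate of one cluster.
    NOTE: this is the textbook Beta estimator, used as a stand-in for sBeta.
    """
    x = np.asarray(x, dtype=float)
    m = x.mean()
    v = x.var() + eps  # eps guards against a zero-variance (degenerate) sample
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common

# Sanity check on synthetic data drawn from Beta(5, 2).
rng = np.random.default_rng(0)
sample = rng.beta(5.0, 2.0, size=50_000)
a_hat, b_hat = beta_mom(sample)

# A standard Beta density has a single interior mode when both parameters exceed 1.
is_unimodal = (a_hat > 1.0) and (b_hat > 1.0)
```

Because the estimator is closed-form, it costs one pass over the data per coordinate, which is one plausible reason a moment-based update is attractive inside an iterative clustering loop.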

Abstract

We explore clustering the softmax predictions of deep neural networks and introduce a novel probabilistic clustering method, referred to as k-sBetas. In the general context of clustering discrete distributions, the existing methods focused on exploring distortion measures tailored to simplex data, such as the KL divergence, as alternatives to the standard Euclidean distance. We provide a general maximum a posteriori (MAP) perspective of clustering distributions, emphasizing that the statistical models underlying the existing distortion-based methods may not be descriptive enough. Instead, we optimize a mixed-variable objective measuring data conformity within each cluster to the introduced sBeta density function, whose parameters are constrained and estimated jointly with binary assignment variables. Our versatile formulation approximates various parametric densities for modeling simplex data and enables the control of the cluster-balance bias. This yields highly competitive performances for the unsupervised adjustment of black-box model predictions in various scenarios. Our code and comparisons with the existing simplex-clustering approaches and our introduced softmax-prediction benchmarks are publicly available: https://github.com/fchiaroni/Clustering_Softmax_Predictions.
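The joint estimation of density parameters and binary assignments described above can be sketched as a hard-assignment loop that alternates between two steps: refit each cluster's per-coordinate density, then reassign every point to its highest-likelihood cluster. The sketch below is a simplified stand-in under stated assumptions: it uses standard Beta densities fitted by the method of moments rather than the paper's constrained sBeta, it omits any cluster-balance term, and all function names (`fit_beta_mom`, `log_beta_pdf`, `hard_cluster`) are hypothetical.

```python
from math import lgamma
import numpy as np

def fit_beta_mom(X, eps=1e-6):
    """Per-coordinate method-of-moments Beta parameters for the rows of X."""
    m = X.mean(axis=0)
    v = X.var(axis=0) + eps
    c = np.maximum(m * (1.0 - m) / v - 1.0, eps)  # keep both parameters positive
    return m * c, (1.0 - m) * c

def log_beta_pdf(X, a, b):
    """Coordinate-wise log-density of Beta(a_d, b_d); returns shape (N, D)."""
    log_norm = np.array([lgamma(ad) + lgamma(bd) - lgamma(ad + bd)
                         for ad, bd in zip(a, b)])
    return (a - 1.0) * np.log(X) + (b - 1.0) * np.log1p(-X) - log_norm

def hard_cluster(X, K, n_iter=20):
    """Alternate parameter updates and hard reassignments (assumed scheme)."""
    X = np.clip(X, 1e-4, 1.0 - 1e-4)   # avoid log(0) at the simplex boundary
    labels = X.argmax(axis=1) % K      # initialize from the argmax prediction
    for _ in range(n_iter):
        scores = np.full((len(X), K), -np.inf)
        for k in range(K):
            members = X[labels == k]
            if len(members) < 2:       # skip clusters too small to fit
                continue
            a, b = fit_beta_mom(members)
            scores[:, k] = log_beta_pdf(X, a, b).sum(axis=1)
        new_labels = scores.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break                      # assignments stopped changing
        labels = new_labels
    return labels

# Synthetic "softmax-like" points concentrated near the three simplex vertices.
rng = np.random.default_rng(0)
X = np.vstack([rng.dirichlet(c, size=100)
               for c in ([10, 1, 1], [1, 10, 1], [1, 1, 10])])
truth = np.repeat(np.arange(3), 100)
labels = hard_cluster(X, K=3)
accuracy = (labels == truth).mean()
```

Initializing from the argmax keeps cluster indices aligned with class indices in this toy setting; in general a separate cluster-to-class mapping step would be needed.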

Paper Structure

This paper contains 33 sections, 29 equations, 9 figures, 16 tables, 2 algorithms.

Figures (9)

  • Figure 1: Real-time (45 fps) black-box adaptation for road segmentation on images of size $2048 \times 1024$ by clustering the softmax predictions from a source model, pre-trained on GTA5 and applied to Cityscapes. See details in Sec. \ref{subsec_real_time_UDA_road_seg}.
  • Figure 2: Modelling softmax marginal distributions. Figures (a), (b), (c) and (d) compare the density fittings of real-world marginal distributions of softmax predictions, represented by their respective histograms (orange bars). Specifically, the histograms in Figs. (a), (b), and (d) were extracted from the SVHN$\rightarrow$MNIST benchmark and the one in (c) from the VISDA-C benchmark (details in Sec. \ref{subsec_softmax_preds_comp}). Figs. \ref{fig_S_to_M_hists} and \ref{fig_VISDA_C_hists} in the Appendix depict the whole set of histograms per class. Normalized probability density functions $\mathcal{N}$, $p(Eucl)$, $p(KL)$, $p(HG)$, $\mathtt{Beta}$ and $\mathtt{sBeta}$ are obtained, respectively, from the normal density, Euclidean distance, Kullback-Leibler divergence, Hilbert distance, $\mathtt{Beta}$, and $\mathtt{sBeta}$, all listed in Table \ref{prob_to_metric_table}.
  • Figure 3: Partitioning points within probability simplex $\Delta^{2}$. Specifically, these points are projected into a 2D equilateral triangle for a clear visualization, where each vertex represents one dimension of the probability space. The figures depict the probabilistic predictions from a hypothetical source model, with and without distribution shift. Fig. (b) depicts the standard argmax assignment, whereas Fig. (c) represents a Probability Simplex Clustering (PSC) using a cluster-to-class alignment. Symbols $+$ (green), x (orange) and $\bullet$ (blue) indicate separate class assignments.
  • Figure 4: Beta and sBeta density functions. Figures (a) and (b) respectively illustrate $\mathtt{Beta}$ density function, and the presented variant referred to as $\mathtt{sBeta}$. Fig. (c) shows $\mathtt{sBeta}$ depending on the concentration parameter $\lambda$ while maintaining the same mode.
  • Figure 5: Visualization of the introduced constraints. Fig. (a) shows $\mathtt{sBeta}$ density estimation, constrained with the threshold $\tau^-$, on a sample following a bimodal $\mathtt{Beta}$ distribution. Fig. (b) shows $\mathtt{sBeta}$ density estimation, constrained with the threshold $\tau^+$, on a sample following a Dirac distribution.
  • ...and 4 more figures
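
The 2D triangle view described in the Figure 3 caption can be reproduced with a barycentric projection: each point of $\Delta^{2}$ is mapped to the convex combination of three triangle vertices weighted by its probabilities. The sketch below is one common layout, not necessarily the one used for the paper's figure; the vertex coordinates and the name `simplex_to_triangle` are assumptions.

```python
import numpy as np

# Vertices of an equilateral triangle with unit side length. Row d is the
# 2-D position assigned to simplex vertex e_d (an arbitrary, assumed layout).
TRIANGLE = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.5, np.sqrt(3.0) / 2.0],
])

def simplex_to_triangle(P):
    """Map points on the 2-simplex (rows summing to 1) to 2-D coordinates."""
    P = np.asarray(P, dtype=float)
    return P @ TRIANGLE  # barycentric combination of the triangle vertices

# One-hot predictions land on the triangle corners, and the uniform
# prediction lands on the centroid.
corners = simplex_to_triangle(np.eye(3))
center = simplex_to_triangle([1 / 3, 1 / 3, 1 / 3])
```

Confident predictions therefore appear near a corner, while ambiguous ones drift toward the center, which is what makes a shift in the prediction distribution visible in such a plot.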
