Universal Lower Bounds and Optimal Rates: Achieving Minimax Clustering Error in Sub-Exponential Mixture Models

Maximilien Dreveton, Alperen Gözeten, Matthias Grossglauser, Patrick Thiran

TL;DR

This work derives a universal minimax lower bound for clustering under finite mixture models using Chernoff information, unifying performance guarantees across tail regimes. It then shows that iterative clustering schemes, including Lloyd-like methods, achieve this bound for sub-exponential mixtures such as Laplace location-scale models, with a precise dependence on $\mathrm{Chernoff}(\mathcal{F})$ and initialization quality. For exponential-family mixtures, a Bregman hard clustering variant is rate optimal, supporting a broad, model-agnostic optimality theory for clustering. The results extend minimax clustering theory beyond Gaussian and sub-Gaussian models, provide practical initialization prescriptions, and discuss robustness, high-dimensional behavior, and semi-supervised possibilities.

Abstract

Clustering is a pivotal challenge in unsupervised machine learning and is often investigated through the lens of mixture models. The optimal error rate for recovering cluster labels in Gaussian and sub-Gaussian mixture models involves ad hoc signal-to-noise ratios. Simple iterative algorithms, such as Lloyd's algorithm, attain this optimal error rate. In this paper, we first establish a universal lower bound for the error rate in clustering any mixture model, expressed through a Chernoff divergence, a more versatile measure of model information than signal-to-noise ratios. We then demonstrate that iterative algorithms attain this lower bound in mixture models with sub-exponential tails, notably emphasizing location-scale mixtures featuring Laplace-distributed errors. Additionally, for datasets better modelled by Poisson or Negative Binomial mixtures, we study mixture models whose distributions belong to an exponential family. In such mixtures, we establish that Bregman hard clustering, a variant of Lloyd's algorithm employing a Bregman divergence, is rate optimal.
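
As an illustration of the last sentence of the abstract, the following is a minimal sketch of Bregman hard clustering on a one-dimensional Poisson mixture. It is written under stated assumptions and is not the paper's algorithm as specified: the divergence is the generalized KL divergence $x \log(x/\mu) - x + \mu$ (the Bregman divergence usually associated with the Poisson family), the centroid update is the arithmetic mean, and `sample_poisson`, `poisson_bregman`, and `bregman_hard_clustering` are hypothetical helper names introduced here.

```python
import math
import random


def sample_poisson(lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1


def poisson_bregman(x, mu):
    """Generalized KL divergence x*log(x/mu) - x + mu, with 0*log(0) = 0."""
    if x == 0:
        return mu
    return x * math.log(x / mu) - x + mu


def bregman_hard_clustering(xs, init_centers, n_iter=20):
    """Lloyd-style alternation (illustrative sketch).

    Assignment step: each point goes to the center of smallest divergence.
    Update step: each center becomes the arithmetic mean of its points.
    """
    centers = list(init_centers)
    labels = [0] * len(xs)
    for _ in range(n_iter):
        labels = [
            min(range(len(centers)), key=lambda a: poisson_bregman(x, centers[a]))
            for x in xs
        ]
        for a in range(len(centers)):
            members = [x for x, z in zip(xs, labels) if z == a]
            if members:  # keep the previous center if a cluster is empty
                # clamp away from 0 so log(x / mu) stays defined
                centers[a] = max(sum(members) / len(members), 1e-12)
    return labels, centers
```

Only the divergence changes relative to Lloyd's algorithm: replacing `poisson_bregman` with the squared Euclidean distance recovers the usual k-means iteration, while the mean update stays the same. The initial centers are supplied by the caller, since the initialization procedure is not described in this summary.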

Paper Structure (34 sections, 18 theorems, 155 equations, 2 algorithms)

Key Result

theorem 1

Consider the mixture model defined in eq:def_mixture_model and let $\mathcal{F} = (f_1, \cdots, f_k)$ be the family of $k$ probability distributions that comprise the mixture, where $k$ and the distributions $f_a$ scale with $n$. Suppose that $\mathrm{Chernoff}(\mathcal{F}) = \omega(\log k)$. Then the clustering error admits a minimax lower bound expressed through $\mathrm{Chernoff}(\mathcal{F})$ (the displayed inequality is missing from this summary), where the $\inf$ in that bound is taken over all estimators $\hat{z} \colon (X_1, \cdots, X_n) \to [k]^n$.
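
The quantity $\mathrm{Chernoff}(\mathcal{F})$ is not defined in this summary. The sketch below uses the standard Chernoff information between two discrete distributions, $-\min_{t \in (0,1)} \log \sum_x p(x)^t q(x)^{1-t}$, and assumes that the family-level quantity is the minimum over all pairs of components; that pairwise-minimum convention is an assumption made here, not something the summary states. The function names are hypothetical, the minimization over $t$ is a plain grid search, and the support is truncated to a finite range.

```python
import math


def poisson_pmf(lam, x):
    """Poisson(lam) probability of x, computed in log space for stability."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def chernoff_information(p, q, n_grid=999):
    """Chernoff information between two pmfs given on the same finite support.

    Evaluates -log sum_x p(x)^t q(x)^(1-t) on a grid of t in (0, 1)
    and returns the largest value found.
    """
    best = 0.0
    for i in range(1, n_grid + 1):
        t = i / (n_grid + 1)
        s = sum(
            (px ** t) * (qx ** (1 - t))
            for px, qx in zip(p, q)
            if px > 0 and qx > 0
        )
        best = max(best, -math.log(s))
    return best


def chernoff_of_family(pmfs):
    """Assumed convention: smallest pairwise Chernoff information in the family."""
    return min(
        chernoff_information(pmfs[a], pmfs[b])
        for a in range(len(pmfs))
        for b in range(a + 1, len(pmfs))
    )
```

For two Poisson distributions with means $\lambda_1$ and $\lambda_2$, the sum has the closed form $\exp(\lambda_1^t \lambda_2^{1-t} - t\lambda_1 - (1-t)\lambda_2)$, which gives a convenient check on the numerical routine.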

Theorems & Definitions (31)

  • theorem 1
  • lemma 1
  • lemma 2
  • lemma 3
  • theorem 2
  • theorem 3
  • lemma 4
  • proof: Proof of Lemma \ref{lemma:hypothesis_testing}
  • lemma 5: Lemma C.5 in avrachenkov2020community
  • proof: Proof of Lemma \ref{lemma:ideal_error_rate}
  • ...and 21 more